CN106934007B - Associated information pushing method and device - Google Patents

Associated information pushing method and device

Info

Publication number
CN106934007B
Authority
CN
China
Prior art keywords
information
corpus data
relation
original corpus
preset
Prior art date
Legal status
Active
Application number
CN201710137306.5A
Other languages
Chinese (zh)
Other versions
CN106934007A (en)
Inventor
王立宁 (Wang Lining)
王志刚 (Wang Zhigang)
Current Assignee
Beijing Time Co ltd
Original Assignee
Beijing Time Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Time Co ltd filed Critical Beijing Time Co ltd
Publication of CN106934007A publication Critical patent/CN106934007A/en
Application granted granted Critical
Publication of CN106934007B publication Critical patent/CN106934007B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/95: Retrieval from the web
    • G06F 16/953: Querying, e.g. by the use of web search engines
    • G06F 16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a method and a device for pushing associated information, which can at least solve the prior-art technical problem that pushing results fail to adequately meet users' requirements because the association relationships between semantics are not considered. The method comprises the following steps: performing machine learning on obtained original corpus data according to a machine learning algorithm, and determining the association relationships between the obtained original corpus data; storing the original corpus data and the association relationships between the original corpus data into a preset corpus database; and determining the associated information corresponding to display information according to the association relationships between the original corpus data stored in the corpus database, and pushing the associated information.

Description

Associated information pushing method and device
Technical Field
The invention relates to the technical field of information, in particular to a method and a device for pushing associated information.
Background
With the rapid development of the internet, more and more users are accustomed to obtaining various kinds of information through the network. Many users perform extended reading on the information they are currently browsing in order to deepen their understanding of it. Conventionally, to achieve extended reading, a user must identify the key content of the currently browsed information by himself, then search manually according to that key content, and finally filter the information he needs from a large number of search results. This approach is cumbersome to operate and undoubtedly increases the time cost for the user.
Currently, technical solutions have appeared that can automatically push associated information for currently browsed information: firstly, performing word segmentation processing on current browsing information to obtain main words contained in the current browsing information; then, selecting partial vocabularies as keywords according to the occurrence frequency of each vocabulary and other factors; and finally, pushing the information containing the keywords as associated information to the user.
However, in the process of implementing the invention, the inventor found that the above prior-art scheme has at least the following defects: on the one hand, Chinese contains many synonyms, and the same subject can often be described with entirely different expressions; for example, "double eleven" and "shopping festival" differ in literal form but have identical semantics. On the other hand, simple keyword matching cannot reflect deeper association relationships between things; for example, "Yao Ming" and "Kobe" are both basketball players, so a user reading information about "Yao Ming" is likely to want to read further about recent news of "Kobe", which the existing pushing manner obviously cannot achieve.
Therefore, because the existing associated-information pushing manner does not consider the association relationships between semantics, its pushing results cannot adequately meet user requirements.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a method and an apparatus for pushing associated information, which overcome the above problems or at least partially solve the above problems.
According to an aspect of the present invention, a method for pushing associated information is provided, including: performing machine learning on the obtained original corpus data according to a machine learning algorithm, and determining the association relationships between the obtained original corpus data; storing the original corpus data and the association relationships between the original corpus data into a preset corpus database; and determining the associated information corresponding to display information according to the association relationships between the original corpus data stored in the corpus database, and pushing the associated information.
Optionally, the step of performing machine learning on the obtained original corpus data according to a machine learning algorithm to determine an association relationship between the obtained original corpus data specifically includes: and converting the obtained original corpus data into corresponding word vectors, inputting the word vectors into an input layer in a preset neural network model, and obtaining associated output results corresponding to the word vectors through an output layer in the neural network model.
Optionally, the neural network model further comprises: a hidden layer located between the input layer and the output layer; the step of obtaining the associated output result corresponding to the word vector through the output layer in the neural network model specifically includes: and performing feature extraction on the word vectors input by the input layer through the hidden layer, and outputting associated output results corresponding to the word vectors by the output layer according to the result of the feature extraction.
Optionally, the step of performing machine learning on the obtained original corpus data according to a machine learning algorithm further includes: and judging whether the correlation output result corresponding to the word vector meets a preset precision condition, and correcting the neural network model according to a back propagation algorithm when the judgment result is negative.
Optionally, the step of determining, according to the association relationships between the original corpus data stored in the corpus database, the associated information corresponding to the display information specifically includes: acquiring keywords corresponding to each piece of network information, and determining the association mapping relationships between the pieces of network information according to the keywords corresponding to the network information and the association relationships between the original corpus data stored in the corpus database; and storing the association mapping relationships among the pieces of network information into a preset mapping database, and determining the associated information corresponding to the display information according to the keywords corresponding to the display information and the mapping database.
Optionally, after the step of acquiring the keywords corresponding to each piece of network information, the method further includes: establishing, according to the correspondence between the keywords and the network information, an information index for querying the network information by keyword.
Optionally, the step of determining the associated information corresponding to the display information according to the keyword corresponding to the display information and the mapping database further includes: when the number of the associated information is multiple, further determining the similarity between each associated information and the display information according to a similarity algorithm, and deleting the associated information of which the similarity is greater than a preset first threshold and/or the similarity is smaller than a preset second threshold; wherein the first threshold is greater than the second threshold.
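The threshold-based screening described above (deleting near-duplicates whose similarity exceeds the first threshold, and irrelevant items whose similarity falls below the second) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the "similarity algorithm" is unspecified in the text, so cosine similarity over keyword counts stands in for it, and the threshold values 0.9 and 0.1 are purely illustrative.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two keyword Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_candidates(display_kw, candidates, high=0.9, low=0.1):
    """Keep candidates whose similarity to the display information lies in
    [low, high]: items above `high` are near-duplicates, items below `low`
    are irrelevant, and both are deleted per the step described above."""
    display = Counter(display_kw)
    kept = []
    for title, kw in candidates:
        sim = cosine_similarity(display, Counter(kw))
        if low <= sim <= high:
            kept.append(title)
    return kept

display = ["yao ming", "basketball", "game"]
candidates = [
    ("dup", ["yao ming", "basketball", "game"]),   # similarity 1.0 -> deleted
    ("related", ["kobe", "basketball", "game"]),   # similarity ~0.67 -> kept
    ("unrelated", ["stock", "market", "index"]),   # similarity 0.0 -> deleted
]
```

Calling `filter_candidates(display, candidates)` keeps only `"related"`, matching the claim's requirement that the first threshold be greater than the second.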
Optionally, the original corpus data is obtained through a distributed message queue, and the corpus database can be updated according to an update result of the distributed message queue.
Optionally, the association information includes at least one of: news information, navigation information, web page information, and search information.
According to another aspect of the present invention, there is provided a pushing apparatus for associated information, including: a learning module, adapted to perform machine learning on the obtained original corpus data according to a machine learning algorithm and determine the association relationships between the obtained original corpus data; a storage module, adapted to store the original corpus data and the association relationships between the original corpus data into a preset corpus database; a determining module, adapted to determine the associated information corresponding to the display information according to the association relationships between the original corpus data stored in the corpus database; and a pushing module, adapted to push the associated information.
Optionally, the learning module is specifically configured to: and converting the obtained original corpus data into corresponding word vectors, inputting the word vectors into an input layer in a preset neural network model, and obtaining associated output results corresponding to the word vectors through an output layer in the neural network model.
Optionally, the neural network model further comprises: a hidden layer located between the input layer and the output layer; the learning module is further to: and performing feature extraction on the word vectors input by the input layer through the hidden layer, and outputting the associated output results corresponding to the word vectors by the output layer according to the result of the feature extraction.
Optionally, the learning module is further configured to: and judging whether the correlation output result corresponding to the word vector meets a preset precision condition, and correcting the neural network model according to a back propagation algorithm when the judgment result is negative.
Optionally, the determining module includes: a first determining unit, adapted to acquire the keywords corresponding to each piece of network information and determine the association mapping relationships between the pieces of network information according to the keywords corresponding to the network information and the association relationships between the original corpus data stored in the corpus database; and a second determining unit, adapted to store the association mapping relationships among the pieces of network information into a preset mapping database, and determine the associated information corresponding to the display information according to the keywords corresponding to the display information and the mapping database.
Optionally, the apparatus further comprises: an information index establishing module, adapted to establish, according to the correspondence between the keywords and the network information, an information index for querying the network information by keyword.
Optionally, the apparatus further comprises: the screening module is suitable for further determining the similarity between each piece of associated information and the display information according to a similarity algorithm when the number of the associated information is multiple, and deleting the associated information of which the similarity is greater than a preset first threshold and/or the similarity is smaller than a preset second threshold; wherein the first threshold is greater than the second threshold.
Optionally, the original corpus data is obtained through a distributed message queue, and the corpus database can be updated according to an update result of the distributed message queue.
Optionally, the association information includes at least one of: news information, navigation information, web page information, and search information.
According to the method and the device for pushing associated information, machine learning can be performed on the obtained original corpus data according to a machine learning algorithm so as to determine the association relationships between the original corpus data; the associated information corresponding to the display information can then be determined from those association relationships. The invention can thus mine the association relationships between original corpus data through machine learning, and these relationships capture not only the association between synonyms but also, through semantic analysis, deeper associations between things, so the scheme better meets users' requirements and greatly improves the quality of the pushed associated information.
The above description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating a method for pushing association information according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a push method for associated information according to a second embodiment of the present invention;
fig. 3 shows a block diagram of a push apparatus for associated information according to a third embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The invention provides a method and a device for pushing associated information, which can at least solve the technical problem that in the prior art, the pushing result cannot better meet the requirements of users because the association relation between semantics is not considered.
Example one
Fig. 1 shows a flowchart of a method for pushing association information according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
step S110: and performing machine learning on the obtained original corpus data according to a machine learning algorithm, and determining the association relation between the obtained original corpus data.
Specifically, the original corpus data is corpus data obtained by performing word segmentation (such as part-of-speech-based and word-sense-based segmentation) or keyword extraction on sentences in the acquired information. For example, if a sentence in the acquired information is "related recommendation based on a semantic and knowledge base mixed model", word segmentation may turn it into original corpus data such as "semantic", "knowledge", "model", and "related recommendation". Determining the association relationships between the acquired original corpus data therefore means determining the association relationships between the words it contains. These relationships may include sentence-component relationships and semantic relationships of various kinds, such as the position of each word in the original corpus data, the distance between words, and the collocation relationship between words. The machine learning algorithm is an algorithm for learning and correcting the association relationships between words in the obtained original corpus data; it can be implemented in various ways, for example with an N-gram model, or flexibly with various deep learning and neural network algorithms.
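A toy sketch of turning a sentence into original corpus data as in the example above. The patent names no segmentation tool; a real Chinese pipeline would typically use a segmenter such as jieba, so for readability the sentence here is already tokenized (in English) and only the content-word filtering step is shown. The token list and stopword set are illustrative assumptions.

```python
# Hypothetical stopword set for this sketch; a real system would use a
# segmenter's own dictionary and part-of-speech tags instead.
STOPWORDS = {"based", "on", "a", "and", "of", "the"}

def to_corpus_data(tokens):
    """Keep only content words from an already-segmented sentence,
    mirroring the "semantic / knowledge / model" example in the text."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

tokens = ["related", "recommendation", "based", "on", "a",
          "semantic", "and", "knowledge", "base", "mixed", "model"]
```

`to_corpus_data(tokens)` yields the content words `related`, `recommendation`, `semantic`, `knowledge`, `base`, `mixed`, `model`.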
Step S120: and storing the original corpus data and the association relation between the original corpus data into a preset corpus database.
Specifically, the original corpus data and the association relationships between them, as determined in step S110, are stored in a preset corpus database. The data stored in this preset corpus database can be continuously updated according to updates of online information, ensuring the validity and accuracy of the data stored in the corpus database.
Step S130: determining the associated information corresponding to the display information according to the association relationships between the original corpus data stored in the corpus database, and pushing the associated information.
Here, the display information generally refers to the information displayed in the device's display interface, i.e., the information the user is currently browsing. The display information and the associated information may be of various types, for example at least one of: news information, navigation information, web page information, and search information. When this step is executed, the associated information corresponding to the display information may be determined directly from the display information and the corpus database; alternatively, the display information may first be preprocessed, and the associated information determined from the preprocessing result and the corpus database. The preprocessing may include various operations, for example word segmentation or keyword extraction. To determine the associated information corresponding to the display information, mapping relationships between pieces of network information may be established in advance according to the association relationships between the original corpus data stored in the corpus database; the associated information corresponding to the display information is then looked up through these mapping relationships and pushed.
Therefore, in the method for pushing associated information, machine learning can be performed on the obtained original corpus data according to a machine learning algorithm so as to determine the association relationships between the original corpus data; the associated information corresponding to the display information can then be determined according to those association relationships and pushed. The invention can mine the association relationships between original corpus data through machine learning, and these relationships capture not only the association between synonyms but also, through semantic analysis, deeper associations between things. The scheme of the invention can therefore better meet users' varied needs for associated-information pushing, and greatly improves the quality of the pushed associated information.
Example two
Fig. 2 shows a flowchart of a push method for association information according to a second embodiment of the present invention. As shown in fig. 2, the method comprises the steps of:
step S210: and performing machine learning on the obtained original corpus data according to a machine learning algorithm, and determining the association relation between the obtained original corpus data.
The original corpus data may be obtained in a variety of ways; for example, network information on the internet may be acquired by a web crawler and the original corpus data derived from it. In one implementation, a large amount of network information is crawled in advance and the original corpus data is obtained from it. In another implementation, the web crawler periodically crawls recently updated network information, and the crawled information serves as the incremental part of the original corpus data, so that the corpus is dynamically expanded over time. In this embodiment, the two modes may be combined. Specifically, when deriving the original corpus data from the network information, the network information may be used directly as the original corpus data, or it may first undergo preset processing, with the processing result used as the original corpus data. For example, the original corpus data may be obtained by performing word segmentation (such as part-of-speech-based and word-sense-based segmentation) and/or keyword extraction on sentences in the acquired network information. By way of example: if a sentence in the acquired information is "related recommendation based on a semantic and knowledge base mixed model", word segmentation may turn it into original corpus data such as "semantic", "knowledge", "model", and "related recommendation". In addition, for convenient access, the original corpus data can be stored in a distributed message queue and then acquired from that queue.
For example, every time new original corpus data is generated according to the latest network information crawled by a web crawler, the new original corpus data is stored in a distributed message queue for subsequent consumption. The purpose of parallel consumption can be achieved through the distributed message queues, and therefore processing efficiency is improved.
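The parallel-consumption idea can be sketched with Python's standard-library queue as a stand-in for a distributed message queue (the patent names no product; Kafka would be a typical real-world choice). The three workers, the `upper()` placeholder processing, and the `None` sentinel shutdown are all illustrative assumptions.

```python
import queue
import threading

# Stand-in for a distributed message queue: crawled corpus items are
# enqueued by the producer and consumed by several workers in parallel.
corpus_queue = queue.Queue()
results = []
lock = threading.Lock()

def consumer():
    while True:
        item = corpus_queue.get()
        if item is None:              # sentinel: shut this worker down
            corpus_queue.task_done()
            break
        processed = item.upper()      # placeholder for real corpus processing
        with lock:
            results.append(processed)
        corpus_queue.task_done()

workers = [threading.Thread(target=consumer) for _ in range(3)]
for w in workers:
    w.start()
for item in ["semantic", "knowledge", "model"]:   # newly crawled corpus data
    corpus_queue.put(item)
for _ in workers:
    corpus_queue.put(None)            # one sentinel per worker
for w in workers:
    w.join()
```

After the join, all three items have been processed; which worker handled which item is nondeterministic, which is exactly the parallelism the queue is meant to provide.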
Next, machine learning can be performed on the original corpus data through various machine learning algorithms. For example, various deep learning algorithms, neural network algorithms, classification algorithms, and the like can be adopted, and the specific type of the machine learning algorithm is not limited in the present invention, as long as the learning and correction can be performed on the association relationship between words in the original corpus data.
In the present embodiment, machine learning is performed by a neural network model. In the course of implementing the invention, the inventor found that converting the original corpus data into corresponding word vectors before feeding it into the neural network model shortens processing time and improves accuracy. Therefore, in this embodiment, the obtained original corpus data is first converted into corresponding word vectors. A word vector expresses the relationships among vocabulary items in vector form, reducing text processing to vector operations: semantic similarity between texts is expressed as similarity in vector space, so word vectors capture, to some extent, the semantic distance between words. Word vectors can be obtained in various ways; for example, the original corpus data can serve as the training set from which the word vectors are trained, or word vectors may be determined directly from the occurrence counts of each word.
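A minimal illustration of "similarity in vector space standing in for semantic similarity": the three-dimensional vectors below are hand-made for this sketch (a real system would train them, e.g. with a word2vec-style model), chosen so that the "double eleven" / "shopping festival" pair from the Background section lands closer together than an unrelated word.

```python
from math import sqrt

# Hand-made toy "word vectors"; the dimensions and values are illustrative
# assumptions, not trained embeddings.
vectors = {
    "double-eleven":     [0.9, 0.8, 0.1],
    "shopping-festival": [0.8, 0.9, 0.2],
    "basketball":        [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: the vector-space proxy for semantic similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

With these vectors, `cosine` ranks "shopping-festival" as far more similar to "double-eleven" than "basketball" is, mirroring the synonym problem the patent raises.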
Then, the word vectors are input into the input layer of a preset neural network model, and the associated output results corresponding to the word vectors are obtained through the output layer of the model. In this embodiment, the neural network model comprises three layers: an input layer, an output layer, and a hidden layer between them. Specifically, the input layer receives the input word vectors and is the model's input port; the output layer outputs the associated output results corresponding to the word vectors and is the model's output port; and the hidden layer, located between the two, performs feature extraction on the input word vectors. The feature extraction covers association relationships including sentence components and semantic relationships, such as the position of words in the original corpus data, the distance between words, and the collocation relationship between words; correspondingly, the associated output result is generated from the hidden layer's feature-extraction result for the word vectors supplied by the input layer.
In this way, the neural network model can analyze the associations between the original corpus data. In addition, to improve the accuracy of the associated output results, the above process of performing machine learning on the original corpus data further includes: judging whether the associated output result corresponding to a word vector meets a preset precision condition, and correcting the neural network model according to a back-propagation algorithm when it does not. The correction can be performed during the training of the neural network model or during its prediction phase. The preset precision condition can be set by those skilled in the art according to the actual situation. For example, an accuracy threshold may be preset; all output results of the neural network model (or a random sample of them) may then be obtained periodically during the prediction stage, and the model corrected when the accuracy of the output results falls below the threshold. As another example, each output result obtained during the training stage may be checked for correctness, and the model corrected whenever a result is wrong. By way of example: if three words known to be associated are obtained in advance during training, then whenever any two of them are input, the output is correct if the third word is produced accurately; otherwise the parameters and/or weights of the neural network model are adjusted until the result is correct.
In this embodiment, the learning process of the neural network model can be supervised by a back-propagation algorithm: a training input is fed into the network to obtain an excitation response, the difference between that response and the target output corresponding to the training input yields the response errors of the hidden layer and the output layer, and the model is then corrected by adjusting attributes such as the weights and parameters associated with each word vector accordingly.
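The three-layer structure and back-propagation correction described above can be sketched as follows. This is a generic sketch, not the patent's model: the layer sizes, sigmoid activation, learning rate, and the XOR stand-in training target are all assumptions made to keep the example small and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (2, 4))   # input layer  -> hidden layer weights
W2 = rng.normal(0.0, 0.5, (4, 1))   # hidden layer -> output layer weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR as a toy stand-in for "associated output" targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def loss():
    """Mean squared error of the current network on the training set."""
    return float(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - y) ** 2))

initial = loss()
for _ in range(5000):
    h = sigmoid(X @ W1)                    # hidden-layer feature extraction
    out = sigmoid(h @ W2)                  # output-layer response
    # Back-propagation: the response error flows from the output layer back
    # through the hidden layer, and the weights are corrected accordingly.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h
final = loss()
```

After training, `final` is smaller than `initial`: the correction loop has reduced the response error, which is exactly the role back-propagation plays in the embodiment.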
Step S220: and storing the original corpus data and the association relation between the original corpus data into a preset corpus database.
Specifically, the association relationships between the original corpus data include the associated output results corresponding to the word vectors acquired through the output layer of the neural network model. In this step, the original corpus data obtained in step S210 and the association relationships between them are stored in a preset corpus database. The corpus database can be updated according to the update results of the distributed message queue; that is, the data stored in the preset corpus database is continuously refreshed as the distributed message queue holding the original corpus data is updated. In other words, in this embodiment the original corpus data and its association relationships change dynamically. In a specific implementation, the updated original corpus data may be acquired at preset time intervals, and correspondingly steps S210 and S220 are repeated at those intervals; the interval may be chosen according to the update frequency of online network information and/or the timeliness of the information. Dynamically updating the original corpus data and its association relationships ensures that the contents of the corpus database remain timely and accurate. In one specific case, because news information is highly time-sensitive, corpus data may exhibit different association relationships in different periods: for example, while Yao Ming was competing, the association between "Yao Ming" and "basketball" was high; after Yao Ming retired, gradually left the court, and took part in public welfare, the association between "Yao Ming" and "basketball" decreased while the association between "Yao Ming" and "public welfare" increased.
Therefore, the accuracy of the corpus database can be improved by executing the steps S210 and S220 in a loop.
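The queue-driven refresh described above can be sketched as follows; the queue contents, association weights, and function names are illustrative stand-ins for the distributed message queue and the preset corpus database, not a real implementation:

```python
# Hypothetical in-memory stand-ins for the distributed message queue and
# the preset corpus database; all data and names are illustrative only.
message_queue = [
    ("Yao Ming", {"basketball": 0.92, "NBA": 0.88}),
    ("Yao Ming", {"public welfare": 0.81}),
]

corpus_database = {}

def consume_updates(queue, database):
    """Drain pending corpus updates and merge them into the database, so
    that later updates overwrite stale association relationships."""
    while queue:
        word, relations = queue.pop(0)
        database.setdefault(word, {}).update(relations)

consume_updates(message_queue, corpus_database)
```

Repeating this consumption step at the preset time interval realizes the dynamic update of the corpus database described above.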
Step S230: acquiring keywords corresponding to each piece of network information, and determining the association mapping relationship between the pieces of network information according to the keywords corresponding to the network information and the association relationships between the original corpus data stored in the corpus database.
Specifically, in this step, keywords are first extracted from each piece of network information; an association mapping relationship between the keywords is then established according to the association relationships between the original corpus data stored in the corpus database; and the association mapping relationship between the corresponding pieces of network information is further determined according to the association mapping relationship between the associated words. In addition, to facilitate quick and accurate queries of the correspondence between network information and its keywords, after the keywords corresponding to each piece of network information are obtained, an information index for querying network information by keyword can further be established according to the correspondence between the keywords and each piece of network information; this information index may be an inverted index.
For example, assume that the association relationships between the original corpus data stored in the corpus database include the following set of data records: Yao Ming, Kobe, and basketball. Accordingly, the association mapping relationships between the associated words are established as follows: Yao Ming - Kobe, basketball; Kobe - Yao Ming, basketball; basketball - Yao Ming, Kobe. In addition, assume that the established inverted index includes the following records: Yao Ming - document ID3, Kobe - document ID4, basketball - document ID6. The association mapping relationships between the pieces of network information therefore include: document ID3 - document ID4, document ID6; document ID4 - document ID3, document ID6; document ID6 - document ID4, document ID3. Through this step, the association mapping relationship between pieces of network information can be determined according to the association relationships between the original corpus data, providing a basis for subsequent information pushing.
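The derivation above can be sketched in Python; the word associations, document IDs, and function name mirror the illustrative example and are not part of any real dataset:

```python
# Word-level association relationships taken from the corpus database
# example above (illustrative data only).
word_associations = {
    "Yao Ming": ["Kobe", "basketball"],
    "Kobe": ["Yao Ming", "basketball"],
    "basketball": ["Yao Ming", "Kobe"],
}

# Inverted index: keyword -> documents whose keywords include it.
inverted_index = {
    "Yao Ming": ["doc3"],
    "Kobe": ["doc4"],
    "basketball": ["doc6"],
}

def document_associations(keyword):
    """Map a keyword's associated words, via the inverted index, to the
    documents that should be associated with documents about `keyword`."""
    docs = []
    for related_word in word_associations.get(keyword, []):
        docs.extend(inverted_index.get(related_word, []))
    return docs
```

For the keyword "Yao Ming", this yields the documents indexed under "Kobe" and "basketball", matching the document ID3 - document ID4, document ID6 record in the example.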
Step S240: storing the association mapping relationships between the pieces of network information in a preset mapping database, determining the associated information corresponding to the display information according to the keywords corresponding to the display information and the mapping database, and pushing the associated information.
Specifically, in this step, the association mapping relationships between the pieces of network information established in step S230 are first stored in a preset mapping database. The preset mapping database is an online database that supports operations such as online deployment, collection, and query, and it can further dynamically update the stored association mapping relationships according to updates of the corpus database. In a specific implementation, the preset mapping database may be a Redis database.
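As a rough sketch of how the mapping database might store and serve these relationships, a plain dict stands in for a Redis client below so the example stays self-contained; in a real deployment the two methods would wrap Redis set commands such as SADD and SMEMBERS:

```python
class MappingStore:
    """Illustrative stand-in for the preset mapping database: one record
    per document ID, holding the IDs of its associated documents."""

    def __init__(self):
        self._data = {}

    def save_mapping(self, doc_id, related_doc_ids):
        # With a real Redis client this would be r.sadd(doc_id, *related_doc_ids).
        self._data[doc_id] = set(related_doc_ids)

    def lookup(self, doc_id):
        # With a real Redis client this would be r.smembers(doc_id).
        return sorted(self._data.get(doc_id, set()))

store = MappingStore()
store.save_mapping("doc3", ["doc4", "doc6"])
```

Looking up the display information's document ID then returns the candidate associated information in one query.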
A keyword corresponding to the display information is then extracted, and associated information having an association mapping relationship with that keyword is searched for in the mapping database. In this embodiment, the associated information means information that describes a different aspect of the same thing, rather than information similar to the current information. If two pieces of information are similar, one may merely be a duplicate of the other. Therefore, the most desirable associated information describes different aspects of one thing; when associated information describes the same aspect of the same thing, the smaller the similarity, the better. For example, the information "Yao Ming NBA classic matches" and "Yao Ming NBA highlight matches" both concern Yao Ming's matches and both cover his many NBA games, so the two pieces of information are duplicates of each other. By contrast, "Yao Ming NBA classic matches" and "Yao Ming public welfare" concern two different aspects of Yao Ming, that is, different aspects of one thing as described above, and thus constitute the preferable associated information referred to in this embodiment.
Finally, the found associated information is determined as the associated information corresponding to the display information and is pushed. When multiple pieces of associated information are determined, the similarity between each piece of associated information and the display information is further computed according to a similarity algorithm, and associated information whose similarity is greater than a preset first threshold and/or smaller than a preset second threshold is deleted, where the first threshold is greater than the second threshold. Specifically, the preset first threshold is the minimum similarity value, obtained by those skilled in the art from actual statistics, at which the associated information and the display information are duplicate information; that is, when the result computed by the similarity algorithm is greater than the preset first threshold, the associated information and the display information are duplicates. The preset second threshold is the maximum similarity value, likewise obtained from actual statistics, at which the associated information and the display information are unrelated; that is, when the computed result is smaller than the preset second threshold, the associated information is unrelated to the display information.
In a specific implementation, when there are multiple pieces of associated information, in order to screen out associated information that is too similar to the display information as well as associated information whose relevance is too low, the similarity between each piece of associated information and the display information is computed according to a similarity algorithm and compared with the preset first and/or second threshold; when the computed result is greater than the preset first threshold and/or smaller than the preset second threshold, the corresponding associated information is deleted. The similarity algorithm is selected or configured by those skilled in the art according to the actual situation, and the present invention is not limited in this respect.
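The two-threshold screening can be sketched as follows; the threshold values 0.9 and 0.2 are illustrative, since the embodiment leaves the thresholds to be chosen statistically by those skilled in the art:

```python
def filter_associated(candidates, first_threshold=0.9, second_threshold=0.2):
    """Delete candidates whose similarity to the display information exceeds
    the first threshold (likely duplicates) or falls below the second
    threshold (likely unrelated); keep everything in between."""
    return [(doc, sim) for doc, sim in candidates
            if second_threshold <= sim <= first_threshold]

candidates = [("Yao Ming NBA classic", 0.95),    # near-duplicate: dropped
              ("Yao Ming public welfare", 0.55),  # different aspect: kept
              ("weather report", 0.05)]           # unrelated: dropped
kept = filter_associated(candidates)
```

Only the candidate describing a different aspect of the same thing survives, matching the Yao Ming example above.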
In addition, when there are multiple pieces of associated information, they may be processed further. For example, weights may be set for the associated information according to the similarity computation results, that is, in order of relevance from high to low, so that the associated information can be displayed by weight from the most to the least relevant, giving the user a more accurate push result. Alternatively, the weights may be set according to information such as the search volume of the keywords contained in the associated information or its publication time, so that the associated information can be sorted by a chosen rule and the sorted result displayed in the push result, satisfying various user requirements.
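The weight-based ordering amounts to a simple sort; the weight values below are illustrative (they could come from similarity results, keyword search volume, or publication time, as described above):

```python
def rank_by_weight(associated, weights):
    """Order associated information by its weight, highest first, so the
    push result runs from most to least relevant."""
    return sorted(associated, key=lambda doc: weights.get(doc, 0.0),
                  reverse=True)

# Hypothetical document IDs and weights for illustration only.
ordered = rank_by_weight(["doc3", "doc4", "doc6"],
                         {"doc3": 0.2, "doc4": 0.9, "doc6": 0.5})
```

Documents missing from the weight table default to weight 0.0 and sort last.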
In addition, those skilled in the art may make various changes and modifications to the technical details of this embodiment. For example, the neural network model can be implemented based on an N-gram model: the N-gram model can learn and predict the association relationship between a word and its surrounding words, so adding it to the neural network model can improve prediction accuracy. Moreover, when constructing the word vectors corresponding to the original corpus data in this embodiment, the word vectors may further be determined based on the TF-IDF algorithm, which sets the weight of a word according to its frequency in the current article and its frequency in other articles: if a word appears frequently in the current article but rarely in other articles, it is given a higher weight, which improves the accuracy of the semantic analysis.
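A minimal TF-IDF sketch of the weighting rule just described; documents are represented as token lists, and the +1 smoothing in the IDF denominator is a common implementation choice rather than something mandated by the embodiment:

```python
import math

def tf_idf(term, document, corpus):
    # Term frequency: how often the term appears in the current article.
    tf = document.count(term) / len(document)
    # Inverse document frequency: terms appearing in fewer articles score higher.
    containing = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / (1 + containing))
    return tf * idf

# Illustrative corpus: "Yao Ming" is frequent in one article and rare
# elsewhere, while "weather" appears across several articles.
corpus = [["Yao Ming", "basketball", "Yao Ming"],
          ["weather", "news"],
          ["weather", "sports"]]
```

Here tf_idf("Yao Ming", corpus[0], corpus) is positive while the widely distributed "weather" scores 0, so "Yao Ming" receives the higher weight, as the text prescribes.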
Therefore, in the associated-information pushing method provided by the present invention, machine learning is first performed on the obtained original corpus data according to a machine learning algorithm to determine the association relationships between the original corpus data; the original corpus data and these association relationships are then stored in a preset corpus database; finally, keywords corresponding to each piece of network information are obtained, the association mapping relationship between the pieces of network information is determined according to these keywords and the association relationships stored in the corpus database, the association mapping relationships are stored in a preset mapping database, and the associated information corresponding to the display information is determined according to the keywords corresponding to the display information and the mapping database. The scheme of the present invention thus solves the problems of low relevance and of duplication in pushed associated information, and provides a method for searching for and pushing associated information using association relationships that include semantics, which increases the accuracy of mining associated information, allows the relevance of information to be analyzed based on its semantics, and improves the quality of the pushed associated information.
EXAMPLE III
Fig. 3 shows a block diagram of a push apparatus for associated information according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes: a learning module 31, a storage module 32, a determination module 33, a pushing module 34, an information index establishing module 35, and a filtering module 36. Wherein, the determining module 33 further comprises: a first determination unit 331 and a second determination unit 332.
The learning module 31 is adapted to perform machine learning on the obtained original corpus data according to a machine learning algorithm, and determine an association relationship between the obtained original corpus data.
The original corpus data may be obtained in various ways; for example, network information on the Internet may be obtained by a web crawler, and the original corpus data derived from the obtained network information. In one implementation, a large amount of network information is crawled in advance and the original corpus data obtained from it. In another implementation, the web crawler periodically crawls recently updated network information, and the crawled information serves as an incremental part of the original corpus data, so that the original corpus data is periodically and dynamically expanded. In this embodiment, the two modes may be combined. Specifically, when determining the original corpus data from the network information, the learning module 31 may use the network information directly as the original corpus data, or may first apply predetermined processing to the network information and use the processing result as the original corpus data. For example, the original corpus data may be obtained by applying word segmentation (such as part-of-speech segmentation and word-sense segmentation) and/or keyword extraction to the sentences in the acquired network information. In addition, for convenience of access, the original corpus data can be stored in a distributed message queue so that it can be consumed from the queue; for example, whenever new original corpus data is generated from the latest network information crawled by the web crawler, it is stored in the distributed message queue for subsequent consumption. The distributed message queue enables parallel consumption and thereby improves processing efficiency.
When performing machine learning, the learning module 31 may apply various machine learning algorithms to the original corpus data, for example deep learning algorithms, neural network algorithms, or classification algorithms; the present invention does not limit the specific type of machine learning algorithm, as long as it can learn and correct the association relationships between the words in the original corpus data. Specifically, in this embodiment, the learning module 31 performs machine learning through a neural network model implemented by a neural network algorithm. In the course of implementing the method, the inventors found that converting the original corpus data into corresponding word vectors before inputting them into the neural network model shortens processing time and improves accuracy. Therefore, in this embodiment, the learning module 31 first converts the obtained original corpus data into corresponding word vectors, then feeds the word vectors into the input layer of a preset neural network model, and obtains the association output results corresponding to the word vectors through the output layer of the model. Specifically, the neural network model in this embodiment has a three-layer structure: an input layer, an output layer, and a hidden layer between them. The input layer receives the input word vectors and serves as the model's input port; the output layer outputs the association output results corresponding to the word vectors and serves as the model's output port; the hidden layer, located between the input layer and the output layer, performs feature extraction on the input word vectors.
Here, the feature extraction includes extracting association relationships covering sentence components and semantic relations, such as the positions of words, the distances between words, and the filling relations between words in the original corpus data; correspondingly, the association output result is generated from the feature extraction performed by the hidden layer on the word vectors supplied by the input layer.
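The three-layer flow described above can be illustrated with a minimal feed-forward sketch; the dimensions, toy weights, and ReLU activation are illustrative choices, not details of the patented model:

```python
def matvec(matrix, vec):
    # Multiply a weight matrix by a vector (one hidden/output unit per row).
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def forward(word_vector, w_hidden, w_output):
    """Input layer receives the word vector, the hidden layer performs
    feature extraction, and the output layer emits the association
    output result."""
    hidden = [max(0.0, h) for h in matvec(w_hidden, word_vector)]
    return matvec(w_output, hidden)

# Illustrative 2-dimensional word vector and toy weights.
w_hidden = [[0.5, -0.2], [0.1, 0.3]]
w_output = [[1.0, 1.0]]
result = forward([1.0, 2.0], w_hidden, w_output)
```

In a real system the word vectors would come from the corpus conversion step and the weights would be learned, not hand-set.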
In addition, to improve the accuracy of the association output results, the machine learning performed by the learning module 31 on the original corpus data further includes: judging whether the association output result corresponding to a word vector meets a preset precision condition and, when it does not, correcting the neural network model according to a back-propagation algorithm. The correction may be performed during training of the neural network model or during its prediction phase. The preset precision condition can be set by those skilled in the art according to the actual situation. For example, an accuracy threshold may be preset; during prediction, all output results of the neural network model (or a randomly sampled subset) are periodically obtained, and the model is corrected when their accuracy is found not to reach the threshold. Alternatively, during training, each output result may be checked for correctness, and the model corrected when it is incorrect. When correcting the learning result, the learning module 31 may supervise the learning process of the neural network model through a back-propagation algorithm: the training input is fed into the network to obtain an excitation response; the difference between that response and the target output corresponding to the training input yields the response errors of the hidden layer and the output layer; and each word vector is then adjusted accordingly by adjusting attributes such as its weights and parameters, thereby correcting the neural network model.
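A single-neuron sketch of the back-propagation-style correction just described; the network here is drastically simplified (one linear unit, fixed learning rate) and is intended only to show the error-driven weight adjustment, not the embodiment's full model:

```python
def correct_weights(weights, inputs, target, learning_rate=0.1):
    """One correction step: compute the excitation response, take its
    difference from the target output to get the response error, and
    adjust each weight against its error gradient."""
    response = sum(w * x for w, x in zip(weights, inputs))
    error = response - target
    return [w - learning_rate * error * x for w, x in zip(weights, inputs)]

old_weights = [0.5, 0.5]
new_weights = correct_weights(old_weights, [1.0, 1.0], target=2.0)
```

After the step the response moves from 1.0 toward the target 2.0, shrinking the error; repeated steps continue the correction.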
The storage module 32 is adapted to store the original corpus data and the association relationship between the original corpus data into a preset corpus database.
Specifically, the association relationship between the original corpus data includes the association output result corresponding to each word vector acquired through the output layer of the neural network model. The storage module 32 is specifically configured to store the original corpus data and the corresponding association relationships acquired by the learning module 31 in a preset corpus database. The corpus database can be updated according to the update results of the distributed message queue; that is, the data stored in the preset corpus database can be continuously updated as the distributed message queue storing the original corpus data is updated. In other words, the original corpus data and its association relationships stored by the storage module 32 change dynamically. In a specific implementation, the updated original corpus data may be acquired at preset time intervals, and correspondingly the updating process in the learning module 31 and the storage module 32 is repeated at those intervals. The specific time interval may be determined according to the update frequency of the online network information and/or the timeliness of the information. Dynamically updating the original corpus data and its association relationships ensures that the contents of the corpus database remain timely and accurate. Since news information is highly time-sensitive, corpus data from different time periods may have different association relationships, and cyclically executing the updating process in the storage module 32 therefore effectively improves the accuracy of the corpus database.
The determining module 33 is adapted to determine the associated information corresponding to the display information according to the associated relationship between the original corpus data stored in the corpus database. Wherein, the determining module 33 further comprises: a first determination unit 331 and a second determination unit 332.
The first determining unit 331 is adapted to obtain a keyword corresponding to each network information, and determine an association mapping relationship between each network information according to the keyword corresponding to the network information and an association relationship between original corpus data stored in the corpus database.
Specifically, the first determining unit 331 first extracts keywords from each piece of network information, then establishes an association mapping relationship between the keywords according to the association relationships between the original corpus data stored in the corpus database, and further determines the association mapping relationship between the corresponding pieces of network information according to the association mapping relationship between the associated words. In this way, the first determining unit 331 can determine the association mapping relationship between pieces of network information from the association relationships between the original corpus data, providing a basis for subsequent information pushing.
The second determining unit 332 is adapted to store the association mapping relationship between the network information into a preset mapping database, and determine the association information corresponding to the display information according to the keyword corresponding to the display information and the mapping database.
Specifically, the second determining unit 332 first stores the association mapping relationships between the pieces of network information, as established by the first determining unit 331, in a preset mapping database. The preset mapping database is an online database that supports operations such as online deployment, collection, and query, and it can dynamically update the stored association mapping relationships according to updates of the corpus database. In a specific implementation, the preset mapping database may be a Redis database. The second determining unit 332 then extracts the keyword corresponding to the display information, searches the mapping database for associated information having an association mapping relationship with that keyword, and sends the found associated information to the pushing module 34. In this embodiment, the associated information means information that describes a different aspect of the same thing, rather than information similar to the current information; if two pieces of information are similar, one may merely be a duplicate of the other. Therefore, the most desirable associated information describes different aspects of one thing; when associated information describes the same aspect of the same thing, the smaller the similarity, the better.
The push module 34 is adapted to push the association information.
Specifically, after receiving the associated information sent by the second determining unit 332, the pushing module 34 determines the associated information as associated information corresponding to the display information and pushes the associated information.
The information index establishing module 35 is adapted to establish an information index for querying the network information according to the keywords according to the corresponding relationship between the keywords and the network information.
Specifically, to facilitate quick and accurate queries of the correspondence between network information and its keywords, after the first determining unit 331 obtains the keywords corresponding to each piece of network information, the information index establishing module 35 further establishes, according to the correspondence between the keywords and each piece of network information, an information index for querying network information by keyword. The information index may be an inverted index.
The screening module 36 is adapted to, when the number of the associated information is multiple, further determine the similarity between each associated information and the display information according to a similarity algorithm, and delete the associated information of which the similarity is greater than a preset first threshold and/or the similarity is smaller than a preset second threshold; wherein the first threshold is greater than the second threshold.
Specifically, the preset first threshold is the minimum similarity value, obtained by those skilled in the art from actual statistics, at which the associated information and the display information are duplicate information; that is, when the result computed by the similarity algorithm is greater than the preset first threshold, the associated information and the display information are duplicates. The preset second threshold is the maximum similarity value, likewise obtained from actual statistics, at which the associated information and the display information are unrelated; that is, when the computed result is smaller than the preset second threshold, the associated information is unrelated to the display information. In a specific implementation, when there are multiple pieces of associated information, in order to screen out associated information that is too similar to the display information as well as associated information whose relevance is too low, the screening module 36 computes the similarity between each piece of associated information and the display information according to a similarity algorithm, compares the result with the preset first and/or second threshold, and deletes the corresponding associated information when the result is greater than the preset first threshold and/or smaller than the preset second threshold. The similarity algorithm is selected or configured by those skilled in the art according to the actual situation, and the present invention is not limited in this respect.
Finally, it should be noted that specific structures and operation principles of the above modules may refer to descriptions of corresponding steps in the method embodiments, and are not described herein again. In addition, those skilled in the art may also combine the above modules into fewer modules or split the modules into more modules, and may also omit some of the modules, for example, the information index creating module and the screening module may be omitted.
Therefore, in the associated-information pushing device provided by the present invention, the learning module 31 first performs machine learning on the obtained original corpus data according to a machine learning algorithm to determine the association relationships between the original corpus data; the storage module 32 stores the original corpus data and these association relationships in a preset corpus database; the first determining unit 331 in the determining module 33 acquires the keywords corresponding to each piece of network information and determines the association mapping relationship between the pieces of network information according to these keywords and the association relationships stored in the corpus database; the second determining unit 332 in the determining module 33 stores the association mapping relationships in a preset mapping database and determines the associated information corresponding to the display information according to the keywords corresponding to the display information and the mapping database; finally, the pushing module 34 pushes the determined associated information.
In implementing the device of the present invention, the information index establishing module 35 further establishes an information index for querying network information by keyword according to the correspondence between the keywords and the network information; and when multiple pieces of associated information are determined, the screening module 36 determines the similarity between each piece of associated information and the display information according to a similarity algorithm and deletes associated information whose similarity is greater than a preset first threshold and/or smaller than a preset second threshold, where the first threshold is greater than the second threshold. The device thus solves the problem that push results fail to satisfy users' needs when the association relationships between semantics are not considered; it provides a way of searching for and pushing associated information using association relationships that include semantics, increases the accuracy of mining associated information, allows the relevance of information to be analyzed based on its semantics, and improves the quality of the pushed associated information.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are intended to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of the associated-information push device according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc., does not indicate any ordering; these words may be interpreted as names.

Claims (14)

1. A method for pushing associated information includes:
performing machine learning on the obtained original corpus data according to a machine learning algorithm, and determining an association relation between the obtained original corpus data; wherein the association relation comprises a sentence component relation and a semantic relation, and specifically comprises the position of each word in the original corpus data, the distance between words, and the filling relation between words; and wherein the associated information represents different aspects of a current object and is similar information that does not belong to the current object;
storing the original corpus data and the association relation between the original corpus data into a preset corpus database;
determining associated information corresponding to display information according to the association relation between the original corpus data stored in the corpus database, and pushing the associated information;
extracting keywords for each piece of network information, establishing an association mapping relation between the keywords according to the association relation between the original corpus data stored in the corpus database, and determining the association mapping relation between the corresponding pieces of network information according to the association mapping relation between the keywords; storing the established association mapping relation among the pieces of network information into a preset mapping database; extracting the keywords corresponding to the display information, and searching the mapping database, according to the keywords, for associated information having an association mapping relation with the keywords; wherein the display information is information that is displayed in a display interface of the device and is currently being browsed by a user, and the display information and the associated information comprise: news information, navigation information, web page information, and/or search information.
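The claim does not fix a concrete implementation of the keyword-to-mapping-database pipeline. The following is a minimal, self-contained sketch of the idea; the word associations, article texts, and function names are all illustrative assumptions, and real keyword extraction would use far more than vocabulary membership:

```python
from collections import defaultdict

# Hypothetical word-level association relations, standing in for what the
# machine-learning step would learn from the original corpus data.
word_associations = {
    "smartphone": {"tablet", "laptop"},
    "tablet": {"smartphone"},
    "laptop": {"smartphone"},
}

def extract_keywords(text, vocabulary):
    """Naive keyword extraction: keep only words found in the vocabulary."""
    return {w for w in text.lower().split() if w in vocabulary}

def build_mapping_database(articles, associations):
    """Link each article to other articles whose keywords are associated with its own."""
    keywords = {aid: extract_keywords(text, associations) for aid, text in articles.items()}
    mapping = defaultdict(set)
    for aid, kws in keywords.items():
        related_words = set().union(*(associations[k] for k in kws)) if kws else set()
        for other, other_kws in keywords.items():
            if other != aid and related_words & other_kws:
                mapping[aid].add(other)
    return mapping

articles = {
    "a1": "new smartphone released",
    "a2": "tablet sales grow",
    "a3": "weather report today",
}
mapping_db = build_mapping_database(articles, word_associations)
# Displaying "a1" (smartphone) would surface "a2" (tablet) as associated information.
print(mapping_db["a1"])
```

The lookup at push time is then a single query against `mapping_db` keyed by the displayed article, which matches the claim's separation between an offline mapping database and an online search by keyword.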
2. The method according to claim 1, wherein the step of performing machine learning on the obtained raw corpus data according to a machine learning algorithm to determine the association relationship between the obtained raw corpus data specifically comprises:
converting the obtained original corpus data into corresponding word vectors, inputting the word vectors into an input layer of a preset neural network model, and obtaining associated output results corresponding to the word vectors through an output layer of the neural network model.
3. The method of claim 2, wherein the neural network model further comprises: a hidden layer located between the input layer and the output layer; the step of obtaining the associated output result corresponding to the word vector through the output layer in the neural network model specifically includes:
performing feature extraction on the word vectors input at the input layer through the hidden layer, and outputting, by the output layer, associated output results corresponding to the word vectors according to the result of the feature extraction.
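Claims 2 and 3 describe a standard input/hidden/output feed-forward pass without specifying dimensions or activations. A minimal sketch, assuming one-hot word vectors, a linear hidden layer, and a softmax output (all choices mine, not the patent's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 6-word vocabulary, 3 hidden units.
V, H = 6, 3
W_in = rng.normal(scale=0.1, size=(V, H))   # input layer -> hidden layer
W_out = rng.normal(scale=0.1, size=(H, V))  # hidden layer -> output layer

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(word_index):
    """Hidden layer extracts features from the input word vector; the
    output layer turns them into an associated-word distribution."""
    x = np.zeros(V); x[word_index] = 1.0    # one-hot word vector
    h = x @ W_in                            # feature extraction (hidden layer)
    return softmax(h @ W_out), h            # associated output result

probs, hidden = forward(2)
```

With a one-hot input, the hidden activation is simply one row of `W_in`, which is the word2vec-style embedding reading of the claim; the patent text itself does not commit to that interpretation.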
4. The method according to claim 2 or 3, wherein the step of performing machine learning on the obtained raw corpus data according to a machine learning algorithm further comprises:
judging whether the associated output result corresponding to the word vector meets a preset precision condition, and correcting the neural network model according to a back propagation algorithm when it does not.
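Claim 4's check-then-correct loop can be sketched as gradient descent that stops once a loss threshold (a stand-in for the unspecified "preset precision condition") is met. The training pair, learning rate, and threshold below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
V, H = 4, 2
W_in = rng.normal(scale=0.5, size=(V, H))
W_out = rng.normal(scale=0.5, size=(H, V))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x_idx, y_idx = 0, 3      # hypothetical (input word, associated word) pair
target_loss = 0.05       # stand-in for the preset precision condition
lr = 0.3

loss = float("inf")
for _ in range(2000):
    x = np.zeros(V); x[x_idx] = 1.0
    h = x @ W_in                      # forward pass through the hidden layer
    p = softmax(h @ W_out)            # associated output result
    loss = -np.log(p[y_idx])          # cross-entropy against the known association
    if loss < target_loss:            # precision condition met: stop correcting
        break
    # Back propagation: correct both weight matrices.
    dz = p.copy(); dz[y_idx] -= 1.0   # gradient at the output layer
    dh = dz @ W_out.T                 # gradient propagated to the hidden layer
    W_out -= lr * np.outer(h, dz)
    W_in -= lr * np.outer(x, dh)
```

Note that `dh` is computed before `W_out` is updated, so the hidden-layer gradient uses the weights from the same forward pass, as back propagation requires.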
5. The method of claim 1, wherein after the step of obtaining the keyword corresponding to each piece of network information, the method further comprises: establishing, according to the correspondence between the keywords and the network information, an information index for querying the network information by keyword.
6. The method of claim 1 or 5, wherein the step of determining the associated information corresponding to the display information according to the keyword corresponding to the display information and the mapping database further comprises:
when there are multiple pieces of associated information, further determining the similarity between each piece of associated information and the display information according to a similarity algorithm, and deleting the associated information whose similarity is greater than a preset first threshold and/or smaller than a preset second threshold; wherein the first threshold is greater than the second threshold.
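The two-threshold filter of claim 6 drops both near-duplicates of the displayed item (too similar) and barely related candidates (too dissimilar). The claim leaves the similarity algorithm open; the sketch below uses Jaccard word overlap purely as one possible choice, with illustrative threshold values:

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts (one possible similarity algorithm)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

FIRST_THRESHOLD = 0.9   # above this: near-duplicate of the displayed item, drop
SECOND_THRESHOLD = 0.2  # below this: barely related, drop

def filter_candidates(display_text, candidates):
    kept = []
    for text in candidates:
        s = jaccard(display_text, text)
        if SECOND_THRESHOLD <= s <= FIRST_THRESHOLD:
            kept.append(text)
    return kept

display = "new smartphone model released today"
candidates = [
    "new smartphone model released today",   # exact duplicate: similarity 1.0, dropped
    "smartphone model review and pricing",   # related but distinct: kept
    "local weather forecast",                # unrelated: similarity 0.0, dropped
]
print(filter_candidates(display, candidates))
```

Keeping only the middle band is what makes the pushed items "different aspects of the current object" rather than copies of it, which matches the stated purpose of the associated information in claim 1.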
7. The method according to claim 1, wherein the original corpus data is obtained via a distributed message queue, and the corpus database is updatable according to an update result of the distributed message queue.
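Claim 7 only states that corpus data arrives via a distributed message queue and that the corpus database follows the queue's updates. A minimal local sketch of that producer/consumer shape, using Python's `queue.Queue` as a stand-in for a real distributed queue (e.g. a Kafka-style topic; the distribution aspect is not modeled here):

```python
import queue

# Stand-in for a distributed message queue; a local queue.Queue plays that role.
message_queue = queue.Queue()
corpus_database = []

def producer(texts):
    """Publish new original corpus data onto the queue."""
    for t in texts:
        message_queue.put(t)

def consume_updates():
    """Drain the queue and update the corpus database with the new corpus data."""
    while True:
        try:
            corpus_database.append(message_queue.get_nowait())
        except queue.Empty:
            break

producer(["first news item", "second news item"])
consume_updates()
```

In a real deployment the consumer would also re-run the learning and mapping steps on the new data so the mapping database stays in sync with the corpus database.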
8. A push device of associated information, comprising:
the learning module is adapted to perform machine learning on the obtained original corpus data according to a machine learning algorithm and determine an association relation between the obtained original corpus data; wherein the association relation comprises a sentence component relation and a semantic relation, and specifically comprises the position of each word in the original corpus data, the distance between words, and the filling relation between words; and wherein the associated information represents different aspects of a current object and is similar information that does not belong to the current object; the storage module is adapted to store the original corpus data and the association relation between the original corpus data into a preset corpus database;
the determining module is adapted to determine the associated information corresponding to the display information according to the association relation between the original corpus data stored in the corpus database; extract keywords for each piece of network information, establish an association mapping relation between the keywords according to the association relation between the original corpus data stored in the corpus database, and determine the association mapping relation between the corresponding pieces of network information according to the association mapping relation between the keywords; store the established association mapping relation among the pieces of network information into a preset mapping database; and extract the keywords corresponding to the display information and search the mapping database, according to the keywords, for associated information having an association mapping relation with the keywords; wherein the display information is information that is displayed in a display interface of the device and is currently being browsed by a user, and the display information and the associated information comprise: news information, navigation information, web page information, and/or search information;
the pushing module is adapted to push the associated information.
9. The apparatus of claim 8, wherein the learning module is specifically configured to:
converting the obtained original corpus data into corresponding word vectors, inputting the word vectors into an input layer of a preset neural network model, and obtaining associated output results corresponding to the word vectors through an output layer of the neural network model.
10. The apparatus of claim 9, wherein the neural network model further comprises: a hidden layer located between the input layer and the output layer; and the learning module is further adapted to:
perform feature extraction on the word vectors input at the input layer through the hidden layer, and output, by the output layer, associated output results corresponding to the word vectors according to the result of the feature extraction.
11. The apparatus of claim 9 or 10, wherein the learning module is further adapted to:
judge whether the associated output result corresponding to the word vector meets a preset precision condition, and correct the neural network model according to a back propagation algorithm when it does not.
12. The apparatus of claim 8, wherein the apparatus further comprises: an information index establishing module adapted to establish, according to the correspondence between the keywords and the network information, an information index for querying the network information by keyword.
13. The apparatus of claim 8 or 12, wherein the apparatus further comprises:
the screening module is adapted to, when there are multiple pieces of associated information, further determine the similarity between each piece of associated information and the display information according to a similarity algorithm, and delete the associated information whose similarity is greater than a preset first threshold and/or smaller than a preset second threshold; wherein the first threshold is greater than the second threshold.
14. The apparatus according to claim 8, wherein the original corpus data is obtained via a distributed message queue, and the corpus database is updatable according to an update result of the distributed message queue.
CN201710137306.5A 2017-02-14 2017-03-09 Associated information pushing method and device Active CN106934007B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2017100794122 2017-02-14
CN201710079412 2017-02-14

Publications (2)

Publication Number Publication Date
CN106934007A CN106934007A (en) 2017-07-07
CN106934007B true CN106934007B (en) 2021-02-12

Family

ID=59433301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710137306.5A Active CN106934007B (en) 2017-02-14 2017-03-09 Associated information pushing method and device

Country Status (1)

Country Link
CN (1) CN106934007B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376162B (en) * 2018-02-22 2019-10-18 北京百度网讯科技有限公司 Method and apparatus for pushed information
CN113590936B (en) * 2021-07-02 2023-11-14 支付宝(杭州)信息技术有限公司 Information pushing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101917471A (en) * 2010-08-09 2010-12-15 宇龙计算机通信科技(深圳)有限公司 Message push method and server
CN103970748A (en) * 2013-01-25 2014-08-06 腾讯科技(深圳)有限公司 Related keyword recommending method and device
CN105245924A (en) * 2015-09-28 2016-01-13 北京奇艺世纪科技有限公司 Video information push and display method and video player using the method
CN105279288A (en) * 2015-12-04 2016-01-27 深圳大学 Online content recommending method based on deep neural network
CN106095749A (en) * 2016-06-03 2016-11-09 杭州量知数据科技有限公司 A kind of text key word extracting method based on degree of depth study

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436186B (en) * 2007-11-12 2012-09-05 北京搜狗科技发展有限公司 Method and system for providing related searches
CN101206674A (en) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 Enhancement type related search system and method using commercial articles as medium
CN102063433A (en) * 2009-11-16 2011-05-18 华为技术有限公司 Method and device for recommending related items
CN104239512B (en) * 2014-09-16 2017-06-06 电子科技大学 A kind of text recommends method
CN106202044A (en) * 2016-07-07 2016-12-07 武汉理工大学 A kind of entity relation extraction method based on deep neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101917471A (en) * 2010-08-09 2010-12-15 宇龙计算机通信科技(深圳)有限公司 Message push method and server
CN103970748A (en) * 2013-01-25 2014-08-06 腾讯科技(深圳)有限公司 Related keyword recommending method and device
CN105245924A (en) * 2015-09-28 2016-01-13 北京奇艺世纪科技有限公司 Video information push and display method and video player using the method
CN105279288A (en) * 2015-12-04 2016-01-27 深圳大学 Online content recommending method based on deep neural network
CN106095749A (en) * 2016-06-03 2016-11-09 杭州量知数据科技有限公司 A kind of text key word extracting method based on degree of depth study

Also Published As

Publication number Publication date
CN106934007A (en) 2017-07-07

Similar Documents

Publication Publication Date Title
CN106919702B (en) Keyword pushing method and device based on document
CN105389349B (en) Dictionary update method and device
CN106682169B (en) Application label mining method and device, application searching method and server
WO2017097231A1 (en) Topic processing method and device
US20140201180A1 (en) Intelligent Supplemental Search Engine Optimization
CN111105209B (en) Job resume matching method and device suitable for person post matching recommendation system
KR101508260B1 (en) Summary generation apparatus and method reflecting document feature
EP3937029A2 (en) Method and apparatus for training search model, and method and apparatus for searching for target object
CN106708929B (en) Video program searching method and device
CN110990533B (en) Method and device for determining standard text corresponding to query text
CN106909669B (en) Method and device for detecting promotion information
KR101355945B1 (en) On line context aware advertising apparatus and method
CN112559684A (en) Keyword extraction and information retrieval method
CN103970748A (en) Related keyword recommending method and device
KR20220119745A (en) Methods for retrieving content, devices, devices and computer-readable storage media
CN104537341A (en) Human face picture information obtaining method and device
US11232137B2 (en) Methods for evaluating term support in patent-related documents
KR101651780B1 (en) Method and system for extracting association words exploiting big data processing technologies
CN106844482B (en) Search engine-based retrieval information matching method and device
CN103324641B (en) Information record recommendation method and device
CN106570196B (en) Video program searching method and device
CN106934007B (en) Associated information pushing method and device
CN103226601B (en) A kind of method and apparatus of picture searching
CN103942232B (en) For excavating the method and apparatus being intended to
JP4891638B2 (en) How to classify target data into categories

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100089 710, 7 / F, building 1, zone 1, No.3, Xisanhuan North Road, Haidian District, Beijing

Patentee after: Beijing time Ltd.

Address before: 100089 710, 7 / F, building 1, zone 1, No.3, Xisanhuan North Road, Haidian District, Beijing

Patentee before: BEIJING TIME Co.,Ltd.