CN107944032B - Method and apparatus for generating information

Info

Publication number: CN107944032B
Application number: CN201711326807.4A
Authority: CN
Inventors: 张晓寒; 李双婕; 史亚冰; 梁海金; 张扬; 李京峰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2017-12-13
Filing date: 2017-12-13
Publication date: 2021-12-31
Anticipated expiration: 2037-12-13
Also published as: CN107944032A

Abstract

The embodiments of the application disclose a method and an apparatus for generating information. One embodiment of the method comprises: acquiring an article to be mined; mining at least two types of topics of the article to be mined by using at least two topic mining modes, and determining the association degree between each mined topic and the article to be mined; and determining the topic of the article to be mined and the association degree between the article to be mined and that topic based on the mined topics and the determined association degrees. This embodiment mines the topic of the article to be mined from different dimensions, so as to obtain more comprehensive and accurate topics.

Description

Method and apparatus for generating information

Technical Field

The embodiments of the application relate to the technical field of computers, specifically to the technical field of the Internet, and in particular to a method and an apparatus for generating information.

Background

At present, the Internet is an important way for people to obtain information. To accurately recommend articles of interest to users, the topics of articles need to be accurately understood, and the association degree between each article and its topics needs to be calculated. One existing approach generates the topic of an article by extracting keywords: first, the full text of the article is segmented into words to obtain a word set; then, operations such as filtering and word-frequency calculation are performed on the word set, and the resulting keywords are used as the topic mining result of the article. The accuracy of this topic mining mode is easily affected by factors such as word segmentation and aliases. Another existing approach generates the topic of an article by topic classification: for example, word vector features are extracted from the sentences of the article, and the article is classified to obtain its topic. Topic mining in this mode is easily limited by the candidate topic set. For example, if the candidate topic set used for classification is small and the candidate topics are broad, the range of topic mining is limited, and the article cannot be described comprehensively and accurately.

Disclosure of Invention

The embodiment of the application provides a method and a device for generating information.

In a first aspect, an embodiment of the present application provides a method for generating information, including: acquiring an article to be mined; mining at least two types of topics of the article to be mined by using at least two topic mining modes, and determining the association degree between each mined topic and the article to be mined; and determining the topic of the article to be mined and the association degree between the article to be mined and that topic based on the mined topics and the determined association degrees.

In some embodiments, the mining at least two types of topics of the article to be mined by using at least two topic mining methods, and determining the association degree between the mined topics and the article to be mined includes: carrying out named entity recognition on the article to be mined, and determining whether the article to be mined comprises at least one first type article theme based on a named entity recognition result; in response to determining that the article to be mined comprises at least one first-type article topic, determining a first degree of association of the article to be mined with each of the at least one first-type article topic.

In some embodiments, performing named entity recognition on the article to be mined, and determining whether the article to be mined includes at least one first-type article topic based on a result of the named entity recognition includes: carrying out named entity identification on the article to be mined and determining whether the article to be mined contains at least one named entity; in response to the fact that the article to be mined contains at least one named entity, matching each named entity in the at least one named entity with a candidate topic in a pre-established candidate topic set, and determining whether the article to be mined contains at least one candidate topic according to a matching result, wherein the candidate topic set is constructed based on a knowledge graph; in response to the fact that the article to be mined comprises at least one candidate topic, for each candidate topic in the at least one candidate topic, counting the frequency of the candidate topic appearing in the article to be mined, and if the frequency of the candidate topic appearing in the article to be mined exceeds a preset first threshold value, determining that the candidate topic is a first type article topic of the article to be mined.

In some embodiments, the determining a first degree of association between the article to be mined and each of the at least one first-type article topic in response to determining that the article to be mined includes at least one first-type article topic includes: for each first-type article topic in the at least one first-type article topic, counting the frequency of the first-type article topic appearing in the article to be mined, and determining the first association degree of the article to be mined and the first-type article topic according to the counted frequency.

In some embodiments, counting the frequency of occurrence of the candidate topic in the article to be mined includes: determining whether the article to be mined contains the alias of the candidate theme according to the knowledge graph; in response to the fact that the article to be mined contains the alias of the candidate topic, counting a first frequency of the alias of the candidate topic appearing in the article to be mined; counting the second frequency of the candidate theme appearing in the article to be mined; and calculating the sum of the first frequency and the second frequency, and taking the calculation result as the frequency of the candidate topic appearing in the article to be mined.

In some embodiments, the mining at least two types of topics of the article to be mined by using at least two topic mining methods, and determining the association degree between the mined topics and the article to be mined includes: determining whether the source confidence of the source information of the article to be mined exceeds a preset confidence threshold, wherein the source confidence of the source information of the article to be mined is acquired from a preset source information and source confidence relation table, and the source information and source confidence are correspondingly stored in the source information and source confidence relation table; and in response to determining that the source confidence of the source information of the article to be mined exceeds a preset confidence threshold, taking the source information of the article to be mined as a second type of article topic, and taking the source confidence of the source information of the article to be mined as a second association degree of the article to be mined and the second type of article topic.

In some embodiments, the mining at least two types of topics of the article to be mined by using at least two topic mining methods, and determining the association degree between the mined topics and the article to be mined includes: performing word segmentation processing on the article to be mined to obtain at least one word segmentation; importing the at least one word segmentation into a pre-established topic classification model to obtain the probability that the article to be mined belongs to each third type candidate article topic in a preset third type candidate article topic set; and determining a third type article topic of the article to be mined and a third association degree between the article to be mined and the determined third type article topic based on the probability that the article to be mined belongs to each third type candidate article topic in the third type candidate article topic collection.

In some embodiments, the topic classification model is a deep neural network; and the method also comprises the step of establishing the deep neural network, which comprises the following steps: performing word segmentation processing on the sample article to obtain at least one sample word segmentation; filtering the at least one sample word segmentation to obtain a sample word segmentation set of the sample article; and taking the sample word segmentation set as input, taking a preset theme of the sample article as output, training an initial deep neural network, and obtaining the deep neural network.

In some embodiments, determining the topic of the article to be mined and the association degree of the article to be mined with the topic based on the mined topic and the determined association degree includes: when the mined topics comprise at least two types of topics, normalizing the association degree of the article to be mined and the topic of the type for each type of topic in the topics of the at least two types, and weighting the association degree after the normalization processing.
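The normalization and weighting described above can be sketched as follows. This is a minimal illustration only: the text does not specify the normalization formula or the weights, so the per-type maximum scaling, the `type_weights` values, the rule for topics found by several modes, and all names below are assumptions.

```python
from typing import Dict


def normalize_and_weight(
    scores_by_type: Dict[str, Dict[str, float]],
    type_weights: Dict[str, float],
) -> Dict[str, float]:
    """Scale each type's scores by that type's maximum, then apply the type weight."""
    combined: Dict[str, float] = {}
    for topic_type, scores in scores_by_type.items():
        if not scores:
            continue
        peak = max(scores.values())
        weight = type_weights.get(topic_type, 1.0)
        for topic, score in scores.items():
            normalized = score / peak if peak > 0 else 0.0
            # Assumption: if several mining modes find the same topic,
            # keep its strongest weighted score.
            combined[topic] = max(combined.get(topic, 0.0), normalized * weight)
    return combined


# Hypothetical association degrees from two mining modes.
mined = {
    "entity": {"Peking University": 8.0, "Beijing": 2.0},
    "source": {"campus_news_account": 0.9},
}
weights = {"entity": 0.5, "source": 0.3}
result = normalize_and_weight(mined, weights)
```

Normalizing within each type first keeps raw counts (which can be large) from dominating confidences and probabilities (which lie in the range 0 to 1) before the weights are applied.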

In some embodiments, the above method further comprises: in response to determining that a target keyword matches the topic of the article to be mined, pushing the article to be mined.

In a second aspect, an embodiment of the present application provides an apparatus for generating information, including: an acquisition unit, configured to acquire an article to be mined; a mining unit, configured to mine at least two types of topics of the article to be mined by using at least two topic mining modes, and to determine the association degree between each mined topic and the article to be mined; and a determining unit, configured to determine the topic of the article to be mined and the association degree between the article to be mined and that topic based on the mined topics and the determined association degrees.

In some embodiments, the mining unit includes: a recognition subunit, configured to perform named entity recognition on the article to be mined and determine, based on the named entity recognition result, whether the article to be mined includes at least one first-type article topic; and a first determining subunit, configured to determine, in response to determining that the article to be mined includes at least one first-type article topic, a first association degree between the article to be mined and each of the at least one first-type article topic.

In some embodiments, the recognition subunit comprises: a recognition and determination unit, configured to perform named entity recognition on the article to be mined and determine whether the article to be mined contains at least one named entity; a matching and determination unit, configured to match, in response to determining that the article to be mined contains at least one named entity, each of the at least one named entity against the candidate topics in a pre-established candidate topic set, and to determine from the matching result whether the article to be mined contains at least one candidate topic, wherein the candidate topic set is constructed based on a knowledge graph; and a statistics and determination unit, configured to count, in response to determining that the article to be mined contains at least one candidate topic, the frequency with which each of the at least one candidate topic appears in the article to be mined, and to determine a candidate topic to be a first-type article topic of the article to be mined if its frequency in the article to be mined exceeds a preset first threshold.

In some embodiments, the first determining subunit is further configured to: for each first-type article topic in the at least one first-type article topic, counting the frequency of the first-type article topic appearing in the article to be mined, and determining the first association degree of the article to be mined and the first-type article topic according to the counted frequency.

In some embodiments, the statistics and determination unit is further configured to: determining whether the article to be mined contains the alias of the candidate theme according to the knowledge graph; in response to the fact that the article to be mined contains the alias of the candidate topic, counting a first frequency of the alias of the candidate topic appearing in the article to be mined; counting the second frequency of the candidate theme appearing in the article to be mined; and calculating the sum of the first frequency and the second frequency, and taking the calculation result as the frequency of the candidate topic appearing in the article to be mined.

In some embodiments, the mining unit is further configured to: determine whether the source confidence of the source information of the article to be mined exceeds a preset confidence threshold, wherein the source confidence is acquired from a preset source information and source confidence relation table in which source information and source confidence are stored in correspondence; and in response to determining that the source confidence of the source information of the article to be mined exceeds the preset confidence threshold, take the source information of the article to be mined as a second-type article topic, and take the source confidence of that source information as a second association degree between the article to be mined and the second-type article topic.

In some embodiments, the mining unit is further configured to: perform word segmentation on the article to be mined to obtain at least one word segment; import the at least one word segment into a pre-established topic classification model to obtain the probability that the article to be mined belongs to each third-type candidate article topic in a preset third-type candidate article topic set; and determine, based on those probabilities, a third-type article topic of the article to be mined and a third association degree between the article to be mined and the determined third-type article topic.

In some embodiments, the topic classification model is a deep neural network; and the apparatus further comprises a training unit configured to: performing word segmentation processing on the sample article to obtain at least one sample word segmentation; filtering the at least one sample word segmentation to obtain a sample word segmentation set of the sample article; and taking the sample word segmentation set as input, taking a preset theme of the sample article as output, training an initial deep neural network, and obtaining the deep neural network.

In some embodiments, the determining unit is further configured to: when the mined topics comprise at least two types of topics, normalizing the association degree of the article to be mined and the topic of the type for each type of topic in the topics of the at least two types, and weighting the association degree after the normalization processing.

In some embodiments, the above apparatus further comprises: a pushing unit, configured to push the article to be mined in response to determining that a target keyword matches the topic of the article to be mined.

In a third aspect, an embodiment of the present application provides a server, where the server includes: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.

The method and the device for generating information provided by the embodiment of the application firstly obtain the article to be mined, then utilize at least two topic mining modes to mine at least two types of topics of the article to be mined, determine the association degree of the mined topic and the article to be mined, and finally determine the topic of the article to be mined and the association degree of the article to be mined and the topic based on the mined topic and the determined association degree, thereby realizing mining of the topic of the article to be mined from different dimensions to obtain more comprehensive and accurate topics.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for generating information according to the present application;

FIG. 3 is a schematic diagram of a portion of the structure of a knowledge-graph as used herein;

FIG. 4 is a schematic illustration of an application scenario of a method for generating information according to the present application;

FIG. 5 is a flow diagram of yet another embodiment of a method for generating information according to the present application;

FIG. 6 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present application;

FIG. 7 is a block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for generating information or the apparatus for generating information of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as a web browser application, a shopping-like application, a search-like application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, for example, a background server performing topic mining on an article to be mined, where the background server may perform topic mining on the article to be mined acquired from the internet to obtain a topic of the article to be mined and a degree of association between the article to be mined and the mined topic.

It should be noted that the method for generating information provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for generating information is generally disposed in the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating information in accordance with the present application is shown. The method for generating information comprises the following steps:

step 201, an article to be mined is obtained.

In this embodiment, the electronic device on which the method for generating information runs (for example, the server 105 shown in fig. 1) may acquire an article to be mined from the Internet in various ways (for example, by crawling with a web crawler). The article to be mined may be an article in text form, for example, a text article including a title, content, and the like.

Step 202, at least two types of topics of the article to be mined are mined by utilizing at least two topic mining modes, and the association degree of the mined topics and the article to be mined is determined.

In this embodiment, the electronic device may mine at least two types of topics of the article to be mined by using at least two topic mining modes, and determine the association degree between each mined topic and the article to be mined. Here, the electronic device may mine the topics of the article to be mined in multiple ways. For example, the topic of the article may be generated by article keyword extraction; as a further example, keywords in the title of the article to be mined may be extracted as the topics of the article to be mined.

In some optional implementations of this embodiment, step 202 may specifically include the following. First, the electronic device may perform named entity recognition on the article to be mined and determine, based on the named entity recognition result, whether the article to be mined includes at least one first-type article topic. For example, the electronic device may determine from the named entity recognition result whether the article to be mined includes one or more named entities (e.g., a person name, an organization name, a place name, a book title, etc.), and if so, may use the one or more named entities as first-type article topics of the article to be mined. Then, in response to determining that the article to be mined includes at least one first-type article topic, the electronic device may determine a first association degree between the article to be mined and each of the at least one first-type article topic. For example, the electronic device may determine the first association degree according to the position at which the first-type article topic first appears in the article to be mined: the earlier that position, the higher the first association degree.
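The position-based example above can be sketched as follows. The text states only that an earlier first occurrence gives a higher first association degree; the linear formula and the function name below are assumptions chosen for illustration.

```python
def position_based_association(article: str, topic: str) -> float:
    """Score the topic by where it first appears; 0.0 if it never appears."""
    if not article:
        return 0.0
    first = article.find(topic)
    if first < 0:
        return 0.0
    # Assumption: linear decay from 1.0 at the start of the article.
    return 1.0 - first / len(article)


text = "Peking University opened a new library. The library is in Beijing."
early = position_based_association(text, "Peking University")
late = position_based_association(text, "Beijing")
missing = position_based_association(text, "Shanghai")
```

Any monotonically decreasing function of the first-occurrence position would satisfy the description equally well.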

In some optional implementations, performing named entity recognition on the article to be mined and determining, based on the named entity recognition result, whether the article to be mined includes at least one first-type article topic may specifically include the following. First, the electronic device may perform named entity recognition on the article to be mined and determine whether the article to be mined contains at least one named entity. Then, in response to determining that the article to be mined contains at least one named entity, the electronic device may match each of the at least one named entity against the candidate topics in a pre-established candidate topic set, and determine from the matching result whether the article to be mined contains at least one candidate topic. For example, for each of the at least one named entity, the electronic device may compare the named entity with each candidate topic in the candidate topic set; if the named entity is the same as a candidate topic, it may be determined that the article to be mined contains that candidate topic. Here, the candidate topic set may be constructed based on a knowledge graph. Fig. 3 shows the structure of part of a knowledge graph, in which people, works, places, numerical values, heights, and the like serve as nodes. Through the knowledge graph, the electronic device can obtain more information about a named entity (for example, its aliases, its hypernym and hyponym relations, and related entities or concepts), so a more comprehensive and accurate candidate topic set can be obtained.
Finally, in response to determining that the article to be mined contains at least one candidate topic, the electronic device may, for each of the at least one candidate topic, count the frequency (i.e., the number of times) with which the candidate topic appears in the article to be mined. If that frequency exceeds a preset first threshold, the candidate topic may be determined to be a first-type article topic of the article to be mined. The first threshold may be set manually according to actual needs.
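The matching and thresholding steps above can be sketched as follows. The sketch assumes named entity recognition has already produced a list of entity strings; the function name, the sample candidate topic set, and the use of plain substring counting are illustrative assumptions, not details from the text.

```python
from typing import Iterable, List, Set


def first_type_topics(
    article: str,
    named_entities: Iterable[str],
    candidate_topics: Set[str],
    first_threshold: int,
) -> List[str]:
    """Keep entities that are candidate topics and occur more than the threshold."""
    # Exact match of each recognized entity against the candidate topic set.
    matched = sorted({e for e in named_entities if e in candidate_topics})
    # Frequency is the number of non-overlapping occurrences in the article.
    return [t for t in matched if article.count(t) > first_threshold]


article = (
    "Peking University announced a new program. "
    "Peking University will host it in Beijing."
)
entities = ["Peking University", "Beijing", "Peking University"]
candidates = {"Peking University", "Tsinghua University"}
topics = first_type_topics(article, entities, candidates, first_threshold=1)
```

Here "Beijing" is dropped because it is not in the hypothetical candidate set, while "Peking University" is kept because it occurs more than once.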

In some optional implementations, determining, in response to determining that the article to be mined includes at least one first-type article topic, a first association degree between the article to be mined and each of the at least one first-type article topic may specifically include: for each of the at least one first-type article topic, the electronic device may count the frequency with which the first-type article topic appears in the article to be mined, and determine the first association degree between the article to be mined and the first-type article topic according to the counted frequency. For example, the number of times a first-type article topic appears in the article to be mined may be used as the first association degree either directly or indirectly (for example, after being multiplied by a preset weight).
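A minimal sketch of this frequency-based association degree, assuming plain substring counting; the function name and the default weight of 1.0 (the "direct" case) are illustrative assumptions.

```python
def frequency_association(article: str, topic: str, weight: float = 1.0) -> float:
    """Use the occurrence count, optionally scaled by a preset weight."""
    return article.count(topic) * weight


article = "Beijing hosts the forum. Beijing welcomes visitors."
direct = frequency_association(article, "Beijing")
scaled = frequency_association(article, "Beijing", weight=0.5)
```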

Optionally, counting the frequency with which the candidate topic appears in the article to be mined may specifically include the following. First, the electronic device may determine, according to the knowledge graph, whether the article to be mined contains an alias of the candidate topic; for example, the electronic device may determine from the knowledge graph that "Beida" is an alias of "Peking University". Then, in response to determining that the article to be mined contains an alias of the candidate topic, the electronic device may count a first frequency with which the alias of the candidate topic appears in the article to be mined, count a second frequency with which the candidate topic itself appears in the article to be mined, and finally calculate the sum of the first frequency and the second frequency and use the result as the frequency with which the candidate topic appears in the article to be mined.
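The alias-aware count above can be sketched as follows. The alias table stands in for a lookup in the knowledge graph; its contents, the function name, and the use of substring counting are assumptions for illustration.

```python
from typing import Dict, List


def topic_frequency(article: str, topic: str, aliases: Dict[str, List[str]]) -> int:
    """Sum the occurrences of the topic's aliases and of the topic itself."""
    # First frequency: occurrences of every known alias of the topic.
    first_frequency = sum(article.count(alias) for alias in aliases.get(topic, []))
    # Second frequency: occurrences of the canonical topic name.
    second_frequency = article.count(topic)
    return first_frequency + second_frequency


# Hypothetical alias table, as might be read from a knowledge graph.
alias_table = {"Peking University": ["Beida", "PKU"]}
article = (
    "Peking University, often called Beida, is in Beijing. "
    "Beida opened a new library."
)
total = topic_frequency(article, "Peking University", alias_table)
```

Plain substring counting would double count if an alias were contained in the canonical name, so a production version would need token-level matching.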

In some optional implementations of this embodiment, step 202 may further specifically include the following. First, the electronic device may determine whether the source confidence of the source information of the article to be mined exceeds a preset confidence threshold. The source confidence may be obtained from a preset source information and source confidence relation table, in which source information and source confidence are stored in correspondence. For example, the electronic device may first obtain the source information of the article to be mined, such as the author of the article, or the media account or website that published it. The electronic device may then match the source information of the article to be mined against the source information in the source information and source confidence relation table to obtain the source confidence of the source information of the article to be mined. Here, the source confidence of each piece of source information in the relation table may be determined in various ways. For example, when the source information is a media account that publishes information, the source confidence of the media account may be set according to factors such as the topic concentration, quality, and originality of the information it has published: the more concentrated the topics of the historically published information, the higher the source confidence may be set; and the higher the quality of the historically published information (which may be judged, for example, by read count, click count, and subscription count), the higher the source confidence may be set.
Then, in response to determining that the source confidence of the source information of the article to be mined exceeds the preset confidence threshold, the electronic device may use the source information of the article to be mined as a second-type article topic, and may use the source confidence of that source information as a second association degree between the article to be mined and the second-type article topic.
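The source-confidence lookup above can be sketched as follows. A dictionary stands in for the source information and source confidence relation table; the account names, confidence values, and threshold are hypothetical.

```python
from typing import Dict, Optional, Tuple


def second_type_topic(
    source: str,
    confidence_table: Dict[str, float],
    confidence_threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (topic, association degree) if the source is trusted enough."""
    confidence = confidence_table.get(source)
    if confidence is not None and confidence > confidence_threshold:
        # The source itself is the topic; its confidence is the association degree.
        return source, confidence
    return None


# Hypothetical relation table mapping source information to source confidence.
table = {"campus_news_account": 0.9, "anonymous_blog": 0.4}
trusted = second_type_topic("campus_news_account", table, confidence_threshold=0.6)
untrusted = second_type_topic("anonymous_blog", table, confidence_threshold=0.6)
```

A source absent from the table yields no second-type topic in this sketch; the text does not say how unknown sources are handled.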

In some optional implementations of this embodiment, step 202 may further specifically include the following. First, the electronic device may perform word segmentation processing on the article to be mined to obtain at least one word segmentation. Then, the at least one word segmentation may be imported into a pre-established topic classification model to obtain the probability that the article to be mined belongs to each third type candidate article topic in a preset third type candidate article topic set. The third type candidate article topic set may be established based on the knowledge graph; for example, the electronic device may extract abstract concepts in the knowledge graph, such as "constellation", "language", "automobile", and "synthesis", as third type candidate article topics. It should be noted that the topic classification model may be used to represent the correspondence between word segmentation sets and topic classification results. As an example, the topic classification model may be a correspondence table that is prepared in advance by a technician based on statistics over a large number of word segmentation sets and topic classification results, and that stores the correspondence between a plurality of word segmentation sets and topic classification results. 
Finally, a third-type article topic of the article to be mined and a third association degree between the article to be mined and the determined third-type article topic are determined based on the probability that the article to be mined belongs to each third-type candidate article topic in the third-type candidate article topic set. As an example, the electronic device may determine whether the probability that the article to be mined belongs to a certain third-type candidate article topic exceeds a preset probability threshold and, if so, may determine that this third-type candidate article topic is a third-type article topic of the article to be mined. In addition, the electronic device may take the probability that the article to be mined belongs to this third-type candidate article topic as the third association degree between the article to be mined and the third-type article topic.
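
As a rough sketch of the final selection step, assuming the topic classification model has already produced a probability for each third-type candidate article topic, the thresholding could look like the following. The function name and the threshold value 0.5 are hypothetical.

```python
from typing import Dict


def third_type_topics(probabilities: Dict[str, float],
                      probability_threshold: float = 0.5) -> Dict[str, float]:
    """Keep candidate topics whose probability exceeds the preset threshold.

    The returned value maps each selected third-type article topic to its
    third association degree, which is simply the model probability.
    """
    return {topic: p
            for topic, p in probabilities.items()
            if p > probability_threshold}
```

For instance, with assumed model outputs of 0.7 for "constellation", 0.2 for "automobile", and 0.1 for "language", only "constellation" would be kept, with a third association degree of 0.7.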

In some optional implementations, the topic classification model may be a deep neural network, and the method for generating information may further include a step of establishing the deep neural network, which may specifically include the following. First, the electronic device, or another electronic device used for training the deep neural network, may perform word segmentation processing on a sample article to obtain at least one sample word segmentation. Then, the at least one sample word segmentation may be filtered to obtain a sample word segmentation set of the sample article; for example, punctuation marks, stop words, and the like may be removed from the at least one sample word segmentation. Finally, with the sample word segmentation set as input and the preset topic of the sample article as output, an initial deep neural network may be trained to obtain the deep neural network. The initial deep neural network may be obtained in various ways; for example, it may be obtained by randomly generating the network parameters of an existing deep neural network. 
As an example, the initial deep neural network may include a convolutional neural network and a fully-connected layer, and the specific training process of the deep neural network may include the following. First, the sample word segmentation set may be input into the convolutional neural network to obtain a feature vector of the sample word segmentation set. Then, the feature vector may be input into the fully-connected layer to obtain the prediction probability that the sample article belongs to each third type candidate article topic in the third type candidate article topic set. These prediction probabilities are compared with the preset topic of the sample article (here, it can be assumed that the probability that the sample article belongs to its preset topic is 100%, and the probability that it belongs to any other topic is 0) to obtain the prediction accuracy of the initial deep neural network. If the prediction accuracy is greater than a preset accuracy threshold, the initial deep neural network may be used as the trained deep neural network; if the prediction accuracy is smaller than the preset accuracy threshold, the network parameters of the initial deep neural network are adjusted. It should be noted that this training process is only intended to explain how the parameters of the deep neural network are adjusted: the initial deep neural network is the network before parameter adjustment, and the deep neural network is the network after parameter adjustment. The parameter adjustment is not limited to a single pass and may be repeated multiple times according to the degree of optimization of the network and actual needs.
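
The filtering step and the accuracy check that decides whether parameters need further adjustment can be sketched as below. This sketch deliberately leaves out the convolutional network itself; the stop-word list, the function names, and the threshold value 0.9 are hypothetical, and the predicted probabilities are assumed to come from whatever network is being trained.

```python
import string
from typing import Dict, List

STOP_WORDS = {"the", "a", "of", "and"}  # hypothetical stop-word list


def filter_segments(segments: List[str]) -> List[str]:
    """Drop stop words and tokens made up only of punctuation."""
    return [s for s in segments
            if s.lower() not in STOP_WORDS
            and not all(ch in string.punctuation for ch in s)]


def prediction_accuracy(predicted: List[Dict[str, float]],
                        preset_topics: List[str]) -> float:
    """Fraction of samples whose most probable topic is the preset topic."""
    correct = sum(1 for probs, label in zip(predicted, preset_topics)
                  if max(probs, key=probs.get) == label)
    return correct / len(preset_topics)


def needs_adjustment(predicted: List[Dict[str, float]],
                     preset_topics: List[str],
                     accuracy_threshold: float = 0.9) -> bool:
    """True while accuracy is below the preset accuracy threshold."""
    return prediction_accuracy(predicted, preset_topics) < accuracy_threshold
```

A training loop would call needs_adjustment after each pass and keep adjusting the network parameters while it returns True. The text does not say what happens when accuracy equals the threshold; this sketch treats that case as sufficient.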

Step 203, determining the topic of the article to be mined and the association degree of the article to be mined and the topic based on the mined topic and the determined association degree.

In this embodiment, the electronic device may determine the topic of the article to be mined and the association degree between the article to be mined and the topic based on the mined topic and the determined association degree. For example, the electronic device may use all or part of the mined topics as the topics of the article to be mined. For another example, for a certain type of topic, the electronic device may increase the association degree between that type of topic and the article to be mined in various ways (e.g., by adding a certain preset value to the association degree).

In some optional implementations of this embodiment, step 203 may specifically include: when the mined topics include at least two types of topics, for each type of topic in the at least two types of topics, the electronic device may normalize the association degree between the article to be mined and that type of topic, and perform weighting processing on the normalized association degree. The association degrees determined by different topic mining manners are distributed over different numeric intervals; for example, the association degrees determined by a first manner may be distributed over the interval [0,100], while the association degrees determined by a second manner may be distributed over the interval [0,1]. The determined association degrees can therefore be normalized so that the association degrees determined by the different topic mining manners fall within the same numeric interval and can be compared. In addition, weighting processing may be performed on the normalized association degrees; for example, for a topic mining manner with a good mining effect, the association degree between the topics mined in that manner and the article to be mined may be raised by weighting (for example, by multiplying it by a certain value).
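
One way to picture the normalization and weighting is the sketch below. The text does not specify a normalization formula, so min-max scaling against the known interval bounds of each mining manner is used here as an assumption; the function name and the weight 1.2 are likewise hypothetical.

```python
from typing import Dict


def normalize_and_weight(degrees: Dict[str, float],
                         low: float,
                         high: float,
                         weight: float = 1.0) -> Dict[str, float]:
    """Map association degrees from [low, high] onto [0, 1], then weight.

    Min-max scaling is an assumed choice of normalization; `weight`
    stands for the factor applied to a given topic mining manner.
    """
    span = high - low
    return {topic: weight * (d - low) / span
            for topic, d in degrees.items()}


# First manner reports degrees in [0, 100], second manner in [0, 1].
first = normalize_and_weight({"actor a": 80.0}, low=0.0, high=100.0)
second = normalize_and_weight({"quarreling": 0.5}, low=0.0, high=1.0, weight=1.2)
merged = {**first, **second}
```

After this step both kinds of association degree lie on a common scale, so the merged dictionary can be sorted or thresholded directly.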

With continued reference to fig. 4, fig. 4 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 4, the server 401 acquires, through website A, an article about actor a and actor b quarreling as the article to be mined. Then, the server 401 may mine at least two types of topics of the article to be mined by using at least two topic mining manners, for example, mining topics of the named entity class such as "actor a" and "actor b" and topics of the abstract concept class such as "quarreling" and "hype", and may determine the association degree between the mined topics and the article to be mined. Finally, the server 401 may determine the topic of the article to be mined and the association degree between the article to be mined and the topic based on the mined topics and the determined association degrees; for example, the server may use all or part of the mined topics as the topics of the article to be mined.

According to the method provided by the embodiment of the application, at least two types of topics of the article to be mined are mined by adopting at least two topic mining modes, and the association degree of the mined topics and the article to be mined is determined, so that the topics of the article to be mined are mined from different dimensions, and more comprehensive and accurate topics are obtained.

With further reference to fig. 5, a flow 500 of yet another embodiment of a method for generating information is shown. The flow 500 of the method for generating information includes the steps of:

step 501, an article to be mined is obtained.

In this embodiment, an electronic device (e.g., the server 105 shown in fig. 1) on which the method for generating information operates may acquire an article to be mined from the internet in various ways.

Step 502, at least two types of topics of the article to be mined are mined by utilizing at least two topic mining modes, and the association degree of the mined topics and the article to be mined is determined.

In this embodiment, based on the article to be mined obtained in step 501, the electronic device may mine at least two types of topics of the article to be mined by using at least two topic mining methods, and determine a degree of association between the mined topic and the article to be mined.

Step 503, based on the mined topics and the determined association degrees, determining topics of the articles to be mined and association degrees of the articles to be mined and the topics.

In this embodiment, the electronic device may determine the topic of the article to be mined and the association degree between the article to be mined and the topic based on the mined topic and the determined association degree.

Step 504, in response to determining that the target keyword matches the topic of the article to be mined, pushing the article to be mined.

In this embodiment, the electronic device may match the target keyword against the topics of the article to be mined determined in step 503. For example, the target keyword may be compared with those topics; if the target keyword is the same as one of the topics and the association degree between that topic and the article to be mined is greater than a preset push threshold, the target keyword may be determined to match the topic of the article to be mined. In response to determining that the target keyword matches the topic of the article to be mined, the electronic device may push the article to be mined, for example, to the terminal that sent the target keyword. Here, the target keyword may be extracted from search information sent by a user through a terminal; for example, if the user sends the search information "star a" through the terminal, the electronic device may extract "a" as the target keyword. The target keyword may also be obtained from a user profile; for example, if the user profile of a certain user shows that the user is a fan of "star b", the electronic device may use "b" as the target keyword.
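
The matching rule in step 504 can be illustrated with the short sketch below. The function name, the dictionary layout of the topics, and the push threshold 0.5 are hypothetical; how the target keyword is extracted from a query or a user profile is outside the sketch.

```python
from typing import Dict


def should_push(target_keyword: str,
                article_topics: Dict[str, float],
                push_threshold: float = 0.5) -> bool:
    """Decide whether an article should be pushed for a target keyword.

    `article_topics` maps each topic determined in step 503 to its
    association degree. The keyword matches only if it equals a topic
    whose association degree is greater than the preset push threshold.
    """
    degree = article_topics.get(target_keyword)
    return degree is not None and degree > push_threshold
```

With assumed topics {"a": 0.8, "quarreling": 0.3}, the keyword "a" would trigger a push, while "quarreling" (degree too low) and "b" (not a topic of the article) would not.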

As can be seen from fig. 5, compared with the embodiment corresponding to fig. 2, the flow 500 of the method for generating information in the present embodiment highlights the step of pushing the article to be mined. Therefore, the scheme described in the embodiment can effectively utilize the determined theme of the article to be mined and the association degree of the article to be mined and the theme, so that more accurate information push is realized.

With further reference to fig. 6, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 6, the apparatus 600 for generating information of the present embodiment includes: an acquisition unit 601, a mining unit 602, and a determination unit 603. The acquisition unit 601 is configured to acquire an article to be mined; the mining unit 602 is configured to mine at least two types of topics of the article to be mined by using at least two topic mining manners, and determine the association degree between the mined topics and the article to be mined; the determination unit 603 is configured to determine the topic of the article to be mined and the association degree between the article to be mined and the topic based on the mined topics and the determined association degrees.

In this embodiment, for the specific processing of the acquisition unit 601, the mining unit 602, and the determination unit 603 of the apparatus 600 for generating information, and the technical effects thereof, reference may be made to the descriptions of step 201, step 202, and step 203 in the embodiment corresponding to fig. 2, which are not repeated here.

In some optional implementations of this embodiment, the mining unit 602 includes: an identifying subunit (not shown in the figure), configured to perform named entity identification on the article to be mined, and determine whether the article to be mined includes at least one first type article topic based on a named entity identification result; a first determining subunit (not shown in the figure) is configured to determine, in response to determining that the article to be mined includes at least one first-type article topic, a first association degree between the article to be mined and each of the at least one first-type article topic.

In some optional implementations of this embodiment, the identifying subunit includes: an identifying and determining unit (not shown in the figure) configured to perform named entity identification on the article to be mined, and determine whether the article to be mined contains at least one named entity; a matching and determining unit (not shown in the figure), configured to, in response to determining that the article to be mined includes at least one named entity, match each named entity in the at least one named entity with a candidate topic in a candidate topic set established in advance, and determine whether the article to be mined includes at least one candidate topic according to a matching result, where the candidate topic set is constructed based on a knowledge graph; and a statistics and determination unit (not shown in the figure), configured to, in response to determining that the article to be mined includes at least one candidate topic, for each candidate topic in the at least one candidate topic, perform statistics on the frequency of the candidate topic appearing in the article to be mined, and if the frequency of the candidate topic appearing in the article to be mined exceeds a first preset threshold, determine that the candidate topic is a first type article topic of the article to be mined.
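
The matching and frequency test performed by the identifying subunit can be sketched as follows, assuming named entity recognition has already produced a list of entity mentions (the recognizer itself is not shown). The function name, the sample data, and the threshold value 2 are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def first_type_topics(entity_mentions: Iterable[str],
                      candidate_topics: Set[str],
                      first_threshold: int = 2) -> Dict[str, int]:
    """Select first-type article topics from recognized named entities.

    Mentions are matched against the pre-established candidate topic set
    (assumed here to be a plain set of names drawn from a knowledge
    graph); a candidate is kept only if its frequency exceeds the first
    preset threshold.
    """
    counts = Counter(m for m in entity_mentions if m in candidate_topics)
    return {topic: n for topic, n in counts.items() if n > first_threshold}


mentions = ["actor a"] * 3 + ["actor b"] + ["city c"] * 5
selected = first_type_topics(mentions, {"actor a", "actor b"})
```

Here "city c" is frequent but is not in the candidate set, and "actor b" is a candidate but appears too rarely, so only "actor a" is selected. The retained count could then feed the first association degree described in the next paragraph.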

In some optional implementations of this embodiment, the first determining subunit is further configured to: for each first-type article topic in the at least one first-type article topic, counting the frequency of the first-type article topic appearing in the article to be mined, and determining the first association degree of the article to be mined and the first-type article topic according to the counted frequency.

In some optional implementations of this embodiment, the statistics and determination unit is further configured to: determining whether the article to be mined contains the alias of the candidate theme according to the knowledge graph; in response to the fact that the article to be mined contains the alias of the candidate topic, counting a first frequency of the alias of the candidate topic appearing in the article to be mined; counting the second frequency of the candidate theme appearing in the article to be mined; and calculating the sum of the first frequency and the second frequency, and taking the calculation result as the frequency of the candidate topic appearing in the article to be mined.
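
The alias-aware frequency count can be written as a small sketch. The alias list is assumed to have been looked up in the knowledge graph beforehand, and plain substring counting is used as a simplification; the function name and sample text are hypothetical.

```python
from typing import Iterable


def topic_frequency(article_text: str,
                    topic: str,
                    aliases: Iterable[str]) -> int:
    """Frequency of a candidate topic, counting its aliases as well.

    first_frequency  : occurrences of any alias of the candidate topic
    second_frequency : occurrences of the candidate topic name itself
    The sum is taken as the frequency of the candidate topic.
    """
    first_frequency = sum(article_text.count(alias) for alias in aliases)
    second_frequency = article_text.count(topic)
    return first_frequency + second_frequency


text = "Star A met fans. Little A smiled. Star A left."
frequency = topic_frequency(text, "Star A", ["Little A"])
```

Note that substring counting would double count if an alias contained the topic name itself; a production implementation would more likely count over recognized entity spans.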

In some optional implementations of the present embodiment, the mining unit 602 is further configured to: determine whether the source confidence of the source information of the article to be mined exceeds a preset confidence threshold, wherein the source confidence of the source information of the article to be mined is acquired from a preset source information and source confidence relationship table, and the source information and source confidence are stored in correspondence in the source information and source confidence relationship table; and, in response to determining that the source confidence of the source information of the article to be mined exceeds the preset confidence threshold, take the source information of the article to be mined as a second type article topic, and take the source confidence of the source information of the article to be mined as a second association degree between the article to be mined and the second type article topic.

In some optional implementations of the present embodiment, the mining unit 602 is further configured to: perform word segmentation processing on the article to be mined to obtain at least one word segmentation; import the at least one word segmentation into a pre-established topic classification model to obtain the probability that the article to be mined belongs to each third type candidate article topic in a preset third type candidate article topic set; and determine a third type article topic of the article to be mined and a third association degree between the article to be mined and the determined third type article topic based on the probability that the article to be mined belongs to each third type candidate article topic in the third type candidate article topic set.

In some optional implementations of this embodiment, the topic classification model is a deep neural network; and the apparatus further comprises a training unit (not shown in the figures) for: performing word segmentation processing on the sample article to obtain at least one sample word segmentation; filtering the at least one sample word segmentation to obtain a sample word segmentation set of the sample article; and taking the sample word segmentation set as input, taking a preset theme of the sample article as output, training an initial deep neural network, and obtaining the deep neural network.

In some optional implementations of this embodiment, the determining unit 603 is further configured to: when the mined topics comprise at least two types of topics, normalizing the association degree of the article to be mined and the topic of the type for each type of topic in the topics of the at least two types, and weighting the association degree after the normalization processing.

In some optional implementations of this embodiment, the apparatus 600 further includes: a pushing unit (not shown in the figure) configured to push the article to be mined in response to determining that the target keyword matches the topic of the article to be mined.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing a server according to embodiments of the present application. The server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 706 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An Input/Output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: a storage section 706 including a hard disk and the like; and a communication section 707 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 707 performs communication processing via a network such as the internet. A drive 708 is also connected to the I/O interface 705 as needed. A removable medium 709, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 708 as necessary, so that a computer program read out therefrom is installed into the storage section 706 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 707 and/or installed from the removable medium 709. The computer program, when executed by a Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a mining unit, and a determination unit. The names of the units do not form a limitation on the units themselves in some cases, and for example, the acquiring unit may also be described as a "unit acquiring an article to be mined".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring an article to be mined; mining at least two types of topics of the article to be mined by utilizing at least two topic mining modes, and determining the association degree of the mined topics and the article to be mined; and determining the theme of the article to be mined and the association degree of the article to be mined and the theme based on the mined theme and the determined association degree.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for generating information, comprising:

acquiring an article to be mined;

mining at least two types of topics of the article to be mined by utilizing at least two topic mining modes, and determining the association degree of the mined topics and the article to be mined, wherein the at least two types of topics comprise the topics of a named entity class and the topics of an abstract concept class;

and determining the theme of the article to be mined and the association degree of the article to be mined and the theme based on the mined theme and the determined association degree.

2. The method of claim 1, wherein the mining at least two types of topics of the article to be mined and determining the relevance of the mined topics to the article to be mined by using at least two topic mining methods comprises:

carrying out named entity recognition on the article to be mined, and determining whether the article to be mined comprises at least one first type article theme based on a named entity recognition result;

in response to determining that the article to be mined comprises at least one first-type article topic, determining a first degree of association of the article to be mined with each of the at least one first-type article topic.

3. The method of claim 2, wherein conducting named entity recognition on the article to be mined, and determining whether the article to be mined includes at least one first type article topic based on a named entity recognition result comprises:

carrying out named entity identification on the article to be mined, and determining whether the article to be mined contains at least one named entity;

in response to the fact that the article to be mined contains at least one named entity, matching each named entity in the at least one named entity with a candidate topic in a pre-established candidate topic set, and determining whether the article to be mined contains at least one candidate topic according to a matching result, wherein the candidate topic set is constructed based on a knowledge graph;

in response to the fact that the article to be mined comprises at least one candidate topic, for each candidate topic in the at least one candidate topic, counting the frequency of the candidate topic appearing in the article to be mined, and if the frequency of the candidate topic appearing in the article to be mined exceeds a preset first threshold value, determining that the candidate topic is the first type article topic of the article to be mined.

4. The method of claim 3, wherein the determining a first degree of association of the article to be mined with each of the at least one first-type article topic in response to determining that the article to be mined includes at least one first-type article topic comprises:

for each first-type article topic in the at least one first-type article topic, counting the frequency of the first-type article topic appearing in the article to be mined, and determining the first association degree of the article to be mined and the first-type article topic according to the counted frequency.

5. The method of claim 4, wherein counting the frequency of occurrence of the candidate topic in the article to be mined comprises:

determining whether the article to be mined contains the alias of the candidate theme according to the knowledge graph;

in response to the fact that the article to be mined contains the alias of the candidate topic, counting a first frequency of the alias of the candidate topic appearing in the article to be mined;

counting the second frequency of the candidate theme appearing in the article to be mined;

and calculating the sum of the first frequency and the second frequency, and taking the calculation result as the frequency of the candidate theme appearing in the article to be mined.

6. The method of claim 1, wherein the mining at least two types of topics of the article to be mined and determining the relevance of the mined topics to the article to be mined by using at least two topic mining methods comprises:

determining whether the source confidence of the source information of the article to be mined exceeds a preset confidence threshold, wherein the source confidence of the source information of the article to be mined is acquired from a preset source information and source confidence relation table, and the source information and source confidence are correspondingly stored in the source information and source confidence relation table;

and in response to the fact that the source confidence degree of the source information of the article to be mined exceeds a preset confidence degree threshold value, taking the source information of the article to be mined as a second type of article topic, and taking the source confidence degree of the source information of the article to be mined as a second association degree of the article to be mined and the second type of article topic.

7. The method of claim 1, wherein the mining at least two types of topics of the article to be mined and determining the relevance of the mined topics to the article to be mined by using at least two topic mining methods comprises:

performing word segmentation processing on the article to be mined to obtain at least one word segmentation;

importing the at least one word segmentation into a pre-established topic classification model to obtain the probability that the article to be mined belongs to each third type candidate article topic in a preset third type candidate article topic set;

determining a third type article topic of the article to be mined and a third association degree between the article to be mined and the determined third type article topic based on the probability that the article to be mined belongs to each third type candidate article topic in the third type candidate article topic collection.

8. The method of claim 7, wherein the topic classification model is a deep neural network; and

the method further comprises the step of establishing the deep neural network, comprising:

performing word segmentation on a sample article to obtain at least one sample segmented word;

filtering the at least one sample segmented word to obtain a sample segmented word set of the sample article;

and training an initial deep neural network, with the sample segmented word set as input and a preset topic of the sample article as output, to obtain the deep neural network.

9. The method of claim 1, wherein determining the topic of the article to be mined and the degree of association of the article to be mined with the topic based on the mined topic and the determined degree of association comprises:

when the mined topics comprise at least two types of topics, for each type of topic in the at least two types of topics, normalizing the association degree between the article to be mined and the topics of that type, and weighting the normalized association degree.
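The per-type normalization and weighting in claim 9 can be sketched as below. The claim does not state the normalization formula or the weights, so dividing by the largest association degree within each type, and the weights 0.6 and 0.4, are assumptions made only for this example.

```python
from typing import Dict


def fuse_topics(
    mined: Dict[str, Dict[str, float]],
    type_weights: Dict[str, float],
) -> Dict[str, float]:
    """Normalize association degrees within each topic type, then weight them.

    mined maps a topic type to {topic: raw association degree}.
    type_weights maps a topic type to its (assumed) weight.
    """
    fused: Dict[str, float] = {}
    for topic_type, topics in mined.items():
        if not topics:
            continue
        peak = max(topics.values())
        for topic, degree in topics.items():
            # Assumed normalization: scale so the strongest topic of a type is 1.0.
            normalized = degree / peak if peak > 0 else 0.0
            weighted = type_weights.get(topic_type, 1.0) * normalized
            # If a topic is mined by several modes, keep its highest score.
            fused[topic] = max(fused.get(topic, 0.0), weighted)
    return fused


mined_topics = {
    "named_entity": {"Lionel Messi": 12.0, "FC Barcelona": 6.0},  # raw frequencies
    "classified": {"sports": 0.8},                                # probabilities
}
fused = fuse_topics(mined_topics, {"named_entity": 0.6, "classified": 0.4})
```

Normalizing first puts raw frequencies and probabilities on a comparable scale, so the weights express the relative trust placed in each mining mode rather than compensating for different units.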

10. The method of claim 1, wherein the method further comprises:

in response to determining that target keywords match the topic of the article to be mined, pushing the article to be mined.
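The matching condition in claim 10 can be sketched as a simple check. Treating a match as case-insensitive equality between a target keyword and a mined topic is an assumption; the claim does not define how matching is performed, and `should_push` is a hypothetical name.

```python
from typing import Iterable


def should_push(target_keywords: Iterable[str], article_topics: Iterable[str]) -> bool:
    """Return True if any target keyword equals a mined topic, ignoring case."""
    topics = {topic.casefold() for topic in article_topics}
    return any(keyword.casefold() in topics for keyword in target_keywords)


push = should_push(["sports", "chess"], ["Lionel Messi", "Sports"])
```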

11. An apparatus for generating information, comprising:

an acquisition unit configured to acquire an article to be mined;

a mining unit configured to mine at least two types of topics of the article to be mined by using at least two topic mining modes, and to determine the association degree between the mined topics and the article to be mined, wherein the at least two types of topics comprise a topic of a named entity class and a topic of an abstract concept class;

and a determining unit configured to determine the topic of the article to be mined, and the association degree between the article to be mined and the topic, based on the mined topics and the determined association degree.

12. The apparatus of claim 11, wherein the mining unit comprises:

a recognition subunit configured to perform named entity recognition on the article to be mined and to determine, based on a named entity recognition result, whether the article to be mined comprises at least one first-type article topic;

the first determining subunit is configured to determine, in response to determining that the article to be mined includes at least one first-type article topic, a first association degree between the article to be mined and each of the at least one first-type article topic.

13. The apparatus of claim 12, wherein the recognition subunit comprises:

a recognition and determination unit configured to perform named entity recognition on the article to be mined and to determine whether the article to be mined contains at least one named entity;

a matching and determining unit configured to, in response to determining that the article to be mined contains at least one named entity, match each named entity in the at least one named entity against the candidate topics in a pre-established candidate topic set, and to determine, according to a matching result, whether the article to be mined comprises at least one candidate topic, wherein the candidate topic set is constructed on the basis of a knowledge graph;

and a statistics and determination unit configured to, in response to determining that the article to be mined comprises at least one candidate topic, count, for each candidate topic in the at least one candidate topic, the frequency with which the candidate topic appears in the article to be mined, and to determine that the candidate topic is a first-type article topic of the article to be mined if that frequency exceeds a preset first threshold.
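The three units of claim 13 can be sketched as one function. This is a sketch under stated assumptions: the list of recognized entity mentions stands in for the output of a real named entity recognizer, the candidate topic set is a small hypothetical set rather than one built from a knowledge graph, matching is exact string equality, and the first threshold of 2 is arbitrary.

```python
from collections import Counter
from typing import Dict, List, Set


def mine_entity_topics(
    entity_mentions: List[str],
    candidate_topics: Set[str],
    first_threshold: int = 2,  # assumed preset first threshold
) -> Dict[str, int]:
    """Return {first-type topic: frequency} for candidates above the threshold."""
    # Recognition and determination: no named entities means no first-type topics.
    if not entity_mentions:
        return {}
    # Matching: keep only mentions that are candidate topics.
    counts = Counter(m for m in entity_mentions if m in candidate_topics)
    # Statistics: a candidate becomes a topic only if its frequency exceeds the threshold.
    return {topic: n for topic, n in counts.items() if n > first_threshold}


mentions = ["Lionel Messi"] * 3 + ["FC Barcelona"] * 2 + ["Tuesday"]
entity_topics = mine_entity_topics(mentions, {"Lionel Messi", "FC Barcelona"})
```

"FC Barcelona" is a candidate topic but appears only twice, which does not exceed the threshold of 2, so only "Lionel Messi" is kept.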

14. The apparatus of claim 13, wherein the first determining subunit is further configured to:

15. The apparatus of claim 14, wherein the statistics and determination unit is further configured to:

16. The apparatus of claim 11, wherein the mining unit is further configured to:

17. The apparatus of claim 11, wherein the mining unit is further configured to:

18. The apparatus of claim 17, wherein the topic classification model is a deep neural network; and

the apparatus further comprises a training unit to:

19. The apparatus of claim 11, wherein the determining unit is further configured to:

20. The apparatus of claim 19, wherein the apparatus further comprises:

a pushing unit configured to push the article to be mined in response to determining that target keywords match the topic of the article to be mined.

21. A server, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.

22. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-10.