CN113761110A - Information issuing method, device, equipment and storage medium - Google Patents

Publication number
CN113761110A
Authority
CN
China
Prior art keywords
text
word
weight
published
professional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010598743.9A
Other languages
Chinese (zh)
Inventor
石文帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010598743.9A
Publication of CN113761110A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose an information publishing method, apparatus, device, and storage medium. The method comprises: acquiring a text to be published, and determining the professional words and non-professional words, relative to a publishing topic, in the text to be published; determining the relevance between the text to be published and the publishing topic according to a first weight of the professional words and a second weight of the non-professional words, wherein the first weight is greater than the second weight; and determining whether to publish the text to be published according to the relevance. With this technical scheme, the validity of text to be published is reviewed more quickly, and the real-time performance and accuracy of the review are improved.

Description

Information issuing method, device, equipment and storage medium
Technical Field
The present invention relates to computer technologies, and in particular, to an information distribution method, apparatus, device, and storage medium.
Background
With the development of computer technology, more and more people publish information over networks. For example, social updates are published in social circles of different types such as automobiles and pets, comments or questions about commodities are published on e-commerce platforms, and bullet comments (danmaku) or ordinary comments are published during live broadcasts or video playback. If the information published by a user is unrelated to the corresponding topic, it is invalid information with respect to that topic. Too much invalid information occupies excessive resources and hinders other users from obtaining valid information. For a better user experience, invalid information needs to be filtered out.
At present, validity checking of information published by users on a network generally proceeds as follows: the user publishes the information, and the back end obtains it and submits it for manual review of its validity. If the information is invalid, its publication is revoked.
In the process of implementing the invention, the inventor found at least the following problems in the prior art: all published information must be reviewed manually, so the review workload is large, the review efficiency is low, and invalid information is easily missed; moreover, because manual review is slow, information can only be published first and reviewed afterwards, so the review lags behind publication.
Disclosure of Invention
The embodiment of the invention provides an information publishing method, an information publishing device, information publishing equipment and a storage medium, so that the validity of text information to be published can be more quickly audited, and the real-time performance and accuracy of audit are improved.
In a first aspect, an embodiment of the present invention provides an information publishing method, including:
acquiring a text to be published, and determining professional words and non-professional words corresponding to a publishing topic in the text to be published;
determining the correlation degree between the text to be published and the publishing subject according to the first weight of the professional words and the second weight of the non-professional words, wherein the first weight is larger than the second weight;
and determining whether to publish the text to be published according to the correlation.
In a second aspect, an embodiment of the present invention further provides an information distribution apparatus, where the apparatus includes:
the word determining module is used for acquiring a text to be published and determining professional words and non-professional words corresponding to a publishing subject in the text to be published;
a topic relevance determining module, configured to determine relevance between the text to be published and the publishing topic according to a first weight of the professional word and a second weight of the non-professional word, where the first weight is greater than the second weight;
and the text publishing module is used for determining whether to publish the text to be published according to the relevancy.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the information distribution method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the information publishing method provided in any embodiment of the present invention.
The embodiments of the invention acquire a text to be published and determine the professional words and non-professional words, relative to a publishing topic, in the text to be published; determine the relevance between the text to be published and the publishing topic according to a first weight of the professional words and a second weight of the non-professional words, wherein the first weight is greater than the second weight; and determine whether to publish the text to be published according to the relevance. Before information is published, the relevance between the text to be published and the publishing topic is determined automatically from whether the words in the text are professional words related to the publishing topic, which in turn determines whether the text is valid information that can be published. This reduces manual review, reduces the lag of the validity review, and improves the real-time performance and accuracy of the validity review of the text to be published.
Drawings
Fig. 1 is a flowchart of an information distribution method in a first embodiment of the present invention;
fig. 2 is a flowchart of an information distribution method in the second embodiment of the present invention;
fig. 3 is a schematic structural diagram of an information distribution apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device in a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
The information publishing method provided by this embodiment is applicable to scenarios in which information is published over a network under a publishing topic, such as publishing updates in social circles of different types, publishing commodity-related comments or questions on an e-commerce platform, or publishing bullet comments (danmaku) or ordinary comments about the content being watched during a live broadcast or video playback. The method can be executed by an information publishing apparatus, which can be implemented in software and/or hardware and can be integrated into an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a server. Referring to fig. 1, the method of this embodiment specifically includes:
S110, acquiring the text to be published, and determining the professional words and non-professional words, relative to the publishing topic, in the text to be published.
The text to be published is the information content awaiting publication. The publishing topic is content related to the object under which the information is published; for example, it may be a social circle type such as automobiles or pets, commodity-related information such as commodity category and commodity attributes, or information related to the content of a live broadcast or video being watched. Professional words and non-professional words are complementary concepts: professional words are words with a specific meaning in the technical field corresponding to the publishing topic, and non-professional words are all other words.
If the text to be published contains professional words related to the publishing topic, it is very likely to be valid information related to that topic. On this basis, the validity of the text can be reviewed automatically by analyzing whether the words it contains are professional words.
In a specific implementation, the text to be published is first obtained from the content input by the user. The text is then segmented into words to obtain the initial segmented words. Next, sensitive-word filtering and stop-word filtering are applied to the initial segmented words to obtain the segmented words of the text. The sensitive-word lexicon and stop-word lexicon used for this filtering may be constructed in advance in the embodiment of the invention or may be cloud lexicons. Each segmented word is then matched against the professional words corresponding to the publishing topic; a segmented word whose matching similarity meets a preset similarity threshold is determined to be a professional word, and the remaining segmented words are determined to be non-professional words. The professional words corresponding to the publishing topic may come from a professional lexicon constructed in advance for the publishing topic in the embodiment of the invention, or from a cloud lexicon. Each segmented word in the text to be published is therefore either a professional word or a non-professional word, never both.
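The segmentation, filtering, and classification steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for a real word segmenter, exact lexicon membership stands in for the similarity match against a threshold, and the function name `classify_words` and the sample lexicons are hypothetical.

```python
def classify_words(text, sensitive_words, stop_words, professional_words):
    """Split a text into (professional, non_professional) word lists."""
    # Assumption: whitespace splitting stands in for a real word segmenter.
    initial_words = text.lower().split()
    # Sensitive-word and stop-word filtering yields the segmented words.
    segmented = [w for w in initial_words
                 if w not in sensitive_words and w not in stop_words]
    # Assumption: exact lexicon membership replaces similarity matching.
    professional = [w for w in segmented if w in professional_words]
    non_professional = [w for w in segmented if w not in professional_words]
    return professional, non_professional


pro, non_pro = classify_words(
    "the turbo engine sounds great",
    sensitive_words=set(),
    stop_words={"the"},
    professional_words={"turbo", "engine"},
)
```

Every segmented word lands in exactly one of the two lists, matching the either-or property described above.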
S120, determining the correlation degree between the text to be published and the publishing subject according to the first weight of the professional words and the second weight of the non-professional words.
The first weight and the second weight are the weights of a professional word and of a non-professional word respectively. They represent how relevant the corresponding word is to the publishing topic: the higher the weight of a word, the more relevant the word is to the publishing topic. The first weights of the professional words may be the same or different, and the second weights of the non-professional words may be the same or different. The first weight and the second weight may be preset empirical values or may be calculated according to a rule, provided that the first weight is greater than the second weight.
The total weight of the text to be published is determined by combining the first weight of each professional word and the second weight of each non-professional word in the text; this total weight is the relevance between the text to be published and the publishing topic.
Exemplarily, before S120, the method further includes: determining a second set number of hot words according to the word frequency of each word in each historical release text; and determining a first weight according to the word frequency and the default weight of each hot word.
A historical published text is information content that was published before the current operation. The word frequency of a word is the ratio of the number of historical published texts containing the word to the total number of historical published texts; for example, if 3 out of 10000 historical published texts contain a word, the word frequency of that word is 3/10000. The second set number is a number of words set in advance according to the accuracy requirement of the service; for example, if the accuracy requirement is high, the second set number is set to a relatively large value. The default weight is a preset fallback weight, for example 1.
In calculating the relevance between the text to be published and the publishing topic, every professional word contained in the text contributes equally, so the first weight is designed to be the same for every professional word. Because professional words are an important component of the relevance calculation, the first weight is determined from the word frequencies of the second set number of hot words, that is, the words with the highest word frequency across all historical published texts. In a specific implementation, the word frequency of every word appearing in the historical published texts is first counted. All words in the historical published texts are then sorted by word frequency from high to low, and the top second set number of words are selected as hot words. Next, a combined word frequency s-freq of the hot words is determined from their individual word frequencies, for example by taking the average, the maximum, or the median. Finally, the sum of the default weight and the combined word frequency (1 + s-freq) is taken as the first weight. The advantage of this arrangement is that the first weight is derived from the word frequencies of hot words in the historical published texts of the publishing topic, which avoids the subjectivity of empirically set values, ties the first weight to data of the publishing topic, and makes the first weight both dynamic and more accurate.
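As a small sketch of the word frequency defined above (the share of historical texts containing a word, not the number of occurrences), assuming each historical text is already available as a list of segmented words; the function name `word_frequency` and the sample data are hypothetical:

```python
def word_frequency(word, historical_texts):
    """Share of historical texts (lists of words) that contain `word`."""
    if not historical_texts:
        return 0.0
    containing = sum(1 for words in historical_texts if word in words)
    return containing / len(historical_texts)


history = [
    ["engine", "oil"],
    ["engine", "tyre"],
    ["cute", "dog"],
    ["oil", "change"],
]
engine_freq = word_frequency("engine", history)  # 2 of 4 texts
```

A word repeated several times inside one text still counts once for that text, because the ratio is taken over texts.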
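The first-weight calculation (1 + s-freq) can be sketched as below. This is an illustrative sketch only: it assumes historical texts are lists of segmented words and uses the average as the combining function, one of the options the description mentions. The names `first_weight` and `hot_word_count` (the second set number) are hypothetical.

```python
from collections import Counter
from statistics import mean


def first_weight(historical_texts, hot_word_count, default_weight=1.0):
    """Default weight plus the combined word frequency of the hot words."""
    if not historical_texts or hot_word_count <= 0:
        return default_weight
    # Count, for every word, how many texts contain it.
    text_counts = Counter()
    for words in historical_texts:
        text_counts.update(set(words))
    total = len(historical_texts)
    # Hot words are the words with the highest word frequency.
    hot_freqs = [count / total
                 for _, count in text_counts.most_common(hot_word_count)]
    # Assumption: the average is used to combine the hot-word frequencies.
    s_freq = mean(hot_freqs)
    return default_weight + s_freq


history = [
    ["engine", "oil"],
    ["engine", "tyre"],
    ["cute", "dog"],
    ["oil", "change"],
]
w1 = first_weight(history, hot_word_count=2)
```

In this sample the two hot words are "engine" and "oil", each present in half of the texts, so the combined frequency is 0.5.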
Exemplarily, before S120, the method further includes: and determining a second weight of the corresponding non-professional word according to the word frequency and the default weight of each non-professional word.
Different non-professional words may play different roles in the relevance calculation, so their second weights may also differ. The more frequently a non-professional word appears in the historical published texts of the publishing topic, the more likely it is to be a hot word of that topic. If the text to be published contains hot words of the publishing topic, it is more likely to be valid information related to that topic. On this basis, the second weight of each non-professional word is determined from its word frequency. In a specific implementation, the word frequency freq of each non-professional word is counted over the historical published texts, and the sum of the default weight and the word frequency (1 + freq) is taken as the second weight of that non-professional word. The advantage of this arrangement is that determining the second weight from the word frequency of the non-professional word avoids the subjectivity of empirically set values and makes the second weight dynamic and more accurate; in addition to the effect of professional words, the effect of word frequency on the relevance calculation is taken into account, which further improves the accuracy of the relevance.
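A minimal sketch of the second weight (1 + freq) for a single non-professional word, assuming historical texts are lists of segmented words; the function name `second_weight` and the sample data are hypothetical:

```python
def second_weight(word, historical_texts, default_weight=1.0):
    """Default weight plus the word frequency of a non-professional word."""
    if not historical_texts:
        return default_weight
    # Word frequency: share of historical texts that contain the word.
    containing = sum(1 for words in historical_texts if word in words)
    return default_weight + containing / len(historical_texts)


history = [
    ["engine", "oil"],
    ["engine", "tyre"],
    ["cute", "dog"],
    ["oil", "change"],
]
w2_dog = second_weight("dog", history)
```

A word never seen in the historical texts simply receives the default weight.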
S130, determining whether to publish the text to be published according to the relevance.
The higher the relevance between the text to be published and the publishing topic, the more likely the text is valid information. When that likelihood is high enough, the text is regarded as valid information and can be published directly. Otherwise, the text is regarded as suspected information, a suspected flag is set for it, and its validity is reviewed manually.
Exemplarily, S130 includes: if the relevance is greater than a relevance threshold, publishing the text to be published. The relevance threshold is a preset relevance value that serves as the lower limit for deciding whether to publish the text. It can be set according to business requirements, such as the amount of information that can be reviewed manually and the accuracy of the algorithm. If the relevance of the text to be published is greater than the relevance threshold, the text is regarded as valid information and is published directly. The advantage of this arrangement is that the publishing decision is made more quickly, which further improves review efficiency.
According to the technical scheme of this embodiment, the text to be published is acquired, and the professional words and non-professional words, relative to the publishing topic, in the text are determined; the relevance between the text to be published and the publishing topic is determined according to the first weight of the professional words and the second weight of the non-professional words, wherein the first weight is greater than the second weight; and whether to publish the text is determined according to the relevance. Before information is published, the relevance between the text and the publishing topic is determined automatically from whether the words in the text are professional words related to the publishing topic, which in turn determines whether the text is valid information that can be published. This reduces manual review, reduces the lag of the validity review, and improves the real-time performance and accuracy of the validity review of the text to be published.
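The threshold decision can be sketched as below. The function name `review_decision` and the returned labels are hypothetical; the routing of non-passing texts to manual review follows the description of suspected information above.

```python
def review_decision(relevance, relevance_threshold):
    """Publish directly only when relevance exceeds the threshold."""
    if relevance > relevance_threshold:
        return "publish"
    # Otherwise the text is flagged as suspected and reviewed manually.
    return "manual_review"


decision = review_decision(relevance=2.5, relevance_threshold=2.0)
```

The comparison is strict, so a text whose relevance equals the threshold is still sent to manual review.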
Example two
On the basis of the first embodiment, this embodiment further refines the step of determining the relevance between the text to be published and the publishing topic according to the first weight of the professional words and the second weight of the non-professional words. In addition, a step that first checks the number of published texts may be added. Terms that are the same as or correspond to those in the above embodiment are not explained again here. Referring to fig. 2, the information publishing method provided in this embodiment includes:
S210, acquiring the text to be published.
S220, acquiring the total number of the texts, and determining whether the total number of the texts is smaller than a text number threshold value.
The total number of texts is the total number of published texts received, including every historical published text and the text to be published. The text number threshold is a preset number of texts that serves as the lower limit for deciding whether to calculate the relevance, and should be a relatively large value. The text number threshold may be determined from the number of texts published within a certain time period.
Whether the first and second weights are set empirically or calculated, their accuracy depends on the number of published texts under the publishing topic: the larger the number, the higher the accuracy. When the total number of texts under the publishing topic is small, the first and second weights may therefore deviate considerably, and the relevance cannot be reliably calculated from the weights of the professional and non-professional words. For this reason, this embodiment obtains the total number of texts before calculating the relevance from the weights, and compares it with the text number threshold. If the total number of texts is greater than or equal to the text number threshold, the relevance can be calculated from the weights, and S240 is performed. If the total number of texts is less than the text number threshold, the relevance is not calculated from the weights, and S230 is performed.
S230, determining the default weight as the relevance between the text to be published and the publishing topic.
When the total number of texts is small, the relevance cannot be calculated accurately, so every text to be published is reviewed manually. Because the default weight is the fallback weight of the entire method, that is, the smallest weight and one that is also smaller than the relevance threshold, the default weight is taken as the relevance between the text to be published and the publishing topic. S290 is then performed.
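The text-count gate of S220 and S230 can be sketched as follows. This is only an illustration: the weighted calculation of S240 to S280 is passed in as a callable, and the names `relevance_or_fallback` and `compute_relevance` are hypothetical.

```python
def relevance_or_fallback(total_texts, text_count_threshold,
                          compute_relevance, default_weight=1.0):
    """Return the default weight when too few texts exist (S230)."""
    if total_texts < text_count_threshold:
        # Too little history: skip the weighted calculation entirely.
        return default_weight
    # Enough history: run the weighted calculation (S240 to S280).
    return compute_relevance()


sparse = relevance_or_fallback(10, 1000, lambda: 3.0)
dense = relevance_or_fallback(5000, 1000, lambda: 3.0)
```

Because the default weight is smaller than the relevance threshold, the fallback value never passes the publishing check, so such texts go to manual review.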
S240, determining professional words and non-professional words corresponding to the publishing subject in the text to be published.
S250, determining a second set number of hot words according to the word frequency of each word in each historical release text, and determining the first weight of the professional words according to the word frequency and the default weight of each hot word.
S260, determining a second weight of the corresponding non-professional word according to the word frequency and the default weight of each non-professional word.
S270, determining a first set number of target words according to the number of segmented words of the text to be published, the first set number, and the segmented words themselves.
The first set number is a number of words set in advance and is used to determine which of the segmented words take part in the relevance calculation. It may be set according to the accuracy requirement of the service; for example, if the accuracy requirement is high, the first set number is set to a relatively large value. Illustratively, the average number of segmented words per text, computed over the historical published texts, is taken as the first set number. To make the first set number more accurate and thereby improve the accuracy of the relevance calculation, all historical published texts may be segmented and then subjected to sensitive-word filtering and stop-word filtering, so as to determine the number of segmented words of each historical published text. The average of these numbers (the average segmented-word count) is then calculated; it represents how many segmented words a published text contains on average and is taken as the first set number.
The target words are the words that take part in the subsequent relevance calculation. They may all be professional words, all be non-professional words, include both professional and non-professional words, or include professional words, non-professional words, and alternative words. An alternative word is a placeholder that is not one of the segmented words of the text; it has no meaning of its own and is used only to supply a missing weight in the subsequent relevance calculation.
If too many words take part in the relevance calculation, the amount of computation increases and the calculation slows down. To balance accuracy and speed, this embodiment uses the first set number to limit the number of target words taking part in the relevance calculation. In a specific implementation, the number of segmented words of the text is compared with the first set number. If there are more segmented words than the first set number, a first set number of segmented words must be selected from them as target words. If there are fewer, alternative words are added to the segmented words to obtain the target words.
Exemplarily, S270 includes: if the number of segmented words is greater than the first set number, sorting the segmented words in descending order of word frequency, where the word frequency is the ratio of the number of historical published texts containing the word to the total number of historical published texts; and selecting the top first set number of segmented words from the sorted result as the target words.
If the number of segmented words is greater than the first set number, target words must be selected from the segmented words. Some segmented words are of low importance and contribute little to the relevance calculation, so the segmented words can be selected by word frequency, which reflects the importance of a word. In a specific implementation, the word frequency of each segmented word is counted over the historical published texts. The segmented words are then sorted in descending order of word frequency, and the top first set number of segmented words are selected as target words. The advantage of this arrangement is that the relatively important words in the text to be published are selected by word frequency for the subsequent relevance calculation, which removes the computation for relatively unimportant words and their influence on the relevance, improves both the speed and the accuracy of the relevance calculation, and in turn improves the speed and accuracy of reviewing the text to be published.
If the number of segmented words of the text to be published is equal to the first set number, all the segmented words are directly taken as the target words.
Exemplarily, S270 includes: if the number of segmented words is less than the first set number, determining a number of alternative words equal to the difference between the first set number and the number of segmented words, and taking the segmented words together with these alternative words as the first set number of target words.
If the number of segmented words is less than the first set number, the text does not have enough segmented words. Because the number of weights taking part in the relevance calculation affects the result, alternative words whose weight is the default weight are added as placeholders to keep the number of target words constant and thus stabilize the relevance calculation. In a specific implementation, the difference between the first set number and the number of segmented words is calculated, and the segmented words together with that many alternative words are taken as the target words. For example, if the first set number is 5 and the number of segmented words is 3, 2 alternative words are added, and the 3 segmented words and 2 alternative words are the target words.
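Deriving the first set number from the average segmented-word count can be sketched as below. It assumes the historical texts have already been segmented and filtered into word lists; rounding the average to a whole number and enforcing a minimum of 1 are assumptions of this sketch, and the name `first_set_number` is hypothetical.

```python
def first_set_number(historical_texts):
    """Average number of segmented words per historical text."""
    if not historical_texts:
        raise ValueError("at least one historical text is required")
    average = (sum(len(words) for words in historical_texts)
               / len(historical_texts))
    # Assumption: round to a whole word count and keep at least one word.
    return max(1, round(average))


n_targets = first_set_number([["a", "b", "c"], ["a"], ["a", "b"]])
```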
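Selecting the target words when a text has more words than the first set number can be sketched as follows. The word frequencies are assumed to be precomputed into a dictionary; the names `select_top_words` and `word_freq` are hypothetical.

```python
def select_top_words(words, word_freq, first_set_number):
    """Keep the `first_set_number` words with the highest word frequency."""
    # sorted() is stable, so words with equal frequency keep text order.
    ranked = sorted(words, key=lambda w: word_freq.get(w, 0.0), reverse=True)
    return ranked[:first_set_number]


targets = select_top_words(
    ["a", "b", "c", "d"],
    word_freq={"a": 0.1, "b": 0.4, "c": 0.3, "d": 0.2},
    first_set_number=2,
)
```

Words absent from the frequency table are treated as having frequency 0 and therefore rank last.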
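Padding with alternative words can be sketched as below, reproducing the example of a first set number of 5 and 3 words in the text. Representing an alternative word as `None` is an assumption of this sketch, as are the names `PLACEHOLDER` and `pad_targets`.

```python
PLACEHOLDER = None  # Assumption: None marks an alternative (placeholder) word.


def pad_targets(words, first_set_number):
    """Append placeholders until there are `first_set_number` targets."""
    missing = first_set_number - len(words)
    return list(words) + [PLACEHOLDER] * max(0, missing)


targets = pad_targets(["a", "b", "c"], first_set_number=5)
```

When the text already has enough words, the list is returned unchanged in length, since the number of placeholders is never negative.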
S280, determining the correlation degree between the text to be published and the publishing subject according to the target weight of each target word.
The target weight is the weight of the target words, and each target word corresponds to one target weight. When the target word is a professional word, the target weight is a first weight; when the target word is a non-professional word, the target weight is a second weight; and when the target word is a substitute word, the target weight is the default weight.
The relevance between the text to be published and the publishing topic is calculated by combining the target weights of all target words. For example, the sum of the squares of the target weights is calculated, and the arithmetic square root of that sum is taken as the relevance between the text to be published and the publishing topic.
S290, if the relevance is greater than the relevance threshold, publishing the text to be published.
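S280 and S290 can be sketched as follows, using the example formula above (square root of the sum of squared target weights). The function names, the use of per-word weight dictionaries, and the numeric values in the usage note are assumptions for illustration only; the text does not fix how the weights are stored.

```python
import math
from typing import Dict, Iterable


def target_weight(word: str,
                  professional_weights: Dict[str, float],
                  non_professional_weights: Dict[str, float],
                  default_weight: float) -> float:
    """Map a target word to its target weight."""
    if word in professional_weights:
        return professional_weights[word]       # first weight
    if word in non_professional_weights:
        return non_professional_weights[word]   # second weight
    return default_weight                       # substitute word


def relevance(target_weights: Iterable[float]) -> float:
    """Square root of the sum of squares of the target weights."""
    return math.sqrt(sum(w * w for w in target_weights))


def should_publish(relevance_value: float, relevance_threshold: float) -> bool:
    """S290: publish only if the relevance exceeds the threshold."""
    return relevance_value > relevance_threshold
```

For instance, target weights of 3.0 and 4.0 plus one substitute word with a default weight of 0.0 give a relevance of 5.0, which passes a threshold of 4.0.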
It should be noted that the execution order of S250 and S260 is not limited, provided that both are executed between S220 and S280. S250 may be executed before S260, S260 may be executed before S250, or S250 and S260 may be executed in parallel.
According to the technical solution of this embodiment, the total number of texts is obtained and compared with a text number threshold; if the total number is smaller than the threshold, the default weight is determined as the relevance between the text to be published and the publishing topic. When the total number of texts is small, the default weight is thus used directly as the relevance, and the text to be published is judged to be suspected information and sent for manual review. This avoids running the calculation when the small number of texts would make a relevance calculated from professional words deviate substantially. It saves computation, which speeds up the validity review of the text to be published, and it keeps the review process under reasonable control, which improves the accuracy of the validity review. In addition, the first set number of target words is determined according to the number of word segments of the text to be published, the first set number and the text word segments, and the relevance between the text to be published and the publishing topic is determined according to the target weight of each target word. Limiting the number of target words that take part in the relevance calculation to the first set number improves the precision of the relevance determination while also increasing its speed, so that the speed and accuracy of reviewing the text to be published are better balanced.
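The text-count check described above amounts to a short circuit placed in front of the weight-based calculation. The sketch below assumes a hypothetical function name and passes the full calculation in as a callable so that it only runs when enough texts are available.

```python
from typing import Callable


def relevance_with_text_count_check(total_texts: int,
                                    text_number_threshold: int,
                                    default_weight: float,
                                    compute_relevance: Callable[[], float]) -> float:
    """Use the default weight as the relevance when too few texts exist."""
    if total_texts < text_number_threshold:
        # Too few texts: skip the word-based calculation entirely.
        return default_weight
    # Enough texts: run the professional/non-professional word calculation.
    return compute_relevance()
```

Passing the calculation as a callable keeps the skipped branch from doing any work, which reflects the stated goal of saving computation when the total number of texts is small.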
Example three
The present embodiment provides an information distribution apparatus. Referring to fig. 3, the apparatus specifically includes:
the word determining module 310 is configured to obtain a text to be published, and determine a professional word and a non-professional word corresponding to a publishing topic in the text to be published;
the topic relevance determining module 320 is configured to determine relevance between the text to be published and the published topic according to a first weight of the professional word and a second weight of the non-professional word, where the first weight is greater than the second weight;
the text publishing module 330 is configured to determine whether to publish the text to be published according to the relevance.
Optionally, the topic relevance determination module 320 includes:
the target word determining submodule is configured to determine the first set number of target words according to the number of text word segments of the text to be published, the first set number and the text word segments, where each text word segment is a professional word or a non-professional word;
the relevancy determining submodule is used for determining the relevancy between the text to be issued and the issuing subject according to the target weight of each target word, wherein when the target word is a professional word, the target weight is a first weight; when the target word is a non-professional word, the target weight is a second weight; and when the target word is a substitute word, the target weight is the default weight.
Optionally, the target word determination sub-module is specifically configured to:
if the number of word segments is greater than the first set number, arrange the text word segments in descending order of word frequency, where the word frequency of a word is the ratio of the number of historical published texts containing the word to the total number of historical published texts;
and select, according to the ranking result, the first set number of top-ranked text word segments as the target words.
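The word frequency defined here is a per-text ratio rather than a raw occurrence count. A minimal sketch, assuming each historical published text has already been segmented into a list of words and using a hypothetical function name:

```python
from collections import Counter
from typing import Dict, List


def word_frequencies(historical_texts: List[List[str]]) -> Dict[str, float]:
    """Word frequency = texts containing the word / total number of texts."""
    total = len(historical_texts)
    if total == 0:
        return {}
    containing = Counter()
    for segments in historical_texts:
        # Count each word at most once per text, however often it repeats.
        containing.update(set(segments))
    return {word: n / total for word, n in containing.items()}
```

Because each text is reduced to a set before counting, a word repeated many times in one text still contributes only once to its frequency.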
Optionally, the target word determination sub-module is specifically configured to:
if the number of word segments of the text to be published is equal to the first set number, determine each text word segment as a target word.
Optionally, the target word determination sub-module is specifically configured to:
if the number of word segments is less than the first set number, determine a number of substitute words equal to the difference between the first set number and the number of word segments, and determine the text word segments and the substitute words as the target words.
Optionally, the topic relevance determination module 320 further includes a set number determination sub-module, configured to:
before the first set number of target words is determined according to the number of word segments of the text to be published, the first set number and the text word segments, determine the average number of word segments per text from the historical published texts as the first set number.
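Deriving the first set number from the historical published texts can be sketched as below. The text only says that the average number of word segments is used; rounding the average to the nearest integer and enforcing a minimum of 1 are assumptions of this example, as is the function name.

```python
from typing import List


def first_set_number(historical_texts: List[List[str]]) -> int:
    """Average number of word segments per historical published text."""
    if not historical_texts:
        raise ValueError("at least one historical published text is required")
    total_segments = sum(len(segments) for segments in historical_texts)
    average = total_segments / len(historical_texts)
    # Assumption: round to an integer count and never return less than 1.
    return max(1, round(average))
```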
Optionally, on the basis of the foregoing apparatus, the apparatus further includes a first weight determining module, configured to:
before determining the correlation degree between the text to be issued and the issuing subject according to the first weight of each professional word and the second weight of each non-professional word, determining a second set number of hot words according to the word frequency of each word in each historical issuing text;
and determining a first weight according to the word frequency and the default weight of each hot word.
Optionally, on the basis of the foregoing apparatus, the apparatus further includes a second weight determining module, configured to:
before determining the correlation degree between the text to be issued and the issuing subject according to the first weight of each professional word and the second weight of each non-professional word, determining the second weight of the corresponding non-professional word according to the word frequency and the default weight of each non-professional word.
Optionally, on the basis of the above apparatus, the apparatus further includes a text quantity comparison module, configured to:
after the text to be issued is obtained, the total number of the texts is obtained, and whether the total number of the texts is smaller than a text number threshold value or not is determined;
if so, determining the default weight as the correlation degree between the text to be published and the publishing subject;
and if not, executing the step of determining the professional words and the non-professional words corresponding to the publishing subject in the text to be published.
Optionally, the text publishing module 330 is specifically configured to:
and if the correlation degree is greater than the correlation degree threshold value, releasing the text to be released.
With this information distribution apparatus, before information is published, the relevance between the text to be published and the publishing topic is determined automatically according to whether the words in the text are professional words related to the publishing topic, and this relevance is used to decide whether the text is valid information that can be published. This reduces manual review, reduces the delay in reviewing the validity of the text to be published, and improves the timeliness and accuracy of that review.
The information issuing device provided by the embodiment of the invention can execute the information issuing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the information distribution apparatus, each unit and each module included in the embodiment are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Example four
Referring to fig. 4, the present embodiment provides an electronic device 400, which includes: one or more processors 420; the storage device 410 is configured to store one or more programs, and when the one or more programs are executed by the one or more processors 420, the one or more processors 420 implement the information distribution method provided in the embodiment of the present invention, including:
acquiring a text to be published, and determining professional words and non-professional words corresponding to a publishing theme in the text to be published;
determining the correlation degree between the text to be published and the publishing subject according to the first weight of the professional words and the second weight of the non-professional words, wherein the first weight is larger than the second weight;
and determining whether to publish the text to be published according to the correlation.
Of course, those skilled in the art can understand that the processor 420 can also implement the technical solution of the information distribution method provided by any embodiment of the present invention.
The electronic device 400 shown in fig. 4 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in fig. 4, electronic device 400 is embodied in the form of a general purpose computing device. The components of electronic device 400 may include, but are not limited to: one or more processors 420, a memory device 410, and a bus 450 that connects the various system components (including the memory device 410 and the processors 420).
Bus 450 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 400 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 400 and includes both volatile and nonvolatile media, removable and non-removable media.
The storage 410 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)411 and/or cache memory 412. The electronic device 400 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 413 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 450 by one or more data media interfaces. Storage 410 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 414 having a set (at least one) of program modules 415 may be stored, for example, in storage 410. Such program modules 415 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may include an implementation of a network environment. The program modules 415 generally perform the functions and/or methods of the embodiments described herein.
Electronic device 400 may also communicate with one or more external devices 460 (e.g., keyboard, pointing device, display 470, etc.), with one or more devices that enable a user to interact with electronic device 400, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 400 to communicate with one or more other computing devices. Such communication may be through input/output interfaces (I/O interfaces) 430. Also, the electronic device 400 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 440. As shown in FIG. 4, the network adapter 440 communicates with the other modules of the electronic device 400 via a bus 450. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 420 executes various functional applications and data processing, for example, implementing an information distribution method provided by an embodiment of the present invention, by executing programs stored in the storage device 410.
Example five
The present embodiments provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method of information distribution, the method comprising:
acquiring a text to be published, and determining professional words and non-professional words corresponding to a publishing theme in the text to be published;
determining the correlation degree between the text to be published and the publishing subject according to the first weight of the professional words and the second weight of the non-professional words, wherein the first weight is larger than the second weight;
and determining whether to publish the text to be published according to the correlation.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the above method operations, and may also perform related operations in the information distribution method provided by any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. An information distribution method, comprising:
acquiring a text to be published, and determining professional words and non-professional words corresponding to a publishing topic in the text to be published;
determining the correlation degree between the text to be published and the publishing subject according to the first weight of the professional words and the second weight of the non-professional words, wherein the first weight is larger than the second weight;
and determining whether to publish the text to be published according to the correlation.
2. The method of claim 1, wherein determining the relevance between the text to be published and the publishing topic according to the first weight of the professional word and the second weight of the non-professional word comprises:
determining target words of a first set number according to the word number of each text word of the text to be released, the first set number and each text word, wherein the text word is the professional word or the non-professional word;
determining the correlation degree between the text to be published and the publishing subject according to the target weight of each target word, wherein when the target word is the professional word, the target weight is the first weight; when the target word is the non-professional word, the target weight is the second weight; and when the target word is a substitute word, the target weight is a default weight.
3. The method according to claim 2, wherein determining the first set number of target words according to the number of the participles of each text participle of the text to be published, the first set number and each text participle comprises:
if the word number is larger than the first set number, performing word frequency descending order arrangement on each text word according to the word frequency of each text word, wherein the word frequency is the ratio of the text number of the historical release text containing words to the total number of the historical release text;
and screening the first set number of target words ranked in the front from each text participle according to a ranking result.
4. The method according to claim 2, wherein determining the first set number of target words according to the number of the participles of each text participle of the text to be published, the first set number and each text participle comprises:
if the number of word segments is smaller than the first set number, determining a number of substitute words equal to the difference between the first set number and the number of word segments, and determining each text word segment and the substitute words as the target words.
5. The method according to claim 2, before determining the target words of the first set number according to the number of the participles of each text participle of the text to be published, the first set number and each text participle, further comprising:
and determining the average word number of the word segmentation of the text as the first set number according to each historical release text.
6. The method of claim 1, further comprising, before determining the relevance between the text to be published and the publishing topic according to the first weight of each professional word and the second weight of each non-professional word:
determining a second set number of hot words according to the word frequency of each word in each historical release text;
and determining the first weight according to the word frequency and the default weight of each hot word.
7. The method of claim 1, further comprising, before determining the relevance between the text to be published and the publishing topic according to the first weight of each professional word and the second weight of each non-professional word:
and determining a second weight of the corresponding non-professional word according to the word frequency and the default weight of each non-professional word.
8. The method according to claim 1, after obtaining the text to be published, further comprising:
acquiring the total number of texts, and determining whether the total number of texts is smaller than a text number threshold value;
if so, determining the default weight as the correlation degree between the text to be published and the publishing subject;
and if not, executing the step of determining the professional words and the non-professional words corresponding to the publishing subject in the text to be published.
9. The method of claim 1, wherein determining whether to publish the text to be published according to the relevance comprises:
and if the relevancy is greater than the relevancy threshold, issuing the text to be issued.
10. An information distribution apparatus, comprising:
the word determining module is used for acquiring a text to be published and determining professional words and non-professional words corresponding to a publishing subject in the text to be published;
a topic relevance determining module, configured to determine relevance between the text to be published and the publishing topic according to a first weight of the professional word and a second weight of the non-professional word, where the first weight is greater than the second weight;
and the text publishing module is used for determining whether to publish the text to be published according to the relevancy.
11. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the information distribution method of any one of claims 1-9.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the information distribution method according to any one of claims 1 to 9.
CN202010598743.9A 2020-06-28 2020-06-28 Information issuing method, device, equipment and storage medium Pending CN113761110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010598743.9A CN113761110A (en) 2020-06-28 2020-06-28 Information issuing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010598743.9A CN113761110A (en) 2020-06-28 2020-06-28 Information issuing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113761110A true CN113761110A (en) 2021-12-07

Family

ID=78785435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010598743.9A Pending CN113761110A (en) 2020-06-28 2020-06-28 Information issuing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113761110A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254038A (en) * 2011-08-11 2011-11-23 武汉安问科技发展有限责任公司 System and method for analyzing network comment relevance
CN104077327A (en) * 2013-03-29 2014-10-01 阿里巴巴集团控股有限公司 Core word importance recognition method and equipment and search result sorting method and equipment
CN104281606A (en) * 2013-07-08 2015-01-14 腾讯科技(北京)有限公司 Method and device for displaying microblog comments
US9519871B1 (en) * 2015-12-21 2016-12-13 International Business Machines Corporation Contextual text adaptation
CN108256098A (en) * 2018-01-30 2018-07-06 中国银联股份有限公司 A kind of method and device of determining user comment Sentiment orientation
CN109862062A (en) * 2018-10-24 2019-06-07 平安科技(深圳)有限公司 Content uploading management method and device, electronic equipment and storage medium
CN110674415A (en) * 2019-09-20 2020-01-10 北京浪潮数据技术有限公司 Information display method and device and server
CN110826323A (en) * 2019-10-24 2020-02-21 新华三信息安全技术有限公司 Comment information validity detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
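Shao Xiaoliang (邵晓良), Liu Hong (刘红): "Identification of information topics in Web topic information collection" ("Web主题信息采集中信息主题的识别"), 现代图书情报技术, no. 10, 25 October 2004 (2004-10-25) *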

Similar Documents

Publication Publication Date Title
CN107908740B (en) Information output method and device
US8990149B2 (en) Generating a predictive model from multiple data sources
US8290927B2 (en) Method and apparatus for rating user generated content in search results
US9064212B2 (en) Automatic event categorization for event ticket network systems
US11270375B1 (en) Method and system for aggregating personal financial data to predict consumer financial health
US11775504B2 (en) Computer estimations based on statistical tree structures
US11037238B1 (en) Machine learning tax based credit score prediction
CN113688310B (en) Content recommendation method, device, equipment and storage medium
CN110276009B (en) Association word recommendation method and device, electronic equipment and storage medium
CN111400600A (en) Message pushing method, device, equipment and storage medium
US11238027B2 (en) Dynamic document reliability formulation
CN113177700B (en) Risk assessment method, system, electronic equipment and storage medium
CN110245684B (en) Data processing method, electronic device, and medium
CN112966181A (en) Service recommendation method and device, electronic equipment and storage medium
US9965812B2 (en) Generating a supplemental description of an entity
CN107729944B (en) Identification method and device of popular pictures, server and storage medium
US20150012550A1 (en) Systems and methods of messaging data analysis
US11803796B2 (en) System, method, electronic device, and storage medium for identifying risk event based on social information
US10671932B1 (en) Software application selection models integration
CN112965943A (en) Data processing method and device, electronic equipment and storage medium
US11222143B2 (en) Certified information verification services
CN113761110A (en) Information issuing method, device, equipment and storage medium
US10896461B2 (en) Method and apparatus for data mining based on users' search behavior
CN114925050A (en) Data verification method and device based on knowledge base, electronic equipment and storage medium
CN112947844A (en) Data storage method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination