CN109522275B - Label mining method based on user production content, electronic device and storage medium - Google Patents


Info

Publication number
CN109522275B
Authority
CN
China
Prior art keywords
book
candidate
tag
tags
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811427538.5A
Other languages
Chinese (zh)
Other versions
CN109522275A (en
Inventor
柳燕煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ireader Technology Co Ltd
Original Assignee
Ireader Technology Co Ltd
Application filed by Ireader Technology Co Ltd filed Critical Ireader Technology Co Ltd
Priority to CN201811427538.5A priority Critical patent/CN109522275B/en
Publication of CN109522275A publication Critical patent/CN109522275A/en
Application granted granted Critical
Publication of CN109522275B publication Critical patent/CN109522275B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a tag mining method based on user-produced content, an electronic device, and a storage medium. The method comprises: extracting candidate tags; obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user-produced content of each book; calculating the score of each candidate tag according to the association weights between the candidate tag and each book; and mining book tags according to the scores of the candidate tags. Because the score calculation fully considers how closely a candidate tag is associated with a book's content from the user's perspective, the score measures how well each candidate tag represents book content. The mined book tags therefore represent book content more accurately, and book content is accurately extracted from the user's perspective.

Description

Label mining method based on user production content, electronic device and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a label mining method based on user production content, electronic equipment and a storage medium.
Background
E-books are popular with a large number of users because of advantages such as easy access. When searching for a book, a user generally searches by book category. In the prior art, most book reading platforms classify books according to book tags, which are usually derived from the book content. Book tags are generally obtained through text mining, which requires someone to read the full text of a book and then summarize its tags manually. This approach involves a heavy workload, and the resulting tags are not necessarily representative. Because the tags are derived from the text alone, they cannot show whether they truly represent the book content from the user's perspective, so accuracy is low.
Disclosure of Invention
In view of the above, the present invention has been made to provide a tag mining method, an electronic device, and a storage medium based on user-produced content that overcome or at least partially solve the above-mentioned problems.
According to an aspect of the present invention, there is provided a tag mining method based on user-produced content, the method including:
extracting candidate tags;
obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user-produced content of each book;
calculating the score of each candidate tag according to the association weights between the candidate tag and each book;
and mining book tags according to the scores of the candidate tags.
According to another aspect of the present invention, there is provided an electronic device including a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the following operations:
extracting candidate tags;
obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user-produced content of each book;
calculating the score of each candidate tag according to the association weights between the candidate tag and each book;
and mining book tags according to the scores of the candidate tags.
According to yet another aspect of the present invention, there is provided a storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to execute the following operations:
extracting candidate tags;
obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user-produced content of each book;
calculating the score of each candidate tag according to the association weights between the candidate tag and each book;
and mining book tags according to the scores of the candidate tags.
According to the technical solution provided by the invention, the association weight between each candidate tag and each book can be determined conveniently and quickly from the user-produced content of each book, and the score of each candidate tag is calculated from these association weights. The score calculation fully considers how closely a candidate tag is associated with a book's content from the user's perspective, so the score accurately measures how well each candidate tag represents book content. Compared with the prior art, the invention mines book tags conveniently from user-produced content, effectively reduces the amount of data to be processed, and yields book tags that represent book content more accurately, achieving accurate extraction of book content from the user's perspective.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating a tag mining method based on user-produced content according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a tag mining method based on user-produced content according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
Fig. 1 is a flowchart illustrating a tag mining method based on user-produced content according to a first embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
Step S101, extracting candidate tags.
In step S101, candidate tags are extracted so that book tags suitable for labeling books can be mined from them in later processing. Specifically, candidate tags may be extracted from the user-produced content of each book or from a search term lexicon. User-produced content corresponds to specific books, and the user-produced content of each book comprises original information produced by users for that book, which may specifically be book review information, topics, and the like.
In one embodiment, the candidate tags corresponding to each book may be extracted from the basic words obtained by segmenting the user-produced content of each book. The user-produced content of each book is segmented with a prior-art word segmentation algorithm such as n-gram to obtain the basic words corresponding to each book. Because the basic words may include repeated words and stop words that are unsuitable as candidate tags, such as "yes", the basic words are de-duplicated and the stop words are filtered out. The remaining basic words are extracted as the candidate tags corresponding to each book.
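The segmentation, de-duplication, and stop-word filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `extract_candidates`, the tiny `STOP_WORDS` set, and whitespace tokenization are all hypothetical (Chinese text would need a real word segmenter), and stop words are dropped before the n-grams are formed.

```python
from typing import Iterable, Set

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"the", "is", "a", "of"}


def extract_candidates(comments: Iterable[str], n: int = 2) -> Set[str]:
    """Return de-duplicated word n-grams (sizes 1..n) with stop words removed."""
    candidates: Set[str] = set()  # a set de-duplicates repeated words
    for comment in comments:
        # Assumption: whitespace tokenization stands in for word segmentation.
        tokens = [t for t in comment.lower().split() if t not in STOP_WORDS]
        for size in range(1, n + 1):
            for i in range(len(tokens) - size + 1):
                candidates.add(" ".join(tokens[i:i + size]))
    return candidates


reviews = ["The plot is a slow burn", "slow burn plot"]
candidates = extract_candidates(reviews)
```

Using a set keeps each basic word once no matter how many reviews repeat it, which corresponds to the de-duplication step in the text.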
In another embodiment, the search term is extracted from the search term lexicon, and the candidate label is determined according to the search term. The search word bank is constructed based on search contents input by a user in the process of searching books, and comprises a plurality of search words. In specific application, the search terms can be extracted from the search term word bank according to a preset extraction strategy, and then the candidate labels are determined according to the search terms.
In addition, the user-produced content of each book and the search terms in the search term lexicon accumulate over time. To prevent content or search terms accumulated long ago (for example, several years ago) from influencing candidate tag extraction, the candidate tags can be extracted only from the user-produced content or search terms generated within a preset time range. The preset time range can be set by a person skilled in the art according to actual needs and is not specifically limited herein. For example, the preset time range may be set to 30 days.
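A minimal sketch of restricting extraction to a preset time range is shown below. The record layout (a text paired with its creation time) and the function name `recent_content` are assumptions made for illustration; the 30-day default follows the example in the text.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def recent_content(
    records: List[Tuple[str, datetime]], now: datetime, days: int = 30
) -> List[str]:
    """Keep only texts created within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [text for text, created in records if created >= cutoff]


records = [
    ("old review", datetime(2020, 5, 1)),
    ("recent review", datetime(2024, 1, 15)),
]
fresh = recent_content(records, now=datetime(2024, 1, 31))
```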
And step S102, obtaining the association weight of each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book.
After the candidate tags are extracted, the number of times each candidate tag appears in the user-produced content of each book is counted to obtain the word frequency data of each candidate tag in the user-produced content of each book. The association weight between each candidate tag and each book is then calculated from this word frequency data. User-produced content, such as reviews written by users about a book's content, matches users' intuition, is concisely worded, and is strongly related to the book's content. The association weights obtained from it therefore reflect how closely a candidate tag is associated with a book's content from the user's perspective.
The user-produced content of each book comprises original information produced by users for that book, such as book review information and topics. It may be comments, thoughts, or notes that users add about the book's content; messages about the book itself or its author (for example, in a message list on the book detail page) or replies to such messages; or discussion messages on the topic page of the corresponding book, which can be understood as a discussion group for the corresponding e-book.
Taking book review information as an example of user-produced content: in step S102, for each candidate tag, the number of times the candidate tag appears in the book review information of each book is counted to obtain the word frequency data of that candidate tag in the book review information of each book. Once this word frequency data has been obtained for every candidate tag, the association weight between each candidate tag and each book can be obtained from it using a preset weighting algorithm.
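The counting step can be sketched as below. This is an illustrative assumption rather than the patent's code: `term_frequencies` is a hypothetical name, reviews are held in a plain dictionary keyed by book ID, and occurrences are counted by case-insensitive substring matching.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def term_frequencies(
    candidates: Iterable[str], reviews_by_book: Dict[str, List[str]]
) -> Dict[str, Dict[str, int]]:
    """Map each candidate tag to {book_id: occurrence count} (nonzero only)."""
    tf: Dict[str, Dict[str, int]] = defaultdict(dict)
    for book, reviews in reviews_by_book.items():
        text = " ".join(reviews).lower()
        for tag in candidates:
            # Assumption: non-overlapping substring matches count as occurrences.
            count = text.count(tag.lower())
            if count:
                tf[tag][book] = count
    return dict(tf)


tf = term_frequencies(
    ["suspense", "romance"],
    {"b1": ["Great suspense, pure suspense"], "b2": ["a light romance"]},
)
```

Storing only nonzero counts keeps the structure sparse, which matters when most tags never appear in most books' reviews.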
Step S103, calculating a score of each candidate tag according to the associated weight of each candidate tag and each book.
The association weights between the candidate tags and the books can be substituted into a preset algorithm model to calculate the score of each candidate tag. The score of a candidate tag reflects how well the candidate tag represents book content: the higher the score, the better the candidate tag represents book content and the more suitable it is for use as a book category.
And step S104, mining to obtain book tags according to the scores of the candidate tags.
Specifically, the candidate tags may be sorted by score from high to low, and a preset number of top-ranked candidate tags may be mined as book tags. Because the book tags are mined from the user-produced content of each book, they represent book content accurately from the user's perspective.
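Selecting the top-ranked candidates can be sketched in a few lines. The function name `top_tags` and the sample scores are hypothetical; `k` plays the role of the preset number.

```python
from typing import Dict, List


def top_tags(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate tags with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in ranked[:k]]


book_tags = top_tags({"suspense": 0.9, "romance": 0.4, "history": 0.7}, k=2)
```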
With the tag mining method based on user-produced content provided by this embodiment, the association weight between each candidate tag and each book can be determined conveniently and quickly from the user-produced content of each book, and the score of each candidate tag is calculated from these association weights. The score calculation fully considers how closely each candidate tag is associated with a book's content from the user's perspective, so the score accurately measures how well each candidate tag represents book content. Compared with the text mining approach of the prior art, the method mines book tags conveniently from user-produced content, effectively reduces the amount of data to be processed, and yields book tags that represent book content more accurately, achieving accurate extraction of book content from the user's perspective.
Example two
Fig. 2 is a flowchart illustrating a tag mining method based on user-produced content according to a second embodiment of the present invention, and as shown in fig. 2, the method includes the following steps:
Step S201, extracting the n search terms with the highest search frequency from the search term lexicon, and determining the n search terms as candidate tags.
The search term lexicon is built from the search content that users enter when searching for books. It contains a plurality of search terms, which are highly representative and reflect what users are interested in. The search terms may include not only basic words, such as "suspense" and "cross", but also compound words formed by combining several basic words, such as "palace cross" and "traditional swordsmen". Specifically, when a user searches for books, the search content entered by the user is segmented to obtain search terms, and the resulting search terms are added to the search term lexicon. This segmentation approach makes it possible to obtain search terms in compound form from the search content conveniently and effectively, and prevents a compound search term from being split into several basic words.
To avoid duplicate search terms in the lexicon, a search term can be checked against the terms already in the lexicon before it is added. If it duplicates an existing term, it is discarded and not added; otherwise, it is added to the search term lexicon.
Specifically, for each search term in the search term lexicon, the number of times the term has been used is counted to obtain its search frequency. The search terms are then sorted from highest to lowest search frequency, the n top-ranked search terms are extracted from the lexicon, and these n search terms are determined as the candidate tags. A person skilled in the art can set n according to actual needs, which is not limited here. For example, when n is set to 10,000, the 10,000 search terms with the highest search frequency are extracted from the search term lexicon as candidate tags.
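The frequency-based selection above might look like the following sketch. It assumes, purely for illustration, that search term usage is available as a flat log of already-segmented terms; `top_search_terms` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List


def top_search_terms(search_log: Iterable[str], n: int) -> List[str]:
    """Return the n most frequently used search terms, most frequent first."""
    frequency = Counter(search_log)  # search frequency of each term
    return [term for term, _ in frequency.most_common(n)]


log = ["suspense", "palace", "suspense", "wuxia", "palace", "suspense"]
candidate_tags = top_search_terms(log, n=2)
```

`Counter` keeps one entry per distinct term, so it also covers the duplicate check described for the lexicon.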
After the candidate tags are extracted, the association weight between each candidate tag and each book can be obtained according to the word frequency data of each candidate tag in the user-produced content of each book. In a specific embodiment, the association weight between each candidate tag and each book may be calculated with the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm; the specific calculation is implemented by steps S202 and S203.
Step S202, aiming at each candidate label, calculating reverse frequency data of the candidate label by using the total number of books and the number of books corresponding to the user production content containing the candidate label.
Based on the TF-IDF algorithm, the association weight of the candidate tag and the book is not only increased in proportion to the word frequency data of the candidate tag in the user production content of the book, but also decreased in inverse proportion to the occurrence frequency of the candidate tag in the user production content of all books. If a candidate tag appears in the user-produced content of one book many times (i.e. the word frequency data is high) and appears in the user-produced content of other books very rarely, the candidate tag has good category distinguishing capability and is suitable for book classification.
In order to accurately calculate the association weight between each candidate tag and each book, it is necessary to determine word frequency data of each candidate tag in the user-generated content of each book and reverse frequency data of each candidate tag. For each candidate tag, the frequency of occurrence of the candidate tag in the user production content of each book can be counted to obtain the word frequency data of the candidate tag in the user production content of each book, so that the word frequency data of each candidate tag in the user production content of each book is obtained. And calculating reverse frequency data of each candidate tag by utilizing the total number of books and the number of books corresponding to the user production content containing the candidate tag.
Taking book review information as an example of user-produced content: for each candidate tag, the number of times the candidate tag appears in the book review information of each book stored on the book reading platform is counted, giving the word frequency data of each candidate tag in the book review information of each book. In addition, for each candidate tag, the reverse frequency data is calculated from the total number of books and the number of books whose book review information contains the candidate tag. Specifically, the total number of books is divided by the number of books whose book review information contains the candidate tag to obtain an intermediate result; the base-10 logarithm of the intermediate result is then taken, and the resulting value is used as the reverse frequency data of the candidate tag.
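The reverse frequency calculation described above is a direct formula and can be sketched as follows. The function name `inverse_frequency` is hypothetical, and the guard against a zero book count is an added assumption, since the text does not say how a tag that appears in no book is handled.

```python
import math


def inverse_frequency(total_books: int, books_with_tag: int) -> float:
    """Base-10 log of (total books / books whose reviews contain the tag)."""
    if books_with_tag <= 0:
        # Assumption: a tag found in no book has no defined reverse frequency.
        raise ValueError("books_with_tag must be positive")
    return math.log10(total_books / books_with_tag)


idf_rare = inverse_frequency(1000, 10)      # tag appears in few books
idf_common = inverse_frequency(1000, 1000)  # tag appears in every book
```

A tag that appears in every book gets a reverse frequency of zero, so it contributes nothing to any association weight.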
Step S203, obtaining an association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book and the reverse frequency data of each candidate tag.
Specifically, for any one candidate tag and any one book, the word frequency data of the candidate tag in the user production content of the book and the reverse frequency data of the candidate tag are multiplied, and the value obtained by the multiplication is determined as the association weight of the candidate tag and the book.
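Steps S202 and S203 together can be sketched as below: word frequency multiplied by reverse frequency gives the association weight. The nested-dictionary input (tag mapped to per-book counts, nonzero only) and the name `association_weights` are assumptions for illustration.

```python
import math
from typing import Dict, Tuple


def association_weights(
    tf: Dict[str, Dict[str, int]], total_books: int
) -> Dict[Tuple[str, str], float]:
    """Return {(tag, book): word frequency * reverse frequency}."""
    weights: Dict[Tuple[str, str], float] = {}
    for tag, per_book in tf.items():
        if not per_book:
            continue  # tag appears in no book; nothing to weight
        # Number of books containing the tag = number of nonzero entries.
        idf = math.log10(total_books / len(per_book))
        for book, count in per_book.items():
            weights[(tag, book)] = count * idf
    return weights


tf = {"suspense": {"b1": 2}, "good": {"b1": 1, "b2": 3}}
weights = association_weights(tf, total_books=2)
```

In this toy input, "good" appears in both books, so its weights are zero even though its raw counts are higher than those of "suspense".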
And step S204, calculating the score of each candidate label according to the associated weight of each candidate label and each book.
The step of calculating the score for each candidate tag is accomplished by loop iteration. Firstly, setting a scoring initial value for each book and each candidate tag, for example, setting the scoring initial value for each book and the scoring initial value for each candidate tag to be 1; and then, circularly and iteratively calculating the scores of the books and the scores of the candidate labels, so that the scores of the books and the scores of the candidate labels are continuously updated until an iteration ending condition is met, and stable scores of the books and the scores of the candidate labels are obtained.
The following steps are iterated until the iteration end condition is met: calculating the score of each book according to the score of each candidate tag and the association weight between each candidate tag and the book; and calculating the score of each candidate tag according to the score of each book and the association weight between the candidate tag and each book. The calculated score of each book reflects how active users are in the user-produced content of that book, and the score of each candidate tag reflects how well the candidate tag represents book content. The higher a book's score, the more active users are in that book's user-produced content; the higher a candidate tag's score, the better the candidate tag represents book content and the more suitable it is for use as a book category.
Specifically, for each book, the score of each candidate tag and the associated weight of each candidate tag and the book are weighted, the numeric value obtained by the weighting operation is normalized, and the result of the normalization processing is taken as the score of the book. And for each candidate tag, carrying out weighted operation on the score of each book and the associated weight of the candidate tag and each book, carrying out normalization processing on the numerical value obtained by the weighted operation, and taking the result of the normalization processing as the score of the candidate tag.
In this embodiment, normalizing the score of a candidate tag specifically comprises dividing the score of the candidate tag by the total number of candidate tags to obtain the normalized result. Normalizing the score of a book specifically comprises dividing the score of the book by the total number of books to obtain the normalized result.
Wherein, a person skilled in the art can set the iteration ending condition according to actual needs, and is not limited herein. For example, the iteration end condition may include: the iteration times reach the preset iteration times; and/or the difference between the scores of the books obtained by two adjacent iterative computations is smaller than a first preset difference, and the difference between the scores of the candidate labels is smaller than a second preset difference. In a specific embodiment, whether the iteration end condition is met or not may be determined by determining whether the iteration number reaches a preset iteration number, or a difference between the score of each book obtained by the current iteration calculation and the score of each book obtained by the previous iteration calculation may be calculated, a difference between the score of each candidate tag obtained by the current iteration calculation and the score of each candidate tag obtained by the previous iteration calculation may be calculated, and whether the iteration end condition is met or not may be determined by determining whether the two differences are respectively smaller than a first preset difference and a second preset difference.
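The iterative scoring of step S204 can be sketched as follows. Several details are assumptions rather than statements from the patent: the "weighted operation" is read as a weighted sum, the tag update uses the book scores computed in the same iteration, the difference between iterations is measured as the largest absolute change, and the name `iterate_scores` and its default thresholds are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]  # (tag, book) -> association weight


def iterate_scores(
    weights: Weights,
    books: List[str],
    tags: List[str],
    max_iters: int = 50,
    eps_book: float = 1e-6,
    eps_tag: float = 1e-6,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Alternately update book and tag scores until an end condition is met."""
    book_score = {b: 1.0 for b in books}  # initial score of 1 for every book
    tag_score = {t: 1.0 for t in tags}    # initial score of 1 for every tag
    for _ in range(max_iters):  # end condition 1: preset iteration count
        # Book score: weighted sum over tags, divided by the number of books.
        new_book = {
            b: sum(tag_score[t] * weights.get((t, b), 0.0) for t in tags) / len(books)
            for b in books
        }
        # Tag score: weighted sum over books, divided by the number of tags.
        new_tag = {
            t: sum(new_book[b] * weights.get((t, b), 0.0) for b in books) / len(tags)
            for t in tags
        }
        book_delta = max(abs(new_book[b] - book_score[b]) for b in books)
        tag_delta = max(abs(new_tag[t] - tag_score[t]) for t in tags)
        book_score, tag_score = new_book, new_tag
        # End condition 2: both changes fall below their preset differences.
        if book_delta < eps_book and tag_delta < eps_tag:
            break
    return book_score, tag_score


weights = {("suspense", "b1"): 0.6, ("suspense", "b2"): 0.3, ("romance", "b2"): 0.1}
book_scores, tag_scores = iterate_scores(weights, ["b1", "b2"], ["suspense", "romance"])
```

Because this normalization divides by a fixed count, the absolute scores may shrink or grow from one iteration to the next; the relative ordering of the tag scores is what the later top-ranked selection relies on.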
And step S205, mining to obtain book tags according to the scores of the candidate tags.
The scores of the candidate tags can be sorted from high to low, and a preset number of candidate tags with the scores arranged in the front are mined to serve as book tags. When the obtained book tags are used for book recommendation, books related to the book tags can be recommended to the user, and the user is likely to read or download the recommended books, so that the adoption rate of the recommended books is effectively improved, and the recommendation effect is greatly improved.
After a book tag has been mined from the candidate tags, the books associated with that book tag can be determined according to the association weights obtained in step S203: the higher the association weight between the book tag and a book, the more closely the book tag is associated with that book's content. For example, when recommending books using a book tag, the books may be arranged in descending order of their association weight with the book tag, and books with higher association weights are recommended to the user first.
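Ranking books for a given book tag by association weight can be sketched as below. The name `books_for_tag`, the `limit` parameter, and the choice to skip zero-weight books are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def books_for_tag(
    tag: str, weights: Dict[Tuple[str, str], float], limit: int = 10
) -> List[str]:
    """Return up to `limit` books, highest association weight with `tag` first."""
    ranked = sorted(
        ((w, book) for (t, book), w in weights.items() if t == tag and w > 0),
        reverse=True,
    )
    return [book for _, book in ranked[:limit]]


weights = {("suspense", "b1"): 0.6, ("suspense", "b2"): 0.3, ("romance", "b2"): 0.1}
recommended = books_for_tag("suspense", weights)
```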
Step S206, displaying the book label in the page display area.
After the book tags are mined, they can be displayed as recommended tags in a page display area of the book reading platform. When a user performs a triggering operation such as clicking a book tag, the page jumps to the book recommendation page corresponding to that book tag. The book recommendation page may include information such as the books associated with the book tag; a person skilled in the art may configure the book recommendation page according to actual needs, which is not limited here.
By using the tag mining method based on the user production content provided by the embodiment, the association weight between each candidate tag and each book can be quickly obtained according to the word frequency data of each candidate tag in the user production content of each book and the reverse frequency data of each candidate tag, and the association weight between each candidate tag and each book can sufficiently reflect the association degree between the candidate tag and the book in the book content from the user perspective; the scoring of each candidate label is accurately calculated according to the associated weight of each candidate label and each book; according to the scheme, the book tags are conveniently mined according to the scores of the candidate tags, and the mined book tags can accurately represent book contents from the user perspective; in addition, when book recommendation is performed by using the book tags obtained by the scheme, books related to the book tags are recommended to the user, so that the adoption rate of the recommended books is effectively improved, and the recommendation effect is greatly improved.
Example three
The third embodiment of the present invention provides a nonvolatile storage medium, where the storage medium stores at least one executable instruction, and the executable instruction may execute the tag mining method based on the user production content in any of the above method embodiments.
The executable instructions may be specifically configured to cause the processor to: extracting candidate tags; obtaining the association weight of each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book; calculating to obtain the score of each candidate label according to the associated weight of each candidate label and each book; and mining to obtain book tags according to the scores of the candidate tags.
In an alternative embodiment, the executable instructions further cause the processor to: extract the candidate tags corresponding to each book from the basic words obtained by segmenting the user-produced content of each book.
In an alternative embodiment, the executable instructions further cause the processor to: extract search terms from the search term lexicon, and determine candidate tags according to the search terms.
In an alternative embodiment, the executable instructions further cause the processor to: extract the n search terms with the highest search frequency from the search term lexicon, and determine the n search terms as candidate tags.
In an alternative embodiment, the user-produced content of each book comprises: original information produced by users for each book.
In an alternative embodiment, the executable instructions further cause the processor to: for each candidate tag, calculating to obtain reverse frequency data of the candidate tag by using the total number of books and the number of books corresponding to the user production content containing the candidate tag; and obtaining the association weight of each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book and the reverse frequency data of each candidate tag.
In an alternative embodiment, the executable instructions further cause the processor to: iteratively perform the following steps until an iteration end condition is met: calculate the score of each book according to the score of each candidate tag and the association weight between each candidate tag and the book; and calculate the score of each candidate tag according to the score of each book and the association weight between the candidate tag and each book.
In an alternative embodiment, the iteration end condition includes: the number of iterations reaches a preset number of iterations; and/or the difference between the book scores obtained in two adjacent iterations is smaller than a first preset difference, and the difference between the candidate tag scores obtained in two adjacent iterations is smaller than a second preset difference.
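The alternating update of book scores and candidate tag scores, together with both end conditions, can be sketched as follows. Several details are assumptions rather than part of the text: scores are combined as weighted sums, all scores start equal, and each round is normalized to sum to 1 so that the values stay bounded. The function name `iterate_scores`, its parameter names, and the sample weights are hypothetical.

```python
def iterate_scores(weights, max_iters=50, book_eps=1e-6, tag_eps=1e-6):
    """Alternately update book and tag scores until an end condition holds.

    `weights` maps (tag, book) to an association weight. Assumed update
    rule: a book's score is the weighted sum of its tags' scores, and a
    tag's score is the weighted sum of its books' scores.
    """
    if not weights:
        return {}, {}
    tags = sorted({tag for tag, _ in weights})
    books = sorted({book for _, book in weights})

    def normalize(scores):
        total = sum(scores.values())
        return {k: v / total for k, v in scores.items()} if total > 0 else scores

    tag_score = normalize({tag: 1.0 for tag in tags})
    book_score = normalize({book: 1.0 for book in books})

    for _ in range(max_iters):  # end condition 1: preset iteration count
        new_book = {book: 0.0 for book in books}
        for (tag, book), w in weights.items():
            new_book[book] += tag_score[tag] * w
        new_book = normalize(new_book)

        new_tag = {tag: 0.0 for tag in tags}
        for (tag, book), w in weights.items():
            new_tag[tag] += new_book[book] * w
        new_tag = normalize(new_tag)

        book_delta = max(abs(new_book[b] - book_score[b]) for b in books)
        tag_delta = max(abs(new_tag[t] - tag_score[t]) for t in tags)
        book_score, tag_score = new_book, new_tag
        # End condition 2: both changes fall below their preset differences.
        if book_delta < book_eps and tag_delta < tag_eps:
            break
    return tag_score, book_score


# Hypothetical association weights used only for illustration.
example_weights = {
    ("wizard", "book_a"): 0.6,
    ("magic", "book_a"): 0.3,
    ("magic", "book_b"): 0.2,
    ("detective", "book_b"): 0.1,
}
tag_score, book_score = iterate_scores(example_weights)
ranked = sorted(tag_score, key=tag_score.get, reverse=True)
# ranked is ['wizard', 'magic', 'detective'] for this toy input
```

Book tags could then be mined by keeping the highest-scoring entries of `ranked`; how many to keep, or what score threshold to apply, is left open here.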
In an alternative embodiment, the executable instructions further cause the processor to: present the book tags in a page display area.
Example four
Fig. 3 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in Fig. 3, the electronic device may include a processor 302, a communication interface 304, a memory 306, and a communication bus 308, wherein:
The processor 302, the communication interface 304, and the memory 306 communicate with each other via the communication bus 308.
The communication interface 304 is configured to communicate with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute the program 310, and may specifically execute relevant steps in the embodiment of the tag mining method based on the user-produced content.
In particular, program 310 may include program code comprising computer operating instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is configured to store the program 310. The memory 306 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program 310 may specifically be configured to cause the processor 302 to perform the following operations: extracting candidate tags; obtaining an association weight between each candidate tag and each book according to word frequency data of each candidate tag in the user production content of each book; calculating a score for each candidate tag according to the association weights between that candidate tag and the books; and mining book tags according to the scores of the candidate tags.
In an alternative embodiment, program 310 further causes processor 302 to: extract, from the basic words obtained by segmenting the user production content of each book, the candidate tags corresponding to that book.
In an alternative embodiment, program 310 further causes processor 302 to: extract search words from a search word bank, and determine candidate tags according to the search words.
In an alternative embodiment, program 310 further causes processor 302 to: extract the n most frequently searched search words from the search word bank, and determine those n search words as candidate tags.
In an alternative embodiment, the user production content of each book comprises: original information produced by users for that book.
In an alternative embodiment, program 310 further causes processor 302 to: for each candidate tag, calculate reverse frequency data of the candidate tag from the total number of books and the number of books whose user production content contains the candidate tag; and obtain the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book and the reverse frequency data of each candidate tag.
In an alternative embodiment, program 310 further causes processor 302 to: iteratively perform the following steps until an iteration end condition is met: calculate the score of each book according to the score of each candidate tag and the association weight between each candidate tag and the book; and calculate the score of each candidate tag according to the score of each book and the association weight between the candidate tag and each book.
In an alternative embodiment, the iteration end condition includes: the number of iterations reaches a preset number of iterations; and/or the difference between the book scores obtained in two adjacent iterations is smaller than a first preset difference, and the difference between the candidate tag scores obtained in two adjacent iterations is smaller than a second preset difference.
In an alternative embodiment, program 310 further causes processor 302 to: present the book tags in a page display area.
For specific implementation of each step in the program 310, reference may be made to the description corresponding to the corresponding step in the above tag mining embodiment based on the user production content, which is not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the above-described device may refer to the corresponding process description in the foregoing method embodiment, and is not described herein again.
By the scheme provided by the embodiment, the degree of association between the candidate tags and the book on the book content in the user perspective is fully considered in the calculation of the scores of the candidate tags, and the representing degree of the candidate tags on the book content can be accurately measured by the scores of the candidate tags; the mining of the book tags is conveniently realized based on the user production content, the mined book tags can represent the book content more accurately, and the accurate extraction of the book content from the user perspective is realized.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Claims (24)

1. A tag mining method based on user production content, comprising:
extracting candidate tags;
obtaining an association weight between each candidate tag and each book according to word frequency data of each candidate tag in the user production content of each book;
calculating a score for each candidate tag according to the association weights between that candidate tag and the books; and
mining book tags according to the scores of the candidate tags;
wherein calculating the score for each candidate tag according to the association weights between that candidate tag and the books further comprises:
iteratively performing the following steps until an iteration end condition is met:
calculating a score for each book according to the score of each candidate tag and the association weight between each candidate tag and the book; and
calculating the score for each candidate tag according to the score of each book and the association weight between the candidate tag and each book.
2. The method of claim 1, wherein extracting candidate tags further comprises:
extracting, from basic words obtained by segmenting the user production content of each book, candidate tags corresponding to that book.
3. The method of claim 1, wherein extracting candidate tags further comprises:
extracting search words from a search word bank, and determining candidate tags according to the search words.
4. The method of claim 3, wherein extracting search words from the search word bank and determining candidate tags according to the search words further comprises:
extracting the n most frequently searched search words from the search word bank, and determining the n search words as candidate tags.
5. The method of claim 1, wherein the user production content of each book comprises: original information produced by users for that book.
6. The method of claim 1, wherein obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book further comprises:
for each candidate tag, calculating reverse frequency data of the candidate tag from the total number of books and the number of books whose user production content contains the candidate tag; and
obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book and the reverse frequency data of each candidate tag.
7. The method of claim 1, wherein the iteration end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the book scores obtained in two adjacent iterations is smaller than a first preset difference, and the difference between the candidate tag scores obtained in two adjacent iterations is smaller than a second preset difference.
8. The method of any of claims 1-7, wherein after mining the book tags according to the scores of the candidate tags, the method further comprises:
displaying the book tags in a page display area.
9. An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the following operations:
extracting candidate tags;
obtaining an association weight between each candidate tag and each book according to word frequency data of each candidate tag in the user production content of each book;
calculating a score for each candidate tag according to the association weights between that candidate tag and the books; and
mining book tags according to the scores of the candidate tags;
wherein the executable instruction further causes the processor to perform the following operations:
iteratively performing the following steps until an iteration end condition is met:
calculating a score for each book according to the score of each candidate tag and the association weight between each candidate tag and the book; and
calculating the score for each candidate tag according to the score of each book and the association weight between the candidate tag and each book.
10. The electronic device of claim 9, wherein the executable instruction further causes the processor to perform the following operation:
extracting, from basic words obtained by segmenting the user production content of each book, candidate tags corresponding to that book.
11. The electronic device of claim 9, wherein the executable instruction further causes the processor to perform the following operation:
extracting search words from a search word bank, and determining candidate tags according to the search words.
12. The electronic device of claim 11, wherein the executable instruction further causes the processor to perform the following operation:
extracting the n most frequently searched search words from the search word bank, and determining the n search words as candidate tags.
13. The electronic device of claim 9, wherein the user production content of each book comprises: original information produced by users for that book.
14. The electronic device of claim 9, wherein the executable instruction further causes the processor to perform the following operations:
for each candidate tag, calculating reverse frequency data of the candidate tag from the total number of books and the number of books whose user production content contains the candidate tag; and
obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book and the reverse frequency data of each candidate tag.
15. The electronic device of claim 9, wherein the iteration end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the book scores obtained in two adjacent iterations is smaller than a first preset difference, and the difference between the candidate tag scores obtained in two adjacent iterations is smaller than a second preset difference.
16. The electronic device of any of claims 9-15, wherein the executable instruction further causes the processor to perform the following operation:
displaying the book tags in a page display area.
17. A storage medium having stored therein at least one executable instruction that causes a processor to perform the following operations:
extracting candidate tags;
obtaining an association weight between each candidate tag and each book according to word frequency data of each candidate tag in the user production content of each book;
calculating a score for each candidate tag according to the association weights between that candidate tag and the books; and
mining book tags according to the scores of the candidate tags;
wherein the executable instruction further causes the processor to perform the following operations:
iteratively performing the following steps until an iteration end condition is met:
calculating a score for each book according to the score of each candidate tag and the association weight between each candidate tag and the book; and
calculating the score for each candidate tag according to the score of each book and the association weight between the candidate tag and each book.
18. The storage medium of claim 17, wherein the executable instruction further causes the processor to perform the following operation:
extracting, from basic words obtained by segmenting the user production content of each book, candidate tags corresponding to that book.
19. The storage medium of claim 17, wherein the executable instruction further causes the processor to perform the following operation:
extracting search words from a search word bank, and determining candidate tags according to the search words.
20. The storage medium of claim 19, wherein the executable instruction further causes the processor to perform the following operation:
extracting the n most frequently searched search words from the search word bank, and determining the n search words as candidate tags.
21. The storage medium of claim 17, wherein the user production content of each book comprises: original information produced by users for that book.
22. The storage medium of claim 17, wherein the executable instruction further causes the processor to perform the following operations:
for each candidate tag, calculating reverse frequency data of the candidate tag from the total number of books and the number of books whose user production content contains the candidate tag; and
obtaining the association weight between each candidate tag and each book according to the word frequency data of each candidate tag in the user production content of each book and the reverse frequency data of each candidate tag.
23. The storage medium of claim 17, wherein the iteration end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the book scores obtained in two adjacent iterations is smaller than a first preset difference, and the difference between the candidate tag scores obtained in two adjacent iterations is smaller than a second preset difference.
24. The storage medium of any one of claims 17-23, wherein the executable instruction further causes the processor to perform the following operation:
displaying the book tags in a page display area.
CN201811427538.5A 2018-11-27 2018-11-27 Label mining method based on user production content, electronic device and storage medium Active CN109522275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811427538.5A CN109522275B (en) 2018-11-27 2018-11-27 Label mining method based on user production content, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811427538.5A CN109522275B (en) 2018-11-27 2018-11-27 Label mining method based on user production content, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN109522275A CN109522275A (en) 2019-03-26
CN109522275B true CN109522275B (en) 2020-11-20

Family

ID=65794472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811427538.5A Active CN109522275B (en) 2018-11-27 2018-11-27 Label mining method based on user production content, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN109522275B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334202A (en) * 2019-03-28 2019-10-15 平安科技(深圳)有限公司 User interest label construction method and relevant device based on news application software
CN109976622B (en) * 2019-04-04 2021-02-02 掌阅科技股份有限公司 Book tag determination method, electronic device and computer storage medium
CN110990701B (en) * 2019-12-03 2022-11-15 掌阅科技股份有限公司 Book searching method, computing device and computer storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886067A (en) * 2014-03-20 2014-06-25 浙江大学 Method for recommending books through label implied topic

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9020950B2 (en) * 2011-12-19 2015-04-28 Palo Alto Research Center Incorporated System and method for generating, updating, and using meaningful tags
CN105975453A (en) * 2015-12-01 2016-09-28 乐视网信息技术(北京)股份有限公司 Method and device for comment label extraction
CN105893478B (en) * 2016-03-29 2019-10-29 广州华多网络科技有限公司 A kind of tag extraction method and apparatus
US11093557B2 (en) * 2016-08-29 2021-08-17 Zoominfo Apollo Llc Keyword and business tag extraction
CN106649818B (en) * 2016-12-29 2020-05-15 北京奇虎科技有限公司 Application search intention identification method and device, application search method and server
CN108182174B (en) * 2017-12-27 2019-03-26 掌阅科技股份有限公司 New words extraction method, electronic equipment and computer storage medium

Also Published As

Publication number Publication date
CN109522275A (en) 2019-03-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant