CN112307210A - Document tag prediction method, system, medium and electronic device - Google Patents


Info

Publication number
CN112307210A
Authority
CN
China
Prior art keywords
document
predicted
keywords
keyword
original
Prior art date
Legal status
Granted
Application number
CN202011232409.8A
Other languages
Chinese (zh)
Other versions
CN112307210B (en)
Inventor
李开兴
邓黎
唐建烊
宗涵
Current Assignee
CISDI Engineering Co Ltd
CISDI Technology Research Center Co Ltd
Original Assignee
CISDI Engineering Co Ltd
CISDI Technology Research Center Co Ltd
Priority date
Filing date
Publication date
Application filed by CISDI Engineering Co Ltd and CISDI Technology Research Center Co Ltd
Priority to CN202011232409.8A
Publication of CN112307210A
Application granted
Publication of CN112307210B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a document label prediction method, system, medium, and electronic device. The method comprises: extracting keywords from original documents to obtain a keyword set of the original documents; classifying the keywords in the keyword set to obtain a classification system of the documents corresponding to the keywords; labeling the classification system of the documents corresponding to the keywords to obtain a training data set; inputting the training data set into a document classification neural network for training to obtain a document label prediction model; and inputting a document to be predicted into the document label prediction model to perform label prediction on it. By processing the original documents to obtain the document label prediction model and then applying the model to the document to be predicted, the method achieves label prediction with a high degree of matching between documents and labels, and it is convenient to implement and highly accurate.

Description

Document tag prediction method, system, medium and electronic device
Technical Field
The present invention relates to the field of electronics, and in particular to a document tag prediction method, system, medium, and electronic device.
Background
Text is a carrier of human civilization and contains a large amount of valuable information, but it is typical unstructured data, which makes tagging text content with corresponding labels difficult. At present, labels are usually added to documents manually, so the labels match the document content poorly, accuracy is low, and working efficiency is low.
Disclosure of Invention
The invention provides a document tag prediction method, system, medium, and electronic device to solve the problems in the prior art that adding tags to documents is inconvenient and that the tags match the documents poorly.
The document tag prediction method provided by the invention comprises the following steps:
extracting keywords according to an original document to obtain a keyword set of the original document;
classifying the keywords in the keyword set to obtain a classification system of the documents corresponding to the keywords;
labeling a classification system of the document corresponding to the keyword to obtain a training data set;
inputting the training data set into a document classification neural network for training to obtain a document label prediction model;
and inputting the document to be predicted into the document label prediction model, and performing label prediction on the document to be predicted.
Optionally, the step of obtaining the training data set includes:
acquiring associated words of the keywords in the keyword set, where the associated words and the keywords in the keyword set have a hypernym/hyponym (superordinate/subordinate) relation;
classifying the original documents according to the keywords in the keyword set and the associated words to obtain a document classification system, and taking the keywords in the keyword set and the associated words as the associated keywords in the document classification system;
and labeling the document classification system to obtain the training data set.
Optionally, the step of extracting the keyword according to the original document includes:
acquiring an original document;
performing word segmentation on the original document to acquire a first original vocabulary set, and acquiring the word frequency of each word in the first original vocabulary set;
determining irrelevant words according to the word frequencies of the words in the first original vocabulary set, thereby acquiring a stop word set;
extracting keywords from the original document to obtain a keyword set of the original document;
and filtering the stop words out of the keyword set according to the stop word set to determine the final keyword set.
Optionally, the step of performing label prediction on the document to be predicted includes:
performing word segmentation on the document to be predicted and removing stop words to obtain a word set to be predicted;
vectorizing the words in the word set to be predicted to obtain word vectors;
vectorizing the document to be predicted according to the word vectors to obtain a document vector to be predicted;
and inputting the document vector to be predicted into the document label prediction model to perform label prediction on the document to be predicted.
Optionally, the step of inputting the document vector to be predicted into the document label prediction model and performing label prediction on the document to be predicted includes:
extracting keywords from the document to be predicted according to the document vector to be predicted, and matching the extracted keywords with the associated keywords of the different categories to obtain a matching result;
classifying and labeling the documents to be predicted according to the matching result to obtain the categories of the documents to be predicted;
and performing label prediction on the document to be predicted according to the associated keywords corresponding to the category of the document to be predicted.
Optionally, the step of performing label prediction on the document to be predicted according to the associated keyword corresponding to the category of the document to be predicted includes:
acquiring the weight of the associated keywords corresponding to the category of the document to be predicted;
acquiring the scores of the associated keywords according to the weights of the associated keywords and the word frequencies of the associated keywords in the original document;
performing label prediction on the document to be predicted according to the scores of the associated keywords;
the weight of an associated keyword corresponding to the category of the document to be predicted is calculated as follows:
Figure BDA0002765642000000021
where w is the weight of the associated keyword, n_word is the number of occurrences of the associated keyword, and n_doc is the number of original documents corresponding to the associated keywords of the same category.
Optionally, the step of performing label prediction on the document to be predicted according to the score of the associated keyword includes:
when the score of an associated keyword is greater than a preset score threshold, label prediction is performed on the document to be predicted, where the score of the associated keyword is calculated as follows:
Figure BDA0002765642000000022
where s is the association score, n is the number of high-frequency words, w_i is the weight corresponding to the i-th high-frequency word, x_i is the word frequency of the i-th high-frequency word, and i is the index of the word.
The invention also provides a document tag prediction system, comprising:
the preprocessing module is used for extracting keywords according to an original document to obtain a keyword set of the original document; classifying the keywords in the keyword set to obtain a classification system of the documents corresponding to the keywords;
the processing module is used for labeling the classification system of the document corresponding to the keyword to obtain a training data set; inputting the training data set into a document classification neural network for training to obtain a document label prediction model;
the prediction module is used for inputting the document to be predicted into the document label prediction model and performing label prediction on the document to be predicted; the preprocessing module, the processing module and the prediction module are connected.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method as defined in any one of the above.
The present invention also provides an electronic terminal, comprising: a processor and a memory;
the memory is adapted to store a computer program and the processor is adapted to execute the computer program stored by the memory to cause the terminal to perform the method as defined in any one of the above.
The beneficial effects of the invention are as follows: the document label prediction method processes the original documents to obtain a document label prediction model and inputs the document to be predicted into this model to perform label prediction on it, so that documents and labels match closely, and the method is convenient to implement and highly accurate.
Drawings
FIG. 1 is a flow chart of a document tag prediction method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a document tag prediction method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a document tag prediction system in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention schematically. They show only the components related to the present invention and are not drawn according to the number, shape, and size of the components in an actual implementation; in an actual implementation, the type, quantity, and proportion of the components may change freely, and the component layout may be more complex.
The inventors found that text is typical unstructured data, which makes it difficult to tag text content with corresponding labels. At present, labels are usually added to documents manually, so the labels match the document content poorly, accuracy is low, and working efficiency is low.
As shown in fig. 1, the document tag prediction method in the present embodiment includes:
S101: extracting keywords from the original documents to obtain a keyword set of the original documents. The original documents are text data such as news, policies, and comments, and their content can be adjusted for different application scenarios. For example, when labels are to be predicted and/or recommended for policy files, related policy files can be used as the original documents, which increases the relevance of the original documents and improves how well the predicted labels match the documents to be predicted;
the step of extracting keywords from the original documents comprises:
acquiring the term frequency (TF) and the inverse document frequency (IDF) of each word in the original documents, calculating the weight of each word from its TF and IDF, and extracting keywords according to the resulting weights, where the weight of a word in the original documents is:
TF-IDF=TF×IDF
where TF-IDF is the weight of the word in the original documents, TF is the term frequency of the word, and IDF is the inverse document frequency of the word;
the term frequency and the inverse document frequency are calculated as follows:
Figure BDA0002765642000000041
Figure BDA0002765642000000042
the TF-IDF value of each word is calculated with the formulas above, and keywords are extracted according to these values; for example, the words with the largest TF-IDF values are taken as the keywords of the document;
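The TF and IDF formulas above survive only as image references, so the following Python sketch assumes the common textbook definitions: TF is a word's count divided by the document length, and IDF is log(N / (1 + number of documents containing the word)). It takes documents that have already been segmented into word lists. The function names and the `top_k` cutoff are illustrative assumptions, not taken from the patent.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: TF-IDF weight} dict per segmented document.

    Assumed definitions (the patent's own formulas are images):
      TF  = occurrences of the word in the document / words in the document
      IDF = log(total documents / (1 + documents containing the word))
    """
    n_docs = len(docs)
    # Document frequency: how many documents each word appears in.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / (1 + df[word]))
            for word, count in counts.items()
        })
    return weights


def extract_keywords(docs, top_k=3):
    """Keep the top_k words with the largest TF-IDF weight in each document.

    Ties are broken alphabetically so the result is deterministic.
    """
    keywords = []
    for doc_weights in tf_idf(docs):
        ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords.append([word for word, _ in ranked[:top_k]])
    return keywords
```

For Chinese text, the word lists would come from a segmenter such as jieba. Note that with this IDF variant, a word appearing in every document gets a slightly negative weight, so it always ranks below document-specific words.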
S102: classifying the keywords in the keyword set to obtain a classification system of the documents corresponding to the keywords, and labeling this classification system to obtain a training data set;
the classification system is established as follows: identical and similar words are merged to form different categories, and the hypernym/hyponym relations among the categories are mined (for example, "automobile" is a hypernym of "gearbox"); the categories of each topic are then collected into a classification tree for that topic, and the trees together form a complete classification system;
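The classification-tree construction just described can be sketched in Python as follows. This is a minimal illustration that assumes the merging of similar words and the mined hypernym pairs are already available as plain inputs; how the patent finds similar words or mines the relations is not specified, and all names here are hypothetical.

```python
from collections import defaultdict


def build_taxonomy(synonyms, hypernym_pairs):
    """Build per-topic classification trees from keywords.

    synonyms:       {word: category}, merging identical/similar words into one category
    hypernym_pairs: [(broader, narrower), ...], e.g. ("automobile", "gearbox")
    Returns (children, roots): children maps each category to its direct
    sub-categories; roots are categories with no broader category, i.e. one
    classification tree per topic.
    """
    def canon(word):
        # Map a surface word to its merged category name.
        return synonyms.get(word, word)

    children = defaultdict(set)
    has_parent = set()
    nodes = set(synonyms.values())
    for broad, narrow in hypernym_pairs:
        b, n = canon(broad), canon(narrow)
        children[b].add(n)
        has_parent.add(n)
        nodes.update((b, n))
    return dict(children), sorted(nodes - has_parent)


def descendants(children, category):
    """All categories below `category`, i.e. its hyponyms at any depth."""
    found, stack = set(), [category]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

With `synonyms = {"car": "automobile", "transmission": "gearbox"}` and pairs `("car", "gearbox")`, `("transmission", "gear")`, the tree rooted at "automobile" contains "gearbox" and, below it, "gear".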
S103: inputting the training data set into a document classification neural network for training to obtain a document label prediction model;
S104: inputting a document to be predicted into the document label prediction model and performing label prediction on it. In this method, keywords are extracted from the high-frequency words of the original documents, and the keywords in the keyword set are classified and labeled to obtain the training data set, so that the training data set covers the data comprehensively and documents are classified accurately. The training data set is input into the document classification neural network for training to obtain the document label prediction model, which uses deep learning to classify input documents and predict their labels. Inputting the document to be predicted into this model therefore realizes label prediction with a high degree of matching between documents and labels, and the method is convenient to implement, highly accurate, and low in cost.
As shown in FIG. 2, a document tag prediction method in some embodiments includes:
S201: acquiring the original documents, performing word segmentation on them to acquire a first original vocabulary set, and acquiring the word frequency of each word in the first original vocabulary set;
S202: determining the keyword set of the original documents according to the word frequencies of the words in the first original vocabulary set. Specifically, irrelevant words are determined according to these word frequencies to obtain a stop word set; keywords are extracted from the original documents to obtain a keyword set; and stop words are filtered out of the keyword set according to the stop word set to determine the final keyword set;
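Steps S201 and S202 can be sketched in Python as below. Two assumptions are labeled here and in the code: whitespace splitting stands in for real word segmentation, and "irrelevant words" are taken to be words whose share of all word occurrences exceeds a hypothetical `max_share` threshold, since the text does not state the exact frequency rule.

```python
from collections import Counter


def segment(text):
    """Stand-in word segmenter: lowercases and splits on whitespace.

    Chinese text would need a real segmenter (for example, jieba).
    """
    return text.lower().split()


def word_frequencies(documents):
    """First original vocabulary set with each word's frequency over all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(segment(doc))
    return counts


def stop_words_by_frequency(counts, max_share=0.1):
    """Assumed rule: a word whose share of all word occurrences exceeds
    max_share is treated as irrelevant and placed in the stop word set."""
    total = sum(counts.values())
    return {word for word, c in counts.items() if c / total > max_share}


def filter_keywords(keywords, stop_words):
    """Screen the keyword set against the stop word set, keeping order."""
    return [k for k in keywords if k not in stop_words]
```

A frequency-share cutoff is only one plausible reading; a fixed stop word list merged with the frequency-derived set would slot into the same `filter_keywords` step.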
S203: classifying the original documents according to the keywords in the keyword set and the associated words that have hypernym/hyponym relations with them, to obtain a document classification system. Specifically, the associated words of the keywords in the keyword set are acquired, where each associated word has a hypernym/hyponym relation with a keyword; the original documents are classified according to the keywords and the associated words to obtain the document classification system; and the keywords and associated words are taken as the associated keywords of the different categories in the document classification system. Because the original documents are classified using both the keywords and their hypernym/hyponym associated words, the original documents are classified more accurately;
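One way to read S203 and S204 is sketched below in Python: each category's associated keywords are its keywords plus their hyponyms, and each original document is labeled with the category whose associated keywords it mentions most. The "most hits wins" rule, the skipping of documents with no hits, and all function names are assumptions for illustration; the patent does not specify the classification rule.

```python
def associated_keywords(category_keywords, hyponyms):
    """Associated keywords of each category: its keywords plus their hyponyms.

    category_keywords: {category: [keyword, ...]}
    hyponyms:          {keyword: [narrower word, ...]}
    """
    return {
        cat: set(kws) | {h for k in kws for h in hyponyms.get(k, ())}
        for cat, kws in category_keywords.items()
    }


def label_documents(doc_tokens, assoc):
    """Label each tokenized original document with the category whose
    associated keywords it mentions most often (assumed rule).

    Ties go to the alphabetically first category; documents that hit no
    associated keyword are skipped. Returns (tokens, category) pairs that
    can serve as a training data set.
    """
    labeled = []
    for tokens in doc_tokens:
        hits = {cat: sum(t in kws for t in tokens) for cat, kws in assoc.items()}
        best = max(sorted(hits), key=hits.get)
        if hits[best] > 0:
            labeled.append((tokens, best))
    return labeled
```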
S204: labeling the document classification system to obtain the training data set;
in some embodiments, the document classification system is also labeled to obtain a test data set, and the test data set is input into the document label prediction model to test it; testing the model on the test data set verifies its accuracy and helps ensure the prediction precision of the document label prediction model;
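The testing step can be illustrated with a short Python sketch that splits labeled samples into training and test sets and measures accuracy on the test set. The shuffling, the 20% default test ratio, and the use of plain accuracy as the metric are assumptions; the text does not say how the test set is drawn or which metric is used.

```python
import random


def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle (input, label) samples and split them into training and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


def accuracy(predict, test_set):
    """Share of test samples whose predicted label equals the annotated label."""
    if not test_set:
        raise ValueError("empty test set")
    correct = sum(predict(x) == y for x, y in test_set)
    return correct / len(test_set)
```

Here `predict` is any callable that maps an input to a label, for example the `predict` method of a trained model.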
S205: building a document classification neural network based on deep learning, and inputting the training data set into the network for training to obtain the document label prediction model;
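The text does not specify the architecture of the document classification neural network. As a deliberately minimal stand-in, the Python sketch below trains a single-layer softmax classifier on document vectors with per-sample gradient descent on cross-entropy loss. A real implementation would use a deeper network and a deep-learning framework. The class name, learning rate, and epoch count are assumptions.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class SoftmaxClassifier:
    """Single-layer softmax classifier over document vectors (a stand-in,
    not the patent's deep document classification network)."""

    def __init__(self, n_features, labels, lr=0.5, seed=0):
        self.labels = list(labels)
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in self.labels]
        self.b = [0.0] * len(self.labels)
        self.lr = lr

    def predict_proba(self, x):
        """Class probabilities for one document vector."""
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(self.W, self.b)]
        return softmax(z)

    def fit(self, data, epochs=100):
        """data: list of (vector, label) pairs from the training data set."""
        index = {lab: k for k, lab in enumerate(self.labels)}
        for _ in range(epochs):
            for x, y in data:
                p = self.predict_proba(x)
                for k in range(len(self.labels)):
                    # Gradient of cross-entropy w.r.t. score k: p_k - y_k.
                    g = p[k] - (1.0 if k == index[y] else 0.0)
                    self.b[k] -= self.lr * g
                    row = self.W[k]
                    for j, v in enumerate(x):
                        row[j] -= self.lr * g * v
        return self

    def predict(self, x):
        """Label with the highest predicted probability."""
        p = self.predict_proba(x)
        return self.labels[max(range(len(p)), key=p.__getitem__)]
```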
S206: inputting the document to be predicted into the document label prediction model and performing label prediction on it, which comprises: performing word segmentation on the document to be predicted and removing stop words to obtain a word set to be predicted; vectorizing the words in this set to obtain word vectors; vectorizing the document to be predicted according to the word vectors to obtain a document vector to be predicted; inputting the document vector into the document label prediction model, which extracts keywords from the document to be predicted according to the document vector and matches the extracted keywords with the associated keywords of the different categories to obtain a matching result; and classifying and labeling the document to be predicted according to the matching result to obtain its category;
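The vectorization and matching in S206 might look like the following Python sketch. It assumes a document vector is the average of the word vectors of its known, non-stop words, and that matching counts how many extracted keywords fall into each category's associated keywords. Both rules, the `word_vectors` lookup table (for example, from a pre-trained embedding), and the function names are hypothetical, since the text does not define them.

```python
def document_vector(tokens, word_vectors, stop_words=frozenset()):
    """Average the vectors of a document's known, non-stop words (assumed rule).

    word_vectors: {word: list[float]}, a hypothetical pre-trained lookup table.
    Returns None when none of the words has a vector.
    """
    vecs = [word_vectors[t] for t in tokens
            if t not in stop_words and t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def match_categories(keywords, assoc):
    """Matching result: per category, how many extracted keywords are
    among that category's associated keywords."""
    found = set(keywords)
    return {cat: len(found & kws) for cat, kws in assoc.items()}
```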
performing label prediction on the document to be predicted according to the associated keywords corresponding to its category, namely: acquiring the weights of the associated keywords corresponding to the category of the document to be predicted; acquiring the scores of the associated keywords according to their weights and their word frequencies in the original documents; and performing label prediction on the document to be predicted according to the scores of the associated keywords;
when the score of an associated keyword is greater than the preset score threshold, label prediction is performed on the document to be predicted, where the weight of an associated keyword corresponding to the category of the document to be predicted is calculated as follows:
Figure BDA0002765642000000061
where w is the weight of the associated keyword, n_word is the number of occurrences of the associated keyword, and n_doc is the number of original documents corresponding to the associated keywords in the same category; in some embodiments, the resulting weights may be further normalized to obtain the normalized weights of the associated keywords within the keyword set of the corresponding category;
the score of the associated keyword is calculated as follows:
Figure BDA0002765642000000062
where s is the association score, n is the number of high-frequency words, w_i is the weight corresponding to the i-th high-frequency word, x_i is the word frequency of the i-th high-frequency word, and i is the index of the word.
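The weight and score formulas above are images, so the Python sketch below does not reproduce the weight formula. It takes per-category weights as given, applies the optional normalization, and then assumes the score has the weighted-sum form s = sum over i of w_i * x_i, which matches the listed symbols but is not confirmed by the text. It also assumes one score is computed per category and that categories scoring above the threshold become the predicted labels. All function names are hypothetical.

```python
def normalize(weights):
    """Scale one category's keyword weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()} if total else dict(weights)


def association_score(weights, frequencies):
    """Assumed form s = sum_i w_i * x_i, where w_i is a high-frequency word's
    weight and x_i its word frequency (the original formula is an image)."""
    return sum(w * frequencies.get(word, 0) for word, w in weights.items())


def predict_labels(category_weights, frequencies, threshold):
    """Categories whose score exceeds the preset threshold, highest first.

    category_weights: {category: {associated keyword: weight}}
    frequencies:      {word: word frequency}
    """
    scores = {cat: association_score(normalize(w), frequencies)
              for cat, w in category_weights.items()}
    passing = [cat for cat, s in scores.items() if s > threshold]
    return sorted(passing, key=lambda cat: -scores[cat])
```

With weights `{"gearbox": 3, "engine": 1}` for an "automotive" category and a frequency of 4 for "gearbox", the normalized weights are 0.75 and 0.25, so the assumed score is 0.75 × 4 = 3.0.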
The document tag prediction method provided by this embodiment can also be applied to scenarios such as document search, personalized recommendation, and knowledge graph construction. For example, keywords can be input into the document label prediction model for classification and matching, and documents with a high matching degree are selected for recommendation. Alternatively, a document to be predicted can be input into the document label prediction model, its keywords extracted, and the feature vectors of those keywords obtained and concatenated into a spliced feature vector of the document, which is then input into the document label prediction model to perform label prediction on the document.
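The concatenation ("splicing") of keyword feature vectors can be sketched in Python as follows. Because a classifier usually expects a fixed input size, this sketch assumes a hypothetical `max_keywords` limit and zero-pads when fewer known keywords are available. Both the limit and the padding are assumptions, not details from the text.

```python
def spliced_vector(keywords, word_vectors, max_keywords=3):
    """Concatenate the vectors of up to max_keywords known keywords into one
    fixed-length feature vector, zero-padding the remaining slots (assumed)."""
    if not word_vectors:
        raise ValueError("word_vectors must not be empty")
    dim = len(next(iter(word_vectors.values())))
    known = [k for k in keywords if k in word_vectors][:max_keywords]
    out = []
    for k in known:
        out.extend(word_vectors[k])
    out.extend([0.0] * (dim * (max_keywords - len(known))))
    return out
```

Keyword order matters in a spliced vector, so the keywords would normally be passed in a stable order, for example sorted by TF-IDF weight.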
In some embodiments, after label prediction is performed on the document to be predicted, a tag recommendation model may also recommend tags for the document and receive users' feedback on the recommended tags, and the document tag prediction model is updated according to that feedback to improve the accuracy of document tag prediction. The tag recommendation model recommends tags according to the label prediction results, prompting and assisting users during tagging; it is obtained by collecting one or more label prediction results and inputting them into a deep learning neural network for training.
As shown in fig. 3, the present embodiment also provides a document tag prediction system, including:
the preprocessing module is used for extracting keywords according to an original document to obtain a keyword set of the original document; classifying the keywords in the keyword set to obtain a classification system of the documents corresponding to the keywords;
the processing module is used for labeling the classification system of the document corresponding to the keyword to obtain a training data set; inputting the training data set into a document classification neural network for training to obtain a document label prediction model;
the prediction module is used for inputting the document to be predicted into the document label prediction model and performing label prediction on it; the preprocessing module, the processing module, and the prediction module are connected to one another. By processing the original documents to obtain the document label prediction model and inputting the document to be predicted into that model, the system realizes label prediction and/or recommendation for the document to be predicted, with a high degree of matching between documents and labels, convenient implementation, and high accuracy.
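The connection of the three modules can be sketched in Python as a small composition of callables: the preprocessing module turns original documents into a classification system, the processing module turns that into a trained model, and the prediction module applies the model to a new document. The class and field names are hypothetical; any of the earlier steps could be plugged in as the callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class DocumentTagPredictionSystem:
    """Hypothetical wiring of the three modules; each module is any callable."""
    preprocess: Callable[[List[str]], Any]      # original documents -> classification system
    process: Callable[[List[str], Any], Any]    # documents + classification system -> model
    predict: Callable[[Any, str], Any]          # model + document to be predicted -> labels
    model: Optional[Any] = None

    def build(self, original_documents: List[str]) -> Any:
        """Run the preprocessing module, then the processing module."""
        system = self.preprocess(original_documents)
        self.model = self.process(original_documents, system)
        return self.model

    def tag(self, document: str) -> Any:
        """Run the prediction module on one document to be predicted."""
        if self.model is None:
            raise RuntimeError("call build() before tag()")
        return self.predict(self.model, document)
```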
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements any of the methods in the present embodiments.
The present embodiment further provides an electronic terminal, including: a processor and a memory;
the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the method in the embodiment.
The computer-readable storage medium in the present embodiment can be understood by those skilled in the art as follows: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The electronic terminal provided by the embodiment comprises a processor, a memory, a transceiver and a communication interface, wherein the memory and the communication interface are connected with the processor and the transceiver and are used for completing mutual communication, the memory is used for storing a computer program, the communication interface is used for carrying out communication, and the processor and the transceiver are used for operating the computer program so that the electronic terminal can execute the steps of the method.
In this embodiment, the memory may include a random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit it. Any person skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes that can be made by those skilled in the art without departing from the spirit and technical ideas disclosed by the present invention are intended to be covered by the claims of the present invention.

Claims (10)

1. A document tag prediction method, comprising:
extracting keywords according to an original document to obtain a keyword set of the original document;
classifying the keywords in the keyword set to obtain a classification system of the documents corresponding to the keywords;
labeling a classification system of the document corresponding to the keyword to obtain a training data set;
inputting the training data set into a document classification neural network for training to obtain a document label prediction model;
and inputting the document to be predicted into the document label prediction model, and performing label prediction on the document to be predicted.
2. The document tag prediction method of claim 1, wherein the step of obtaining a training data set comprises:
acquiring associated words of the keywords in the keyword set, wherein the associated words and the keywords in the keyword set have a hypernym/hyponym relation;
classifying the original documents according to the keywords in the keyword set and the associated words to obtain a document classification system, and taking the keywords in the keyword set and the associated words as the associated keywords in the document classification system;
and labeling the document classification system to obtain the training data set.
3. The method of claim 1, wherein the step of extracting keywords from the original document comprises:
acquiring an original document;
performing word segmentation on the original document to acquire a first original vocabulary set, and acquiring the word frequency of each word in the first original vocabulary set;
determining irrelevant words according to the word frequencies of the words in the first original vocabulary set, thereby acquiring a stop word set;
extracting keywords according to an original document to obtain a keyword set of the original document;
and screening stop words in the keyword set according to the stop word set, and further determining the keyword set.
4. The document tag prediction method according to claim 1, wherein the step of performing tag prediction on the document to be predicted comprises:
performing word segmentation on the document to be predicted and removing stop words so as to obtain a word set to be predicted;
vectorizing the vocabulary to be predicted to obtain the vectorized vocabulary to be predicted;
vectorizing the document to be predicted according to the vectorized vocabulary to be predicted to obtain a document vector to be predicted;
and inputting the document vector to be predicted into the document label prediction model for training, and performing label prediction on the document to be predicted.
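Claim 4 does not fix how words or documents are vectorized. One common choice, assumed here, is to look up pretrained word vectors (for example from a Word2vec model) and average them into a document vector. The three-dimensional `EMBEDDINGS` table, the `STOP_WORDS` set and both function names are toy placeholders.

```python
STOP_WORDS = {"the", "of", "and", "in"}

# Hypothetical word vectors; in practice these would come from a trained
# embedding model such as Word2vec.
EMBEDDINGS = {
    "furnace": [1.0, 0.0, 0.0],
    "tuyere": [0.8, 0.2, 0.0],
    "mill": [0.0, 1.0, 0.0],
}


def words_to_predict(text, stop_words=STOP_WORDS):
    """Segment (whitespace split as a stand-in) and remove stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


def document_vector(words, embeddings=EMBEDDINGS):
    """Average the vectors of the words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


words = words_to_predict("The furnace and the tuyere")
vec = document_vector(words)
```

Averaging keeps the document vector in the same space as the word vectors, which the next step relies on when comparing words against the document.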
5. The document tag prediction method according to claim 4, wherein the step of inputting the document vector to be predicted into the document label prediction model for training and performing label prediction on the document to be predicted comprises:
extracting keywords from the document to be predicted according to the document vector to be predicted, and matching the obtained keywords against the associated keywords of the different categories to obtain a matching result;
classifying and labeling the documents to be predicted according to the matching result to obtain the categories of the documents to be predicted;
and performing label prediction on the document to be predicted according to the associated keywords corresponding to the category of the document to be predicted.
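For claim 5, one plausible reading of "extracting keywords according to the document vector" is to rank the document's words by cosine similarity to that vector; the category is then the one whose associated keywords overlap most with the extracted keywords, and that category's associated keywords become the tag candidates. The sketch follows that reading; the similarity ranking, `CATEGORY_KEYWORDS`, and all function names are assumptions.

```python
import math

# Hypothetical categories and their associated keywords.
CATEGORY_KEYWORDS = {
    "ironmaking": {"furnace", "tuyere", "slag"},
    "rolling": {"mill", "coiling", "roll"},
}


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def keywords_from_vector(words, doc_vec, embeddings, top_k=2):
    """Keep the top_k words whose vectors are closest to the document vector."""
    scored = {w: cosine(embeddings[w], doc_vec) for w in set(words) if w in embeddings}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


def classify(doc_keywords, category_keywords=CATEGORY_KEYWORDS):
    """Return (category, matched keywords); (None, set()) if nothing matches."""
    matches = {c: set(doc_keywords) & kws for c, kws in category_keywords.items()}
    best = max(matches, key=lambda c: len(matches[c]))
    if not matches[best]:
        return None, set()
    return best, matches[best]


def candidate_tags(category, category_keywords=CATEGORY_KEYWORDS):
    return sorted(category_keywords.get(category, ()))


emb = {"furnace": [1.0, 0.0, 0.0], "tuyere": [0.8, 0.2, 0.0], "mill": [0.0, 1.0, 0.0]}
# [0.6, 0.4, 0.0] is the average of the three word vectors above.
kws = keywords_from_vector(["furnace", "tuyere", "mill"], [0.6, 0.4, 0.0], emb)
category, matched = classify(kws)
```

With these toy vectors "tuyere" and "furnace" are closest to the document vector, so the document falls into the "ironmaking" category.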
6. The document tag prediction method according to claim 5, wherein the step of performing label prediction on the document to be predicted according to the associated keywords corresponding to the category of the document to be predicted comprises:
acquiring the weight of the associated keywords corresponding to the category of the document to be predicted;
acquiring the scores of the associated keywords according to the weights of the associated keywords and the word frequencies of the associated keywords in the original document;
performing label prediction on the document to be predicted according to the scores of the associated keywords;
wherein the weight of an associated keyword corresponding to the category of the document to be predicted is obtained by the following expression:
[Equation image FDA0002765641990000021]
where w is the weight of the associated keyword, n_word is the number of occurrences of the associated keyword, and n_doc is the number of original documents corresponding to the associated keywords of the same category.
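The weight formula in claim 6 appears only as an image, so its exact form is not available here. Purely for illustration, the sketch assumes w = n_word / n_doc, i.e. the average number of occurrences of an associated keyword per original document in the category. Treat both the formula and the function name `keyword_weights` as hypothetical.

```python
from collections import Counter


def keyword_weights(category_docs, associated_keywords):
    """Hypothetical weight w = n_word / n_doc for each associated keyword.

    category_docs: tokenized original documents of one category.
    n_word: total occurrences of the keyword across those documents.
    n_doc: number of original documents in that category.
    """
    n_doc = len(category_docs)
    if n_doc == 0:
        return {kw: 0.0 for kw in associated_keywords}
    totals = Counter()
    for tokens in category_docs:
        totals.update(tokens)
    return {kw: totals[kw] / n_doc for kw in associated_keywords}


docs = [["furnace", "furnace", "slag"], ["furnace", "tuyere"]]
weights = keyword_weights(docs, ["furnace", "slag", "tuyere"])
```

"furnace" occurs three times across two documents and so receives weight 1.5; "slag" and "tuyere" each occur once and receive 0.5.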
7. The document tag prediction method according to claim 6, wherein the step of performing tag prediction on the document to be predicted according to the score of the associated keyword comprises:
when the score of an associated keyword is greater than a preset score threshold, performing label prediction on the document to be predicted, wherein the score of the associated keyword is obtained by the following expression:
[Equation image FDA0002765641990000022]
where s is the association score, n is the number of high-frequency words, w_i is the weight corresponding to the i-th high-frequency word, x_i is the word frequency of the i-th high-frequency word, and i is the word index.
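The score formula in claim 7 is likewise only an image. Given the listed symbols, a weighted sum s = Σ w_i · x_i over the high-frequency words is a natural guess, and the sketch below assumes it. It also assumes that, when s exceeds the threshold, the associated keywords occurring in the document are returned as predicted tags, ranked by w_i · x_i. Both choices, and the function names, are illustrative rather than taken from the claim.

```python
from collections import Counter


def association_score(tokens, weights):
    """Assumed form s = sum_i w_i * x_i, where x_i is the frequency of
    associated keyword i in the tokenized document."""
    counts = Counter(tokens)
    return sum(w * counts[kw] for kw, w in weights.items())


def predict_labels(tokens, weights, threshold):
    """Return present associated keywords ranked by w_i * x_i if s > threshold."""
    counts = Counter(tokens)
    if association_score(tokens, weights) <= threshold:
        return []
    present = [kw for kw in weights if counts[kw] > 0]
    return sorted(present, key=lambda kw: weights[kw] * counts[kw], reverse=True)


weights = {"furnace": 1.5, "slag": 0.5, "tuyere": 0.5}
tokens = ["furnace", "tuyere", "furnace", "temperature"]
labels = predict_labels(tokens, weights, threshold=2.0)
```

Here s = 1.5 × 2 + 0.5 × 1 = 3.5, which clears a threshold of 2.0 but not one of 4.0.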
8. A document tag prediction system, comprising:
the preprocessing module is used for extracting keywords according to an original document to obtain a keyword set of the original document; classifying the keywords in the keyword set to obtain a classification system of the documents corresponding to the keywords;
the processing module is used for labeling the classification system of the document corresponding to the keyword to obtain a training data set; inputting the training data set into a document classification neural network for training to obtain a document label prediction model;
the prediction module is used for inputting the document to be predicted into the document label prediction model and performing label prediction on the document to be predicted; the preprocessing module, the processing module and the prediction module are connected.
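The three modules of claim 8 can be wired together as below. This sketch only shows the data flow between the modules. As an explicit substitution, the document classification neural network is replaced by per-category word-count profiles so that the example runs with the standard library alone. Class names, stop words and category keywords are hypothetical.

```python
from collections import Counter


class PreprocessingModule:
    """Segments text, drops stop words and assigns keyword categories."""

    def __init__(self, stop_words, categories):
        self.stop_words = stop_words
        self.categories = categories  # category -> set of associated keywords

    def tokens(self, text):
        return [w for w in text.lower().split() if w not in self.stop_words]

    def categorize(self, tokens):
        counts = Counter(tokens)
        scores = {c: sum(counts[k] for k in kws) for c, kws in self.categories.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None


class ProcessingModule:
    """Labels the original documents and fits a model on them.

    Stand-in model: one word-count profile per category (not a neural network).
    """

    def __init__(self, pre):
        self.pre = pre

    def train(self, texts):
        profiles = {}
        for text in texts:
            tokens = self.pre.tokens(text)
            label = self.pre.categorize(tokens)
            if label is not None:
                profiles.setdefault(label, Counter()).update(tokens)
        return profiles


class PredictionModule:
    """Scores a new document against each category profile."""

    def __init__(self, pre, profiles):
        self.pre = pre
        self.profiles = profiles

    def predict(self, text):
        counts = Counter(self.pre.tokens(text))
        scores = {c: sum(n * p[w] for w, n in counts.items()) for c, p in self.profiles.items()}
        if not scores:
            return None
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None


pre = PreprocessingModule(
    {"the", "in"}, {"ironmaking": {"furnace", "tuyere"}, "rolling": {"mill", "coiling"}}
)
profiles = ProcessingModule(pre).train(["the furnace tuyere slag", "the mill coiling strip"])
predictor = PredictionModule(pre, profiles)
```

Because the preprocessing module is shared, training and prediction apply the same segmentation and stop-word rules, which is one reason to keep the three modules connected rather than independent.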
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is for storing a computer program and the processor is for executing the computer program stored by the memory to cause the terminal to perform the method of any of claims 1 to 7.
CN202011232409.8A 2020-11-06 2020-11-06 Document tag prediction method, system, medium and electronic device Active CN112307210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011232409.8A CN112307210B (en) 2020-11-06 2020-11-06 Document tag prediction method, system, medium and electronic device

Publications (2)

Publication Number Publication Date
CN112307210A CN112307210A (en) 2021-02-02
CN112307210B CN112307210B (en) 2024-07-30

Family

ID=74326508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011232409.8A Active CN112307210B (en) 2020-11-06 2020-11-06 Document tag prediction method, system, medium and electronic device

Country Status (1)

Country Link
CN (1) CN112307210B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204653A (en) * 2021-06-04 2021-08-03 中国银行股份有限公司 Demand value labeling method and device, computer equipment and readable storage medium
CN115861606A (en) * 2022-05-09 2023-03-28 北京中关村科金技术有限公司 Method and device for classifying long-tail distribution documents and storage medium

Citations (9)

Publication number Priority date Publication date Assignee Title
CN103064969A (en) * 2012-12-31 2013-04-24 武汉传神信息技术有限公司 Method for automatically creating keyword index table
WO2017101342A1 (en) * 2015-12-15 2017-06-22 乐视控股(北京)有限公司 Sentiment classification method and apparatus
CN106997340A (en) * 2016-01-25 2017-08-01 阿里巴巴集团控股有限公司 The generation of dictionary and the Document Classification Method and device using dictionary
CN110196910A (en) * 2019-05-30 2019-09-03 珠海天燕科技有限公司 A kind of method and device of corpus classification
CN110275935A (en) * 2019-05-10 2019-09-24 平安科技(深圳)有限公司 Processing method, device and storage medium, the electronic device of policy information
CN110309303A (en) * 2019-05-22 2019-10-08 浙江工业大学 A kind of judicial dispute data visualization analysis method based on Weighted T F-IDF
CN110717042A (en) * 2019-09-24 2020-01-21 北京工商大学 Method for constructing document-keyword heterogeneous network model
CN110837601A (en) * 2019-10-25 2020-02-25 杭州叙简科技股份有限公司 Automatic classification and prediction method for alarm condition
WO2020207431A1 (en) * 2019-04-12 2020-10-15 智慧芽信息科技(苏州)有限公司 Document classification method, apparatus and device, and storage medium

Non-Patent Citations (1)

Title
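宁建飞; 刘降珍: "Research on keyword extraction combining Word2vec and TextRank" (融合Word2vec与TextRank的关键词抽取研究), New Technology of Library and Information Service (现代图书情报技术), no. 06, 25 June 2016 (2016-06-25) *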

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN113204653A (en) * 2021-06-04 2021-08-03 中国银行股份有限公司 Demand value labeling method and device, computer equipment and readable storage medium
CN115861606A (en) * 2022-05-09 2023-03-28 北京中关村科金技术有限公司 Method and device for classifying long-tail distribution documents and storage medium
CN115861606B (en) * 2022-05-09 2023-09-08 北京中关村科金技术有限公司 Classification method, device and storage medium for long-tail distributed documents

Also Published As

Publication number Publication date
CN112307210B (en) 2024-07-30

Similar Documents

Publication Publication Date Title
CN110222160B (en) Intelligent semantic document recommendation method and device and computer readable storage medium
CN108121700B (en) Keyword extraction method and device and electronic equipment
CN110929038B (en) Knowledge graph-based entity linking method, device, equipment and storage medium
CN109471944B (en) Training method and device of text classification model and readable storage medium
CN110781276A (en) Text extraction method, device, equipment and storage medium
CN110737756B (en) Method, apparatus, device and medium for determining answer to user input data
CN110688452B (en) Text semantic similarity evaluation method, system, medium and device
CN110399547B (en) Method, apparatus, device and storage medium for updating model parameters
CN110688405A (en) Expert recommendation method, device, terminal and medium based on artificial intelligence
WO2021190662A1 (en) Medical text sorting method and apparatus, electronic device, and storage medium
CN112667782A (en) Text classification method, device, equipment and storage medium
CN110765765A (en) Contract key clause extraction method and device based on artificial intelligence and storage medium
CN113609847B (en) Information extraction method, device, electronic equipment and storage medium
CN113434636A (en) Semantic-based approximate text search method and device, computer equipment and medium
CN112579729B (en) Training method and device for document quality evaluation model, electronic equipment and medium
CN112307210B (en) Document tag prediction method, system, medium and electronic device
CN112800226A (en) Method for obtaining text classification model, method, device and equipment for text classification
CN112380421A (en) Resume searching method and device, electronic equipment and computer storage medium
CN110413992A (en) A kind of semantic analysis recognition methods, system, medium and equipment
CN111931516A (en) Text emotion analysis method and system based on reinforcement learning
CN114003725A (en) Information annotation model construction method and information annotation generation method
CN108241650B (en) Training method and device for training classification standard
CN111241848B (en) Article reading comprehension answer retrieval method and device based on machine learning
CN114742062B (en) Text keyword extraction processing method and system
CN114647739B (en) Entity chain finger method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant