CN110781297B - Classification method of multi-label scientific research papers based on hierarchical discriminant trees

Info

Publication number: CN110781297B
Authority: CN (China)
Prior art keywords: label, words, papers, discriminant, labels
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201910881086.6A
Other languages: Chinese (zh)
Other versions: CN110781297A (en)
Inventors: 刘玮, 吴俊杰, 李超, 左源, 纪玉春, 袁石
Current Assignee: National Computer Network and Information Security Management Center (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: National Computer Network and Information Security Management Center
Priority date: 2019-09-18 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2019-09-18
Publication date: 2022-06-21
Application filed by National Computer Network and Information Security Management Center
Priority to CN201910881086.6A
Publication of CN110781297A
Application granted
Publication of CN110781297B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for classifying multi-label scientific research papers based on a hierarchical discriminant tree, comprising the following steps: step one, acquiring papers with known labels together with their labels, extracting a feature word set for each label, and constructing a binary discriminant model; step two, replacing each label in the label hierarchy with its binary discriminant model to obtain a hierarchical discriminant tree model; step three, obtaining the text representation of a paper with unknown labels, inputting the text representation into the binary discriminant models of all root nodes of the hierarchical discriminant tree model, calculating the probability that the paper has the label corresponding to each node, and outputting the label corresponding to a root node if the probability is greater than a threshold; then inputting the text representation into the binary discriminant models of all child nodes of each node whose label was output, calculating the probability that the paper has the label represented by each child node, outputting the label corresponding to a child node if the probability is greater than the threshold, and continuing level by level until the leaf nodes are reached; all output labels are the labels of the paper. The method fully mines the feature words of the papers and classifies papers hierarchically, quickly and accurately.

Description

Classification method of multi-label scientific research papers based on hierarchical discriminant trees
Technical Field
The invention relates to the field of scientific research paper classification. More specifically, the invention relates to a classification method of multi-label scientific research papers based on a hierarchical discriminant tree.
Background
Publishing institutions, researchers and related bodies have long been concerned with how scientific research papers are organized and managed, and classifying papers is an important basic task in this field. The task is to assign hierarchical labels to papers according to an existing category label system, which matters greatly for retrieving, organizing and summarizing papers quickly. On the one hand, paper classification helps a publishing institution quickly determine the category of a newly published paper and add it to a citation database, so as to provide a high-quality paper data service. On the other hand, it supports research institutions and researchers in retrieving and summarizing papers quickly according to the existing classification system, improving their efficiency. However, existing category label systems have a complex multi-level structure, which makes classifying papers difficult: for each new paper, a reasonable and complete set of classification labels must be assigned within the multi-level label system, which is laborious and difficult.
Disclosure of Invention
An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.
The invention also aims to provide a classification method of multi-label scientific research papers based on the hierarchical discriminant tree, which can fully mine the characteristic words of the papers and quickly and accurately classify the papers in a hierarchical manner.
To achieve these objects and other advantages in accordance with the purpose of the invention, there is provided a classification method of multi-labeled scientific papers based on hierarchical discriminant trees, comprising:
step one, constructing a binary discriminant model:
acquiring all papers with known labels together with their labels in a multi-level label system, obtaining the text representation of every paper by text word segmentation, screening the text representations to obtain a feature word set for each label, and constructing a binary discriminant model for each label from the correspondence between the label and its feature word set;
step two, constructing a hierarchical discriminant tree model: replacing the label at every level of the multi-level label system with that label's binary discriminant model to form the hierarchical discriminant tree model;
step three, classifying a paper with unknown labels: obtaining the text representation of the paper by text word segmentation, inputting the text representation into the binary discriminant model of every root node of the hierarchical discriminant tree model, using each model to calculate the probability that the paper has the label corresponding to that node, and outputting the label corresponding to a root node if the probability is greater than a threshold;
inputting the text representation into the binary discriminant models of all child nodes of each node whose label was output at the current level, using each model to calculate the probability that the paper has the label represented by that child node, and outputting the label corresponding to a child node if the probability is greater than the threshold;
repeating this judgment level by level from top to bottom until the text representation has been input into the binary discriminant models of the leaf nodes of the hierarchical discriminant tree model and their outputs have been judged;
taking all labels output along the paths from the root nodes to the leaf nodes as the labels of the paper.
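The top-down judgment of step three can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Node` class, the `classify` function and the toy `keyword_model` scorer are hypothetical names introduced here, and the scorer only stands in for a trained binary discriminant model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    label: str
    # Hypothetical stand-in for a trained binary discriminant model:
    # maps a text representation to P(paper has this label).
    predict_proba: Callable[[set], float]
    children: List["Node"] = field(default_factory=list)


def classify(roots, text_repr, threshold=0.5):
    """Collect every label whose model scores above the threshold, top-down."""
    labels = []
    frontier = list(roots)
    while frontier:
        node = frontier.pop(0)
        if node.predict_proba(text_repr) > threshold:
            labels.append(node.label)
            # Descend only below labels that were output.
            frontier.extend(node.children)
    return labels


def keyword_model(keywords):
    # Toy scorer (assumption): fraction of the keywords present in the paper.
    return lambda words: len(keywords & words) / len(keywords)


cs = Node("Computer Science", keyword_model({"algorithm", "network"}), [
    Node("Machine Learning", keyword_model({"classifier", "training"})),
    Node("Databases", keyword_model({"query", "index"})),
])
physics = Node("Physics", keyword_model({"quantum", "particle"}))

paper = {"algorithm", "network", "classifier", "training", "query"}
result = classify([cs, physics], paper)
print(result)
```

In this toy run the child "Databases" scores exactly 0.5 and is not output, because the probability must be strictly greater than the threshold; no node below a rejected label is ever evaluated, which is what keeps the amount of judgment work small.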
Preferably, the method for obtaining the text representation by adopting the text word segmentation technology comprises the following steps:
performing word segmentation and part-of-speech tagging on the paper with a word segmentation and part-of-speech tagging tool, and retaining all words tagged as nouns to form word set I;
using a pre-trained BERT language model to obtain, from the paper, a semantic vector for each word in word set I, the vectors forming word set II;
word set I and word set II together constitute the text representation of the paper.
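The two-part text representation can be illustrated with a short Python sketch. It is only an outline under assumptions: the `tag` and `embed` callables are hypothetical placeholders for a real segmentation and part-of-speech tool and for a pre-trained BERT model, and the rule that noun tags start with "n" is an assumption of this sketch, not something the text specifies.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def text_representation(
    text: str,
    tag: Callable[[str], Sequence[Tuple[str, str]]],
    embed: Callable[[str, str], List[float]],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Build (word set I, word set II) for one paper.

    tag:   hypothetical segmenter and POS tagger returning (word, tag) pairs
    embed: hypothetical wrapper around a pre-trained BERT model that returns
           a semantic vector for a word as it is used in the paper
    """
    nouns: List[str] = []
    for word, pos in tag(text):
        # Assumption: noun tags start with "n".
        if pos.startswith("n") and word not in nouns:
            nouns.append(word)
    vectors = {word: embed(word, text) for word in nouns}
    return nouns, vectors


# Toy stand-ins so the sketch runs without external models.
def toy_tag(text):
    lexicon = {"tree": "n", "label": "n", "classifies": "v"}
    return [(w, lexicon.get(w, "x")) for w in text.split()]


def toy_embed(word, text):
    return [float(len(word)), float(text.split().count(word))]


word_set_1, word_set_2 = text_representation(
    "the tree classifies the label tree", toy_tag, toy_embed
)
print(word_set_1, word_set_2)
```

Passing the tagger and the embedding model in as parameters keeps the sketch independent of any particular tool, so either can be swapped without touching the filtering logic.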
Preferably, the method for obtaining the feature word set of each label by screening comprises: starting from the top-level labels of the multi-level label system, obtaining the feature words of each label in order from root node to leaf node by the following method;
the method comprises the following steps:
step a, for each label, calculating the weight of each word in the text representations of all papers under that label, the weight being calculated by formula (1):
[Formula (1), equation image not reproduced: the weight $W_t(i)$ of word $i$ under label $t$]
where $F_j(i)$ is the frequency of word $i$ in paper $j$, calculated by formula (2):
$F_j(i) = \dfrac{\mathrm{count}(i)}{\mathrm{total\_word}_j}$ (2)
$\mathrm{count}(i)$ is the number of times word $i$ appears in paper $j$, and $\mathrm{total\_word}_j$ is the total number of words in paper $j$; $N_t$ is the number of papers under label $t$; $N_{\sim t}$ is the number of papers under the other labels that share the same upper-level label as label $t$; if label $t$ is a top-level label, $\sim t$ denotes the other top-level labels; if label $t$ is not a top-level label, $\sim t$ denotes the other labels under the upper-level label to which label $t$ belongs; $N_i^{\sim t}$ is the number of papers containing word $i$ among all papers under the other labels that share the same upper-level label as label $t$;
step b, sorting the weights of all words under the label in descending order, and taking the top M words as feature words of the label to form the initial feature word set of the label;
step c, calculating, from the semantic features of the words, the semantic similarity between each remaining word and all words in the initial feature word set, the similarity being calculated by formula (3):
[Formula (3), equation image not reproduced: the semantic similarity between a remaining word $i$ and the initial feature word set]
where M is the number of words in the initial feature word set of the label, $\cos(j, i)$ is the cosine distance between the semantic representations of word $j$ and word $i$, and $W_t(j)$ is the weight of word $j$ under label $t$;
sorting all remaining words under the label in descending order of semantic similarity, and taking the top K words as feature words of the label to form the supplementary feature word set of the label;
the initial feature word set and the supplementary feature word set together form the feature word set of the label.
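Steps a and b can be sketched in Python as follows. The `term_frequency` function follows the definition of formula (2) directly. The exact form of formula (1) is not available in this text, so `label_weights` uses an ASSUMED TF-IDF-style form: the mean frequency of the word over the papers under label t, scaled by log((1 + N_~t) / (1 + N_i^~t)). That form, and all function and variable names, are illustrative assumptions rather than the patent's formula.

```python
import math
from collections import Counter


def term_frequency(words):
    """Formula (2): F_j(i) = count(i) / total_word_j for one paper."""
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}


def label_weights(papers_t, papers_sibling):
    """ASSUMED stand-in for formula (1), not the patent's exact expression.

    papers_t:       papers (word lists) under label t, so N_t = len(papers_t)
    papers_sibling: papers under the other labels sharing t's upper-level label
    """
    n_t = len(papers_t)
    n_sib = len(papers_sibling)
    # N_i^~t: number of sibling-label papers that contain word i.
    sibling_df = Counter(w for p in papers_sibling for w in set(p))
    mean_tf = Counter()
    for paper in papers_t:
        for w, f in term_frequency(paper).items():
            mean_tf[w] += f / n_t
    return {
        w: tf * math.log((1 + n_sib) / (1 + sibling_df[w]))
        for w, tf in mean_tf.items()
    }


def top_m(weights, m):
    """Step b: the M highest-weighted words form the initial feature word set."""
    return sorted(weights, key=lambda w: (-weights[w], w))[:m]


papers_t = [["graph", "graph", "node", "method"],
            ["graph", "edge", "method", "method"]]
papers_sibling = [["protein", "method"], ["cell", "method"], ["gene", "cell"]]

weights = label_weights(papers_t, papers_sibling)
initial = top_m(weights, 2)
print(initial)
```

In this toy data "method" is as frequent under label t as "graph", but it also appears in two of the three sibling-label papers, so the assumed weighting ranks it last; this is the kind of discrimination between sibling labels that the definitions of N_~t and N_i^~t suggest.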
Preferably, the value of M is 5% of the total number of words of the text representation under the corresponding label.
Preferably, M is no greater than 1000.
Preferably, the total number of feature words per label is no greater than 5000.
Preferably, the probability threshold applied to the output of every binary discriminant model is 0.5.
Preferably, the binary discriminant model is constructed using any one of a convolutional neural network, naive Bayes and a support vector machine.
The invention at least comprises the following beneficial effects:
First, the labels in an existing multi-level label system have no discriminating function and can only be assigned by subjective human judgment, so it cannot be known accurately whether a label is relevant to a paper. Once the hierarchical discriminant tree model is formed, every node has an automatic discriminating function: given only a text representation, it outputs whether the paper is relevant to the label corresponding to the node, which improves discrimination accuracy and makes the result more objective and less error-prone.
Second, the binary discriminant model accurately and comprehensively reflects the association between a label and the words used in papers, and yields the feature words most strongly associated with the label. As papers are added and updated, the feature word set of each label grows and is updated accordingly, which can improve the accuracy of the whole classification system.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a block diagram of one embodiment of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.
As shown in fig. 1, the present invention provides a classification method for a multi-label scientific research paper based on a hierarchical discriminant tree, including:
step one, constructing a binary discriminant model:
acquiring all papers with known labels together with their labels in a multi-level label system, obtaining the text representation of every paper by text word segmentation, screening the text representations to obtain a feature word set for each label, and constructing a binary discriminant model for each label from the correspondence between the label and its feature word set; the discriminant model may use a traditional data mining method such as a support vector machine, naive Bayes or logistic regression to judge whether a scientific research paper belongs to a label. A binary discriminant model obtained in this way accurately and comprehensively reflects the association between the label and the words used in papers, and yields the feature words most strongly associated with the label. As papers are added and updated, the feature word set of each label grows and is updated accordingly, which can improve the accuracy of the whole classification system.
Step two, constructing a hierarchical discriminant tree model: replacing the label at every level of the multi-level label system with that label's binary discriminant model to form the hierarchical discriminant tree model; the labels in an existing multi-level label system have no discriminating function and can only be assigned by subjective human judgment, so it cannot be known accurately whether a label is relevant to a paper. Once the hierarchical discriminant tree model is formed, every node has an automatic discriminating function: given only a text representation, it outputs whether the paper is relevant to the label corresponding to the node, which improves discrimination accuracy and makes the result more objective and less error-prone.
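Step two can be pictured as attaching one trained model to every label of the hierarchy. The sketch below assumes the label system is given as a child-to-parent mapping and that a model per label is already available; `TreeNode`, `build_tree`, `parent_of` and the toy models are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TreeNode:
    label: str
    model: Callable[[set], float]  # stand-in for a binary discriminant model
    children: List["TreeNode"] = field(default_factory=list)


def build_tree(
    parent_of: Dict[str, Optional[str]],
    models: Dict[str, Callable[[set], float]],
) -> List[TreeNode]:
    """Turn a label hierarchy plus per-label models into a forest of nodes."""
    nodes = {label: TreeNode(label, models[label]) for label in parent_of}
    roots = []
    for label, parent in parent_of.items():
        if parent is None:
            roots.append(nodes[label])  # top-level label
        else:
            nodes[parent].children.append(nodes[label])
    return roots


# Assumed toy hierarchy: None marks a top-level label.
parent_of = {
    "Computer Science": None,
    "Machine Learning": "Computer Science",
    "Databases": "Computer Science",
    "Physics": None,
}
# Toy models (assumption): score 1.0 when the lower-cased label is in the paper.
models = {
    label: (lambda words, name=label: 1.0 if name.lower() in words else 0.0)
    for label in parent_of
}

roots = build_tree(parent_of, models)
print([r.label for r in roots])
```

Keeping the model on the node itself means that classification needs nothing beyond the tree: each node can be asked directly whether a paper carries its label.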
Step three, classifying a paper with unknown labels: obtaining the text representation of the paper by text word segmentation, inputting the text representation into the binary discriminant model of every root node of the hierarchical discriminant tree model, using each model to calculate the probability that the paper has the label corresponding to that node, and outputting the label corresponding to a root node if the probability is greater than a threshold;
inputting the text representation into the binary discriminant models of all child nodes of each node whose label was output at the current level, using each model to calculate the probability that the paper has the label represented by that child node, and outputting the label corresponding to a child node if the probability is greater than the threshold;
repeating this judgment level by level from top to bottom until the text representation has been input into the binary discriminant models of the leaf nodes of the hierarchical discriminant tree model and their outputs have been judged;
taking all labels output along the paths from the root nodes to the leaf nodes as the labels of the paper. Judging in hierarchical order from the root nodes to the leaf nodes avoids omissions, reduces the amount of judgment work, and quickly and accurately outputs the hierarchical labels of a new paper, thereby classifying it.
In this technical scheme, in view of the association between the words used in scientific research papers and the labels, papers with known labels and their label information are used to obtain the feature word set corresponding to each label; then, following the multi-level label system, a binary discriminant model is constructed for each label, and the discriminant models of all labels are merged into a hierarchical discriminant tree model; finally, the labels of a paper with unknown labels are judged on the basis of the hierarchical discriminant tree model. The method takes into account the association between the words used in papers and the labels, automatically screens the feature words related to each label, and constructs the corresponding binary discriminant model. The hierarchical discriminant tree model is then used to classify papers with unknown labels, making full use of the hierarchical relations among the labels.
In another technical scheme, a method for obtaining text representation by adopting a text word segmentation technology comprises the following steps:
performing word segmentation and part-of-speech tagging on the paper with a word segmentation and part-of-speech tagging tool, and retaining all words tagged as nouns to form word set I;
using a pre-trained BERT language model to obtain, from the paper, a semantic vector for each word in word set I, the vectors forming word set II;
word set I and word set II together constitute the text representation of the paper.
In another technical scheme, the method for obtaining the feature word set of each label by screening comprises: starting from the top-level labels of the multi-level label system, obtaining the feature words of each label in order from root node to leaf node by the following method;
the method comprises the following steps:
step a, for each label, calculating the weight of each word in the text representations of all papers under that label, the weight being calculated by formula (1):
[Formula (1), equation image not reproduced: the weight $W_t(i)$ of word $i$ under label $t$]
where $F_j(i)$ is the frequency of word $i$ in paper $j$, calculated by formula (2):
$F_j(i) = \dfrac{\mathrm{count}(i)}{\mathrm{total\_word}_j}$ (2)
$\mathrm{count}(i)$ is the number of times word $i$ appears in paper $j$, and $\mathrm{total\_word}_j$ is the total number of words in paper $j$; $N_t$ is the number of papers under label $t$; $N_{\sim t}$ is the number of papers under the other labels that share the same upper-level label as label $t$; if label $t$ is a top-level label, $\sim t$ denotes the other top-level labels; if label $t$ is not a top-level label, $\sim t$ denotes the other labels under the upper-level label to which label $t$ belongs; $N_i^{\sim t}$ is the number of papers containing word $i$ among all papers under the other labels that share the same upper-level label as label $t$;
step b, sorting the weights of all words under the label in descending order, and taking the top M words as feature words of the label to form the initial feature word set of the label;
step c, calculating, from the semantic features of the words, the semantic similarity between each remaining word and all words in the initial feature word set, the similarity being calculated by formula (3):
[Formula (3), equation image not reproduced: the semantic similarity between a remaining word $i$ and the initial feature word set]
where M is the number of words in the initial feature word set of the label, $\cos(j, i)$ is the cosine distance between the semantic representations of word $j$ and word $i$, and $W_t(j)$ is the weight of word $j$ under label $t$;
sorting all remaining words under the label in descending order of semantic similarity, and taking the top K words as feature words of the label to form the supplementary feature word set of the label;
the initial feature word set and the supplementary feature word set together form the feature word set of the label.
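Step c, expanding the initial set with semantically close words, might look like the Python sketch below. Because the exact form of formula (3) is not available in this text, the score used here is an ASSUMPTION: the weight-scaled cosine similarity to each initial feature word, averaged over the M initial words. The names `cosine`, `supplement` and the toy vectors are likewise illustrative; in the described method the vectors would come from the BERT-based word set II.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def supplement(initial, weights, vectors, k):
    """ASSUMED stand-in for formula (3): score each remaining word i by
    (1 / M) * sum over initial words j of W_t(j) * cos(j, i),
    then keep the K best-scoring words."""
    m = len(initial)
    scores = {}
    for word, vec in vectors.items():
        if word in initial:
            continue  # only the remaining words are candidates
        scores[word] = sum(
            weights[j] * cosine(vectors[j], vec) for j in initial
        ) / m
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy semantic vectors and weights (assumptions for illustration only).
vectors = {
    "graph": [1.0, 0.0],
    "edge": [0.8, 0.6],
    "vertex": [0.9, 0.1],
    "protein": [0.0, 1.0],
}
weights = {"graph": 0.5, "edge": 0.2}

extra = supplement(["graph", "edge"], weights, vectors, k=1)
feature_words = ["graph", "edge"] + extra
print(feature_words)
```

The point of the expansion is that a word such as "vertex" can join the feature word set through its closeness to high-weight words even when its own frequency-based weight is too low to reach the top M.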
In this technical scheme, because scientific research papers are long and contain much information irrelevant to multi-level label classification, the information relevant to that classification is extracted to form the text representation of the paper, which improves classification efficiency and accuracy.
In another technical scheme, the value of M is 5% of the total number of words in the text representations under the corresponding label. The value of M may be adjusted according to the total number of feature words under each label, and is generally 5% of that total.
In another technical scheme, the value of M is not more than 1000. For some labels, the papers under the label contain a large total number of feature words, which can exceed ten thousand. This would make M too large, tending to admit noise words and degrading the multi-level label classification model. The invention therefore limits M to at most 1000 to reduce the number of noise feature words.
In another technical scheme, the total number of feature words of each label is not more than 5000. All remaining words are sorted by the calculated semantic similarity, and the top K words are added to the feature word set of the label to expand it. To avoid introducing too many noise feature words, M + K (that is, the total number of feature words per label) is limited to at most 5000.
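The size limits above amount to simple arithmetic, shown here as a small Python sketch. The function name `feature_budget` and its parameters are hypothetical; the text fixes only the 5% ratio, the cap of 1000 on M and the cap of 5000 on M + K, so the function returns the largest K that the caps allow rather than a specific K.

```python
def feature_budget(total_words, percent=5, max_m=1000, max_total=5000):
    """Return (M, largest allowed K) for a label with `total_words` words."""
    # Integer arithmetic avoids floating-point rounding in the 5% rule.
    m = min(total_words * percent // 100, max_m)
    max_k = max_total - m
    return m, max_k


print(feature_budget(8000))   # 5% of 8000 is below the cap on M
print(feature_budget(50000))  # 5% of 50000 exceeds the cap, so M is clipped
```

For a label with 50000 words, the 5% rule alone would give 2500 initial feature words; the cap reduces this to 1000 and leaves room for at most 4000 supplementary words.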
In another technical scheme, the probability threshold applied to the output of every binary discriminant model is 0.5, so as to improve the accuracy of the correspondence between labels and papers.
In another technical scheme, the binary discriminant model is constructed using any one of a convolutional neural network, naive Bayes and a support vector machine. These three methods capture the correspondence accurately, require little computation and judge quickly.
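As an illustration of one of the named options, the sketch below implements a small Bernoulli naive Bayes model over the presence of a label's feature words and applies the 0.5 threshold. It is a sketch under assumptions, not the patented model: the class name `BinaryNaiveBayes`, the use of word presence as features, and Laplace smoothing are all choices made here for the example.

```python
import math


class BinaryNaiveBayes:
    """Bernoulli naive Bayes over the presence of a label's feature words."""

    def __init__(self, feature_words):
        self.features = sorted(feature_words)

    def fit(self, papers, has_label):
        # papers: list of word sets; has_label: list of booleans.
        self.log_prior = {}
        self.p_word = {}
        for cls in (True, False):
            docs = [p for p, y in zip(papers, has_label) if y == cls]
            self.log_prior[cls] = math.log((len(docs) + 1) / (len(papers) + 2))
            # Laplace-smoothed probability that a feature word is present.
            self.p_word[cls] = {
                w: (sum(w in d for d in docs) + 1) / (len(docs) + 2)
                for w in self.features
            }
        return self

    def predict_proba(self, words):
        """Return P(paper has the label | presence of feature words)."""
        log_score = {}
        for cls in (True, False):
            s = self.log_prior[cls]
            for w in self.features:
                p = self.p_word[cls][w]
                s += math.log(p if w in words else 1.0 - p)
            log_score[cls] = s
        # Normalise the two joint log-probabilities.
        top = max(log_score.values())
        pos = math.exp(log_score[True] - top)
        neg = math.exp(log_score[False] - top)
        return pos / (pos + neg)


train = [{"graph", "node"}, {"graph", "edge"},
         {"protein", "cell"}, {"gene", "cell"}]
labels = [True, True, False, False]
model = BinaryNaiveBayes({"graph", "cell"}).fit(train, labels)

p_yes = model.predict_proba({"graph", "vertex"})
p_no = model.predict_proba({"cell", "gene"})
print(p_yes > 0.5, p_no > 0.5)
```

Restricting the features to the label's feature word set is what keeps the model small: words outside that set, such as "vertex" here, are simply ignored at prediction time.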
While embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments; it is fully applicable in the various fields to which it pertains, and further modifications may readily be made by those skilled in the art. The invention is therefore not limited to the specific details shown and described herein, provided there is no departure from the general concept defined by the appended claims and their equivalents.

Claims (7)

1. The classification method of the multi-label scientific research paper based on the hierarchical discriminant tree is characterized by comprising the following steps:
step one, constructing a binary discriminant model:
acquiring all papers with known labels together with their labels in a multi-level label system, obtaining the text representation of every paper by text word segmentation, screening the text representations to obtain a feature word set for each label, and constructing a binary discriminant model for each label from the correspondence between the label and its feature word set;
step two, constructing a hierarchical discriminant tree model: replacing the label at every level of the multi-level label system with that label's binary discriminant model to form the hierarchical discriminant tree model;
step three, classifying a paper with unknown labels: obtaining the text representation of the paper by text word segmentation, inputting the text representation into the binary discriminant model of every root node of the hierarchical discriminant tree model, using each model to calculate the probability that the paper has the label corresponding to that node, and outputting the label corresponding to a root node if the probability is greater than a threshold;
inputting the text representation into the binary discriminant models of all child nodes of each node whose label was output at the current level, using each model to calculate the probability that the paper has the label represented by that child node, and outputting the label corresponding to a child node if the probability is greater than the threshold;
repeating this judgment level by level from top to bottom until the text representation has been input into the binary discriminant models of the leaf nodes of the hierarchical discriminant tree model and their outputs have been judged;
taking all labels output on a path from a root node to a leaf node as labels of the paper;
the method for obtaining the feature word set of each label by screening comprises: starting from the top-level labels of the multi-level label system, obtaining the feature words of each label in order from root node to leaf node by the following method;
the method comprises the following steps:
step a, for each label, calculating the weight of each word in the text representations of all papers under that label, the weight being calculated by formula (1):
[Formula (1), equation image not reproduced: the weight $W_t(i)$ of word $i$ under label $t$]
where $F_j(i)$ is the frequency of word $i$ in paper $j$, calculated by formula (2):
$F_j(i) = \dfrac{\mathrm{count}(i)}{\mathrm{total\_word}_j}$ (2)
$\mathrm{count}(i)$ is the number of times word $i$ appears in paper $j$, and $\mathrm{total\_word}_j$ is the total number of words in paper $j$; $N_t$ is the number of papers under label $t$; $N_{\sim t}$ is the number of papers under the other labels that share the same upper-level label as label $t$; if label $t$ is a top-level label, $\sim t$ denotes the other top-level labels; if label $t$ is not a top-level label, $\sim t$ denotes the other labels under the upper-level label to which label $t$ belongs; $N_i^{\sim t}$ is the number of papers containing word $i$ among all papers under the other labels that share the same upper-level label as label $t$;
step b, sorting the weights of all words under the label in descending order, and taking the top M words as feature words of the label to form the initial feature word set of the label;
step c, calculating, from the semantic features of the words, the semantic similarity between each remaining word and all words in the initial feature word set, the similarity being calculated by formula (3):
[Formula (3), equation image not reproduced: the semantic similarity between a remaining word $i$ and the initial feature word set]
where M is the number of words in the initial feature word set of the label, $\cos(j, i)$ is the cosine distance between the semantic representations of word $j$ and word $i$, and $W_t(j)$ is the weight of word $j$ under label $t$;
sorting all remaining words under the label in descending order of semantic similarity, and taking the top K words as feature words of the label to form the supplementary feature word set of the label;
the initial feature word set and the supplementary feature word set together form the feature word set of the label.
2. The method for classifying a multi-label scientific research paper based on a hierarchical discriminant tree as claimed in claim 1, wherein the method for obtaining the text representation by using the text segmentation technology comprises:
performing word segmentation and part-of-speech tagging on the paper with a word segmentation and part-of-speech tagging tool, and retaining all words tagged as nouns to form word set I;
using a pre-trained BERT language model to obtain, from the paper, a semantic vector for each word in word set I, the vectors forming word set II;
word set I and word set II together constitute the text representation of the paper.
3. The method for classifying multi-label scientific research papers based on hierarchical discriminant trees as claimed in claim 1, wherein the value of M is 5% of the total number of words in the text representations under the corresponding label.
4. The method for classifying multi-label scientific research papers based on hierarchical discriminant trees as claimed in claim 3, wherein the value of M is not greater than 1000.
5. The method for classifying multi-label scientific research papers based on hierarchical discriminant trees as claimed in claim 1, wherein the total number of feature words per label is not more than 5000.
6. The method for classifying multi-label scientific research papers based on hierarchical discriminant trees as claimed in claim 1, wherein the probability threshold applied to the output of every binary discriminant model is 0.5.
7. The method for classifying multi-label scientific research papers based on hierarchical discriminant trees as claimed in claim 1, wherein the binary discriminant model is constructed using any one of a convolutional neural network, naive Bayes and a support vector machine.
CN201910881086.6A 2019-09-18 2019-09-18 Classification method of multi-label scientific research papers based on hierarchical discriminant trees Active CN110781297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910881086.6A CN110781297B (en) 2019-09-18 2019-09-18 Classification method of multi-label scientific research papers based on hierarchical discriminant trees

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910881086.6A CN110781297B (en) 2019-09-18 2019-09-18 Classification method of multi-label scientific research papers based on hierarchical discriminant trees

Publications (2)

Publication Number Publication Date
CN110781297A (en) 2020-02-11
CN110781297B (en) 2022-06-21

Family

ID=69384249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910881086.6A Active CN110781297B (en) 2019-09-18 2019-09-18 Classification method of multi-label scientific research papers based on hierarchical discriminant trees

Country Status (1)

Country Link
CN (1) CN110781297B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672736B (en) * 2021-09-09 2023-08-22 上海德拓信息技术股份有限公司 Text multi-label classification method and system
CN115659969B (en) * 2022-12-13 2023-04-28 成方金融科技有限公司 Document labeling method, device, electronic equipment and storage medium
CN115964487A (en) * 2022-12-22 2023-04-14 南阳理工学院 Thesis label supplementing method and device based on natural language and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978328A (en) * 2014-04-03 2015-10-14 北京奇虎科技有限公司 Hierarchical classifier obtaining method, text classification method, hierarchical classifier obtaining device and text classification device
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7139754B2 (en) * 2004-02-09 2006-11-21 Xerox Corporation Method for multi-class, multi-label categorization using probabilistic hierarchical modeling

Also Published As

Publication number Publication date
CN110781297A (en) 2020-02-11

Similar Documents

Publication Publication Date Title
CA3007723C (en) Systems and/or methods for automatically classifying and enriching data records imported from big data and/or other sources to help ensure data integrity and consistency
CN105389379B (en) A kind of rubbish contribution classification method indicated based on text distributed nature
CN102567464B (en) Based on the knowledge resource method for organizing of expansion thematic map
CN107315738B (en) A kind of innovation degree appraisal procedure of text information
CN109189942A (en) A kind of construction method and device of patent data knowledge mapping
CN110781297B (en) Classification method of multi-label scientific research papers based on hierarchical discriminant trees
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
US20060288275A1 (en) Method for classifying sub-trees in semi-structured documents
CN113254659A (en) File studying and judging method and system based on knowledge graph technology
CN110633365A (en) Word vector-based hierarchical multi-label text classification method and system
CN110807086A (en) Text data labeling method and device, storage medium and electronic equipment
CN103778206A (en) Method for providing network service resources
CN111222318A (en) Trigger word recognition method based on two-channel bidirectional LSTM-CRF network
US11886515B2 (en) Hierarchical clustering on graphs for taxonomy extraction and applications thereof
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
Bhutada et al. Semantic latent dirichlet allocation for automatic topic extraction
CN114997288A (en) Design resource association method
Van et al. Vietnamese news classification based on BoW with keywords extraction and neural network
CN115238040A (en) Steel material science knowledge graph construction method and system
CN114265935A (en) Science and technology project establishment management auxiliary decision-making method and system based on text mining
CN110245234A (en) A kind of multi-source data sample correlating method based on ontology and semantic similarity
CN111930944B (en) File label classification method and device
CN102541913B (en) VSM classifier trainings, the identification of the OSSP pages and the OSS Resource Access methods of web oriented
CN108733702B (en) Method, device, electronic equipment and medium for extracting upper and lower relation of user query
CN114528378A (en) Text classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant