CN108073571B - Multi-language text quality evaluation method and system and intelligent text processing system - Google Patents

Multi-language text quality evaluation method and system and intelligent text processing system

Info

Publication number
CN108073571B
Authority
CN
China
Prior art keywords
text
quality
word
semantic
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810028932.5A
Other languages
Chinese (zh)
Other versions
CN108073571A (en)
Inventor
宋俊平
程国艮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Global Tone Communication Technology Co ltd
Priority to CN201810028932.5A
Publication of CN108073571A
Application granted
Publication of CN108073571B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of intelligent text data and discloses a multilingual text quality evaluation method and system and an intelligent text processing system. The invention focuses on text quality evaluation: it ranks texts by the quality of their grammatical and semantic expression, placing high-quality texts first and low-quality texts, such as those containing garbled characters, last, and the ranking is evaluated by manual annotation. The method achieves an accuracy of over 95 percent, an improvement of about 8 percent over the traditional method, and it also solves the problem of balanced evaluation across multiple languages, which the prior art could not solve.

Description

Multi-language text quality evaluation method and system and intelligent text processing system
Technical Field
The invention belongs to the technical field of intelligent text data, and particularly relates to a multilingual text quality evaluation method and system and an intelligent text processing system.
Background
The prior art commonly used in the industry is as follows. With the deepening of globalization and the rapid development of the Internet, text data is growing explosively, but the data comes from heterogeneous sources, which reduces how efficiently the information can be used. How to evaluate the quality of data acquired in real time and recommend high-quality text to users is therefore an important basic problem in intelligent text research against the background of big data. At present, text quality assessment is usually treated as one function of data cleaning. For text data, the objects to be cleaned include garbled text, text mixed with JS code, text whose title and content are inconsistent, and padded ("water-filled") text (e.g., text that repeats a sentence or contains randomly entered meaningless words and sentences). Existing methods of filtering such dirty data generally fall into two types: rule-based methods and word-frequency statistical methods. The basic idea of the rule-based method is to enumerate filtering rules for different data formats, for example using encoding rules or a dictionary to detect garbled text, and using a dictionary of JS syntax keywords to detect JS code. The basic idea of the word-frequency statistical method is to count word collocation frequencies on a data set of relatively high text quality and to treat low-frequency collocations as dirty data to be filtered.
In summary, the problems of the prior art are:
The existing rule-based and frequency-statistics-based methods cause a great deal of information loss, and they struggle to cover the ever-growing variety of data sources and formats.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a multilingual text quality evaluation method and system and an intelligent text processing system.
The invention is realized as follows: the multilingual text quality evaluation method uses a bigram model to split a sentence or chapter into a set of consecutive word pairs, calculates the cosine similarity between the vectors of each pair of adjacent words in the sentence to obtain a semantic quality score for the two words, and obtains sentence-level and chapter-level quality scores by averaging.
Further, the quality score is calculated by:
in the bigram model the current character depends only on the previous character, expressed by the formula:
$$p(s) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_2) \cdots p(w_n \mid w_{n-1});$$
the conditional probabilities are obtained by counting word frequencies;
using semantic similarity between word vectors of characters instead of word frequency, the new formula for calculating conditional probability becomes:
Figure BDA0001545841280000021
the two formulas are combined to obtain the quality score of a sentence; the quality score of the input document is then obtained by averaging over all sentences.
Further, before the sentence or chapter is split into a set of consecutive word pairs using the bigram model, the following steps are required: learning distributed representations of the text corpus at the character level through a neural network language model; capturing the context of each character in its sentence with a sliding window, modeling the semantic collocation of characters, and mapping each character to an N-dimensional floating-point vector.
Further, after the sentence or chapter is divided into a continuous word pair set by using the bigram model, the following steps are required:
(1) measuring the matching degree between the title and the body content using the maximum repeated substring and semantic similarity: the maximum repeated substring of the title and the body content is computed with the Knuth-Morris-Pratt algorithm to represent how far the wording of the title recurs in the body content; for semantic similarity, vector representations of the title and the body content are obtained by a weighted average of word vectors, and the cosine similarity between the title vector and the body vector represents the semantic similarity of the title and the body;
(2) the average quality score of each language is calculated by large-scale, multi-language and high-quality text corpora respectively, and the score of each language is balanced by setting a reference value by user self-definition.
Further, the method for determining the matching degree between the title and the text content is to perform distributed representation on the title and the text, and measure the matching degree between the title and the text by calculating the semantic similarity between the title and the text; semantic vectors for the headlines and the body are obtained by calculating a weighted average sum of word vectors for the characters.
Further comprising:
firstly, the characters in the text are subjected to importance ranking by using a textrank algorithm, and a character calculation formula is as follows:
Figure BDA0001545841280000031
wherein $d$ is the damping coefficient, with a value of 0.85; $\mathrm{In}(W_i)$ is the set of characters pointing to the current character; $\mathrm{Out}(W_j)$ is the set of characters that the current character points to; and $\omega_{ji}$ is the co-occurrence weight of two characters; a text semantic vector is obtained by a weighted average, expressed by the formula:
Figure BDA0001545841280000032
and finally, calculating the cosine similarity of the title and the text to obtain the matching degree.
Another object of the present invention is to provide an intelligent text processing system for implementing the multilingual text quality assessment method.
Another object of the present invention is to provide a multilingual text quality evaluation system of the multilingual text quality evaluation method, the multilingual text quality evaluation system including:
the distributed representation acquisition module is used for carrying out distributed representation on the word level through the neural network language model;
the splitting module is used for splitting the sentence or the chapter into a continuous word pair set, calculating the cosine similarity between vectors of two adjacent words in the sentence or the chapter pair by pair to obtain the semantic quality score between the two words, and obtaining the quality scores of a sentence level and a chapter level through an average sum algorithm;
the matching module is used for measuring the matching degree between the title and the text content by utilizing the maximum repeated substring and the semantic similarity;
and the average quality score calculating module is used for calculating the average quality scores of all languages respectively, and setting a reference value by user self-definition to balance the scores of all languages.
Another object of the present invention is to provide an intelligent text processing system using the multilingual text quality-assessment system.
In summary, the advantages and positive effects of the invention are as follows. On top of data cleaning, the invention uses text quality assessment to monitor data in real time. The method can identify garbled text and text containing JS scripts, and can also identify padded ("water-filled") text and calculate the matching degree between title and body; to balance the distribution of multilingual retrieval results, it can compute a histogram of the score distribution of each language and adjust the scores accordingly.
The intermediate models trained by the invention are independent for each language, and a naive-Bayes-based language identification algorithm is embedded, so a model for a new language can easily be added without changing the existing models, which greatly improves the flexibility of the method.
The invention focuses on text quality evaluation: it ranks texts by the quality of their grammatical and semantic expression, placing high-quality texts first and low-quality texts, such as those containing garbled characters, last, and the ranking is evaluated by manual annotation. The method achieves an accuracy of over 95 percent, an improvement of about 8 percent over the traditional method, and it also solves the problem of balanced evaluation across multiple languages, which the prior art could not solve.
Drawings
Fig. 1 is a flowchart of a multilingual text quality assessment method according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of a multilingual text quality assessment system according to an embodiment of the present invention;
in the figure: 1. distributed representation acquisition module; 2. splitting module; 3. matching module; 4. average quality score calculation module.
FIG. 3 is a diagram illustrating statistical distributions in corpus of languages according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention focuses on text quality evaluation: texts are ranked by their grammatical and semantic expression so that good-quality texts come first and poor-quality texts, such as those with garbled characters, come last, which clearly addresses the problems of the original algorithms.
As shown in fig. 1, the method for evaluating the quality of a multilingual text according to the embodiment of the present invention includes the following steps:
S101: a neural network language model is used to learn distributed representations of the text corpus at the character level (the word level in alphabetic languages); a sliding window captures the context of each character in its sentence, the semantic collocation of characters is modeled, and each character is mapped to an N-dimensional floating-point vector to facilitate mathematical operations between characters;
S102: based on a bigram model, a sentence or chapter is split into a set of consecutive word pairs, the cosine similarity between the vectors of each pair of adjacent words in the sentence or chapter is calculated pair by pair to obtain a semantic quality score (at the grammatical and semantic levels) for the two words, and sentence-level and chapter-level quality scores are obtained by averaging;
S103: the matching degree between the title and the body content is measured using the maximum repeated substring and semantic similarity: the maximum repeated substring of the title and the body content is computed with the KMP (Knuth-Morris-Pratt) algorithm to represent how far the wording of the title recurs in the body content; for semantic similarity, vector representations of the title and the body content are obtained by a weighted average of word vectors, and the cosine similarity between the title vector and the body vector represents the semantic similarity of the title and the body;
S104: the average quality score of each language is calculated on a large-scale, multilingual, high-quality text corpus, and the scores of the languages are balanced by a user-defined reference value, so that good-quality text in every language scores close to the reference value.
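The title recurrence measure in step S103 can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes that the "maximum repeated substring" means the longest substring shared by title and body, and it finds it by running the KMP prefix function once per title suffix. The function names, the separator character, and the normalization by title length are all hypothetical choices.

```python
from typing import List


def prefix_function(s: str) -> List[int]:
    """KMP failure table: pi[i] is the length of the longest proper
    prefix of s[:i + 1] that is also a suffix of it."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def longest_common_substring(title: str, body: str, sep: str = "\x00") -> str:
    """Longest substring of `title` that also occurs in `body`.

    Assumption: `sep` occurs in neither string.
    """
    best = ""
    for start in range(len(title)):
        suffix = title[start:]
        if len(suffix) <= len(best):
            break  # no remaining suffix can beat the current best
        pi = prefix_function(suffix + sep + body)
        # Largest prefix of `suffix` that matches somewhere inside `body`.
        longest = max(pi[len(suffix) + 1:], default=0)
        if longest > len(best):
            best = suffix[:longest]
    return best


def title_recurrence(title: str, body: str) -> float:
    """Share of the title covered by its longest recurrence in the body
    (hypothetical normalization, not taken from the source)."""
    if not title:
        return 0.0
    return len(longest_common_substring(title, body)) / len(title)


lcs = longest_common_substring("abcde", "xxbcdyy")
ratio = title_recurrence("abcde", "xxbcdyy")
```

Running the prefix function once per suffix costs O(|title| · (|title| + |body|)), which is acceptable for short titles; a suffix automaton would be faster for long inputs.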
The multilingual text quality evaluation method based on deep representations provided by the embodiment of the invention builds on a tf-idf algorithm over deep representations and on sentence-level and chapter-level vector computation; the most basic algorithm the technique depends on is word vector training. Put simply, a word vector is a distributed representation of a word: abstract words in natural language are converted into N-dimensional vectors that are easy to compute with, and the deep semantic association between words can be obtained by calculating the similarity between their vectors. Existing word vector training methods mainly include Google's word2vec model and Stanford's global vector model (GloVe); the invention adopts the word2vec model. Because text quality evaluation requires strong real-time performance, the invention skips preprocessing such as word segmentation and trains word vectors directly at the character level (the word level in alphabetic writing systems).
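The sliding-window step that feeds word vector training can be illustrated with a short sketch. It only generates the (center, context) pairs that a word2vec-style model would be trained on; the training itself is omitted. The function name `context_pairs` and the window size are hypothetical, and the choice of unit (one character for Chinese, one whitespace-separated word for alphabetic languages) follows the description above.

```python
from typing import Iterator, Sequence, Tuple


def context_pairs(units: Sequence[str], window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs for every unit and each neighbor
    that lies at most `window` positions away from it."""
    for i, center in enumerate(units):
        lo = max(0, i - window)
        hi = min(len(units), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, units[j]


# Character-level units need no word segmentation.
zh_pairs = list(context_pairs(list("文本质量"), window=1))

# For an alphabetic language the unit is the word.
en_pairs = list(context_pairs("text quality matters".split(), window=1))
```

Because the units are single characters or whitespace-separated words, no segmenter is needed, which is consistent with the real-time requirement stated above.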
The N-gram model is a probabilistic grammar model based on the Markov model, which judges the grammatical plausibility of a sentence by the probability that N consecutive characters occur together in natural language. The most common variant is the bigram model, whose basic idea is that the current character depends only on the previous character, expressed by the formula:
$$p(s) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_2) \cdots p(w_n \mid w_{n-1});$$
The conditional probabilities in the above formula can be obtained by counting word frequencies. For text quality evaluation, however, a higher statistical frequency does not mean better quality; on the contrary, the higher the frequency, the less information the text contains.
The invention uses the tf-idf algorithm instead of raw word-frequency counts. In addition, to better represent the semantic relationship between characters, the invention uses the semantic similarity between the word vectors of characters in place of word frequency, so the new formula for calculating the conditional probability becomes:
Figure BDA0001545841280000061
combining the above two formulas, the quality score of a sentence can be obtained, and then the quality score of the input document can be obtained by averaging all sentences.
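The scoring just described can be sketched in a few lines. The conditional-probability formula survives only as an image reference in this text, so the sketch makes an assumption: the score of an adjacent pair is the plain cosine similarity of the two unit vectors, and pair scores are averaged rather than multiplied. The tf-idf weighting is left out. All names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_score(units: Sequence[str], vectors: Dict[str, Vector]) -> float:
    """Mean cosine similarity over adjacent unit pairs (the bigrams).

    A pair with an out-of-vocabulary unit contributes 0, so garbled
    text made of unseen characters scores low.
    """
    pairs = list(zip(units, units[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for left, right in pairs:
        if left in vectors and right in vectors:
            total += cosine(vectors[left], vectors[right])
    return total / len(pairs)


def document_score(sentences: Sequence[Sequence[str]], vectors: Dict[str, Vector]) -> float:
    """Average of the sentence scores; sentences with one unit are skipped."""
    scores: List[float] = [sentence_score(s, vectors) for s in sentences if len(s) > 1]
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2-dimensional vectors, for illustration only.
toy = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
s_score = sentence_score(["a", "b", "c"], toy)
d_score = document_score([["a", "b", "c"], ["a", "b"]], toy)
```

Averaging keeps scores comparable across sentences of different lengths, whereas a product of pair scores shrinks as sentences grow.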
The matching degree of the title and the text is also an important factor influencing the text quality, and the solution of the invention is to respectively perform distributed representation on the title and the text and measure the matching degree between the title and the text by calculating the semantic similarity between the title and the text. The semantic vectors of the title and the text can be obtained by calculating the weighted average sum of word vectors of characters, and the method comprises the following specific steps: firstly, the characters in the text are subjected to importance ranking by using a textrank algorithm, and a character calculation formula is as follows:
Figure BDA0001545841280000071
wherein $d$ is the damping coefficient (generally 0.85), $\mathrm{In}(W_i)$ is the set of characters pointing to the current character, $\mathrm{Out}(W_j)$ is the set of characters that the current character points to, and $\omega_{ji}$ is the co-occurrence weight of two characters. Then, a text semantic vector is obtained by a weighted average, expressed by the formula:
Figure BDA0001545841280000072
finally, the matching degree between the title and the text is obtained by calculating the cosine similarity of the title and the text.
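The three steps of title-body matching can be sketched together. The formulas themselves are image references in this text, so the sketch substitutes the standard weighted TextRank update, with scores initialized to 1 and co-occurrence counted inside a fixed window; this may differ in detail from the patent's formula. The function names, window size, iteration count, and toy vectors are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def textrank(units: Sequence[str], window: int = 2, d: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """Weighted TextRank over an undirected co-occurrence graph."""
    weight: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, a in enumerate(units):
        for b in units[i + 1:i + 1 + window]:
            if a != b:
                weight[a][b] += 1.0
                weight[b][a] += 1.0
    score = {u: 1.0 for u in set(units)}
    for _ in range(iters):
        updated = {}
        for u in score:
            rank = 0.0
            for v, w_vu in weight[u].items():
                # Share of v's total outgoing weight that flows to u.
                rank += w_vu / sum(weight[v].values()) * score[v]
            updated[u] = (1.0 - d) + d * rank
        score = updated
    return score


def text_vector(units: Sequence[str], vectors: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Importance-weighted average of the unit vectors."""
    if not vectors:
        return []
    acc = [0.0] * len(next(iter(vectors.values())))
    total = 0.0
    for u in units:
        if u in vectors:
            w = weights.get(u, 0.0)
            total += w
            for k, x in enumerate(vectors[u]):
                acc[k] += w * x
    return [x / total for x in acc] if total > 0 else acc


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def title_body_match(title: Sequence[str], body: Sequence[str],
                     vectors: Dict[str, Sequence[float]]) -> float:
    t_vec = text_vector(title, vectors, textrank(title))
    b_vec = text_vector(body, vectors, textrank(body))
    return cosine(t_vec, b_vec)


toy = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
same_topic = title_body_match(["a", "b"], ["a", "b", "a", "b"], toy)
off_topic = title_body_match(["a"], ["b"], toy)
```

In this sketch a unit that occurs several times contributes to the average once per occurrence; averaging over distinct units instead is an equally plausible reading of the description.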
As shown in fig. 2, the multilingual text quality evaluation system provided in the embodiment of the present invention includes:
the distributed representation acquisition module 1 is used for performing distributed representation of the words on the word (word in the alphabetical language) level through the neural network language model.
The splitting module 2 is configured to split the sentence or the chapter into a continuous word pair set, calculate cosine similarity between vectors of two adjacent words in the sentence or the chapter pair by pair, obtain a semantic quality score between the two words, and obtain a sentence-level quality score and a chapter-level quality score through an average sum algorithm.
The matching module 3 is used to measure the matching degree between the title and the body content using the maximum repeated substring and semantic similarity.
The average quality score calculating module 4 is used to calculate the average quality score of each language and to balance the scores of the languages by a user-defined reference value.
Model training in the invention is independent for each language, and the quality scores are likewise calculated independently for each language, so the distribution of quality scores may be unbalanced across languages. To observe this imbalance, the invention computes the score distribution in the training corpus of each language, as shown in fig. 3: the quality score distributions of the languages are severely unbalanced. To balance the score distributions, the invention takes the average score of each language from these statistics as a reference value and weights the quality scores of each language accordingly, thereby minimizing the imbalance.
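One way to read this balancing step is as a per-language rescaling, sketched below. The source does not give the weighting formula, so the multiplicative form used here (score × reference ÷ language mean) is an assumption, as are the function names and the sample scores.

```python
from statistics import mean
from typing import Dict, List


def language_means(corpus_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average quality score per language, measured on a high-quality corpus."""
    return {lang: mean(scores) for lang, scores in corpus_scores.items() if scores}


def balance(score: float, lang: str, means: Dict[str, float],
            reference: float = 1.0) -> float:
    """Rescale a raw score so that a text scoring at its language's
    average maps onto the user-defined reference value.

    Scores for languages without statistics are returned unchanged.
    """
    lang_mean = means.get(lang)
    if not lang_mean:
        return score
    return score * reference / lang_mean


# Hypothetical corpus statistics for two languages.
means = language_means({"zh": [0.2, 0.4], "en": [0.6, 0.8]})
zh_balanced = balance(0.3, "zh", means, reference=0.5)
en_balanced = balance(0.7, "en", means, reference=0.5)
```

After rescaling, an average-quality text in either language lands on the same reference value, so results from different languages can be ranked in one list.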
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A multilingual text quality assessment method is characterized in that a sentence or chapter is divided into a continuous word pair set by adopting a bigram model, cosine similarity between vectors of two adjacent words in the sentence is calculated pair by pair to obtain semantic quality scores between the two words, and the quality scores of the sentence level and the chapter level are obtained through an average sum algorithm;
the quality score is calculated by the following method:
the bigram model is that the current character is only related to the previous character, and is expressed by the formula:
$$p(s) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_2) \cdots p(w_n \mid w_{n-1});$$
the conditional probabilities are obtained by counting word frequencies;
using semantic similarity between word vectors of characters instead of word frequency, the new formula for calculating conditional probability becomes:
Figure FDA0003080782440000011
the two formulas are combined to obtain the quality score of a sentence; the quality score of the input document is then obtained by averaging over all sentences.
2. The multilingual text quality-assessment method of claim 1, wherein the splitting of a sentence or chapter into a set of consecutive word pairs using a bigram model requires: performing distributed representation of characters on the text corpus at a character level through a neural network language model; capturing context information of the words and the words in sentences by utilizing a sliding window, modeling semantic collocation of the words, and mapping each word into an N-dimensional floating point vector.
3. The multilingual text quality-assessment method of claim 1, wherein said splitting a sentence or chapter into a set of consecutive word pairs using a bigram model entails:
(1) measuring the matching degree between the title and the body content using the maximum repeated substring and semantic similarity: the maximum repeated substring of the title and the body content is computed with the Knuth-Morris-Pratt algorithm to represent how far the wording of the title recurs in the body content; for semantic similarity, vector representations of the title and the body content are obtained by a weighted average of word vectors, and the cosine similarity between the title vector and the body vector represents the semantic similarity of the title and the body;
(2) the average quality score of each language is calculated by large-scale, multi-language and high-quality text corpora respectively, and the score of each language is balanced by setting a reference value by user self-definition.
4. The multilingual text quality-assessment method of claim 3, wherein said degree of matching between the headline and body contents is determined by distributively characterizing the headline and body, and measuring the degree of matching between the headline and body by calculating semantic similarity therebetween; semantic vectors for the headlines and the body are obtained by calculating a weighted average sum of word vectors for the characters.
5. The multilingual text quality-assessment method of claim 4, further comprising:
firstly, the characters in the text are subjected to importance ranking by using a textrank algorithm, and a character calculation formula is as follows:
Figure FDA0003080782440000021
wherein $d$ is the damping coefficient, with a value of 0.85; $\mathrm{In}(W_i)$ is the set of characters pointing to the current character; $\mathrm{Out}(W_j)$ is the set of characters that the current character points to; and $\omega_{ji}$ is the co-occurrence weight of two characters; a text semantic vector is obtained by a weighted average, expressed by the formula:
Figure FDA0003080782440000022
and finally, calculating the cosine similarity of the title and the text to obtain the matching degree.
6. An intelligent text processing system for implementing the multilingual text-quality-assessment method according to any one of claims 1 to 5.
7. A multilingual text quality-evaluating system of the multilingual text quality-evaluating method of claim 1, wherein the multilingual text quality-evaluating system comprises:
the distributed representation acquisition module is used for carrying out distributed representation on the word level through the neural network language model;
the splitting module is used for splitting the sentence or the chapter into a continuous word pair set, calculating the cosine similarity between vectors of two adjacent words in the sentence or the chapter pair by pair to obtain the semantic quality score between the two words, and obtaining the quality scores of a sentence level and a chapter level through an average sum algorithm; the quality score is calculated by the following method:
the bigram model is that the current character is only related to the previous character, and is expressed by the formula:
$$p(s) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_2) \cdots p(w_n \mid w_{n-1});$$
the conditional probabilities are obtained by counting word frequencies;
using semantic similarity between word vectors of characters instead of word frequency, the new formula for calculating conditional probability becomes:
Figure FDA0003080782440000031
the two formulas are combined to obtain the quality score of a sentence; the quality score of the input document is then obtained by averaging over all sentences;
the matching module is used for measuring the matching degree between the title and the text content by utilizing the maximum repeated substring and the semantic similarity;
and the average quality score calculating module is used for calculating the average quality scores of all languages respectively, and setting a reference value by user self-definition to balance the scores of all languages.
8. An intelligent text processing system to which the multilingual text-quality-evaluating system of claim 7 is applied.
CN201810028932.5A 2018-01-12 2018-01-12 Multi-language text quality evaluation method and system and intelligent text processing system Active CN108073571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810028932.5A CN108073571B (en) 2018-01-12 2018-01-12 Multi-language text quality evaluation method and system and intelligent text processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810028932.5A CN108073571B (en) 2018-01-12 2018-01-12 Multi-language text quality evaluation method and system and intelligent text processing system

Publications (2)

Publication Number Publication Date
CN108073571A CN108073571A (en) 2018-05-25
CN108073571B true CN108073571B (en) 2021-08-13

Family

ID=62156722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810028932.5A Active CN108073571B (en) 2018-01-12 2018-01-12 Multi-language text quality evaluation method and system and intelligent text processing system

Country Status (1)

Country Link
CN (1) CN108073571B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920455A (en) * 2018-06-13 2018-11-30 北京信息科技大学 A kind of Chinese automatically generates the automatic evaluation method of text
CN110110969A (en) * 2019-04-10 2019-08-09 中国科学院国家空间科学中心 A kind of space environment forecast product gross examines appraisal procedure and system automatically
CN110232180B (en) * 2019-06-06 2020-11-03 北京师范大学 Automatic proposition method and system for ancient poetry evaluation
CN111061870B (en) * 2019-11-25 2023-06-06 腾讯科技(深圳)有限公司 Article quality evaluation method and device
CN113407663B (en) * 2020-11-05 2024-03-15 腾讯科技(深圳)有限公司 Image-text content quality identification method and device based on artificial intelligence
CN116136839B (en) * 2023-04-17 2023-06-23 湖南正宇软件技术开发有限公司 Method, system and related equipment for generating legal document face manuscript

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104516870A (en) * 2013-09-29 2015-04-15 北大方正集团有限公司 Translation check method and system
CN104699763A (en) * 2015-02-11 2015-06-10 中国科学院新疆理化技术研究所 Text similarity measuring system based on multi-feature fusion
CN106484678A (en) * 2016-10-13 2017-03-08 北京智能管家科技有限公司 A kind of short text similarity calculating method and device
CN107193803A (en) * 2017-05-26 2017-09-22 北京东方科诺科技发展有限公司 A kind of particular task text key word extracting method based on semanteme

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130103695A1 (en) * 2011-10-21 2013-04-25 Microsoft Corporation Machine translation detection in web-scraped parallel corpora

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
基于二元文法模型的汉语句子相似度计算;郜炎峰;王硕宁;《中国科技信息》;20160630(第13期);全文 *
基于内容相似度的文摘自动评测方法及其有效性分析;张姝等;《高技术通讯》;20060331;第16卷(第3期);全文 *
基于语义相似度的中文文本相似度算法研究;金希茜;《信息科技辑》;20111231;全文 *

Also Published As

Publication number Publication date
CN108073571A (en) 2018-05-25

Similar Documents

Publication Publication Date Title
CN108073571B (en) Multi-language text quality evaluation method and system and intelligent text processing system
CN107451126B (en) Method and system for screening similar meaning words
CN106156204B (en) Text label extraction method and device
CN111125349A (en) Graph model text abstract generation method based on word frequency and semantics
CN107180025B (en) Method and device for identifying new words
CN111143549A (en) Method for public sentiment emotion evolution based on theme
CN109960786A (en) Chinese Measurement of word similarity based on convergence strategy
CN107145560B (en) Text classification method and device
CN106610951A (en) Improved text similarity solving algorithm based on semantic analysis
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN107688630B (en) Semantic-based weakly supervised microbo multi-emotion dictionary expansion method
CN114065758A (en) Document keyword extraction method based on hypergraph random walk
CN110705291A (en) Word segmentation method and system for documents in ideological and political education field based on unsupervised learning
Chang et al. A METHOD OF FINE-GRAINED SHORT TEXT SENTIMENT ANALYSIS BASED ON MACHINE LEARNING.
CN111626042A (en) Reference resolution method and device
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment
Liu et al. Extract Product Features in Chinese Web for Opinion Mining.
CN109298796B (en) Word association method and device
CN109710762B (en) Short text clustering method integrating multiple feature weights
CN112182159B (en) Personalized search type dialogue method and system based on semantic representation
Ye et al. A sentiment based non-factoid question-answering framework
CN110765762B (en) System and method for extracting optimal theme of online comment text under big data background
CN116932736A (en) Patent recommendation method based on combination of user requirements and inverted list

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant