CN114153967A - Public opinion classification optimization method for long text - Google Patents

Public opinion classification optimization method for long text

Info

Publication number
CN114153967A
CN114153967A
Authority
CN
China
Prior art keywords
public opinion
text
character
public
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111060615.XA
Other languages
Chinese (zh)
Inventor
唐亮
曹特磊
赵伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Social Touch Beijing Technology Co ltd
Original Assignee
Social Touch Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Social Touch Beijing Technology Co ltd filed Critical Social Touch Beijing Technology Co ltd
Priority to CN202111060615.XA priority Critical patent/CN114153967A/en
Publication of CN114153967A publication Critical patent/CN114153967A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a public opinion classification optimization method for long texts, which comprises the following steps: a. performing public opinion judgment on an input text with a conventionally fine-tuned BERT model and, for text judged to be of neutral opinion, checking whether the text length exceeds a set length threshold; b. if the length does not exceed the threshold, keeping the original public opinion judgment; if it does, performing a finer-grained public opinion analysis; c. feeding the current text to both the pre-trained and the fine-tuned BERT models to obtain the semantic vector of each character in the current text before and after fine-tuning. The application exploits the change in character semantics of a BERT model before and after fine-tuning and applies it to a public opinion classification task for long texts; by identifying the text segments that carry a public opinion tendency, it reduces the probability that the whole text is judged neutral and better identifies the user's detailed public opinion tendencies.

Description

Public opinion classification optimization method for long text
Technical Field
The invention relates to the technical field of text public opinion classification, and in particular to a public opinion classification optimization method for long texts.
Background
When classifying the public opinion of a text with many characters and a long span, the BERT model commonly used in industry often returns a "neutral" judgment. On the one hand, in long content most paragraphs are objective statements with a neutral tendency, and only a small number of text segments expressing a public opinion tendency are mixed in; these are not easy to find even when read manually. On the other hand, when classifying text opinion, the BERT model simply gives a single judgment of opinion tendency for the whole text, which can be regarded as a weighted average of the opinion tendencies across the text; the longer the text, the more the probability of a positive or negative judgment is diluted. As a result, when long texts are classified, the important opinion fragments they carry are ignored and an overall neutral public opinion judgment is given.
Specifically, the BERT model is trained by large companies such as Google, Tencent, and Huawei, using massive accumulated text data and large-scale computing server clusters. By constructing labeled training samples (randomly "masking" a character in a passage of text, using the original correct character as the target value of a positive sample and other randomly chosen characters as the target values of negative samples) and predicting the masked real character with a multi-layer semantic vector model (a deep learning model), it learns the context in which each character usually appears, i.e. the semantics of each character (represented by a floating-point vector of several hundred dimensions). The trained model is called a pre-trained model; it generally covers thousands to tens of thousands of common characters, represents their semantics with floating-point vectors of several hundred dimensions, and typically supports a stacking depth of up to 12 layers.
The pre-trained model serves as the base model for downstream natural language tasks (text classification, named entity recognition, relation extraction, text generation, and the like). A downstream task uses its own small set of training samples to "fine-tune" the pre-trained model, i.e. to adjust the semantic vector value of each character (or combination) through the prediction error, thereby learning the contexts and character semantic relationships of the current task. Additional network layers are added on top of the BERT model to map the semantic vectors derived from the (fine-tuned) BERT model into the solution space of the target problem (for example, in the public opinion classification task, a network layer with three outputs: positive, negative, and neutral). The three-way public opinion judgment then yields three probability values in [0, 1], one each for the positive, negative, and neutral categories. The category with the largest probability value is the public opinion judgment for the current input text, and that value can be regarded as the probability or confidence of the category.
For the BERT model, whether pre-trained or fine-tuned, the data format of the output semantic vectors (an array) is generally as follows: the first vector is the semantic vector of the whole input text (which is also the vector commonly used by the downstream public opinion classification task), and the following N vectors are the semantic vectors of each character of the current input (including unknown characters, placeholder characters, and so on). All N + 1 semantic vectors have the same dimensionality, generally 768.
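The output layout just described (one whole-text vector followed by N per-character vectors, all of the same dimensionality) can be sketched as follows. The array here is random stand-in data rather than real BERT output, and the function name is illustrative:

```python
import numpy as np

DIM = 768  # dimensionality of each semantic vector, as stated above

def split_bert_output(vectors: np.ndarray):
    """Split a BERT output array into the whole-text vector and the
    per-character vectors, following the layout described above:
    row 0 is the semantic vector of the entire input text, and
    rows 1..N are the vectors of its N characters."""
    return vectors[0], vectors[1:]

# Stand-in for the model output on a 5-character input: (N + 1) x 768.
n_chars = 5
output = np.random.rand(n_chars + 1, DIM)

text_vec, char_vecs = split_bert_output(output)
print(text_vec.shape)   # (768,)
print(char_vecs.shape)  # (5, 768)
```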
Experiments show that, in text public opinion classification, after fine-tuning on a small number of opinion samples, the semantic vectors of the characters (or combinations) with a public opinion tendency, among the last N character vectors (excluding unknown, placeholder, and other meaningless characters), change markedly compared with those before fine-tuning (the pre-trained model); that is, their vector distance changes much more than that of the characters (or combinations) without a public opinion tendency.
Exploiting this regularity in a public opinion classification task for long texts, the text segments whose semantic vectors change greatly in distance before and after fine-tuning are extracted from the long text as the segments expressing the author's public opinion tendency, thereby avoiding the overall neutral judgment caused by an excess of characters and the resulting loss of the user's public opinion information.
Disclosure of Invention
The invention aims to: in order to solve the above problems, provide a public opinion classification optimization method for long texts.
In order to achieve the purpose, the invention adopts the following technical scheme:
the public opinion classification optimization method for the long text comprises the following steps:
a. performing public opinion judgment on an input text with a conventionally fine-tuned BERT model and, for text judged to be of neutral opinion, checking whether the text length exceeds a set length threshold;
b. if the length does not exceed the threshold, keeping the original public opinion judgment; if it does, performing a finer-grained public opinion analysis;
c. feeding the current text to both the pre-trained and the fine-tuned BERT models to obtain the semantic vector of each character in the current text before and after fine-tuning;
d. comparing the two vectors of each character to find the characters whose semantic vectors change greatly in distance, i.e. the characters with a public opinion tendency;
e. using the semantic vectors of the fine-tuned model, extracting characters that are adjacent in position to the public opinion characters and close to them in semantic distance, so as to extract semantically complete text segments with a public opinion tendency;
f. classifying the public opinion of the extracted segments with the fine-tuned public opinion model;
g. combining the original text length and the full text's original public opinion score to give the final public opinion judgment.
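The steps a-g above can be sketched as a small driver routine. This is a hedged illustration, not the patented implementation: the classifier and segment extractor are passed in as stand-in callables, and only the control flow of steps a, b, f, and g is shown (the length weighting and normalization follow the formulas given in the description of step g):

```python
from typing import Callable, List, Tuple

Probs = Tuple[float, float, float]  # (negative, neutral, positive)

def classify_long_text(
    text: str,
    classify: Callable[[str], Probs],              # stand-in for the fine-tuned classifier (steps a, f)
    extract_segments: Callable[[str], List[str]],  # stand-in for steps c-e (opinion segments)
    length_threshold: int = 300,                   # step a threshold given in the text
) -> Probs:
    # Step a: whole-text public opinion judgment.
    pn, pm, pp = classify(text)
    is_neutral = pm >= pn and pm >= pp
    # Step b: keep the original judgment for non-neutral or short text.
    if not is_neutral or len(text) <= length_threshold:
        return (pn, pm, pp)
    # Steps c-f: classify each extracted opinion segment.
    n = len(text)
    pns = pms = pps = 0.0
    for seg in extract_segments(text):
        sn, sm, sp = classify(seg)
        w = len(seg) / n          # step g: weight by segment length
        pns += sn * w
        pms += sm * w
        pps += sp * w
    # Step g: accumulate into the original scores and normalize to [0, 1].
    pnr, pmr, ppr = pn + pns, pm + pms, pp + pps
    total = pnr + pmr + ppr
    return (pnr / total, pmr / total, ppr / total)
```

With a toy classifier that calls any long input neutral and any short fragment negative, a 400-character text gains negative mass from its extracted fragment while the three outputs still sum to 1.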
Preferably, the threshold value in step a is 300.
Preferably, the process in step d is as follows:
traversing each character of the input text one by one, respectively taking out semantic vectors of the current character after pre-training and fine-tuning, calculating the cosine distance of the two vectors, comparing with the calculation value of the formula 1, if the value is smaller than the value, determining that the current character has larger semantic change before and after fine-tuning, and determining that the character has public opinion tendency; otherwise, it is regarded as having no public sentiment tendency, formula 1: 1-1/log (N/m) where N is the number of characters of the current text; m is a coefficient, the sensitivity to semantic distance change is adjusted, the current setting is 4, and characters with larger semantic distance change and position indexes of the characters in the text are extracted.
Preferably, the process in step e is as follows:
Expand outward, toward both the left and the right, from the public opinion characters extracted in step d at their positions in the original text. For public opinion character strings whose positions are contiguous, traverse and expand leftward and rightward from the first and last character positions of the string respectively. For each newly traversed character, judge whether it is a punctuation mark or another stop character; if so, stop the expansion on that side. If the traversal length on the current side exceeds the set traversal length threshold, also stop. Otherwise, compute the semantic distance between the newly traversed character and the adjacent character already in the public opinion segment; only the fine-tuned semantic vectors are needed here, again using the vector cosine distance, so a fixed distance threshold can decide whether the newly traversed character should join the current segment. The semantic distance threshold for adjacent characters is currently set to 0.75: if the cosine similarity of the semantics is greater than 0.75, the newly traversed character is considered close in meaning to the adjacent segment characters, or frequently co-occurring with them, and is added to the finally extracted public opinion segment as a fixed collocation; otherwise the character is considered to belong to another semantic segment, unrelated to the meaning of the currently extracted segment, is excluded from it, and the expansion on that side stops.
Preferably, the process in step g is as follows:
Let the original three-way public opinion value of the original text be (Pn, Pm, Pp), where Pn, Pm, Pp are the probabilities of being judged negative, neutral, and positive respectively. If the original text is judged neutral and is a long text, let the public opinion values of the k segments extracted in step e be (Pni, Pmi, Ppi), where Pni, Pmi, Ppi are the probabilities that the i-th segment is judged negative, neutral, and positive respectively, with i ranging over [1, k];
The public opinion values of the k extracted segments, weighted by length and accumulated, are:
(Pns, Pms, Pps) = Σ_{i=1..k} (Pni·Li/N, Pmi·Li/N, Ppi·Li/N)
where Pns, Pms, and Pps are respectively the negative, neutral, and positive public opinion values of the k segments after weighted accumulation, Li is the character length of the i-th segment, and N is the character length of the original text;
These are then accumulated into the original public opinion values of the original text to obtain:
(Pnr,Pmr,Ppr)=(Pn+Pns,Pm+Pms,Pp+Pps)
In order to unify the values into the range [0, 1], they can be normalized as follows:
(Pn, Pm, Pp) = (Pnr/(Pnr+Pmr+Ppr), Pmr/(Pnr+Pmr+Ppr), Ppr/(Pnr+Pmr+Ppr)).
in summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
in the application, the change of character semantics of a bert model before and after fine adjustment is utilized, and the method is applied to a public opinion classification task aiming at long texts; by identifying the text segments with public opinion tendencies, the probability that the whole text segments are judged to be neutral is reduced, and the detailed public opinion tendencies of the users are better identified.
Drawings
FIG. 1 is a schematic flow chart of step a provided in accordance with an embodiment of the present invention;
FIG. 2 is a schematic flow chart of steps b-g provided according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a technical solution:
the public opinion classification optimization method for the long text comprises the following steps:
a. performing public opinion judgment on the input text with a conventionally fine-tuned BERT model and, for text judged to be of neutral opinion, checking whether the text length exceeds a set length threshold, the threshold being 300, i.e. whether the text is longer than 300 characters;
b. if the length does not exceed the threshold, keeping the original public opinion judgment; if it does, performing a finer-grained public opinion analysis;
c. feeding the current text to both the pre-trained and the fine-tuned BERT models to obtain the semantic vector of each character in the current text before and after fine-tuning;
d. comparing the two vectors of each character to find the characters whose semantic vectors change greatly in distance, i.e. the characters with a public opinion tendency:
traverse the characters of the input text one by one; for each character, take its semantic vectors after pre-training and after fine-tuning and compute the cosine distance of the two vectors (its value lies in [0, 1]; the closer to 1, the more similar the two vectors, and the closer to 0, the more dissimilar). Compare it with the value of formula 1; if it is smaller, the character's semantics are considered to have changed greatly before and after fine-tuning, and the character is judged to have a public opinion tendency; otherwise it is considered to have no public opinion tendency. Formula 1: 1 - 1/log(N/m), where N is the number of characters in the current text and m is a coefficient adjusting the sensitivity to semantic distance change, currently set to 4. The characters with large semantic distance change are extracted and indexed together with their positions in the original text, so that they can later be merged by position and expanded toward both ends to obtain semantically complete text segments;
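A minimal sketch of this step d, assuming the per-character vectors are available as numpy arrays. The log base in formula 1 is not specified in the text, so the natural logarithm is assumed here, and the function names are illustrative:

```python
import math
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two semantic vectors (1 = identical
    direction, 0 = orthogonal), as used in the comparison above."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_opinion_chars(pre: np.ndarray, post: np.ndarray, m: float = 4.0):
    """Step d sketch: pre and post are (N, dim) arrays holding each
    character's vector before and after fine-tuning.  A character whose
    cosine value falls below 1 - 1/log(N/m) (formula 1, with m = 4, and
    natural log assumed) is taken to have a public opinion tendency;
    the function returns the position indexes of those characters."""
    n = len(pre)
    threshold = 1.0 - 1.0 / math.log(n / m)
    return [i for i in range(n)
            if cosine_similarity(pre[i], post[i]) < threshold]
```

For a 400-character text, the threshold evaluates to roughly 0.78, so only characters whose vectors rotated substantially during fine-tuning are flagged.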
e. using the semantic vectors of the fine-tuned model, extracting characters that are adjacent in position to the public opinion characters and close to them in semantic distance, so as to extract semantically complete text segments with a public opinion tendency:
expand outward, toward both the left and the right, from the public opinion characters extracted in step d at their positions in the original text; for public opinion character strings whose positions are contiguous, traverse and expand leftward and rightward from the first and last character positions of the string respectively;
for each newly traversed character, judge whether it is a punctuation mark or another stop character; if so, stop the expansion on that side; if the traversal length on the current side exceeds the set traversal length threshold (for example, 8), also stop; otherwise, compute the semantic distance between the newly traversed character and the adjacent character already in the public opinion segment;
here only the fine-tuned semantic vectors are needed, again using the vector cosine distance, so a fixed distance threshold can decide whether the newly traversed character should join the current segment; the semantic distance threshold for adjacent characters is currently set to 0.75;
if the cosine similarity of the semantics is greater than 0.75, the newly traversed character is considered close in meaning to the adjacent segment characters, or frequently co-occurring with them, and is added to the finally extracted public opinion segment as a fixed collocation; otherwise the character is considered to belong to another semantic segment, unrelated to the meaning of the currently extracted segment, is excluded from it, and the expansion on that side stops.
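The expansion procedure of step e can be sketched as follows, assuming fine-tuned per-character vectors in a numpy array. The stop-character set is illustrative, while the 0.75 similarity threshold and the traversal cap of 8 come from the text:

```python
import numpy as np

STOP_CHARS = set("，。！？；：、,.!?;: \n")  # punctuation/stop characters (illustrative set)

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand_segment(text: str, vectors: np.ndarray, start: int, end: int,
                   sim_threshold: float = 0.75, max_steps: int = 8) -> str:
    """Step e sketch: grow the contiguous opinion run text[start:end]
    leftward and rightward.  Expansion on a side stops at a stop
    character, after max_steps characters, or when the newly met
    character's cosine similarity to the adjacent in-segment character
    (fine-tuned vectors only) does not exceed sim_threshold."""
    left, steps = start, 0
    while left > 0 and steps < max_steps:
        if text[left - 1] in STOP_CHARS or \
           cos(vectors[left - 1], vectors[left]) <= sim_threshold:
            break
        left -= 1
        steps += 1
    right, steps = end, 0
    while right < len(text) and steps < max_steps:
        if text[right] in STOP_CHARS or \
           cos(vectors[right], vectors[right - 1]) <= sim_threshold:
            break
        right += 1
        steps += 1
    return text[left:right]
```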
f. classifying the public opinion of the extracted segments with the fine-tuned public opinion model;
g. combining the original text length and the full text's original public opinion score to give the final public opinion judgment:
Let the original three-way public opinion value of the original text be (Pn, Pm, Pp), where Pn, Pm, Pp are the probabilities of being judged negative, neutral, and positive respectively. If the original text is judged neutral and is a long text, let the public opinion values of the k (k ≥ 0) segments extracted in step e be (Pni, Pmi, Ppi), where Pni, Pmi, Ppi are the probabilities that the i-th segment is judged negative, neutral, and positive respectively, with i ranging over [1, k].
The public opinion values of the k extracted segments, weighted by length and accumulated, are:
(Pns, Pms, Pps) = Σ_{i=1..k} (Pni·Li/N, Pmi·Li/N, Ppi·Li/N)
where Pns, Pms, and Pps are respectively the negative, neutral, and positive public opinion values of the k segments after weighted accumulation, Li is the character length of the i-th segment, and N is the character length of the original text.
These are then accumulated into the original public opinion values of the original text to obtain:
(Pnr, Pmr, Ppr) = (Pn + Pns, Pm + Pms, Pp + Pps)
In order to unify the values into the range [0, 1], they can be normalized as follows:
(Pn, Pm, Pp) = (Pnr/(Pnr+Pmr+Ppr), Pmr/(Pnr+Pmr+Ppr), Ppr/(Pnr+Pmr+Ppr)).
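The length-weighted accumulation and normalization of step g can be sketched as a small function; the input format (a list of per-segment score triples paired with segment lengths) is an assumption for illustration:

```python
from typing import List, Tuple

Probs = Tuple[float, float, float]  # (negative, neutral, positive)

def combine_scores(original: Probs,
                   segments: List[Tuple[Probs, int]],
                   n_chars: int) -> Probs:
    """Step g sketch: original holds (Pn, Pm, Pp) for the full text,
    segments holds ((Pni, Pmi, Ppi), Li) for each of the k extracted
    opinion fragments, and n_chars is N, the character length of the
    original text.  Returns the length-weighted, accumulated, and
    normalized (Pn, Pm, Pp) triple."""
    # Length-weighted accumulation over the k segments.
    pns = sum(p[0] * li / n_chars for p, li in segments)
    pms = sum(p[1] * li / n_chars for p, li in segments)
    pps = sum(p[2] * li / n_chars for p, li in segments)
    # Accumulate into the original full-text scores.
    pnr = original[0] + pns
    pmr = original[1] + pms
    ppr = original[2] + pps
    # Normalize so the three values again lie in [0, 1] and sum to 1.
    total = pnr + pmr + ppr
    return (pnr / total, pmr / total, ppr / total)
```

For instance, a 400-character text judged (0.2, 0.6, 0.2) that contains one strongly negative 100-character fragment shifts toward negative after combination, while the outputs still sum to 1.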
experimental analysis:
A data comparison experiment shows that, in public opinion classification of long texts, this strategy gives an overall public opinion value with better discrimination by analyzing the opinion segments contained in the text; and, by returning the specific opinion segments it contains, the opinion tendency and other information expressed by the user can be better identified.
An example of partial data (truncated for length) is as follows:
[Example data table not reproduced here: it appears as images in the original publication.]
It can be seen that the original public opinion of the original text in the above example is biased toward neutral (the neutral value is the largest); however, after the detailed opinion segments are extracted by this method and their opinion values are weighted and accumulated into the final result, the overall public opinion value and judgment show a clearer opinion tendency (the positive or negative value becomes larger). Moreover, the extracted opinion segments allow the user's detailed public opinion tendencies to be better mined, enriching the result dimensions of data insight.
The previous description of the embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. A public opinion classification optimization method for long text, characterized by comprising the following steps:
a. performing public opinion judgment on an input text with a conventionally fine-tuned BERT model and, for text judged to be of neutral opinion, checking whether the text length exceeds a set length threshold;
b. if the length does not exceed the threshold, keeping the original public opinion judgment; if it does, performing a finer-grained public opinion analysis;
c. feeding the current text to both the pre-trained and the fine-tuned BERT models to obtain the semantic vector of each character in the current text before and after fine-tuning;
d. comparing the two vectors of each character to find the characters whose semantic vectors change greatly in distance, i.e. the characters with a public opinion tendency;
e. using the semantic vectors of the fine-tuned model, extracting characters that are adjacent in position to the public opinion characters and close to them in semantic distance, so as to extract semantically complete text segments with a public opinion tendency;
f. classifying the public opinion of the extracted segments with the fine-tuned public opinion model;
g. combining the original text length and the full text's original public opinion score to give the final public opinion judgment.
2. The method as claimed in claim 1, wherein the threshold in step a is 300.
3. The method for optimizing the public opinion classification of the long text according to claim 1, wherein the process in step d is as follows: traverse the characters of the input text one by one; for each character, take its semantic vectors after pre-training and after fine-tuning and compute the cosine distance of the two vectors; compare it with the value of formula 1, and if it is smaller, consider that the character's semantics changed greatly before and after fine-tuning and judge the character to have a public opinion tendency; otherwise, consider it to have no public opinion tendency;
formula 1 is specifically: 1 - 1/log(N/m), where N is the number of characters in the current text and m is a coefficient adjusting the sensitivity to semantic distance change, currently set to 4; the characters with larger semantic distance change, together with their position indexes in the text, are extracted.
4. The method as claimed in claim 1, wherein the process in step e is as follows:
expand outward, toward both the left and the right, from the public opinion characters extracted in step d at their positions in the original text; for public opinion character strings whose positions are contiguous, traverse and expand leftward and rightward from the first and last character positions of the string respectively; for each newly traversed character, judge whether it is a punctuation mark or another stop character, and if so, stop the expansion on that side;
if the traversal length on the current side exceeds the set traversal length threshold, also stop; otherwise, compute the semantic distance between the newly traversed character and the adjacent character already in the public opinion segment, using the vector cosine distance, and judge with a fixed distance threshold whether the newly traversed character should join the current segment;
the semantic distance threshold for adjacent characters is set to 0.75: if the cosine similarity of the semantics is greater than 0.75, the newly traversed character is considered close in meaning to the adjacent segment characters, or frequently co-occurring with them, and is added to the finally extracted public opinion segment as a fixed collocation; otherwise the character is considered to belong to another semantic segment, unrelated to the meaning of the currently extracted segment, is excluded from it, and the expansion on that side stops.
5. The method for optimizing the public opinion classification of the long text according to claim 1, wherein the process in step g is as follows:
let the original three-way public opinion value of the original text be (Pn, Pm, Pp), where Pn, Pm, Pp are the probabilities of being judged negative, neutral, and positive respectively; if the original text is judged neutral and is a long text, let the public opinion values of the k segments extracted in step e be (Pni, Pmi, Ppi), where Pni, Pmi, Ppi are the probabilities that the i-th segment is judged negative, neutral, and positive respectively, with i ranging over [1, k];
the public opinion values of the k extracted segments, weighted by length and accumulated, are:
(Pns, Pms, Pps) = Σ_{i=1..k} (Pni·Li/N, Pmi·Li/N, Ppi·Li/N)
where Pns, Pms, and Pps are respectively the negative, neutral, and positive public opinion values of the k segments after weighted accumulation, Li is the character length of the i-th segment, and N is the character length of the original text;
these are then accumulated into the original public opinion values of the original text to obtain:
(Pnr, Pmr, Ppr) = (Pn + Pns, Pm + Pms, Pp + Pps)
in order to unify the values into the range [0, 1], they can be normalized as follows:
(Pn, Pm, Pp) = (Pnr/(Pnr+Pmr+Ppr), Pmr/(Pnr+Pmr+Ppr), Ppr/(Pnr+Pmr+Ppr)).
CN202111060615.XA 2021-09-10 2021-09-10 Public opinion classification optimization method for long text Pending CN114153967A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111060615.XA CN114153967A (en) 2021-09-10 2021-09-10 Public opinion classification optimization method for long text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111060615.XA CN114153967A (en) 2021-09-10 2021-09-10 Public opinion classification optimization method for long text

Publications (1)

Publication Number Publication Date
CN114153967A true CN114153967A (en) 2022-03-08

Family

ID=80462796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111060615.XA Pending CN114153967A (en) 2021-09-10 2021-09-10 Public opinion classification optimization method for long text

Country Status (1)

Country Link
CN (1) CN114153967A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073646A (en) * 2009-11-23 2011-05-25 北京科技大学 Blog group-oriented subject propensity processing method and system
CN108959268A (en) * 2018-07-20 2018-12-07 科大讯飞股份有限公司 A kind of text emotion analysis method and device
CN111539212A (en) * 2020-04-13 2020-08-14 腾讯科技(武汉)有限公司 Text information processing method and device, storage medium and electronic equipment
CN111984793A (en) * 2020-09-03 2020-11-24 平安国际智慧城市科技股份有限公司 Text emotion classification model training method and device, computer equipment and medium
CN112307771A (en) * 2020-10-29 2021-02-02 平安科技(深圳)有限公司 Course analysis method, device, equipment and medium based on emotion analysis
US20210192141A1 (en) * 2019-12-20 2021-06-24 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating vector representation of text, and related computer device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨玮祺; 杜晔: "TextCGA: a text classification network based on pre-trained models", Modern Computer (现代计算机), no. 12, 25 April 2020 *
王昆; 郑毅; 方书雅; 刘守印: "Aspect-level sentiment analysis of long text based on text filtering and improved BERT", Journal of Computer Applications (计算机应用), no. 10, 8 June 2020 *

Similar Documents

Publication Publication Date Title
WO2023065544A1 (en) Intention classification method and apparatus, electronic device, and computer-readable storage medium
CN106372061B (en) Short text similarity calculation method based on semantics
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
CN108984526A (en) A kind of document subject matter vector abstracting method based on deep learning
CN107562772B (en) Event extraction method, device, system and storage medium
CN108027814B (en) Stop word recognition method and device
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN103678684A (en) Chinese word segmentation method based on navigation information retrieval
CN104199972A (en) Named entity relation extraction and construction method based on deep learning
WO2004066090A2 (en) Query string matching method and apparatus
CN115630640B (en) Intelligent writing method, device, equipment and medium
CN110083832B (en) Article reprint relation identification method, device, equipment and readable storage medium
CN103646112A (en) Dependency parsing field self-adaption method based on web search
CN109492230B (en) Method for extracting insurance contract key information based on interested text field convolutional neural network
WO2020232898A1 (en) Text classification method and apparatus, electronic device and computer non-volatile readable storage medium
CN108614897B (en) Content diversification searching method for natural language
CN112069312B (en) Text classification method based on entity recognition and electronic device
WO2023065642A1 (en) Corpus screening method, intention recognition model optimization method, device, and storage medium
CN110807324A (en) Video entity identification method based on IDCNN-crf and knowledge graph
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
CN114997288A (en) Design resource association method
CN107451116B (en) Statistical analysis method for mobile application endogenous big data
CN117113982A (en) Big data topic analysis method based on embedded model
CN104794209A (en) Chinese microblog sentiment classification method and system based on Markov logic network
CN110569355A (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination