CN117150024A - Cross-domain fine granularity emotion analysis method, system, equipment and storage medium - Google Patents

Cross-domain fine granularity emotion analysis method, system, equipment and storage medium

Info

Publication number
CN117150024A
CN117150024A (Application No. CN202311409024.8A)
Authority
CN
China
Prior art keywords
domain
attribute
data set
target
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311409024.8A
Other languages
Chinese (zh)
Inventor
王志强
鞠磊
宫永广
刘飚
吕修伟
文津
张珂
于欣月
罗乐琦
薛培阳
余酋龙
庞舒方
倪安发
张颖
陈旭东
肖子龙
周武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ELECTRONIC SCIENCE AND TECHNOLOGY INSTITUTE
Original Assignee
BEIJING ELECTRONIC SCIENCE AND TECHNOLOGY INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING ELECTRONIC SCIENCE AND TECHNOLOGY INSTITUTE filed Critical BEIJING ELECTRONIC SCIENCE AND TECHNOLOGY INSTITUTE
Priority to CN202311409024.8A priority Critical patent/CN117150024A/en
Publication of CN117150024A publication Critical patent/CN117150024A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a cross-domain fine-grained emotion analysis method, system, equipment and storage medium, relating to the technical field of big data. The method comprises the following steps: extracting a source-domain-specific attribute dataset and a target-domain-specific attribute dataset from a labeled source-domain comment dataset and an unlabeled target-domain comment dataset; generating domain-independent sentence structure set data from the labeled source-domain comment dataset and the source-domain-specific attribute dataset; generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model; optimizing the pre-trained language model with the unlabeled target-domain comment dataset and the labeled target-domain comment dataset to obtain a fine-grained emotion analysis model; and performing fine-grained emotion analysis on the target-domain-specific attribute dataset with the fine-grained emotion analysis model to obtain target-domain comments. With this method, target-domain comments with dependency relationships between words are generated even when a large amount of fine-grained labeled data is not available.

Description

Cross-domain fine granularity emotion analysis method, system, equipment and storage medium
Technical Field
The invention relates to the technical field of big data, and in particular to a cross-domain fine-grained emotion analysis method, system, equipment and storage medium.
Background
In recent years, attribute-level emotion analysis (aspect-based sentiment analysis, ABSA) has received increasing attention. With the progress of deep learning, a number of unsupervised domain adaptation methods have been proposed. These methods aim to learn domain-shared knowledge from a source domain with fine-grained labeled data and to apply the learned knowledge to a target domain with unlabeled data, and they can be divided into two types: one focuses on learning feature-based domain-shared knowledge, and the other re-weights instances in the source domain and uses them in the target domain.
Because of the challenges in fine-grained adaptation, only a few studies have explored domain adaptation for ABSA. However, these studies still follow the two approaches described above and share common limitations. For example, the supervision knowledge of the main task comes only from the fine-grained labeled data in the source domain, so these approaches still require large amounts of fine-grained labeled data, which are scarce in many new domains. In addition, the process of generating target-domain comments is simultaneous and one-shot, so the relationships between the words of a generated target-domain comment are independent and isolated, and the generated sentences therefore do not represent the dependency relationships between the words of target-domain comments well.
Disclosure of Invention
The problem to be solved by the invention is how to generate target-domain comments with dependency relationships between words when a large amount of fine-grained labeled data is not available.
To solve this problem, the invention provides a cross-domain fine-grained emotion analysis method, which comprises the following steps:
acquiring a labeled source-domain comment dataset and an unlabeled target-domain comment dataset, extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset, and obtaining a target-domain comment data test set from the unlabeled target-domain comment dataset;
generating domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset;
generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model;
optimizing the pre-trained language model with the unlabeled target-domain comment dataset and the labeled target-domain comment dataset to obtain a fine-grained emotion analysis model;
and performing fine-grained emotion analysis on the target-domain comment data test set with the fine-grained emotion analysis model to obtain target-domain comments.
Compared with the prior art, the invention does not require a large amount of fine-grained labeled data: data analysis is performed only on the existing labeled source-domain comment dataset in combination with the unlabeled target-domain comment dataset, which fundamentally alleviates the lack of fine-grained labeled data and reduces the dependence on it. The domain-independent sentence structure set data is generated from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset, and a labeled target-domain comment dataset is generated in combination with a pre-trained language model, directly taking the generation of target-domain comment data with attached labeling information as the goal, which addresses the scarcity of fine-grained labeled data. The pre-trained language model is then optimized to obtain a fine-grained emotion analysis model; through this optimization, carried out as multi-step prediction, the relationships between generated words are no longer independent and isolated, and the dependency relationships between the words of the target-domain comments are improved, so that the method can provide more accurate supervision for the target domain and the target-domain comments are analyzed more accurately.
Optionally, the extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset includes:
extracting all the modified-word attributes and modifier attributes in the labeled source-domain comment dataset and the unlabeled target-domain comment dataset;
and removing the modified-word attributes and modifier attributes shared by the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain the source-domain-specific attribute dataset and the target-domain-specific attribute dataset.
Optionally, the removing the modified-word attributes and modifier attributes shared by the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain the source-domain-specific attribute dataset and the target-domain-specific attribute dataset includes:
obtaining, from all the modified-word attributes and modifier attributes in the labeled source-domain comment dataset and the unlabeled target-domain comment dataset, a source-domain modified-word attribute set and a source-domain modifier attribute set, and a target-domain modified-word attribute set and a target-domain modifier attribute set, respectively;
screening the source-domain modified-word attribute set against the target-domain modified-word attribute set to obtain the common modified-word attributes and generate a common modified-word attribute set;
screening the source-domain modifier attribute set against the target-domain modifier attribute set to obtain the common modifier attributes and generate a common modifier attribute set;
removing, according to the common modified-word attribute set and the common modifier attribute set, the corresponding modified-word attributes and modifier attributes from the source-domain modified-word attribute set and the source-domain modifier attribute set to obtain the source-domain-specific attribute dataset;
and removing, according to the common modified-word attribute set and the common modifier attribute set, the corresponding modified-word attributes and modifier attributes from the target-domain modified-word attribute set and the target-domain modifier attribute set to obtain the target-domain-specific attribute dataset.
Optionally, the generating the domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset includes:
replacing, according to the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset, all original attributes that appear in the source-domain-specific attribute dataset with a special mask attribute to generate the domain-independent sentence structure set data.
Optionally, the construction process of the pre-trained language model includes:
acquiring a target-domain unlabeled dataset and a masked language model;
and initially training an original pre-trained language model with the target-domain unlabeled dataset and the masked language model to obtain the pre-trained language model.
Optionally, the generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model includes:
inputting an initial sentence structure in the domain-independent sentence structure set data into the pre-trained language model to obtain, for each special mask attribute in the initial sentence structure, a probability list over other attributes;
obtaining, from the probability list corresponding to a special mask attribute, the probability of every other attribute appearing at the position of that special mask attribute, and judging, through the target-domain-specific attribute dataset, whether the other attribute is a modified-word attribute or a modifier attribute;
replacing, according to the judgment result, the special mask attribute in the domain-independent sentence structure of the domain-independent sentence structure set data with the other attribute to obtain a new domain-independent sentence structure;
and inputting the new domain-independent sentence structure into the pre-trained language model and repeating the replacement operation until all the special mask attributes in the domain-independent sentence structure have been replaced with other attributes, and generating the labeled target-domain comment dataset from all the replaced domain-independent sentence structures.
Optionally, the replacing, according to the judgment result, the special mask attribute in the domain-independent sentence structure with the other attribute to obtain a new domain-independent sentence structure includes:
acquiring a target-domain-specific modified-word attribute set and a target-domain-specific modifier attribute set;
if the other attribute is a modified-word attribute, selecting, according to the probability list, the modified-word attribute with the highest probability among those present in the target-domain-specific modified-word attribute set to replace the special mask attribute at its position;
if the other attribute is a modifier attribute, judging the emotion polarity of the modifier attribute according to the labeling information, the emotion polarity being positive, negative or neutral;
if the polarity is positive, selecting the modifier attribute with the highest probability from the positive-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position;
if the polarity is negative, selecting the modifier attribute with the highest probability from the negative-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position;
and if the polarity is neutral, selecting the modifier attribute with the highest probability from the neutral-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position.
To solve the above problems, the present invention further provides a cross-domain fine-grained emotion analysis system, comprising: a domain-specific attribute extraction module, a domain-independent sentence structure generation module, a cross-domain comment generation module, a pre-trained language model fine-tuning module and a cross-domain emotion analysis module;
the domain-specific attribute extraction module is used for acquiring a labeled source-domain comment dataset and an unlabeled target-domain comment dataset, extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset, and obtaining a target-domain comment data test set from the unlabeled target-domain comment dataset;
the domain-independent sentence structure generation module is used for generating domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset;
the cross-domain comment generation module is used for generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model;
the pre-trained language model fine-tuning module is used for optimizing the pre-trained language model with the unlabeled target-domain comment dataset and the labeled target-domain comment dataset to obtain a fine-grained emotion analysis model;
and the cross-domain emotion analysis module is used for performing fine-grained emotion analysis on the target-domain comment data test set with the fine-grained emotion analysis model to obtain target-domain comments.
The advantages of the cross-domain fine-grained emotion analysis system over the prior art are the same as those of the cross-domain fine-grained emotion analysis method described above, and are not repeated here.
To solve the above problems, the present invention further provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any one of the above cross-domain fine-grained emotion analysis methods when executing the computer program.
The advantages of the computer device over the prior art are the same as those of the cross-domain fine-grained emotion analysis method described above, and are not repeated here.
To solve the above problems, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the above cross-domain fine-grained emotion analysis methods.
The advantages of the computer-readable storage medium over the prior art are the same as those of the cross-domain fine-grained emotion analysis method described above, and are not repeated here.
Drawings
FIG. 1 is a schematic flow chart of a cross-domain fine granularity emotion analysis method in an embodiment of the invention;
FIG. 2 is a schematic flow chart of a cross-domain fine granularity emotion analysis system in an embodiment of the invention;
fig. 3 is an internal structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
To solve the above problems, and with reference to FIG. 1, the invention provides a cross-domain fine-grained emotion analysis method, comprising:
Step 110, acquiring a labeled source-domain comment dataset and an unlabeled target-domain comment dataset, extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset, and obtaining a target-domain comment data test set from the unlabeled target-domain comment dataset;
Specifically, the method starts from given source-domain and target-domain comment datasets, namely a labeled source-domain comment dataset and an unlabeled target-domain comment dataset. Domain-specific attributes refer to the words, phrases, syntactic structures and expression styles that appear only in the source domain or only in the target domain; this embodiment considers only words and patterns.
Step 120, generating domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset;
Specifically, for each comment sentence in the source-domain comment dataset, each of its attributes is traversed, and every attribute that appears in the source-domain-specific attribute dataset is replaced with the special token "[MASK]" (the special mask attribute), thereby obtaining the set of domain-independent sentence structures of the source domain, namely the domain-independent sentence structure set data.
Step 130, generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model;
Specifically, the pre-trained language model BERT is a language model pre-trained without supervision for natural language processing tasks and can remarkably improve the accuracy of such tasks; the labeled target-domain comment dataset is generated by inputting the domain-independent sentence structure set data into the pre-trained language model.
Step 140, optimizing the pre-trained language model with the unlabeled target-domain comment dataset and the labeled target-domain comment dataset to obtain a fine-grained emotion analysis model;
Specifically, in this embodiment the model is optimized multiple times through multi-step prediction, which improves the coherence of and the dependencies within the generated comment sentences so that they better conform to language habits. During model optimization the Adam optimizer is adopted, and the learning rate, dropout rate and batch size are set to 5e-5, 0.1 and 32, respectively. A fully trained model is obtained through continuous training and used as the final emotion analysis model, namely the fine-grained emotion analysis model.
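As an illustration of the fine-tuning configuration described above (Adam optimizer, learning rate 5e-5, dropout rate 0.1, batch size 32), a minimal PyTorch sketch is given below; the checkpoint name "bert-base-uncased", the three-way polarity head, the number of epochs and the structure of the training batches are assumptions made for illustration only and are not specified by this embodiment.

```python
# Hedged sketch of the fine-tuning setup with the hyperparameters stated above.
import torch
from torch.utils.data import DataLoader
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",              # assumed checkpoint; the embodiment only says "BERT"
    num_labels=3,                     # positive / negative / neutral polarity (assumed head)
    hidden_dropout_prob=0.1,          # dropout rate 0.1
    attention_probs_dropout_prob=0.1,
)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # Adam, learning rate 5e-5

def fine_tune(train_dataset, epochs=3):
    """Plain training loop; train_dataset is assumed to yield dicts with
    input_ids, attention_mask and labels tensors."""
    loader = DataLoader(train_dataset, batch_size=32, shuffle=True)   # batch size 32
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch).loss   # Hugging Face models return a loss when labels are given
            loss.backward()
            optimizer.step()
    return model
```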
Step 150, performing fine-grained emotion analysis on the target-domain comment data test set with the fine-grained emotion analysis model to obtain the target-domain comments.
Specifically, the trained model, namely the fine-grained emotion analysis model, performs the corresponding emotion analysis task on the target-domain comment data test set, and the performance of the model is evaluated as follows: the Micro-F1 score is used as the performance measure, each group of experiments is run 5 times, the 5 results are recorded separately, and their average is taken as the final experimental result.
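A possible sketch of this evaluation protocol (Micro-F1 averaged over 5 runs) using scikit-learn's f1_score is shown below; the run_experiment callable stands in for one complete train-and-predict cycle and is an assumption of this illustration.

```python
# Hedged sketch of the evaluation protocol: Micro-F1 as the measure, 5 runs averaged.
from statistics import mean
from sklearn.metrics import f1_score

def micro_f1(y_true, y_pred):
    return f1_score(y_true, y_pred, average="micro")   # Micro-F1 score

def repeated_evaluation(run_experiment, n_runs=5):
    """run_experiment() is assumed to train the model once and return (y_true, y_pred)
    on the target-domain comment data test set."""
    scores = [micro_f1(*run_experiment()) for _ in range(n_runs)]
    return scores, mean(scores)   # record the 5 results and take their average as the final result
```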
In particular, this embodiment obtains the source-domain-specific attribute dataset by extracting from the labeled source-domain comment dataset and the unlabeled target-domain comment dataset. Compared with the prior art, this embodiment does not require a large amount of fine-grained labeled data: data analysis is performed only on the existing labeled source-domain comment dataset in combination with the unlabeled target-domain comment dataset, which fundamentally alleviates the lack of fine-grained labeled data and reduces the dependence on it. The domain-independent sentence structure set data is generated from the labeled source-domain comment dataset and the source-domain-specific attribute dataset, and the labeled target-domain comment dataset is generated in combination with the pre-trained language model, directly taking the generation of target-domain comment data with attached labeling information as the goal, which addresses the scarcity of fine-grained labeled data. The pre-trained language model is optimized to obtain the fine-grained emotion analysis model; through multi-step prediction, the relationships between generated words are no longer independent and isolated, and the dependency relationships within target-domain comments are improved, so that the method of this embodiment can provide more accurate supervision for the target domain and analyze the target-domain comments more accurately.
Optionally, the extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset includes:
extracting all the modified-word attributes and modifier attributes in the labeled source-domain comment dataset and the unlabeled target-domain comment dataset;
and removing the modified-word attributes and modifier attributes shared by the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain the source-domain-specific attribute dataset and the target-domain-specific attribute dataset.
Specifically, the domain-specific attributes are divided, according to their function in a sentence, into modified-word attributes (aspect terms) and modifier attributes (opinion terms). An aspect term is a word in a sentence that denotes an object, aspect or attribute and can be understood as the object described by a modifier; an opinion term is a word used to modify an aspect term and to express opinion, emotion and other information about it. This embodiment obtains the source-domain-specific attribute dataset by removing the aspect terms and opinion terms shared by the source domain and the target domain, and it can be understood that the target-domain-specific attribute dataset can be obtained in the same manner.
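For illustration only, a labeled source-domain comment could be represented as below; the sentence and the field names are invented for this example and are not taken from this embodiment.

```python
# Hypothetical labeled comment illustrating the aspect-term / opinion-term distinction.
labeled_comment = {
    "sentence": "The battery life is excellent.",
    "aspect_term": "battery life",   # modified-word attribute: the object being described
    "opinion_term": "excellent",     # modifier attribute: the word expressing the opinion
    "polarity": "positive",          # labeling information carried by the source domain
}
```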
Optionally, the removing the modified-word attributes and modifier attributes shared by the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain the source-domain-specific attribute dataset and the target-domain-specific attribute dataset includes:
obtaining, from all the modified-word attributes and modifier attributes in the labeled source-domain comment dataset and the unlabeled target-domain comment dataset, a source-domain modified-word attribute set and a source-domain modifier attribute set, and a target-domain modified-word attribute set and a target-domain modifier attribute set, respectively;
screening the source-domain modified-word attribute set against the target-domain modified-word attribute set to obtain the common modified-word attributes and generate a common modified-word attribute set;
screening the source-domain modifier attribute set against the target-domain modifier attribute set to obtain the common modifier attributes and generate a common modifier attribute set;
removing, according to the common modified-word attribute set and the common modifier attribute set, the corresponding modified-word attributes and modifier attributes from the source-domain modified-word attribute set and the source-domain modifier attribute set to obtain the source-domain-specific attribute dataset;
and removing, according to the common modified-word attribute set and the common modifier attribute set, the corresponding modified-word attributes and modifier attributes from the target-domain modified-word attribute set and the target-domain modifier attribute set to obtain the target-domain-specific attribute dataset.
Specifically, the common modified-word attribute set (common aspect term list) and the common modifier attribute set (common opinion term list) are found by the above method, and the source-domain-specific attribute dataset is then obtained; in this way the common data and the characteristic data are separated, preparing for the subsequent replacement with the special mask attribute, and the target-domain-specific attribute dataset can be obtained in the same way. The method of this embodiment can analyze the attributes of an unlabeled comment dataset and then label it, which fundamentally alleviates the problem of insufficient data so that the data can be used for fine-grained emotion analysis. For the source-domain dataset, which carries labeling information, all aspect terms and opinion terms in the source domain are extracted directly according to the labeling information; for the target-domain dataset, which carries no labeling information, Double Propagation may be used in this embodiment to extract all aspect terms and opinion terms of that domain. All aspect terms and opinion terms that occur both in the source domain and in the target domain, i.e. the aspect terms and opinion terms common to the two domains, are found, and these common aspect terms and opinion terms are then removed, respectively, to obtain the aspect term list and opinion term list specific to the source domain and the aspect term list and opinion term list specific to the target domain, as sketched below.
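A minimal sketch of this set-difference step, assuming the aspect terms (or opinion terms) of each domain have already been collected, from the labels in the source domain and, for example, by Double Propagation in the target domain; the function and variable names are assumptions of this illustration.

```python
# Hedged sketch: remove the terms shared by the two domains to obtain domain-specific lists.
def domain_specific_terms(source_terms, target_terms):
    """source_terms / target_terms: sets of aspect terms (or opinion terms) of one domain."""
    common = source_terms & target_terms        # common aspect/opinion term list
    source_specific = source_terms - common     # terms occurring only in the source domain
    target_specific = target_terms - common     # terms occurring only in the target domain
    return source_specific, target_specific, common

# Toy usage (purely illustrative term sets):
src_aspects = {"waiter", "service", "battery life"}
tgt_aspects = {"screen", "service", "battery life"}
src_only, tgt_only, shared = domain_specific_terms(src_aspects, tgt_aspects)
# src_only == {"waiter"}, tgt_only == {"screen"}, shared == {"service", "battery life"}
```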
Optionally, the generating the domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset includes:
replacing, according to the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset, all original attributes that appear in the source-domain-specific attribute dataset with a special mask attribute to generate the domain-independent sentence structure set data.
Specifically, for each source-domain comment sentence, each token (attribute) of the comment is traversed and it is determined whether the token exists in the source-domain-specific aspect term list or opinion term list; if so, the special token "[MASK]" (the special mask attribute) replaces the token; otherwise nothing is done and the next token is traversed, until the whole comment sentence has been traversed. After every piece of comment data in the source domain has been traversed, the set of domain-independent sentence structures of the source domain is obtained, in which the number of sentence structures equals the number of comments in the source-domain comment dataset, and each domain-independent sentence structure carries the labeling information of its original sentence.
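The token-by-token masking just described might look like the following sketch; whitespace tokenization and the single-token treatment of terms are simplifying assumptions of this illustration.

```python
# Hedged sketch: replace source-domain-specific aspect/opinion tokens with "[MASK]".
def to_domain_independent(sentence_tokens, source_specific_terms):
    """sentence_tokens: token list of one source-domain comment;
    source_specific_terms: set of source-domain-specific aspect and opinion terms."""
    structure = []
    for token in sentence_tokens:
        if token in source_specific_terms:
            structure.append("[MASK]")   # the special mask attribute replaces the token
        else:
            structure.append(token)      # otherwise the token is kept unchanged
    return structure

# Illustrative usage; the labeling information of the original sentence is kept alongside.
tokens = "the waiter was friendly".split()
print(to_domain_independent(tokens, {"waiter", "friendly"}))
# ['the', '[MASK]', 'was', '[MASK]']
```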
Optionally, the construction process of the pre-trained language model includes:
acquiring a target-domain unlabeled dataset and a masked language model;
and initially training an original pre-trained language model with the target-domain unlabeled dataset and the masked language model to obtain the pre-trained language model.
Specifically, the original pre-trained language model is trained with multi-step prediction based on a masked language model, and the resulting pre-trained language model is used for cross-domain comment generation, so that the generated comments have higher coherence and relevance; at the same time, the performance of the model in fine-grained cross-domain emotion analysis can be improved.
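One possible sketch of this initial (continued) masked-language-model training on the unlabeled target-domain comments uses the Hugging Face transformers Trainer; the checkpoint name, the number of epochs and the list-of-dicts dataset wrapper are assumptions, and the multi-step prediction itself is applied later, at generation time.

```python
# Hedged sketch: continue MLM pre-training of BERT on unlabeled target-domain comments.
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")   # assumed checkpoint
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

def encode(comments):
    """comments: list of raw unlabeled target-domain comment strings."""
    enc = tokenizer(comments, truncation=True, max_length=128)
    return [{"input_ids": ids, "attention_mask": am}
            for ids, am in zip(enc["input_ids"], enc["attention_mask"])]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

def continue_pretraining(target_domain_comments):
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="mlm_target_domain",
                               num_train_epochs=3,                 # assumed
                               per_device_train_batch_size=32),
        data_collator=collator,            # applies random masking for the MLM objective
        train_dataset=encode(target_domain_comments),
    )
    trainer.train()
    return model
```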
Optionally, the generating the labeled target-domain comment dataset from the domain-independent sentence structure set data and the pre-trained language model includes:
inputting an initial sentence structure in the domain-independent sentence structure set data into the pre-trained language model to obtain, for each special mask attribute in the initial sentence structure, a probability list over other attributes;
obtaining, from the probability list corresponding to a special mask attribute, the probability of every other attribute appearing at the position of that special mask attribute, and judging, through the target-domain-specific attribute dataset, whether the other attribute is a modified-word attribute or a modifier attribute;
replacing, according to the judgment result, the special mask attribute in the domain-independent sentence structure of the domain-independent sentence structure set data with the other attribute to obtain a new domain-independent sentence structure;
and inputting the new domain-independent sentence structure into the pre-trained language model and repeating the replacement operation until all the special mask attributes in the domain-independent sentence structure have been replaced with other attributes, and generating the labeled target-domain comment dataset from all the replaced domain-independent sentence structures.
Specifically, the method of this embodiment can convert the domain-independent sentence structure set data into a labeled target-domain comment dataset, which solves the lack of fine-grained labeled data and reduces the dependence on it.
Optionally, the replacing, according to the judgment result, the special mask attribute in the domain-independent sentence structure with the other attribute to obtain a new domain-independent sentence structure includes:
acquiring a target-domain-specific modified-word attribute set and a target-domain-specific modifier attribute set;
if the other attribute is a modified-word attribute, selecting, according to the probability list, the modified-word attribute with the highest probability among those present in the target-domain-specific modified-word attribute set to replace the special mask attribute at its position;
if the other attribute is a modifier attribute, judging the emotion polarity of the modifier attribute according to the labeling information, the emotion polarity being positive, negative or neutral;
if the polarity is positive, selecting the modifier attribute with the highest probability from the positive-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position;
if the polarity is negative, selecting the modifier attribute with the highest probability from the negative-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position;
and if the polarity is neutral, selecting the modifier attribute with the highest probability from the neutral-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position.
Specifically, the emotion polarity is divided into positive, negative and neutral. If the polarity is positive, the opinion term with the highest probability is selected from the positive-polarity attribute set (positive term list) in the target-domain-specific opinion term list to replace the special mask attribute at its position; if negative, the opinion term with the highest probability is selected from the negative-polarity attribute set (negative term list) in the target-domain-specific opinion term list to replace the special mask attribute at its position; and if neutral, the opinion term with the highest probability is selected from the neutral-polarity attribute set (neutral term list) in the target-domain-specific opinion term list to replace the special mask attribute at its position.
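The iterative, one-mask-at-a-time filling of this step could be sketched as follows; always filling the left-most remaining mask, treating every term as a single token, and the helper names used here are assumptions of this illustration rather than requirements of the embodiment.

```python
# Hedged sketch of multi-step prediction: fill one "[MASK]" per step, restricted to
# target-domain-specific terms (for opinion terms, to the polarity list chosen from the labels).
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")   # assumed checkpoint
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")

def fill_one_mask(tokens, candidate_terms):
    """tokens: current domain-independent sentence structure (list of tokens, some '[MASK]');
    candidate_terms: allowed replacements, e.g. the target-domain aspect term list or the
    positive/negative/neutral opinion term list selected from the labeling information."""
    enc = tokenizer(" ".join(tokens), return_tensors="pt")
    mask_positions = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().flatten()
    if len(mask_positions) == 0:
        return tokens, False
    pos = mask_positions[0].item()                    # left-most remaining mask (assumption)
    with torch.no_grad():
        logits = mlm(**enc).logits[0, pos]            # probability list over the vocabulary
    candidate_ids = tokenizer.convert_tokens_to_ids(sorted(candidate_terms))
    best_id = max(candidate_ids, key=lambda i: logits[i].item())   # highest-probability candidate
    new_tokens = list(tokens)
    new_tokens[tokens.index("[MASK]")] = tokenizer.convert_ids_to_tokens(best_id)
    return new_tokens, True

def generate_comment(structure, candidate_terms):
    """Repeat the replacement until no special mask attribute is left in the structure."""
    tokens, changed = list(structure), True
    while changed and "[MASK]" in tokens:
        tokens, changed = fill_one_mask(tokens, candidate_terms)
    return tokens
```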
Corresponding to the above cross-domain fine-grained emotion analysis method, an embodiment of the invention further provides a cross-domain fine-grained emotion analysis system, comprising: a domain-specific attribute extraction module, a domain-independent sentence structure generation module, a cross-domain comment generation module, a pre-trained language model fine-tuning module and a cross-domain emotion analysis module;
the domain-specific attribute extraction module is used for acquiring a labeled source-domain comment dataset and an unlabeled target-domain comment dataset, extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset, and obtaining a target-domain comment data test set from the unlabeled target-domain comment dataset;
the domain-independent sentence structure generation module is used for generating domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset;
the cross-domain comment generation module is used for generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model;
the pre-trained language model fine-tuning module is used for optimizing the pre-trained language model with the unlabeled target-domain comment dataset and the labeled target-domain comment dataset to obtain a fine-grained emotion analysis model;
and the cross-domain emotion analysis module is used for performing fine-grained emotion analysis on the target-domain comment data test set with the fine-grained emotion analysis model to obtain target-domain comments.
In one embodiment, as shown in FIG. 2, the domain-specific attribute extraction module extracts the labeled source-domain and unlabeled target-domain comment data, obtains all attributes of the source domain and all attributes of the target domain, and, by removing the attributes common to the two domains, obtains the source-domain-specific attributes and the target-domain-specific attributes. The labeled source-domain comment dataset and the source-domain-specific attributes are input into the domain-independent sentence structure generation module, which generates labeled domain-independent sentence structures. These are input into the cross-domain comment generation module; the pre-trained language model BERT is first further trained on the unlabeled target-domain comment data to obtain a retrained BERT, and the retrained BERT is combined with the labeled domain-independent sentence structures to generate the labeled target-domain comment data. The pre-trained language model fine-tuning module then fine-tunes BERT on this data to produce the fine-grained emotion analysis model. Finally, the target-domain comment data test set is input into the cross-domain emotion analysis module, which performs emotion analysis with the fine-grained emotion analysis model and outputs the target-domain comments, so that target-domain comments with dependency relationships between words are generated without a large amount of fine-grained labeled data.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the cross-domain fine-grained emotion analysis method described above when executing the computer program.
It should be noted that the device may be a computer device such as a server, a mobile terminal, or the like.
FIG. 3 illustrates an internal block diagram of a computer device in one embodiment. The computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the cross-domain fine-grained emotion analysis method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the cross-domain fine-grained emotion analysis method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device of the computer device may be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, mouse or the like.
In one embodiment, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described cross-domain fine-grained emotion analysis method.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a non-volatile computer readable storage medium, and where the program, when executed, may include processes in the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Although the invention is disclosed above, the scope of the invention is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications will fall within the scope of the invention.

Claims (10)

1. A cross-domain fine granularity emotion analysis method, comprising:
acquiring a labeled source-domain comment dataset and an unlabeled target-domain comment dataset, extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset, and obtaining a target-domain comment data test set from the unlabeled target-domain comment dataset;
generating domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset;
generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model;
optimizing the pre-trained language model with the unlabeled target-domain comment dataset and the labeled target-domain comment dataset to obtain a fine-grained emotion analysis model;
and performing fine-grained emotion analysis on the target-domain comment data test set with the fine-grained emotion analysis model to obtain target-domain comments.
2. The cross-domain fine granularity emotion analysis method of claim 1, wherein said extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset comprises:
extracting all the modified-word attributes and modifier attributes in the labeled source-domain comment dataset and the unlabeled target-domain comment dataset;
and removing the modified-word attributes and modifier attributes shared by the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain the source-domain-specific attribute dataset and the target-domain-specific attribute dataset.
3. The cross-domain fine granularity emotion analysis method according to claim 2, wherein said removing the modified-word attributes and modifier attributes shared by the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain the source-domain-specific attribute dataset and the target-domain-specific attribute dataset comprises:
obtaining, from all the modified-word attributes and modifier attributes in the labeled source-domain comment dataset and the unlabeled target-domain comment dataset, a source-domain modified-word attribute set and a source-domain modifier attribute set, and a target-domain modified-word attribute set and a target-domain modifier attribute set, respectively;
screening the source-domain modified-word attribute set against the target-domain modified-word attribute set to obtain the common modified-word attributes and generate a common modified-word attribute set;
screening the source-domain modifier attribute set against the target-domain modifier attribute set to obtain the common modifier attributes and generate a common modifier attribute set;
removing, according to the common modified-word attribute set and the common modifier attribute set, the corresponding modified-word attributes and modifier attributes from the source-domain modified-word attribute set and the source-domain modifier attribute set to obtain the source-domain-specific attribute dataset;
and removing, according to the common modified-word attribute set and the common modifier attribute set, the corresponding modified-word attributes and modifier attributes from the target-domain modified-word attribute set and the target-domain modifier attribute set to obtain the target-domain-specific attribute dataset.
4. The cross-domain fine granularity emotion analysis method of claim 1, wherein said generating domain-independent sentence structure set data from said labeled source-domain comment dataset, said target-domain-specific attribute dataset and said source-domain-specific attribute dataset comprises:
replacing, according to the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset, all original attributes that appear in the source-domain-specific attribute dataset with a special mask attribute to generate the domain-independent sentence structure set data.
5. The cross-domain fine granularity emotion analysis method of claim 4, wherein the construction process of said pre-trained language model comprises:
acquiring a target-domain unlabeled dataset and a masked language model;
and initially training an original pre-trained language model with the target-domain unlabeled dataset and the masked language model to obtain the pre-trained language model.
6. The cross-domain fine granularity emotion analysis method of claim 5, wherein said generating a labeled target-domain comment dataset from said domain-independent sentence structure set data in combination with the pre-trained language model comprises:
inputting an initial sentence structure in the domain-independent sentence structure set data into the pre-trained language model to obtain, for each special mask attribute in the initial sentence structure, a probability list over other attributes;
obtaining, from the probability list corresponding to a special mask attribute, the probability of every other attribute appearing at the position of that special mask attribute, and judging, through the target-domain-specific attribute dataset, whether the other attribute is a modified-word attribute or a modifier attribute;
replacing, according to the judgment result, the special mask attribute in the domain-independent sentence structure of the domain-independent sentence structure set data with the other attribute to obtain a new domain-independent sentence structure;
and inputting the new domain-independent sentence structure into the pre-trained language model and repeating the replacement operation until all the special mask attributes in the domain-independent sentence structure have been replaced with other attributes, and generating the labeled target-domain comment dataset from all the replaced domain-independent sentence structures.
7. The cross-domain fine granularity emotion analysis method of claim 6, wherein said replacing, according to the judgment result, said special mask attribute in the domain-independent sentence structure with said other attribute to obtain a new domain-independent sentence structure comprises:
acquiring a target-domain-specific modified-word attribute set and a target-domain-specific modifier attribute set;
if the other attribute is a modified-word attribute, selecting, according to the probability list, the modified-word attribute with the highest probability among those present in the target-domain-specific modified-word attribute set to replace the special mask attribute at its position;
if the other attribute is a modifier attribute, judging the emotion polarity of the modifier attribute according to the labeling information, the emotion polarity being positive, negative or neutral;
if the polarity is positive, selecting the modifier attribute with the highest probability from the positive-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position;
if the polarity is negative, selecting the modifier attribute with the highest probability from the negative-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position;
and if the polarity is neutral, selecting the modifier attribute with the highest probability from the neutral-polarity attribute set in the target-domain-specific modifier attribute set to replace the special mask attribute at its position.
8. A cross-domain fine granularity emotion analysis system, comprising: a domain-specific attribute extraction module, a domain-independent sentence structure generation module, a cross-domain comment generation module, a pre-trained language model fine-tuning module and a cross-domain emotion analysis module;
wherein the domain-specific attribute extraction module is used for acquiring a labeled source-domain comment dataset and an unlabeled target-domain comment dataset, extracting the labeled source-domain comment dataset and the unlabeled target-domain comment dataset to obtain a source-domain-specific attribute dataset and a target-domain-specific attribute dataset, and obtaining a target-domain comment data test set from the unlabeled target-domain comment dataset;
the domain-independent sentence structure generation module is used for generating domain-independent sentence structure set data from the labeled source-domain comment dataset, the target-domain-specific attribute dataset and the source-domain-specific attribute dataset;
the cross-domain comment generation module is used for generating a labeled target-domain comment dataset from the domain-independent sentence structure set data and a pre-trained language model;
the pre-trained language model fine-tuning module is used for optimizing the pre-trained language model with the unlabeled target-domain comment dataset and the labeled target-domain comment dataset to obtain a fine-grained emotion analysis model;
and the cross-domain emotion analysis module is used for performing fine-grained emotion analysis on the target-domain comment data test set with the fine-grained emotion analysis model to obtain target-domain comments.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the cross-domain fine granularity emotion analysis method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the cross-domain fine granularity emotion analysis method of any of claims 1 to 7.
CN202311409024.8A 2023-10-27 2023-10-27 Cross-domain fine granularity emotion analysis method, system, equipment and storage medium Pending CN117150024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311409024.8A CN117150024A (en) 2023-10-27 2023-10-27 Cross-domain fine granularity emotion analysis method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311409024.8A CN117150024A (en) 2023-10-27 2023-10-27 Cross-domain fine granularity emotion analysis method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117150024A 2023-12-01

Family

ID=88910385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311409024.8A Pending CN117150024A (en) 2023-10-27 2023-10-27 Cross-domain fine granularity emotion analysis method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117150024A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114090725A (en) * 2020-08-24 2022-02-25 阿里巴巴集团控股有限公司 Emotion prediction model training method and device
CN112860901A (en) * 2021-03-31 2021-05-28 中国工商银行股份有限公司 Emotion analysis method and device integrating emotion dictionaries
CN113326378A (en) * 2021-06-16 2021-08-31 山西财经大学 Cross-domain text emotion classification method based on parameter migration and attention sharing mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈浩: "基于评论生成与扩充的联合情感分析研究", 《万方数据知识服务平台》, pages 38 - 49 *

Similar Documents

Publication Publication Date Title
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
CN107463607B (en) Method for acquiring and organizing upper and lower relations of domain entities by combining word vectors and bootstrap learning
CN106649603B (en) Designated information pushing method based on emotion classification of webpage text data
Zhao et al. Adding redundant features for CRFs-based sentence sentiment classification
CN110765265A (en) Information classification extraction method and device, computer equipment and storage medium
CN113138920B (en) Software defect report allocation method and device based on knowledge graph and semantic role labeling
CN111462752B (en) Attention mechanism, feature embedding and BI-LSTM (business-to-business) based customer intention recognition method
CN113806493B (en) Entity relationship joint extraction method and device for Internet text data
CN112287090A (en) Financial question asking back method and system based on knowledge graph
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN115687634A (en) Financial entity relationship extraction system and method combining priori knowledge
CN112101014A (en) Chinese chemical industry document word segmentation method based on mixed feature fusion
CN113934909A (en) Financial event extraction method based on pre-training language and deep learning model
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN115630156A (en) Mongolian emotion analysis method and system fusing Prompt and SRU
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
Shahade et al. Multi-lingual opinion mining for social media discourses: An approach using deep learning based hybrid fine-tuned smith algorithm with adam optimizer
CN114896387A (en) Military intelligence analysis visualization method and device and computer readable storage medium
CN112667819A (en) Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device
CN117237479A (en) Product style automatic generation method, device and equipment based on diffusion model
Cheng et al. GRCNN: Graph Recognition Convolutional Neural Network for synthesizing programs from flow charts
CN117150024A (en) Cross-domain fine granularity emotion analysis method, system, equipment and storage medium
CN114580423A (en) Bert and Scat-based shale gas field named entity identification method
CN114528459A (en) Semantic-based webpage information extraction method and system
Jamil et al. Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination