CN115374778A

CN115374778A - Cosmetic public opinion text entity relation extraction method based on deep learning

Info

Publication number: CN115374778A
Application number: CN202211010810.6A
Authority: CN
Inventors: 左敏; 葛伟; 路勇; 张伟清; 许鸣镝; 孙磊; 王海燕
Original assignee: Beijing Technology and Business University; National Institutes for Food and Drug Control
Current assignee: Beijing Technology and Business University; National Institutes for Food and Drug Control
Priority date: 2022-08-08
Filing date: 2022-08-23
Publication date: 2022-11-22

Abstract

The invention relates to a deep-learning-based method for extracting entity relations from cosmetic public opinion texts. The method comprises the following steps:

- preprocessing cosmetic risk public opinion texts crawled from the Internet;
- constructing a cosmetic-domain lexicon;
- extracting character-level text features through an improved BERT network and fusing them with word-level information obtained from word embeddings;
- computing multi-class relation information through a BLSTM network fused with a position-aware attention mechanism;
- integrating this multi-class information into the character-level text vectors extracted by the improved BERT network and passing the result again through the BLSTM with the position-aware attention mechanism;
- finally, computing the optimal probability through a CRF to complete the relation extraction.

The invention alleviates, to a certain extent, the low accuracy and strong domain specificity of relation extraction from cosmetic risk public opinion texts. It improves the accuracy of event information extraction through a new model that combines a character dimension enriched with Chinese radical information with an auxiliary word dimension.

Description

Cosmetic public opinion text entity relation extraction method based on deep learning

Technical Field

The invention relates to the field of artificial intelligence, and in particular to a deep-learning-based method for extracting entity relations from cosmetic public opinion texts.

Background

With the promulgation of the Regulations on the Supervision and Administration of Cosmetics, the cosmetics industry has entered a new era and has become a focus of public opinion. Supporting regulatory documents have been successively released for public comment. The regulations also establish, for the first time, the legal status of national cosmetic risk monitoring.

Safety risk substances that may be present in cosmetics are substances that may cause potential harm to human health. They are introduced by raw materials, or generated or introduced during production.

On the one hand, safety risks in cosmetic use arise objectively from three factors: the complexity of cosmetic formulas, people's limited knowledge of formula ingredients and their potential hazards, and incomplete experience with cosmetic use. On the other hand, in pursuit of high profits, many lawbreakers do not hesitate to deliberately add prohibited substances or to counterfeit well-known brands, which creates safety risks subjectively.

Because cosmetic risks are harmful and have a social dimension, they cause bodily injury and economic loss to varying degrees. An incident may even produce adverse social effects once it escalates through public opinion. Online public opinion forms quickly: people often post their views shortly after an incident occurs, and as attention to the incident's progress grows, online opinion develops even faster. This makes it difficult to control.

The first step of online public opinion management and control is therefore to pay attention to online public opinion risk: assess it, identify its degree, and prevent its expansion. Only an effective assessment of this risk makes it possible to decide which countermeasures to take, and thus to manage online public opinion scientifically. Establishing an entity relation extraction model for risk public opinion is therefore of great significance to cosmetic safety supervision.

Relation extraction has gradually evolved from pattern matching to statistical machine learning, and deep learning based on artificial neural networks is currently dominant. Deep learning treats event extraction not only as a classification task but also as a sequence labeling task.

The two main tasks of relation extraction are entity recognition and relation classification. Some current models work in a cascade (pipeline) fashion: they first identify trigger words and then extract arguments, so errors from the first stage propagate to the next. The invention instead adopts joint extraction, extracting trigger words and arguments simultaneously to improve the performance of both subtasks. It also adds global features to represent global information between trigger words and arguments.

The invention adopts a joint event extraction model based on BERT-BLSTM-CRF together with a new sequence labeling scheme. This turns event argument extraction into an end-to-end problem and avoids the error propagation of traditional pipeline models. The model uses a dual-network structure:

- The first network takes Chinese characters as input and introduces Chinese radical features to add extra semantic information.
- The second network takes Chinese words as input and introduces a domain lexicon mechanism. This helps the network distinguish different arguments and absorb the textual characteristics of the cosmetic public opinion domain.

Disclosure of Invention

The technical problem to be solved by the invention is to extract cosmetic public opinion event information quickly and accurately. This greatly improves supervisors' working efficiency and assists them in making judgments. It also achieves the shift from "post-event public opinion monitoring" to "pre-event risk early warning", provides a scientific basis for cosmetic safety supervision decisions, and lays a foundation for a national cosmetic safety risk control system.

The invention provides a deep-learning-based method for extracting entity relations from cosmetic public opinion texts, comprising the following steps:

Step 1: cosmetic risk public opinion data are published through four main channels: official releases, social news, e-commerce platform reviews, and social media posts. For these channels, a web crawler targeting public opinion events is written in Python. The raw crawled text is deduplicated and screened to form a usable public opinion event text corpus.

Step 2: the professional vocabulary of the cosmetic public opinion supervision domain obtained in step 1 is used to incrementally train a public-domain word embedding library. The result is a word embedding library for the cosmetic public opinion domain.

Step 3: the cosmetic risk public opinion texts obtained in step 1 are annotated with semantic roles as (entity 1, relation, entity 2) triples:

- Entity 1 is the subject of the cosmetic public opinion event, e.g. baby cream, the "big-head doll" incident, or counterfeit cosmetics.
- Entity 2 is the object of the event, e.g. hormones, preservatives, or expired products.
- The relation links entity 1 and entity 2. There are 6 relation types: raw material component, adverse reaction, risk substance, public opinion popularity, efficacy claim, and illegal behavior.

Each sentence of a cosmetic risk public opinion text is divided into components. Within the same component, the influence of a core word on neighboring words varies with distance. The influence of all core words in the sentence on their neighbors is accumulated to model how position affects the whole sentence. Combining this position-aware strategy with a conventional attention mechanism yields the position-aware semantic role attention mechanism;

Step 4: for the cosmetic risk public opinion texts obtained in step 1, a character-level pre-trained model is built with BERT (Bidirectional Encoder Representations from Transformers), an encoder based on a bidirectional deep self-attention network. A Chinese radical feature vector is fused into each character vector. A word-level model is then built with the cosmetic public opinion domain word embedding library obtained in step 2.

The character vectors and the word vectors are each passed through a bidirectional long short-term memory (BLSTM) network and the position-aware semantic role attention mechanism constructed in step 3. This produces text feature vectors that incorporate full-text semantic information. The multi-class relations of the text are then obtained through concatenation, a fully connected layer, and a sigmoid layer;

Step 5: the public opinion event text corpus is input into the BERT-based pre-trained model to obtain character vectors, which are fused with Chinese radical feature vectors. The multi-class relation information obtained in step 4 is added at both ends of the text feature vectors extracted by BERT. This gives a text semantic vector that fuses the character and word dimensions. The vector is input again into the BLSTM model and a conditional random field (CRF), and the optimal probability computed by the CRF gives the final entity relation extraction result for the cosmetic public opinion text.

Further, in step 1, the web crawler built for the cosmetic public opinion domain collects the following:

- information on hazards to human, animal, and plant health published by authoritative research institutions in China and abroad;
- adverse reaction monitoring data on cosmetics from domestic and foreign research institutions;
- authoritative reports from domestic and foreign news media;
- information on problems and recalls at cosmetic manufacturers in the production, storage, distribution, and sales stages;
- information published by domestic and foreign industry associations;
- product usage sharing on social networks, e-commerce sales reviews, and the like.

The crawled content is preprocessed to form a usable public opinion event text corpus, from which the professional vocabulary of the cosmetic public opinion domain is extracted.
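The patent does not specify its deduplication rules, screening rules, or PMI details. The sketch below is minimal and rests on the following assumptions:

- Exact duplicates are detected by hashing whitespace-normalized text.
- Screening keeps texts above a minimum length that mention at least one domain keyword.
- Lexicon candidates are adjacent token pairs from any segmenter (e.g. Jieba). Each pair is scored by pointwise mutual information, PMI(x, y) = log(p(x, y) / (p(x) p(y))), and then passed on for manual screening.

All function names, thresholds, and keyword lists are hypothetical.

```python
# Sketch only: function names, thresholds, and keywords are hypothetical, not from the patent.
import hashlib
import math
import re
from collections import Counter


def dedup_and_screen(docs, keywords, min_len=10):
    """Drop exact duplicates (after whitespace normalization), texts shorter than
    min_len characters, and texts that mention none of the domain keywords."""
    seen, kept = set(), []
    for doc in docs:
        clean = re.sub(r"\s+", " ", doc).strip()
        digest = hashlib.md5(clean.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier text
        seen.add(digest)
        if len(clean) >= min_len and any(k in clean for k in keywords):
            kept.append(clean)
    return kept


def pmi_candidates(segmented_docs, stopwords, min_count=2, threshold=0.0):
    """Score adjacent non-stop-word token pairs by PMI and return
    (joined_pair, pmi) candidates above `threshold`, highest first."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in segmented_docs:
        unigrams.update(t for t in tokens if t not in stopwords)
        # Only pairs adjacent in the original text and free of stop words.
        bigrams.update((x, y) for x, y in zip(tokens, tokens[1:])
                       if x not in stopwords and y not in stopwords)
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scored = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for rare pairs
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        pmi = math.log(p_xy / (p_x * p_y))
        if pmi > threshold:
            scored[x + y] = pmi
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

PMI tends to overrate rare pairs, which is why `min_count` is applied. It is also why the manual screening described in the patent remains necessary.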

Further, in step 2, the cosmetic-domain professional vocabulary obtained in step 1 is fed into a skip-gram model to incrementally train the public-domain word embedding library. As the crawled content of step 1 grows, new content accumulates. Whenever enough has accumulated for incremental training, it is fed into the skip-gram model again. Over time, this expands the public-domain word embedding library into one suited to the cosmetic public opinion domain.
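A full skip-gram trainer is beyond a short example. The sketch below shows only the bookkeeping that incremental training relies on, assuming new words are appended to an existing vocabulary:

- Existing indices, and therefore already-trained embedding rows, stay fixed.
- New rows are initialized randomly.
- (center, context) training pairs are generated within a fixed window.

The helper names and the window size are hypothetical. In practice, a library such as gensim handles this end to end through `Word2Vec` with `sg=1` and `build_vocab(..., update=True)`.

```python
# Sketch only: helper names and parameters are hypothetical, not from the patent.
import numpy as np


def extend_vocab(vocab, sentences):
    """Append unseen tokens to an existing token->index map. Old indices never
    change, so rows of a previously trained embedding matrix stay aligned."""
    for sent in sentences:
        for tok in sent:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def grow_embeddings(emb, vocab_size, seed=0):
    """Keep trained rows as-is and append small random rows for new words."""
    extra = vocab_size - emb.shape[0]
    if extra <= 0:
        return emb
    rng = np.random.default_rng(seed)
    new_rows = rng.uniform(-0.5, 0.5, size=(extra, emb.shape[1])) / emb.shape[1]
    return np.vstack([emb, new_rows])


def skipgram_pairs(sentences, vocab, window=2):
    """(center_id, context_id) pairs for every token within `window` positions."""
    pairs = []
    for sent in sentences:
        ids = [vocab[t] for t in sent]
        for i, center in enumerate(ids):
            for j in range(max(0, i - window), min(len(ids), i + window + 1)):
                if j != i:
                    pairs.append((center, ids[j]))
    return pairs
```

Each time a new batch of crawled text accumulates, the three helpers run in order: extend the vocabulary, grow the matrix, then train on the new pairs. This mirrors the periodic incremental training described above.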

Further, in step 3, the cosmetic risk public opinion texts obtained in step 1 are annotated with semantic roles in triple form (entity 1, relation, entity 2):

- Entity 1 is the subject of the cosmetic public opinion event, including baby cream, the "big-head doll" incident, and counterfeit or substandard cosmetics.
- Entity 2 is the object of the event, including hormones, preservatives, expired products, and the like.
- The relation connects entity 1 and entity 2. There are 6 relations: raw material component, adverse reaction, risk substance, public opinion popularity, efficacy claim, and illegal behavior.

Semantic role labeling divides each sentence into components and locates each word within its component. The propagated influence then generates a position-aware influence vector for each word. Word weights are updated using the position awareness of the context semantics, which yields the position-aware semantic role attention mechanism.
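The text does not give the distance function used for influence propagation. As a minimal sketch, assume a Gaussian decay exp(-d² / (2σ²)) over the distance d between a word and a core word. The sketch proceeds in two steps:

1. For each word j, count the core words at each distance that lie inside the same semantic-role component.
2. Map those counts through an influence base matrix K whose column d is drawn around the decay value for distance d, so that p_j = K c_j.

The kernel, σ, the maximum distance, and the random initialization of K are all assumptions, not details from the patent. In a trained model, K would be learned.

```python
# Sketch only: the Gaussian kernel, sigma, max_dist, and random K are assumptions.
import numpy as np


def core_distance_counts(components, core_mask, max_dist):
    """counts[j, d] = number of core words at distance d from word j that lie in the
    same semantic-role component as j (influence never crosses component borders)."""
    components = np.asarray(components)
    core_idx = np.flatnonzero(np.asarray(core_mask, dtype=bool))
    n = len(components)
    counts = np.zeros((n, max_dist + 1))
    for j in range(n):
        for q in core_idx:
            d = abs(int(q) - j)
            if components[q] == components[j] and d <= max_dist:
                counts[j, d] += 1
    return counts


def position_influence(components, core_mask, dim=8, max_dist=5, sigma=1.0, seed=0):
    """Accumulated position-aware influence vector p_j for each word, shape (n, dim).
    Column d of the base matrix K is drawn around exp(-d^2 / (2 sigma^2)), so nearer
    core words contribute more; p_j = K @ counts[j]."""
    counts = core_distance_counts(components, core_mask, max_dist)
    decay = np.exp(-np.arange(max_dist + 1) ** 2 / (2.0 * sigma ** 2))
    rng = np.random.default_rng(seed)
    K = rng.normal(loc=decay, scale=0.1, size=(dim, max_dist + 1))
    return counts @ K.T
```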

The specific process for constructing the semantic role attention mechanism based on the position perception comprises the following steps:

(1) The attention weight of the word at position j of the sentence is:

In formula (1):

- h_j is the hidden-layer vector of the word at position j, and p_j is its accumulated position-aware influence vector;
- len is the number of words in the sentence;
- h_i is the hidden-layer vector of the word at position i, and p_i is its position-aware influence vector;
- a(·) is a function that scores the importance of a word from its hidden-layer vector and position-aware influence vector.

(2) The specific form of a(·) is:

In formula (2):

- W_H and W_P are the weight matrices of h_j and p_j, respectively;
- b_1 is a bias vector of the first layer;
- the activation function is ReLU;
- v is a global vector, and v^T denotes its transpose;
- b_2 is a bias term of the second layer;
- len is the number of words in the sentence, and i indexes a word position in the sentence.
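The images of formulas (1) and (2) are missing from this text. From the symbol definitions alone, a plausible reconstruction follows. It assumes that formula (1) normalizes a(·) with a softmax over the len words of the sentence, and it writes the ReLU activation as φ. The text does not state either point explicitly.

```latex
% Reconstruction from the symbol definitions; the softmax form and \varphi = ReLU are assumptions.
\alpha_j = \frac{\exp\big(a(h_j, p_j)\big)}{\sum_{i=1}^{len} \exp\big(a(h_i, p_i)\big)} \qquad (1)

a(h_j, p_j) = v^{T}\,\varphi\big(W_H h_j + W_P p_j + b_1\big) + b_2 \qquad (2)
```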

Further, in step 4, the usable public opinion event text corpus formed in step 1 is input into the BERT pre-trained model to obtain a vectorized representation of the text. The procedure is as follows:

1. The whole input text is split into sentences and encoded with a deep self-attention (Transformer) network.
2. Part of the content of each sentence is masked, and the masked content is predicted from the remaining content.
3. The prediction is compared with the true masked content to obtain the prediction error, and the model parameters are adjusted according to that error.

Through this prediction task, the input text is mapped into a vector space. This gives a character-level vectorized representation, with the Chinese character as the unit. For each character, 48 dimensions of additional semantic information from its Chinese radical are appended to the 768-dimensional character vector.

Before the word dimension is input, Chinese word segmentation is performed. The Chinese words are then vectorized with the cosmetic public opinion domain word embedding library constructed in step 2, giving a word-level text input vector with the Chinese word as the unit. The character vectors and the word vectors are each input into a BLSTM model. The attention distribution coefficient r_a is computed with the position-aware semantic role attention mechanism constructed in step 3, as follows:

In formula (3), h_j is the hidden-layer vector of the word at position j, α_j is the attention weight of the word at position j, and len is the number of words in the sentence;

the resulting attention distribution is applied to the BLSTM hidden-layer vectors, and each word is weighted to obtain text features under the attention mechanism. The character-level and word-level results are then concatenated, and the multi-class relations of the input text are obtained through a fully connected layer and a sigmoid layer.
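A minimal NumPy sketch of the step-4 head, under the following assumptions:

- Formula (1) is a softmax of a(h_j, p_j) over the sentence.
- Formula (2) is a(h_j, p_j) = v^T ReLU(W_H h_j + W_P p_j + b_1) + b_2.
- Formula (3) is the weighted sum r_a = Σ_j α_j h_j.

The function names, the relation label strings, and the 0.5 decision threshold are hypothetical. In a real model, the weights would come from training rather than being passed in by hand.

```python
# Sketch only: names, label strings, and the threshold are hypothetical.
import numpy as np

# Hypothetical English labels for the 6 relation types listed in step 3.
RELATIONS = ["raw material component", "adverse reaction", "risk substance",
             "public opinion popularity", "efficacy claim", "illegal behavior"]


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def position_aware_attention(H, P, W_H, W_P, b1, v, b2):
    """H: (len, d_h) BLSTM hidden vectors; P: (len, d_p) position-influence vectors.
    W_H: (k, d_h), W_P: (k, d_p), b1: (k,), v: (k,), b2: scalar.
    Returns alpha (len,) and r_a (d_h,)."""
    scores = relu(H @ W_H.T + P @ W_P.T + b1) @ v + b2   # a(h_j, p_j) for each j
    e = np.exp(scores - scores.max())                   # shift for numerical stability
    alpha = e / e.sum()                                 # formula (1): softmax over the sentence
    r_a = alpha @ H                                     # formula (3): sum_j alpha_j h_j
    return alpha, r_a


def relation_head(r_char, r_word, W, b, threshold=0.5):
    """Concatenate character- and word-level features, apply a fully connected
    layer W: (6, d_char + d_word), b: (6,), then an element-wise sigmoid."""
    z = np.concatenate([r_char, r_word])
    probs = sigmoid(W @ z + b)
    labels = [name for name, p in zip(RELATIONS, probs) if p >= threshold]
    return probs, labels
```

The head applies a sigmoid to each relation rather than a softmax across relations, so one text can carry several of the 6 relations at once. Note also that b_2 shifts every score equally, so it cancels in the softmax.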

Further, in step 5, the public opinion event text corpus is input into the BERT pre-trained model to obtain a vectorized representation of the text, and each 768-dimensional character vector is fused with its 48-dimensional Chinese radical feature vector. The multi-class result of step 4 is then expanded into (768 + 48)-dimensional vectors. These are spliced at both ends of the input text's character vector matrix, giving text vectors that fuse full-text semantic information.

The text vectors are input into the BLSTM model, and the entity relations of the input text are determined through the semantic role attention mechanism constructed in step 3. The final entity relation extraction result for the cosmetic public opinion text is obtained after the conditional random field (CRF) computes the optimal probability.
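The CRF's "optimal probability" step amounts to finding the highest-scoring tag sequence. Below is a minimal sketch of linear-chain CRF Viterbi decoding. It assumes per-position emission scores from the BLSTM and a learned tag-transition matrix. Start/end transitions and CRF training (the forward algorithm and the log-likelihood loss) are omitted, and the function name is hypothetical.

```python
# Sketch only: decoding step of a linear-chain CRF; the function name is hypothetical.
import numpy as np


def viterbi(emissions, transitions):
    """emissions: (n, T) per-position tag scores; transitions: (T, T) score of moving
    from tag i to tag j. Returns (best tag path, its total score)."""
    n, T = emissions.shape
    score = emissions[0].copy()           # best score of paths ending in each tag at t = 0
    back = np.zeros((n, T), dtype=int)    # back[t, j] = best previous tag for tag j at t
    for t in range(1, n):
        # cand[i, j] = best path ending in tag i at t-1, then moving to tag j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):         # follow back-pointers from the last position
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(score.max())
```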

Compared with the prior art, the invention has the following advantages:

The invention builds a character-word two-dimensional event text relation extraction model. It combines an improved BERT encoder (a bidirectional deep self-attention network) with a BLSTM network fused with a semantic role attention mechanism. The model can quickly and accurately identify key information in cosmetic public opinion events, and it offers a more comprehensive construction for event text relation extraction in this domain. Two distributed text representations, one at the character level and one at the word level, serve as model inputs. The output multi-class information is integrated into a text vector carrying full-text semantic information to complete relation extraction.

Specifically:

- The model makes full use of BERT by adding Chinese radical feature vectors to the pre-trained character vectors, so that they carry richer semantic information.
- By computing the position-aware semantic role attention through influence propagation, it overcomes the reliance of traditional attention mechanisms on hidden-layer representations alone when assigning word attention weights.
- It uses word embedding vectors as supplementary information to the character vectors to mine text semantics further. This avoids the loss of classification accuracy caused by incomplete feature extraction from unstructured, non-standard text corpora, and effectively improves event relation extraction.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic diagram of the character-word two-dimensional text entity relation extraction model.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

As shown in FIG. 1, the deep-learning-based method for extracting entity relations from cosmetic public opinion texts comprises:

- preprocessing public opinion event data crawled from the Internet;
- constructing a cosmetic public opinion domain resource library and performing incremental training with domain corpora;
- extracting character-level text features through an improved BERT network and fusing them with word-level information from word embeddings;
- computing multi-class information through a BLSTM network and integrating it into the character-level text vectors extracted by the improved BERT network;
- finally, computing the optimal probability through a CRF.

The method alleviates, to a certain extent, the low accuracy and strong domain specificity of event text relation extraction in the cosmetic public opinion domain. It improves relation extraction accuracy through a new model that represents text with character vectors fused with Chinese radical features and adds the word dimension as an auxiliary representation.

The method specifically comprises the following steps:

Step 1: a web crawler targeting public opinion events is written in Python according to the characteristics of the cosmetic public opinion domain. The crawled content includes:

- information on hazards to human, animal, and plant health published by authoritative research institutions in China and abroad;
- adverse reaction monitoring data on cosmetics from domestic and foreign research institutions;
- authoritative reports from domestic and foreign news media;
- information on problems and recalls at cosmetic manufacturers in the production, storage, distribution, and sales stages;
- information published by domestic and foreign industry associations;
- product usage sharing on social networks, e-commerce sales reviews, and the like.

The raw crawled text is deduplicated and screened to form a usable public opinion event text corpus. The cosmetic risk public opinion text is then segmented with an improved Jieba method, and meaningless stop words are removed. Finally, a cosmetic public opinion domain lexicon is built by combining pointwise mutual information (PMI) computation with manual screening and supplementation.

Step 2: the cosmetic public opinion domain lexicon obtained in step 1 is used to derive a word embedding library for this domain from a public-domain word embedding library. The cosmetic-domain professional words obtained in step 1 are fed into a skip-gram model to incrementally train the public-domain library. As the crawled content of step 1 keeps growing, incremental training is repeated at intervals. Eventually, the public-domain library is expanded into a word embedding library suited to the cosmetic public opinion domain.

Step 3: the cosmetic risk public opinion texts obtained in step 1 are annotated with semantic roles in triple form (entity 1, relation, entity 2):

- Entity 1 is the subject of the cosmetic public opinion event, e.g. baby cream, the "big-head doll" incident, or counterfeit cosmetics.
- Entity 2 is the object of the event, e.g. hormones, preservatives, or expired products.
- The relation links entity 1 and entity 2. There are 6 types: raw material component, adverse reaction, risk substance, public opinion popularity, efficacy claim, and illegal behavior.

Semantic role labeling divides each sentence into components, so that position-aware attention influence propagates only within the same component. The position of each word within its component is located, and the propagated influence generates a position-aware influence vector for each word. Word weights are updated using the position awareness of the context semantics, which yields the position-aware semantic role attention mechanism.

(1) The attention weight of the word at position j of the sentence is:

(2) The specific form of a(·) is:

In formula (2):

- the activation function is ReLU;
- v is a global vector, and v^T denotes its transpose;
- b_2 is a bias term of the second layer;
- len is the number of words in the sentence, and i indexes a word position in the sentence.

Step 4: the usable public opinion event text corpus formed in step 1 is input into the BERT pre-trained model to obtain a vectorized representation of the text. The procedure is as follows:

1. The whole input text is split into sentences and encoded with a deep self-attention (Transformer) network.
2. Part of the content of each sentence is masked, and the masked content is predicted from the remaining content.
3. The prediction is compared with the true masked content to obtain the prediction error, and the model parameters are adjusted according to that error.

Through this prediction task, the input text is mapped into a vector space, giving a character-level vectorized representation with the Chinese character as the unit. Chinese radicals have played a particular role in the evolution of the characters found in cosmetic public opinion texts. Therefore, 48 dimensions of additional semantic information from each character's radical are appended to its 768-dimensional character vector.

Before the word dimension is input, Chinese word segmentation is performed. The Chinese words are then vectorized with the cosmetic public opinion domain word embedding library constructed in step 2, giving a word-level text input vector with the Chinese word as the unit. The character vectors and word vectors are each input into the BLSTM model. The attention distribution coefficient r_a is computed with the position-aware semantic role attention mechanism constructed in step 3, as follows:

In formula (3), h_j is the hidden-layer vector of the word at position j, α_j is the attention weight of the word at position j, and len is the number of words in the sentence. The attention distribution is applied to the BLSTM hidden-layer vectors, and each word is weighted to obtain text features under the attention mechanism. The character-level and word-level results are then concatenated, and the multi-class relations of the input text are obtained through a fully connected layer and a sigmoid layer.

Step 5: the public opinion event text corpus is input into the BERT-based pre-trained model to obtain character vectors of the text. For each character, 48 dimensions of additional semantic information from its Chinese radical are appended to its 768-dimensional character vector.

The 6-dimensional multi-class relation information obtained in step 4 is expanded by a factor of 136, to 816 = 768 + 48 dimensions. It is then added at both ends of the character-level text feature vectors extracted by BERT, giving text semantic vectors that fuse full-text semantic information. These are input into the BLSTM model and a conditional random field (CRF). The optimal probability computed by the CRF gives the final entity relation extraction result for the cosmetic public opinion text.
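The dimensions in step 5 fit together because 6 × 136 = 816 = 768 + 48. The sketch below builds the step-5 input under two assumptions the patent does not spell out:

- The 6-dimensional relation vector is expanded by repeating each value 136 times with `np.repeat`. Tiling the whole vector 136 times would be an equally plausible reading.
- "Both ends" means one extra row before the first character and one after the last.

The constant and function names are hypothetical.

```python
# Sketch only: constant and function names are hypothetical; repeat-vs-tile is an assumption.
import numpy as np

CHAR_DIM, RADICAL_DIM, N_REL = 768, 48, 6
REPEAT = (CHAR_DIM + RADICAL_DIM) // N_REL          # 816 // 6 = 136
assert REPEAT * N_REL == CHAR_DIM + RADICAL_DIM     # dimensions divide exactly


def build_step5_input(char_vecs, radical_vecs, relation_probs):
    """char_vecs: (n, 768) BERT character vectors; radical_vecs: (n, 48) radical
    features; relation_probs: (6,) step-4 output. Returns an (n + 2, 816) matrix."""
    fused = np.concatenate([char_vecs, radical_vecs], axis=1)   # (n, 816)
    rel = np.repeat(relation_probs, REPEAT)                     # (816,), each value 136 times
    return np.vstack([rel[None, :], fused, rel[None, :]])       # one row at each end
```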

FIG. 1 shows an overall schematic of the method, which proceeds as follows:

1. The crawled cosmetic public opinion data is preprocessed, and a cosmetic public opinion domain resource library is constructed.
2. The cosmetic public opinion domain word embedding library is built from the public-domain library through incremental training with supplementary corpora.
3. Character-level text vectors are obtained through the BERT pre-trained model, and word-level text vectors through the word embedding library. Together they form character-word two-dimensional text feature vectors.
4. The multi-class relations of these feature vectors are extracted.
5. Finally, the entity relations of the cosmetic public opinion event text are extracted.

In the model of FIG. 2, the word embedding network at the lower right produces the word-level text representation. The BERT network at the lower left produces the character-level text representation fused with Chinese radical features. Each representation is processed by a BLSTM network fused with the position-aware attention mechanism (the semantic role part in the middle of the figure), and the two outputs are concatenated.

The resulting multi-class result is added to the text vectors of the upper BERT network, which are processed again by the BLSTM fused with the position-aware attention mechanism. Finally, the CRF computes the optimal probability to obtain the optimal output tag sequence. The event text relation extraction result is read from the text at the positions indicated by the sequence labels.
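The patent calls its labeling scheme new but does not define the tags. Assume a plain BIO scheme with roles E1 (entity 1) and E2 (entity 2), and take the relation from the step-4 classifier. Under those assumptions, reading the result off the labeled positions could look like the sketch below. The tag names and helpers are hypothetical.

```python
# Sketch only: BIO tags 'B-E1', 'I-E1', 'B-E2', 'I-E2', 'O' and helper names are hypothetical.
def extract_spans(chars, tags):
    """Collect (role, text) spans from a BIO tag sequence aligned with `chars`."""
    spans, role, buf = [], None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if role is not None:
                spans.append((role, "".join(buf)))
            role, buf = tag[2:], [ch]
        elif tag.startswith("I-") and role == tag[2:]:
            buf.append(ch)          # continue the current span
        else:
            if role is not None:
                spans.append((role, "".join(buf)))
            role, buf = None, []    # 'O' or an inconsistent I- tag ends the span
    if role is not None:
        spans.append((role, "".join(buf)))
    return spans


def to_triples(spans, relation):
    """Pair every entity-1 span with every entity-2 span under the predicted relation."""
    e1 = [text for r, text in spans if r == "E1"]
    e2 = [text for r, text in spans if r == "E2"]
    return [(a, relation, b) for a in e1 for b in e2]
```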

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art. All inventions that make use of the inventive concept are protected, provided they do not depart from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A deep-learning-based method for extracting entity relations from cosmetic public opinion texts, characterized by comprising the following steps:

step 1, for the four publishing channels of cosmetic risk public opinion data (official releases, social news, e-commerce platform reviews, and social media posts), using search engine and web information mining techniques, and deduplicating and screening the raw text obtained by the crawler to form a public opinion text corpus; for Chinese texts, performing word segmentation with an improved Jieba method and removing meaningless stop words; then constructing a cosmetic public opinion domain lexicon based on pointwise mutual information (PMI) computation and manual screening and correction, thereby obtaining the professional vocabulary of the cosmetic public opinion domain;

step 2, using the professional vocabulary of the cosmetic public opinion domain extracted in step 1, performing incremental training on a public-domain word embedding library to obtain a cosmetic public opinion domain word embedding library;

step 3: semantic role labeling of (entity 1, relation, entity 2) triples is performed on the cosmetic risk public opinion text extracted in step 1, where entity 1 is the subject of a cosmetic public opinion event, entity 2 is its object, and the relation links entity 1 and entity 2; examples of entity 1 include baby cream, the "big-head doll" incident, and counterfeit cosmetics; examples of entity 2 include hormones, preservatives, and expired items; there are 6 relations in total: raw material components, adverse reactions, risk substances, public opinion heat, efficacy claims, and illegal behaviors; each sentence of the cosmetic risk public opinion text is divided into different components, within a sentence component the influence of a core word on its neighboring words varies with distance, and the influences of all core words in the sentence are accumulated to model how the whole sentence is affected by position; this position-aware strategy is combined with the traditional attention mechanism to construct the position-aware semantic role attention mechanism;

step 4: for the cosmetic risk public opinion text extracted in step 1, the BERT encoder, which is based on a bidirectional deep self-attention Transformer network, is used to construct character vectors fused with Chinese radical features, and word vectors are constructed from the cosmetic public opinion word embedding resource library obtained in step 2; the character vectors and the word vectors are each passed through a bidirectional long short-term memory (BLSTM) model with the position-aware semantic role attention mechanism constructed in step 3, to obtain the multi-class relation of the input text;

and step 5: character vectors fused with Chinese radical features are extracted from the input text with the BERT encoder; the multi-class relation information obtained in step 4 is added to the text feature vectors extracted by the BERT pre-trained model to obtain a text semantic vector that fuses the character and word dimensions; this semantic vector is input into the BLSTM model and a conditional random field (CRF) to obtain the final cosmetic public opinion text entity relation extraction result.
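The final CRF stage of step 5 selects the tag sequence with the highest total score, combining per-position emission scores (from the BLSTM) with learned tag-to-tag transition scores; at inference time this is Viterbi decoding. The sketch below is a minimal illustration under stated assumptions: the three-tag BIO scheme and the hand-set scores are hypothetical (the patent does not state its tag set), and CRF training (the forward algorithm for the partition function) is not shown.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the best tag sequence and its score.

    emissions[t][k]: score of tag k at position t (e.g., BLSTM output).
    transitions[a][b]: score of moving from tag a to tag b (learned by the CRF).
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for b in range(n_tags):
            # Best previous tag a for arriving at tag b.
            best_a = max(range(n_tags), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            bp.append(best_a)
        score = new_score
        backpointers.append(bp)
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for bp in reversed(backpointers):  # follow back-pointers to the start
        path.append(bp[path[-1]])
    path.reverse()
    return [tags[k] for k in path], max(score)
```

With a strongly negative O-to-I transition, the decoder avoids ill-formed sequences such as "O, I" even when the emission for "I" is high, which is the main benefit of a CRF over per-position argmax.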

2. The deep-learning-based method for extracting entity relations from cosmetic public opinion text according to claim 1, wherein in step 1, when a web crawler suited to the cosmetic public opinion domain is constructed, the following information is crawled: information released by authoritative research institutions in China and abroad concerning harm to the health of humans, animals, and plants; adverse reaction monitoring data on cosmetics from domestic and foreign research institutions; authoritative reports from domestic and foreign news media; problem and recall information from cosmetic manufacturers covering production, storage, distribution, and sales; information published by domestic and foreign cosmetics industry associations; and product-use sharing on social networks and sales reviews on e-commerce platforms; this information forms the cosmetic public opinion text corpus from which the cosmetic public opinion domain lexicon is constructed.
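The lexicon construction named in step 1 relies on PMI, which measures how much more often two adjacent tokens co-occur than independence would predict: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The following is a minimal sketch, assuming the corpus has already been segmented (for example by the Jieba-based segmenter) and that candidate terms are adjacent token pairs; the `min_count` and `threshold` values are illustrative and not taken from the patent. High-scoring pairs would then go to the manual screening and correction the claim describes.

```python
import math
from collections import Counter
from typing import Iterable


def pmi_candidates(segmented_docs: Iterable[list[str]], min_count: int = 2,
                   threshold: float = 3.0) -> dict[str, float]:
    """Score adjacent token pairs by PMI; return merged candidate terms above threshold."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in segmented_docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:  # rare pairs give unreliable PMI estimates
            continue
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= threshold:
            scores[x + y] = pmi  # Chinese tokens are joined without a space
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))
```

For example, if a segmenter splits 烟酰胺 (niacinamide) into 烟酰 and 胺, the pair co-occurs far more often than chance and surfaces as a domain-term candidate.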

3. The deep-learning-based method for extracting entity relations from cosmetic public opinion text according to claim 1, wherein in step 2, starting from the general-domain word embedding resource library, the cosmetic-domain professional vocabulary obtained in step 1 is input into the leap model for incremental training; as the content crawled in step 1 keeps growing, the new content is fed into the leap model at intervals to continue incrementally training the general-domain word embedding resource library, which is finally expanded into a word embedding resource library suited to the cosmetic public opinion domain.
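Claim 3 does not describe the "leap model" beyond its role in incremental training. The sketch below assumes it is a skip-gram (word2vec-style) model with negative sampling, which is one common choice, and should be read as an illustration rather than the patent's implementation. It keeps the existing general-domain vectors as the starting point, adds newly seen domain words with the usual word2vec initialization (small random input vectors, zero output vectors), and continues training on the new sentences. The function name, hyperparameters, uniform negative sampling, and the two-table layout are all assumptions.

```python
import math
import random


def _sigmoid(x: float) -> float:
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def incremental_skipgram(vectors, contexts, sentences, dim, window=2,
                         negatives=3, lr=0.025, epochs=5, seed=0):
    """Continue skip-gram training on new sentences, updating both tables in place.

    vectors / contexts: dict word -> list[float] (input and output embeddings),
    e.g., loaded from the general-domain resource library.
    """
    rng = random.Random(seed)
    # Vocabulary expansion: new domain words get fresh vectors; old ones are kept.
    for sent in sentences:
        for w in sent:
            if w not in vectors:
                vectors[w] = [(rng.random() - 0.5) / dim for _ in range(dim)]
                contexts[w] = [0.0] * dim
    vocab = list(vectors)
    for _ in range(epochs):
        for sent in sentences:
            for i, center in enumerate(sent):
                v = vectors[center]
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j == i:
                        continue
                    # One positive context word plus uniformly sampled negatives
                    # (word2vec proper samples from a smoothed unigram distribution).
                    pairs = [(sent[j], 1.0)] + [(rng.choice(vocab), 0.0)
                                                for _ in range(negatives)]
                    grad_v = [0.0] * dim
                    for ctx, label in pairs:
                        u = contexts[ctx]
                        g = lr * (label - _sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # uses u before its update
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_v[k]
    return vectors, contexts
```

Calling this function again whenever a new batch of crawled text arrives mirrors the "at intervals" retraining in the claim.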

4. The deep-learning-based method for extracting entity relations from cosmetic public opinion text according to claim 1, wherein in step 3 the position-aware semantic role attention mechanism is constructed as follows:

(1) The attention of the word at position j in a sentence is:

$$\alpha_j = \frac{\exp\big(a(h_j, p_j)\big)}{\sum_{i=1}^{len} \exp\big(a(h_i, p_i)\big)} \tag{1}$$

In formula (1), $h_j$ is the hidden layer vector of the word at position j, $p_j$ is its accumulated position-aware influence vector, len is the number of words in the sentence, $h_i$ and $p_i$ are the hidden layer vector and the accumulated position-aware influence vector of the word at position i, and $a(\cdot)$ is a scoring function that measures the importance of a word from its hidden layer vector and position-aware influence vector;

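(2) The specific form of $a(\cdot)$ is:

$$a(h_i, p_i) = v^{T}\operatorname{ReLU}\big(W_H h_i + W_P p_i + b_1\big) + b_2, \quad i = 1, \dots, len \tag{2}$$

In formula (2), $W_H$ and $W_P$ are the weight matrices of the hidden layer vector and the position-aware influence vector, respectively; $b_1$ is a bias vector belonging to the first-layer parameters; $\operatorname{ReLU}(\cdot)$ is the rectified linear unit function; $v$ is a global vector and $v^{T}$ is its transpose; $b_2$ is the bias belonging to the second-layer parameters; len is the number of words in the sentence, and i indexes a word position in the sentence.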
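The mechanism of claim 4 can be sketched in two stages: accumulating the position-aware influence vectors $p_j$ described in step 3, then scoring and normalizing with formulas (2) and (1). This is a minimal sketch under explicit assumptions. The patent says only that a core word's influence varies with distance within its sentence component, so the Gaussian decay kernel, its width `sigma`, and the per-core influence vectors `u` are hypothetical choices. Formula (1) is read as the usual softmax of $a(h_j, p_j)$ over the len words, and formula (2) as $a(h_i, p_i) = v^T \mathrm{ReLU}(W_H h_i + W_P p_i + b_1) + b_2$, as the surrounding definitions indicate.

```python
import math


def position_influence(components, cores, sigma=2.0):
    """Accumulate p_j for every position j.

    components[j]: id of the sentence component (semantic role span) of word j.
    cores: list of (position, influence_vector) for the core words.
    A core word influences only words in its own component; the influence
    decays with distance (assumed Gaussian kernel) and is summed over cores.
    """
    dim = len(cores[0][1]) if cores else 0
    p = [[0.0] * dim for _ in components]
    for c_pos, u in cores:
        for j, comp in enumerate(components):
            if comp == components[c_pos]:
                w = math.exp(-((j - c_pos) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    p[j][k] += w * u[k]
    return p


def score(h, p, W_H, W_P, b1, v, b2):
    """Formula (2): a(h_i, p_i) = v^T ReLU(W_H h_i + W_P p_i + b_1) + b_2."""
    hidden = [max(0.0, sum(w * x for w, x in zip(rh, h))
                  + sum(w * x for w, x in zip(rp, p)) + b)
              for rh, rp, b in zip(W_H, W_P, b1)]
    return sum(vi * zi for vi, zi in zip(v, hidden)) + b2


def attention(H, P, W_H, W_P, b1, v, b2):
    """Formula (1), read as a softmax of a(h_j, p_j) over the len words."""
    s = [score(h, p, W_H, W_P, b1, v, b2) for h, p in zip(H, P)]
    m = max(s)  # shift by the max for numerical stability
    e = [math.exp(x - m) for x in s]
    total = sum(e)
    return [x / total for x in e]
```

Restricting each core word to its own component is what makes the attention "semantic role" aware: a word in one role span is not boosted by a core word in a different span, however close it is.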

5. The deep-learning-based method for extracting entity relations from cosmetic public opinion text according to claim 1, wherein in step 4 the public opinion event text corpus is input into the BERT pre-trained model to obtain a vectorized representation of the text as follows: the whole text input is split into sentences and encoded with the deep self-attention Transformer network; part of the content of each sentence is masked, and the masked content is predicted from the remaining content of the sentence; the prediction is compared with the true masked content to obtain the prediction error, and the model parameters are adjusted according to this error; through this prediction the input text is mapped into a vector space, yielding a character-level vector representation of the text; based on the similarity of the Chinese radicals in cosmetic public opinion text, 48 dimensions of additional Chinese radical semantic information are appended to each 768-dimensional character vector; word-level input vectors are obtained from the cosmetic public opinion word embedding resource library constructed in step 2; the character vectors and the word vectors are each input into a BLSTM model, and the entity relation of the input text is judged with the semantic role attention mechanism constructed in step 3: the word attention distribution coefficients computed by the position-aware semantic role attention mechanism are propagated into the BLSTM hidden layer vectors, and each word is weighted to obtain the text features under the influence of the attention mechanism; the attention output $r_a$ is computed as:

$$r_a = \sum_{j=1}^{len} \alpha_j h_j \tag{3}$$

In formula (3), $h_j$ is the hidden layer vector of the word at position j, $\alpha_j$ is the attention of the word at position j, and len is the number of words in the sentence;

after the text features of both the character and the word dimensions are obtained, the two outputs are concatenated, and the multi-class relation of the input text is finally obtained through a fully connected layer and a sigmoid layer.
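The classification path of claim 5 (a BLSTM over each input path, attention pooling by formula (3), concatenation of the two paths, then a fully connected layer and a sigmoid) can be sketched in plain Python. This is a minimal illustration with assumptions stated here and in the comments: the weights are random and untrained, the attention weights `alpha` are passed in (in the method they come from formulas (1) and (2)), and the 0.5 decision threshold is hypothetical because the patent does not state one. A per-relation sigmoid gives an independent probability for each of the six relations, so one sentence can carry more than one relation.

```python
import math
import random

RELATIONS = ["raw material component", "adverse reaction", "risk substance",
             "public opinion heat", "efficacy claim", "illegal behavior"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_pass(xs, W, hidden):
    """Run one LSTM direction; W has 4*hidden rows over [x; h_prev; 1]."""
    h, c, outs = [0.0] * hidden, [0.0] * hidden, []
    for x in xs:
        z = list(x) + h + [1.0]
        pre = [sum(w * val for w, val in zip(row, z)) for row in W]
        i = [sigmoid(a) for a in pre[:hidden]]                # input gate
        f = [sigmoid(a) for a in pre[hidden:2 * hidden]]      # forget gate
        o = [sigmoid(a) for a in pre[2 * hidden:3 * hidden]]  # output gate
        g = [math.tanh(a) for a in pre[3 * hidden:]]          # candidate cell
        c = [f[k] * c[k] + i[k] * g[k] for k in range(hidden)]
        h = [o[k] * math.tanh(c[k]) for k in range(hidden)]
        outs.append(h)
    return outs


def bilstm(xs, hidden, rng):
    """h_j = [forward_j; backward_j], with random (untrained) weights."""
    cols = len(xs[0]) + hidden + 1

    def make_weights():
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(4 * hidden)]

    fwd = lstm_pass(xs, make_weights(), hidden)
    bwd = lstm_pass(xs[::-1], make_weights(), hidden)[::-1]
    return [a + b for a, b in zip(fwd, bwd)]


def attention_pool(H, alpha):
    """Formula (3): r_a = sum_j alpha_j * h_j."""
    return [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(len(H[0]))]


def classify(r_char, r_word, W, b, threshold=0.5):
    """Concatenate both paths, apply a fully connected layer and a per-relation sigmoid."""
    x = r_char + r_word
    probs = [sigmoid(sum(w * val for w, val in zip(row, x)) + bias)
             for row, bias in zip(W, b)]
    return probs, [name for name, p in zip(RELATIONS, probs) if p >= threshold]
```

In use, `bilstm` would run once over the 816-dimensional character vectors and once over the word vectors, each output would be pooled with its own attention weights, and the two pooled vectors would be passed to `classify`.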

6. The deep-learning-based method for extracting entity relations from cosmetic public opinion text according to claim 1, wherein in step 5 the public opinion event text corpus is input into the BERT pre-trained model to obtain a vectorized representation of the text, yielding character vectors of 768 + 48 = 816 dimensions that contain Chinese radical information; the 6-dimensional multi-class result from step 4 is expanded 136 times (6 × 136 = 816) so that its length matches that of the character vectors, and it is spliced onto both ends of the input text's character vector matrix to obtain text vectors with richer semantic features; these text vectors are input into the BLSTM model, the entity relation of the input text is judged with the position-aware semantic role attention mechanism constructed in step 3, and the final cosmetic public opinion text entity relation extraction result is obtained after a conditional random field (CRF) computes the optimal probability.
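The dimension bookkeeping in claim 6 can be checked with a short sketch. Two readings here are assumptions rather than statements from the patent: "expanded 136 times" is taken to mean tiling the whole 6-dimensional vector 136 times, and "spliced at both ends" is taken to mean adding the expanded vector as an extra first row and an extra last row of the sequence matrix. The radical lookup table passed to `char_vector` is likewise hypothetical, since the patent does not say how the 48 radical dimensions are computed.

```python
BERT_DIM, RADICAL_DIM, N_RELATIONS = 768, 48, 6
CHAR_DIM = BERT_DIM + RADICAL_DIM    # 816
REPEAT = CHAR_DIM // N_RELATIONS     # 136, since 6 * 136 = 816


def char_vector(bert_vec, radical, radical_table):
    """Append a 48-d radical embedding (hypothetical lookup; zeros if unknown) to a 768-d BERT vector."""
    if len(bert_vec) != BERT_DIM:
        raise ValueError("expected a 768-d BERT vector")
    return list(bert_vec) + list(radical_table.get(radical, [0.0] * RADICAL_DIM))


def splice_relations(char_matrix, relation_probs):
    """Tile the 6-d relation result to 816-d and add it as the first and last rows."""
    if len(relation_probs) != N_RELATIONS:
        raise ValueError("expected 6 relation scores")
    expanded = list(relation_probs) * REPEAT  # tiling, assumed reading of "expanded 136 times"
    return [expanded] + [list(row) for row in char_matrix] + [list(expanded)]
```

Under this reading, a sentence of n characters becomes an (n + 2) × 816 matrix, which the BLSTM then processes like any other sequence, so the relation evidence can influence every position through the recurrent states.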