CN115204181A - Text detection method and device, electronic equipment and computer readable storage medium


Info

Publication number
CN115204181A
CN115204181A
Authority
CN
China
Prior art keywords
text
detected
mask
word
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210869399.1A
Other languages
Chinese (zh)
Inventor
徐睿峰 (Xu Ruifeng)
王乾龙 (Wang Qianlong)
王睿 (Wang Rui)
温志渊 (Wen Zhiyuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School, Harbin Institute of Technology
Priority to CN202210869399.1A
Publication of CN115204181A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/253: Grammatical analysis; Style critique
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/268: Morphological analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning

Abstract

The application discloses a text detection method and apparatus, an electronic device, and a computer-readable storage medium. The text detection method includes the following steps: acquiring a text to be detected; masking each positive emotion word of the text to be detected to obtain a first mask text; masking each negative emotion word of the text to be detected to obtain a second mask text; predicting the masked positive emotion words in the first mask text to generate a first reconstructed text; predicting the masked negative emotion words in the second mask text to generate a second reconstructed text; determining a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected; and determining the text to be detected to be ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold. The application not only improves the detection accuracy of ironic text but also saves a large amount of data labeling work, realizing unsupervised irony detection.

Description

Text detection method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of natural language processing, and in particular, to a text detection method, apparatus, electronic device, and computer-readable storage medium.
Background
With the rapid development of Internet technology, people often express irony on social media using positive or exaggerated words. When a text contains ironic content, the emotional polarity of the sentence is reversed and the emotion actually implied changes.
In the prior art, irony detection methods mainly rely on large amounts of labeled data to model complex feature representations; constructing context information requires designing and implementing complex feature extraction, and the resulting models also require large amounts of labeled data and complex deep-learning networks.
However, the heavy data labeling work and complex model construction make it difficult to apply irony detection models to real scenarios. As a result, current detection methods easily misclassify ironic text appearing on social platforms, and their detection accuracy on ironic text is too low to meet detection requirements.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a text detection method, apparatus, electronic device, and computer-readable storage medium, which can solve the problem in the prior art that ironic text cannot be detected well.
In order to solve the above technical problem, a first technical solution adopted by the present application is to provide a text detection method, including: acquiring a text to be detected; masking each positive emotion word of the text to be detected to obtain a first mask text; masking each negative emotion word of the text to be detected to obtain a second mask text; predicting the masked positive emotion words in the first mask text to generate a first reconstructed text; predicting the masked negative emotion words in the second mask text to generate a second reconstructed text; determining a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected; and determining the text to be detected to be ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold.
After the step of acquiring the text to be detected, the method includes: performing part-of-speech tagging on each character in the text to be detected. The steps of masking each positive emotion word of the text to be detected to obtain a first mask text, and masking each negative emotion word of the text to be detected to obtain a second mask text, include: recognizing each emotion word from the tagged text to be detected, and classifying each emotion word as a positive emotion word or a negative emotion word based on its polarity; identifying each verb and/or noun that is not an emotion word from the tagged text to be detected, and determining these verbs and/or nouns as component words; masking the positive emotion words and at least some of the component words in the text to be detected with mask characters to generate the first mask text; and masking the negative emotion words and the same component words in the text to be detected with mask characters to generate the second mask text.
The step of recognizing each emotion word from the tagged text to be detected and classifying it as a positive or negative emotion word based on its polarity includes: recognizing each emotion word as a positive or negative emotion word from the tagged text to be detected using an external emotion resource vocabulary, and dividing the emotion words into a corresponding positive emotion word set or negative emotion word set. The step of identifying each non-emotion verb and/or noun from the tagged text to be detected and determining it as a component word includes: acquiring grammatical information of the text to be detected using a natural language processing tool; identifying each non-emotion verb and/or noun from the tagged text to be detected based on the grammatical information, and determining these verbs and/or nouns as component words; and collecting the component words into a component word set and dividing the component word set into at least two subsets. The steps of masking the positive emotion words and at least some component words with mask characters to generate the first mask text, and masking the negative emotion words and the same component words with mask characters to generate the second mask text, include: masking all characters included in the positive emotion word set and one of the subsets with mask characters to generate the first mask text; and masking all characters included in the negative emotion word set and the same subset with mask characters to generate the second mask text.
The steps of predicting the masked positive emotion words in the first mask text to generate a first reconstructed text, and predicting the masked negative emotion words in the second mask text to generate a second reconstructed text, include: respectively acquiring word embedding vectors of the first mask text and the second mask text, where each word embedding vector comprises a character vector and a position vector; splicing the word embedding vectors corresponding to the first mask text and the second mask text respectively to obtain a first hidden sequence and a second hidden sequence with context features; and predicting each masked emotion word and each masked component word in the first hidden sequence and the second hidden sequence respectively to obtain the first reconstructed text and the second reconstructed text.
The first reconstructed text and the second reconstructed text are generated by a text generation model, and the text generation model comprises an encoder, an attention network, and a decoder cascaded with one another. The step of respectively acquiring word embedding vectors of the first mask text and the second mask text includes: respectively acquiring the word embedding vectors of the first mask text and the second mask text using the encoder of the text generation model. The step of splicing the word embedding vectors corresponding to the first mask text and the second mask text respectively to obtain a first hidden sequence and a second hidden sequence with context features includes: respectively encoding the word embedding vectors corresponding to the first mask text and the second mask text using the encoder to obtain the first hidden sequence and the second hidden sequence with context features. The step of predicting each masked emotion word and each masked component word in the first hidden sequence and the second hidden sequence respectively to obtain the first reconstructed text and the second reconstructed text includes: decoding the first hidden sequence and the second hidden sequence in turn using the attention network and the decoder in the text generation model so as to predict each masked emotion word and each masked component word, and outputting the first reconstructed text and the second reconstructed text.
The attention network comprises a self-attention mechanism. The step of decoding the first hidden sequence and the second hidden sequence in turn using the attention network and the decoder in the text generation model to predict each masked emotion word and each masked component word, and outputting the first reconstructed text and the second reconstructed text, includes: sequentially performing autoregressive decoding on the first hidden sequence and the second hidden sequence using the self-attention mechanism and the decoder, so as to predict each masked emotion word and each masked component word at each time step, and outputting the first reconstructed text and the second reconstructed text.
The step of determining the first similarity between the first reconstructed text and the text to be detected and the second similarity between the second reconstructed text and the text to be detected includes: respectively acquiring word embedding vectors of the text to be detected, the first reconstructed text, and the second reconstructed text; splicing the word embedding vectors corresponding to the text to be detected, the first reconstructed text, and the second reconstructed text respectively to obtain a text sequence to be detected, a first text sequence, and a second text sequence with context features; performing similarity calculation on the text sequence to be detected and the first text sequence using at least one similarity algorithm to obtain the first similarity; and performing similarity calculation on the text sequence to be detected and the second text sequence using the same similarity algorithm to obtain the second similarity.
In order to solve the above technical problem, a second technical solution adopted by the present application is to provide a text detection apparatus, including: an acquisition module for acquiring a text to be detected; a mask module for masking each positive emotion word of the text to be detected to obtain a first mask text, and masking each negative emotion word of the text to be detected to obtain a second mask text; a reconstructed text generation module for predicting the masked positive emotion words in the first mask text to generate a first reconstructed text, and predicting the masked negative emotion words in the second mask text to generate a second reconstructed text; a similarity calculation module for determining a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected; and a determining module for determining the text to be detected to be ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold.
In order to solve the above technical problem, a third technical solution adopted by the present application is to provide an electronic device, including: a memory for storing program data which, when executed, implements the steps in the text detection method described above; and a processor for executing the program data stored in the memory to implement the steps in the text detection method described above.
In order to solve the above technical problem, a fourth technical solution adopted by the present application is to provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps in the text detection method.
The beneficial effects of the present application are as follows. Different from the prior art, a text detection method, apparatus, electronic device, and computer-readable storage medium are provided. A first mask text and a second mask text can be generated by respectively masking the positive emotion words and the negative emotion words in a text to be detected. Then, by predicting the masked positive and negative emotion words in the first and second mask texts respectively, a first reconstructed text and a second reconstructed text whose emotion is consistent with their semantics are generated following the logic of the unmasked text, so as to preserve consistency between contexts. Further, by calculating the similarity between the text to be detected and the reconstructed texts, and comparing the similarity with a set threshold, whether the text to be detected is ironic text can be determined. In this way, better contextual irony detection is realized, the detection accuracy of ironic text is improved, a large amount of data labeling work is saved, and unsupervised irony detection is realized, so that the method can be applied to actual scenarios to meet detection requirements.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic flow chart of a first embodiment of a text detection method according to the present application;
FIG. 2 is a schematic flow chart of a second embodiment of the text detection method of the present application;
FIG. 3 is a flowchart illustrating an application scenario for masking a text to be detected according to the present application;
FIG. 4 is a schematic flow chart of a third embodiment of the text detection method of the present application;
FIG. 5 is a flowchart illustrating an application scenario for performing a reconstruction operation on a masked text according to the present application;
FIG. 6 is a schematic flow chart of a fourth embodiment of the text detection method according to the present application;
FIG. 7 is a flowchart illustrating an application scenario of the text detection method of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of a text detection apparatus according to the present application;
FIG. 9 is a schematic structural diagram of an embodiment of an electronic device according to the present application;
FIG. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the embodiments of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "plural" generally means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
It should be understood that the terms "comprises", "comprising", or any other variation thereof, as used herein, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of a text detection method according to the present application. In this embodiment, a text detection method includes:
S11: acquiring the text to be detected.
In this embodiment, the text to be detected is comment information posted by users on the network about specific content under a certain topic, such as a medical or educational topic.
A large amount of user comment information can be obtained from social media by web crawling and used as texts to be detected.
Wherein the text to be detected usually comprises a plurality of words or phrases.
S12: masking each positive emotion word of the text to be detected to obtain a first mask text; and masking each negative emotion word of the text to be detected to obtain a second mask text.
The emotion-polarity words referred to here are adjectives, nouns, and verbs that express or relate to emotions, such as "happy", "liked", "respected", "sad", "crying", and the like.
A Positive emotion Word (PW) is a word expressing a positive emotion, such as "happy", "liked", "respected", and the like; the emotion conveyed is usually positive and affirmative.
A Negative emotion Word (NW) is a word expressing a negative emotion, such as "sad", "crying", and the like; the emotion conveyed is usually negative.
S13: predicting the masked positive emotion words in the first mask text to generate a first reconstructed text; and predicting the masked negative emotion words in the second mask text to generate a second reconstructed text.
In this embodiment, by predicting the positive emotion words masked in the first mask text, a first reconstructed text whose positive emotion is consistent with its semantics can be generated based on the logic of the unmasked part of the text to be detected; by predicting the negative emotion words masked in the second mask text, a second reconstructed text whose negative emotion is consistent with its semantics can be generated based on the logic of the unmasked part of the text to be detected, thereby preserving consistency between contexts.
S14: and determining a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected.
In this embodiment, any one of Cosine Similarity, Euclidean Distance, Pearson Correlation Coefficient, KL Divergence (Kullback-Leibler Divergence), Jaccard Similarity Coefficient, Tanimoto Coefficient (generalized Jaccard similarity), and Mutual Information may be used to determine the first similarity between the first reconstructed text and the text to be detected and the second similarity between the second reconstructed text and the text to be detected, which is not limited in this application.
S15: determining the text to be detected to be ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold.
It can be understood that if the text to be detected is not an ironic text and there is no divergence between the emotion and the semantics of its context, the first reconstructed text and the second reconstructed text generated by prediction will both be highly similar to the text to be detected, so the first similarity and the second similarity will both be high.
Conversely, if the text to be detected is an ironic text and the emotion and semantics of its context diverge, at least one of the first and second reconstructed texts generated by prediction will differ greatly from the text to be detected, so the first similarity and/or the second similarity will be smaller than the set threshold.
In a specific implementation scenario, the text to be detected is "education, medical care, and elderly care are the sources of happiness for modern young people; the happiness makes it hard to breathe", where "happiness" is a positive emotion word, "hard to breathe" expresses a negative emotion, and the emotions and semantics of "happiness" and "hard to breathe" diverge. If "happiness" is masked and prediction is performed based on the unmasked text "hard to breathe", a word such as "pain" is typically generated, i.e., the first reconstructed text becomes "... the pain makes it hard to breathe". The first reconstructed text then differs greatly from the text to be detected, and the first similarity is inevitably smaller than the set threshold, so the text to be detected is determined to be ironic.
Different from the prior art, this embodiment can generate a first mask text and a second mask text by respectively masking the positive emotion words and the negative emotion words in the text to be detected. Then, by predicting the masked positive and negative emotion words in the first and second mask texts respectively, a first reconstructed text and a second reconstructed text whose emotion is consistent with their semantics are generated following the logic of the unmasked text, preserving consistency between contexts. Further, by calculating the similarity between the text to be detected and the reconstructed texts, and comparing the similarity with a set threshold, whether the text to be detected is ironic text can be determined. In this way, this embodiment not only realizes better contextual irony detection and improves the detection accuracy of ironic text, but also saves a large amount of data labeling work and realizes unsupervised irony detection, so that it can be applied to actual scenarios to meet detection requirements.
Referring to fig. 2, fig. 2 is a flowchart illustrating a second embodiment of the text detection method according to the present application. In this embodiment, a text detection method includes:
S21: acquiring a text to be detected, and performing part-of-speech tagging on each character in the text to be detected.
In this embodiment, a natural language processing tool is used to perform part-of-speech tagging and classification on each character in an input text to be detected.
Specifically, parts of speech can generally be divided into ten categories: nouns, pronouns, verbs, adjectives, quantifiers, adverbs, prepositions, conjunctions, articles, and interjections. The first six are content words, and the last four are function words.
S22: recognizing each emotion word from the tagged text to be detected, and classifying each emotion word as a positive or negative emotion word based on its polarity; and identifying each verb and/or noun that is not an emotion word from the tagged text to be detected, and determining these verbs and/or nouns as component words.
In this embodiment, each emotion word is recognized from the tagged text to be detected as a positive or negative emotion word using an external emotion resource vocabulary, and is placed into the corresponding positive emotion word set or negative emotion word set.
The positive emotion word set is denoted PW = {pw_1, pw_2, ..., pw_h}, and the negative emotion word set is denoted NW = {nw_1, nw_2, ..., nw_h}.
The external emotion resource vocabulary is any resource library with emotion polarity labels.
In one specific implementation scenario, the external sentiment resource vocabulary library is a SenticNet dictionary. In another specific implementation scenario, the external emotion resource vocabulary library is a HowNet dictionary, which is not limited in this application.
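As a non-authoritative illustration of this step, the following minimal Python sketch splits tokens into positive and negative emotion word sets using a polarity lexicon. The toy POLARITY_LEXICON is a hypothetical stand-in for a real resource such as SenticNet or HowNet, whose actual APIs are not shown here.

```python
# Minimal sketch: classify emotion words by lexicon polarity.
# POLARITY_LEXICON is a hypothetical stand-in for SenticNet/HowNet.
POLARITY_LEXICON = {
    "happy": 1.0, "liked": 0.8, "respected": 0.7,  # positive polarity
    "sad": -0.8, "crying": -0.9,                   # negative polarity
}

def split_emotion_words(tokens):
    """Return (PW, NW): positive and negative emotion word sets."""
    pw, nw = set(), set()
    for tok in tokens:
        polarity = POLARITY_LEXICON.get(tok.lower())
        if polarity is None:
            continue  # not an emotion word in the lexicon
        (pw if polarity > 0 else nw).add(tok)
    return pw, nw

pw, nw = split_emotion_words("I liked the match but ended up crying".split())
# pw == {"liked"}, nw == {"crying"}
```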
In this embodiment, grammatical information of the text to be detected is acquired using a natural language processing tool; each verb and/or noun that is not an emotion word is identified from the tagged text to be detected based on the grammatical information, and these verbs and/or nouns are determined as component words.
The natural language processing tool includes the spaCy tool. spaCy is one of the fastest industrial-strength natural language processing tools; it supports multiple basic natural language processing functions, mainly tokenization, part-of-speech tagging, stemming, named entity recognition, and noun phrase extraction.
Component words are generally important constituents of a sentence; besides non-emotion verbs and nouns, they also include non-emotion adjectives. Component words do not include function words such as prepositions, conjunctions, articles, and interjections, nor pronouns, quantifiers, or proper nouns such as names of people and places.
In a specific implementation scenario, the text to be detected is "I like watching football games, and the result is unexpected every time"; after removing the emotion word "like", the identified component words include {football, game, result, unexpected}.
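A minimal sketch of this extraction with spaCy is shown below. The filtering rules (keep non-emotion verbs, nouns, and adjectives; drop named entities; exclude function words and pronouns via the part-of-speech filter) follow the description above, while the function name and the exact rule set are illustrative assumptions rather than the patent's own implementation.

```python
# Minimal sketch: extract component words with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def extract_component_words(text, emotion_words):
    """Keep non-emotion verbs/nouns/adjectives; skip named entities.

    Proper nouns, pronouns, and function words are excluded because the
    part-of-speech filter only admits VERB, NOUN, and ADJ tokens.
    """
    doc = nlp(text)
    return [
        tok.text
        for tok in doc
        if tok.pos_ in {"VERB", "NOUN", "ADJ"}
        and not tok.ent_type_                     # drop names of people/places
        and tok.text.lower() not in emotion_words
    ]

sw = extract_component_words(
    "I like watching football games, and the result is unexpected every time",
    emotion_words={"like"},
)
# sw contains words such as "football", "games", "result", "unexpected"
```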
In this embodiment, the component words are further collected into a component word set, and the component word set is divided into at least two subsets. The component word set is denoted SW = {sw_1, sw_2, ..., sw_m}, and the subsets are denoted SW_1, SW_2, ..., SW_n.
In one specific implementation scenario, the component word set SW is divided into two subsets SW_1 and SW_2 such that SW_1 and SW_2 contain approximately equal numbers of component words. To ensure context consistency, the component words of each sentence of the text to be detected can be divided equally between the two subsets. For example, if a sentence includes 8 component words, 4 of them are assigned to SW_1 and the remaining 4 to SW_2.
Taking the text to be detected "I like watching football games, and the result is unexpected every time" as an example, the component word set of the text is SW = {football, game, result, unexpected}; "football" in the first clause and "result" in the second clause can be assigned to SW_1, while "game" in the first clause and "unexpected" in the second clause are assigned to SW_2.
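The even, per-sentence split can be sketched as follows; dealing the words alternately into the two subsets is one simple way (an assumption, not mandated by the patent) to keep the subsets roughly equal within every sentence.

```python
# Minimal sketch: split each sentence's component words evenly into SW1/SW2.
def split_component_set(component_words_per_sentence):
    sw1, sw2 = [], []
    for sentence_words in component_words_per_sentence:
        for i, word in enumerate(sentence_words):
            (sw1 if i % 2 == 0 else sw2).append(word)  # alternate assignment
    return sw1, sw2

sw1, sw2 = split_component_set([["football", "game"], ["result", "unexpected"]])
# sw1 == ["football", "result"], sw2 == ["game", "unexpected"]
```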
S23: masking the positive emotion words and at least some of the component words in the text to be detected with mask characters to generate a first mask text; and masking the negative emotion words and the same component words in the text to be detected with mask characters to generate a second mask text.
In this embodiment, the mask characters are used to mask all characters included in the positive emotion word set and one of the subsets to generate the first mask text, and to mask all characters included in the negative emotion word set and the same subset to generate the second mask text.
In one specific implementation scenario, the mask characters are used to mask all characters included in the positive emotion word set and SW_1 to generate the first mask text, and to mask all characters included in the negative emotion word set and SW_1 to generate the second mask text.
In another specific implementation scenario, the mask characters can be used to mask all characters included in the positive emotion word set and SW_2 to generate the first mask text, and to mask all characters included in the negative emotion word set and SW_2 to generate the second mask text.
Taking the text to be detected "I like watching football games, and the result is unexpected every time" as an example, with the emotion word "like", the component word set SW = {football, game, result, unexpected}, SW_1 = {football, result}, and SW_2 = {game, unexpected}, after masking all characters in the positive emotion word set and SW_1, the first mask text is "I [MASK] watching [MASK] games, and every time the [MASK] is unexpected".
It can be understood that component words are generally important constituents of a sentence. If the component word set were not divided and all component words were masked, the text would lose too much information, and the unmasked text would contain too little information to provide enough logic for subsequent prediction. In this embodiment, the component word set is divided into several subsets and only the component words included in some of the subsets are masked, which ensures that the unmasked text still carries enough logic.
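A minimal sketch of the masking operation itself, under the assumption that the text is already tokenized and that token-level matching suffices:

```python
# Minimal sketch: replace emotion words and one component subset with [MASK].
def mask_text(tokens, emotion_words, component_subset, mask="[MASK]"):
    covered = {w.lower() for w in emotion_words} | {w.lower() for w in component_subset}
    return [mask if tok.lower() in covered else tok for tok in tokens]

tokens = "I like watching football games , and the result is unexpected every time".split()
m1 = mask_text(tokens, {"like"}, ["football", "result"])  # PW + SW1 -> first mask text
m2 = mask_text(tokens, set(), ["football", "result"])     # NW + SW1 -> second mask text
# (this example sentence has no negative emotion word, so NW is empty)
```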
Referring to fig. 3, fig. 3 is a flowchart illustrating an application scenario for masking a text to be detected according to the present application. In this embodiment, after the text to be detected is obtained, a natural language processing tool is first used to perform part-of-speech tagging and classification on each character of the input text. Then, each emotion word is recognized and classified from the tagged text using an external emotion resource vocabulary, and the component words are acquired using the natural language processing tool. Finally, the positive emotion words and at least some of the component words in the text to be detected are masked with mask characters to generate a first mask text m_1, and the negative emotion words and the same component words are masked with mask characters to generate a second mask text m_2.
S24: predicting the masked positive emotion words in the first mask text to generate a first reconstructed text; and predicting the masked negative emotion words in the second mask text to generate a second reconstructed text.
For a detailed process, please refer to the description in S13, which is not described herein again.
S25: and determining a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected.
For details, please refer to the description in S14, which is not described herein.
S26: determining the text to be detected to be ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold.
For a detailed process, please refer to the description in S15, which is not described herein again.
In the prior art, accurately acquiring the emotion-polarity words and the main component words in a sentence is difficult and consumes a large amount of manpower.
Different from the prior art, this embodiment approximately acquires the emotion words and the main component words in a sentence using an external emotion resource vocabulary and a natural language processing tool respectively, thereby avoiding manual annotation of emotion words and component words and saving a large amount of manpower.
Referring to fig. 4, fig. 4 is a flowchart illustrating a third embodiment of the text detection method according to the present application. In this embodiment, the first reconstructed text and the second reconstructed text are generated by a text generation model, the text generation model includes an encoder, an attention network, and a decoder in cascade, and the attention network includes a self-attention mechanism. The text detection method comprises the following steps:
S41: acquiring the text to be detected.
For details, please refer to the description in S11, which is not repeated herein.
S42: masking each positive emotion word of the text to be detected to obtain a first mask text; and masking each negative emotion word of the text to be detected to obtain a second mask text.
For details, please refer to the description in S12 or S21 to S23, which is not described herein again.
S43: respectively acquiring word embedding vectors of the first mask text and the second mask text, where each word embedding vector comprises a character vector and a position vector.
In this embodiment, the word embedding vectors of the first mask text and the second mask text are obtained using the Transformer encoder of the text generation model.
The Transformer encoder is a sequence modeling model composed of a self-attention mechanism and a feed-forward neural network.
The Transformer encoder processes the text to be detected into a token sequence X = {x_1, x_2, ..., x_n} and then maps X to word embedding vectors E = {e_1, e_2, ..., e_n} via an embedding mapping function, where each word embedding vector e_i is formed by adding the corresponding character vector and position vector.
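A minimal PyTorch sketch of this embedding step follows; the vocabulary size, maximum length, and dimension are illustrative assumptions, not values from the patent.

```python
# Minimal sketch: word embedding e_i = character/token vector + position vector.
import torch
import torch.nn as nn

class WordEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, max_len=512, dim=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)  # character/token vectors
        self.pos = nn.Embedding(max_len, dim)     # position vectors

    def forward(self, input_ids):                 # input_ids: (batch, seq_len)
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        return self.tok(input_ids) + self.pos(positions)  # e = char + position
```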
S44: and respectively splicing the word embedded vectors corresponding to the first mask text and the second mask text to obtain a first hidden sequence and a second hidden sequence with context characteristics.
In this embodiment, the Transformer encoder is used to encode the word embedding vectors corresponding to the first mask text and the second mask text respectively, so as to obtain a first hidden sequence and a second hidden sequence with context features.
Specifically, the Transformer encoder encodes the word embedding vectors corresponding to the first mask text and the second mask text using a context mapping function based on the position vectors, thereby obtaining the first hidden sequence and the second hidden sequence with context features.
Wherein the context mapping function is implemented using a multi-layer stacked neural network.
The first hidden sequence is denoted h_1 = {h_1, h_2, ..., h_n}, and the second hidden sequence is denoted h_2 = {h_1, h_2, ..., h_m}.
The first hidden sequence and the second hidden sequence are semantic representation vectors; they are called hidden sequences because the sequences include mask characters.
S45: predicting each masked emotion word and each masked component word in the first hidden sequence and the second hidden sequence respectively to obtain a first reconstructed text and a second reconstructed text.
In the present embodiment, the first hidden sequence and the second hidden sequence are sequentially decoded by using an attention network and a decoder in the text generation model to predict each masked emotion word and each masked component word, and the first reconstructed text and the second reconstructed text are output.
The first hidden sequence and the second hidden sequence are subjected to autoregressive decoding by the self-attention mechanism and the Transformer decoder in sequence, each masked emotion word and each masked component word is predicted at each time step, and the first reconstructed text and the second reconstructed text are output.
Specifically, after obtaining h_1 = {h_1, h_2, ..., h_n} and h_2 = {h_1, h_2, ..., h_m}, the Transformer decoder accepts the hidden state and uses a decoding mapping function to continually predict the character corresponding to the next mask character w_i, predicting one output character at each time step, until the terminator <eos> is generated; it then outputs the first reconstructed text and the second reconstructed text. Here h_i refers to the first hidden sequence h_1 or the second hidden sequence h_2.
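As a hedged illustration of this encode-then-autoregressively-decode step, the sketch below uses HuggingFace's BART (a pre-trained Transformer encoder-decoder that can fill <mask> spans) as a stand-in for the patent's text generation model; the patent itself only specifies a cascaded encoder, attention network, and decoder initialized from a pre-trained language generation model.

```python
# Minimal sketch: reconstruct a mask text with a pre-trained encoder-decoder.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

masked = "I <mask> watching <mask> games, and every time the <mask> is unexpected"
inputs = tokenizer(masked, return_tensors="pt")

# Greedy autoregressive decoding: one token per time step until eos is generated.
output_ids = model.generate(**inputs, max_length=32, num_beams=1)
reconstructed = tokenizer.decode(output_ids[0], skip_special_tokens=True)
```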
In this embodiment, in order to ensure that the text generation model can follow the logical structure of the mask text and generate sentences whose emotion is consistent with their semantics, the Transformer encoder and the Transformer decoder are initialized with the parameters of a pre-trained language generation model.
In a specific implementation scenario, the Transformer encoder and the Transformer decoder may be initialized using parameters of the BERT model. In another specific implementation scenario, the Transformer encoder and the Transformer decoder may be initialized with parameters of the GPT-2 model.
Referring to fig. 5, fig. 5 is a flowchart illustrating an application scenario for reconstructing masked text according to the present application. In this embodiment, after the first mask text m_1 and the second mask text m_2 are acquired, the Transformer encoder first processes the text into a token sequence X = {x_1, x_2, ..., x_n} and then maps X to word embedding vectors E = {e_1, e_2, ..., e_n} using the embedding mapping function. The word embedding vectors corresponding to m_1 and m_2 are then encoded using the context mapping function based on the position vectors, yielding a first hidden sequence h_1 and a second hidden sequence h_2 with context features. Finally, the Transformer decoder performs autoregressive decoding on h_1 and h_2 in turn, predicting each masked emotion word and each masked component word at each time step, and outputs the first reconstructed text and the second reconstructed text.
S46: and determining a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected.
For details, please refer to the description in S14, which is not described herein.
S47: and determining the text to be detected as the irony text in response to the first similarity and/or the second similarity being smaller than a set threshold.
For details, please refer to the description in S15, which is not repeated herein.
In this way, this embodiment can input the masked texts into the pre-trained text generation model, so that the text generation model generates sentences whose emotion is consistent with their semantics, following the logical structure of the unmasked text.
Referring to fig. 6, fig. 6 is a flowchart illustrating a fourth embodiment of the text detection method according to the present application. In this embodiment, the text to be detected is examined by a sarcasm detection model, which is a fine-tuned pre-trained language model. The text detection method includes the following steps:
S61: acquiring the text to be detected.
For details, please refer to the description in S11, which is not repeated herein.
S62: masking each positive emotion word of the text to be detected to obtain a first mask text; and masking each negative emotion word of the text to be detected to obtain a second mask text.
For details, please refer to descriptions in S12 or S21 to S23, which are not described herein.
S63: predicting the masked positive emotion words in the first mask text to generate a first reconstructed text; and predicting the masked negative emotion words in the second mask text to generate a second reconstructed text.
For details, please refer to descriptions in S13 or S43 to S45, which are not described herein.
S64: and respectively acquiring word embedded vectors of the text to be detected, the first reconstructed text and the second reconstructed text.
In this embodiment, the word embedding vectors of the text to be detected, the first reconstructed text, and the second reconstructed text are obtained using the Transformer encoder in the sarcasm detection model. The process of obtaining the word embedding vectors is described in S43 and is not repeated here.
S65: and respectively splicing word embedded vectors corresponding to the text to be detected, the first reconstructed text and the second reconstructed text to obtain a text sequence to be detected, a first text sequence and a second text sequence with context characteristics.
In this embodiment, the word embedding vectors corresponding to the text to be detected, the first reconstructed text, and the second reconstructed text are spliced using the Transformer encoder in the sarcasm detection model, so as to obtain a text sequence to be detected, a first text sequence, and a second text sequence with context features. The specific process is described in S44 and is not repeated here.
S66: performing similarity calculation on the text sequence to be detected and the first text sequence using at least one similarity algorithm to obtain a first similarity; and performing similarity calculation on the text sequence to be detected and the second text sequence using the same similarity algorithm to obtain a second similarity.
In a specific implementation scenario, cosine similarity is adopted to calculate similarity between a text sequence to be detected and a first text sequence, so as to obtain first similarity. And similarly, calculating the similarity between the text sequence to be detected and the second text sequence by adopting cosine similarity to obtain a second similarity.
Cosine similarity evaluates the similarity of two vectors by computing the cosine of the angle between them: the closer the value is to 1, the closer the angle is to 0°, and the more similar the two vectors are.
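A minimal sketch of the similarity computation over sentence representation vectors (how the sentence vectors are pooled from the Transformer encoder is left out here):

```python
# Minimal sketch: cosine similarity between two sentence vectors.
import torch
import torch.nn.functional as F

def cosine_sim(h_x: torch.Tensor, h_m: torch.Tensor) -> float:
    """Cosine of the angle between two sentence representation vectors."""
    return F.cosine_similarity(h_x.unsqueeze(0), h_m.unsqueeze(0)).item()
```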
S67: and determining the text to be detected as the irony text in response to the first similarity and/or the second similarity being smaller than a set threshold.
In this embodiment, the first similarity and the second similarity are each compared with the set threshold, and whether the text to be detected is ironic text is determined from the two comparison results.
Specifically, whether the text to be detected is irony text is determined by the following detection formula:
diff = cosine(h_x, h_ma) < threshold || cosine(h_x, h_mb) < threshold
y = I(diff)    (1)
where y = I(diff) indicates that the text to be detected is ironic text, diff is the judgment condition, cosine(h_x, h_ma) is the first similarity between the text sequence to be detected h_x and the first text sequence h_ma, cosine(h_x, h_mb) is the second similarity between h_x and the second text sequence h_mb, threshold is the set threshold, and || denotes "or" logic.
Here, the set threshold may be any value less than 1 but not less than 0.5. In one specific implementation scenario, the set threshold may be 0.5. In another specific implementation scenario, the set threshold may be 0.8, which is not limited in this application.
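Decision formula (1) then reduces to a simple indicator, sketched below with the 0.5 value from the first scenario above as an illustrative default:

```python
# Minimal sketch of formula (1): flag irony when either reconstruction
# drifts away from the original, i.e. its similarity falls below threshold.
def is_ironic(sim1: float, sim2: float, threshold: float = 0.5) -> bool:
    """Indicator I(diff): True when sim1 < threshold or sim2 < threshold."""
    return sim1 < threshold or sim2 < threshold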
It can be understood that if the text to be detected is not ironic text and the emotion and semantics of its context do not diverge, the first and second reconstructed texts generated by prediction will both be highly similar to the text to be detected, so the first similarity and the second similarity will both be high.
Conversely, if the text to be detected is an ironic text and the emotion and semantics of its context diverge, at least one of the first and second reconstructed texts generated by prediction will differ greatly from the text to be detected, so the first similarity and/or the second similarity will be smaller than the set threshold.
In this embodiment, the sarcasm detection model is a BERT model, in order to obtain good sentence representation vectors. To further improve the quality of the sentence representations, a contrastive learning algorithm is also used to further fine-tune BERT on an unlabeled dataset; the contrastive learning algorithm includes SimCSE.
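A minimal sketch of unsupervised SimCSE-style fine-tuning is given below: each unlabeled sentence is encoded twice so that dropout produces two slightly different views, and an InfoNCE loss pulls a sentence's two views together while pushing other sentences away. The encode callable and the temperature value are illustrative assumptions.

```python
# Minimal sketch: unsupervised SimCSE contrastive loss over a batch.
import torch
import torch.nn.functional as F

def simcse_loss(encode, sentences, temperature=0.05):
    z1 = encode(sentences)             # first pass  (dropout mask A)
    z2 = encode(sentences)             # second pass (dropout mask B)
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)  # diagonal pairs are the positives
```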
Referring to fig. 7, fig. 7 is a flowchart illustrating an application scenario of the text detection method of the present application. In this embodiment, after the text to be detected is obtained, a natural language processing tool is first used to perform part-of-speech tagging and classification on each character of the input text. Then, each emotion word is recognized and classified from the tagged text using an external emotion resource vocabulary, and the component words are acquired using the natural language processing tool. Next, the positive emotion words and at least some of the component words in the text to be detected are masked with mask characters to generate a first mask text m_1, and the negative emotion words and the same component words are masked with mask characters to generate a second mask text m_2. The first mask text m_1 and the second mask text m_2 are then input into the text generation model, which outputs the first reconstructed text and the second reconstructed text. The text to be detected, the first reconstructed text, and the second reconstructed text are then input into the sarcasm detection model, which converts them into a text sequence to be detected, a first text sequence, and a second text sequence with context features; the similarities between the text sequence to be detected and the first and second text sequences are calculated with cosine similarity to obtain the first similarity and the second similarity; the first and second similarities are evaluated with the detection formula, and whether the text to be detected is ironic text is determined based on the result.
In this way, a first reconstructed text and a second reconstructed text whose emotion is consistent with their semantics can be generated following the logic of the unmasked text, preserving consistency between contexts; whether the text to be detected is ironic text can then be determined from its similarity to the first and second reconstructed texts. This realizes better contextual irony detection, improves the detection accuracy of ironic text, saves a large amount of data labeling work, and realizes unsupervised irony detection, so that the method can be applied in actual scenarios to meet detection requirements.
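Tying the pieces together, the overall flow of fig. 7 might look like the sketch below. All helper names (split_emotion_words, extract_component_words, split_component_set, mask_text, cosine_sim, is_ironic) are the illustrative functions sketched earlier, and reconstruct and sentence_vector stand for the text generation model and the sarcasm detection model's encoder respectively; none of these names come from the patent.

```python
# Minimal end-to-end sketch of the detection pipeline of fig. 7.
def detect_irony(text, reconstruct, sentence_vector, threshold=0.5):
    tokens = text.split()
    pw, nw = split_emotion_words(tokens)                  # lexicon step
    emotion = {w.lower() for w in pw | nw}
    sw = extract_component_words(text, emotion)           # spaCy step
    sw1, _ = split_component_set([sw])                    # subset split
    m1 = " ".join(mask_text(tokens, pw, sw1))             # first mask text
    m2 = " ".join(mask_text(tokens, nw, sw1))             # second mask text
    r1, r2 = reconstruct(m1), reconstruct(m2)             # text generation model
    h_x, h1, h2 = map(sentence_vector, (text, r1, r2))    # sentence vectors
    return is_ironic(cosine_sim(h_x, h1), cosine_sim(h_x, h2), threshold)
```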
The text detection method described above can be applied to social media text analysis systems; its main application scenarios include microblog irony detection systems and social public opinion analysis systems. By introducing this text detection method, a microblog irony detection system can complete irony detection more accurately according to deep semantic consistency. In a social public opinion analysis system, the text detection method of this embodiment can serve as a sub-module of the system to realize better contextual irony detection, thereby improving the detection accuracy of ironic text.
Correspondingly, the application provides a text detection device.
Please refer to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a text detection apparatus according to the present application. As shown in fig. 8, the text detection apparatus 80 includes an acquisition module 81, a mask module 82, a reconstructed text generation module 83, a similarity calculation module 84, and a determination module 85.
The acquisition module 81 is configured to acquire the text to be detected.
The mask module 82 is configured to mask each positive emotion word of the text to be detected to obtain a first mask text, and to mask each negative emotion word of the text to be detected to obtain a second mask text.
The reconstructed text generation module 83 is configured to predict the masked positive emotion words in the first mask text to generate a first reconstructed text, and to predict the masked negative emotion words in the second mask text to generate a second reconstructed text.
The similarity calculation module 84 is configured to determine a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected.
The determining module 85 is configured to determine the text to be detected to be ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold.
For a specific process, please refer to the related text descriptions in S11 to S15, S21 to S26, and S41 to S47, which are not described herein again.
Different from the prior art, in this embodiment a first mask text and a second mask text can be generated by the mask module 82 respectively masking the positive emotion words and the negative emotion words in the text to be detected. Then, by predicting the masked positive and negative emotion words in the first and second mask texts respectively, the reconstructed text generation module 83 can generate a first reconstructed text and a second reconstructed text whose emotion is consistent with their semantics, following the logic of the unmasked text and preserving consistency between contexts. Further, the similarity between the text to be detected and the reconstructed texts is calculated by the similarity calculation module 84, and based on how the similarity compares with the set threshold, the determining module 85 can determine whether the text to be detected is ironic text. This embodiment not only realizes better contextual irony detection and improves the detection accuracy of ironic text, but also saves a large amount of data labeling work and realizes unsupervised irony detection, so that it can be applied in actual scenarios to meet detection requirements.
Correspondingly, the application provides the electronic equipment.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of an electronic device according to the present application. As shown in fig. 9, the electronic device 90 includes a memory 91 and a processor 92.
In this embodiment, the memory 91 is used for storing program data which, when executed, implements the steps in the text detection method described above; the processor 92 is used for executing the program data stored in the memory 91 to implement the steps in the text detection method described above.
In particular, the processor 92 is adapted to control itself and the memory 91 to implement the steps in the text detection method as described above. The processor 92 may also be referred to as a CPU (Central Processing Unit). The processor 92 may be an integrated circuit chip having signal processing capabilities. The Processor 92 may also be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. In addition, the processor 92 may be commonly implemented by a plurality of integrated circuit chips.
Different from the prior art, in this embodiment the processor 92 respectively masks the positive emotion words and the negative emotion words in the text to be detected, so that a first mask text and a second mask text can be generated; then, by predicting the masked positive and negative emotion words in the first and second mask texts respectively, a first reconstructed text and a second reconstructed text whose emotion is consistent with their semantics are generated following the logic of the unmasked text, preserving consistency between contexts; further, by calculating the similarity between the text to be detected and the reconstructed texts and comparing it with a set threshold, whether the text to be detected is ironic text can be determined. In this way, the present application not only realizes better contextual irony detection and improves the detection accuracy of ironic text, but also saves a large amount of data labeling work and realizes unsupervised irony detection, so that it can be applied in actual scenarios to meet detection requirements.
Correspondingly, the present application further provides a computer-readable storage medium.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application.
The computer-readable storage medium 100 stores a computer program 1001 which, when executed by a processor, implements the steps in the text detection method described above. In particular, the integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in the computer-readable storage medium 100. Based on such an understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in the computer-readable storage medium 100 and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned computer-readable storage medium 100 includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and an actual implementation may use a different division; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces between devices or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only of embodiments of the present application and is not intended to limit the scope of the present application; any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present application.

Claims (10)

1. A text detection method, comprising:
acquiring a text to be detected;
masking each positive emotion word of the text to be detected to obtain a first mask text, and masking each negative emotion word of the text to be detected to obtain a second mask text;
predicting the masked positive emotion words in the first mask text to generate a first reconstructed text, and predicting the masked negative emotion words in the second mask text to generate a second reconstructed text;
determining a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected;
determining that the text to be detected is ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold.
2. The text detection method according to claim 1,
wherein after the step of acquiring the text to be detected, the method comprises:
performing part-of-speech tagging on each character in the text to be detected;
and the step of masking each positive emotion word of the text to be detected to obtain the first mask text and masking each negative emotion word of the text to be detected to obtain the second mask text comprises:
identifying each emotion word from the tagged text to be detected, and classifying the emotion word as the positive emotion word or the negative emotion word based on its polarity; and
identifying, from the tagged text to be detected, each verb and/or each noun that is not an emotion word, and determining the verbs and/or nouns as component words;
masking the positive emotion words and at least part of the component words in the text to be detected with mask characters to generate the first mask text; and
masking the negative emotion words and the same component words in the text to be detected with the mask characters to generate the second mask text.
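A minimal sketch of the masking step recited in claim 2, offered outside the claim language: part-of-speech tagging is done here with NLTK, the two emotion word sets are placeholders for an external sentiment lexicon, and, for simplicity, all component words are masked even though the claim requires only "at least part" of them.

```python
import nltk  # assumes nltk plus its tokenizer and tagger data are installed

POSITIVE = {"great", "love", "wonderful"}  # placeholder positive emotion words
NEGATIVE = {"awful", "hate", "terrible"}   # placeholder negative emotion words
MASK = "[MASK]"

def build_mask_texts(text: str):
    # Part-of-speech tagging of the text to be detected.
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    # Component words: verbs and/or nouns that are not emotion words.
    component = {w.lower() for w, tag in tagged
                 if tag.startswith(("VB", "NN"))
                 and w.lower() not in POSITIVE | NEGATIVE}

    def mask(words_to_mask):
        return " ".join(MASK if w.lower() in words_to_mask else w
                        for w, _ in tagged)

    first = mask(POSITIVE | component)   # first mask text
    second = mask(NEGATIVE | component)  # second mask text (same component words)
    return first, second

print(build_mask_texts("I just love waiting in line for hours"))
```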
3. The text detection method according to claim 2,
wherein the step of identifying each emotion word from the tagged text to be detected and classifying the emotion word as the positive emotion word or the negative emotion word based on its polarity comprises:
identifying, by using an external sentiment lexicon, each emotion word in the tagged text to be detected as the positive emotion word or the negative emotion word, and dividing the emotion words into a corresponding positive emotion word set or negative emotion word set;
the step of identifying, from the tagged text to be detected, each verb and/or each noun that is not an emotion word and determining the verbs and/or nouns as component words comprises:
acquiring grammatical information of the text to be detected by using a natural language processing tool;
identifying, from the tagged text to be detected and based on the grammatical information, each verb and/or each noun that is not an emotion word, and determining the verbs and/or nouns as the component words;
grouping the component words into a component word set, and dividing the component word set into at least two subsets;
and the step of masking the positive emotion words and at least part of the component words in the text to be detected with mask characters to generate the first mask text, and masking the negative emotion words and the same component words in the text to be detected with the mask characters to generate the second mask text comprises:
masking all characters included in the positive emotion word set and in one of the subsets with the mask character to generate the first mask text; and
masking all characters included in the negative emotion word set and in the same subset with the mask character to generate the second mask text.
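Claim 3 additionally divides the component word set into at least two subsets and masks the same subset alongside each polarity; below is a sketch with a seeded random two-way split. The split strategy itself is not specified in the claim and is an assumption here.

```python
import random

def split_component_set(component_words, seed=0):
    """Divide the component word set into two subsets (the 'at least two subsets')."""
    words = sorted(component_words)
    random.Random(seed).shuffle(words)
    half = len(words) // 2
    return set(words[:half]), set(words[half:])

subset_a, subset_b = split_component_set({"waiting", "line", "hours"})
# subset_a is masked together with the positive emotion word set to form the
# first mask text, and the *same* subset_a with the negative emotion word set
# to form the second mask text, so the two reconstructions differ only at
# emotion-word positions.
```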
4. The text detection method according to claim 2,
wherein the step of predicting the masked positive emotion words in the first mask text to generate the first reconstructed text, and predicting the masked negative emotion words in the second mask text to generate the second reconstructed text, comprises:
acquiring word embedding vectors of the first mask text and the second mask text respectively, wherein each word embedding vector comprises a character vector and a position vector;
concatenating the word embedding vectors corresponding to the first mask text and the second mask text respectively to obtain a first hidden sequence and a second hidden sequence with contextual features;
predicting each masked emotion word and each masked component word in the first hidden sequence and the second hidden sequence respectively to obtain the first reconstructed text and the second reconstructed text.
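A toy illustration of the word embedding vectors of claim 4, in which each token's vector combines a character vector with a position vector. The table sizes are illustrative only, and the two vectors are summed here, as in standard Transformer encoders, although concatenation is equally plausible for the claimed combination.

```python
import numpy as np

VOCAB, DIM, MAX_LEN = 1000, 16, 64  # toy sizes, for illustration only
rng = np.random.default_rng(0)
char_table = rng.normal(size=(VOCAB, DIM))   # character vectors
pos_table = rng.normal(size=(MAX_LEN, DIM))  # position vectors

def embed(token_ids):
    """Word embedding vector = character vector + position vector."""
    ids = np.asarray(token_ids)
    return char_table[ids] + pos_table[np.arange(len(ids))]

hidden_input = embed([1, 42, 7, 2])  # e.g. "[CLS] this [MASK] [SEP]"
print(hidden_input.shape)            # (4, 16): one vector per token
```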
5. The text detection method according to claim 4,
wherein the first reconstructed text and the second reconstructed text are generated by a text generation model, the text generation model comprising an encoder, an attention network and a decoder cascaded with one another;
the step of acquiring the word embedding vectors of the first mask text and the second mask text respectively comprises:
acquiring the word embedding vectors of the first mask text and the second mask text respectively by using the encoder of the text generation model;
the step of concatenating the word embedding vectors corresponding to the first mask text and the second mask text respectively to obtain the first hidden sequence and the second hidden sequence with contextual features comprises:
encoding the word embedding vectors corresponding to the first mask text and the second mask text respectively by using the encoder to obtain the first hidden sequence and the second hidden sequence with contextual features;
the step of predicting each masked emotion word and each masked component word in the first hidden sequence and the second hidden sequence respectively to obtain the first reconstructed text and the second reconstructed text comprises:
sequentially decoding the first hidden sequence and the second hidden sequence by using the attention network and the decoder of the text generation model to predict each masked emotion word and each masked component word, and outputting the first reconstructed text and the second reconstructed text.
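One possible realization of the cascaded encoder, attention network and decoder of claim 5 is a pretrained sequence-to-sequence model such as BART, shown here via the Hugging Face transformers library; the claim does not mandate any particular model, so this choice is an assumption of the sketch.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def reconstruct(mask_text: str) -> str:
    """Encode the mask text, then decode a prediction for the masked words."""
    # BART's own mask character is "<mask>".
    inputs = tok(mask_text.replace("[MASK]", "<mask>"), return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=64, num_beams=4)
    return tok.decode(output_ids[0], skip_special_tokens=True)

print(reconstruct("I just [MASK] waiting in line for hours"))
```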
6. The text detection method according to claim 5,
wherein the attention network comprises a self-attention mechanism;
the step of sequentially decoding the first hidden sequence and the second hidden sequence by using the attention network and the decoder of the text generation model to predict each masked emotion word and each masked component word and output the first reconstructed text and the second reconstructed text comprises:
sequentially performing autoregressive decoding on the first hidden sequence and the second hidden sequence by using the self-attention mechanism and the decoder to predict, at each time step, each masked emotion word and each masked component word, and outputting the first reconstructed text and the second reconstructed text.
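The autoregressive, time-step-by-time-step decoding of claim 6 can be sketched as a greedy loop over the decoder; the beam search used in the previous sketch is one alternative. The tokenizer and model are re-created here so the fragment stands alone, and the greedy strategy is an assumed simplification.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

@torch.no_grad()
def greedy_decode(mask_text: str, max_steps: int = 32) -> str:
    inputs = tok(mask_text.replace("[MASK]", "<mask>"), return_tensors="pt")
    decoded = torch.tensor([[model.config.decoder_start_token_id,
                             model.config.bos_token_id]])
    for _ in range(max_steps):  # one predicted token per time step
        logits = model(input_ids=inputs["input_ids"],
                       decoder_input_ids=decoded).logits
        next_id = logits[0, -1].argmax().reshape(1, 1)  # most probable token
        decoded = torch.cat([decoded, next_id], dim=1)
        if next_id.item() == model.config.eos_token_id:
            break
    return tok.decode(decoded[0], skip_special_tokens=True)

print(greedy_decode("I just [MASK] waiting in line for hours"))
```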
7. The text detection method according to claim 1 or 6,
wherein the step of determining the first similarity between the first reconstructed text and the text to be detected and the second similarity between the second reconstructed text and the text to be detected comprises:
acquiring word embedding vectors of the text to be detected, the first reconstructed text and the second reconstructed text respectively;
concatenating the word embedding vectors corresponding to the text to be detected, the first reconstructed text and the second reconstructed text respectively to obtain a text sequence to be detected, a first text sequence and a second text sequence with contextual features;
calculating the similarity between the text sequence to be detected and the first text sequence by using at least one similarity algorithm to obtain the first similarity; and
calculating the similarity between the text sequence to be detected and the second text sequence by using the similarity algorithm to obtain the second similarity.
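A sketch of the similarity step of claim 7: each text is encoded into a contextual sequence, mean-pooled, and compared with cosine similarity. Cosine is only one of the "at least one similarity algorithm" options, and the BERT encoder used here is an assumed stand-in rather than the model named by the application.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def text_vector(text: str) -> torch.Tensor:
    inputs = tok(text, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state  # contextual token sequence
    return hidden.mean(dim=1).squeeze(0)          # pooled text representation

def cosine_similarity(a: str, b: str) -> float:
    va, vb = text_vector(a), text_vector(b)
    return float(torch.nn.functional.cosine_similarity(va, vb, dim=0))

sim1 = cosine_similarity("text to be detected", "first reconstructed text")
sim2 = cosine_similarity("text to be detected", "second reconstructed text")
```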
8. A text detection apparatus, comprising:
an acquisition module, configured to acquire a text to be detected;
a mask module, configured to mask each positive emotion word of the text to be detected to obtain a first mask text, and to mask each negative emotion word of the text to be detected to obtain a second mask text;
a reconstructed text generation module, configured to predict the masked positive emotion words in the first mask text to generate a first reconstructed text, and to predict the masked negative emotion words in the second mask text to generate a second reconstructed text;
a similarity calculation module, configured to determine a first similarity between the first reconstructed text and the text to be detected and a second similarity between the second reconstructed text and the text to be detected; and
a determination module, configured to determine that the text to be detected is ironic text in response to the first similarity and/or the second similarity being smaller than a set threshold.
9. An electronic device, comprising:
a memory for storing program data which, when executed, implement the steps in the text detection method of any one of claims 1 to 7; and
a processor for executing the program data stored in the memory to implement the steps in the text detection method of any one of claims 1 to 7.
10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps in the text detection method of any one of claims 1 to 7.
CN202210869399.1A 2022-07-22 2022-07-22 Text detection method and device, electronic equipment and computer readable storage medium Pending CN115204181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210869399.1A CN115204181A (en) 2022-07-22 2022-07-22 Text detection method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210869399.1A CN115204181A (en) 2022-07-22 2022-07-22 Text detection method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115204181A true CN115204181A (en) 2022-10-18

Family

ID=83583180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210869399.1A Pending CN115204181A (en) 2022-07-22 2022-07-22 Text detection method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115204181A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116127942A (en) * 2023-02-17 2023-05-16 北京思前软件有限公司 Text comparison method, device, equipment and storage medium
CN116127942B (en) * 2023-02-17 2024-02-13 北京思前软件有限公司 Text comparison method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Chen et al. Multimodal sentiment analysis with word-level fusion and reinforcement learning
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
Dos Santos et al. Deep convolutional neural networks for sentiment analysis of short texts
CN113420807A (en) Multi-mode fusion emotion recognition system and method based on multi-task learning and attention mechanism and experimental evaluation method
US20180373682A1 (en) Natural language processing using context-specific word vectors
CN111930942B (en) Text classification method, language model training method, device and equipment
JP2018055548A (en) Interactive device, learning device, interactive method, learning method, and program
CN113268586A (en) Text abstract generation method, device, equipment and storage medium
CN110795549B (en) Short text conversation method, device, equipment and storage medium
KR20180100001A (en) System, method and recording medium for machine-learning based korean language conversation using artificial intelligence
CN113239666B (en) Text similarity calculation method and system
CN110069611B (en) Topic-enhanced chat robot reply generation method and device
Satapathy et al. Seq2seq deep learning models for microtext normalization
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN114020906A (en) Chinese medical text information matching method and system based on twin neural network
CN111382257A (en) Method and system for generating dialog context
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN113705315A (en) Video processing method, device, equipment and storage medium
Basu et al. Multimodal sentiment analysis of# metoo tweets using focal loss (grand challenge)
CN115204181A (en) Text detection method and device, electronic equipment and computer readable storage medium
Park et al. Natural language generation using dependency tree decoding for spoken dialog systems
Lyu et al. Deep learning for textual entailment recognition
CN111949762B (en) Method and system for context-based emotion dialogue and storage medium
Deena et al. Exploring the use of acoustic embeddings in neural machine translation
Fišel Machine learning techniques in dialogue act recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination