CN112036187A

CN112036187A - Context-based video barrage text auditing method and system

Info

Publication number: CN112036187A
Application number: CN202010655180.2A
Authority: CN
Inventors: 王晓平
Original assignee: Shanghai Jilian Network Technology Co ltd
Current assignee: Shanghai Jilian Network Technology Co ltd
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2020-12-04

Abstract

The invention discloses a context-based video barrage text auditing method that adopts a multi-level auditing mode, specifically comprising a sensitive word expansion auditing step, a semantic auditing step, and a context auditing step. The method and system overcome the inability of the prior art to cope with the insufficient information content of barrage text, as well as the low accuracy of conventional video barrage text auditing methods, so that the auditing results for video barrage text are more accurate and reliable.

Description

Context-based video barrage text auditing method and system

Technical Field

The invention relates to a text auditing method, in particular to a video barrage text auditing method combining context.

Background

In the information era, network media and social platforms such as online video sites, microblogs, WeChat, and chat communities continuously generate diverse user interaction data, such as video barrages (bullet-screen comments) and ordinary comments, which poses challenges for effective information auditing and supervision.

Among text data types, barrage text is short and carries little information, so the same video barrage text often has completely different meanings in different contexts. Auditing this type of text is therefore considerably more difficult.

Conventional video barrage text auditing methods generally audit the barrage text directly. Because a short barrage text lacks context, such methods cannot produce a reliable auditing result for it.

Disclosure of Invention

The invention provides a solution for video barrage text auditing that aims to overcome the limited ability of the prior art to analyze barrage text and to make barrage text auditing results more reliable.

To achieve the above object, the present invention provides a context-based video barrage text auditing method, comprising: acquiring a video barrage text to be audited as a target audit text; a sensitive word expansion auditing step: performing word segmentation on the target audit text to obtain a text fragment list of the target audit text, comparing and matching the text fragment list against a preset sensitive word feature library to obtain a matching result, ending the sensitive word expansion auditing step if the matching succeeds, and continuing to the next audit if the matching fails; a semantic auditing step: inputting the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adding a semantic classification label to the video barrage text according to the judgment result, and determining, according to the semantic classification label, the target audit text that needs further auditing; and a context auditing step: obtaining context information of the target audit text, and detecting and analyzing the context information based on context auditing to obtain an audit result.

Preferably, the context information includes: target structuring information and scene classification information of the video frame corresponding to the video barrage text, event structuring information of the video within a certain time range corresponding to the video barrage text, and classification information of the barrage context texts within a certain time range corresponding to the video barrage text.

Preferably, the context auditing comprises video context auditing and text context auditing; the video context auditing is performed based on a deep learning method or a traditional method, and the text context auditing is performed based on a semantic classification method.

Preferably, the method for constructing the sensitive word feature library includes: establishing an original sensitive word bank; performing deformation mapping processing on each sensitive word in the original sensitive word bank to obtain various deformation mapping results; and combining the various deformation mapping results with the original sensitive word library to construct a sensitive word feature library.

Preferably, the deformation mapping processing includes pinyin-character mixing, homophone substitution, pinyin abbreviation, confusion of front and back nasal sounds and of flat and retroflex tongue sounds, reverse reading, filler character insertion, character omission, character splitting, substitution of visually similar characters, and synonym substitution.

Preferably, the training method of the semantic classification model includes a deep learning method and a traditional training method, and the deep learning method includes: TextCNN, TextRNN, BERT, XLNet, RoBERTa, ALBERT, etc., and the conventional training methods include logistic regression, support vector machines. Preferably, a deep learning method ALBERT is used.

Preferably, the semantic classification labels include: "semantic normal", "semantic violation", and "semantic fuzzy".

The invention also discloses a context-based video barrage text auditing system, comprising a sensitive word expansion auditing module, a semantic auditing module, and a context auditing module, wherein: the sensitive word expansion auditing module performs word segmentation on the target audit text, compares and matches it against a preset sensitive word feature library to obtain a matching result, and performs output judgment according to the matching result, the audit ending if the matching succeeds and continuing if the matching fails; the semantic auditing module inputs the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adds a semantic classification label to the video barrage text according to the judgment result, performs output judgment according to the semantic classification label, and determines the output mode and the target audit text that needs further auditing; and the context auditing module obtains context information of the target audit text, detects and analyzes the context information based on context auditing, and makes a comprehensive judgment to obtain an audit result.

Preferably, the system further comprises an audit result output module, and the audit result output module performs final audit output and display on the outputs from the sensitive word expansion audit module, the semantic audit module and the context audit module.

Preferably, for output from the sensitive word expansion auditing module, the data output and displayed by the audit result output module includes: the position of the matched word in the input text, the original form of the matched sensitive word, and the actual deformation mapping information of the sensitive word in the input text.

The invention also discloses an electronic device comprising a processor and a memory, wherein the memory is used for storing an executable program, and the processor is configured to execute the executable program to implement the method.

In practical applications, the modules described in the method and system disclosed by the present invention may be deployed on one server, or each module may be deployed on a different server independently, and particularly, in order to provide a stronger computing processing capability, the modules may be deployed on a cluster server as needed.

The method and system disclosed by the invention adopt a multi-stage auditing mode with diversified auditing means, and combine multiple audits of context information from both the video channel and the text channel. This overcomes the inability of the prior art to cope with the insufficient information content of barrage text itself, as well as the low accuracy of conventional video barrage text auditing methods, so that the auditing results for video barrage text are more accurate and reliable.

In order that the invention may be more clearly and fully understood, specific embodiments thereof are described in detail below with reference to the accompanying drawings.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 shows a flowchart of a context-based video barrage text auditing method in one embodiment.

Fig. 2 shows a flowchart of a method for constructing the sensitive word feature library in one embodiment.

Fig. 3 shows a block diagram of a context-based video barrage text auditing system in one embodiment.

Fig. 4 shows a block diagram of the context auditing module in one embodiment.

Fig. 5 shows a flowchart of the context auditing module in one embodiment.

Fig. 6 shows the overall flowchart of audit result output.

Detailed Description

Referring to Fig. 1, which shows a flowchart of a context-based video barrage text auditing method, the method specifically includes steps S11 to S14:

Step S11: acquire the video barrage text to be audited and take it as the target audit text.

Step S12: sensitive word expansion auditing.

In this embodiment, the sensitive word expansion review step includes the following steps: performing word segmentation processing on the target audit text by adopting a word segmentation method to obtain a text fragment list of the target audit text, comparing and matching the text fragment list with a preset sensitive word feature library to obtain a matching result, finishing the sensitive word expansion audit step if the matching is successful, and continuing the next audit if the matching is failed.

First, word segmentation is performed on the target audit text, denoted text. The segmentation operation on text outputs a list of segments, list_seg, arranged in order of appearance:

$$\mathrm{list\_seg} = [\mathrm{seg}_1, \mathrm{seg}_2, \ldots, \mathrm{seg}_M]$$

where M is the number of elements in the segmentation result list.

list_seg is then used as the target audit text.

In this embodiment, a sensitive word feature library, collection_map, also needs to be established in advance. For the method of constructing the sensitive word feature library, refer to the embodiment shown in Fig. 2.

Second, the target audit text is compared and matched against collection_map. In this embodiment, each element of collection_map is compared with text in turn. Once a comparison succeeds, the audit result is "not passed" and the audit process ends; otherwise, text is output to the semantic auditing module for further auditing.
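The matching in step S12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the patent: `segment` is a hypothetical stand-in that emits character n-grams instead of calling a real word segmenter, the library entries are made up, and set membership is used in place of comparing every library element with the text in turn (the outcome is the same).

```python
from typing import List, Optional, Set


def segment(text: str, max_len: int = 8) -> List[str]:
    """Hypothetical stand-in for a word segmenter.

    Emits every character n-gram of length 1..max_len in order of
    appearance, so any library entry of at most max_len characters
    that occurs in the text also occurs in the returned list.
    """
    segments: List[str] = []
    for start in range(len(text)):
        for length in range(1, max_len + 1):
            if start + length > len(text):
                break
            segments.append(text[start:start + length])
    return segments


def sensitive_word_audit(text: str, collection_map: Set[str]) -> Optional[str]:
    """Return the first segment found in the library, or None."""
    for seg in segment(text):
        if seg in collection_map:
            return seg  # match: the audit result is "not passed"
    return None  # no match: hand the text to the semantic audit


# Made-up library: one sensitive word plus two deformed variants.
collection_map = {"scam", "sc4m", "macs"}

print(sensitive_word_audit("total sc4m", collection_map))
print(sensitive_word_audit("nice video", collection_map))
```

A production system would replace `segment` with a proper segmenter for the target language; the n-gram version is used here only so the sketch runs without third-party packages.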

Step S13: semantic auditing.

In this embodiment, this step specifically includes: inputting a target audit text needing further audit into a trained semantic classification model to obtain a judgment result, adding semantic classification labels to the video bullet screen text according to the judgment result, and determining the target audit text needing further audit according to the semantic classification labels.

Before auditing, a semantic classification model needs to be trained once, in advance, on a large amount of text data. Deep learning methods such as TextCNN, TextRNN, BERT, XLNet, RoBERTa, and ALBERT can be used to train the semantic classification model, as can traditional methods such as logistic regression and support vector machines. Preferably, ALBERT may be used.

The text is then audited with the trained semantic classification model to obtain its semantic label. In this embodiment, the semantic classification labels are of three types: "semantic normal", "semantic violation", and "semantic fuzzy". Output judgment is performed according to the semantic classification result: if the text is judged "semantic violation", it fails the audit; if it is judged "semantic normal", it passes the audit; if it is judged "semantic fuzzy", it is output to the context auditing module for further auditing.
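The three-way routing on the semantic label can be sketched as below. The sketch assumes the classifier is any callable that returns one of the three label strings; `toy_classifier` and its keyword rules are invented for illustration and stand in for a trained model such as ALBERT.

```python
from typing import Callable

SEMANTIC_NORMAL = "semantic normal"
SEMANTIC_VIOLATION = "semantic violation"
SEMANTIC_FUZZY = "semantic fuzzy"


def semantic_audit(text: str, classify: Callable[[str], str]) -> str:
    """Map the semantic label of a text to the next action."""
    label = classify(text)
    if label == SEMANTIC_VIOLATION:
        return "not passed"
    if label == SEMANTIC_NORMAL:
        return "passed"
    if label == SEMANTIC_FUZZY:
        return "context audit"  # needs the context auditing step
    raise ValueError(f"unexpected semantic label: {label!r}")


def toy_classifier(text: str) -> str:
    """Made-up keyword rules standing in for a trained model."""
    if "hate" in text:
        return SEMANTIC_VIOLATION
    if "burn" in text:  # ambiguous without knowing what is on screen
        return SEMANTIC_FUZZY
    return SEMANTIC_NORMAL


print(semantic_audit("burn it down", toy_classifier))
```

Only texts routed to "context audit" incur the cost of video analysis, which is the point of placing the cheaper semantic check first.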

Step S14, context auditing step.

In this embodiment, this step further audits, based on context analysis, the "semantic fuzzy" text output by the semantic auditing module. Specifically, context information of the target audit text is obtained, and the context information is detected and analyzed based on context auditing to obtain an audit result.

The context information includes: target structuring information and scene classification information of the video frame corresponding to the video barrage text, event structuring information of the video within a certain time range corresponding to the video barrage text, and classification information of the barrage context texts within a certain time range corresponding to the video barrage text.

In this embodiment, context detection includes video context detection and text context detection. Video context detection includes detecting and locating targets in an image, classifying image scenes, and detecting events using any time-series-based event analysis method; text context detection includes semantic classification detection.

In this embodiment, various deep learning methods or conventional detection methods may be used to implement video context detection; preferably, YOLOv4 (You Only Look Once) may be used. The detection result is divided into normal targets and violating targets, where the types of violating targets may include sensitive persons, sensitive objects (e.g., sensitive flags, knives, guns and other weapons), and the like.

Referring to fig. 2, fig. 2 is a schematic flow chart illustrating a method for constructing a sensitive word feature library, which specifically includes steps S21 to S23:

Step S21: establish an original sensitive word library.

Step S22: perform deformation mapping processing on each sensitive word in the original sensitive word library to obtain various deformation mapping results.

Step S23: combine the various deformation mapping results with the original sensitive word library to construct the sensitive word feature library.

In this embodiment, an original sensitive word w is read from the original sensitive word library. First, w is deformed according to all the deformation rules defined for the audit, such as pinyin-character mixing, homophone substitution, pinyin abbreviation, confusion of front and back nasal sounds and of flat and retroflex tongue sounds, reverse reading, filler character insertion, character omission, character splitting, substitution of visually similar characters, synonym substitution, and the like. The results are combined with the original sensitive word w to form the complete matching set collection_map:

$$\mathrm{collection\_map} = \{w\} \cup \{\, f_y(w) \mid y = 1, 2, \ldots, S \,\}$$

where f_y(x) denotes deforming the word x according to the defined deformation rule y and returning the deformation result, and S is the total number of deformation rules.
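Steps S21 to S23 can be sketched as follows. This is an illustration under assumptions rather than the patent's own code: only two of the listed deformation rules are implemented (reverse reading and filler character insertion), the filler characters are arbitrary choices, and each library entry records its original word and rule name so that this information can later be reported with a match.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def reverse_reading(word: str) -> Set[str]:
    """Reverse reading deformation: the word written backwards."""
    return {word[::-1]}


def filler_insertion(word: str) -> Set[str]:
    """Filler character deformation: a filler between every character."""
    return {filler.join(word) for filler in ("*", "-", " ")}


# Rule name -> deformation function f_y; S is the number of entries.
RULES: Dict[str, Callable[[str], Set[str]]] = {
    "reverse_reading": reverse_reading,
    "filler_insertion": filler_insertion,
}


def build_feature_library(
    original_words: Iterable[str],
    rules: Dict[str, Callable[[str], Set[str]]],
) -> Dict[str, Tuple[str, str]]:
    """Map every variant to (original word, rule that produced it)."""
    library: Dict[str, Tuple[str, str]] = {}
    for word in original_words:
        library.setdefault(word, (word, "original"))
        for rule_name, rule in rules.items():
            for variant in rule(word):
                # Keep the first mapping if two rules collide.
                library.setdefault(variant, (word, rule_name))
    return library


library = build_feature_library(["scam"], RULES)
print(sorted(library))
```

Rules that depend on pronunciation or glyph shape (homophones, pinyin abbreviations, visually similar characters) would need lookup tables that are outside the scope of this sketch.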

Referring to fig. 3, fig. 3 shows an embodiment of a structure of a video barrage text review system combined with a context, in this embodiment, the video barrage text review system includes a sensitive word expansion review module a, a semantic review module B, and a context review module C, where:

Sensitive word expansion auditing module A: performs word segmentation on the target audit text, compares and matches it against a preset sensitive word feature library to obtain a matching result, and performs output judgment according to the sensitive word matching result. If the matching succeeds, the audit ends; if the matching fails, the audit continues.

In this embodiment, the sensitive word expansion auditing module A further includes a sensitive word expansion audit output judgment sub-module A1, which performs output judgment according to the sensitive word matching result. If the matching succeeds, the text is considered to have failed the audit and the result is output directly to the audit result output module; otherwise, the text is output to the semantic auditing module for further processing.

Semantic auditing module B: inputs the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adds a semantic classification label to the video barrage text according to the judgment result, performs output judgment according to the semantic classification label, and determines the output mode and the target audit text that needs further auditing.

In this embodiment, the semantic auditing module B further includes a semantic audit output judgment sub-module B1, which performs output judgment according to the semantic classification result. If the text is judged "semantic violation", it fails the audit; if it is judged "semantic normal", it passes the audit; if it is judged "semantic fuzzy", it is output to the context auditing module for further auditing.

Context auditing module C: obtains context information of the target audit text, detects and analyzes the context information based on context auditing, and makes a comprehensive judgment to obtain an audit result.

Referring to fig. 4, fig. 4 shows a structure of a context auditing module C in an embodiment, where the context auditing module C specifically includes: a video context analysis submodule C1, a text context analysis submodule C2, and a context audit output determination submodule C3, wherein:

Video context analysis sub-module C1: based on video structuring technology, obtains the target structuring information and scene classification information of the video frame corresponding to the barrage text, together with the event structuring information of the video within a certain time range, so as to provide structured video context information for the barrage text to be audited. This module comprises a target structuring sub-module C11, a scene classification sub-module C12, and an event structuring sub-module C13:

Target structuring sub-module C11: detects and locates targets in the image. Target structuring may use various deep learning methods or conventional detection methods; preferably, YOLOv4 (You Only Look Once) may be used. The detection result is divided into normal targets and violating targets, where the types of violating targets may include sensitive persons, sensitive objects (e.g., sensitive flags, knives, guns and other weapons), and the like. The object detection flag b_Object_Abnormal is defined and preset to False, and is set to True if a violating target is detected.

Scene classification sub-module C12: classifies the image scene. Scene classification may use various deep learning methods. The scene classification result is divided into normal scenes and violating scenes, where the types of violating scenes may include sensitive scenes such as gore and pornography. The scene classification flag b_Scene_Abnormal is defined and preset to False, and is set to True if a violating scene is detected.

Event structuring sub-module C13: analyzes events in the video using any method based on time-series analysis; preferably, BSN (Boundary-Sensitive Network) may be used. The event structuring result is divided into normal events and violating events, where the types of violating events may include fighting, burning, collisions, and the like. The event classification flag b_Action_Abnormal is defined and preset to False, and is set to True if a violating event is detected. The time range may be set empirically; preferably, it may be set to trace back 2 seconds from the current time.

Text context analysis sub-module C2: obtains the set of barrage context texts within a certain time range corresponding to the current barrage text, so as to provide context reference information for the barrage text to be audited. The time range may be set empirically; preferably, considering that a barrage usually lags behind the corresponding event (e.g., because of reaction time and typing time), it may be set to the time range of the event structuring module delayed by 1 second. This module classifies the texts in the barrage context text set using the semantic classification method of the semantic auditing module. The text context abnormality flag b_Text_Context_Abnormal is defined and preset to False, and is updated as follows: if the barrage context text set contains a text labeled "semantic violation", the flag is set to True.
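The time windows and the text context flag can be sketched as below. The interpretation of the windows is an assumption: the video window is taken as the 2 seconds before the barrage time, and the text window as that same interval shifted 1 second later. The `Barrage` type, the function names, and the sample classifier are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Barrage:
    time_s: float  # playback time at which the barrage appears
    text: str


def video_window(t: float, lookback_s: float = 2.0) -> Tuple[float, float]:
    """Assumed event window: trace back lookback_s seconds from t."""
    return (t - lookback_s, t)


def text_window(
    t: float, lookback_s: float = 2.0, delay_s: float = 1.0
) -> Tuple[float, float]:
    """Assumed text window: the video window shifted later by delay_s."""
    start, end = video_window(t, lookback_s)
    return (start + delay_s, end + delay_s)


def text_context_abnormal(
    target_time_s: float,
    barrages: Iterable[Barrage],
    classify: Callable[[str], str],
) -> bool:
    """True if any barrage in the text window is a semantic violation."""
    start, end = text_window(target_time_s)
    return any(
        classify(b.text) == "semantic violation"
        for b in barrages
        if start <= b.time_s <= end
    )


def sample_classifier(text: str) -> str:
    """Made-up rule standing in for the semantic classification model."""
    return "semantic violation" if text == "bad" else "semantic normal"


nearby = [Barrage(8.5, "bad"), Barrage(10.5, "ok")]
print(text_context_abnormal(10.0, nearby, sample_classifier))
```

With a target time of 10 s the text window is 9 s to 11 s, so the violating barrage at 8.5 s falls outside it and does not set the flag.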

The context auditing output judgment submodule C3 is responsible for comprehensively outputting and judging the output results of the video context analysis submodule and the text context analysis submodule.

In this embodiment:

The state abnormality flag of the input barrage to be audited, b_Test_Abnormal, is defined and preset to False.

The video context abnormality flag, b_Video_Context_Abnormal, is defined and preset to False, and is then updated as follows:

b_Video_Context_Abnormal = (b_Object_Abnormal OR b_Scene_Abnormal OR b_Action_Abnormal)

The state abnormality flag of the input barrage to be audited is then updated:

IF b_Video_Context_Abnormal AND b_Text_Context_Abnormal:
    b_Test_Abnormal = True

That is, if at least one of a violating target, a violating scene, or a violating event occurs in the context, and a violating barrage text also occurs in the context, the barrage text to be audited is judged to be in violation and the audit result is "not passed"; otherwise, the audit result is "passed". The audit process then ends and the result is output to the audit result output module.
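The flag combination above reduces to a few boolean operations. A minimal sketch, with a hypothetical function name and the result strings taken from the text:

```python
def context_audit(
    b_object_abnormal: bool,
    b_scene_abnormal: bool,
    b_action_abnormal: bool,
    b_text_context_abnormal: bool,
) -> str:
    """Combine the video and text context flags into an audit result."""
    # Any one abnormal video signal marks the video context as abnormal.
    b_video_context_abnormal = (
        b_object_abnormal or b_scene_abnormal or b_action_abnormal
    )
    # Both channels must be abnormal for the barrage to fail.
    b_test_abnormal = b_video_context_abnormal and b_text_context_abnormal
    return "not passed" if b_test_abnormal else "passed"


print(context_audit(False, True, False, True))
```

Requiring both channels to be abnormal means an abnormal video context alone, or violating neighbouring barrages alone, does not fail an ambiguous barrage.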

Referring to fig. 5, fig. 5 is a schematic flowchart illustrating a context auditing module of an embodiment, as shown in the figure, in this embodiment, first, according to an input video barrage text, based on a video structuring technology, structured information of video frames corresponding to the barrage text is obtained, where the structured information includes an image frame sequence and a barrage text sequence, where the image frame sequence includes target structured information, scene classification information, and event structured information corresponding to a video within a certain time range; the bullet screen text sequence comprises a bullet screen context text set corresponding to the current bullet screen text within a certain time range.

Secondly, respectively inputting the image frame sequence into sub-modules of a video context analysis sub-module C1, namely a target structured sub-module C11, a scene classification sub-module C12 and an event structured sub-module C13 for detection so as to obtain video structured information; and inputting the bullet screen text sequence into a text context analysis submodule C2 to obtain a semantic classification result.

Thirdly, the detection/classification results of the video context analysis submodule C1 and the text context analysis submodule C2 are respectively input to the context audit output determination submodule C3. The context audit output judgment submodule C3 is responsible for performing comprehensive output judgment processing on the output results of the video context analysis submodule C1 and the text context analysis submodule C2.

In this embodiment:

b_Video_Context_Abnormal = (b_Object_Abnormal OR b_Scene_Abnormal OR b_Action_Abnormal)

IF b_Video_Context_Abnormal AND b_Text_Context_Abnormal:
    b_Test_Abnormal = True

That is, if at least one of a violating target, a violating scene, or a violating event occurs in the context, and a violating barrage text also occurs in the context, the barrage text to be audited is judged to be in violation and the audit result is "not passed"; otherwise, the audit result is "passed". The audit process then ends and the result is output to the audit result output module.

In this embodiment, the video barrage text auditing system may further include an audit result output module D, which outputs and displays the audit results of each auditing module during the auditing process. For output results from the sensitive word expansion auditing module, the audit result output data further includes: the position of the matched word in the input text, the original form of the matched sensitive word, and the actual deformation mapping information of the sensitive word in the input text.

Referring to fig. 6, fig. 6 is a schematic diagram illustrating an overall process of outputting an audit result, and as shown in the drawing, in this embodiment, an audit result output module D outputs and displays a final audit result for output results from a sensitive word expansion audit output judgment sub-module a1, a semantic audit output judgment sub-module B1, and a context audit output judgment sub-module C3.

In addition, for the output result from the sensitive word expansion audit output judgment sub-module A1, the audit result output data further includes: the position of the matched word in the input text, the original form of the matched sensitive word, and the actual deformation mapping information of the sensitive word in the input text.
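One way to carry the three reported fields is a small record per match. The sketch below is an assumption about the data layout, not something the patent specifies: the `SensitiveMatch` type, its field names, and the sample library are hypothetical, and plain substring search is used to locate matches.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SensitiveMatch:
    position: int        # index of the match in the input text
    matched_text: str    # the variant as it appears in the input
    original_word: str   # original form of the sensitive word
    deformation: str     # deformation mapping that produced the variant


def find_matches(
    text: str, library: Dict[str, Tuple[str, str]]
) -> List[SensitiveMatch]:
    """Locate every library variant in text, ordered by position."""
    matches: List[SensitiveMatch] = []
    for variant, (original, deformation) in library.items():
        start = text.find(variant)
        while start != -1:
            matches.append(
                SensitiveMatch(start, variant, original, deformation)
            )
            start = text.find(variant, start + 1)
    return sorted(matches, key=lambda m: m.position)


# Made-up library: variant -> (original word, deformation name).
sample_library = {
    "scam": ("scam", "original"),
    "macs": ("scam", "reverse reading"),
}

for match in find_matches("what a macs", sample_library):
    print(match)
```

Keeping the deformation name alongside the original word lets a reviewer see why a text that does not literally contain the sensitive word was rejected.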

Overall, the output flow logic of the present embodiment is as follows:

Define the state abnormality flag of the input barrage to be audited, b_Test_Abnormal, and preset it to False
Define the object detection flag b_Object_Abnormal and preset it to False
Define the scene classification flag b_Scene_Abnormal and preset it to False
Define the event classification flag b_Action_Abnormal and preset it to False
Define the video context abnormality flag b_Video_Context_Abnormal and preset it to False
Define the text context abnormality flag b_Text_Context_Abnormal and preset it to False

IF the audit result of the sensitive word expansion auditing module is "not passed":
    Output the audit result and end the audit
ELSE:
    Send the text to the semantic auditing module for further auditing
    IF the audit result is "semantic normal" or "semantic violation":
        Output the audit result and end the audit
    ELSE:
        Send the text to the context auditing module for further auditing
        1) Perform target detection, scene classification, event classification, and text semantic classification in sequence, and update the corresponding flag values according to the results:
            b_Object_Abnormal
            b_Scene_Abnormal
            b_Action_Abnormal
            b_Text_Context_Abnormal
        2) Compute the video context abnormality flag:
            b_Video_Context_Abnormal = (b_Object_Abnormal OR b_Scene_Abnormal OR b_Action_Abnormal)
        3) Audit the barrage text according to the video context and the text context:
            IF b_Video_Context_Abnormal AND b_Text_Context_Abnormal:
                b_Test_Abnormal = True
                The audit result is "violation"
            ELSE:
                The audit result is "normal"
        Output the audit result and end the audit
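The overall output flow can be sketched end to end as follows. The sketch makes several simplifying assumptions: the sensitive word stage uses plain substring matching, the semantic classifier and the context analysis are passed in as callables, and the final results are reported as "not passed" and "passed" throughout. All names are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# (object, scene, action, text context) abnormality flags.
ContextFlags = Tuple[bool, bool, bool, bool]


def audit_barrage(
    text: str,
    collection_map: Iterable[str],
    classify: Callable[[str], str],
    get_context_flags: Callable[[], ContextFlags],
) -> Tuple[str, str]:
    """Run the three audit stages; return (deciding stage, result)."""
    # Stage 1: sensitive word expansion audit.
    if any(entry in text for entry in collection_map):
        return ("sensitive word expansion", "not passed")

    # Stage 2: semantic audit.
    label = classify(text)
    if label == "semantic violation":
        return ("semantic", "not passed")
    if label == "semantic normal":
        return ("semantic", "passed")

    # Stage 3: context audit, reached only for "semantic fuzzy" text.
    b_object, b_scene, b_action, b_text_context = get_context_flags()
    b_video_context = b_object or b_scene or b_action
    b_test = b_video_context and b_text_context
    return ("context", "not passed" if b_test else "passed")


result = audit_barrage(
    "burn it",
    collection_map={"scam", "macs"},
    classify=lambda t: "semantic fuzzy",
    get_context_flags=lambda: (False, False, True, True),
)
print(result)
```

Passing the context analysis as a callable keeps the expensive video processing lazy: it runs only when the first two stages cannot decide.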

An embodiment of the present application further provides an electronic device that includes a processor and a memory, where the memory stores an executable program, and when the executable program runs on a computer, the computer carries out the method described in any of the above embodiments.

It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by hardware under the instruction of a computer program, which may be stored in a computer-readable storage medium. The storage medium may include, but is not limited to: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A context-based video barrage text auditing method, characterized by comprising the following steps:

acquiring a video barrage text to be audited as a target audit text;

a sensitive word expansion auditing step: performing word segmentation on the target audit text to obtain a text fragment list of the target audit text, comparing and matching the text fragment list against a preset sensitive word feature library to obtain a matching result, ending the sensitive word expansion auditing step if the matching succeeds, and continuing to the next audit if the matching fails;

a semantic auditing step: inputting the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adding a semantic classification label to the video barrage text according to the judgment result, and determining, according to the semantic classification label, the target audit text that needs further auditing;

a context auditing step: obtaining context information of the target audit text, and detecting and analyzing the context information based on context auditing to obtain an audit result.

2. The method of claim 1, wherein the context information comprises: target structuring information and scene classification information of the video frame corresponding to the video barrage text, event structuring information of the video within a certain time range corresponding to the video barrage text, and classification information of the barrage context texts within a certain time range corresponding to the video barrage text.

3. The method according to claim 1 or 2, wherein the context auditing comprises video context auditing and text context auditing, the video context auditing is performed based on a deep learning method or a conventional method, and the text context auditing is performed based on a semantic classification method.

4. The method of claim 1, further comprising: the construction method of the sensitive word feature library comprises the following steps:

establishing an original sensitive word bank;

performing deformation mapping processing on each sensitive word in the original sensitive word bank to obtain various deformation mapping results;

and combining the various deformation mapping results with the original sensitive word library to construct a sensitive word feature library.

5. The method of claim 4, wherein the deformation mapping processing comprises pinyin-character mixing, homophone substitution, pinyin abbreviation, confusion of front and back nasal sounds and of flat and retroflex tongue sounds, reverse reading, filler character insertion, character omission, character splitting, substitution of visually similar characters, and synonym substitution.

6. The method as claimed in claim 1, wherein the training method of the semantic classification model comprises a deep learning method and a traditional training method, and the deep learning method comprises: TextCNN, TextRNN, BERT, XLNet, RoBERTa, ALBERT, etc., and the conventional training methods include logistic regression, support vector machines.

7. The method of claim 1, wherein the semantic classification labels comprise: "semantic normal", "semantic violation", and "semantic fuzzy".

8. A context-based video barrage text auditing system, characterized by comprising a sensitive word expansion auditing module, a semantic auditing module, and a context auditing module, wherein:

the sensitive word expansion auditing module performs word segmentation on the target audit text, compares and matches it against a preset sensitive word feature library to obtain a matching result, and performs output judgment according to the matching result, the audit ending if the matching succeeds and continuing if the matching fails;

the semantic auditing module inputs the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adds a semantic classification label to the video barrage text according to the judgment result, performs output judgment according to the semantic classification label, and determines the output mode and the target audit text that needs further auditing;

the context auditing module obtains context information of the target audit text, detects and analyzes the context information based on context auditing, and makes a comprehensive judgment to obtain an audit result.

9. The system of claim 8, wherein the system further comprises: and the audit result output module is used for performing final audit output and display on the output from the sensitive word expansion audit module, the semantic audit module and the context audit module.

10. The system of claim 9, wherein, for output from the sensitive word expansion auditing module, the data output and displayed by the audit result output module comprises: the position of the matched word in the input text, the original form of the matched sensitive word, and the actual deformation mapping information of the sensitive word in the input text.