CN112395868A - Rapid and safe natural language information hiding method based on word replacement - Google Patents
- Publication number
- CN112395868A
- Authority
- CN
- China
- Prior art keywords
- synonym
- secret information
- sequence
- embedded
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
Abstract
The invention discloses a fast and safe natural language information hiding method based on word replacement, comprising step 1, preprocessing; step 2, distortion value measurement; and step 3, secret information embedding. The invention brings the local distortion of the embedded text close to the global optimum and shortens the total secret information embedding time, thereby improving both the speed of the embedding process and the security of the embedded text.
Description
Technical Field
The invention relates to the field of information security, in particular to a quick and safe natural language information hiding method based on word replacement.
Background
With the development of global informatization, information is transmitted ever more frequently with text as the carrier, and people attach growing importance to the security of that transmission. Natural language information hiding technology, which hides secret information in a carrier text imperceptibly, therefore urgently needs further development; it can provide copyright protection, covert communication and the like for important text data.
Natural language information hiding draws mainly on linguistics, statistics and related fields. It either automatically generates content resembling natural text using natural language processing techniques, or modifies existing normal text at the syntactic, lexical or semantic level, and embeds information while keeping the local text characteristics, global semantics and statistical characteristics unchanged, the grammar correct and the syntactic structure reasonable. Information hiding based on synonym replacement is a mainstream class of steganography: it replaces synonyms in the carrier text in order to embed secret information in it. However, existing technical schemes cause high distortion and are easy to perceive. The invention therefore provides a fast and safe natural language information hiding method based on word replacement that preserves the information hiding effect while reducing distortion.
Disclosure of Invention
To achieve the purpose of the invention, the following technical scheme is adopted:
A fast and safe natural language information hiding method based on word replacement comprises step 1, preprocessing; step 2, distortion value measurement; and step 3, secret information embedding, wherein step 1 comprises:
1.1, preparing a synonym thesaurus;
1.2 randomly selecting an original carrier text T;
1.3 determining the secret information $M$ of length $n_1$ to be embedded, the secret information being a piece of text;
1.4 traversing all words in the carrier text T according to the synonym thesaurus to obtain all synonyms in the carrier text T; assuming there are $n_2$ of them, arranging them in order of appearance and recording the resulting sequence as the synonym sequence to be embedded $X=\{x_1,x_2,\ldots,x_{n_2}\}$ with elements $x_i$, where $1\le i\le n_2$;
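Step 1.4 can be illustrated with a minimal sketch. It assumes the thesaurus is a Python dict that maps a lowercase word to its synonym group and that words are split with a simple regular expression; both choices, and the function name `synonym_sequence`, are illustrative assumptions rather than part of the method as filed.

```python
import re


def synonym_sequence(text, thesaurus):
    """Return (word_index, word) for every word of `text` found in `thesaurus`.

    `thesaurus` is a hypothetical dict: lowercase word -> list of synonyms.
    The returned list plays the role of X = {x_1, ..., x_n2}, in text order.
    """
    # Simple tokenisation (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    return [(i, w) for i, w in enumerate(words) if w.lower() in thesaurus]


if __name__ == "__main__":
    toy_thesaurus = {"big": ["big", "large"], "fast": ["fast", "quick"]}
    print(synonym_sequence("The big dog ran fast.", toy_thesaurus))
```

The word index is kept alongside each synonym so that a later replacement step can locate the position in the carrier text.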
Step 2 of the information hiding method comprises:
2.1 searching the carrier text T for the $n_2$ synonyms according to the entries of the synonym thesaurus; suppose the $i$-th synonym appearing in the text is $x_i$, its synonym group of length $p$ is denoted $CG(x_i)=\{cx_{i,0},cx_{i,1},\ldots,cx_{i,p-1}\}$, and the context window has size $2k$; the words of the context window of $x_i$ in text T are recorded in order as $Cont(x_i)=\{w_{i,i-k},\ldots,w_{i,i-1},w_{i,i+1},\ldots,w_{i,i+k}\}$, and the sentence formed in order by $x_i$ and the words of its context window is recorded as $Sen(x_i,Cont(x_i))=\{w_{i,i-k},\ldots,w_{i,i-1},x_i,w_{i,i+1},\ldots,w_{i,i+k}\}$;
2.2 replacing $x_i$ in turn with each synonym $cx_{i,l}$ ($0\le l\le p-1$) of its synonym group, which gives the candidate embedded sentence $Sen(cx_{i,l},Cont(x_i))=\{w_{i,i-k},\ldots,w_{i,i-1},cx_{i,l},w_{i,i+1},\ldots,w_{i,i+k}\}$, and computing the sentence vectors $E(Sen(x_i,Cont(x_i)))$ and $E(Sen(cx_{i,l},Cont(x_i)))$ of the original sentence $Sen(x_i,Cont(x_i))$ and the embedded sentence $Sen(cx_{i,l},Cont(x_i))$;
2.3 computing the sentence vector distance between the original sentence $Sen(x_i,Cont(x_i))$ and the embedded sentence $Sen(cx_{i,l},Cont(x_i))$ to obtain the distortion value $\rho(cx_{i,l},x_i)$ caused by replacing $x_i$ with the synonym $cx_{i,l}$ in the current sentence; the distortion function for the sentence vector distance is:
$\rho(cx_{i,l},x_i)=1-nolmal(Sen(cx_{i,l},Cont(x_i)),Sen(x_i,Cont(x_i)))$
where $E(u)$ and $E(v)$ are the sentence vectors of the original sentence and the embedded sentence, respectively;
2.4 repeating steps 2.2 and 2.3 to calculate the distortion value $\rho(cx_{i,l},x_i)$ caused by replacing $x_i$ with each synonym $cx_{i,l}$ ($0\le l\le p-1$) of the corresponding synonym group; at each position to be embedded, selecting the minimum non-zero distortion value $\rho_i$ and the synonym $cx_{i,l}$ that produces it; arranging the minimum distortion values $\rho_i$ of all positions in order to obtain the distortion value sequence $\{\rho_1,\rho_2,\ldots,\rho_{n_2}\}$; and pairing the original word $x_i$ of each position with the synonym $cx_{i,l}$ that produces the corresponding distortion value, the pairs $(x_i,cx_{i,l})$ arranged in order being recorded as the binary synonym library;
2.5 formulating the synonym coding rule: for each synonym pair $(x_i,cx_{i,l})$ in the binary synonym library, where $1\le i\le n_2$, each synonym is encoded by the formulas $A=\sum ASCII(\{a\mid a\in x_i\})$, $B=\sum ASCII(\{b\mid b\in cx_{i,l}\})$, $F(X)=\min\{A,B\}$, where $a$ and $b$ denote the letters that make up the words $x_i$ and $cx_{i,l}$, $A$ and $B$ denote the sums of the ASCII codes of all letters $a$ and $b$, and $F(X)$ denotes the smaller of $A$ and $B$; if $F(X)=A$, $x_i$ is coded as 0 and $cx_{i,l}$ as 1; if $F(X)=B$, $cx_{i,l}$ is coded as 0 and $x_i$ as 1.
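The coding rule of step 2.5 is simple enough to sketch directly. The function names are illustrative, and the handling of a tie ($A=B$), which the text does not specify, is an assumption: the sketch treats a tie as $F(X)=A$.

```python
def ascii_sum(word):
    """Sum of the character codes of all letters in `word` (A or B in step 2.5)."""
    return sum(ord(ch) for ch in word)


def encode_pair(x, cx):
    """Assign code bits to the two words of one synonym pair.

    The word with the smaller ASCII sum is coded 0, the other 1.
    Tie-breaking (A == B treated as F(X) = A) is an assumption.
    """
    a, b = ascii_sum(x), ascii_sum(cx)
    if a <= b:
        return {x: 0, cx: 1}
    return {cx: 0, x: 1}


if __name__ == "__main__":
    print(encode_pair("big", "large"))   # 'big' has the smaller sum
    print(encode_pair("large", "big"))   # same codes, whichever word is in the text
```

Because the rule depends only on the two words and not on which of them currently appears in the text, a receiver who rebuilds the same pair arrives at the same codes.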
Step 3 of the information hiding method comprises:
3.1 binary-vectorizing the synonym sequence to be embedded $X$ according to the binary synonym library and the synonym coding rule, recorded as $X'=\{x'_1,x'_2,\ldots,x'_{n_2}\}$, where $x'_i\in\{0,1\}$ ($1\le i\le n_2$); and converting the secret information $M$ to binary, recorded as the bit stream $M'$;
3.2 dividing the secret information into N segments according to the length of the secret information and the length of the synonym sequence to be embedded of the carrier text T;
3.3 obtaining from the overall distortion value sequence the distortion value segment $\{\rho_i,\ldots,\rho_j\}$ ($i\ge 0$, $j\le n_2$) corresponding to each synonym sequence segment to be embedded $\{x_i,\ldots,x_j\}$ ($i\ge 0$, $j\le n_2$), then performing STC coding on the quantized synonym segment $\{x'_i,\ldots,x'_j\}$, the quantized secret information segment $\{m'_i,\ldots,m'_j\}$ and the distortion value segment $\{\rho_i,\ldots,\rho_j\}$ to obtain the STC code sequence $\{y'_i,\ldots,y'_j\}$ ($i\ge 0$, $j\le n_2$) of each segment;
3.4 splicing the STC code sequences of all segments into a complete embedded-text binary vector sequence $Y'=(y'_1,y'_2,\ldots,y'_{n_2})$, $y'_i\in\{0,1\}$ ($1\le i\le n_2$), and accordingly selectively replacing the words at the corresponding positions of the original carrier text with the words of the binary synonym library to obtain the steganographic text T' embedded with the secret information M; the replacement rule is: if the value obtained for a position is 1, the word at that position of the original carrier text is replaced with its paired word from the binary synonym library; if the value is 0, no word replacement is performed.
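A minimal sketch of the selective replacement in step 3.4 follows. It assumes that a word is replaced exactly when its current code bit $x'_i$ differs from the target bit $y'_i$, that is, when their XOR is 1; this reading of the replacement condition is an assumption, as are the function name and the argument layout.

```python
def apply_replacements(words, positions, pairs, x_bits, y_bits):
    """Build the stego word list from the carrier word list.

    words     : carrier text as a list of words
    positions : index in `words` of each synonym x_i
    pairs     : (original word, paired synonym) for each position
    x_bits    : current code bits x'_i
    y_bits    : target code bits y'_i produced by STC coding
    Assumption: replace when x'_i XOR y'_i == 1, otherwise keep the word.
    """
    out = list(words)
    for pos, (_orig, alt), xb, yb in zip(positions, pairs, x_bits, y_bits):
        if xb ^ yb:
            out[pos] = alt
    return out


if __name__ == "__main__":
    carrier = ["The", "big", "dog", "ran", "fast"]
    stego = apply_replacements(
        carrier, [1, 4], [("big", "large"), ("fast", "quick")], [0, 0], [1, 0]
    )
    print(" ".join(stego))
```

Under this assumption every synonym position in the stego text carries the code bit $y'_i$, which is what the extraction side later reads back.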
The information hiding method further comprises step 4, secret information extraction:
4.1 input: the steganographic text T', the secret information length $n_1$, the number of segments N and the synonym thesaurus;
4.2 traversing all words in T' according to the synonym thesaurus to obtain the synonym sequence of length $n_2$ that appears in it; obtaining the binary synonym library in the same way as in the information hiding algorithm; then quantizing the synonym sequence according to the synonym coding rule into $Y'=\{y'_1,y'_2,\ldots,y'_{n_2}\}$, where $y'_i\in\{0,1\}$ ($1\le i\le n_2$);
4.3 obtaining the length of each segment of the bit stream from the number of segments N and the secret information length $n_1$, which gives the quantized synonym sequence $Y'_i$ of each segment and the length $s_1$ of each segment of secret information;
4.4 performing STC decoding on the quantized synonym sequence $Y'_i$ of each segment with the segment length $s_1$ to obtain the secret information bit stream $M'_i$ of each segment, and splicing the segments in order into the complete secret information bit stream.
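Steps 3.2 and 4.3 both rely on cutting a sequence into N segments in the same way on the sending and receiving side. A minimal sketch, assuming equal-length segments of $\lfloor \text{len}/N \rfloor$ elements with any remainder placed in the N-th segment, as the detailed description of step 3.2 states; the function name `split_segments` is illustrative.

```python
def split_segments(seq, n_segments):
    """Split `seq` into `n_segments` consecutive pieces.

    The first n_segments - 1 pieces have len(seq) // n_segments elements;
    the last piece additionally takes whatever remains.
    """
    base = len(seq) // n_segments
    if n_segments < 1 or base == 0:
        raise ValueError("need between 1 and len(seq) segments")
    segments = [seq[i * base:(i + 1) * base] for i in range(n_segments - 1)]
    segments.append(seq[(n_segments - 1) * base:])
    return segments


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    print(split_segments(bits, 3))
```

Applying the same function to the secret bit stream and to the quantized synonym sequence keeps the two sets of segments aligned one to one.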
Drawings
FIG. 1 is a schematic diagram of the fast and safe natural language information hiding method based on word replacement according to the present invention;
FIG. 2 is a schematic diagram of the SBERT model structure;
FIG. 3 is a schematic diagram of a segment STC embedding model;
FIG. 4 is a diagram of a parity check matrix structure;
FIG. 5 is a graph comparing the distortion of steganographic text generated with segmented STC and with unsegmented STC;
FIG. 6 is a graph comparing the embedding time of steganographic text generated with segmented STC and with unsegmented STC.
Detailed Description
The embodiments of the present invention are described in detail below with reference to FIGS. 1 to 6:
As shown in FIGS. 1 to 6, the fast and safe natural language information hiding method based on word replacement of the present invention comprises: step 1, preprocessing; step 2, distortion value measurement; and step 3, secret information embedding.
Step 1. Preprocessing
1.1 preparing an (English) synonym thesaurus in advance;
1.2 randomly selecting an original (English) carrier text T, for example a piece of news or commentary taken from a website, or a passage of a scientific paper;
1.3 randomly determining the secret information $M$ of length $n_1$ to be embedded, the secret information being a piece of (English) text;
1.4 traversing all words in the carrier text T according to the synonym thesaurus to obtain all synonyms in the carrier text T; assuming there are $n_2$ of them, the synonym sequence to be embedded is recorded as $X=\{x_1,x_2,\ldots,x_{n_2}\}$.
Step 2. Distortion value measurement
2.1 searching the carrier text T for the $n_2$ synonyms according to the entries of the synonym thesaurus; suppose the $i$-th synonym appearing in the text is $x_i$ and its synonym group of length $p$ is denoted $CG(x_i)=\{cx_{i,0},cx_{i,1},\ldots,cx_{i,p-1}\}$; the context window size $2k$ is determined from $num(T_{word})$ and $num(T_{symbol})$, which denote the total number of words in the carrier text T and the total number of sentence-ending symbols (periods, question marks, etc.) of all sentences in T, respectively; the words of the context window of $x_i$ in text T are recorded in order as $Cont(x_i)=\{w_{i,i-k},\ldots,w_{i,i-1},w_{i,i+1},\ldots,w_{i,i+k}\}$, and the sentence formed in order by $x_i$ and the words of its context window is recorded as $Sen(x_i,Cont(x_i))=\{w_{i,i-k},\ldots,w_{i,i-1},x_i,w_{i,i+1},\ldots,w_{i,i+k}\}$; the value of $2k$ can be chosen per carrier text so that it yields a suitable sentence length.
2.2 replacing $x_i$ in turn with each synonym $cx_{i,l}$ ($0\le l\le p-1$) of its synonym group, which gives the embedded sentence $Sen(cx_{i,l},Cont(x_i))=\{w_{i,i-k},\ldots,w_{i,i-1},cx_{i,l},w_{i,i+1},\ldots,w_{i,i+k}\}$; the original sentence $Sen(x_i,Cont(x_i))$ and the embedded sentence $Sen(cx_{i,l},Cont(x_i))$ are each input to a sentence vector generation model (the SBERT model, whose structure is shown in FIG. 2), which outputs their sentence vectors $E(Sen(x_i,Cont(x_i)))$ and $E(Sen(cx_{i,l},Cont(x_i)))$.
2.3 computing, with a semantic similarity function (the distortion function), the sentence vector distance between the two sentences (the original sentence $Sen(x_i,Cont(x_i))$ and the embedded sentence $Sen(cx_{i,l},Cont(x_i))$) to obtain the distortion value $\rho(cx_{i,l},x_i)$ caused by replacing $x_i$ with the synonym $cx_{i,l}$ in the current sentence; the distortion function for the sentence vector distance is:
$\rho(cx_{i,l},x_i)=1-nolmal(Sen(cx_{i,l},Cont(x_i)),Sen(x_i,Cont(x_i)))$
where $E(u)$ and $E(v)$ are the sentence vectors of the original sentence and the embedded sentence, respectively;
2.4 using the synonym sequence to be embedded $X$ of the carrier text T obtained in preprocessing and the synonym thesaurus, traversing each $x_i$ ($1\le i\le n_2$) of the sequence together with its synonym group $CG(x_i)=\{cx_{i,0},cx_{i,1},\ldots,cx_{i,p-1}\}$ and repeating steps 2.2 and 2.3 to calculate the distortion value $\rho(cx_{i,l},x_i)$ caused by replacing $x_i$ with each synonym $cx_{i,l}$ ($0\le l\le p-1$) of the group; at each position to be embedded, selecting the minimum non-zero distortion value $\rho_i$ and the synonym $cx_{i,l}$ that produces it; arranging the minimum distortion values $\rho_i$ of all positions in order to obtain the distortion value sequence $\{\rho_1,\rho_2,\ldots,\rho_{n_2}\}$; and pairing the original word $x_i$ of each position with the synonym $cx_{i,l}$ that produces the corresponding distortion value, the pairs $(x_i,cx_{i,l})$ arranged in order being recorded as the binary synonym library.
2.5 formulating the synonym coding rule: for each synonym pair $(x_i,cx_{i,l})$ in the binary synonym library (where $1\le i\le n_2$), each synonym is encoded by the formulas $A=\sum ASCII(\{a\mid a\in x_i\})$, $B=\sum ASCII(\{b\mid b\in cx_{i,l}\})$, $F(X)=\min\{A,B\}$, where $a$ and $b$ denote the letters that make up the English words $x_i$ and $cx_{i,l}$, $A$ and $B$ denote the sums of the ASCII codes of all letters $a$ and $b$, and $F(X)$ denotes the smaller of $A$ and $B$; if $F(X)=A$, $x_i$ is coded as 0 and $cx_{i,l}$ as 1; if $F(X)=B$, $cx_{i,l}$ is coded as 0 and $x_i$ as 1.
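Steps 2.1 to 2.4 can be sketched as follows. The sketch makes several assumptions that should be read as such: the sentence encoder is passed in as a plain function `embed` (a stand-in for the SBERT model, which is not called here), the similarity inside the distortion function is taken to be cosine similarity, and "excluding zero" is implemented by skipping the original word and any distortion below a small tolerance. All function names are illustrative.

```python
import math


def context_sentence(words, pos, k, center=None):
    """Sen(., Cont(x_i)): up to k words on each side of position `pos`,
    with the word at `pos` optionally replaced by `center`."""
    left = words[max(0, pos - k):pos]
    right = words[pos + 1:pos + 1 + k]
    middle = words[pos] if center is None else center
    return left + [middle] + right


def cosine(u, v):
    """Cosine similarity of two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def least_distorting_synonym(words, pos, group, k, embed, eps=1e-12):
    """Return (synonym, rho) with the smallest non-zero distortion, or None.

    `embed` is a hypothetical callable: list of words -> sentence vector.
    """
    original_vec = embed(context_sentence(words, pos, k))
    best = None
    for cand in group:
        if cand == words[pos]:
            continue  # the word itself gives zero distortion and is excluded
        cand_vec = embed(context_sentence(words, pos, k, cand))
        rho = 1.0 - cosine(original_vec, cand_vec)
        if rho > eps and (best is None or rho < best[1]):
            best = (cand, rho)
    return best
```

The pair formed by `words[pos]` and the returned synonym is one entry of the binary synonym library, and the returned `rho` is the corresponding $\rho_i$.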
Step 3. Secret information embedding
3.1 binary-vectorizing the synonym sequence to be embedded $X$ according to the binary synonym library and the synonym coding rule (that is, if $x_i$ is coded as 0 under the synonym coding rule of step 2.5, $x'_i$ is recorded as 0; if $x_i$ is coded as 1, $x'_i$ is recorded as 1), recorded as $X'=\{x'_1,x'_2,\ldots,x'_{n_2}\}$, where $x'_i\in\{0,1\}$ ($1\le i\le n_2$); and converting the secret information $M$ to binary, recorded as the bit stream $M'$;
3.2 dividing the secret information into N segments ($N$ an integer with $2<N<n_1$) according to the length of the secret information and the length of the synonym sequence to be embedded of the carrier text T, that is, dividing the secret information bit stream $M'$ and the quantized synonym sequence to be embedded $X'$ into the same number of short segments; if the length of the secret information or of the synonym sequence to be embedded is not divisible by N, the remainder is placed in the N-th segment;
3.3 obtaining from the overall distortion value sequence the distortion value segment $\{\rho_i,\ldots,\rho_j\}$ ($i\ge 0$, $j\le n_2$) corresponding to each synonym sequence segment to be embedded $\{x_i,\ldots,x_j\}$ ($i\ge 0$, $j\le n_2$), then performing STC coding on the quantized synonym segment $\{x'_i,\ldots,x'_j\}$, the quantized secret information segment $\{m'_i,\ldots,m'_j\}$ and the distortion value segment $\{\rho_i,\ldots,\rho_j\}$; the embedding of the secret information is completed by the embedding function $Emb(\cdot)$, where $C(M')$ is the feasible set of embedded-text binary vectors $Y'$ (secret information extraction is likewise completed by the formula $Ext(Y')=HY'$, where $Ext(\cdot)$ denotes the extracted secret information and $H$ denotes the parity-check matrix, formed by placing $f$ sub-matrices of height $h$ and width $w$ along the main diagonal with adjacent sub-matrices staggered by one row, so that $H$ has height $f$ and width $f\times w$; the structure is shown in FIG. 4). STC coding uses the Viterbi algorithm to find the minimum-distortion path from vector $X'$ to vector $Y'$ and thereby determine the embedding positions; where conditions allow, the segments can be distributed to different CPU cores for computation;
3.4 splicing the STC code sequences $\{y'_i,\ldots,y'_j\}$ ($i\ge 0$, $j\le n_2$) of all segments into a complete embedded-text binary vector sequence $Y'=(y'_1,y'_2,\ldots,y'_{n_2})$, $y'_i\in\{0,1\}$ ($1\le i\le n_2$), and, according to it and the quantized synonym sequence to be embedded $X'$, selectively replacing the original words at the corresponding positions of the original carrier text with the words of the binary synonym library; that is, if the value obtained for a position is 1, the word at that position of the original carrier text is replaced with its paired word from the binary synonym library, and if the value is 0, no word replacement is performed; the steganographic text T' embedded with the secret information M is finally obtained.
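The per-segment coding of step 3.3 can be illustrated on a toy scale. The sketch below builds a parity-check matrix with the staircase structure described above and then finds the minimum-distortion vector by exhaustive search; the exhaustive search is a deliberate substitute for the Viterbi algorithm and is only practical for very short segments. The sub-matrix values, segment size and function names are hypothetical.

```python
from itertools import product


def build_parity_check(sub, f):
    """Place f copies of the h x w sub-matrix along the diagonal, each shifted
    down by one row; rows that would fall below row f - 1 are cropped,
    so the result has f rows and f * w columns."""
    h, w = len(sub), len(sub[0])
    H = [[0] * (f * w) for _ in range(f)]
    for k in range(f):
        for r in range(h):
            if k + r < f:
                for c in range(w):
                    H[k + r][k * w + c] = sub[r][c]
    return H


def extract(H, y):
    """Ext(Y') = H Y' over GF(2)."""
    return [sum(hj * yj for hj, yj in zip(row, y)) % 2 for row in H]


def embed_bruteforce(H, x, m, rho):
    """Among all y with H y = m, return the one whose changed positions
    (y_i != x_i) have the smallest total distortion rho_i."""
    best, best_cost = None, float("inf")
    for y in product((0, 1), repeat=len(x)):
        if extract(H, y) != list(m):
            continue
        cost = sum(r for xi, yi, r in zip(x, y, rho) if xi != yi)
        if cost < best_cost:
            best, best_cost = list(y), cost
    return best, best_cost


if __name__ == "__main__":
    H = build_parity_check([[1, 0, 1], [1, 1, 0]], f=2)   # hypothetical sub-matrix
    y, cost = embed_bruteforce(H, [0, 1, 1, 0, 1, 0], [1, 0],
                               [0.4, 0.1, 0.3, 0.2, 0.5, 0.6])
    print(y, cost, extract(H, y))
```

Because each segment is coded independently, a pool of worker processes could handle the segments in parallel, in line with the remark about multiple CPU cores.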
The invention also comprises step 4, secret information extraction, as follows:
Step 4. Secret information extraction
4.1 input: the steganographic text T', the secret information length $n_1$, the number of segments N and the synonym thesaurus;
4.2 traversing all words in T' according to the synonym thesaurus to obtain the synonym sequence of length $n_2$ that appears in it; then performing step 2.4 (with this synonym sequence in place of the synonym sequence to be embedded of step 2.4) to obtain the binary synonym library; then quantizing the synonym sequence according to the synonym coding rule into $Y'=\{y'_1,y'_2,\ldots,y'_{n_2}\}$, where $y'_i\in\{0,1\}$ ($1\le i\le n_2$);
4.3 obtaining the length of each segment of the bit stream from the number of segments N and the secret information length $n_1$, which gives the quantized synonym sequence $Y'_i$ of each segment and the length $s_1$ of each segment of secret information;
4.4 performing STC decoding on the quantized synonym sequence $Y'_i$ of each segment with the segment length $s_1$ to obtain the secret information bit stream $M'_i$ of each segment, and splicing the segments in order into the complete secret information bit stream.
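For reference, embedding and extraction with syndrome-trellis codes are usually written as the pair of relations below. This is the standard formulation from the STC literature rather than a formula reproduced from this text; the additive distortion $D$ and the segment length $n$ are notation introduced here for illustration, with $\rho_i$ the distortion values of step 2.4.

```latex
\begin{aligned}
\mathrm{Emb}(X', M') &= \arg\min_{Y' \in C(M')} D(X', Y'), \\
C(M') &= \{\, Y' \in \{0,1\}^{n} : H Y' = M' \,\}, \\
D(X', Y') &= \sum_{i=1}^{n} \rho_i \,[\, x'_i \neq y'_i \,], \\
\mathrm{Ext}(Y') &= H Y' \pmod 2 .
\end{aligned}
```

Read this way, step 4.4 needs only the parity-check matrix and the quantized synonym sequence of each segment; the distortion values play no part in extraction.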
The invention brings the local distortion of the embedded text close to the global optimum and shortens the total secret information embedding time, thereby improving both the speed of the steganography process and the security of the embedded text.
Comparative experiment:
To verify that the proposed method is fast and safe, a comparative experiment was carried out between the segmented STC information hiding method designed by the invention (denoted P-Seg-STC) and the unsegmented STC information hiding method (denoted P-STC), with secret information of the same length embedded in both cases. The parameters of the segmented and unsegmented STC codes were set identically: the relative payload of the carrier text is 0.3, the constraint height is 2, and the number of segments is $3n_2/20$, where $n_2$ is the length of the synonym sequence to be embedded; each synonym sequence segment to be embedded carries 2 bits of secret information.
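The stated parameters are mutually consistent, as a short calculation shows. It assumes that the relative payload is measured in secret bits per synonym position; $N$ denotes the number of segments as in step 3.2.

```latex
\begin{aligned}
\text{secret bits} &= 0.3\, n_2, \qquad N = \frac{3 n_2}{20}, \\
\text{bits per segment} &= \frac{0.3\, n_2}{3 n_2 / 20} = 2, \\
\text{synonyms per segment} &= \frac{n_2}{N} = \frac{20}{3} \approx 6.7 .
\end{aligned}
```

Each segment therefore hides 2 bits in roughly 6 to 7 synonym positions, which keeps every per-segment coding problem small.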
200 texts were randomly selected from the Gutenberg corpus as carrier texts, and the corresponding steganographic texts were generated with each method. The distortion and the embedding time of the segmented and unsegmented STC information hiding methods were compared, and the statistics are presented as line graphs in FIG. 5 and FIG. 6, respectively.
As can be seen from FIGS. 5 and 6, for the same carrier text and secret information of the same length, the steganographic text generated with segmented STC coding achieves distortion nearly identical to, or even lower than, that of unsegmented STC coding. The embedding time required with segmented STC coding is markedly shorter than with unsegmented STC coding, which shows that segmented STC coding can greatly shorten the embedding time of information hiding based on synonym replacement and improve its embedding speed.
Claims (1)
1. A natural language information hiding method based on word replacement, comprising step 1, preprocessing; step 2, distortion value measurement; and step 3, secret information embedding, characterized in that step 1 comprises:
1.1, preparing a synonym thesaurus;
1.2 randomly selecting an original carrier text T;
1.3 determining the secret information $M$ of length $n_1$ to be embedded, the secret information being a piece of text;
1.4 traversing all words in the carrier text T according to the synonym thesaurus to obtain all synonyms in the carrier text T; assuming there are $n_2$ of them, arranging them in order of appearance and recording the resulting sequence as the synonym sequence to be embedded $X=\{x_1,x_2,\ldots,x_{n_2}\}$ with elements $x_i$, where $1\le i\le n_2$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011283016.XA CN112395868A (en) | 2020-11-17 | 2020-11-17 | Rapid and safe natural language information hiding method based on word replacement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011283016.XA CN112395868A (en) | 2020-11-17 | 2020-11-17 | Rapid and safe natural language information hiding method based on word replacement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112395868A (en) | 2021-02-23 |
Family
ID=74599965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011283016.XA Pending CN112395868A (en) | 2020-11-17 | 2020-11-17 | Rapid and safe natural language information hiding method based on word replacement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112395868A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103294959A (en) * | 2013-05-29 | 2013-09-11 | 南京信息工程大学 | Text information hiding method resistant to statistic analysis |
CN103377239A (en) * | 2012-04-26 | 2013-10-30 | 腾讯科技(深圳)有限公司 | Method and device for calculating inter-textual similarity |
CN108062307A (en) * | 2018-01-04 | 2018-05-22 | 中国科学技术大学 | The text semantic steganalysis method of word-based incorporation model |
CN111259140A (en) * | 2020-01-13 | 2020-06-09 | 长沙理工大学 | False comment detection method based on LSTM multi-entity feature fusion |
CN111581952A (en) * | 2020-05-20 | 2020-08-25 | 长沙理工大学 | Large-scale replaceable word bank construction method for natural language information hiding |
Non-Patent Citations (2)
Title |
---|
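Xiang Lingyun et al.: "A text steganography algorithm based on low-distortion replacement priority", Computer Engineering and Applications, vol. 51, no. 15, page 102 * |
Luo Gang et al.: "Research on detection methods for information hiding based on synonym substitution", Journal of Computer Research and Development, vol. 45, no. 10, pages 1696-1702 * |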
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||