CN105404614A - Subject and predicate coding based text watermark embedding and extraction method

Info

Publication number
CN105404614A
Authority
CN
China
Prior art keywords
subject
text
predicate
unicode
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510743382.1A
Other languages
Chinese (zh)
Other versions
CN105404614B (en)
Inventor
陈建平
李桂森
朱晓辉
施佺
马海英
王进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-11-05
Filing date
2015-11-05
Publication date
2016-03-16
Application filed by Nantong University
Priority to CN201510743382.1A
Publication of CN105404614A
Application granted
Publication of CN105404614B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a text watermark embedding and extraction method based on subject-predicate coding. The embedding method comprises: 1) representing each character of the watermark information in Unicode to form a Unicode code string; 2) detecting the subject and predicate of each sentence in the text to be watermarked and storing them in a set; 3) dividing the Unicode code string into segments according to the number of subject-predicate pairs, assigning one segment to each pair, and numbering each segment; and 4) storing each subject-predicate pair, its Unicode code segment, and the segment number in sequence to form a codebook, which completes the coding and embeds the watermark. The extraction method comprises: finding the subject-predicate pairs in the text under examination; looking up in the codebook the Unicode code segment and number corresponding to each pair; splicing the segments in numerical order; and converting the resulting Unicode code string back into characters to recover the watermark information. The method does not change the format or content of the text, offers good concealment and robustness, and is simple to construct and easy to implement.

Description

A text watermark embedding and extraction method based on subject-predicate coding
Technical field
The present invention relates to watermark embedding and extraction techniques, and in particular to a text watermark embedding and extraction method based on subject-predicate coding.
Background art
With the spread of the Internet and information technology, more and more text is published, distributed, and used in digital form. While this brings convenience to study, work, and daily life, it also makes text easy to copy and appropriate illegally, so the protection of intellectual property in digital text has drawn wide attention. Text watermarking is a technique that has emerged in recent years for protecting this intellectual property. It embeds copyright or authentication information (the watermark) in digital text by some means; when the text is found to have been illegally copied or appropriated, this information can be extracted to prove copyright ownership, confirm the infringement, and protect the rights and interests of the copyright owner or holder. Text watermarking can also be used to hide and transmit secret information in text, to authenticate text content, and to track text.
At present there are two main classes of text watermarking methods: those based on text format and those based on natural language. Format-based watermarking exploits the fact that slight changes to text formatting are hard to notice, and embeds the watermark by changing, for example, line spacing, word spacing, or character size. Such methods are simple in construction and easy to implement, but converting the format of the text may destroy the embedded watermark, so their robustness is weak. Natural-language watermarking encodes the watermark using the syntax and semantics of the text content; most current implementations encode the watermark through synonym substitution and syntactic transformation. Compared with format-based watermarking, natural-language watermarking offers better concealment and robustness, and format conversion does not affect the watermark. However, because of the complexity of the Chinese language, synonym substitution and syntactic transformation may introduce ambiguity or change the meaning, and they are unsuitable where the text content must not be altered.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art and to provide a text watermark embedding and extraction method based on subject-predicate coding that has good concealment and robustness. It is realized by the following technical scheme:
The text watermark embedding method based on subject-predicate coding comprises:
1) Representing each character of the watermark information in Unicode to form a Unicode code string.
2) Detecting the subject and predicate of each sentence in the text to be watermarked and storing them in a set.
3) Dividing the Unicode code string into segments according to the number of subject-predicate pairs detected, so that each pair represents one segment. Because changing the order of sentences in the text could otherwise prevent the watermark from being extracted correctly, the Unicode code segment assigned to each pair is given a number, and the code string is spliced according to these numbers when the watermark is extracted.
4) Storing, in sequence, each subject-predicate pair, its Unicode code segment, and the segment number to form a codebook, which completes the coding and embeds the watermark.
The Unicode coding uses the UTF-16 format: each character is represented by four hexadecimal digits, and together they form a hexadecimal Unicode code string.
Detecting the subject-predicate pairs in the text to be watermarked in step 2) comprises the following steps:
A) converting the text to be watermarked into a character string;
B) submitting the character string to the Language Technology Platform (LTP) for dependency parsing, and obtaining an XML-format string that contains the dependency relations between the sentence constituents of the text;
C) converting the XML-format string into an XML document, parsing it with DOM, and traversing it in a loop to find the subject and predicate of each sentence, using the link between the head relation and the subject-verb relation in the relation attribute of the sentence constituents.
In a further design of the embedding method, the subject-predicate pair, the Unicode code segment, and the number in each line of the codebook are separated by spaces.
Based on the embedding method above, a text watermark extraction method based on subject-predicate coding is proposed. It comprises: finding the subject-predicate pairs in the text under examination; comparing them against the codebook formed when the watermark was embedded; taking from the codebook the Unicode code segment and number corresponding to each pair; splicing the segments in numerical order to obtain the Unicode code string that represents the watermark information; and converting that string back into characters to recover the embedded watermark information.
In a further design of the extraction method, taking out the Unicode code segment and number corresponding to each subject-predicate pair in the text under examination comprises: comparing each pair found in the text with each pair in the codebook one by one and, where the two match, taking from the codebook the Unicode code segment and number corresponding to that pair.
The advantages of the invention are as follows:
The invention proposes a new text watermark embedding and extraction method that embeds the watermark by using the subjects and predicates of the sentences in a text to encode the watermark information. The method makes no change to the format or content of the text and has no effect on the original; the embedding leaves no trace in the text and cannot be perceived, so concealment is good. Format conversion (including changes to line spacing, word spacing, character size, font, and color), rearranging paragraphs, and changing sentence order do not affect correct extraction of the watermark, so robustness is good. The algorithm is also simple in construction and easy to implement.
Embodiment
The scheme of the invention is described in detail below.
The text watermark embedding method based on subject-predicate coding provided by this embodiment comprises the following steps: 1) Each character of the watermark information is represented in Unicode using the UTF-16 format, four hexadecimal digits per character, forming a hexadecimal Unicode code string. 2) The subject-predicate pairs in the text to be watermarked are detected and stored in a set. 3) The Unicode code string is divided into segments according to the number of pairs detected, each pair represents one segment, and the segment assigned to each pair is given a number so that the code string can be spliced by number when the watermark is extracted. 4) Each subject-predicate pair, its Unicode code segment, and the segment number are stored in sequence to form a codebook, which completes the coding and embeds the watermark. In the codebook, the pair, the segment, and the number in each line are separated by spaces.
Further, detecting the subject-predicate pairs in the text to be watermarked in step 2) comprises the following steps: A) The text to be watermarked is converted into a character string. B) The character string is submitted to the Language Technology Platform (LTP) for dependency parsing, which returns an XML-format string containing the dependency relations between the sentence constituents of the text. C) The XML-format string is converted into an XML document and parsed with DOM; using the link between the head relation and the subject-verb relation in the relation attribute of the sentence constituents, the document is traversed in a loop to find the subject and predicate of each sentence.
The Language Technology Platform (LTP) mentioned above is a complete, openly available online Chinese natural language processing system developed over ten years by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology. It provides six Chinese processing functions in three areas: lexical analysis (word segmentation, part-of-speech tagging, and named entity recognition), syntactic analysis (dependency parsing), and semantic analysis (word sense disambiguation and semantic role labeling). The platform is open to the public and easy to use. It provides an application programming interface (API): the user sets the API parameters according to the application's needs, constructs an HTTP request, submits the text content to the system, and obtains the analysis result online. The method here mainly uses LTP's dependency parsing function: the text to be analyzed is submitted to the platform, which returns the dependency relations between the constituents of each sentence. The subject and predicate of each sentence are then obtained from these relations by further processing. The basic procedure is as follows:
The text content to be analyzed is converted into a character string, the API parameters and calling method are set, and the string is submitted to LTP for dependency parsing. The result returned by LTP is an XML-format file containing the dependency relations between the sentence constituents of the text. The file contains para (paragraph), sent (sentence), and word (segmented word) nodes, among others. Each word node has the following attributes: id, the sequence number of the word within the sentence; cont, the content of the word; parent, the id of the word's parent node in the dependency parse; and relate, the corresponding dependency relation. To find all subject-predicate pairs in the text, every paragraph, sentence, and word is traversed in a loop. When the traversal reaches a word node whose attribute relate="HED" (HED denotes the head relation), the cont value of that node is the predicate of the sentence. The sentence is then searched for a word node whose relate attribute is SBV (SBV denotes the subject-verb relation) and whose parent attribute equals the id of the predicate node. If such a node exists, its cont value is the subject of the sentence. The subject and predicate are extracted and stored in a set.
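The traversal just described can be sketched with LINQ to XML. This is only an illustration under stated assumptions: the XML fragment is hand-written to follow the node and attribute names given above (para, sent, word; id, cont, parent, relate) and is not real LTP output, and the helper name FindSubjectPredicates is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns "subject + predicate" strings, one per SBV/HED match.
static List<string> FindSubjectPredicates(string xml)
{
    var pairs = new List<string>();
    foreach (XElement sent in XDocument.Parse(xml).Descendants("sent"))
    {
        List<XElement> words = sent.Elements("word").ToList();
        // The predicate is the word whose relate attribute is HED (head of the sentence).
        XElement head = words.FirstOrDefault(x => (string)x.Attribute("relate") == "HED");
        if (head == null) continue;
        string headId = (string)head.Attribute("id");
        // A subject is a word in SBV relation whose parent is the predicate.
        foreach (XElement w in words)
        {
            if ((string)w.Attribute("relate") == "SBV" && (string)w.Attribute("parent") == headId)
            {
                string pair = (string)w.Attribute("cont") + (string)head.Attribute("cont");
                if (!pairs.Contains(pair)) pairs.Add(pair);   // keep each pair once
            }
        }
    }
    return pairs;
}

// Hand-written sample in the shape described above (not actual LTP output).
string sample =
    "<xml4nlp><doc><para id=\"0\"><sent id=\"0\">" +
    "<word id=\"0\" cont=\"我们\" parent=\"1\" relate=\"SBV\" />" +
    "<word id=\"1\" cont=\"保护\" parent=\"-1\" relate=\"HED\" />" +
    "<word id=\"2\" cont=\"版权\" parent=\"1\" relate=\"VOB\" />" +
    "</sent></para></doc></xml4nlp>";

foreach (string p in FindSubjectPredicates(sample))
    Console.WriteLine(p);   // 我们保护
```

Each pair is kept only once because the codebook is later searched by pair, so a repeated pair could not be told apart at extraction time.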
The concrete code that implements the watermark embedding process is given below, with comments:
As required by the LTP API, the text to be analyzed must be converted into a character string. This can be done with the C# function System.IO.File.ReadAllText(string path, Encoding encoding); the corresponding code is:
string text = System.IO.File.ReadAllText(path, Encoding.Default);
Here path is the path of the text to be analyzed and text is the string containing the text content.
The API parameters are then set: the address urlbase of the LTP web service, the key api_key required to use the API (obtained when the user registers), the analysis mode pattern (dp, dependency parsing), the result format format (XML), and the HTTP request method (GET). The string containing the text content (text) is submitted to the LTP platform for dependency parsing. The core code for this step is as follows:
string urlbase = "http://api.ltp-cloud.com/analysis/";
string api_key = "k2r3q7tqGgWp5zBZRSnEHvNKfTRSFhjMtnHQ0QeP";
string pattern = "dp";
string format = "xml";
// "?" starts the query string that follows the service address
string strParam = ("?api_key=" + api_key + "&text=" + text.ToString()
    + "&pattern=" + pattern + "&format=" + format);
Encoding encoding = Encoding.GetEncoding("utf-8");
HttpWebRequest req =
    WebRequest.Create(urlbase + strParam) as HttpWebRequest;
req.Method = "GET";
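Because the text is sent inside a GET query string, characters such as '&', '#', or '+' in the text would change how the request is parsed. A minimal sketch of building the URL with each value percent-encoded is shown below; the helper name BuildLtpUrl and the placeholder key are hypothetical, and the parameter names are the ones used in the listing above.

```csharp
using System;

// Hypothetical helper: builds the GET URL from the parameters named above.
static string BuildLtpUrl(string urlBase, string apiKey, string text, string pattern, string format)
{
    // Uri.EscapeDataString percent-encodes the UTF-8 bytes of each value,
    // so Chinese characters, spaces and '&' survive inside the query string.
    return urlBase
        + "?api_key=" + Uri.EscapeDataString(apiKey)
        + "&text=" + Uri.EscapeDataString(text)
        + "&pattern=" + Uri.EscapeDataString(pattern)
        + "&format=" + Uri.EscapeDataString(format);
}

string url = BuildLtpUrl("http://api.ltp-cloud.com/analysis/", "YOUR_API_KEY", "a b&c", "dp", "xml");
Console.WriteLine(url);
// http://api.ltp-cloud.com/analysis/?api_key=YOUR_API_KEY&text=a%20b%26c&pattern=dp&format=xml
```

GET URLs are also limited in length by many servers, so a long text may need to be submitted in several requests; whether the service accepts other request methods is not stated here.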
The result returned by LTP is an XML-format string containing the dependency relations between the sentence constituents of the text. It can be read with the C# StreamReader class; the corresponding code is:
HttpWebResponse webResponse = req.GetResponse() as HttpWebResponse;
StreamReader streamReader =
    new StreamReader(webResponse.GetResponseStream(), encoding);
string result = streamReader.ReadToEnd();
Here result holds the parsing result.
The XML-format string is converted into an XML document and parsed with DOM. Using the link between the HED (head relation) and SBV (subject-verb relation) values of the relate attribute in the dependency parsing result, every paragraph, sentence, and word is traversed in a loop to find the subject and predicate of each sentence. The core code for this step is as follows:
// hs is not declared in the original listing; a HashSet<string> is assumed here
HashSet<string> hs = new HashSet<string>();
XmlDocument doc = new XmlDocument();              // convert the string to an XML document
doc.LoadXml(result);
XmlElement root = doc.DocumentElement;            // traverse in a loop
XmlNodeList list1, list2, list3;
XmlNode list4;
list1 = root.SelectNodes("//para");
foreach (XmlNode node1 in list1) {                // traverse the para nodes
    list2 = node1.ChildNodes;
    foreach (XmlNode node2 in list2) {            // traverse the sent nodes
        list3 = node2.ChildNodes;
        foreach (XmlNode node3 in list3) {        // traverse the word nodes
            if (node3.Attributes["relate"].InnerText == "HED") {   // this word is the predicate
                list4 = node3;
                foreach (XmlNode node4 in list3) {
                    // the subject is in SBV relation and its parent is the predicate
                    if (node4.Attributes["parent"].InnerText == list4.Attributes["id"].InnerText
                        && node4.Attributes["relate"].InnerText == "SBV")
                        hs.Add(node4.Attributes["cont"].InnerText + list4.Attributes["cont"].InnerText);
                }
            }
        }
    }
}
List<string> sbv = new List<string>();
sbv.AddRange(hs);
The set sbv holds the subject-predicate pairs of the text.
The watermark information to be embedded is encoded as UTF-16 and converted into a Unicode code string. The corresponding code is:
byte[] bts = Encoding.Unicode.GetBytes(info);
for (int i = 0; i < bts.Length; i += 2)
    // Encoding.Unicode is little-endian, so the high byte (i + 1) is written first
    uc += bts[i + 1].ToString("x").PadLeft(2, '0') + bts[i].ToString("x").PadLeft(2, '0');
Here info holds the watermark information and uc holds the generated Unicode code string.
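The same conversion can be written per UTF-16 code unit instead of per byte. The following is a minimal sketch, not the patent's listing; the helper name ToCodeString is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: each UTF-16 code unit becomes four lowercase hex digits.
static string ToCodeString(string info)
{
    var sb = new StringBuilder(info.Length * 4);
    foreach (char c in info)
        sb.Append(((int)c).ToString("x4"));   // "x4" pads to four digits
    return sb.ToString();
}

Console.WriteLine(ToCodeString("水印"));   // 6c345370 (U+6C34, U+5370)
```

A character outside the Basic Multilingual Plane is stored in UTF-16 as two code units and therefore yields eight digits rather than four; the string is still recovered correctly as long as the digits are later decoded four at a time.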
The Unicode code string is then encoded with the subject-predicate pairs in the set. Each pair is taken from the set in turn, assigned one segment of the Unicode code string, and given a number; the pair, the segment, and the number are separated by spaces and together form the codebook. The core code for this step is as follows.
Code that determines the number of Unicode code digits assigned to each subject-predicate pair:
int strU_size = uc.Length;                 // number of digits in the Unicode code string
int sbv_size = sbv.Count;                  // number of subject-predicate pairs
int count_size = strU_size / sbv_size;     // number of digits assigned to each pair
Code that assigns a Unicode code segment to each subject-predicate pair:
for (int x = 0; x < sbv_size; x++) {
    if (x == sbv_size - 1)
        // the last pair takes all remaining digits (its length may differ)
        code_list.Add(sbv[x] + " " + uc.ToString().Substring(x * count_size) + " " + (x + 1));
    else
        // every earlier pair takes count_size digits starting at x * count_size
        code_list.Add(sbv[x] + " " + uc.ToString().Substring(x * count_size, count_size) + " " + (x + 1));
}
The set code_list holds the codebook content; writing it to a txt file yields the codebook file for the embedded watermark.
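As a worked example of the segmentation, a 16-digit code string shared among 3 pairs gives 16 / 3 = 5 digits per pair by integer division, so the segments have 5, 5, and 6 digits. The sketch below reproduces this; the helper name BuildCodebook and the sample pairs are hypothetical, and the pairs are assumed to contain no spaces because the space is the field separator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds codebook lines of the form "pair segment number".
static List<string> BuildCodebook(IList<string> pairs, string codeString)
{
    if (pairs.Count == 0) throw new ArgumentException("no subject-predicate pairs");
    int size = codeString.Length / pairs.Count;   // integer division
    var lines = new List<string>();
    for (int x = 0; x < pairs.Count; x++)
    {
        bool last = x == pairs.Count - 1;
        // Equal-sized segments; the last one also takes the remainder.
        string segment = last ? codeString.Substring(x * size)
                              : codeString.Substring(x * size, size);
        lines.Add(pairs[x] + " " + segment + " " + (x + 1));
    }
    return lines;
}

var demoPairs = new List<string> { "我们保护", "作者拥有", "水印证明" };
foreach (string line in BuildCodebook(demoPairs, "6c3453706c345370"))
    Console.WriteLine(line);
// 我们保护 6c345 1
// 作者拥有 3706c 2
// 水印证明 345370 3
```

If the text yields more pairs than the code string has digits, the integer division gives a segment size of zero and all digits fall to the last pair; a practical implementation would probably want to cap the number of pairs used in that case.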
Based on the embedding method above, a text watermark extraction method based on subject-predicate coding is proposed. Its embodiment is as follows:
When the watermark needs to be extracted, the text under examination is submitted to the LTP platform for dependency parsing, and the result is processed further to obtain the subject-predicate pairs of the text, which are stored in a set. The code for this step is the same as that used when embedding the watermark.
The codebook file formed when the watermark was embedded is opened, and each pair in the set is decoded against it: each pair in the set is compared in turn with each pair in the codebook and, where the two match, the corresponding Unicode code segment and its number are taken out. The segments obtained are spliced in numerical order to give the Unicode code string that represents the watermark information.
The code for the main operations of this process is given below, with comments:
Each line of the codebook is read into an array. The code is:
string[] lines = File.ReadAllLines(path);
Here path is the path of the codebook file and lines is the array holding the lines of the codebook.
Each line is split at the first space to obtain its subject-predicate pair, which is compared one by one with the pairs from the text under examination held in the set above; where they match, the Unicode code segment and number of that line are taken out and put into a set. The code is:
for (int i = 0; i < sbv.Count; i++) {
    for (int j = 0; j < lines.Length; j++) {
        // lgs[0] is the pair; lgs[1] is "segment number"
        string[] lgs = lines[j].ToString().Split(new Char[] { ' ' }, 2);
        if (sbv[i] == lgs[0])
            st.Add(lgs[1]);
    }
}
Here st is the set holding the Unicode code segments and their numbers.
Each Unicode code segment is separated from its number at the space, and the segments are spliced in numerical order to give the Unicode code string that represents the watermark information. The code is:
for (int x = 0; x < st.Count; x++) {
    for (int y = 0; y < st.Count; y++) {
        // lgs[0] is the segment; lgs[1] is its number
        string[] lgs = st[y].ToString().Split(new Char[] { ' ' }, 2);
        if (Convert.ToInt32(lgs[1]) == (x + 1))
            drawUc.Append(lgs[0]);
    }
}
Here drawUc holds the Unicode code string that represents the watermark information.
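The lookup and splicing can also be sketched in one pass. Note that this variant swaps the nested loops of the listings above for a dictionary and a sorted dictionary; it is an illustration of the same idea, not the patent's code, and the helper name RecoverCodeString and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: recovers the code string from detected pairs and codebook lines.
static string RecoverCodeString(IEnumerable<string> detectedPairs, IEnumerable<string> codebookLines)
{
    // Index the codebook by pair. Each line is "pair segment number".
    var book = new Dictionary<string, KeyValuePair<int, string>>();
    foreach (string line in codebookLines)
    {
        string[] f = line.Split(' ');
        if (f.Length != 3) continue;   // skip malformed lines
        book[f[0]] = new KeyValuePair<int, string>(int.Parse(f[2]), f[1]);
    }
    // Keep the segments whose pair was detected, ordered by segment number.
    var found = new SortedDictionary<int, string>();
    foreach (string pair in detectedPairs)
        if (book.TryGetValue(pair, out var entry))
            found[entry.Key] = entry.Value;
    return string.Concat(found.Values);
}

string[] codebook = { "我们保护 6c345 1", "作者拥有 3706c 2", "水印证明 345370 3" };
string[] detected = { "水印证明", "我们保护", "作者拥有" };   // sentence order changed
Console.WriteLine(RecoverCodeString(detected, codebook));   // 6c3453706c345370
```

Because the segments are ordered by their stored numbers, the order in which the pairs are detected has no effect on the result. If a sentence has been deleted, its segment is simply missing and the recovered string is shorter than the original.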
According to the UTF-16 coding rule used when the watermark was embedded, the Unicode code string is converted back into the corresponding characters, which gives the embedded watermark information. The core code for this step is as follows:
// str is the Unicode code string (assumed to be drawUc.ToString()); the verbatim
// string @"..." keeps the backslashes of the regular expression intact
MatchCollection mc = Regex.Matches(str, @"([\w]{2})([\w]{2})",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);
byte[] bts = new byte[2];
foreach (Match m in mc) {
    // Encoding.Unicode is little-endian: low byte first, high byte second
    bts[0] = (byte)int.Parse(m.Groups[2].Value, NumberStyles.HexNumber);
    bts[1] = (byte)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber);
    toStr += Encoding.Unicode.GetString(bts);
}
Here toStr holds the extracted watermark information.
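A minimal sketch of the same decoding without a regular expression is given below, reading four hexadecimal digits per UTF-16 code unit. The helper name FromCodeString is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: every four hex digits become one UTF-16 code unit.
static string FromCodeString(string code)
{
    if (code.Length % 4 != 0)
        throw new FormatException("code string length must be a multiple of 4");
    var sb = new StringBuilder(code.Length / 4);
    for (int i = 0; i < code.Length; i += 4)
        sb.Append((char)ushort.Parse(code.Substring(i, 4),
            NumberStyles.HexNumber, CultureInfo.InvariantCulture));
    return sb.ToString();
}

Console.WriteLine(FromCodeString("6c345370"));   // 水印
```

The length check is a design choice of this sketch: a code string whose length is not a multiple of four usually means that a segment was lost during extraction, and failing early makes that visible instead of returning a silently truncated watermark.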

Claims (6)

1. A text watermark embedding method based on subject-predicate coding, characterized by comprising:
1) representing each character of the watermark information in Unicode to form a Unicode code string;
2) detecting the subject and predicate of each sentence in the text to be watermarked and storing them in a set;
3) dividing the Unicode code string into segments according to the number of subject-predicate pairs detected, each pair representing one segment, and giving the Unicode code segment assigned to each pair a number, the numbers being used to splice the Unicode code string when the watermark is extracted;
4) storing, in sequence, each subject-predicate pair, its Unicode code segment, and the segment number to form a codebook, thereby completing the coding and embedding the watermark.
2. The text watermark embedding method based on subject-predicate coding according to claim 1, characterized in that the Unicode coding uses the UTF-16 format, each character being four hexadecimal digits, forming a hexadecimal Unicode code string.
3. The text watermark embedding method according to claim 1, characterized in that detecting the subject-predicate pairs in the text to be watermarked in step 2) comprises the following steps:
A) converting the text to be watermarked into a character string;
B) submitting the character string to the Language Technology Platform (LTP) for dependency parsing, and obtaining an XML-format string that contains the dependency relations between the sentence constituents of the text;
C) converting the XML-format string into an XML document, parsing it with DOM, and traversing it in a loop to find the subject and predicate of each sentence, using the link between the head relation and the subject-verb relation in the relation attribute of the sentence constituents.
4. The text watermark embedding method based on subject-predicate coding according to claim 1, characterized in that the subject-predicate pair, the Unicode code segment, and the number in each line of the codebook are separated by spaces.
5. A text watermark extraction method based on subject-predicate coding, proposed on the basis of the text watermark embedding method according to any one of claims 1-4, characterized by comprising: finding the subject-predicate pairs in the text under examination; comparing them against the codebook formed when the watermark was embedded; taking from the codebook the Unicode code segment and number corresponding to each pair; splicing the segments in numerical order to obtain the Unicode code string that represents the watermark information; and converting that string back into characters to recover the embedded watermark information.
6. The text watermark extraction method based on subject-predicate coding according to claim 5, characterized in that taking out the Unicode code segment and number corresponding to each subject-predicate pair in the text under examination comprises: comparing each pair found in the text with each pair in the codebook one by one and, where the two match, taking from the codebook the Unicode code segment and number corresponding to that pair.
CN201510743382.1A 2015-11-05 2015-11-05 Text watermark embedding and extraction method based on subject-predicate coding Active CN105404614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510743382.1A CN105404614B (en) 2015-11-05 2015-11-05 Text watermark embedding and extraction method based on subject-predicate coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510743382.1A CN105404614B (en) 2015-11-05 2015-11-05 Text watermark embedding and extraction method based on subject-predicate coding

Publications (2)

Publication Number Publication Date
CN105404614A (en) 2016-03-16
CN105404614B (en) 2018-05-25

Family

ID=55470109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510743382.1A Active CN105404614B (en) 2015-11-05 2015-11-05 Text watermark embedding and extraction method based on subject-predicate coding

Country Status (1)

Country Link
CN (1) CN105404614B (en)


* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101639826A (en) * 2009-09-01 2010-02-03 西北大学 Text hidden method based on Chinese sentence pattern template transformation
CN102184243A (en) * 2011-05-17 2011-09-14 沈阳化工大学 Text-type attribute-based relational database watermark embedding method
US20140016814A1 (en) * 2012-07-13 2014-01-16 International Business Machines Corporation Hierarchical and index based watermarks represented as trees

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZUNERA JALIL ET AL: "A Review of Digital Watermarking Techniques for Text Documents", 《2009 INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY》 *
斯琴 等: "基于文本特征的文本水印算法", 《计算机应用》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491423A (en) * 2016-06-12 2017-12-19 北京云量数盟科技有限公司 A kind of Chinese document gene based on numeric character string hybrid coding quantifies and characterizing method
CN107491423B (en) * 2016-06-12 2021-03-30 北京云量数盟科技有限公司 Chinese document gene quantization and characterization method based on numerical value-character string mixed coding
CN108363910A (en) * 2018-01-23 2018-08-03 南通大学 A kind of insertion of the webpage watermark based on HTML code and extracting method
CN108363910B (en) * 2018-01-23 2020-01-10 南通大学 Webpage watermark embedding and extracting method based on HTML (Hypertext markup language) code

Also Published As

Publication number Publication date
CN105404614B (en) 2018-05-25

Similar Documents

Publication Publication Date Title
Chang et al. Practical linguistic steganography using contextual synonym substitution and a novel vertex coding method
CN105205355B (en) A kind of Text Watermarking insertion and extracting method based on the mapping of semantic role position
CN103761459B (en) A kind of document multiple digital watermarking embedding, extracting method and device
EP1515240A2 (en) Chinese word segmentation
US8750630B2 (en) Hierarchical and index based watermarks represented as trees
CN108459874A (en) Code automatic summarization method integrating deep learning and natural language processing
CN103778200A (en) Method for extracting information source of message and system thereof
CN102096787A (en) Method and device for hiding information based on word2007 text segmentation
CN106095735A (en) A kind of method plagiarized based on deep neural network detection academic documents
CN103294959A (en) Text information hiding method resistant to statistic analysis
CN103544408A (en) Method for embedment and extraction of PDF document hidden information according to composite font
CN107871002A (en) A kind of across language plagiarism detection method based on fingerprint fusion
Chuang et al. Context-aware wrapping: Synchronized data extraction
Chen et al. Text watermarking algorithm based on semantic role labeling
CN105404614A (en) Subject and predicate coding based text watermark embedding and extraction method
CN106407288B (en) Method and system for synchronously updating information
CN102682248B (en) Watermark embedding and extracting method for ultrashort Chinese text
CN107169011A (en) The original recognition methods of webpage based on artificial intelligence, device and storage medium
CN104331400A (en) Mongolian code conversion method and device
CN101901325A (en) Copyright protection method
WO2020139563A1 (en) Information processing method, hidden information parsing and embedding method, apparatus, and device
Ji et al. Coverless information hiding method based on the keyword
CN108255866B (en) Method and device for checking links in website
Fu et al. Text split‐based steganography in OOXML format documents for covert communication
Blessing et al. Crosslingual distant supervision for extracting relations of different complexity

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant