CN110472384A - Big data watermarking method and device based on artificial intelligence - Google Patents

Big data watermarking method and device based on artificial intelligence

Info

Publication number
CN110472384A
Authority
CN
China
Prior art keywords
watermark
big data
module
artificial intelligence
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910746344.XA
Other languages
Chinese (zh)
Inventor
邓高见
张慧
李萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Tianyu (suzhou) Technology Co Ltd
Original Assignee
Zhongke Tianyu (suzhou) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Tianyu (suzhou) Technology Co Ltd
Priority to CN201910746344.XA
Publication of CN110472384A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents
    • G06Q50/184 Intellectual property management

Abstract

The present invention relates to a big data watermarking method and device based on artificial intelligence. The main steps are: a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores the natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Targeting the unformatted nature of big data text content, the invention uses artificial intelligence algorithms to embed a watermark encoded in natural language elements. It does not depend on any other information channel, can resist attacks such as editing, copying, cutting and merging of big data text, is highly robust, and can effectively protect the intellectual property of big data.

Description

Big data watermarking method and device based on artificial intelligence
Technical field
The present invention relates to a big data watermarking method and device, in particular to a big data watermarking method and device based on artificial intelligence, and belongs to the field of information security.
Background art
With the development of big data technology, the security of big data content is becoming more and more important, and privacy protection, intellectual property and leak tracing are top priorities. Digital watermarking is an important method of protecting data assets. It usually embeds a digital signal in a digital product; the signal can be an image, text, symbol, number or any other information usable for identification and marking. Its purposes include copyright protection, proof of ownership, fingerprinting (tracking multiple released copies) and integrity protection.
Traditional digital watermarking embeds identification information (the digital watermark) directly into a digital carrier (multimedia, documents, software, etc.) or represents it indirectly (by modifying the structure of a specific region), without affecting the use value of the original carrier and without being easily detected or modified. The producer, however, can identify and recognize it. Information hidden in the carrier can be used to confirm the content creator or buyer, to transmit secret information, or to determine whether the carrier has been tampered with. Digital watermarking is an effective way to protect information security and to achieve anti-counterfeiting traceability and copyright protection, and it is an important branch and research direction of information hiding technology.
However, big data differs from traditional information systems: much of the data is plain-text, unformatted information, and it must remain usable during large-scale sharing and computation. A watermark therefore cannot simply be hidden in the format or document attributes, as it can for images, audio/video or even documents such as PDF.
Big data scenarios therefore need new digital watermarking technology to ensure the security of information content. Artificial intelligence technology, of which natural language processing (NLP) is a characteristic example, can transform text at the level of linguistic knowledge. Without affecting overall recognition, reading and understanding of the content, it incorporates specific language elements (linguistic units such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges and negated antonyms, including characters, words, phrases and short sentences) to embed watermark information that protects the big data content. A digital watermark implemented with artificial intelligence methods can resist attacks and damage such as editing and processing of the text content under the massive-information conditions of big data.
Summary of the invention
In view of this, the invention discloses a big data watermarking method and device based on artificial intelligence. The main steps are: a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores the natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Targeting the unformatted nature of big data text content, the invention uses artificial intelligence algorithms to embed a watermark encoded in natural language elements. It does not depend on any other information channel, can resist attacks such as editing, copying, cutting and merging of big data text, is highly robust, and can effectively protect the intellectual property of big data.
The technical solution of the invention is as follows: a big data watermarking method based on artificial intelligence, the steps of which include:
1) a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type;
2) an artificial intelligence watermark library module stores the natural language libraries of coding elements by category;
3) a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library;
4) a watermark embedding module embeds the natural language coding elements into the original big data text content;
5) a watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
6) a watermark decoding module converts the natural language coding elements into the watermark information.
Further, the content perception analysis module identifies and analyzes the big data content to obtain the linguistic properties of the data content, including type features such as Chinese, English, classical Chinese, modern-era text, astronomy, geography, law, official documents and prose.
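As a minimal sketch of what content perception analysis might look like, the following Python function tells Chinese from English text by counting character classes. The function name `detect_language_type` and the counting heuristic are assumptions made for illustration; the patent does not specify how the module is implemented, and finer categories such as law or prose would need a real classifier.

```python
def detect_language_type(text: str) -> str:
    """Return a coarse content type: 'chinese', 'english' or 'unknown'.

    Hypothetical heuristic: compare the number of CJK ideographs with the
    number of ASCII letters. This is an illustration, not the patent's method.
    """
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if cjk == 0 and latin == 0:
        return "unknown"
    return "chinese" if cjk >= latin else "english"
```

The returned label would then select which natural language library is used for encoding and decoding.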
Further, the artificial intelligence watermark library module builds separate sets of natural language coding elements for each language type. These are linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges and negated antonyms, and include characters, words, phrases and short sentences.
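One possible data structure for such a library, sketched in Python, keeps a list of interchangeable word pairs per content type and lets the first form of a pair stand for bit 0 and the second for bit 1. The name `WATERMARK_LIBRARY`, the pair-per-bit convention and the sample entries are all assumptions for illustration; the patent only says that coding elements are stored by category.

```python
from typing import Dict, List, Tuple

# Hypothetical library: content type -> list of (bit-0 form, bit-1 form) pairs.
WATERMARK_LIBRARY: Dict[str, List[Tuple[str, str]]] = {
    "english": [("big", "large"), ("begin", "start"), ("show", "display")],
    "chinese": [("方法", "办法"), ("快速", "迅速")],
}


def coding_elements(content_type: str) -> Dict[str, int]:
    """Map every word form of the chosen category to the bit it encodes."""
    table: Dict[str, int] = {}
    for zero_form, one_form in WATERMARK_LIBRARY.get(content_type, []):
        table[zero_form] = 0
        table[one_form] = 1
    return table
```

A content type without a library yields an empty table, which a caller could treat as "no watermark capacity".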
Further, the watermark encoding module converts the equivalent binary code of the watermark information into natural language codes from the artificial intelligence watermark library. To increase security, the binary code of the watermark can first be encrypted with an encryption algorithm.
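The binary side of the encoding step can be sketched as follows. The patent does not name an encryption algorithm, so the SHA-256 based XOR keystream below is only a stand-in chosen to keep the example self-contained; the function names `keystream` and `watermark_to_bits` are likewise hypothetical.

```python
import hashlib
from typing import List, Optional


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (illustrative stand-in cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def watermark_to_bits(watermark: str, key: Optional[bytes] = None) -> List[int]:
    """Turn the watermark into a bit list, XOR-encrypting it first if a key is given."""
    data = watermark.encode("utf-8")
    if key is not None:
        data = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    # Most significant bit first, eight bits per byte.
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
```

Each resulting bit would then be mapped to a coding element from the library, for example by choosing one form of a synonym pair.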
Further, the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.
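Substitution-based embedding can be illustrated with a short Python sketch. It assumes a hypothetical list of synonym pairs (`PAIRS`) in which the first form encodes 0 and the second encodes 1, and it rewrites each library word found in the text so that it carries the next watermark bit. This is one plausible reading of "substitutes", not the patent's confirmed algorithm.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical coding elements: (bit-0 form, bit-1 form).
PAIRS: List[Tuple[str, str]] = [("big", "large"), ("begin", "start"), ("show", "display")]


def embed(text: str, bits: List[int]) -> str:
    """Rewrite each library word in `text` so that it carries the next bit."""
    forms: Dict[str, Tuple[str, str]] = {w: pair for pair in PAIRS for w in pair}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, forms)) + r")\b")
    remaining = iter(bits)

    def substitute(match):
        bit = next(remaining, None)
        if bit is None:
            # All bits are embedded: leave later words unchanged.
            return match.group(0)
        return forms[match.group(0)][bit]

    return pattern.sub(substitute, text)
```

For example, `embed("we begin with big data and show it", [1, 0, 1])` returns `"we start with big data and display it"`. If the text contains fewer library words than there are bits, this sketch silently embeds only a prefix of the watermark; a real implementation would need to check capacity.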
Further, the watermark extraction module and the watermark decoding module perform the inverse of watermark embedding: they extract the natural language coding elements from the watermarked big data text content and convert them back into the original watermark information.
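The inverse process can be sketched in the same spirit. Under the assumed convention that the first form of each hypothetical pair in `PAIRS` encodes 0 and the second encodes 1, extraction scans the text for library words and decoding packs the recovered bits back into bytes. Names and conventions are illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical coding elements: (bit-0 form, bit-1 form).
PAIRS: List[Tuple[str, str]] = [("big", "large"), ("begin", "start"), ("show", "display")]


def extract_bits(text: str) -> List[int]:
    """Read one bit from every library word that occurs in `text`, in order."""
    bit_of: Dict[str, int] = {}
    for zero_form, one_form in PAIRS:
        bit_of[zero_form] = 0
        bit_of[one_form] = 1
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, bit_of)) + r")\b")
    return [bit_of[m.group(0)] for m in pattern.finditer(text)]


def decode(bits: List[int]) -> str:
    """Pack bits (most significant first) into bytes and decode them as UTF-8."""
    usable = len(bits) - len(bits) % 8  # ignore an incomplete trailing byte
    data = bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2) for i in range(0, usable, 8)
    )
    return data.decode("utf-8", errors="replace")
```

Because every occurrence of a library word yields a bit, the extractor needs no side channel beyond the shared library, which matches the claim that the method does not depend on other information channels.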
The invention also proposes a big data watermarking device based on artificial intelligence, comprising a content perception analysis module, an artificial intelligence watermark library module, a watermark encoding module, a watermark embedding module, a watermark extraction module and a watermark decoding module, wherein:
The content perception analysis module identifies and analyzes the big data content to obtain the linguistic properties of the data content, including type features such as Chinese, English, classical Chinese, modern-era text, astronomy, geography, law, official documents and prose;
The artificial intelligence watermark library module builds separate sets of natural language coding elements for each language type; these are linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges and negated antonyms, and include characters, words, phrases and short sentences;
The watermark encoding module converts the equivalent binary code of the watermark information into natural language codes from the artificial intelligence watermark library;
The watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content;
The watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
The watermark decoding module converts the natural language coding elements into the watermark information; if the original watermark binary was encrypted, decoding ends with decryption of the binary.
The beneficial effects of the invention are as follows:
The invention provides a big data watermarking method and device based on artificial intelligence. A content perception analysis module performs language understanding analysis on the big data content to obtain the data content type; an artificial intelligence watermark library module stores the natural language libraries of coding elements by category; a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library; a watermark embedding module embeds the natural language coding elements into the original big data text content; a watermark extraction module extracts the natural language coding elements from the watermarked big data text content; and a watermark decoding module converts the natural language coding elements back into the watermark information. Targeting the unformatted nature of big data text content, the invention uses artificial intelligence algorithms to embed a watermark encoded in natural language elements. It does not depend on any other information channel, can resist attacks such as editing, copying, cutting and merging of big data text, is highly robust, and can effectively protect the intellectual property of big data.
Brief description of the drawings
Fig. 1 is a flow chart of big data watermark embedding based on artificial intelligence according to the invention.
Fig. 2 is a flow chart of big data watermark extraction based on artificial intelligence according to the invention.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and embodiments.
The big data watermarking device based on artificial intelligence disclosed in one embodiment of the invention operates in the following steps:
1) a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type;
2) an artificial intelligence watermark library module stores the natural language libraries of coding elements by category;
3) a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library;
4) a watermark embedding module embeds the natural language coding elements into the original big data text content;
5) a watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
6) a watermark decoding module converts the natural language coding elements into the watermark information.
The big data watermarking method and device based on artificial intelligence are further explained below through the specific examples shown in the drawings.
As shown in Fig. 1, big data watermark embedding based on artificial intelligence comprises the following main steps:
1. The content perception analysis module identifies and analyzes the big data content to obtain the linguistic properties of the data content, including type features such as Chinese, English, classical Chinese, modern-era text, astronomy, geography, law, official documents and prose;
2. The artificial intelligence watermark library module builds separate sets of natural language coding elements for each language type; these are linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges and negated antonyms, and include characters, words, phrases and short sentences;
3. The watermark encoding module converts the equivalent binary code of the watermark information into natural language codes from the artificial intelligence watermark library; to increase security, the binary code of the watermark can first be encrypted with an encryption algorithm;
4. The watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.
As shown in Fig. 2, big data watermark extraction based on artificial intelligence comprises the following steps:
1. The content perception analysis module identifies and analyzes the big data content to obtain the linguistic properties of the data content, including type features such as Chinese, English, classical Chinese, modern-era text, astronomy, geography, law, official documents and prose;
2. The watermark extraction module, using the natural language library in the artificial intelligence watermark library module, extracts the natural language coding elements from the watermarked big data text content;
3. The watermark decoding module converts the natural language coding elements into the watermark information; if the original watermark binary was encrypted, decoding ends with decryption of the binary.
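The two flows of Fig. 1 and Fig. 2 can be put together in one hedged end-to-end sketch. Everything named here (`LIBRARY`, `analyse`, `embed_flow`, `extract_flow`, the repeating-key XOR) is a hypothetical illustration: the content analysis is reduced to an ASCII check, and the XOR is a toy stand-in for whatever encryption algorithm an implementation would choose.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical library: content type -> (bit-0 form, bit-1 form) pairs.
LIBRARY: Dict[str, List[Tuple[str, str]]] = {
    "english": [("big", "large"), ("begin", "start"), ("show", "display"), ("fast", "quick")],
}


def analyse(text: str) -> str:
    """Step 1 of both flows: choose the library category (only 'english' is defined)."""
    return "english" if text.isascii() else "other"


def _xor(data: bytes, key: bytes) -> bytes:
    # Toy repeating-key XOR, NOT secure; stands in for a real cipher.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def _pattern(pairs: List[Tuple[str, str]]):
    words = [w for pair in pairs for w in pair]
    return re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")


def embed_flow(text: str, watermark: bytes, key: Optional[bytes] = None) -> str:
    """Fig. 1: analyse, encode (optionally encrypt), then substitute into the text."""
    pairs = LIBRARY[analyse(text)]
    payload = _xor(watermark, key) if key else watermark
    bits = iter([(byte >> s) & 1 for byte in payload for s in range(7, -1, -1)])
    forms = {w: pair for pair in pairs for w in pair}

    def substitute(m):
        bit = next(bits, None)
        return m.group(0) if bit is None else forms[m.group(0)][bit]

    return _pattern(pairs).sub(substitute, text)


def extract_flow(text: str, n_bytes: int, key: Optional[bytes] = None) -> bytes:
    """Fig. 2: analyse, extract coding elements, decode, and decrypt last."""
    pairs = LIBRARY[analyse(text)]
    bit_of = {w: i for pair in pairs for i, w in enumerate(pair)}
    bits = [bit_of[m.group(0)] for m in _pattern(pairs).finditer(text)][: n_bytes * 8]
    payload = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    return _xor(payload, key) if key else payload
```

Note that extraction repeats the content analysis so that it selects the same library as embedding, and that decryption is the final step, mirroring step 3 of the extraction flow. The sketch assumes the extractor knows the watermark length (`n_bytes`), which the patent does not discuss.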
The specific embodiments described above are intended to aid understanding and use of the invention and do not limit its scope of protection. Any modification, variation, equivalent replacement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the claims of the invention.

Claims (7)

1. A big data watermarking method based on artificial intelligence, the steps of which include:
1) a content perception analysis module performs language understanding analysis on the big data content to obtain the data content type;
2) an artificial intelligence watermark library module stores the natural language libraries of coding elements by category;
3) a watermark encoding module converts the watermark information into natural language coding elements from the artificial intelligence watermark library;
4) a watermark embedding module embeds the natural language coding elements into the original big data text content;
5) a watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
6) a watermark decoding module converts the natural language coding elements into the watermark information.
2. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the content perception analysis module identifies and analyzes the big data content to obtain the linguistic properties of the data content, including type features such as Chinese, English, classical Chinese, modern-era text, astronomy, geography, law, official documents and prose.
3. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the artificial intelligence watermark library module builds separate sets of natural language coding elements for each language type, these being linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges and negated antonyms, and including characters, words, phrases and short sentences.
4. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark encoding module converts the equivalent binary code of the watermark information into natural language codes from the artificial intelligence watermark library, and, to increase security, the binary code of the watermark can be encrypted with an encryption algorithm.
5. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content.
6. The big data watermarking method based on artificial intelligence according to claim 1, characterized in that the watermark extraction module and the watermark decoding module perform the inverse of watermark embedding, extracting the natural language coding elements from the watermarked big data text content and converting them into the original watermark information.
7. A big data watermarking device based on artificial intelligence, comprising a content perception analysis module, an artificial intelligence watermark library module, a watermark encoding module, a watermark embedding module, a watermark extraction module and a watermark decoding module, wherein:
The content perception analysis module identifies and analyzes the big data content to obtain the linguistic properties of the data content, including type features such as Chinese, English, classical Chinese, modern-era text, astronomy, geography, law, official documents and prose;
The artificial intelligence watermark library module builds separate sets of natural language coding elements for each language type, these being linguistic units within each language category, such as high-frequency synonyms, near-synonyms, visually similar forms, splits, merges and negated antonyms, and including characters, words, phrases and short sentences;
The watermark encoding module converts the equivalent binary code of the watermark information into natural language codes from the artificial intelligence watermark library;
The watermark embedding module substitutes the natural language coding elements of the watermark into the original big data text content;
The watermark extraction module extracts the natural language coding elements from the watermarked big data text content;
The watermark decoding module converts the natural language coding elements into the watermark information; if the original watermark binary was encrypted, decoding ends with decryption of the binary.
CN201910746344.XA 2019-08-13 2019-08-13 Big data watermarking method and device based on artificial intelligence Pending CN110472384A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910746344.XA CN110472384A (en) 2019-08-13 2019-08-13 Big data watermarking method and device based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910746344.XA CN110472384A (en) 2019-08-13 2019-08-13 Big data watermarking method and device based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN110472384A (en) 2019-11-19

Family

ID=68510597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910746344.XA Pending CN110472384A (en) 2019-08-13 2019-08-13 Big data watermarking method and device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN110472384A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324871A (en) * 2020-03-09 2020-06-23 河南大学 Big data watermarking method and device based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957810A (en) * 2009-07-16 2011-01-26 西安腾惟科技有限公司 Method and device for embedding and detecting watermark in document by using computer system
CN102194205A (en) * 2010-03-18 2011-09-21 湖南大学 Method and device for text recoverable watermark based on synonym replacement
CN102254126A (en) * 2011-07-29 2011-11-23 西安交通大学 Robust-based natural language Hash domain spread spectrum watermarking coding algorithm for
CN105205355A (en) * 2015-11-05 2015-12-30 南通大学 Embedding method and extracting method for text watermark based on semantic role position mapping
US20190130080A1 (en) * 2017-10-27 2019-05-02 Telefonica Digital Espana, S.L.U. Watermark embedding and extracting method for protecting documents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
何路 等: "自然语言水印鲁棒性分析与评估", 《计算机学报》 *

Similar Documents

Publication Publication Date Title
Kamaruddin et al. A review of text watermarking: theory, methods, and applications
Agarwal Text steganographic approaches: a comparison
Ahvanooey et al. ANiTW: A novel intelligent text watermarking technique for forensic identification of spurious information on social media
Jalil et al. A review of digital watermarking techniques for text documents
CN103049682B (en) Character pitch encoding-based dual-watermark embedded text watermarking method
US20210165860A1 (en) Watermark embedding and extracting method for protecting documents
CN101957810A (en) Method and device for embedding and detecting watermark in document by using computer system
Al-Wesabi Proposing high-smart approach for content authentication and tampering detection of arabic text transmitted via internet
Sun et al. Component-based digital watermarking of Chinese texts
Memon et al. EVALUATION OF STEGANOGRAPHY FOR URDU/ARABIC TEXT.
Domain A review and open issues of diverse text watermarking techniques in spatial domain
CN103544408A (en) Method for embedment and extraction of PDF document hidden information according to composite font
Al-Wesabi A smart English text zero-watermarking approach based on third-level order and word mechanism of Markov model
Al-Wesabi Entropy-Based Watermarking Approach for Sensitive Tamper Detection of Arabic Text.
Al-Wesabi et al. A Reliable NLP Scheme for English Text Watermarking Based on Contents Interrelationship.
CN110472384A (en) A kind of big data water mark method and device based on artificial intelligence
Ghilan et al. Combined Markov model and zero watermarking techniques to enhance content authentication of english text documents
CN114648435A (en) Method, device and equipment for detecting watermark in text and storage medium
Zhang et al. Chinese text watermarking based on occlusive components
Al-Wesabi et al. Proposing a High-Robust Approach for Detecting the Tampering Attacks on English Text Transmitted via Internet.
CN114078071A (en) Image tracing method, device and medium
KR20010008048A (en) Watermarking method for digital contents
Al-Wesabi Text Analysis-Based Watermarking Approach for Tampering Detection of English Text.
Zheng et al. General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling
Pathak A new approach for text steganography using Hindi numerical code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination