CN110321675B - Webpage watermark-based generation and tracing method and device - Google Patents
- Publication number
- CN110321675B (application number CN201810272166.7A)
- Authority
- CN
- China
- Prior art keywords
- html
- webpage
- watermark information
- preset number
- ordered
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Technology Law (AREA)
- Multimedia (AREA)
- Information Transfer Between Computers (AREA)
- Storage Device Security (AREA)
- Editing Of Facsimile Originals (AREA)
Abstract
The embodiments of the application disclose a method and a device for generating and tracing a webpage watermark. First, a tag attribute coding library is created that records the correspondence between a webpage and its ordered HTML tags embedded with watermark information. A published webpage picture carrying the watermark information is then acquired and matched against the stored complete webpages to obtain the webpage identifier of the complete webpage corresponding to the picture. Based on the webpage identifier and the stored tag attribute coding library, a first preset number of ordered HTML tags with embedded watermark information are obtained for the complete webpage; the watermark information in those tags is extracted and then decrypted to obtain the user information associated with the published webpage picture. Because the watermark information embedded in the HTML tags uniquely identifies an internal user, the internal user who leaked the content information of the website system can be traced.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for generating and tracing a webpage watermark.
Background
With the development and wide application of networks, digital products have become part of users' daily life, which raises problems such as copyright protection, integrity protection and privacy protection for those products. When an internal user (also called an internal manager) of a website system distributes webpage content that is visible only to internal users to an external network, internal information of the website system is leaked, which harms the reputation and development of the website system. Content protection for the internal website systems of enterprises and organizations is therefore an urgent problem.
At present, content protection of a website system relies mainly on access control and auditing technology, in which information about users accessing the website system is recorded in logs.
However, a webpage browsed by an internal user does not uniquely identify that user, so when internal webpage content of the website system is leaked, the user responsible for the leak cannot be traced. Conventional access control and auditing technology therefore cannot prevent an internal user from distributing internal content of the website system on an external network in the form of screenshots or webpage source code.
Disclosure of Invention
The embodiment of the application provides a source tracing method and device based on webpage watermarks, which are used for tracking internal users who reveal website system content information.
In a first aspect, a tracing method based on web page watermark is provided. The method can comprise the following steps: receiving a login request of an internal user for logging in a current webpage, wherein the login request comprises user information;
carrying out watermark encryption on user information to obtain watermark information of a user on a current webpage;
acquiring a first preset number of HTML (hypertext markup language) tags in the current webpage, wherein the HTML tags are ordered HTML tags in which watermark information is to be embedded, ordered according to HTML tag sequence numbers; and embedding the watermark information into the first preset number of ordered HTML tags, thereby generating, in the current webpage, a first preset number of ordered HTML tags with embedded watermark information. In this way the webpage displayed to the internal user carries the user information, so that the browsed webpage uniquely identifies the user.
In an optional implementation, obtaining a first preset number of HTML tags in the current webpage includes: acquiring attribute information of a plurality of original HTML tags in the current webpage; adjusting, in turn, a second preset number of pieces of attribute information of each of the original HTML tags; and selecting a first preset number of adjusted HTML tags that meet a preset display condition as the ordered HTML tags in which watermark information is to be embedded, where the preset display condition is that the first preset number of adjusted HTML tags do not affect one another's display when the current webpage is displayed.
In an optional implementation, after the attribute information of the original HTML tags of the current webpage is obtained, the method further includes: selecting the original HTML tags to be adjusted according to the attribute values of the attribute information of the original HTML tags, where the original HTML tags to be adjusted are the original HTML tags that remain after the original HTML tag with the maximum attribute value is removed. Because an adjustment applied to the HTML tag with the maximum attribute value is difficult to identify, no watermark information is embedded in that tag. The variation between an adjusted HTML tag and its original HTML tag can then be identified quickly, which improves tracing accuracy.
In an optional implementation, the watermark information is a multi-bit binary bit string, and embedding the watermark information into the ordered HTML tags in which watermark information is to be embedded includes: dividing the binary bit string into a first preset number of groups, each containing a second preset number of bits; and embedding one group of the second preset number of bits into each ordered HTML tag in which watermark information is to be embedded. This is how the watermark information is embedded into the ordered HTML tags.
In an alternative implementation, after generating a first preset number of ordered HTML tags embedding watermark information, the method further includes: and creating a tag attribute coding library of the current webpage and a first preset number of ordered HTML tags embedded with the watermark information.
In an alternative implementation, the user information includes a user identification, a unit identification, a user address, and a current timestamp.
In a second aspect, a tracing method based on web page watermark is provided. The method can comprise the following steps: acquiring a published webpage picture carrying watermark information, wherein the watermark information is encrypted user information; matching the webpage picture with the stored complete webpage to obtain a webpage identifier of the complete webpage corresponding to the webpage picture; acquiring a first preset number of ordered HTML (hypertext markup language) tags with embedded watermark information of a complete webpage based on the webpage identification and a stored tag attribute coding library, wherein the ordered HTML tags are ordered according to HTML tag sequence numbers; extracting watermark information in HTML labels with a first preset number of embedded watermark information; and the tag attribute coding library is used for storing a first preset number of ordered HTML tags embedded with the watermark information of the current webpage. The method can ensure that the webpage displayed to the internal user can uniquely identify the user, so that the internal user who reveals the information can be found after webpage content is revealed in the form of pictures.
In an alternative implementation, the user information includes a user identification, a unit identification, a user address, and a current timestamp.
In a third aspect, a generating apparatus is provided, and the apparatus may include:
the receiving unit is used for receiving a login request of an internal user for logging in the current webpage, wherein the login request comprises user information;
the encryption unit is used for carrying out watermark encryption on the user information to obtain the watermark information of the user on the current webpage;
the acquisition unit is used for acquiring a first preset number of HTML (hypertext markup language) tags in the current webpage, wherein the HTML tags are ordered HTML tags in which watermark information is to be embedded, ordered according to HTML tag sequence numbers;
and the embedding unit is used for embedding the watermark information into the ordered HTML tags of the first preset number of the watermark information to be embedded and generating the ordered HTML tags of the first preset number of the embedded watermark information.
In an optional implementation, the obtaining unit is specifically configured to obtain attribute information of a plurality of original HTML tags in a current web page;
sequentially adjusting a second preset number of attribute information of each original HTML label in a plurality of original HTML labels;
selecting a first preset number of adjusted HTML labels meeting preset display conditions as ordered HTML labels to be embedded with watermark information, wherein the preset display conditions are that when the current webpage is displayed, the display effect is not influenced among the first preset number of adjusted HTML labels.
In an alternative implementation, the apparatus further comprises a selecting unit;
and the selecting unit is used for selecting the original HTML label to be adjusted according to the attribute value of the attribute information of the original HTML label, wherein the original HTML label to be adjusted is the other original HTML label after the original HTML label with the maximum attribute value is deleted.
In an alternative implementation, the watermark information is a multi-bit binary bit string;
the embedding unit is specifically used for dividing the binary bit strings into a first preset number of groups, and each group has a second preset number of binary bit numbers;
and embedding a second preset number of binary bit numbers into each ordered HTML label to be embedded with watermark information.
In an alternative implementation, the apparatus further comprises a creating unit;
and the creating unit is used for creating a tag attribute coding library of the current webpage and a first preset number of ordered HTML tags embedded with the watermark information.
In an alternative implementation, the user information includes a user identification, a unit identification, a user address, and a current timestamp.
In a fourth aspect, a tracing apparatus is provided, which may include:
and the acquisition unit is used for acquiring the published webpage picture carrying the watermark information, and the watermark information is encrypted user information.
And the matching unit is used for matching the webpage picture with the stored complete webpage to obtain the webpage identification of the complete webpage corresponding to the webpage picture.
The acquiring unit is further used for acquiring ordered HTML tags of a first preset number of embedded watermark information of the complete webpage based on the webpage identification and the stored tag attribute coding library, the ordered HTML tags are ordered according to the HTML tag sequence numbers, and the tag attribute coding library is used for storing the ordered HTML tags of the first preset number of embedded watermark information of the current webpage.
And the extracting unit is used for extracting the watermark information in the HTML labels with the first preset number of embedded watermark information.
And the decryption unit is used for decrypting the watermark information to obtain the user information of the published webpage picture.
In an alternative implementation, the user information includes a user identification, a unit identification, a user address, and a current timestamp.
In a fifth aspect, an electronic device is provided, which may include a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps described in the first aspect when executing the program stored in the memory.
In a sixth aspect, another electronic device is provided, where the apparatus may include a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of the first or second aspect when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of any one of the first or second aspects described above.
In yet another aspect of the present invention, the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of any one of the first or second aspects.
Therefore, according to the technical scheme, the corresponding relation between the webpage and the ordered HTML tag embedded with the watermark information is generated firstly, and then the published webpage picture carrying the watermark information is obtained; matching the webpage picture with the stored complete webpage to obtain a webpage identifier of the complete webpage corresponding to the webpage picture; based on the webpage identification, a first preset number of ordered HTML (hypertext markup language) labels embedded with watermark information of the complete webpage are obtained, the watermark information in the ordered HTML labels is extracted, then the watermark information is decrypted, and user information of a published webpage picture is obtained, namely, an internal user is uniquely identified through the watermark information embedded in the HTML labels, so that the internal user who reveals the website system content information is tracked.
Drawings
Fig. 1A is a schematic flowchart of a tracing method based on web page watermarking according to an embodiment of the present invention;
fig. 1B is a schematic flowchart of a method for generating a web watermark according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another tracing method based on web page watermarking according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a generating apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a tracing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without any creative effort belong to the protection scope of the present application.
The tracing method based on the webpage watermark provided by the embodiment of the invention can be applied to a server. Unlike the access control and audit technology used by servers in the prior art, the method adjusts attribute information, such as the width, height and margin, of HyperText Markup Language (HTML) tags in the webpage, for example Division (DIV) tags, so that each DIV tag can take several distinct variants that differ in position and size. These variants form the DIV tags in which watermark information is to be embedded. This ensures that the webpage displayed to an internal user uniquely identifies that user, so that the internal user can be found after webpage content is leaked in the form of a picture. The picture may be obtained by screen capture on the user equipment. A DIV tag, also called a division tag, sets the placement of text, pictures, tables and the like on the current webpage.
It is understood that the HTML tag of the present application may be a format tag such as a directory listing DIR tag, a definition listing DL tag, an option listing MENU tag, or the like, in addition to the DIV tag listed above.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are merely for illustrating and explaining the present invention and are not intended to limit the present invention, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The following description will be made in detail by taking an example in which HTML tags are DIV tags.
Fig. 1A is a schematic flowchart of a tracing method based on a web watermark according to an embodiment of the present invention. As shown in fig. 1A, the execution subject of the method is a server, and the method may include:
Step 110, acquiring the published webpage picture carrying the watermark information.
Before executing this step, the server needs to generate a correspondence between the web page and the ordered DIV tag embedded with the watermark information, as shown in fig. 1B, the method may include:
Step 11, receiving a login request of an internal user for logging in the current webpage.
The login request includes user information, which may include an identification of the user, an identity of the unit, an IP address of the user, and a current timestamp.
Step 12, carrying out watermark encryption on the user information to obtain the watermark information of the user on the current webpage.
The server performs watermark encryption on the user information using a digital certificate applied for in advance, obtaining the watermark information of the user on the current webpage; this watermark information uniquely identifies the logged-in user. The server can first convert the user information into a binary bit string and then, based on the digital certificate, encrypt that bit string to obtain a binary bit string in ciphertext form, which is the watermark information.
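The conversion between user information and a binary bit string can be sketched as follows. This is only an illustration under stated assumptions: the field names, the JSON serialization and the helper names are hypothetical, and `toy_encrypt` is a deliberately simple XOR stand-in for the certificate-based encryption, which the text does not specify. It is not secure and is not the scheme described here.

```python
import hashlib
import json


def to_bit_string(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in data)


def from_bit_string(bits: str) -> bytes:
    """Inverse of to_bit_string; the length must be a multiple of 8."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def toy_keystream(key: bytes, length: int) -> bytes:
    """Hypothetical stand-in keystream (SHA-256 over key + counter). NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR with the toy keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, toy_keystream(key, len(data))))


# Hypothetical user information with the four fields named in the text.
user_info = {"user_id": "u1001", "unit_id": "dept-7",
             "ip": "10.0.0.5", "timestamp": 1522222222}
key = b"demo-key"  # placeholder for key material derived from a certificate

plaintext = json.dumps(user_info, sort_keys=True).encode("utf-8")
watermark_bits = to_bit_string(toy_encrypt(plaintext, key))

# Round trip: bit string -> ciphertext bytes -> plaintext -> user information.
recovered = json.loads(toy_encrypt(from_bit_string(watermark_bits), key))
```

In a real deployment the stand-in would be replaced by whatever certificate-based cipher the server uses; only the bit-string conversion carries over unchanged.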
Step 13, acquiring a first preset number of DIV tags in the current webpage, wherein the DIV tags are ordered DIV tags in which watermark information is to be embedded.
When the current webpage is logged in for the first time, the server needs to acquire a first preset number of DIV labels to be embedded with watermark information in the current webpage.
Specifically, the server obtains attribute information of the original DIV tags of the current webpage, where each webpage may include multiple DIV tags and each DIV tag has a corresponding sequence number; for example, the sequence numbers of three different DIV tags may be denoted DIV1, DIV2 and DIV3.
A second preset number of pieces of attribute information in each original DIV tag are adjusted in turn, and each adjusted DIV tag corresponds to a coding sequence. Taking the second preset number as M as an example, attributes such as the width, height and margin of each original DIV tag, which determine the size and position of the tag, are adjusted in turn to obtain 2^M coding sequences, each of width M. When M is 2 and the adjusted attributes are width and height, the coding sequence combinations of an adjusted DIV tag are (width + 0, height + 0), (width + Δw, height + 0), (width + 0, height + Δh) and (width + Δw, height + Δh), where Δw and Δh are the variations of width and height. From the adjusted DIV tags, a first preset number that meet a preset display condition are selected as the DIV tags in which watermark information is to be embedded; the preset display condition is that the adjusted DIV tags do not affect one another's display when the current webpage is displayed. The selected DIV tags are then sorted by sequence number to obtain the first preset number of ordered DIV tags in which watermark information is to be embedded.
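The 2^M coding sequences can be enumerated with a short sketch. The mapping of a bit to an attribute (bit 1 means the variation is applied, bit 0 means the attribute is left unchanged, bits in the same order as the attributes) is an assumption made for illustration; the text lists the four combinations for M = 2 but does not assign codes to them. The function names and the values of Δw and Δh are likewise hypothetical.

```python
from itertools import product


def coding_sequences(attributes, deltas):
    """Map every M-bit code to the per-attribute offsets it stands for.

    Assumption: bit '1' applies the attribute's variation, bit '0' applies none.
    """
    table = {}
    for bits in product("01", repeat=len(attributes)):
        code = "".join(bits)
        table[code] = {
            attr: (deltas[attr] if bit == "1" else 0)
            for attr, bit in zip(attributes, bits)
        }
    return table


def apply_code(tag, code, table):
    """Return a copy of the tag with the offsets for `code` added."""
    adjusted = dict(tag)
    for attr, offset in table[code].items():
        adjusted[attr] += offset
    return adjusted


attributes = ("width", "height")      # M = 2
deltas = {"width": 2, "height": 3}    # hypothetical Δw and Δh, in pixels
table = coding_sequences(attributes, deltas)

original = {"width": 300, "height": 120}
adjusted = apply_code(original, "10", table)   # width + Δw, height + 0
```

With M = 2 the table holds four entries, matching the four combinations listed above.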
Taking the webpage as a unit, the server stores, for each webpage, the sequence numbers of the first preset number of ordered DIV tags selected for embedding, the coding sequences over the second preset number of attributes that correspond to each ordered DIV tag, and the variations of the attribute information, and thereby establishes the tag attribute coding library.
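One possible in-memory shape for such a library is sketched below. The class names, field names and the example page identifier are all hypothetical; the text only states which pieces of information are stored per webpage, not how they are laid out.

```python
from dataclasses import dataclass, field


@dataclass
class TagEntry:
    sequence_number: int   # DIV tag sequence number within the page
    attributes: tuple      # adjusted attributes, e.g. ("width", "height")
    deltas: dict           # variation applied per attribute


@dataclass
class PageEntry:
    bits_per_tag: int                       # the second preset number
    tags: list = field(default_factory=list)

    def ordered_tags(self):
        """Tags sorted by sequence number, as used for embedding."""
        return sorted(self.tags, key=lambda tag: tag.sequence_number)

    def capacity_bits(self):
        """First preset number multiplied by the second preset number."""
        return len(self.tags) * self.bits_per_tag


library = {}   # page identifier -> PageEntry


def register_page(page_id, bits_per_tag, tags):
    """Store the selected tags for one page and return the new entry."""
    library[page_id] = PageEntry(bits_per_tag=bits_per_tag, tags=list(tags))
    return library[page_id]


entry = register_page(
    "https://intranet.example/report",   # hypothetical page identifier
    2,
    [
        TagEntry(3, ("width", "height"), {"width": 2, "height": 3}),
        TagEntry(1, ("width", "height"), {"width": 2, "height": 3}),
    ],
)
```

Keying the library by page identifier means a later login to the same page can look the tags up directly instead of reselecting them.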
Optionally, to make the variation between an adjusted DIV tag and its original DIV tag quick to identify and thereby improve tracing accuracy, after obtaining the attribute information of the original DIV tags of the current webpage, the server may select the original DIV tags to be adjusted according to the attribute values of their attribute information. Specifically, the server selects the original DIV tags that remain after the original DIV tag with the maximum attribute value is removed. For example, the server sorts the original DIV tags of each webpage in descending order of the product of their original width and height. Because the adjustment variation of the original DIV tag with the maximum product is difficult to identify, no watermark information is embedded in that tag: the DIV tag with the maximum product is skipped, and adjustment starts from the next DIV tag.
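The selection rule in this example can be sketched in a few lines. The dictionary layout of a tag and the function name are assumptions; if several tags tie for the largest area, this sketch drops only one of them, a case the text does not address.

```python
def select_adjustable(tags):
    """Rank tags by width * height, largest first, and skip the largest one."""
    ranked = sorted(tags, key=lambda tag: tag["width"] * tag["height"], reverse=True)
    return ranked[1:]


# Hypothetical original DIV tags of one page.
original_tags = [
    {"seq": 1, "width": 200, "height": 100},    # area 20000
    {"seq": 2, "width": 1200, "height": 800},   # area 960000, skipped
    {"seq": 3, "width": 300, "height": 150},    # area 45000
]

candidates = select_adjustable(original_tags)
```

The candidates come back in descending order of area; they would still need to be sorted by sequence number before embedding.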
When the current webpage is not logged in for the first time, the server acquires a first preset number of ordered DIV labels to be embedded with watermark information in the current webpage from the label attribute coding library according to the network identification of the current webpage.
Step 14, embedding the watermark information into the first preset number of ordered DIV tags in which the watermark information is to be embedded, and generating the ordered DIV tags with embedded watermark information in the current webpage.
After the first preset number of ordered DIV tags in which watermark information is to be embedded are selected, the watermark information is embedded into them. Each of these ordered DIV tags can carry a second preset number of bits, so the number of bits that can be embedded in each webpage is the product of the first preset number and the second preset number.
Specifically, the binary bit string in ciphertext form is divided into groups, each containing a second preset number of bits, and the groups are embedded, one per tag, into the ordered DIV tags in the order in which the DIV tags are arranged.
It should be noted that if a group contains fewer bits than the second preset number, it may be padded with a preset code; conversely, if bits remain after the first preset number of groups have been formed, the remaining bits are discarded.
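Grouping, padding, truncation and embedding can be sketched together. Several points here are assumptions rather than statements from the text: the preset padding code is taken to be the bit 0, a bit string shorter than the page capacity is padded so that every tag still receives a full group, bit 1 is taken to mean that an attribute's variation is applied, and all names and values are hypothetical.

```python
def group_bits(bits, n_tags, bits_per_tag, pad="0"):
    """Split a bit string into n_tags groups of bits_per_tag bits each."""
    capacity = n_tags * bits_per_tag
    bits = bits[:capacity]            # discard bits beyond the page capacity
    bits = bits.ljust(capacity, pad)  # pad short or missing groups (assumption)
    return [bits[i:i + bits_per_tag] for i in range(0, capacity, bits_per_tag)]


def embed(ordered_tags, groups, attributes, deltas):
    """Apply one group per tag: bit '1' adds the attribute's variation."""
    embedded = []
    for tag, group in zip(ordered_tags, groups):
        adjusted = dict(tag)
        for attr, bit in zip(attributes, group):
            if bit == "1":
                adjusted[attr] += deltas[attr]
        embedded.append(adjusted)
    return embedded


attributes = ("width", "height")     # second preset number M = 2
deltas = {"width": 2, "height": 3}   # hypothetical variations in pixels
ordered_tags = [                     # first preset number = 3
    {"seq": 1, "width": 100, "height": 50},
    {"seq": 2, "width": 200, "height": 80},
    {"seq": 3, "width": 300, "height": 90},
]

groups = group_bits("1101101", n_tags=3, bits_per_tag=2)   # 7 bits, capacity 6
embedded_tags = embed(ordered_tags, groups, attributes, deltas)
```

Here the seventh bit is discarded because the page capacity is 3 × 2 = 6 bits.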
Returning to step 110, after discovering the published web page picture containing the internal information on the internet, the internal manager uploads the picture to the server of the application, so that the server obtains the published web page picture carrying the watermark information.
Step 120, matching the webpage picture with the stored complete webpage to obtain the webpage identifier of the complete webpage corresponding to the webpage picture.
The server can search a webpage library storing complete webpages in an image processing mode to obtain webpage identifiers of the complete webpages identical to the obtained webpage pictures, or obtain webpage identifiers of the complete webpages containing the obtained webpage pictures. The web page identification may be a web address of the web page. It can be understood that the web page identifier of the complete web page corresponding to the web page picture can also be obtained in a manual identification mode.
Step 130, based on the web page identifier and the stored tag attribute code library, obtaining a first preset number of ordered DIV tags embedded with watermark information of the complete web page.
The server acquires a first preset number of ordered DIV labels embedded with watermark information corresponding to the webpage identifier from a label attribute coding library based on the webpage identifier, wherein the ordered DIV labels are DIV labels ordered according to DIV label serial numbers.
Step 140, extracting the watermark information from the first preset number of ordered DIV tags embedded with the watermark information.
The server may extract the watermark information from the obtained first preset number of ordered DIV tags in which the watermark information is embedded; that is, the watermark information is the set of partial watermark information embedded in each ordered DIV tag. The server can extract the watermark information by comparing the first preset number of ordered DIV tags embedded with the watermark information with the corresponding original DIV tags to obtain the variations.
Alternatively, the server may obtain the variations between the first preset number of ordered DIV tags embedded with the watermark information and the corresponding original DIV tags directly from the established tag attribute coding library.
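Extraction by comparison can be sketched as the inverse of embedding. The sketch assumes the same hypothetical convention as an embedding step in which a variation equal to the stored value encodes bit 1 and no variation encodes bit 0, and it assumes exact attribute values are available. Measurements recovered from a screenshot would in practice need a tolerance, which is not modelled here.

```python
def extract_bits(original_tags, observed_tags, attributes, deltas):
    """Recover the embedded bit string by comparing observed and original tags."""
    originals = {tag["seq"]: tag for tag in original_tags}
    bits = []
    # Walk the tags in sequence-number order, the order used for embedding.
    for observed in sorted(observed_tags, key=lambda tag: tag["seq"]):
        original = originals[observed["seq"]]
        for attr in attributes:
            change = observed[attr] - original[attr]
            if change == deltas[attr]:
                bits.append("1")
            elif change == 0:
                bits.append("0")
            else:
                raise ValueError(f"unexpected change {change} in {attr}")
    return "".join(bits)


attributes = ("width", "height")
deltas = {"width": 2, "height": 3}   # hypothetical stored variations
original_tags = [
    {"seq": 1, "width": 100, "height": 50},
    {"seq": 2, "width": 200, "height": 80},
]
observed_tags = [                    # deliberately out of order
    {"seq": 2, "width": 200, "height": 83},
    {"seq": 1, "width": 102, "height": 53},
]

extracted = extract_bits(original_tags, observed_tags, attributes, deltas)
```

The extracted bit string is still ciphertext; it has to be decrypted before it yields user information.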
Step 150, decrypting the watermark information to obtain the user information of the published webpage picture.
And the server decrypts the watermark information to obtain the user information of the published webpage picture.
It can be understood that the method described in the present application is not only applicable to the above-listed DIV tags, but also applicable to format tags such as directory list DIR tags, definition list DL tags, option list MENU tags, and the like, and the details of the embodiment of the present invention are not repeated herein.
Fig. 2 is a schematic flowchart of another tracing method based on web page watermarking according to an embodiment of the present invention. As shown in fig. 2, the execution subject of the method is a server, and the method may include:
Step 201, receiving a login request of an internal user, wherein the login request comprises user information.
Step 202, performing watermark encryption on the user information to obtain watermark information of the user on the current webpage. When the internal user logs in for the first time, step 203 is executed; otherwise, step 208 is executed.
The watermark encryption is performed on the user information using a digital certificate applied for in advance, to obtain the watermark information of the user on the current webpage.
Step 203, acquiring the attribute information of the original DIV tags of the current webpage.
Step 204, sequentially adjusting a second preset number of pieces of attribute information in each original DIV tag to obtain the adjusted DIV tags.
Each adjusted DIV tag includes the corresponding coding sequence.
Step 205, selecting, from the adjusted DIV tags, a first preset number of DIV tags that meet the preset display condition and in which watermark information is to be embedded.
Step 206, sorting the first preset number of DIV tags in which watermark information is to be embedded according to the sequence numbers of the DIV tags to obtain the corresponding ordered DIV tags.
Step 207, establishing a tag attribute code library, and then executing step 209.
And storing the ordered DIV label serial numbers of the first preset number of pieces of watermark information to be embedded, the second preset number of coded sequences corresponding to each label and the information of the variable quantity of the adjusted attribute information.
Step 208, according to the network identifier of the current webpage, obtaining a first preset number of ordered DIV tags to be embedded with watermark information from the tag attribute coding library, and then executing step 209.
And step 209, embedding the watermark information into the ordered DIV label to be embedded with the watermark information.
And step 210, acquiring the published webpage picture carrying the watermark information.
And step 211, matching the web page picture with the stored complete web page to obtain the web page identifier of the complete web page corresponding to the web page picture.
And step 212, acquiring a first preset number of ordered DIV labels embedded with watermark information of the complete webpage based on the webpage identification.
And step 213, extracting the watermark information in the ordered DIV label.
And 214, decrypting the watermark information to obtain the user information of the published webpage picture.
Therefore, in this technical scheme, the correspondence between the webpage and the ordered DIV tags embedded with the watermark information is generated first, and the published webpage picture carrying the watermark information is then acquired. The webpage picture is matched with the stored complete webpage to obtain the webpage identifier of the complete webpage corresponding to the webpage picture. Based on the webpage identifier, the first preset number of ordered DIV tags embedded with the watermark information of the complete webpage are acquired, and the watermark information is extracted and then decrypted to obtain the user information of the user who published the webpage picture. In other words, the internal user is uniquely identified by the watermark information embedded in the DIV tags, so that the internal user who leaked content of the website system can be traced.
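The branch between a first login (steps 203 to 207) and a later login (step 208) can be sketched as follows. This is only an illustration under stated assumptions: the library is modeled as a plain dictionary from webpage identifier to ordered tag serial numbers, and `ordered_tags_for_login` and `build_ordered_tags` are hypothetical names that do not appear in the source.

```python
from typing import Callable, Dict, List

# Hypothetical model: webpage identifier -> ordered DIV tag serial numbers.
Library = Dict[str, List[int]]


def ordered_tags_for_login(
    is_first_login: bool,
    page_id: str,
    library: Library,
    build_ordered_tags: Callable[[str], List[int]],
) -> List[int]:
    """Return the ordered tags that step 209 will embed the watermark into."""
    if is_first_login:
        # Steps 203 to 207: select candidate tags, sort them by serial
        # number, and record them in the tag attribute coding library.
        library[page_id] = sorted(build_ordered_tags(page_id))
    # Step 208 (and the hand-off to step 209): read the stored ordered tags.
    return library[page_id]


calls: List[str] = []


def build(page_id: str) -> List[int]:
    # Stand-in for steps 203 to 205; returns unsorted serial numbers.
    calls.append(page_id)
    return [7, 2, 5]


library: Library = {}
first = ordered_tags_for_login(True, "page-1", library, build)
later = ordered_tags_for_login(False, "page-1", library, build)
```

Storing the selection once and reusing it on later logins keeps the set of carrier tags stable for a given webpage, which is what allows the tracing side to look the tags up by webpage identifier alone.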
Corresponding to the foregoing method, an embodiment of the present invention further provides a generating apparatus, as shown in fig. 3, the generating apparatus may include: a receiving unit 301, an encryption unit 302, an acquisition unit 303, and an embedding unit 304.
A receiving unit 301, configured to receive a login request for an internal user to login a current web page, where the login request includes user information;
an encryption unit 302, configured to perform watermark encryption on user information to obtain watermark information of a user on a current webpage;
the acquiring unit 303 is configured to acquire a first preset number of HTML tags in the current webpage, where the HTML tags are ordered HTML tags in which watermark information is to be embedded, and the ordered HTML tags are HTML tags sorted by HTML tag serial number;
the embedding unit 304 is configured to embed the watermark information into the first preset number of ordered HTML tags in which watermark information is to be embedded, and to generate, in the current webpage, a first preset number of ordered HTML tags in which the watermark information is embedded.
Optionally, the acquiring unit 303 is specifically configured to: acquire attribute information of a plurality of original HTML tags of the current webpage; sequentially adjust a second preset number of pieces of attribute information of each of the plurality of original HTML tags; and select a first preset number of adjusted HTML tags that meet a preset display condition as the ordered HTML tags in which watermark information is to be embedded, where the preset display condition is that, when the current webpage is displayed, the first preset number of adjusted HTML tags do not affect one another's display effect.
Optionally, the apparatus further comprises a selecting unit 305;
the selecting unit 305 is configured to select the original HTML tags to be adjusted according to the attribute values of the attribute information of the original HTML tags, where the original HTML tags to be adjusted are the original HTML tags that remain after the original HTML tag with the largest attribute value is excluded.
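A minimal Python sketch of this selection is shown below. It assumes that tags are dictionaries of integer attributes and that the compared attribute is `width`; the source does not say which attribute is compared, so both the attribute choice and the name `select_adjustable` are hypothetical.

```python
from typing import Dict, List

Tag = Dict[str, int]  # hypothetical representation of a tag's attributes


def select_adjustable(tags: List[Tag], attr: str = "width") -> List[Tag]:
    """Drop the single tag with the largest value of `attr`; keep the rest.

    If several tags tie for the largest value, only the first is dropped.
    """
    largest = max(tags, key=lambda tag: tag[attr])
    return [tag for tag in tags if tag is not largest]


tags = [
    {"serial": 1, "width": 960},  # e.g. an outer container
    {"serial": 2, "width": 200},
    {"serial": 3, "width": 120},
]
adjustable = select_adjustable(tags)
```

Excluding the largest tag presumably avoids perturbing the element whose size dominates the page layout, though that rationale is an inference rather than something this passage states.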
Optionally, the watermark information is a multi-bit binary bit string;
the embedding unit 304 is specifically configured to divide the binary bit string into a first preset number of groups, where each group contains a second preset number of binary bits, and to embed one group of the second preset number of binary bits into each ordered HTML tag in which watermark information is to be embedded.
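The grouping and embedding can be sketched in Python as follows. The sketch assumes that one bit maps to one adjusted attribute, and that bit 1 is encoded by adding a fixed offset `delta` while bit 0 leaves the attribute unchanged; this encoding, and the names `split_bits` and `embed`, are illustrative assumptions rather than the method defined in the source.

```python
from typing import Dict, List

Tag = Dict[str, int]  # hypothetical representation of a tag's attributes


def split_bits(bits: str, group_count: int, group_size: int) -> List[str]:
    """Split the watermark into `group_count` groups of `group_size` bits."""
    if len(bits) != group_count * group_size:
        raise ValueError("bit string length must equal group_count * group_size")
    return [bits[i * group_size:(i + 1) * group_size] for i in range(group_count)]


def embed(tags: List[Tag], groups: List[str], attrs: List[str], delta: int = 1) -> List[Tag]:
    """Embed one group per ordered tag, one bit per attribute in `attrs`."""
    marked = []
    for tag, group in zip(tags, groups):
        new_tag = dict(tag)  # leave the original tag untouched
        for name, bit in zip(attrs, group):
            if bit == "1":
                new_tag[name] += delta
        marked.append(new_tag)
    return marked


ordered_tags = [{"width": 200, "height": 80}, {"width": 120, "height": 40}]
groups = split_bits("1001", group_count=2, group_size=2)
marked = embed(ordered_tags, groups, ["width", "height"])
```

Here the first preset number is 2 (tags) and the second preset number is 2 (bits per tag), so the scheme carries 4 bits in total; a real deployment would need enough tags and attributes to hold the whole encrypted payload.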
Optionally, the apparatus further comprises a creating unit 306;
the creating unit 306 is configured to create a tag attribute coding library that associates the current webpage with its first preset number of ordered HTML tags in which the watermark information is embedded.
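One possible in-memory shape for such a library is sketched below in Python. The fields mirror what the method description says is stored (tag serial numbers, coding sequences, and change amounts), but the class names `TagEntry` and `TagAttributeLibrary`, the field types, and the use of a dictionary keyed by webpage identifier are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TagEntry:
    serial_number: int              # position of the tag in the page
    code_sequence: List[str]        # adjusted attributes, in adjustment order
    change_amounts: Dict[str, int]  # attribute name -> applied change


@dataclass
class TagAttributeLibrary:
    pages: Dict[str, List[TagEntry]] = field(default_factory=dict)

    def store(self, page_id: str, entries: List[TagEntry]) -> None:
        # Keep entries ordered by serial number so that extraction reads
        # the watermark groups in the same order they were embedded.
        self.pages[page_id] = sorted(entries, key=lambda e: e.serial_number)

    def lookup(self, page_id: str) -> List[TagEntry]:
        return self.pages[page_id]


library = TagAttributeLibrary()
library.store(
    "page-1",
    [
        TagEntry(3, ["width", "height"], {"width": 1, "height": 1}),
        TagEntry(1, ["width", "height"], {"width": 1, "height": 1}),
    ],
)
serials = [entry.serial_number for entry in library.lookup("page-1")]
```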
Optionally, the user information includes a user identification, a unit identification, a user address, and a current timestamp.
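As a rough illustration of how these four fields could become the multi-bit binary string that is later embedded, the Python sketch below serializes them and converts the bytes to bits. The encryption step is deliberately left out, since the source only says that a pre-applied digital certificate is used; the field names, the `|` separator, and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInfo:
    user_id: str
    unit_id: str
    user_address: str
    timestamp: int  # e.g. seconds since the Unix epoch


def to_bytes(info: UserInfo) -> bytes:
    """Serialize the fields; a real system would encrypt this payload."""
    fields = [info.user_id, info.unit_id, info.user_address, str(info.timestamp)]
    return "|".join(fields).encode("utf-8")


def bytes_to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits for strings whose length is a multiple of 8."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


info = UserInfo("u1001", "dept-7", "10.0.0.8", 1522310400)
payload = to_bytes(info)
bits = bytes_to_bits(payload)
```

Including the timestamp means two logins by the same user yield different payloads, so a leaked picture can be tied to a specific session as well as to a user.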
The functions of the functional units of the generating device provided in the above embodiment of the present invention may be implemented by the above method steps, and therefore, detailed working processes and beneficial effects of the units in the generating device provided in the embodiment of the present invention are not described herein again.
Corresponding to the foregoing method, an embodiment of the present invention further provides a source tracing apparatus, as shown in fig. 4, the apparatus may include: an acquisition unit 401, a matching unit 402, an extraction unit 403, and a decryption unit 404.
An acquiring unit 401, configured to acquire a published webpage picture carrying watermark information, where the watermark information is encrypted user information;
a matching unit 402, configured to match the webpage picture with the stored complete webpage to obtain the webpage identifier of the complete webpage corresponding to the webpage picture;
the acquiring unit 401 is further configured to acquire, based on the webpage identifier, a first preset number of ordered HTML tags of the complete webpage in which the watermark information is embedded, where the ordered HTML tags are HTML tags sorted by HTML tag serial number;
an extracting unit 403, configured to extract the watermark information from the ordered HTML tags;
a decryption unit 404, configured to decrypt the watermark information to obtain the user information of the user who published the webpage picture.
Optionally, the user information includes a user identification, a unit identification, a user address, and a current timestamp.
Therefore, the published webpage picture carrying the watermark information is acquired, and the webpage picture is matched with the stored complete webpage to obtain the webpage identifier of the complete webpage corresponding to the webpage picture. Based on the webpage identifier, a first preset number of ordered HTML tags of the complete webpage in which the watermark information is embedded are acquired, and the watermark information in the ordered HTML tags is extracted and then decrypted to obtain the user information of the user who published the webpage picture. In other words, the internal user is uniquely identified by the watermark information embedded in the HTML tags, so that the internal user who leaked content of the website system can be traced.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, including a processor 510, a communication interface 520, a memory 530, and a communication bus 540, where the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540.
A memory 530 for storing a computer program;
the processor 510, when executing the program stored in the memory 530, implements the following steps:
receiving a login request of an internal user for logging in a current webpage, wherein the login request comprises user information;
carrying out watermark encryption on user information to obtain watermark information of a user on a current webpage;
acquiring a first preset number of HTML tags in the current webpage, wherein the HTML tags are ordered HTML tags in which watermark information is to be embedded, and the ordered HTML tags are HTML tags sorted by HTML tag serial number;
and embedding the watermark information into a first preset number of ordered HTML (hypertext markup language) tags in which the watermark information is to be embedded, and generating a first preset number of ordered HTML tags in which the watermark information is embedded.
An embodiment of the present invention further provides an application server, as shown in fig. 6, including a processor 610, a communication interface 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with one another through the communication bus 640.
A memory 630 for storing computer programs;
the processor 610, when executing the program stored in the memory 630, implements the following steps:
acquiring a published webpage picture carrying watermark information, wherein the watermark information is encrypted user information;
matching the webpage picture with the stored complete webpage to obtain a webpage identifier of the complete webpage corresponding to the webpage picture;
acquiring, based on the webpage identifier and a stored tag attribute coding library, a first preset number of ordered HTML tags of the complete webpage in which the watermark information is embedded, wherein the ordered HTML tags are HTML tags sorted by HTML tag serial number, and the tag attribute coding library is used for storing the first preset number of ordered HTML tags of the current webpage in which the watermark information is embedded;
extracting watermark information in a first preset number of ordered HTML (hypertext markup language) tags embedded with watermark information;
and decrypting the watermark information to obtain the user information of the published webpage picture.
The communication bus mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Therefore, the processors shown in fig. 5 and fig. 6 first generate the correspondence between the webpage and the ordered HTML tags embedded with the watermark information, and then acquire the published webpage picture carrying the watermark information. The webpage picture is matched with the stored complete webpage to obtain the webpage identifier of the complete webpage corresponding to the webpage picture. Based on the webpage identifier, a first preset number of ordered HTML tags of the complete webpage in which the watermark information is embedded are acquired, and the watermark information in the ordered HTML tags is extracted and then decrypted to obtain the user information of the user who published the webpage picture. In other words, the internal user is uniquely identified by the watermark information embedded in the HTML tags, so that the internal user who leaked content of the website system can be traced.
In yet another embodiment, a computer-readable storage medium is provided, having stored thereon instructions, which, when executed on a computer, cause the computer to perform the method of any of the above embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the above embodiments.
As will be appreciated by one of skill in the art, the embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present application.
It is apparent that those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the embodiments of the present application and their equivalents, the embodiments of the present application are also intended to include such modifications and variations.
Claims (14)
1. A method for generating a web page watermark, the method comprising:
receiving a login request of an internal user for logging in a current webpage, wherein the login request comprises user information;
carrying out watermark encryption on the user information to obtain watermark information of the user on the current webpage;
acquiring a first preset number of HTML (hypertext markup language) tags in the current webpage, wherein the HTML tags are ordered HTML tags to be embedded with watermark information, and the ordered HTML tags are ordered according to tag sequence numbers;
embedding the watermark information into the first preset number of ordered HTML (hypertext markup language) tags in which the watermark information is to be embedded, and generating a first preset number of ordered HTML tags in which the watermark information is embedded;
the obtaining of the first preset number of HTML tags in the current webpage includes:
acquiring attribute information of a plurality of original HTML tags in the current webpage;
selecting original HTML tags to be adjusted according to the attribute values of the attribute information of the original HTML tags, wherein the original HTML tags to be adjusted are the original HTML tags that remain after the original HTML tag with the maximum attribute value is excluded;
sequentially adjusting, on the condition that the size and the position of the HTML tags are changed, a second preset number of pieces of attribute information of each original HTML tag in the plurality of original HTML tags, adjusting at least one piece of attribute information each time;
selecting a first preset number of adjusted HTML tags meeting a preset display condition as the ordered HTML tags in which watermark information is to be embedded, wherein the preset display condition is that, when the current webpage is displayed, the first preset number of adjusted HTML tags do not affect one another's display effect.
2. The method of claim 1, wherein the watermark information is a multi-bit binary bit string;
the embedding of the watermark information into the first preset number of ordered HTML tags in which the watermark information is to be embedded comprises:
dividing the binary bit string into the first preset number of groups, wherein each group has a second preset number of binary bits;
and embedding the second preset number of binary bits into each ordered HTML tag in which the watermark information is to be embedded.
3. The method of claim 1, wherein after generating the first preset number of ordered HTML tags with embedded watermark information, the method further comprises:
and creating a tag attribute coding library of the current webpage and the corresponding first preset number of ordered HTML tags embedded with the watermark information.
4. The method of claim 1, wherein the user information comprises a user identification, a unit identification, a user address, and a current timestamp.
5. A tracing method based on a webpage watermark, the method comprising:
acquiring a published webpage picture carrying watermark information, wherein the watermark information is encrypted user information;
matching the webpage picture with a stored complete webpage to obtain a webpage identifier of the complete webpage corresponding to the webpage picture;
based on the webpage identification and a stored tag attribute coding library, acquiring a first preset number of ordered HTML tags embedded with the watermark information of the complete webpage, wherein the ordered HTML tags are ordered according to HTML tag serial numbers, and the tag attribute coding library is used for storing the first preset number of ordered HTML tags embedded with the watermark information of the current webpage;
the obtaining of the first preset number of ordered HTML tags embedded with the watermark information includes: acquiring attribute information of a plurality of original HTML tags in the current webpage; selecting original HTML tags to be adjusted according to the attribute values of the attribute information of the original HTML tags, wherein the original HTML tags to be adjusted are the original HTML tags that remain after the original HTML tag with the maximum attribute value is excluded; sequentially adjusting, on the condition that the size and the position of the HTML tags are changed, a second preset number of pieces of attribute information of each original HTML tag in the plurality of original HTML tags; and selecting a first preset number of adjusted HTML tags meeting a preset display condition as the ordered HTML tags in which watermark information is to be embedded, wherein the preset display condition is that, when the current webpage is displayed, the first preset number of adjusted HTML tags do not affect one another's display effect;
extracting the watermark information in the first preset number of ordered HTML labels embedded with the watermark information;
and decrypting the watermark information to obtain the user information of the user who published the webpage picture.
6. The method of claim 5, wherein the user information comprises a user identification, a unit identification, a user address, and a current timestamp.
7. A generating apparatus, characterized in that the apparatus may comprise:
a receiving unit, configured to receive a login request of an internal user for logging in to a current webpage, wherein the login request comprises user information;
the encryption unit is used for carrying out watermark encryption on the user information to obtain the watermark information of the user on the current webpage;
the acquisition unit is used for acquiring a first preset number of HTML (hypertext markup language) tags in the current webpage, wherein the HTML tags are ordered HTML tags to be embedded with watermark information, and the ordered HTML tags are ordered according to tag sequence numbers;
the embedding unit is used for embedding the watermark information into the first preset number of ordered HTML tags in which the watermark information is to be embedded and generating a first preset number of ordered HTML tags in which the watermark information is embedded;
the device also comprises a selecting unit;
the acquiring unit is specifically configured to acquire attribute information of a plurality of original HTML tags in the current web page;
the selecting unit is used for selecting original HTML tags to be adjusted according to the attribute values of the attribute information of the original HTML tags, wherein the original HTML tags to be adjusted are the original HTML tags that remain after the original HTML tag with the maximum attribute value is excluded;
the acquiring unit is further configured to sequentially adjust, on the condition that the size and the position of the HTML tags are changed, a second preset number of pieces of attribute information of each original HTML tag in the plurality of original HTML tags;
and to select a first preset number of adjusted HTML tags meeting a preset display condition as the ordered HTML tags in which watermark information is to be embedded, wherein the preset display condition is that, when the current webpage is displayed, the first preset number of adjusted HTML tags do not affect one another's display effect.
8. The apparatus of claim 7, wherein the watermark information is a multi-bit binary bit string;
the embedding unit is specifically configured to divide the binary bit strings into the first preset number of groups, where each group has a second preset number of binary bit numbers;
and embedding the second preset number of binary bit numbers into each ordered HTML label to be embedded with the watermark information.
9. The apparatus of claim 7, wherein the apparatus further comprises a creating unit;
the creating unit is configured to create a tag attribute coding library of the current webpage and the corresponding first preset number of ordered HTML tags embedded with the watermark information.
10. The apparatus of claim 7, wherein the user information comprises a user identification, a unit identification, a user address, and a current timestamp.
11. A tracing apparatus, characterized in that the apparatus may comprise:
an acquiring unit, configured to acquire a published webpage picture carrying watermark information, wherein the watermark information is encrypted user information;
the matching unit is used for matching the webpage picture with a stored complete webpage to obtain a webpage identifier of the complete webpage corresponding to the webpage picture;
the acquiring unit is further configured to acquire a first preset number of ordered HTML tags with embedded watermark information of the complete webpage based on the webpage identifier and a stored tag attribute coding library, where the ordered HTML tags are HTML tags ordered according to HTML tag sequence numbers, and the tag attribute coding library is configured to store the first preset number of ordered HTML tags with embedded watermark information of the current webpage;
the extracting unit is used for extracting the watermark information from the first preset number of ordered HTML tags embedded with the watermark information;
and the decryption unit is used for decrypting the watermark information to obtain the user information of the user who published the webpage picture.
12. The apparatus of claim 11, wherein the user information comprises a user identification, a unit identification, a user address, and a current timestamp.
13. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1 to 4 or the method steps of any one of claims 5 to 6 when executing a program stored in a memory.
14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 4, or carries out the method steps of any one of claims 5 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810272166.7A CN110321675B (en) | 2018-03-29 | 2018-03-29 | Webpage watermark-based generation and tracing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810272166.7A CN110321675B (en) | 2018-03-29 | 2018-03-29 | Webpage watermark-based generation and tracing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110321675A CN110321675A (en) | 2019-10-11 |
CN110321675B true CN110321675B (en) | 2021-03-23 |
Family
ID=68110930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810272166.7A Active CN110321675B (en) | 2018-03-29 | 2018-03-29 | Webpage watermark-based generation and tracing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321675B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191414B (en) * | 2019-11-11 | 2021-02-02 | 苏州亿歌网络科技有限公司 | Page watermark generation method, identification method, device, equipment and storage medium |
CN112669192A (en) * | 2021-01-14 | 2021-04-16 | 视联动力信息技术股份有限公司 | Watermark acquisition method, watermark acquisition device, terminal equipment and storage medium |
CN112926034A (en) * | 2021-03-26 | 2021-06-08 | 北京奇艺世纪科技有限公司 | Watermark processing method and device |
CN113296773B (en) * | 2021-05-28 | 2023-07-25 | 北京思特奇信息技术股份有限公司 | Copyright labeling method and system for cascading style sheets |
CN113326394A (en) * | 2021-06-30 | 2021-08-31 | 合肥高维数据技术有限公司 | Vector diagram watermark embedding and tracing method and system |
CN114756794A (en) * | 2022-03-08 | 2022-07-15 | 深圳集智数字科技有限公司 | Webpage information anti-leakage method and device |
CN114817639B (en) * | 2022-05-18 | 2024-05-10 | 山东大学 | Webpage diagram convolution document ordering method and system based on contrast learning |
CN116681574B (en) * | 2023-06-07 | 2024-04-02 | 中建三局信息科技有限公司 | Method, device, equipment and storage medium for generating clear watermark of webpage information system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7685426B2 (en) * | 1996-05-07 | 2010-03-23 | Digimarc Corporation | Managing and indexing content on a network with image bookmarks and digital watermarks |
US20050053258A1 (en) * | 2000-11-15 | 2005-03-10 | Joe Pasqua | System and method for watermarking a document |
CN101599118B (en) * | 2009-06-26 | 2011-03-16 | 华中师范大学 | HTML webpage tamper detection and positioning method |
CN106789856A (en) * | 2015-11-25 | 2017-05-31 | 阿里巴巴集团控股有限公司 | A kind of information coding method, coding/decoding method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110321675A (en) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321675B (en) | Webpage watermark-based generation and tracing method and device | |
CN110245469B (en) | Webpage watermark generation method, watermark analysis method, device and storage medium | |
US20180288014A1 (en) | Detecting disclosed content sources using dynamic steganography | |
CN105049287A (en) | Log processing method and log processing devices | |
US11182873B2 (en) | Multiple source watermarking for surveillance | |
CN112559985B (en) | Watermark embedding and extracting method | |
CN104426869B (en) | Information is obtained based on Quick Response Code, the method and device of information is sent | |
CN111669615B (en) | Video stream processing method and device | |
CN105959814A (en) | Scene-recognition-based video bullet screen display method and display apparatus thereof | |
CN110851682A (en) | Text anti-crawler method, server and display terminal | |
CN115114598B (en) | Watermark generation method and device and watermark file tracing method and device | |
TW201642156A (en) | Page jumps based on text hiding | |
CN106951743A (en) | A kind of software code infringement detection method | |
CN103400175B (en) | Method and device for processing pattern identification code | |
CN108184146B (en) | Method for calculating popularity of live broadcast platform and related equipment | |
CN111526388A (en) | Video playing method and device and video playing control method and device | |
CN112948895A (en) | Data watermark embedding method, watermark tracing method and device | |
CN114036561A (en) | Information hiding method, information acquiring method, information hiding device, information acquiring device, storage medium and electronic equipment | |
CN110874456A (en) | Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method | |
CN111209577B (en) | Method and device for adding watermark data, storage medium and electronic equipment | |
CN116702103A (en) | Database watermark processing method, database watermark tracing method and device | |
CN109446827B (en) | Data encryption and decryption method and system | |
CN111783119A (en) | Form data security control method and device, electronic equipment and storage medium | |
CN110909270A (en) | Article migration method and device, computer readable storage medium and terminal equipment | |
CN114298882A (en) | Watermark embedding method and tracing method for CAD data and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||