CN111859853A - Webpage text encryption and decryption method based on random font - Google Patents

Publication number
CN111859853A
CN111859853A CN202010770484.3A CN202010770484A CN111859853A CN 111859853 A CN111859853 A CN 111859853A CN 202010770484 A CN202010770484 A CN 202010770484A CN 111859853 A CN111859853 A CN 111859853A
Authority
CN
China
Prior art keywords
font
webpage
text
characters
glyph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010770484.3A
Other languages
Chinese (zh)
Inventor
杨照通
杨胜华
叶秋萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202010770484.3A
Publication of CN111859853A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/106 Display of layout of documents; Previewing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Document Processing Apparatus (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a webpage text encryption and decryption method based on random fonts, and belongs to the technical field of the Internet. By randomly generating the mapping between characters and glyphs in the font's CMAP table, a custom font is quickly generated and specific characters in a webpage are encrypted yet still rendered correctly, so that data cannot be obtained directly from the webpage source text and key data is effectively protected from being stolen in a webpage environment. The method does not affect the overall page style and keeps the style uniform; the font file only describes vector information of the glyphs and occupies little space compared with pictures, glyph drawing is done entirely by the browser, and the user experience is hardly affected; the method creates an effective barrier against malicious collection by some crawlers, reduces the risk of malicious attacks on the website, saves the considerable manpower, material and financial cost consumed by large numbers of invalid accesses, and is of significance for improving the user experience and satisfaction of company products.

Description

Webpage text encryption and decryption method based on random font
Technical Field
The invention relates to the technical field of internet, in particular to an encryption and decryption process of webpage font display information.
Background
With the rapid development of internet technology, networks have become carriers of a large amount of information, and various terminals can obtain the information users need from the network and display it in the form of webpages. In the internet era, huge volumes of public data are generated every day, and webpages contain a great deal of valuable key data (such as sales volume and price information on e-commerce websites). Because this data holds enormous value, crawler technology for collecting public internet data has emerged.
However, for website operators, some malicious crawlers generate a large amount of network traffic, increase the concurrency and bandwidth pressure on the servers running the website, and greatly increase operating costs. For internet companies, data is an important resource. Even data that is publicly shown to users can yield a lot of valuable information once enough of it has been collected, and companies often do not want competitors or the gray industry to obtain and exploit it.
In resisting crawlers, there is often a conflict between preventing the crawler program from obtaining the real information and ensuring that real users can still obtain it. If some characters in the webpage are replaced by pictures, the difficulty for a crawler to obtain correct information is effectively increased, but the large number of pictures increases the time needed to fetch and render the webpage, and because pictures are not editable the style and layout deviate, which degrades the user experience.
The development of front-end technology makes it possible to meet both goals. Front-end technologies such as CSS and JS sit between the data a crawler accesses (usually the webpage source code) and the information visible to the user (usually the rendered page). Webpage font rendering plays a key role in displaying text, the most important source of information in a webpage.
Disclosure of Invention
The technical task of the invention is to address the defects of the prior art by encrypting information data from the perspective of font rendering, and to provide a webpage text encryption and decryption method based on random fonts.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A webpage text encryption and decryption method based on random fonts comprises the following steps:
S1, extracting the character set to be encrypted from a webpage;
S2, establishing a mapping relation for the encrypted characters, and generating an encryption mapping table CIPHER_BOOK;
S3, creating a font file;
S4, modifying the character CMAP table;
S5, editing the glyph index glyph_index;
S6, referencing the edited font file in the page CSS;
S7, replacing the content in the webpage body by using the encryption mapping table defined in step S2;
S8, returning the webpage content and the font to the browser and checking the rendered page, wherein the displayed characters are normal, but the copied text characters are encrypted text;
S9, with the character mapping available, the back end can decrypt directly by using the mapping table.
Preferably, when step S1 is executed, some text characters in the webpage text are randomly selected as the encrypted character set, and full-text encryption is not performed.
Preferably, when step S2 is executed, the Unicode code of each character to be encrypted is mapped to another character, which is either a rarely used high-order character in the font library or a completely unrelated character.
Preferably, the encryption mapping table CIPHER_BOOK generated in step S2 is stored at the back end, in the following form:
CIPHER_BOOK={
'0':'\uE910',
'1':'\uE911',
'2':'\uE912',
'3':'\uE913',
'4':'\uE914',
'5':'\uE915',
'6':'\uE916',
'7':'\uE917',
'8':'\uE918',
'9':'\uE919',
}
Preferably, when step S3 is executed, the glyphs of the encrypted character set are extracted from an existing font file, or a completely new custom font containing the glyphs of the encrypted character set is created.
Preferably, when step S4 is executed, the original character codes in the cmap table are replaced with the encrypted characters according to the encryption mapping relation defined in S2.
Preferably, when step S5 is executed, the glyph index glyph_index is changed to an index name with no obvious readable meaning; each glyph_index corresponds to a specific glyph vector, and the browser parses the glyph vector description to draw the rendered text on the page.
Preferably, the step S6 includes the steps of:
1) firstly, defining a font:
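@font-face{
font-family:'fontencode';
src:url('/static/fontencode.woff2') format('woff2');
font-weight:normal;
font-style:normal;
}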
2) then define class using this font:
.demo-icon{
font-family:"fontencode";
}
3) then use the class in the label that displays the character:
<h1><small class="demo-icon">this is the test text</small></h1>
Compared with the prior art, the webpage text encryption and decryption method based on random fonts has the following beneficial effects:
1. The invention encrypts the information data from the perspective of font rendering:
1) font rendering is an integral part of webpage rendering, so the method does not affect the overall style and keeps the style uniform;
2) the font file only describes vector information of the glyphs and occupies little space compared with pictures; glyph drawing is done entirely by the browser, and the user experience is hardly affected;
3) after font encryption the user still sees the correct webpage information, while the information in the webpage source code is encrypted.
2. Although the method cannot completely prevent a crawler from collecting information automatically (for example, by rendering the font and then performing image recognition), it greatly increases the cost and the amount of data a crawler must process to obtain correct information, and thereby reduces the effectiveness of extracting key information from large amounts of data.
In conclusion, the method can effectively reduce the risk of malicious attacks on the website, save the considerable manpower, material and financial cost that enterprises spend on large numbers of invalid accesses, and is of significance for improving the user experience and satisfaction of products.
Drawings
In order to more clearly describe the working principle of the random font based web page text encryption and decryption method of the present invention, a simplified diagram will be attached for further explanation.
FIG. 1 is a schematic diagram of a process for encrypting and decrypting web page text based on random fonts according to the invention;
FIG. 2 is a schematic diagram of a CMAP table for modifying fonts, according to the present invention;
FIG. 3 is a schematic diagram of font rendering for a web page in accordance with the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to fig. 1 to 3 in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
There are four font container formats currently used on the web, including EOT, TTF and WOFF, with WOFF being the most widely supported. The Web Open Font Format (WOFF) is a font format standard for webpages. It was developed in 2009, has been standardized by the Web Fonts Working Group of the World Wide Web Consortium, and is now a recommended standard. The format uses compression effectively to reduce file size, does not involve encryption, and is not restricted by DRM (digital rights management).
WOFF is essentially a wrapper around SFNT-based fonts (such as TrueType, OpenType or other open font formats), which are compressed by a WOFF encoding tool so that they can be embedded in a webpage. SFNT is a standardized font data structure format. Many common font formats use it as a container, which provides standardized lookup tables, data structures and so on for each character.
A webpage font is a set of glyphs, and each glyph is a vector shape that describes a letter or symbol. Font rendering is essentially the browser finding, in the font file, the vector shape that corresponds to each text character and then drawing that glyph on the page. In SFNT-based fonts such as WOFF, a glyph holds the drawing information of one character shape; glyph_index is the index of a glyph, and each glyph has a unique index value; the CMAP table maps each character (Unicode code) to a glyph_index and is the key bridge between a character and the glyph the browser draws for it. The invention modifies the cmap to encrypt and decrypt text through font rendering.
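The character-to-glyph lookup described above can be illustrated with a toy model. This is only a sketch: plain Python dicts stand in for the binary font tables, and the names GLYPHS, CMAP, ENCRYPTED_CMAP and render are hypothetical, not part of the patent or of any font library.

```python
# Toy model of an SFNT font: plain dicts stand in for the binary tables.
# All names below are illustrative assumptions.
GLYPHS = {
    "glyph_zero": "<outline for 0>",
    "glyph_one": "<outline for 1>",
}

# cmap: Unicode code point -> glyph index (modelled here as a glyph name)
CMAP = {
    ord("0"): "glyph_zero",
    ord("1"): "glyph_one",
}

# The same glyphs, but reachable only through private-use code points.
ENCRYPTED_CMAP = {
    0xE910: "glyph_zero",
    0xE911: "glyph_one",
}


def render(text, cmap, glyphs):
    """Return the outlines that would be drawn for each mapped character."""
    return [glyphs[cmap[ord(ch)]] for ch in text if ord(ch) in cmap]
```

With the encrypted cmap, the source text "\ue911\ue910" draws the same outlines that "10" draws with the original cmap, which is the effect the method relies on.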
As shown in fig. 1, the method for encrypting and decrypting the webpage text based on the random font of the present invention comprises the following steps:
S1, extracting the character set to be encrypted from a webpage
Some text characters in the webpage text are randomly selected as the encrypted character set. Full-text encryption is not recommended, because it increases the size of the font file and makes it easier for a crawler to crack the mapping through character-frequency analysis;
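A minimal sketch of step S1 follows, assuming that the selection is made over the distinct non-whitespace characters of the page. The function name select_charset and the fraction parameter are hypothetical; the patent does not specify how many characters to pick.

```python
import random


def select_charset(page_text, fraction=0.3, rng=None):
    """Randomly pick a subset of the distinct characters in page_text.

    `fraction` (an assumed parameter) is the share of distinct
    non-whitespace characters to encrypt; at least one is chosen.
    """
    rng = rng or random.Random()
    candidates = sorted({ch for ch in page_text if not ch.isspace()})
    if not candidates:
        return set()
    count = max(1, round(len(candidates) * fraction))
    return set(rng.sample(candidates, count))
```

In practice a site would probably restrict the candidates to the characters that carry the key data, such as the digits in prices, rather than sampling over the whole page.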
S2, establishing a mapping relation for the encrypted characters, and generating an encryption mapping table CIPHER_BOOK
The Unicode codes of the randomly selected characters to be encrypted are mapped to other characters, which may be rarely used high-order characters in the font library (e.g. 0xeff) or completely unrelated characters (e.g. "and" mapped to "plus"). Note that the mapping table should be kept at the back end, in the following form:
CIPHER_BOOK={
'0':'\uE910',
'1':'\uE911',
'2':'\uE912',
'3':'\uE913',
'4':'\uE914',
'5':'\uE915',
'6':'\uE916',
'7':'\uE917',
'8':'\uE918',
'9':'\uE919',
}
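The table above uses consecutive code points. One way to generate a randomized table instead is sketched below, under the assumption that targets are drawn from the Unicode Private Use Area (U+E000 to U+F8FF), which contains the values shown above. The function name build_cipher_book is hypothetical.

```python
import random

# Unicode Private Use Area in the Basic Multilingual Plane.
PUA_START, PUA_END = 0xE000, 0xF8FF


def build_cipher_book(charset, rng=None):
    """Map each character in charset to a distinct random PUA character."""
    rng = rng or random.Random()
    chars = sorted(set(charset))
    # sample() draws without replacement, so no two characters collide.
    code_points = rng.sample(range(PUA_START, PUA_END + 1), len(chars))
    return {ch: chr(cp) for ch, cp in zip(chars, code_points)}
```

Passing `random.SystemRandom()` as `rng` draws from the operating system's entropy source, which may be preferable when a fresh table is generated for each response.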
S3, creating a font file
The glyphs of the encrypted character set can be extracted from an existing font file, or a completely new custom font containing the glyphs of the encrypted character set can be created;
S4, modifying the character CMAP table
With reference to fig. 2, the codes (i.e., the original characters) in the cmap table are replaced with the encrypted characters according to the encryption mapping relation defined in step S2;
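The re-keying in step S4 can be sketched on a dict that stands in for the font's cmap table. A real implementation would edit the table inside the font file with a font library, which is not shown here; the function name encrypt_cmap is hypothetical.

```python
def encrypt_cmap(cmap, cipher_book):
    """Re-key a cmap so that each glyph is reached via its cipher character.

    cmap maps code point -> glyph index; cipher_book maps plain char ->
    cipher char. Entries for the original code points are dropped, so the
    plain characters no longer select these glyphs in the custom font.
    """
    new_cmap = {}
    for plain, cipher in cipher_book.items():
        new_cmap[ord(cipher)] = cmap[ord(plain)]
    return new_cmap
```

For example, with a cipher book that maps "0" to "\ue910", the entry for code point 0x30 is replaced by an entry for 0xE910 that points to the same glyph.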
S5, editing the glyph index glyph_index
With reference to fig. 3, the glyph index glyph_index is changed to an index name with no obvious readable meaning; each glyph_index corresponds to a specific glyph vector, and the browser parses the glyph vector description and draws the rendered text on the page;
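Step S5 can be sketched in the same toy model, where glyph indexes are modelled as names. The assumption here is that a readable name such as "zero" would reveal which character a glyph draws, so each name is replaced with a random token. The function name obfuscate_glyph_names is hypothetical.

```python
import secrets


def obfuscate_glyph_names(cmap, glyphs):
    """Replace every glyph name with a random, meaningless one.

    Returns (new_cmap, new_glyphs); the code point -> outline relation is
    unchanged, only the intermediate names differ.
    """
    renamed = {}
    for old_name in glyphs:
        new_name = "g" + secrets.token_hex(4)
        while new_name in renamed.values():  # retry on the rare collision
            new_name = "g" + secrets.token_hex(4)
        renamed[old_name] = new_name
    new_glyphs = {renamed[name]: outline for name, outline in glyphs.items()}
    new_cmap = {cp: renamed[name] for cp, name in cmap.items()}
    return new_cmap, new_glyphs
```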
S6, referencing the edited font file in the page CSS, which comprises the following steps:
1) first, define a font:
@font-face{
font-family:'fontencode';
src:url('/static/fontencode.woff2') format('woff2');
font-weight:normal;
font-style:normal;
}
2) then define class using this font:
.demo-icon{
font-family:"fontencode";
}
3) then use the class in the label that displays the character:
<h1><small class="demo-icon">this is the test text</small></h1>
S7, replacing the content in the webpage body by using the encryption mapping table defined in step S2;
S8, returning the webpage content and the font to the browser and checking the rendered page, wherein the displayed characters are normal, but the copied text characters are encrypted text;
S9, with the character mapping available, the back end can decrypt directly by using the mapping table; because the mapping table is not public, a crawler cannot obtain the correct information by directly parsing the webpage text. This completes the encryption and decryption process.
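Steps S7 and S9 amount to a character substitution and its inverse. The sketch below rebuilds the digit table from step S2 and uses Python's built-in str.translate; the function names encrypt_text and decrypt_text are hypothetical.

```python
# Same table as in step S2: '0' -> U+E910 ... '9' -> U+E919.
CIPHER_BOOK = {str(d): chr(0xE910 + d) for d in range(10)}


def encrypt_text(text, cipher_book):
    """S7: replace each mapped character in the page body with its cipher."""
    return text.translate(str.maketrans(cipher_book))


def decrypt_text(text, cipher_book):
    """S9: invert the table on the back end to recover the original text."""
    inverse = {cipher: plain for plain, cipher in cipher_book.items()}
    return text.translate(str.maketrans(inverse))
```

Characters that are not in the table pass through unchanged, so encrypting "Price: 42" changes only the two digits.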
The method performs encryption through the browser's font rendering process, which greatly improves encryption efficiency and ensures that the user sees normal text while the original text cannot be obtained directly from the page source. The encryption process is simple and easy to implement, and the encrypting party only needs to maintain one character mapping table. A party holding the mapping table can decrypt the text quickly, whereas a party without it cannot decrypt by parsing the text. The difficulty of crawling data is therefore greatly increased without affecting the user experience, the value of the data is effectively protected, and website maintenance costs are reduced.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A webpage text encryption and decryption method based on random fonts is characterized by comprising the following steps:
S1, extracting the character set to be encrypted from a webpage;
S2, establishing a mapping relation for the encrypted characters, and generating an encryption mapping table CIPHER_BOOK;
S3, creating a font file;
S4, modifying the character CMAP table;
S5, editing the glyph index glyph_index;
S6, referencing the edited font file in the page CSS;
S7, replacing the content in the webpage body by using the encryption mapping table defined in step S2;
S8, returning the webpage content and the font to the browser and checking the rendered page, wherein the displayed characters are normal, but the copied text characters are encrypted text;
S9, with the character mapping available, the back end can decrypt directly by using the mapping table.
2. The method for encrypting and decrypting webpage text based on random fonts according to claim 1, wherein in step S1, some text characters in the webpage text are randomly selected as the encrypted character set, and full-text encryption is not performed.
3. The method for encrypting and decrypting webpage text based on random fonts according to claim 1 or 2, wherein in step S2, the Unicode code of each character to be encrypted is mapped to another character, which is either a rarely used high-order character in the font library or a completely unrelated character.
4. The method for encrypting and decrypting webpage text based on random fonts according to claim 1 or 2, wherein the encryption mapping table CIPHER_BOOK generated in step S2 is stored at the back end.
5. The method for encrypting and decrypting webpage text based on random fonts according to claim 1 or 2, wherein in step S3, the glyphs of the encrypted character set are extracted from an existing font file, or a completely new custom font containing the glyphs of the encrypted character set is created.
6. The method for encrypting and decrypting webpage text based on random fonts according to claim 1 or 2, wherein in step S4, the original character codes in the cmap table are replaced with the encrypted characters according to the encryption mapping relation defined in S2.
7. The method for encrypting and decrypting webpage text based on random fonts according to claim 1 or 2, wherein in step S5, the glyph index glyph_index is changed to an index name with no obvious readable meaning, each glyph_index corresponds to a specific glyph vector, and the browser parses the glyph vector description to draw the rendered text on the page.
8. The method for encrypting and decrypting text of web pages based on random fonts according to claim 1 or 2, wherein the step S6 comprises the steps of:
1) firstly, defining a font:
@font-face{
font-family:'fontencode';
src:url('/static/fontencode.woff2') format('woff2');
font-weight:normal;
font-style:normal;
}
2) then define class using this font:
.demo-icon{
font-family:"fontencode";
}
3) then use the class in the label that displays the character:
<h1><small class="demo-icon">this is the test text</small></h1>
CN202010770484.3A 2020-08-04 2020-08-04 Webpage text encryption and decryption method based on random font Pending CN111859853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010770484.3A CN111859853A (en) 2020-08-04 2020-08-04 Webpage text encryption and decryption method based on random font

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010770484.3A CN111859853A (en) 2020-08-04 2020-08-04 Webpage text encryption and decryption method based on random font

Publications (1)

Publication Number Publication Date
CN111859853A (en) 2020-10-30

Family

ID=72953041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010770484.3A Pending CN111859853A (en) 2020-08-04 2020-08-04 Webpage text encryption and decryption method based on random font

Country Status (1)

Country Link
CN (1) CN111859853A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297695A (en) * 2021-12-30 2022-04-08 北京奇艺世纪科技有限公司 Text encryption method, text decryption method and device
CN114444487A (en) * 2022-01-26 2022-05-06 百果园技术(新加坡)有限公司 Data processing method, device, equipment and medium
CN114553519A (en) * 2022-02-18 2022-05-27 平安国际智慧城市科技股份有限公司 Webpage encryption method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542212A (en) * 2010-12-24 2012-07-04 北大方正集团有限公司 Text information hiding method and device
CN106502968A (en) * 2016-10-12 2017-03-15 北京奇虎科技有限公司 The method and device of data processing
CN109977685A (en) * 2019-03-21 2019-07-05 古联(北京)数字传媒科技有限公司 Web page contents encryption method, encryption device and system
CN111008348A (en) * 2019-11-28 2020-04-14 盛业信息科技服务(深圳)有限公司 Anti-crawler method, terminal, server and computer readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297695A (en) * 2021-12-30 2022-04-08 北京奇艺世纪科技有限公司 Text encryption method, text decryption method and device
CN114297695B (en) * 2021-12-30 2024-05-31 北京奇艺世纪科技有限公司 Text encryption method, text decryption method and device
CN114444487A (en) * 2022-01-26 2022-05-06 百果园技术(新加坡)有限公司 Data processing method, device, equipment and medium
CN114553519A (en) * 2022-02-18 2022-05-27 平安国际智慧城市科技股份有限公司 Webpage encryption method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111859853A (en) Webpage text encryption and decryption method based on random font
CN104185845B (en) For the system and method for the binary representation for providing webpage
CN104021133B (en) Apparatus and method for allowing server to keep chaotic data
US20020026475A1 (en) Automatic conversion system
JP5735539B2 (en) System, apparatus and method for encrypting and decrypting data transmitted over a network
Mir Copyright for web content using invisible text watermarking
EP1965327A1 (en) A document data security management method and system
CN111639284A (en) Webpage labeling method and device, electronic equipment and medium
CN103870583A (en) Relational-database-based online and controllable browsing method for PDF document
US20150195421A1 (en) Method for water-marking digital books
CN101799800B (en) Method for coding electronic book, electronic newspaper and electronic magazine with hyperlink
US10706160B1 (en) Methods, systems, and articles of manufacture for protecting data in an electronic document using steganography techniques
IL129633A (en) Automatic conversion system
Khadam et al. Text data security and privacy in the internet of things: threats, challenges, and future directions
Taleby Ahvanooey et al. An innovative technique for web text watermarking (AITW)
CN115114598B (en) Watermark generation method and device and watermark file tracing method and device
US7197706B1 (en) Method and system for ensuring accurate font matching in documents
CN111309578A (en) Method and device for identifying object
US9442898B2 (en) Electronic document that inhibits automatic text extraction
Chou et al. A Webpage Data Hiding Method by Using Tag and CSS Attribute Setting
JP4143628B2 (en) Text content presentation device, text content presentation method, and text content presentation program
Shahreza A new method for steganography in HTML files
CN115048665A (en) Excel file-based information hiding method, device, equipment and storage medium
CN114610305A (en) Development method and device of invisible webpage resources, electronic equipment and medium
CN113139145B (en) Page generation method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030