CN116611032A - Method, system and storage medium for embedding and extracting software watermark in JAR package - Google Patents

Publication number
CN116611032A
Authority
CN
China
Prior art keywords
class
watermark
embedding
software
jar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310555407.XA
Other languages
Chinese (zh)
Inventor
陈楠
邢磊
姚志强
张磊
陈汀
王丰年
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN202310555407.XA
Publication of CN116611032A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation

Abstract

The application provides a method, a system and a storage medium for embedding and extracting a software watermark in a JAR package. The method for embedding the software watermark comprises the following steps: reading the JAR package to obtain a plurality of class bytecode files; encrypting the software watermark to be embedded into an encrypted character string; splitting the encrypted character string into a plurality of data segments, constructing data frames comprising the data segments, and embedding the data frames into a plurality of class bytecode files; and integrating the embedded class bytecode files to generate a JAR package embedded with the software watermark. The method abandons the conventional practice of recording watermark information in plaintext. Instead, it reads the class bytecode files of the whole JAR package in binary mode and writes the watermark information into the corresponding class bytecode according to a series of encryption, splitting and distribution rules, thereby embedding the watermark. Combined with code obfuscation, the method can greatly improve the concealment of the watermark, increase the difficulty of reverse engineering, and improve the security of the software.

Description

Method, system and storage medium for embedding and extracting software watermark in JAR package
Technical Field
The application belongs to the technical field of computers, and particularly relates to a method, a system and a storage medium for embedding and extracting software watermarks in JAR packages.
Background
The Java language is simple, object-oriented, distributed, robust, secure, platform-independent, portable, multithreaded and dynamic, and is widely used by software developers. However, as digital products, computer software spreads quickly over networks. Widely used digital products in particular suffer from serious piracy, and unauthorized modification, appropriation and cracking often occur during distribution, causing serious economic losses to software developers.
Software watermarking is a copyright protection technology for software products that has emerged in recent years. A watermark can identify authors, publishers, owners, users and so on; it carries copyright protection and identity authentication information and can be used to identify illegally copied and stolen software products. In software developed in the Java language, watermark embedding falls roughly into the following categories: dynamic graph-based software watermarks, equation ordinal-based software watermarks, thread relationship-based software watermarks, and file-based software watermarks.
Dynamic graph-based software watermarking was first proposed by Collberg and Thomborson. The watermark information is encoded in a specific topology that is embedded in the original code, and the watermark is generated dynamically when the program runs. The advantages of this method are that a large amount of watermark data can be embedded and that the particular structure of the watermark gives good tamper resistance. However, the embedded watermark code has low cohesion with the source program and introduces additional data structures, which increases the complexity of developing the source program.
Equation ordinal-based software watermarking maps the order of operands in equations one-to-one to watermark information according to custom rules, thereby hiding the embedded watermark information. Computer software commonly contains a large number of mathematical equations, that is, equalities containing unknowns, such as: a.b, x-9=y+6. Some operands of basic arithmetic operations are interchangeable, such as the multiplier and multiplicand in a multiplication and the addends in an addition. The order of such operands, which can be safely exchanged without changing the result of the equation, is called the equation ordinal. The method has extremely high concealment and has no effect on the running speed of the source program, but the amount of watermark data that can be embedded is limited by the number of equations in the source program.
Thread relationship-based software watermarking uses the synchronization facilities of computer software to make the threads of a program execute in a certain order, so that the watermark is embedded in the concurrent execution trace of multiple threads. For example, when one Java thread suspends, wakes or terminates another, there is a relation between the controlling thread and the controlled thread, and these relations can be mapped one-to-one to watermark information by custom rules. These control relations do not affect the final function of the program, only its running time. The method therefore conceals the watermark well and resists various attacks on threads. However, the amount of watermark information that can be embedded depends on the number of threads in the program. When necessary (for example when the source code has few threads), the source code must be modified to add control relations between threads, and a large number of redundant threads must be introduced, which affects the overall function of the program.
File-based software watermarking relies on file formats defined by the operating system, programming language specifications and the like; every file must follow its particular format. Some formats are designed with redundant space, in which a watermark can be embedded. Such methods are more flexible than the others and require no extra time to understand the source code, but they are vulnerable to removal attacks.
Disclosure of Invention
In view of the problems existing in the prior art, a first aspect of the present application proposes a method for embedding a software watermark in a JAR package, comprising the steps of: reading the JAR package to obtain a plurality of class bytecode files; encrypting the software watermark to be embedded into an encrypted character string; splitting the encrypted character string into a plurality of data segments, constructing data frames comprising the data segments, and embedding the data frames into a plurality of class bytecode files; and integrating the embedded class bytecode files to generate a JAR package embedded with the software watermark.
Preferably, the plurality of class bytecode files are classified into important classes and auxiliary classes, and the encrypted character string is split into at least 3 data segments, which are dispersed and embedded in the bytecode files of the important classes.
Preferably, the data frame further comprises a sequence number indicating the order of the data segment in the encrypted string.
Preferably, the data frames are embedded in the constant pool of the class bytecode files.
Preferably, the data frame is reversed and then embedded at the end of the class bytecode file. A reversed data frame can be recovered by reading it backwards, whereas reading in forward order does not yield the correct data frame, which improves concealment.
Preferably, the encrypted string is split into 2 segments, which are then embedded at the end of auxiliary class bytecode files.
Preferably, the end of the data frame also includes a CRC check code.
Preferably, the plurality of class bytecode files are classified according to the number of times they are called: an important class is a class bytecode file whose call count is greater than the average call count in the JAR package, and an auxiliary class is a class bytecode file whose call count is smaller than the average call count in the JAR package.
Preferably, the method further comprises the step of performing code obfuscation on the original code of the JAR package.
A second aspect of the present application proposes a method for extracting a software watermark from a JAR package, comprising the steps of:
reading the JAR package to obtain a plurality of class bytecode files; reading the data frames, each comprising a data segment split from the encrypted string, and splicing the segments into the encrypted string; and decrypting the encrypted character string with the encryption password.
Preferably, the data frames are read from the constant pool or the end of the class bytecode file.
A third aspect of the present application proposes a system for embedding a software watermark in a JAR package, comprising:
a JAR package parsing module configured to read the JAR package and obtain a plurality of class bytecode files;
a software watermark encrypting module configured to encrypt a software watermark to be embedded into an encrypted string;
a software watermark embedding module configured to split the encrypted character string into a plurality of data segments, construct data frames comprising the data segments, and embed the data frames into a plurality of class bytecode files;
and a JAR package generation module configured to integrate the embedded class bytecode files to generate a JAR package embedded with the software watermark.
A fourth aspect of the application proposes a computer readable storage medium storing a computer program which, when executed by a processor, implements any of the methods of the first aspect.
The application discloses a method for embedding and extracting watermarks in a JAR package, a software package produced in the Java language. The method abandons the conventional practice of recording watermark information in text files such as MANIFEST.MF, SQL, XML or YML files. Instead, it reads the class bytecode files of the whole JAR package in binary mode and writes the watermark information into the corresponding class bytecode according to a series of encryption, splitting and distribution rules, thereby embedding the watermark. The corresponding extraction method extracts, combines and decrypts the watermark information in the class bytecode files according to the embedding rules of the JAR package to obtain the final watermark information. With this scheme, a watermark producer can easily embed watermark information without knowing the content of the source code or relying on third-party tools. Furthermore, when the scheme is combined with code obfuscation after the watermark information is embedded, the obfuscation can imitate and replace class names, package names, method names, string field values and the strings in the data frames, or the watermark encrypted string can be split and hidden within the obfuscated code, which greatly improves the concealment of the watermark, increases the difficulty of reverse engineering, and improves the security of the software.
Drawings
The accompanying drawings assist in a further understanding of the application. The elements of the drawings are not necessarily to scale relative to each other. For convenience of description, only parts related to the related application are shown in the drawings.
FIG. 1 is a schematic diagram illustrating the steps for embedding a watermark in a JAR package according to an embodiment of the present application;
FIG. 2 is a flow chart of embedding and extracting watermarks in a JAR package in accordance with one embodiment of the present application;
FIG. 3 shows the bytes read from a class bytecode file according to another embodiment of the present application;
FIG. 4 is a diagram of call relationships between classes in Java code in accordance with another embodiment of the present application;
FIG. 5 is a schematic diagram of the format of a data frame according to another embodiment of the present application;
FIG. 6 is a diagram of a constant pool data structure in accordance with another embodiment of the present application;
FIG. 7 is a schematic diagram of a reversed data frame according to another embodiment of the present application;
FIG. 8 is a schematic diagram of a system architecture for embedding a software watermark in a JAR package according to another embodiment of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application.
Java is an object-oriented programming language for writing cross-platform application software. To develop an application in Java, source code is first written in files with the suffix .java and then compiled by a compiler into machine-independent binary bytecode, that is, class files. At run time, the Java virtual machine (JVM) of each platform searches a preset class loading path for the classes to be executed, loads them, and interprets and executes the bytecode, thereby realizing the cross-platform property of "compile once, run anywhere". An application project usually involves a large number of class files, which are typically packaged as a JAR for convenient storage and use. A JAR (Java Archive) package is a compressed archive for Java and can also be regarded as a collection of class files.
Class bytecode compiled from Java carries a large amount of metadata. This is an inherent property of the language that cannot be changed, and it makes the bytecode easy to decompile. Having weighed the advantages and disadvantages of the existing software watermarking approaches, the inventors propose the method of the present application. First, in a typical Java development process, developers follow programming conventions in everything from variable naming to overall architecture design, which increases the readability of the source code. Small software is easy to read, but large software may contain hundreds of thousands or even millions of lines of code, and reading every function is time-consuming, laborious and inefficient. The present scheme therefore does not start from the source code; it directly modifies the class bytecode compiled from Java, so that the watermark producer can embed a watermark without understanding how the software is implemented. Second, the data in compiled class bytecode is arranged in a defined order according to fixed rules, and each item of data has a fixed length, so decompilation is easy for anyone who knows the class bytecode specification. To hide the metadata in the class bytecode, it would suffice to scramble any item of data or to replace some bytes of one item; without knowing the transformation rule, the class bytecode can then no longer be parsed. However, a class that a decompiler cannot parse cannot be parsed by the Java virtual machine either, so such an approach requires a custom class loader (ClassLoader) to pre-process the modified class bytecode and restore the scrambled bytecode to its normal order before handing it to the Java virtual machine for loading.
The present scheme instead adds content to the class bytecode. No custom ClassLoader is needed to pre-process the class bytecode, so the watermark producer can add a software watermark to a JAR package more easily.
FIG. 1 is a schematic diagram of the steps of embedding a watermark in a JAR package according to an embodiment of the present application, which specifically include:
S1, reading the JAR package to obtain a plurality of class bytecode files. Specifically, tools in the Java API such as JarFile and DataInputStream can be used to read the class bytecode in the JAR package one by one, and to parse and organize it according to the class bytecode specification.
S2, encrypting the software watermark to be embedded into an encrypted character string. Specifically, from the input software watermark information and the encryption password, the watermark is encrypted with the DES symmetric encryption facility in the Java API to generate an encrypted character string.
S3, splitting the encrypted character string into a plurality of data segments, constructing data frames comprising the data segments, and embedding the data frames into a plurality of class bytecode files. Specifically, from the parsed and organized class bytecode, the classes in which the watermark is to be embedded are selected according to certain rules; the encrypted string is split, the split segments are labeled, and they are embedded as data frames into the selected class bytecode. The data frames may be embedded in the constant pool of the class bytecode file or at the end of the class bytecode file. Embedding at these positions does not change the original content of the class bytecode file. When the Java virtual machine reads the class bytecode file, it processes the file sequentially according to the compilation rules and filters out bytes (content) that have no effect but conform to the compilation specification.
S4, integrating the embedded class bytecode files, and generating a JAR package embedded with the software watermark according to the original package directory structure.
Watermark reading is the reverse of the above method. Specifically, the JarFile and DataInputStream tools in the Java API are used to read and parse the class bytecode in the JAR package one by one according to the class bytecode specification, and the data frames that lie outside the original class bytecode data are obtained. The frames are sorted and reassembled by sequence number and decrypted with the encryption password to finally obtain the complete watermark.
FIG. 2 is a flow chart of embedding and extracting watermarks in a JAR package according to another embodiment of the application. After the JAR package is read and parsed, it is determined whether the task is a watermark embedding task. For a watermark embedding task, the watermark embedding process is executed, which specifically comprises the following steps:
1. Reading the JAR package in which the software watermark is to be embedded
Java is an open-source computer language with large user and maintenance communities and an increasingly rich set of low-level API (Application Programming Interface) functions, through which application developers can easily have the operating system execute various commands. In this embodiment, JarFile, JarEntry and DataInputStream in the Java API are used to read the class bytecode in the JAR package. All bytes of a class bytecode file are available through these APIs; FIG. 3 shows the bytes of a class bytecode file read in this embodiment.
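The reading step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name JarClassReader and the method readClassEntries are hypothetical, and InputStream.readAllBytes assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: collects the raw bytes of every .class entry in a JAR.
class JarClassReader {
    /** Returns entry name -> raw bytes for each class file, in archive order. */
    static Map<String, byte[]> readClassEntries(Path jarPath) {
        Map<String, byte[]> classes = new LinkedHashMap<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // Skip directories and non-class resources such as MANIFEST.MF.
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    classes.put(entry.getName(), in.readAllBytes());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return classes;
    }
}
```

Keeping the entry names as map keys preserves the package directory structure, which is needed again when the JAR package is rewritten.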
2. Organizing the data according to the class bytecode specification
The class bytecode file stores data in a structure similar to a C language struct, which holds two kinds of data: unsigned numbers and tables. Unsigned numbers are the most basic data type; u1, u2, u4 and u8 denote unsigned numbers of 1, 2, 4 and 8 bytes respectively, and they can describe numbers, index references, quantity values, or string values encoded in UTF-8. A table is a composite data structure made up of unsigned numbers or other tables, and table names conventionally end with "_info". A typical class bytecode file can be divided into ten parts: magic number, version, constant_pool, access_flags, this_class, super_class, interfaces, fields, methods and attributes. Each part has its own specification, summarized in the following table:
Type            Name                                  Description
u4              magic                                 Magic number
u2              minor_version                         Minor version number
u2              major_version                         Major version number
u2              constant_pool_count                   Number of constants in the constant pool
cp_info         constant_pool[constant_pool_count-1]  Constant pool
u2              access_flags                          Access flags
u2              this_class                            Class index
u2              super_class                           Parent class index
u2              interfaces_count                      Number of interface indexes
u2              interfaces[interfaces_count]          Interface content
u2              fields_count                          Number of fields in the field table
field_info      fields[fields_count]                  Field table
u2              methods_count                         Number of methods in the method table
method_info     methods[methods_count]                Method table
u2              attributes_count                      Number of attributes in the attribute table
attribute_info  attributes[attributes_count]          Attribute table
The detailed specification of class bytecode is available at https://docs.oracle.com/javase/specs/jvms/se8/html/index.html
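As a small illustration of the layout in the table, the first four fields of a class file can be read with DataInputStream, whose readInt and readUnsignedShort methods match the big-endian u4 and u2 types. The class name ClassHeader is hypothetical and the sketch stops before the constant pool.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical holder for the fixed-size fields at the start of a class file.
class ClassHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    ClassHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses magic (u4), minor_version (u2), major_version (u2) and constant_pool_count (u2). */
    static ClassHeader parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file: bad magic number");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int constantPoolCount = in.readUnsignedShort();
            return new ClassHeader(minor, major, constantPoolCount);
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated class file", e);
        }
    }
}
```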
Once the content of a class bytecode file has been read, the information of the original Java class before compilation is known, including the package name, class name, access control modifiers, field names, field types, field values, method names, method return types, method code blocks, specific line numbers and other attributes. The dependencies among classes can then be roughly sorted out from the inheritance, implementation and call relations between them. FIG. 4 is a diagram of the call relations between classes in the Java code of this embodiment.
The number of times each class is called is computed from the call relations between classes: if class A is called by class B, the call count of class A is increased by 1. The call counts are sorted in descending order, and the classes in the JAR package are divided into important classes and auxiliary classes according to their call counts. In this embodiment, the division is based on the average call count: the accumulated call count of all classes is divided by the number of classes to obtain an average a (if the result is less than 1, a is taken as 1); classes called more than a times are defined as important classes, and classes called fewer than a times are defined as auxiliary classes.
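The averaging rule can be sketched as below. The class name ClassRanker is hypothetical, the call counts are assumed to have been computed already, and because the text does not say how a class called exactly a times is treated, this sketch leaves such classes out of the important set as an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: picks the "important" classes from per-class call counts.
class ClassRanker {
    /** Average call count a, never less than 1 as described in the text. */
    static double averageCalls(Map<String, Integer> callCounts) {
        if (callCounts.isEmpty()) {
            return 1.0;
        }
        long total = 0;
        for (int count : callCounts.values()) {
            total += count;
        }
        return Math.max(1.0, (double) total / callCounts.size());
    }

    /** Classes whose call count is strictly greater than the average. */
    static Set<String> importantClasses(Map<String, Integer> callCounts) {
        double average = averageCalls(callCounts);
        Set<String> important = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : callCounts.entrySet()) {
            if (entry.getValue() > average) {
                important.add(entry.getKey());
            }
        }
        return important;
    }
}
```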
3. Encrypting the software watermark information
The software watermark information is treated as a meaningful character string, which may include the version, author, release time and so on. The watermark information is specified by the watermark producer. The watermark is encrypted with the Java DES API using an encryption password; the password is also set by the producer and must be recorded and stored elsewhere, otherwise the watermark information cannot be extracted later. In this embodiment, the software watermark information is "version-v 1.0.5 author-tom release time-2022-10-08:15", and the encrypted character string P is "2a87809cb621b5b098da8143deab1ecf39bf2e41bbcd524b8603a8754c9c4a9d0d37bdd9d 9f009e10e344ff30cb249c7d84f2327fedc00e"
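A possible shape of this step, using the standard javax.crypto API, is sketched below. The text does not state the cipher mode, the padding, or how the password becomes a DES key, so the choices here (DES/ECB/PKCS5Padding, the first 8 UTF-8 bytes of the password zero-padded as the key, lowercase hex output) are assumptions, and the class name WatermarkCipher is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;

// Hypothetical helper: DES-encrypts a watermark string to hex and back.
class WatermarkCipher {
    // Assumption: the key is the first 8 bytes of the password, zero-padded if shorter.
    private static SecretKey keyFrom(String password) throws GeneralSecurityException {
        byte[] keyBytes = Arrays.copyOf(password.getBytes(StandardCharsets.UTF_8), 8);
        return SecretKeyFactory.getInstance("DES").generateSecret(new DESKeySpec(keyBytes));
    }

    private static byte[] run(int mode, byte[] input, String password) {
        try {
            Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
            cipher.init(mode, keyFrom(password));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("DES operation failed", e);
        }
    }

    /** Encrypts the watermark and returns the ciphertext as a lowercase hex string. */
    static String encryptToHex(String watermark, String password) {
        byte[] cipherText = run(Cipher.ENCRYPT_MODE, watermark.getBytes(StandardCharsets.UTF_8), password);
        StringBuilder hex = new StringBuilder();
        for (byte b : cipherText) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Reverses encryptToHex; the hex string must have an even number of digits. */
    static String decryptFromHex(String hex, String password) {
        byte[] cipherText = new byte[hex.length() / 2];
        for (int i = 0; i < cipherText.length; i++) {
            cipherText[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(run(Cipher.DECRYPT_MODE, cipherText, password), StandardCharsets.UTF_8);
    }
}
```

Because the same password is needed for decryptFromHex, it has to be stored separately from the JAR package, as the text notes.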
4. Splitting the encrypted string P according to rules and composing the watermark data frames D
The encrypted string P is split according to the number C of important classes. Specifically, the C important classes may be divided into three groups, and the amount of data to be embedded in each important class is N = len(P)/(C/3), where len(P) is the length of the encrypted string P. The data embedding amount N must be even; if N is odd, it is increased by 1. Starting from the beginning of the encrypted string P, segments of N characters are cut off in turn to form watermark data frames D. The number of watermark data frames D must not exceed 255; if it does, the data embedding amount N is increased. If the number of watermark data frames is smaller than the number of classes in each group, the remaining classes are embedded cyclically with the watermark data frames already formed.
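One reading of the splitting rule is sketched below. Integer division, the lower bound of 1 on N, and growing N in steps of 2 when more than 255 frames would result are assumptions made to keep N even; the class name WatermarkSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes the per-class embedding amount N and splits P.
class WatermarkSplitter {
    /** N = len(P) / (C / 3), made even, then enlarged until at most 255 segments result. */
    static int segmentLength(int encryptedLength, int importantClassCount) {
        int classesPerGroup = Math.max(1, importantClassCount / 3);
        int n = Math.max(1, encryptedLength / classesPerGroup);
        if (n % 2 != 0) {
            n++; // N must be even
        }
        while ((encryptedLength + n - 1) / n > 255) {
            n += 2; // assumption: grow in even steps until the frame count fits one byte
        }
        return n;
    }

    /** Cuts the encrypted string into consecutive segments of n characters (the last may be shorter). */
    static List<String> split(String encrypted, int n) {
        List<String> segments = new ArrayList<>();
        for (int i = 0; i < encrypted.length(); i += n) {
            segments.add(encrypted.substring(i, Math.min(encrypted.length(), i + n)));
        }
        return segments;
    }
}
```

For a string of length 111 and 300 important classes, this gives 111 / 100 = 1, which is odd and therefore becomes 2.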
In this embodiment, the Java code of the software contains 300 important classes, and N = 1 is calculated from the encrypted string P. Because this is odd, it is automatically increased by 1 to 2, so the first two bytes of the encrypted string P are cut off to form a data frame. FIG. 5 is a schematic diagram of the format of a data frame in this embodiment. A typical data frame consists of a frame header, a frame trailer and a data part; in this embodiment the data part consists of a sequence number and a truncated segment of the encrypted string. The frame header indicates that the data is likely to be a watermark data frame and can be read in the data-frame reading manner. The two digits preceding the data part give the length of the data to be read, and 012A877E98 is the data part. The first byte of the data part (01) is the watermark sequence number, the second and third bytes (2A 87) are part of the watermark encrypted string, and the last two bytes (7E 98) are the CRC check code. In this embodiment the check code is generated from the three bytes "01 2A 87"; the generation rule is user-defined and may be any CRC algorithm commonly used in the prior art. The check code is used solely to verify whether the frame data is intact; if the check fails, the data is not treated as watermark information.
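A frame of this general shape can be sketched as follows. Several details are assumptions rather than the patent's format: the two-byte header value, the one-byte length field, the absence of a separate frame trailer, and the use of CRC-16/CCITT-FALSE as the check code (the text says the rule is user-defined, so this sketch is not expected to reproduce the 7E 98 value of the example). The class name WatermarkFrame is hypothetical.

```java
import java.util.Arrays;

// Hypothetical frame layout: HEADER | length (1 byte) | sequence (1 byte) | segment | CRC-16 (2 bytes).
class WatermarkFrame {
    // Assumed marker bytes ("WM"); the real header value is not given in the text.
    static final byte[] HEADER = {0x57, 0x4D};

    /** CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), used here only as an example. */
    static int crc16(byte[] data, int offset, int length) {
        int crc = 0xFFFF;
        for (int i = offset; i < offset + length; i++) {
            crc ^= (data[i] & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Builds a frame whose data part is sequence number + segment + CRC over both. */
    static byte[] build(int sequence, byte[] segment) {
        int dataLength = 1 + segment.length + 2;
        if (sequence < 0 || sequence > 255 || dataLength > 255) {
            throw new IllegalArgumentException("sequence or segment too large for one-byte fields");
        }
        byte[] frame = new byte[HEADER.length + 1 + dataLength];
        System.arraycopy(HEADER, 0, frame, 0, HEADER.length);
        int pos = HEADER.length;
        frame[pos++] = (byte) dataLength;
        int dataStart = pos;
        frame[pos++] = (byte) sequence;
        System.arraycopy(segment, 0, frame, pos, segment.length);
        pos += segment.length;
        int crc = crc16(frame, dataStart, 1 + segment.length);
        frame[pos++] = (byte) (crc >>> 8);
        frame[pos] = (byte) crc;
        return frame;
    }

    /** Returns the segment bytes, or null if the header, length or CRC does not match. */
    static byte[] parseSegment(byte[] frame) {
        int h = HEADER.length;
        if (frame.length < h + 4) {
            return null;
        }
        for (int i = 0; i < h; i++) {
            if (frame[i] != HEADER[i]) {
                return null;
            }
        }
        int dataLength = frame[h] & 0xFF;
        if (dataLength < 3 || frame.length != h + 1 + dataLength) {
            return null;
        }
        int dataStart = h + 1;
        int crcPos = dataStart + dataLength - 2;
        int stored = ((frame[crcPos] & 0xFF) << 8) | (frame[crcPos + 1] & 0xFF);
        if (stored != crc16(frame, dataStart, dataLength - 2)) {
            return null; // not watermark data, or corrupted
        }
        return Arrays.copyOfRange(frame, dataStart + 1, crcPos);
    }
}
```

Returning null on a failed check mirrors the rule in the text that data with an abnormal check code is not treated as watermark information.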
5. Embedding watermark data frames into important classes
In this embodiment, the watermark data frame is embedded by adding a CONSTANT_Utf8 constant at the end of the constant pool of an important class. The first byte of the constant is the tag identifying the CONSTANT_Utf8 type, the next two bytes give the length of the constant's data, and the stored data follows immediately. More than one CONSTANT_Utf8 constant may be added. FIG. 6 is a schematic diagram of the constant pool data structure in this embodiment, in which positions 1 to 6, indicated by red arrows, can all be used for constant insertion. The positions differ, however, because the position of each constant is fixed once the Java compiler has finished compiling: the field, method, class and constant references in the constant pool, and those in later information such as methods and interfaces, all record the originally compiled positions. When a constant is inserted at positions 1 to 5, these indexes must be reset in accordance with the class bytecode specification, otherwise the class bytecode cannot be loaded normally by the Java virtual machine. When a constant is inserted at position 6, the other constant references are not affected. Several constants may be inserted at each position. After insertion, constant_pool_count must be updated to the original number of constants plus the number of inserted constants; otherwise the Java virtual machine cannot correctly load the constant pool and the data that follows it.
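Appending at the end of the constant pool (position 6 in the text) can be sketched as follows. The entry sizes per tag follow the class file specification; the class name ConstantPoolAppender is hypothetical, and storing the frame as ASCII text inside the CONSTANT_Utf8 entry is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: appends one CONSTANT_Utf8 entry at the end of the constant pool.
class ConstantPoolAppender {
    private static int u2(byte[] b, int i) {
        return ((b[i] & 0xFF) << 8) | (b[i + 1] & 0xFF);
    }

    /** Returns the offset of the first byte after the constant pool (where access_flags starts). */
    static int constantPoolEnd(byte[] classBytes) {
        int count = u2(classBytes, 8); // constant_pool_count sits after magic and the two version fields
        int pos = 10;
        for (int i = 1; i < count; i++) {
            int tag = classBytes[pos] & 0xFF;
            switch (tag) {
                case 1: pos += 3 + u2(classBytes, pos + 1); break;            // Utf8: tag, u2 length, bytes
                case 3: case 4: case 9: case 10: case 11: case 12:
                case 17: case 18: pos += 5; break;                            // 4-byte payloads
                case 5: case 6: pos += 9; i++; break;                         // Long/Double occupy two slots
                case 7: case 8: case 16: case 19: case 20: pos += 3; break;   // 2-byte payloads
                case 15: pos += 4; break;                                     // MethodHandle
                default: throw new IllegalArgumentException("unknown constant tag " + tag);
            }
        }
        return pos;
    }

    /** Inserts a CONSTANT_Utf8 entry after the last constant and increments constant_pool_count. */
    static byte[] appendUtf8(byte[] classBytes, String asciiText) {
        byte[] text = asciiText.getBytes(StandardCharsets.US_ASCII);
        int count = u2(classBytes, 8);
        if (count >= 0xFFFF || text.length > 0xFFFF) {
            throw new IllegalArgumentException("constant pool or text too large");
        }
        int end = constantPoolEnd(classBytes);
        byte[] out = new byte[classBytes.length + 3 + text.length];
        System.arraycopy(classBytes, 0, out, 0, end);
        out[8] = (byte) ((count + 1) >>> 8);
        out[9] = (byte) (count + 1);
        out[end] = 1;                                   // CONSTANT_Utf8 tag
        out[end + 1] = (byte) (text.length >>> 8);
        out[end + 2] = (byte) text.length;
        System.arraycopy(text, 0, out, end + 3, text.length);
        System.arraycopy(classBytes, end, out, end + 3 + text.length, classBytes.length - end);
        return out;
    }
}
```

Because the new entry takes the highest index, no existing constant pool index changes, which is why only constant_pool_count has to be rewritten.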
In another embodiment, the watermark data frame is embedded at the end of the class bytecode: the data frame is reversed and then appended to the end of the file. In this embodiment, the JAR package to be processed contains 300 important classes and several auxiliary classes. The number of segments into which the encrypted string is split can be calculated as in step 4; here, based on the 300 important classes, the encrypted string is divided into 2 parts, the first half being the first segment and the second half being the second segment. FIG. 7 is a schematic diagram of a reversed data frame in this embodiment. The trailing data frame largely differs from the data frame inserted in the constant pool, and the data frame formed from the second segment of the encrypted string P can be obtained for splicing.
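The byte-level reversal and appending can be sketched as follows. The class name TailEmbedder is hypothetical, and the reader is assumed to know the frame length in advance. The sketch covers only the byte manipulation and does not check whether a particular Java virtual machine accepts the resulting file.

```java
import java.util.Arrays;

// Hypothetical helper: stores a data frame reversed at the end of the class bytes.
class TailEmbedder {
    /** Returns a reversed copy of the given bytes. */
    static byte[] reversed(byte[] bytes) {
        byte[] out = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[bytes.length - 1 - i];
        }
        return out;
    }

    /** Appends the reversed frame after the last byte of the class file content. */
    static byte[] appendReversed(byte[] classBytes, byte[] frame) {
        byte[] out = Arrays.copyOf(classBytes, classBytes.length + frame.length);
        byte[] tail = reversed(frame);
        System.arraycopy(tail, 0, out, classBytes.length, tail.length);
        return out;
    }

    /** Reads frameLength bytes from the end of the file and restores the original byte order. */
    static byte[] readTailFrame(byte[] fileBytes, int frameLength) {
        int start = fileBytes.length - frameLength;
        return reversed(Arrays.copyOfRange(fileBytes, start, fileBytes.length));
    }
}
```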
6. Embedding watermark data frames into auxiliary classes
The encrypted character string is divided into two parts, each of which is formed into a data frame, and each data frame is appended to the end of the bytecode of an auxiliary class. The data frame headers are reset: the first frame header is changed to the data embedding amount N+1, and the second frame header is changed to the data embedding amount N-1.
7. Rewriting new JAR packets
The JAR package is rewritten according to the package names of the classes, using JarOutputStream from the Java API. The embedded data cannot be seen when the rewritten class bytecode is decompiled, for example when it is opened in IDEA for viewing.
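A minimal sketch of the rewriting step is shown below. It reads every entry with ZipInputStream, applies a caller-supplied transformation to entries whose names end in ".class", and writes all entries back under their original names with JarOutputStream. The class JarRewriter, its method names, and the in-memory byte array interface are assumptions chosen to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper that rewrites a JAR held in memory.
final class JarRewriter {

    /** Reads all entries (name to bytes) in their original order. */
    static Map<String, byte[]> readEntries(byte[] jar) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(jar))) {
            byte[] chunk = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                int n;
                while ((n = in.read(chunk)) != -1) {      // -1 marks the end of this entry
                    body.write(chunk, 0, n);
                }
                entries.put(entry.getName(), body.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** Writes the entries into a new JAR, keeping each entry's path. */
    static byte[] writeJar(Map<String, byte[]> entries) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(bos)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new JarEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();                         // stream is closed at this point
    }

    /** Applies classTransform to every .class entry and rebuilds the JAR. */
    static byte[] rewrite(byte[] jar, UnaryOperator<byte[]> classTransform) {
        Map<String, byte[]> entries = readEntries(jar);
        entries.replaceAll((name, body) ->
                name.endsWith(".class") ? classTransform.apply(body) : body);
        return writeJar(entries);
    }
}
```

Because the entry name encodes the package path (for example a/B.class for class a.B), reusing the original names keeps each class in its package. Non-class entries, including any manifest, are copied through unchanged.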
The process of reading the watermark from a watermarked JAR package comprises the following steps:
Step 1 of reading the JAR package is repeated to obtain all class bytecode. A data frame is read from the constant pool or the tail of each class bytecode and parsed according to the data frame format; the CRC code is checked, and the data is discarded if the CRC code is wrong. The other parts of each data frame are stripped away, and only the data part is kept and grouped by number. The number of watermark data frames is calculated according to the grouping rule of the class dependency call relations obtained in step 2, and the encrypted string P is spliced together.
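The parse, check, and splice sequence can be sketched as below. The frame layout used here is an assumption made purely for illustration: a 2-byte sequence number, a 2-byte data length, the data bytes, and a 4-byte CRC-32 over everything before it. The class FrameCodec and the ASCII encoding of the segments are likewise hypothetical; java.util.zip.CRC32 is the standard library checksum.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical frame layout: [u2 sequence][u2 length][data][u4 CRC-32 of the preceding bytes].
final class FrameCodec {

    private static int crcOf(byte[] bytes, int length) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, length);
        return (int) crc.getValue();
    }

    /** Builds one frame for the given sequence number and data segment. */
    static byte[] encode(int seq, String segment) {
        byte[] data = segment.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(4 + data.length + 4);
        buf.putShort((short) seq).putShort((short) data.length).put(data);
        buf.putInt(crcOf(buf.array(), 4 + data.length));
        return buf.array();
    }

    /** Drops malformed or corrupted frames, orders the rest by sequence number, and joins the data. */
    static String splice(List<byte[]> frames) {
        TreeMap<Integer, String> bySeq = new TreeMap<>();
        for (byte[] frame : frames) {
            if (frame.length < 8) continue;                       // too short to hold header and CRC
            ByteBuffer buf = ByteBuffer.wrap(frame);
            int seq = buf.getShort(0) & 0xFFFF;
            int len = buf.getShort(2) & 0xFFFF;
            if (frame.length != 8 + len) continue;                // length field does not match
            if (buf.getInt(4 + len) != crcOf(frame, 4 + len)) continue;  // CRC mismatch: discard
            // Redundant copies of the same segment are grouped under one sequence number.
            bySeq.putIfAbsent(seq, new String(frame, 4, len, StandardCharsets.US_ASCII));
        }
        StringBuilder encrypted = new StringBuilder();
        for (String segment : bySeq.values()) {
            encrypted.append(segment);
        }
        return encrypted.toString();
    }
}
```

Keeping only the first valid copy per sequence number means that redundant frames read from different classes collapse into one segment, so the string can still be rebuilt when some classes are missing or damaged.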
The spliced watermark encrypted string is decrypted with the Java DES encryption API, using the separately stored JAR package watermark encryption password, to finally obtain the original watermark information.
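The decryption step might look like the following sketch, which uses the standard javax.crypto API. Several details are assumptions, since the application does not state them here: the cipher transformation DES/ECB/PKCS5Padding, the derivation of the 8-byte key by truncating or zero-padding the UTF-8 bytes of the password, and the Base64 text form of the encrypted string. The class DesWatermarkCipher is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;

// Hypothetical helper: DES round trip between watermark text and an encrypted string.
final class DesWatermarkCipher {

    private static Cipher cipher(int mode, String password) throws GeneralSecurityException {
        // Assumption: the key is the first 8 bytes of the password, zero-padded if shorter.
        byte[] keyBytes = Arrays.copyOf(password.getBytes(StandardCharsets.UTF_8), 8);
        SecretKey key = SecretKeyFactory.getInstance("DES").generateSecret(new DESKeySpec(keyBytes));
        Cipher c = Cipher.getInstance("DES/ECB/PKCS5Padding");   // assumed mode and padding
        c.init(mode, key);
        return c;
    }

    /** Encrypts the watermark and returns the ciphertext as Base64 text. */
    static String encrypt(String watermark, String password) {
        try {
            byte[] ct = cipher(Cipher.ENCRYPT_MODE, password)
                    .doFinal(watermark.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ct);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts the spliced Base64 string back into the original watermark. */
    static String decrypt(String encrypted, String password) {
        try {
            byte[] pt = cipher(Cipher.DECRYPT_MODE, password)
                    .doFinal(Base64.getDecoder().decode(encrypted));
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, DesWatermarkCipher.decrypt(DesWatermarkCipher.encrypt("owner=ACME", "s3cr3t-pw"), "s3cr3t-pw") returns the original text, while a wrong password normally makes decrypt fail with an exception rather than return the watermark.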
If the data embedding quantity N calculated from the class dependency call relations in step 2 is inconsistent with the number of watermark data frames, the tail data frames in the subordinate classes can be read instead; only the two data frames with different frame headers need to be read and spliced to obtain the encrypted string P, after which decryption continues. The two reading modes can run in parallel, and the final result may be taken from either one, as long as the encrypted string can be extracted and correctly decrypted.
Fig. 8 is a schematic structural diagram of a system 800 for embedding a software watermark in a JAR packet according to a third aspect of the present application, which includes:
the JAR package parsing module 801 is configured to read a JAR package to obtain a plurality of class byte code files;
a software watermark encrypting module 802 configured to encrypt a software watermark to be embedded into an encrypted string;
a software watermark embedding module 803 configured to split the encrypted string into a plurality of data segments, construct a data frame comprising the data segments, and embed the data frame into a plurality of class bytecode files;
the JAR packet generation module 804 is configured to integrate the embedded multiple class bytecode files to generate a JAR packet with embedded software watermarks.
In another specific embodiment, according to a fourth aspect of the present application, a computer readable storage medium is presented, storing a computer program which, when executed by a processor, implements any of the methods of the first aspect of the present application.
In another embodiment, the effect of the scheme of the application was verified experimentally. Specifically, a third-party dependency library package, the common-io.common-io.2.6 tool package containing 157 class bytecode files, was randomly selected, and the watermark was embedded using the method of this embodiment. When the written JAR package is unchanged, the watermark information can be extracted quickly. When classes are deleted from the JAR package, the verification results are as follows:
1. 1 class is randomly deleted 10 times, and watermark extraction is performed again; the probability of extracting the watermark is 100%.
2. 10 classes are randomly deleted 10 times, and watermark extraction is performed again; the probability of extracting the watermark is 100%.
3. 30 classes are randomly deleted 10 times, and watermark extraction is performed again; the probability of extracting the watermark is 90%.
4. 50 classes are randomly deleted 10 times, and watermark extraction is performed again; the probability of extracting the watermark is 60%.
5. 100 classes are randomly deleted 10 times, and watermark extraction is performed again; the probability of extracting the watermark is 10%.
6. When all important classes are deleted and half of the subordinate classes are randomly deleted 10 times, the probability of extracting the watermark is 90%.
7. When all subordinate classes are deleted and half of the important classes are randomly deleted 10 times, the probability of extracting the watermark is 50%.
Therefore, the scheme of this embodiment can effectively identify behaviors such as illegal copying and misappropriation, and the more completely the JAR package is preserved, the higher the probability of extracting the software watermark with this scheme.
While the present application has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the application as defined by the appended claims.

Claims (13)

1. A method of embedding a software watermark in a JAR packet, comprising the steps of:
reading the JAR package to obtain a plurality of class byte code files;
encrypting the software watermark to be embedded into an encrypted character string;
splitting the encryption character string into a plurality of data segments, constructing a data frame comprising the data segments, and embedding the data frame into the plurality of class byte code files;
and integrating the embedded multiple class byte code files to generate a JAR package embedded with the software watermark.
2. The method of embedding a software watermark in a JAR packet according to claim 1, wherein said plurality of class bytecode files are classified into important classes and subordinate classes, and said encrypted string is split into at least 3 data segments, which are then embedded in the important class bytecode files in a scattered manner.
3. A method of embedding a software watermark in a JAR packet according to claim 2, wherein the data frame further comprises a sequence number for indicating the order of the data segments in the encrypted string.
4. A method of embedding a software watermark in a JAR packet according to claim 3, wherein the data frame is embedded in a constant pool of class bytecode files.
5. A method of embedding a software watermark in a JAR packet according to claim 3 wherein the data frame is inverted and embedded at the end of a class bytecode file.
6. The method for embedding a software watermark in a JAR packet according to claim 2, wherein the encrypted string is split into a plurality of segments, which are then embedded at the ends of a plurality of subordinate class bytecode files, respectively.
7. A method of embedding a software watermark in a JAR packet according to claim 1, wherein the end of the data frame further comprises a CRC check code.
8. The method for embedding a software watermark in a JAR packet according to claim 2, wherein the plurality of class bytecode files are classified according to a number of calls, wherein the important class is a class bytecode file having a number of calls greater than an average number of calls in the JAR packet, and the subordinate class is a class bytecode file having a number of calls less than the average number of calls in the JAR packet.
9. The method of embedding a software watermark in a JAR packet according to claim 1, further comprising performing a code obfuscation process on the original code of the JAR packet.
10. A method for extracting a software watermark from a JAR packet, comprising the steps of:
reading the JAR package to obtain a plurality of class byte code files;
reading data frames, each comprising a data segment split from an encrypted string, and splicing the data segments into the encrypted string;
and decrypting the encrypted character string according to the encrypted password.
11. A method of extracting a software watermark in a JAR packet according to claim 10, wherein the location at which the data frame is read is a constant pool or end of a class bytecode file.
12. A system for embedding a software watermark in a JAR packet, comprising:
the JAR package analysis module is configured to read the JAR package and obtain a plurality of class byte code files;
a software watermark encrypting module configured to encrypt a software watermark to be embedded into an encrypted string;
the software watermark embedding module is configured to split the encrypted character string into a plurality of data segments, construct a data frame comprising the data segments, and embed the data frame into the plurality of class byte code files;
and the JAR packet generation module is configured to integrate the embedded multiple class byte code files to generate a JAR packet embedded with the software watermark.
13. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 9.
CN202310555407.XA 2023-05-17 2023-05-17 Method, system and storage medium for embedding and extracting software watermark in JAR package Pending CN116611032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310555407.XA CN116611032A (en) 2023-05-17 2023-05-17 Method, system and storage medium for embedding and extracting software watermark in JAR package

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310555407.XA CN116611032A (en) 2023-05-17 2023-05-17 Method, system and storage medium for embedding and extracting software watermark in JAR package

Publications (1)

Publication Number Publication Date
CN116611032A true CN116611032A (en) 2023-08-18

Family

ID=87674054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310555407.XA Pending CN116611032A (en) 2023-05-17 2023-05-17 Method, system and storage medium for embedding and extracting software watermark in JAR package

Country Status (1)

Country Link
CN (1) CN116611032A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117194490A (en) * 2023-11-07 2023-12-08 长春金融高等专科学校 Financial big data storage query method based on artificial intelligence
CN117194490B (en) * 2023-11-07 2024-04-05 长春金融高等专科学校 Financial big data storage query method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination