CN117689909A

CN117689909A - Digital collection detection method, digital collection issuing method and digital collection issuing device

Info

Publication number: CN117689909A
Application number: CN202311803638.4A
Authority: CN
Inventors: 毛嘉宇; 范瑞彬; 张开翔; 张龙; 王越
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2023-12-25
Filing date: 2023-12-25
Publication date: 2024-03-12

Abstract

The embodiments of the application disclose a digital collection detection method, a digital collection issuing method and a digital collection issuing device. The digital collection detection method comprises the following steps: according to at least two target feature vector extraction models, performing feature vector extraction on the digital collection to be issued and on at least two target comparison graphs respectively, so as to obtain at least two target feature vectors and a plurality of comparison feature vectors; performing similarity calculation on the at least two target feature vectors and the plurality of comparison feature vectors respectively to obtain a target similarity matrix comprising a plurality of similarity values; and performing anti-plagiarism detection on the uplink information of the digital collection to be issued against all issued digital collections to obtain a detection result for the digital collection to be issued. On the premise that the content picture of the digital collection is not disclosed, the method and the device can improve the accuracy of anti-plagiarism detection and reduce the consumption of storage space on the blockchain.

Description

Digital collection detection method, digital collection issuing method and digital collection issuing device

Technical Field

The application relates to the technical field of digital collection; in particular to a digital collection detection method, a digital collection issuing method and a digital collection issuing device.

Background

With the emergence of blockchains, a digital collection industry based on blockchain technology has also arisen. As a digital publication, a digital collection can be issued on the chain only after passing anti-plagiarism detection, and can then be circulated and sold. Digital collections on the blockchain include public digital collections and hidden digital collections.

For public digital collections, anti-plagiarism detection generally adopts image similarity detection: the disclosed picture of the digital collection to be issued is compared for similarity, one by one, with the pictures of all digital collections already issued on the blockchain, and if the similarity between pictures is greater than a threshold value, the digital collection is judged to be plagiarized. For hidden digital collections, the privacy requirement means the picture content cannot be disclosed, so image similarity cannot be used. The conventional anti-plagiarism detection method for hidden digital collections generally adopts hash value comparison: the hash value of the digital collection to be issued is compared with the hash values of all digital collections issued on the blockchain; if an identical hash value exists, the digital collection is considered to already exist and the result is returned as plagiarism; otherwise, the digital collection is considered free of plagiarism.

However, detection methods based on image similarity and on hash value comparison can both be deceived: a slight adjustment of the original picture makes the hash values completely different, and a specifically targeted fine adjustment of the original picture makes the image feature vectors differ significantly. In either case the anti-plagiarism detection is evaded and the detection accuracy is reduced.

Disclosure of Invention

In order to solve the technical problems, the embodiment of the application provides a digital collection detection method, a digital collection issuing method and a digital collection issuing device, which can not only improve the accuracy of anti-plagiarism detection, but also reduce the consumption of storage space on a blockchain on the premise of not disclosing the content pictures of the digital collection.

According to an aspect of the embodiments of the present application, there is provided a digital collection detection method, the method including: constructing an intelligent on-shelf contract for digital collections according to a plurality of feature vector extraction models for acquiring feature vectors of digital collections; according to at least two target feature vector extraction models selected from the intelligent on-shelf contract, performing feature vector extraction on the digital collection to be issued and on at least two target comparison graphs respectively to obtain at least two target feature vectors and a plurality of comparison feature vectors, wherein the at least two target comparison graphs are comparison images corresponding to the digital collection to be issued; performing similarity calculation on the at least two target feature vectors and the plurality of comparison feature vectors respectively to obtain a target similarity matrix comprising a plurality of similarity values; and performing anti-plagiarism detection on the uplink information of the digital collection to be issued against all issued digital collections to obtain a detection result for the digital collection to be issued, wherein the uplink information of the digital collection to be issued comprises a first parameter information set of the at least two target comparison graphs, a second parameter information set of the at least two target feature vector extraction models, and the target similarity matrix.

According to an aspect of the embodiments of the present application, there is provided a digital collection issuing method, including: acquiring an issuing request of a digital collection to be issued; obtaining a detection result of the digital collection to be issued according to the digital collection detection method; if the detection result is that the plagiarism exists, refusing the release request of the digital collection to be released; and if the detection result is that no plagiarism exists, issuing the digital collection.

According to an aspect of the embodiments of the present application, there is provided a digital collection detection device, the device including: an on-shelf contract construction module, used for constructing an intelligent on-shelf contract for digital collections according to a plurality of feature vector extraction models for acquiring feature vectors of digital collections; a feature vector extraction module, used for performing feature vector extraction on the digital collection to be issued and on at least two target comparison graphs respectively, according to at least two target feature vector extraction models selected from the intelligent on-shelf contract, to obtain at least two target feature vectors and a plurality of comparison feature vectors, wherein the at least two target comparison graphs are comparison images corresponding to the digital collection to be issued; a similarity calculation module, used for performing similarity calculation on the at least two target feature vectors and the plurality of comparison feature vectors respectively to obtain a target similarity matrix comprising a plurality of similarity values; and an anti-plagiarism detection module, used for performing anti-plagiarism detection on the uplink information of the digital collection to be issued against all issued digital collections to obtain a detection result for the digital collection to be issued, wherein the uplink information of the digital collection to be issued comprises a first parameter information set of the at least two target comparison graphs, a second parameter information set of the at least two target feature vector extraction models, and the target similarity matrix.

According to an aspect of the embodiments of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a digital collection detection method as in the above technical solution.

According to an aspect of the embodiments of the present application, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the executable instructions to implement the digital collection detection method as in the above technical solution.

According to an aspect of the embodiments of the present application, there is provided a computer program product, including a computer program, which when executed by a processor implements a digital collection detection method as in the above technical solution.

The technical solution provided by the present application has at least the following beneficial effects:

(1) The target similarity matrix, obtained from the feature vectors of the digital collection to be issued and the feature vectors of the comparison images, serves as the basis for anti-plagiarism detection of the digital collection. A sample similarity matrix is likewise obtained for each issued digital collection from its feature vectors and the feature vectors of the comparison images, and the anti-plagiarism detection result of the digital collection to be issued is obtained by comparing the target similarity matrix with all the sample similarity matrices one by one. Because the similarity matrix used for detection combines a plurality of comparison graphs with a plurality of feature vector extraction models, the accuracy of anti-plagiarism detection is improved. Moreover, detection requires disclosing only the similarity matrix between the digital collection to be issued and the comparison images; that is, the accuracy of anti-plagiarism detection can be improved without disclosing the content picture of the digital collection, so the method can be widely applied to anti-plagiarism detection of hidden digital collections.

(2) In the present application, the similarity matrix between the digital collection to be issued and the plurality of comparison graphs, the download information of the plurality of comparison graphs and the download information of the plurality of feature vector extraction models are stored in the blockchain as the uplink information of the digital collection to be issued. Compared with the prior art, in which the picture or the feature vector of the digital collection is used as the uplink information, the number of bytes of uplink information is greatly reduced, which reduces the storage space consumed on the blockchain.
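By way of a non-limiting illustration of beneficial effects (1) and (2), the following minimal Python sketch builds a similarity matrix in which each row corresponds to one feature vector extraction model and each column to one comparison image. The use of cosine similarity, the toy extractors `model_a` and `model_b`, and the list-of-pixels image representation are assumptions made for the sketch only; the application text above does not fix the similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(models, target_image, comparison_images):
    """Entry [i][j] compares the target with comparison image j under model i."""
    matrix = []
    for extract in models:
        target_vector = extract(target_image)
        row = [cosine_similarity(target_vector, extract(image))
               for image in comparison_images]
        matrix.append(row)
    return matrix


# Hypothetical stand-ins for two heterogeneous extraction models.
def model_a(image):
    # Uses the raw pixel values as the feature vector.
    return [float(p) for p in image]


def model_b(image):
    # Uses differences between neighbouring pixels as the feature vector.
    return [float(image[i + 1] - image[i]) for i in range(len(image) - 1)]


target = [1, 2, 3, 4]
comparisons = [[1, 2, 3, 4], [4, 3, 2, 1]]
matrix = similarity_matrix([model_a, model_b], target, comparisons)
for row in matrix:
    print([round(value, 4) for value in row])
```

With M models and N comparison images, only M × N similarity values need to be recorded, regardless of how long the individual feature vectors are; the picture itself never appears in the matrix.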

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:

Fig. 1 is a schematic flow chart of a digital collection detection method according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a deep learning algorithm model according to an embodiment of the present application;

Fig. 3 is a schematic diagram illustrating an exemplary flow of step S40 in Fig. 1;

Fig. 4 is a schematic flow chart of a first digital collection issuing method according to an embodiment of the present application;

Fig. 5 is a schematic flow chart of a second digital collection issuing method according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of an image similarity comparison method according to an embodiment of the present application;

Fig. 7 is a schematic flow chart of a digital collection detection device according to an embodiment of the present application;

Fig. 8 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

It should also be noted that "a plurality" in this application means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.

The following explains the terms referred to in the present application:

Non-fungible token (NFT): an NFT is a unit of data recorded on a blockchain digital ledger; each token represents a unique piece of digital data and serves as an electronic certificate or credential of ownership of a virtual commodity. Because they are not interchangeable, non-fungible tokens may represent digital assets such as drawings, artwork, sound, movies, items in a game, or other forms of creative work. While the work itself is infinitely reproducible, the tokens representing it can be tracked entirely on their underlying blockchain, thus providing proof of ownership for the buyer.

Digital collectibles (Digital Collections): these can be understood as regulated NFTs with Chinese characteristics. They are specific works, artworks and commodities, such as digital drawings, pictures, music, videos and 3D models, that are uniquely identified using blockchain technology. Each digital collection maps to a unique sequence on a particular blockchain and cannot be tampered with, divided or interchanged. On the basis of protecting digital copyright, digital collections enable authentic and trustworthy digital issuing, purchasing, collecting and use.

Domestic digital collections differ from foreign NFTs in three respects. First, foreign NFTs are based on public chains, which are open to all: anyone can participate, read data and send transactions. The foremost feature of foreign NFTs is that they are not managed, controlled or supervised by any person or organization. Domestic digital collections, by contrast, are based on alliance chains; many blockchains and alliance chains are infrastructure built by governments, and the alliance chains are managed by the state. Second, regarding issued content, foreign NFTs do not undergo copyright review, whereas a compliant domestic digital collection must pass content review before it can be issued on the chain. The digital collection is positioned as a digital publication, and it can be circulated and sold only after being published and issued. Third, a foreign NFT tokenizes a work or a virtual thing and does not transfer the value of a real digital creative work or digital copyrighted work.

Blind box: a boxed commodity whose specific product style the consumer cannot know in advance, so that it has a random attribute; it is currently a popular entertainment consumer product. When buying a blind box, the consumer does not know the specific details (including parameters, appearance, characters and the like) of its content, but knows that the content is necessarily one of the items (namely potential prizes) in a prize pool published in advance by the selling merchant. The selling merchant may package the different prizes in the prize pool into a specific number of blind boxes in advance, with different probabilities according to the value of the prizes, so that the number and proportion of prizes of each value grade among all the blind boxes on sale follow the preset probabilities. The purchaser cannot see the specific prize until the blind box is opened.

Hidden digital collection: equivalent to a blind box, it generally refers to a digital collection whose content or rights and interests are not published. The content can be seen only after the collector purchases and opens it, and it may optionally remain in a hidden state even then. In recent years, blind box entertainment based on blockchain and digital collection technology has gradually become a popular form of online entertainment. Because blockchain and digital collection technologies are transparent, public and tamper-resistant, blind boxes based on them have gradually been accepted as an implementation that can ensure the fairness and credibility of blind boxes.

Before describing the embodiments of the present application, the technical problems mentioned in the background art are described herein:

1. Image similarity detection for public digital collections: at present, a plagiarist of a digital collection may use a specially designed deceptive image. By manipulating the features and structure of the image, the plagiarist makes the deceptive image look very similar to the original to the human eye while its calculated image vector has low similarity to that of the original, thereby deceiving the anti-plagiarism detection scheme. Having selected a particular collection to plagiarize, a malicious plagiarist can fine-tune the plagiarized image: for example, by using a graphics software library to finely adjust secondary feature points in the image, gradually modifying the image and recalculating its vector, until the similarity between the plagiarized picture and the original picture is smaller than the threshold value and the anti-plagiarism algorithm no longer detects it. Because the image vector model and algorithm are public, a malicious plagiarist can build an engineering pipeline that repeatedly fine-tunes the original image, recalculates the image vector and tests whether the similarity is smaller than the threshold value, iterating until it outputs a picture that evades detection. The resulting picture is highly similar to the original work in visual effect yet escapes detection by the algorithm.

2. Hash value comparison detection for hidden digital collections: owing to the nature of hash algorithms, changing even one byte of the original file produces a completely different hash value. For example, if a malicious issuer arbitrarily changes a single pixel of a plagiarized digital collection picture file, the hash value will be different, although the two pictures are in effect identical and the case still falls into the category of plagiarism.

In order to solve the above problems, the digital collection detection method provided by the present application specifically includes the following embodiments:
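The weakness of hash value comparison described in item 2 can be reproduced with a short Python sketch. SHA-256 and the synthetic byte string standing in for a picture file are assumptions chosen for the illustration; the text above does not name a particular hash algorithm.

```python
import hashlib

# Synthetic stand-in for the bytes of a picture file (assumption).
original = bytes(range(256)) * 4

# Flip a single bit in one byte, mimicking a one-pixel change.
tampered = bytearray(original)
tampered[10] ^= 0x01
tampered = bytes(tampered)

hash_original = hashlib.sha256(original).hexdigest()
hash_tampered = hashlib.sha256(tampered).hexdigest()

# Count how many of the 64 hexadecimal digits differ between the digests.
differing_digits = sum(a != b for a, b in zip(hash_original, hash_tampered))

print("hashes equal:", hash_original == hash_tampered)
print("differing hex digits:", differing_digits, "of", len(hash_original))
```

An exact-match comparison of the two digests therefore reports "no plagiarism" even though the two inputs differ by a single bit.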

Fig. 1 is a schematic flow chart of a digital collection detection method according to an embodiment of the present application; as shown in Fig. 1, the method specifically includes the following steps:

Step S10, constructing an intelligent on-shelf contract for digital collections according to a plurality of feature vector extraction models for acquiring feature vectors of digital collections.

The digital collections in this embodiment comprise public digital collections and hidden digital collections; that is to say, the digital collection detection method provided by the application is suitable for anti-plagiarism detection of both public digital collections and hidden digital collections.

It should be noted that the blockchain-based digital collection issuing method involves an issuer and an exchange. The issuer is the party that issues the digital collection, usually the project party providing it, and is responsible for issuing and promoting the digital collection. The exchange is mainly responsible for the anti-plagiarism judgment of the digital collection, for putting the digital collection on the shelf, and for providing intermediary services such as sales and promotion.

In this embodiment, the feature calculation algorithm for digital collections may be jointly determined by a plurality of digital collection issuers and exchanges. The feature calculation algorithm converts a digital collection file into a specific feature vector and can be described as v = f(nft), where f is the feature vector calculation algorithm, nft is the input (i.e. the digital collection file), and v is the output, a vector.

Hidden digital collections cannot be disclosed on public chains, or their source files cannot be disclosed; for example, in the actual business of some picture digital collections, only fragments or thumbnails of the digital collection are disclosed. However, disclosing the feature vector calculated by the feature vector algorithm does not reveal the private content of the digital collection, which helps the issuer protect privacy.

At present, various feature vector algorithms exist for the common media of digital collections, such as pictures, audio, video and text, and can be used for search and similarity analysis; the present application takes pictures as the example. Feature vector extraction models for picture-type digital collections include traditional computer graphics pattern recognition algorithms such as SIFT, ORB and perceptual hashing; deep learning algorithms are also widely used, such as those based on the convolutional neural networks RESNET, VGG and YOLO. For a deep learning algorithm, this embodiment can obtain the corresponding feature vector extraction model by slightly modifying the model.

Here, the present embodiment provides a more specific example of a feature vector extraction model: the VGG16 deep learning model, a convolutional neural network, is adopted to extract the feature vector of the picture. The VGG architecture not only achieved very high accuracy on the ILSVRC classification and localization tasks but is also applicable to other image recognition datasets, achieving excellent performance even when used only as a front-end (feature extraction) stage. As shown in fig. 2, in the deep learning algorithm model of the present embodiment, the spatial size is halved between blocks by a max pooling layer with stride = 2 and pool size = 2; inside each block, the kernel size is 3×3 so that the shape is kept uniform between convolutions.

The whole model consists of 5 VGG blocks and 5 max pooling layers connected alternately, followed by the fully connected (FC) layers and a final softmax output over 1000 classes. Because the original VGG model is a classifier whereas this embodiment uses it as a feature vector extractor, this embodiment discards the last 3 FC layers and the softmax layer, flattens the 7×7×512 output of the last pooling layer, and obtains a one-dimensional vector of length 25088.

The specific calculation process of the algorithm is as follows:

First, the picture is cropped to a 224×224 RGB image, which is fed as input into the following model calculation.

Layer 1 is the input layer: the input is a 224×224×3 three-channel image.

Layer 2 is a VGG block layer: the input is 224×224×3; convolution with 64 filters of kernel size 3×3, stride = 1 and padding = same yields a block layer of shape 224×224×64 (a VGG block is composed of convolution layers).

Layer 3 is a max pooling layer: the input is 224×224×64, and halving pooling with pool size = 2 and stride = 2 gives a pooled layer of size 112×112×64.

Layer 4 is a VGG block layer: the input size is 112×112×64, and convolution with 128 filters of 3×3×64 gives a block layer of 112×112×128.

Layer 5 is a max pooling layer: the input is 112×112×128, and halving pooling with pool size = 2 and stride = 2 gives a pooled layer of size 56×56×128.

Layer 6 is a VGG block layer: the input size is 56×56×128, and convolution with 256 filters of 3×3×128 gives a block layer of 56×56×256.

Layer 7 is a max pooling layer: the input is 56×56×256, and halving pooling with pool size = 2 and stride = 2 gives a pooled layer of size 28×28×256.

Layer 8 is a VGG block layer: the input size is 28×28×256, and convolution with 512 filters of 3×3×256 gives a block layer of 28×28×512.

Layer 9 is a max pooling layer: the input is 28×28×512, and halving pooling with pool size = 2 and stride = 2 gives a pooled layer of size 14×14×512.

Layer 10 is a VGG block layer: the input size is 14×14×512, and convolution with 512 filters of 3×3×512 gives a block layer of 14×14×512.

Layer 11 is a max pooling layer: the input is 14×14×512, and halving pooling with pool size = 2 and stride = 2 gives a pooled layer of size 7×7×512. A flatten operation is then performed on this output to obtain 7×7×512 = 25088 values, which form the vector v, i.e. the final output result.
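The layer-by-layer sizes listed above can be checked with a small shape-tracing sketch in plain Python. It tracks only tensor shapes, not weights or pixel values; the helper names `conv_block` and `max_pool` are hypothetical and introduced for the illustration.

```python
def conv_block(shape, filters):
    """3x3 convolutions with stride 1 and 'same' padding keep height and width."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, pool_size=2, stride=2):
    """Max pooling with pool size 2 and stride 2 halves height and width."""
    height, width, channels = shape
    return ((height - pool_size) // stride + 1,
            (width - pool_size) // stride + 1,
            channels)


shape = (224, 224, 3)          # layer 1: input layer
trace = [shape]
for filters in (64, 128, 256, 512, 512):
    shape = conv_block(shape, filters)   # layers 2, 4, 6, 8, 10
    trace.append(shape)
    shape = max_pool(shape)              # layers 3, 5, 7, 9, 11
    trace.append(shape)

vector_length = shape[0] * shape[1] * shape[2]   # flatten

for layer_number, layer_shape in enumerate(trace, start=1):
    print(f"layer {layer_number}: {layer_shape}")
print("flattened length:", vector_length)
```

The trace ends at 7×7×512, and flattening gives the length-25088 vector stated in the description.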

After negotiation and confirmation by a plurality of digital collection exchanges, the algorithm model is saved on the blockchain. The code and the model parameters are stored in blockchain distributed storage, such as IPFS, so that the running code is guaranteed to be untampered and publicly obtainable. The model information is then mapped to the download URI of the model through the intelligent on-shelf contract.

For example, the summary information of each feature vector extraction model stored in the intelligent on-shelf contract is as follows, taking the VGG16-ImageNET model as an example:

Model name: VGG16-ImageNET

Applicable digital collection type: picture

Applicable digital collection formats: gif, jpg, jpeg, bng, png

Model version: v1.0.0

Model operating environment: [Linux, Windows, macOS]

Operation language: Python 3.9.6

Model size: 1.1GB

URL download address: IPFS: xxxxxxxxxxxxx

Hash value: YYYYYYYYYYYYYYYYYYYYYYYYYYYY

In this embodiment, the plurality of feature vector extraction models included in the intelligent on-shelf contract may be negotiated by a plurality of related exchanges and then uploaded and published. Anyone can download a calculation model from the network and run its code in any environment; for the same input image, the same feature vector is output.

Step S20, according to at least two target feature vector extraction models selected from the intelligent on-shelf contract, performing feature vector extraction on the digital collection to be issued and on at least two target comparison graphs respectively, to obtain at least two target feature vectors and a plurality of comparison feature vectors.

In this embodiment, the intelligent on-shelf contract includes, for each feature vector extraction model, a model name, an applicable digital collection type, a model download address, a verification method and a target verification value, and selecting the at least two target feature vector extraction models from the intelligent on-shelf contract includes:

Screening out at least two model names matching the type of the digital collection to be issued according to the applicable digital collection type in the intelligent on-shelf contract; acquiring at least two corresponding model codes according to the model download addresses corresponding to the at least two model names; checking the at least two model codes according to the verification methods corresponding to the at least two model names to obtain at least two sample verification values; and comparing the at least two sample verification values with the target verification values corresponding to the at least two model names, and, if the verification succeeds, taking the at least two model codes as the at least two target feature vector extraction models.
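The screening, download and verification steps above may be sketched as follows. This is a minimal illustration under stated assumptions: the record type `ModelEntry`, the function `select_models`, the `fetch` callback, the in-memory `store` standing in for distributed storage, and the use of a `hashlib` algorithm name as the verification method are all hypothetical and are not taken from the contract described in this application.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One hypothetical contract record for a feature vector extraction model."""
    name: str
    collection_type: str
    download_address: str
    check_method: str        # assumed to be a hashlib algorithm name
    target_check_value: str


def select_models(entries, collection_type, fetch, minimum=2):
    """Return {model name: model code} for entries that match and verify."""
    verified = {}
    for entry in entries:
        if entry.collection_type != collection_type:
            continue                                  # screening by type
        code = fetch(entry.download_address)          # download model code
        sample = hashlib.new(entry.check_method, code).hexdigest()
        if sample == entry.target_check_value:        # verification
            verified[entry.name] = code
    if len(verified) < minimum:
        raise ValueError("fewer than the required number of verified models")
    return verified


# In-memory stand-in for distributed storage (assumption for the sketch).
store = {
    "addr-a": b"model code a",
    "addr-b": b"model code b",
    "addr-c": b"model code c (tampered)",
    "addr-d": b"model code d",
}
entries = [
    ModelEntry("model-a", "picture", "addr-a", "sha256",
               hashlib.sha256(b"model code a").hexdigest()),
    ModelEntry("model-b", "picture", "addr-b", "sha256",
               hashlib.sha256(b"model code b").hexdigest()),
    ModelEntry("model-c", "picture", "addr-c", "sha256",
               hashlib.sha256(b"model code c").hexdigest()),
    ModelEntry("model-d", "audio", "addr-d", "sha256",
               hashlib.sha256(b"model code d").hexdigest()),
]

selected = select_models(entries, "picture", store.__getitem__)
print(sorted(selected))
```

In this toy run, `model-c` is rejected because its downloaded code no longer matches the recorded verification value, and `model-d` is skipped because its applicable type does not match.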

It should be noted that, before the digital collection issuer issues the hidden digital collection, at least two required feature vector extraction models are selected from the intelligent on-shelf contracts, and the number of specifically selected models and the types of the models can be designed and selected according to the corresponding security degree. Generally, the higher the safety requirements, the greater the number of models selected. Because a plurality of algorithm models are selected, the rules and principles of extracting the characteristic values of the input image by different algorithms are quite different (for example, the models are based on image contours, the contrast is based on the contrast, and the convolutional neural network model is obtained by training based on deep learning … …), so that the difficulty of a plagiarism is greatly improved. For example, the modification of a certain image pixel point is quite sensitive to the algorithm A, but is quite insensitive to the algorithm B, so that the cost of attack by a plagiarism is objectively increased.

For example, the digital collection issuer P selects to issue own digital collection DC1, trades the security of the digital collection issuer P, and randomly selects three algorithm models VGG16-ImageNET, SIFT and RESNET, wherein the three algorithms are respectively based on the principles of convolutional neural network, extraction and matching of scale-invariant feature points and depth residual error network, and have heterogeneity so as to greatly improve the attack cost of a plagiar. And P obtains detailed information of the three algorithms through intelligent up-stand contracts, downloads codes of the corresponding three algorithm models, and performs verification based on information in the contracts. After the verification is successful, codes of three algorithm models are respectively operated, the images of DC1 are input, and the feature vectors of the three groups of images are respectively output: v1, v2, v3; the dimensions of the different feature vectors are different due to the difference in algorithm models, for example 25088 for the VGG algorithm, 128 for the sift algorithm and 2048 for the rest algorithm.

In this embodiment, the at least two target comparison graphs are comparison images corresponding to the digital collection to be issued. They may be uploaded by the issuer, or the corresponding digital collection pictures may be selected from a given block list and downloaded; the number of comparison graphs depends on the required security level.

In one embodiment, the digital collection issuer may select different numbers of distinct style pictures to upload onto the blockchain as contrasting images. The information uploaded to the blockchain comprises the URL of the comparison graph and a hash value, and the hash value is used for helping anyone to check whether the corresponding hash values are the same after downloading the image so as to judge the correctness of the downloaded image.

In another embodiment, before extracting the feature vectors of the at least two object comparison graphs, the method further includes: acquiring at least two block chain addresses; taking each block chain address as an index starting position, and acquiring at least two corresponding target comparison graphs from the block chain according to the block transaction sequence and a preset index rule; the preset index rule comprises a blockchain address farthest from the index starting position or a blockchain address nearest to the index starting position.

In order to reduce the information load of the on-chain on-shelf smart contract, the digital collection issuer can also select the pictures of public digital collections already issued on the blockchain as comparison pictures; for example, the nearest or farthest images relative to a specified block number can be searched for, or images can be selected randomly. Obtaining the comparison image files in this way reduces the workload of downloading, caching and computing the comparison images during anti-plagiarism detection, and thus improves the efficiency of the digital collection anti-plagiarism detection method.

For example: the block heights designated by the digital collection issuer are 200000, 300000 and 400000 respectively, and the number of designated comparison pictures is 3. Starting from block height 200000, the on-chain digital collection contracts that conform to standards such as ERC721 are monitored, and the nearest publicly disclosed digital collection image, according to the block transaction sequence index, is taken as comparison picture b1; likewise, starting from block height 300000, the digital collection image nearest in block transaction sequence is taken as comparison picture b2, and starting from block height 400000, the digital collection image nearest in block transaction sequence is taken as comparison picture b3.
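The index rule in this example can be sketched as follows. This is only an illustration under stated assumptions: the record type `Disclosed`, the helper names, and the sample records are hypothetical, and "nearest" is assumed here to mean the first publicly disclosed collection at or after the starting block height, ordered by block height and then by transaction index.

```python
from typing import List, NamedTuple, Optional, Set


class Disclosed(NamedTuple):
    """Hypothetical record of a publicly disclosed digital collection."""
    block_height: int
    tx_index: int
    image_id: str


def pick_nearest(start_height: int, disclosed: List[Disclosed],
                 used: Set[str]) -> Optional[Disclosed]:
    # Assumption: search forward from the starting height and take the
    # earliest record by (block height, transaction index).
    candidates = [d for d in disclosed
                  if d.block_height >= start_height and d.image_id not in used]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (d.block_height, d.tx_index))


def select_comparison_graphs(start_heights: List[int],
                             disclosed: List[Disclosed]) -> List[Disclosed]:
    chosen: List[Disclosed] = []
    used: Set[str] = set()
    for height in start_heights:
        pick = pick_nearest(height, disclosed, used)
        if pick is not None:
            chosen.append(pick)
            used.add(pick.image_id)
    return chosen


# Hypothetical on-chain records, for illustration only.
records = [
    Disclosed(200005, 2, "img_a"),
    Disclosed(200005, 0, "img_b"),
    Disclosed(300010, 1, "img_c"),
    Disclosed(400001, 0, "img_d"),
]
b1, b2, b3 = select_comparison_graphs([200000, 300000, 400000], records)
```

A "farthest" rule or a random rule, both mentioned above, would only change the selection inside `pick_nearest`.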

Feature vectors of the plurality of comparison graphs are extracted according to the at least two target feature vector extraction models selected from the on-shelf smart contract, so that a plurality of contrast feature vectors are obtained. Taking the above three algorithm models VGG16-ImageNET, SIFT and RESNET as examples, the feature vectors of b1, b2 and b3 are extracted, and the obtained contrast feature vectors are shown in Table 1:

Table 1. Contrast feature vector list

	VGG16-ImageNET	SIFT	RESNET
b1	b1_v1	b1_v2	b1_v3
b2	b2_v1	b2_v2	b2_v3
b3	b3_v1	b3_v2	b3_v3

Here, b1_v1 represents the contrast feature vector obtained by extracting features from comparison graph b1 with the VGG16-ImageNET model, and so on; b3_v3 represents the contrast feature vector obtained by extracting features from comparison graph b3 with the RESNET model.

Step S30, respectively carrying out similarity calculation on the at least two target feature vectors and the plurality of contrast feature vectors to obtain a target similarity matrix comprising a plurality of similarity values;

In this embodiment, performing similarity calculation on the at least two target feature vectors and the plurality of contrast feature vectors respectively includes: for the current target feature vector and the current contrast feature vector, calculating their inner product value, a first norm of the current target feature vector, and a second norm of the current contrast feature vector; obtaining the norm product of the first norm and the second norm; and taking the ratio of the inner product value to the norm product as the similarity value of the current target feature vector and the current contrast feature vector.

It should be noted that, the issuer runs the similarity algorithm, and performs similarity operation on each target feature vector and each contrast feature vector one by one to obtain a corresponding target similarity matrix; assume that the target similarity matrix calculated by the digital collection picture DC1 to be issued and the corresponding comparison graphs b1, b2, b3 is shown in table 2:

Table 2. Target similarity matrix

	VGG16-ImageNET	SIFT	RESNET
Similarity to b1	0.985	0.871	0.943
Similarity to b2	0.532	0.651	0.578
Similarity to b3	0.311	0.287	0.296
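How such a matrix is assembled can be sketched as follows: one row per comparison graph, one column per model, each entry being the ratio of the inner product to the norm product. This is a minimal sketch, not the applicant's implementation; the function names, the model names `modelA`/`modelB`, and the tiny toy vectors are hypothetical stand-ins for real model outputs.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product divided by the product of the two norms."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    inner = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    return inner / (norm_u * norm_v)


def similarity_matrix(target: Dict[str, Sequence[float]],
                      comparison: Dict[str, Dict[str, Sequence[float]]],
                      models: List[str],
                      graphs: List[str]) -> List[List[float]]:
    # Rows follow the comparison graphs, columns follow the models.
    return [[cosine_similarity(target[m], comparison[g][m]) for m in models]
            for g in graphs]


# Toy feature vectors (hypothetical); real vectors would have e.g. 25088,
# 128 or 2048 dimensions depending on the model.
target_vectors = {"modelA": [1.0, 0.0], "modelB": [1.0, 1.0, 0.0]}
comparison_vectors = {
    "b1": {"modelA": [1.0, 0.0], "modelB": [1.0, 1.0, 0.0]},
    "b2": {"modelA": [0.0, 1.0], "modelB": [0.0, 1.0, 1.0]},
}
toy_matrix = similarity_matrix(target_vectors, comparison_vectors,
                               ["modelA", "modelB"], ["b1", "b2"])
```

Vectors from different models are never compared with each other, so their differing dimensions do not matter.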

And S40, carrying out anti-plagiarism detection on the uplink information of the digital collection to be sent and all the digital collections issued, and obtaining a detection result of the digital collection to be sent.

In this embodiment, the uplink information of the to-be-transmitted digital collection includes a first parameter information set of the at least two target comparison graphs, a second parameter information set of the at least two target feature vector extraction models, and the target similarity matrix.

As shown in fig. 3, performing anti-plagiarism detection on the uplink information of the digital collection to be sent and all the digital collections that have been sent, and obtaining a detection result of the digital collection to be sent specifically includes the following steps:

and S41, acquiring at least two target comparison graphs and the at least two target feature vector extraction models according to a first parameter information set and a second parameter information set in the uplink information of the digital collection to be transmitted.

It should be noted that, the first parameter information set includes an address of each target comparison graph, and the second parameter information set includes a model name, a model download address, a verification method and a target verification value of each target feature vector extraction model; the address of each target comparison graph can be a URL address or an address on a blockchain.

In this embodiment, when the transaction party receives an on-shelf request submitted by an issuer, it acquires at least two corresponding model codes according to the model download addresses in the uplink information of the digital collection to be issued carried in the request; it then computes check values for the at least two model codes using the verification method specified for each model in the uplink information, obtaining at least two sample check values, and compares the sample check values with the corresponding target check values in the uplink information. If the check succeeds, the at least two model codes are taken as the at least two target feature vector extraction models. Similarly, the transaction party acquires each comparison graph according to its address in the uplink information and verifies the acquired graph; if the check succeeds, the acquired graph is taken as a target comparison graph; if the check fails, the transaction party attempts to acquire the comparison graph again from its address and feeds back a prompt that the check failed.
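The check of a downloaded model code or comparison graph against its target check value can be sketched as below. The text does not fix a verification method, so a cryptographic hash named by the uplink information (for example `sha256`) is assumed here; the function names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def check_value(data: bytes, method: str = "sha256") -> str:
    """Compute the sample check value of downloaded bytes.

    Assumption: the verification method is a hash algorithm name that
    hashlib supports, such as "sha256".
    """
    return hashlib.new(method, data).hexdigest()


def verify_download(data: bytes, method: str, target_check_value: str) -> bool:
    # Compare the sample check value with the target check value from the
    # uplink information.
    return hmac.compare_digest(check_value(data, method),
                               target_check_value.lower())


# Hypothetical downloaded content and its published check value.
model_code = b"print('feature extractor')"
published = hashlib.sha256(model_code).hexdigest()
accepted = verify_download(model_code, "sha256", published)
rejected = verify_download(b"tampered code", "sha256", published)
```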

Step S42, respectively extracting the feature vectors of each issued digital collection and of the at least two target comparison graphs according to the at least two target feature vector extraction models, to obtain at least two sample feature vectors corresponding to each issued digital collection and a plurality of contrast feature vectors corresponding to the at least two target comparison graphs.

It should be noted that, the issuer extracts the feature vector of each digital collection currently issued on the blockchain according to at least two different target feature vector extraction models, so as to obtain at least two sample feature vectors corresponding to each issued digital collection and at least two contrast feature vectors corresponding to each target contrast graph.

And step S43, respectively carrying out similarity calculation on at least two sample feature vectors corresponding to each issued digital collection and the plurality of comparison feature vectors to obtain a sample similarity matrix corresponding to each issued digital collection.

In this embodiment, the similarity calculation is performed on the plurality of sample feature vectors obtained in the previous step and the plurality of comparison feature vectors, that is, the similarity calculation is performed on each sample feature vector and each comparison feature vector, so as to obtain a sample similarity matrix corresponding to each issued digital collection. For example, assume that the sample similarity matrix calculated by the released digital collection picture DC2 and the corresponding comparison graphs b1, b2, b3 is as follows in table 3:

Table 3. Sample similarity matrix

	VGG16-ImageNET	SIFT	RESNET
Similarity to b1	0.980	0.873	0.949
Similarity to b2	0.532	0.650	0.570
Similarity to b3	0.312	0.288	0.297

And S44, performing matrix similarity comparison on each sample similarity matrix and the target similarity matrix, and obtaining a plagiarism prevention detection result of the to-be-sent digital collection according to all matrix similarity comparison results.

In this embodiment, performing matrix similarity comparison between each sample similarity matrix and the target similarity matrix includes: extracting the elements of each sample similarity matrix and of the target similarity matrix one by one using a list comprehension, to obtain each corresponding sample vector and the target vector; and calculating the matrix similarity value of each sample similarity matrix and the target similarity matrix according to the inner product value of each sample vector and the target vector, the norm of each sample vector and the norm of the target vector.

It should be noted that matrix similarity comparison compares the similarity between two matrices. Tables 2 and 3 are represented respectively as the matrices matrix1 = [[0.985, 0.871, 0.943], [0.532, 0.651, 0.578], [0.311, 0.287, 0.296]] and matrix2 = [[0.980, 0.873, 0.949], [0.532, 0.650, 0.570], [0.312, 0.288, 0.297]]. The elements of matrix1 and matrix2 are extracted one by one using a list comprehension, and the resulting one-dimensional vectors are: vector1 = [0.985, 0.871, 0.943, 0.532, 0.651, 0.578, 0.311, 0.287, 0.296] and vector2 = [0.980, 0.873, 0.949, 0.532, 0.650, 0.570, 0.312, 0.288, 0.297]. The similarity value of the two matrices is obtained by dividing the inner product of the two vectors by the product of their norms; a specific calculation example is as follows:

(1) The inner product of vector1 and vector2 is calculated by multiplying the corresponding elements of the two vectors and summing the products: 0.985×0.980 + 0.871×0.873 + 0.943×0.949 + 0.532×0.532 + 0.651×0.650 + 0.578×0.570 + 0.311×0.312 + 0.287×0.288 + 0.296×0.297 = 3.9238240000000006.

(2) The norm of each vector is calculated by summing the squares of its elements and taking the square root: ‖vector1‖ = √(0.985² + 0.871² + 0.943² + 0.532² + 0.651² + 0.578² + 0.311² + 0.287² + 0.296²) ≈ 1.9813, and ‖vector2‖ = √(0.980² + 0.873² + 0.949² + 0.532² + 0.650² + 0.570² + 0.312² + 0.288² + 0.297²) ≈ 1.9804.

(3) The matrix similarity is the inner product of the two vectors divided by the product of their norms: 3.9238240000000006 / (‖vector1‖ × ‖vector2‖) = 0.9999831624167155.
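The three calculation steps can be reproduced with a short script. It is a minimal sketch using the values of Tables 2 and 3; the function names are illustrative, not taken from the application.

```python
import math

# Values of Table 2 (target) and Table 3 (sample).
matrix1 = [[0.985, 0.871, 0.943],
           [0.532, 0.651, 0.578],
           [0.311, 0.287, 0.296]]
matrix2 = [[0.980, 0.873, 0.949],
           [0.532, 0.650, 0.570],
           [0.312, 0.288, 0.297]]


def flatten(matrix):
    # List comprehension that extracts the elements row by row.
    return [value for row in matrix for value in row]


def matrix_similarity(a, b):
    va, vb = flatten(a), flatten(b)
    inner = sum(x * y for x, y in zip(va, vb))   # step (1)
    norm_a = math.sqrt(sum(x * x for x in va))   # step (2)
    norm_b = math.sqrt(sum(y * y for y in vb))
    return inner / (norm_a * norm_b)             # step (3)


similarity = matrix_similarity(matrix1, matrix2)
print(similarity)
```

The printed value should agree with the similarity quoted in the text up to floating-point rounding.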

It should be noted that the sample similarity matrix and the target similarity matrix each represent the overall similarity between an image and the reference comparison images. For an unmodified plagiarized image, the similarity between the sample similarity matrix and the target similarity matrix is 1. In an actual application scenario, however, the two collection pictures may differ by small changes or by feature extraction errors, so a similarity threshold smaller than 1 is set: if the similarity is greater than the threshold, the image is judged to be plagiarized; otherwise, the digital collection image is judged not to be plagiarized.

For example, assuming that the similarity threshold is 0.99, since 0.9999831624167155 is greater than 0.99, it is determined that the digital collection DC1 to be issued plagiarizes the issued digital collection DC2.

In another embodiment, the similarity of the feature matrices is not compared; instead, the number of individual values that differ is counted. For example, if the difference threshold between similarity values is defined empirically as 0.0001, the two matrices are subtracted element by element, and each element whose absolute difference is greater than the threshold increments a counter by 1. A count threshold is then defined as N/2 (rounded down), N being the number of elements in the matrix; for a 3×3 matrix, N is 9 and the count threshold is 4. If the counter does not exceed the count threshold, that is, when more than 4 of the 9 similarity values differ by 0.0001 or less, the picture is judged to be plagiarized. This way of determination may also be applied in practice.
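This counting rule can be sketched as follows. The sketch counts the entries whose absolute difference is within the tolerance and compares that count with N/2 rounded down; the function names are illustrative, and the second tolerance value of 0.01 is only a hypothetical setting used to show the opposite outcome.

```python
def count_close_entries(a, b, tolerance):
    """Number of positions where the two matrices differ by at most tolerance."""
    return sum(1
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)
               if abs(x - y) <= tolerance)


def is_plagiarism_by_count(a, b, tolerance=0.0001):
    n = sum(len(row) for row in a)  # N, e.g. 9 for a 3x3 matrix
    return count_close_entries(a, b, tolerance) > n // 2


table2 = [[0.985, 0.871, 0.943], [0.532, 0.651, 0.578], [0.311, 0.287, 0.296]]
table3 = [[0.980, 0.873, 0.949], [0.532, 0.650, 0.570], [0.312, 0.288, 0.297]]

strict = is_plagiarism_by_count(table2, table3, tolerance=0.0001)
loose = is_plagiarism_by_count(table2, table3, tolerance=0.01)
```

With the values of Tables 2 and 3, only one entry matches within 0.0001, so the strict setting does not flag plagiarism, whereas the looser hypothetical tolerance does. This shows how strongly the outcome depends on the empirically chosen difference threshold.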

From this, it can be seen that the detection result of the digital collection to be issued in this embodiment is either that plagiarism exists or that plagiarism does not exist. If the detection result is that plagiarism exists, the transaction party rejects the on-shelf request submitted by the issuer, and issuing of the digital collection is prohibited; if the exchange has scanned all cached digital collection images and the detection result is that no plagiarism exists, the exchange approves the on-shelf request and the digital collection is issued and put on shelf at the exchange.

According to this embodiment, a group of vector similarity matrices is obtained by anchoring to a plurality of pictures and outputting the similarities, which greatly improves the security and robustness of anti-plagiarism detection. In addition, because this embodiment introduces a plurality of comparison graphs and a plurality of feature vector extraction models, a plagiarist must fine-tune the copied picture against multiple comparison graphs and multiple vector extraction models at once, which is far more difficult than against a single algorithm and a single comparison graph; the plagiarist's cost is therefore greatly increased, and the security and accuracy of anti-plagiarism detection are improved.

In addition, in the prior art, the image feature vector obtained with a feature vector extraction model is very large. For example, the VGG model produces a 25088-dimensional vector, so the feature vector of a single picture occupies 8×25088 bytes, about 196 KB, which is a very large amount of storage. At present, image similarity detection is based on comparing image vectors, which requires the image vectors to be stored on the chain; this greatly increases blockchain storage and consumes a large amount of valuable on-chain resources. By introducing the matrix of image comparison similarities, the invention keeps the data that must be stored on the chain very small. Taking the target similarity matrix of Table 2 as an example, only 9 double-type values need to be stored on the chain; each value occupies 8 bytes, so the 3×3 matrix occupies only 72 bytes, about 0.07 KB of storage space. Compared with the existing scheme of storing image vectors, the storage occupied is reduced by a factor of more than 2,000, the byte quantity of the uplink information is greatly reduced, and the storage space consumed on the blockchain is therefore reduced.
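The storage figures can be checked with a few lines of arithmetic. The sketch assumes 8-byte double values, as stated above, and compares one 25088-dimensional VGG vector with one 3×3 similarity matrix; the variable names are illustrative.

```python
BYTES_PER_DOUBLE = 8

# One 25088-dimensional VGG feature vector stored as doubles.
vgg_vector_bytes = 25088 * BYTES_PER_DOUBLE

# One 3x3 similarity matrix stored as doubles.
matrix_bytes = 3 * 3 * BYTES_PER_DOUBLE

reduction_factor = vgg_vector_bytes / matrix_bytes

print(vgg_vector_bytes, vgg_vector_bytes / 1024)  # bytes and KB for the vector
print(matrix_bytes, matrix_bytes / 1024)          # bytes and KB for the matrix
print(reduction_factor)
```

For a single VGG vector the ratio works out to roughly 2,800; storing the vectors of several models per picture would make the difference larger.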

The technical scheme provided by the embodiment at least comprises the following beneficial effects:

(1) The target similarity matrix obtained according to the feature vector of the digital collection to be issued and the feature vector of the comparison image is used as the basis of anti-plagiarism detection of the digital collection, then all sample similarity matrices obtained according to the feature vector of each issued digital collection and the feature vector of the comparison image are compared with each other one by one, and finally anti-plagiarism detection results of the digital collection to be issued can be obtained through one-to-one comparison of the target similarity matrix and all sample similarity matrices; therefore, the similarity matrix for anti-plagiarism detection is obtained by introducing a mode of combining a plurality of contrast pictures and a plurality of feature vector extraction models, so that the accuracy of anti-plagiarism detection is improved; the anti-plagiarism detection can be realized on the basis of only disclosing the similarity matrix of the digital collection to be sent and the comparison image, namely the accuracy of the anti-plagiarism detection can be improved on the premise that the content picture of the digital collection is not disclosed, and the anti-plagiarism detection method can be widely applied to the anti-plagiarism detection of the hidden digital collection.

Therefore, the digital collection detection method provided by the application can not only improve the accuracy of anti-plagiarism detection, but also reduce the consumption of storage space on a blockchain on the premise that the content picture of the digital collection is not disclosed.

Fig. 4 is a schematic flow chart of a first digital collection issuing method according to an embodiment of the present application; as shown in fig. 4, the method specifically includes the following steps:

step S100, acquiring an issuing request of a digital collection to be issued;

step S200, obtaining a detection result of the digital collection to be sent;

step S300, judging whether the detection result indicates plagiarism, and if so, executing step S400; if not, executing step S500;

step S400, refusing the release request of the digital collection to be released;

and S500, issuing the digital collection.

In one embodiment of the present application, obtaining a detection result of a digital collection to be issued includes:

acquiring a hidden mark of the digital collection to be sent according to the request for sending the digital collection to be sent;

if the hidden mark is a hidden digital collection, acquiring a detection result of the digital collection to be issued according to the digital collection detection method described in the above embodiment;

And if the hidden mark indicates a public digital collection, acquiring a detection result of the digital collection to be issued according to an image similarity comparison method.

Fig. 5 is a schematic flow chart of a second digital collection issuing method according to an embodiment of the present application; as shown in fig. 5, the issuing method specifically includes the following steps:

step S1, initializing an algorithm model;

it should be noted that the participants, consisting of a plurality of digital collection issuers and a plurality of exchanges, negotiate and determine the algorithms used for digital collection feature calculation; a feature calculation algorithm converts a digital collection file into a specific feature vector.

S2, selecting an algorithm model and calculating a feature vector;

before issuing the digital collection, the digital collection issuer selects the required models, which are chosen according to the required security level. Generally, the higher the security requirement, the more models are selected. Because a plurality of algorithm models are selected, and the rules and principles by which different algorithms extract feature values from the input image differ greatly (for example, some algorithms are based on image contours, some are based on contrast, and some are based on convolutional neural network models trained by deep learning, and so on), the difficulty for an attacker is greatly increased. For example, algorithm A may be quite sensitive to the modification of certain image pixels while algorithm B is quite insensitive to it, which objectively raises the attacker's attack cost.

S3, selecting a comparison graph and calculating a feature vector;

it should be noted that the digital collection issuer selects the corresponding comparison image set and calculates the corresponding image feature vectors; the comparison pictures can be uploaded by the issuer, or the corresponding digital collection pictures can be selected from a given block list and downloaded, and the number of pictures depends on the required security level. The digital collection issuer can select different numbers of pictures with distinct styles and upload them to the blockchain as comparison images. The information uploaded to the blockchain comprises the URL of each comparison graph and a hash value; the hash value allows anyone who downloads the image to check whether the corresponding hash values are the same and thus judge the correctness of the downloaded image.

Also, to reduce the information load of the on-chain smart contracts, the digital collection issuer may select other digital collections that have already been issued on the chain as comparison graphs. For example, the nearest or farthest image can be searched for, or an image can be selected randomly. Obtaining the comparison image files in this way reduces the workload of downloading, caching and computing required by the digital collection plagiarism detection algorithm and improves the overall performance of the algorithm.

S4, calculating a similarity matrix of the comparison image;

It should be noted that, using the comparison graph list from step S3 and an image feature similarity algorithm, the issuer calculates the similarity between the hidden digital collection blind box and each comparison graph under each given algorithm, and provides a matrix of image similarities. When a newly issued digital collection is checked, the similarity value is calculated from its feature vector and the indexed feature vectors of the issued digital collections. Each feature vector extraction algorithm published above has a corresponding similarity calculation method. Common similarity calculation methods for feature vectors include Euclidean distance, Manhattan distance, cosine distance, and the like.
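The three measures named above can be sketched as follows. These are the standard textbook definitions; which measure is paired with which extraction algorithm is not specified here, and the function names are illustrative.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Square root of the summed squared element differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of the absolute element differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """One minus the cosine similarity (inner product over norm product)."""
    inner = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - inner / (norm_u * norm_v)


d_euclid = euclidean_distance([0.0, 0.0], [3.0, 4.0])
d_manhattan = manhattan_distance([0.0, 0.0], [3.0, 4.0])
d_cosine = cosine_distance([1.0, 0.0], [0.0, 1.0])
```

Euclidean and Manhattan distance grow with vector magnitude, while cosine distance depends only on direction, so a distance would need to be mapped to a bounded similarity value before being placed in a similarity matrix.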

S5, casting digital collection and linking information;

the issuer issues the digital collection DC1, and writes and discloses the information of the steps S2, S3, and S4 into the blockchain smart contract.

When the issuer formally casts (mints) the digital collection, it assembles the detailed information of the digital collection, sends the assembled information to the corresponding blockchain consensus node, and initiates a transaction that calls the blockchain smart contract; a blockchain consensus node is a node that participates in blockchain network consensus and is responsible for receiving, packaging, synchronizing, computing and verifying blockchain transaction data.

The issuer deploys a standard digital collection contract (e.g., a smart contract that complies with the ERC721 standard), or invokes the mint function in a digital collection smart contract of a deployed standard to issue the associated digital collection.

The issuer writes the intelligent contract address of the related algorithm model determined in step S2, the name or index number of the algorithm, the information of the related comparison graph determined in step S3, and the similarity matrix obtained by calculation in step S4 as uplink information into the storage of the standard digital collection contract (for example ERC 721).

Taking the above release casting of digital collection DC1 as an example, the written mint information includes:

(1) Hidden mark: identifies whether the digital collection is a hidden digital collection blind box; if it is marked as a blind box, anyone who is not the NFT owner cannot obtain the picture of the digital collection.

(2) Recipient address (Recipient Address): the mint function typically needs to specify the recipient address of the newly created NFT, i.e., which account the NFT is assigned to. Typically the address of the issuer.

(3) Token ID (Token ID): the mint function may need to specify a unique identifier of the newly created NFT, i.e., a token ID. Each NFT has a unique ID in the contract that distinguishes between different NFTs.

(4) Token Metadata (Token Metadata): the mint function may need to specify metadata information about the newly created NFT, such as a name, description, image URL, etc. These metadata may provide more information about the NFT and visual content. Because the collection is a hidden digital collection blind box, the URL of the image is also hidden and not disclosed, and only the owner of the digital collection has authority to open the image link and download the original image.

(5) Ownership and authorization (Ownership and Authorization): the mint function typically needs to ensure that only the owner or authorized address of the contract can perform the mint operation to ensure the rights and security to create a new NFT.

(6) Hash information of the digital collection picture DC1.

(7) Smart contract address of the public algorithm model: address of Beacon contract.

(8) Select alg_list of the public algorithm model: VGG16-ImageNET, SIFT, RESNET.

(9) Height list of the nearest block of the selected contrast map: 200000, 300000, 400000.

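(10) Similarity matrix M1 after image comparison, represented as a two-dimensional array (the values of Table 2): M1 = [[0.985, 0.871, 0.943], [0.532, 0.651, 0.578], [0.311, 0.287, 0.296]].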

S6, monitoring a blockchain and caching digital collection;

it should be noted that the exchange monitors the latest block through the blockchain network, discovers each newly issued digital collection, and judges whether it is a hidden digital collection according to the hidden mark described in step S5. If it is a public digital collection (not a hidden digital collection blind box), the corresponding picture is downloaded; a downloaded and cached public digital collection can serve as a comparison picture for subsequent hidden digital collections, which speeds up algorithm detection and reduces repeated steps such as downloading and vector calculation. If the digital collection is a hidden blind box, the similarity matrix (two-dimensional array), the algorithm list and the comparison graph list disclosed for it are obtained.

After the exchange has acquired all of this information, it caches the information in its storage system so that it can be queried in the subsequent step of running the anti-plagiarism algorithm; the exchange can cache the information of all digital collections in the entire blockchain network. A specific example of acquiring the information of a hidden digital collection blind box is as follows: the exchange monitors the latest block height and obtains all uplink information of the DC1 collection, in particular:

the list of public algorithm models: VGG16-ImageNET, SIFT, RESNET; the list of comparison graphs (b1, b2, b3), obtained from the list of nearest block heights (200000, 300000, 400000); and the similarity matrix M1 after image comparison, represented as a two-dimensional array (the values of Table 2): M1 = [[0.985, 0.871, 0.943], [0.532, 0.651, 0.578], [0.311, 0.287, 0.296]].

S7, processing a digital collection on-shelf application, and comparing it with the cached similarity matrices one by one;

it should be noted that the exchange receives a new digital collection on-shelf application, starts to run the anti-plagiarism judgment algorithm, and compares the application against all digital collections in the cache one by one. Taking the on-shelf collection DC2 as an example, the similarity of DC2 to each given comparison image under each given algorithm is calculated according to the algorithm list alg_list and the comparison image set photo_list of the hidden collection, and the resulting similarity matrix M2 (the values of Table 3) is as follows: M2 = [[0.980, 0.873, 0.949], [0.532, 0.650, 0.570], [0.312, 0.288, 0.297]].

M1 and M2 are then compared by matrix similarity, where matrix similarity refers to the similarity between two matrices; the calculation methods of this embodiment include the Pearson correlation coefficient. The Pearson correlation coefficient is evaluated by calculating the covariance and the standard deviations of the two matrices, and the formula is: Pearson correlation coefficient = cov(A, B) / (std(A) × std(B)), where cov(A, B) represents the covariance of A and B, and std(A) and std(B) represent the standard deviations of A and B.
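The Pearson formula can be sketched as below, treating each matrix as a flattened list of values. The values of Tables 2 and 3 are used for M1 and M2 on the assumption that they are the matrices of DC1 and DC2; the function name and the use of population (divide-by-n) statistics are illustrative choices.

```python
import math

# Assumed to be the matrices of DC1 (Table 2) and DC2 (Table 3).
M1 = [[0.985, 0.871, 0.943], [0.532, 0.651, 0.578], [0.311, 0.287, 0.296]]
M2 = [[0.980, 0.873, 0.949], [0.532, 0.650, 0.570], [0.312, 0.288, 0.297]]


def pearson(a, b):
    """cov(A, B) / (std(A) * std(B)) over the flattened matrix elements."""
    xs = [value for row in a for value in row]
    ys = [value for row in b for value in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


r = pearson(M1, M2)
print(r)
```

Because the means are subtracted first, the Pearson coefficient is in general not equal to the inner-product-over-norms value, so a threshold tuned for one measure should not be reused for the other without checking.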

It should be noted that, the cache similarity in this step includes a similarity matrix of the public collection and a similarity matrix of the hidden collection, and for the historical hidden digital collection, when the picture information of the historical hidden digital collection is not disclosed in the exchange, the exchange may send a similarity matrix acquisition request to the issuer of the historical hidden digital collection, so that the issuer of the historical hidden digital collection obtains the similarity matrix of the historical hidden digital collection according to the model information and the comparison chart list in the similarity matrix acquisition request.

S8, judging whether plagiarism exists or not; if yes, executing the step S9, and if not, executing the step S10;

it should be noted that the above M1 and M2 matrices each represent the overall similarity between an image and the reference comparison images; for a plagiarized image, the similarity between its matrix and the matrix of the original image can be very high, and for an unmodified plagiarized image the similarity between the two is 1. The similarity between the two matrices is compared against a threshold set according to an empirical value, and the image is judged to be plagiarized if the similarity is greater than the threshold. Assuming that the set similarity threshold is 0.99, since 0.9999831624167155 is greater than 0.99, it is judged that the digital collection DC1 plagiarizes the digital collection DC2; assuming that the set similarity threshold is 0.999999, since 0.9999831624167155 is smaller than 0.999999, it is judged that the digital collection DC1 does not plagiarize the digital collection DC2.

S9, feeding back a plagiarism result, and stopping putting on the shelf;

if the plagiarism is judged, returning a result of failed audit, and prohibiting the digital collection from being put on shelf.

Step S10, judging whether the stock is scanned; if yes, executing the step S11, and if not, continuing to execute the step S7;

and S11, checking and passing, and putting the digital collection on shelf.

It should be noted that the exchange scans the entire library of hidden collections; if no plagiarism is involved, the collection is approved for listing on the exchange. The above steps may be employed for the hidden digital collection blind boxes cached by the exchange. For the public digital collections cached by the exchange, an image similarity algorithm can be used directly to calculate the similarity between the digital collection to be put on shelf and each public digital collection, and the judgment is made according to the set threshold. If the exchange has scanned all cached digital collection images and judges that no plagiarism is involved, the on-shelf request is approved and the digital collection is issued and put on shelf at the exchange.
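The scan over the cache in steps S7 to S11 can be sketched as a simple loop. This is a minimal illustration, assuming the inner-product-over-norms matrix similarity and a threshold of 0.99; the function `audit_listing`, the cache layout, and the matrix labelled `DC3` are hypothetical.

```python
import math
from typing import Dict, List, Optional

Matrix = List[List[float]]


def matrix_similarity(a: Matrix, b: Matrix) -> float:
    va = [value for row in a for value in row]
    vb = [value for row in b for value in row]
    inner = sum(x * y for x, y in zip(va, vb))
    return inner / (math.sqrt(sum(x * x for x in va)) *
                    math.sqrt(sum(y * y for y in vb)))


def audit_listing(target: Matrix, cache: Dict[str, Matrix],
                  threshold: float = 0.99) -> Optional[str]:
    """Return the id of the first cached collection judged to be plagiarized,
    or None when the whole cache has been scanned without a match."""
    for collection_id, sample in cache.items():      # S7: compare one by one
        if matrix_similarity(sample, target) > threshold:
            return collection_id                     # S8/S9: stop, reject
    return None                                      # S10/S11: approve


target_dc1 = [[0.985, 0.871, 0.943], [0.532, 0.651, 0.578], [0.311, 0.287, 0.296]]
cache = {
    # Hypothetical unrelated collection.
    "DC3": [[0.1, 0.9, 0.2], [0.8, 0.1, 0.7], [0.9, 0.9, 0.1]],
    # Values of Table 3.
    "DC2": [[0.980, 0.873, 0.949], [0.532, 0.650, 0.570], [0.312, 0.288, 0.297]],
}
verdict = audit_listing(target_dc1, cache, threshold=0.99)
verdict_strict = audit_listing(target_dc1, cache, threshold=0.999999)
```

A return value of `None` corresponds to approval, and any other value names the collection that triggered the rejection.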

In this embodiment, the method further includes: and step S12, after any historical hidden digital collection is disclosed, updating the address of the collection in the meta-information of the intelligent contract.

The exchange can then download the collection and check the correctness of the comparison similarity matrix; if the verification fails, the issuer has acted fraudulently, and the exchange may take corresponding punitive measures against the digital collection issuer.

Fig. 6 is a schematic flow chart of an image similarity comparison method according to an embodiment of the present application; as shown in fig. 6, the comparison method specifically includes the following steps:

step S201, a first color histogram vector, a first local feature vector, a first color moment vector, a second color histogram vector, a second local feature vector and a second color moment vector which correspond to the two contrast images respectively are obtained;

The color histogram represents the proportion of different colors in the whole image. Here the color histogram algorithm is simplified so that only the histogram of the H (Hue) component is extracted. The steps for generating the color histogram vector $v_1$ are as follows:

(1) Adjusting the size of the image, and normalizing the H-component histogram obtained in HSV (Hue-Saturation-Value) space;

(2) In OpenCV the H component of an 8-bit image takes values in [0, 180); this range is divided into 60 regions, each containing 3 hue levels;

(3) Summing the frequencies of the 3 hue levels within each region, and extracting the 60-dimensional vector $v_1$, composed of the pixel frequencies of the 60 regions, to represent the image features.
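The binning in steps (2) and (3) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: `hue_histogram` is a hypothetical helper, the input is a flat list of 8-bit RGB pixels, `colorsys` replaces the OpenCV conversion, hue is rescaled to the OpenCV-style range [0, 180), and the image resizing of step (1) is omitted.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def hue_histogram(pixels: Sequence[Pixel], bins: int = 60) -> List[float]:
    """Return a normalized 60-bin hue histogram (3 hue levels per bin)."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hue_level = round(h * 180) % 180  # OpenCV-style hue level in [0, 180)
        counts[hue_level // 3] += 1       # 3 consecutive levels share one region
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * bins


# Two red pixels, one green, one blue.
v1 = hue_histogram([(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)])
print(len(v1), v1[0], v1[20], v1[40])
```

Pure red, green and blue map to hue levels 0, 60 and 120, which fall in regions 0, 20 and 40 of the 60-dimensional vector.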

Further, the steps for generating the 128-dimensional local feature vector $v_2$, which represents the local features of the image, are as follows:

(1) Determining the radius $r$ of the image area to be calculated:

$$r = \frac{3\sigma \cdot \sqrt{2} \cdot (d + 1)}{2}$$

where $\sigma$ is the scale of the group (octave) in which the keypoint is located, and $d = 4$.

(2) Rotating the coordinate axes to the principal direction of the keypoint.

(3) Taking the keypoint as the center, dividing the 16 × 16 neighbourhood in the keypoint's scale space into 4 × 4 windows, calculating the gradient information in 8 directions for each window, and generating a 4 × 4 × 8 = 128-dimensional feature vector $v_2$.
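The three steps above can be sketched as follows. The sketch assumes the standard SIFT formulation: `descriptor_radius` uses the usual SIFT radius expression, and `build_descriptor` is a hypothetical helper that takes a 16 × 16 grid of gradient magnitudes and orientations already rotated to the principal direction. Gaussian weighting and interpolation between bins, which full SIFT implementations apply, are left out.

```python
import math
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def descriptor_radius(sigma: float, d: int = 4) -> float:
    """Radius of the image area, using the standard SIFT expression (assumption)."""
    return 3.0 * sigma * math.sqrt(2.0) * (d + 1) / 2.0


def build_descriptor(magnitude: Grid, orientation: Grid) -> List[float]:
    """Accumulate a 16x16 patch into 4x4 windows x 8 directions = 128 values.

    `orientation` holds angles in radians relative to the principal direction.
    """
    hist = [0.0] * 128
    for y in range(16):
        for x in range(16):
            window = (y // 4) * 4 + (x // 4)  # index of the 4x4 window
            angle = orientation[y][x] % (2.0 * math.pi)
            direction = int(angle / (2.0 * math.pi) * 8) % 8
            hist[window * 8 + direction] += magnitude[y][x]
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


flat_mag = [[1.0] * 16 for _ in range(16)]
flat_ori = [[0.0] * 16 for _ in range(16)]
v2 = build_descriptor(flat_mag, flat_ori)
print(len(v2), descriptor_radius(1.0))
```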

Further, the color moment is a global feature representing the color information of the image, and since the color feature is mainly concentrated in the low-order moment, the color distribution of the image can be effectively represented by selecting the first-order moment, the second-order moment and the third-order moment.

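The first-order, second-order and third-order color moments of the image are expressed as follows:

$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} P_{ij}$$

$$\sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(P_{ij} - \mu_i\right)^2\right)^{1/2}$$

$$s_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(P_{ij} - \mu_i\right)^3\right)^{1/3}$$

where $P_{ij}$ is the pixel value of the $j$-th pixel of the $i$-th color channel component of the three-channel image, and $N$ is the number of pixels.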

The three low-order color moments of each of the three image components, luminance Y, blue projection U and red projection V, are combined to form a 9-dimensional color moment vector $v_3$:

$$v_3 = [\mu_Y, \sigma_Y, s_Y, \mu_U, \sigma_U, s_U, \mu_V, \sigma_V, s_V]$$
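A minimal sketch of the 9-dimensional color moment vector is given below. The helper names are hypothetical. The RGB-to-YUV coefficients are the common BT.601 analog values, chosen here as an assumption because the text does not state the conversion. The third moment uses a signed cube root so that negative skew stays real-valued.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def rgb_to_yuv(r: float, g: float, b: float) -> Pixel:
    """BT.601-style conversion (assumed; the source does not specify one)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v


def channel_moments(values: Sequence[float]) -> Tuple[float, float, float]:
    """First, second and third color moments of one channel."""
    n = len(values)
    mean = sum(values) / n
    second = sum((p - mean) ** 2 for p in values) / n
    third = sum((p - mean) ** 3 for p in values) / n
    std = math.sqrt(second)
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)  # signed cube root
    return mean, std, skew


def color_moment_vector(pixels: Sequence[Pixel]) -> List[float]:
    """Return [mu_Y, sigma_Y, s_Y, mu_U, sigma_U, s_U, mu_V, sigma_V, s_V]."""
    yuv = [rgb_to_yuv(*p) for p in pixels]
    vector: List[float] = []
    for channel in range(3):
        vector.extend(channel_moments([p[channel] for p in yuv]))
    return vector


v3 = color_moment_vector([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
print(len(v3))
```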

Step S202, carrying out vector fusion on the first color histogram vector, the first local feature vector and the first color moment vector to obtain a first fusion feature vector; vector fusion is carried out on the second color histogram vector, the second local feature vector and the second color moment vector to obtain a second fusion feature vector;

In this embodiment, vector concatenation is used to fuse each local feature vector with the color histogram vector and the color moment vector, generating a new n × 197-dimensional feature vector $v = [v_1, v_2, v_3]$, where 197 = 60 + 128 + 9.

Step S203, performing data dimension reduction on the first fusion feature vector and the second fusion feature vector respectively to obtain a first dimension reduction feature vector and a second dimension reduction feature vector;

In order to reduce the amount of calculation and remove the correlation within the original vector, principal component analysis is used to reduce the dimension of the fused vector $v$, yielding a 96-dimensional feature vector. For $n$ vectors $x_i$ of dimension $d$, principal component analysis proceeds as follows:

(1) Calculating the mean vector $\beta$ of the sample set, and subtracting it from each sample in the set to decentralize the data:

$$\beta = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad x_i \leftarrow x_i - \beta$$

(2) Calculating the covariance matrix of the sample set.

(3) Solving for the eigenvalues and eigenvectors of the covariance matrix.

(4) Sorting the eigenvalues in descending order, retaining the top $K$ eigenvalues and their corresponding eigenvectors, and arranging these eigenvectors by rows to form the projection matrix $P$.

(5) Projecting the vector $X$ into the low-dimensional space spanned by the $K$ eigenvectors to obtain the dimension-reduced vector $Y$, where $Y = P \cdot X$.
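The five steps above can be sketched in pure Python. Two assumptions are worth flagging: the covariance matrix is normalized by 1/n, and the eigenvectors are found by power iteration with deflation rather than a full eigen-decomposition, which returns them in descending eigenvalue order for a covariance matrix. `pca_fit` and `pca_project` are hypothetical names, and the toy data is 2-dimensional rather than 197-dimensional.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def pca_fit(samples: Matrix, k: int, iterations: int = 200) -> Tuple[List[float], Matrix]:
    """Return (mean vector beta, projection matrix P with k rows)."""
    n, d = len(samples), len(samples[0])
    # (1) mean vector and decentralization
    beta = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - beta[j] for j in range(d)] for s in samples]
    # (2) covariance matrix; the 1/n normalization is an assumption
    cov = [[sum(s[a] * s[b] for s in centered) / n for b in range(d)] for a in range(d)]
    # (3) and (4): top-k eigenvectors via power iteration with deflation
    rng = random.Random(0)
    projection: Matrix = []
    for _component in range(k):
        vec = _normalize([rng.random() + 0.1 for _ in range(d)])
        for _step in range(iterations):
            nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            if math.sqrt(sum(x * x for x in nxt)) < 1e-12:
                break  # remaining variance is numerically zero
            vec = _normalize(nxt)
        eigenvalue = sum(vec[a] * cov[a][b] * vec[b] for a in range(d) for b in range(d))
        projection.append(vec)
        # Deflate so the next pass converges to the next eigenvector.
        cov = [[cov[a][b] - eigenvalue * vec[a] * vec[b] for b in range(d)] for a in range(d)]
    return beta, projection


def pca_project(x: List[float], beta: List[float], projection: Matrix) -> List[float]:
    """(5) Y = P . X, applied to the decentralized vector."""
    return [sum(row[j] * (x[j] - beta[j]) for j in range(len(x))) for row in projection]


points = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
beta, P = pca_fit(points, k=1)
print(P[0], pca_project([1.0, 2.0], beta, P))
```

For these points, which all lie on the line y = 2x, the single retained direction is proportional to (1, 2).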

And step S204, performing feature point matching according to the Euclidean distance between the first dimension-reduction feature vector and the second dimension-reduction feature vector to obtain a similarity detection result of the two comparison images.

It should be noted that feature point matching is performed by calculating the Euclidean distance between the dimension-reduced feature vectors of the two images. The similarity of the images is the ratio of the number of successfully matched feature points to the total number of feature points of the images.
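The matching step can be sketched as below. The text does not state the matching criterion or which image supplies the total, so this sketch makes two labeled assumptions: a feature point counts as matched when its nearest neighbour in the other image lies within a hypothetical `max_distance`, and the denominator is the number of feature points of the first image.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def image_similarity(
    features_a: Sequence[Vector],
    features_b: Sequence[Vector],
    max_distance: float,
) -> float:
    """Fraction of feature points in image A that find a match in image B."""
    if not features_a or not features_b:
        return 0.0
    matched = sum(
        1
        for fa in features_a
        if min(euclidean(fa, fb) for fb in features_b) <= max_distance
    )
    return matched / len(features_a)


a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
b = [[0.0, 0.1], [1.0, 1.2]]
print(image_similarity(a, b, max_distance=0.5))
```

Two of the three points in `a` have a neighbour in `b` within 0.5, so the sketch reports a similarity of two thirds.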

In addition, when calculating image similarity, a color histogram contains no information on the spatial distribution of colors, so two images are judged similar as long as their colors are similar. Prior-art methods based on grayscale operations are prone to misjudging images of similar darkness and images with many flat areas. The image similarity comparison method provided by this embodiment adopts fine feature weighting, fuses in color moment information that represents the global features of the image to generate new feature information, and applies principal component analysis to reduce the correlation among the features, generating a new feature vector for calculating image similarity. Compared with the traditional color histogram algorithm, this embodiment retains the high execution efficiency of the color histogram algorithm while reducing its misjudgment of dark images with similar color distributions and of images with many flat areas, thereby improving the accuracy of image similarity comparison.

Fig. 7 is a schematic flow chart of a digital collection detection device according to an embodiment of the present application; as shown in fig. 7, the apparatus includes:

the on-shelf contract construction module 710 is configured to construct an intelligent on-shelf contract of the digital collection according to a plurality of feature vector extraction models for acquiring feature vectors of the digital collection;

the feature vector extraction module 720 is configured to extract feature vectors of the digital collection to be issued and the at least two target comparison graphs according to at least two target feature vector extraction models selected from the intelligent on-shelf contracts, so as to obtain at least two target feature vectors and a plurality of comparison feature vectors; wherein the at least two target comparison images are comparison images corresponding to the digital collection to be sent;

a similarity calculation module 730, configured to perform similarity calculation on the at least two target feature vectors and the plurality of contrast feature vectors, respectively, to obtain a target similarity matrix including a plurality of similarity values;

the anti-plagiarism detection module 740 is configured to perform anti-plagiarism detection on the uplink information of the digital collection to be sent and all issued digital collections, so as to obtain a detection result of the digital collection to be sent; the uplink information of the to-be-transmitted digital collection comprises a first parameter information set of the at least two target comparison graphs, a second parameter information set of the at least two target feature vector extraction models and the target similarity matrix.

In one embodiment, the similarity calculation module 730 includes:

the computing unit is used for calculating, for the current target feature vector and the current contrast feature vector, an inner product value of the two vectors, a first norm of the current target feature vector and a second norm of the current contrast feature vector;

an obtaining unit, configured to obtain a norm product of the first norm and the second norm;

and the ratio unit is used for taking the ratio of the inner product value to the norm product as the similarity value between the current target feature vector and the current contrast feature vector.
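The three units above amount to a cosine similarity. A minimal sketch follows, with hypothetical names and one assumption about layout: row i of the matrix holds the similarities between the target feature vector from model i and the contrast feature vectors that the same model produced for each comparison graph.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Inner product divided by the product of the two norms."""
    inner = sum(a * b for a, b in zip(u, v))
    norm_product = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return inner / norm_product if norm_product else 0.0


def similarity_matrix(
    target_vectors: Sequence[Vector],
    contrast_vectors: Sequence[Sequence[Vector]],
) -> List[List[float]]:
    """matrix[i][j] = similarity(target vector of model i, contrast graph j under model i)."""
    return [
        [cosine_similarity(target, contrast) for contrast in per_model]
        for target, per_model in zip(target_vectors, contrast_vectors)
    ]


targets = [[1.0, 0.0], [0.0, 1.0]]  # one vector per model
contrasts = [
    [[1.0, 0.0], [0.0, 1.0]],       # model 0 applied to two comparison graphs
    [[0.0, 2.0], [1.0, 1.0]],       # model 1 applied to the same graphs
]
print(similarity_matrix(targets, contrasts))
```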

In one embodiment, the plagiarism prevention detection module 740 includes:

the model acquisition unit is used for acquiring at least two target contrast graphs and the at least two target feature vector extraction models according to a first parameter information set and a second parameter information set in the uplink information of the digital collection to be transmitted;

the feature extraction unit is used for extracting feature vectors of each issued digital stock and at least two target comparison graphs according to the at least two target feature vector extraction models to obtain at least two sample feature vectors corresponding to each issued digital stock and a plurality of comparison feature vectors corresponding to the at least two target comparison graphs;

The similarity calculation unit is used for respectively carrying out similarity calculation on at least two sample feature vectors corresponding to each issued digital collection and the plurality of comparison feature vectors to obtain a sample similarity matrix corresponding to each issued digital collection;

and the matrix comparison unit is used for comparing the matrix similarity of each sample similarity matrix with the target similarity matrix, and obtaining the anti-plagiarism detection result of the to-be-transmitted digital collection according to all matrix similarity comparison results.

In an embodiment, the matrix comparing unit is specifically configured to extract the elements of each sample similarity matrix and of the target similarity matrix one by one using a list comprehension, so as to obtain each corresponding sample vector and the target vector; it is further configured to calculate the matrix similarity value between each sample similarity matrix and the target similarity matrix according to the inner product value of each sample vector and the target vector, the norm of each sample vector and the norm of the target vector.
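A minimal sketch of this matrix comparison is shown below. `flatten` and `matrix_similarity` are hypothetical names and the matrix values are made up for illustration; the flattening uses a list comprehension, as the paragraph describes.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def flatten(matrix: Matrix) -> List[float]:
    """Extract the matrix elements one by one into a single vector."""
    return [value for row in matrix for value in row]


def matrix_similarity(sample_matrix: Matrix, target_matrix: Matrix) -> float:
    """Inner product of the flattened matrices divided by the product of their norms."""
    sample, target = flatten(sample_matrix), flatten(target_matrix)
    inner = sum(s * t for s, t in zip(sample, target))
    norms = math.sqrt(sum(s * s for s in sample)) * math.sqrt(sum(t * t for t in target))
    return inner / norms if norms else 0.0


# Illustrative values only.
target_m = [[0.91, 0.40], [0.35, 0.88]]
sample_m = [[0.90, 0.41], [0.36, 0.87]]
print(matrix_similarity(sample_m, target_m))
```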

In an embodiment, the feature vector extraction module 720 is specifically configured to screen out at least two model names that match the type of the digital collection to be issued according to the applicable digital collection type in the intelligent on-shelf contract; it is further configured to obtain at least two corresponding model codes according to the model download addresses corresponding to the at least two model names; to check the at least two model codes according to the verification methods corresponding to the at least two model names, obtaining at least two sample verification values; and to compare the at least two sample verification values with the target verification values corresponding to the at least two model names, and, if the verification is successful, to use the at least two model codes as the at least two target feature vector extraction models.
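The selection and verification flow can be sketched as follows. Everything named here is hypothetical: the `ModelEntry` fields mirror the contract fields listed above, `fetch` stands in for downloading from the model download address, and the verification method is assumed to be a hash name accepted by `hashlib.new`, such as `"sha256"`. The sketch drops entries that fail verification and requires at least two survivors.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class ModelEntry:
    name: str
    applicable_type: str
    download_address: str
    check_method: str        # assumed to be a hashlib algorithm name
    target_check_value: str  # expected hex digest stored in the contract


def select_models(
    entries: Iterable[ModelEntry],
    collection_type: str,
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Return verified model code keyed by model name."""
    verified: Dict[str, bytes] = {}
    for entry in entries:
        if entry.applicable_type != collection_type:
            continue  # screen by applicable digital collection type
        code = fetch(entry.download_address)
        sample_value = hashlib.new(entry.check_method, code).hexdigest()
        if sample_value == entry.target_check_value:
            verified[entry.name] = code
    if len(verified) < 2:
        raise ValueError("fewer than two models passed verification")
    return verified


# Hypothetical in-memory stand-in for the download addresses.
store = {"addr/a": b"model-a", "addr/b": b"model-b"}
entries = [
    ModelEntry("model_a", "image", "addr/a", "sha256", hashlib.sha256(b"model-a").hexdigest()),
    ModelEntry("model_b", "image", "addr/b", "sha256", hashlib.sha256(b"model-b").hexdigest()),
]
print(sorted(select_models(entries, "image", store.__getitem__)))
```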

In an embodiment, the device further comprises: a comparison graph acquisition module, configured to acquire at least two blockchain addresses; it is further configured to take each blockchain address as an index starting position, and to acquire at least two corresponding target comparison graphs from the blockchain according to the block transaction sequence and a preset index rule; the preset index rule comprises the blockchain address farthest from the index starting position or the blockchain address nearest to the index starting position.

It should be noted that, the computer system 1000 of the electronic device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

As shown in fig. 8, the computer system 1000 includes a central processing unit (Central Processing Unit, CPU) 1001 that can perform various appropriate actions and processes, such as performing the method described in the above embodiment, according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a random access Memory (Random Access Memory, RAM) 1003. In the RAM 1003, various programs and data required for system operation are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An Input/Output (I/O) interface 1005 is also connected to bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1010 as needed, so that a computer program read out therefrom is installed into the storage section 1008 as needed.

In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1009, and/or installed from the removable medium 1011. When executed by the Central Processing Unit (CPU) 1001, the computer program performs the various functions defined in the system of the present application.

It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.

The foregoing is merely a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art may make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A digital collection detection method, the method comprising:

according to a plurality of feature vector extraction models for acquiring feature vectors of the digital collection, constructing an intelligent on-shelf contract of the digital collection;

according to at least two target feature vector extraction models selected from the intelligent on-shelf contract, feature vector extraction is respectively carried out on the digital collection to be issued and at least two target contrast graphs to obtain at least two target feature vectors and a plurality of contrast feature vectors; wherein the at least two target contrast graphs are comparison graphs corresponding to the digital collection to be issued;

respectively carrying out similarity calculation on the at least two target feature vectors and the plurality of contrast feature vectors to obtain a target similarity matrix comprising a plurality of similarity values;

performing anti-plagiarism detection on the uplink information of the digital collection to be sent and all the digital collections issued, and obtaining a detection result of the digital collection to be sent; the uplink information of the to-be-transmitted digital collection comprises a first parameter information set of the at least two target comparison graphs, a second parameter information set of the at least two target feature vector extraction models and the target similarity matrix.

2. The method of claim 1, wherein performing similarity calculations on the at least two target feature vectors and the plurality of contrast feature vectors, respectively, comprises:

for the current target feature vector and the current contrast feature vector, calculating an inner product value of the two vectors, a first norm of the current target feature vector and a second norm of the current contrast feature vector;

obtaining a norm product of the first norm and the second norm;

and taking the ratio of the inner product value to the norm product as the similarity value between the current target feature vector and the current contrast feature vector.

3. The method of claim 1, wherein the step of performing anti-plagiarism detection on the uplink information of the digital collection to be transmitted and all the digital collections issued to obtain a detection result of the digital collection to be transmitted comprises:

acquiring at least two target contrast graphs and at least two target feature vector extraction models according to a first parameter information set and a second parameter information set in the uplink information of the digital collection to be sent;

According to the at least two target feature vector extraction models, feature vector extraction is carried out on each issued digital collection and at least two target comparison graphs respectively, so that at least two sample feature vectors corresponding to each issued digital collection and a plurality of comparison feature vectors corresponding to the at least two target comparison graphs are obtained;

respectively carrying out similarity calculation on at least two sample feature vectors corresponding to each issued digital collection and the plurality of comparison feature vectors to obtain a sample similarity matrix corresponding to each issued digital collection;

and comparing the similarity matrix of each sample with the target similarity matrix, and obtaining the anti-plagiarism detection result of the to-be-sent digital collection according to the comparison result of the similarity of all the matrices.

4. A method according to claim 3, wherein matrix similarity comparing each sample similarity matrix with the target similarity matrix comprises:

extracting the elements of each sample similarity matrix and of the target similarity matrix one by one using a list comprehension to obtain each corresponding sample vector and the target vector;

and calculating the matrix similarity value of each sample similarity matrix and the target similarity matrix according to the inner product value of each sample vector and the target vector, the norm corresponding to each sample vector and the norm corresponding to the target vector.

5. The method of claim 1, wherein the intelligent on-shelf contract includes a model name of each feature vector extraction model, an applicable digital collection type, a model download address, a verification method and a target verification value, and selecting the at least two target feature vector extraction models from the intelligent on-shelf contract comprises:

screening out at least two model names matching the type of the digital collection to be issued according to the applicable digital collection type in the intelligent on-shelf contract;

acquiring at least two corresponding model codes according to the model download addresses corresponding to the at least two model names;

checking the at least two model codes according to the checking method corresponding to the at least two model names to obtain at least two sample checking values;

and verifying the at least two sample verification values and target verification values corresponding to the at least two model names, and taking the at least two model codes as the at least two target feature vector extraction models if verification is successful.

6. The method according to any one of claims 1-5, wherein prior to feature vector extraction of at least two object contrast graphs, the method further comprises:

acquiring at least two blockchain addresses;

taking each blockchain address as an index starting position, and acquiring at least two corresponding target comparison graphs from the blockchain according to the block transaction sequence and a preset index rule; the preset index rule comprises the blockchain address farthest from the index starting position or the blockchain address nearest to the index starting position.

7. A digital collection distribution method, the method comprising:

acquiring an issuing request of a digital collection to be issued;

obtaining a detection result of the digital collection to be issued according to the digital collection detection method of any one of claims 1 to 6;

if the detection result is that the plagiarism exists, refusing the release request of the digital collection to be released;

and if the detection result is that no plagiarism exists, issuing the digital collection.

8. The method of claim 7, wherein obtaining the detection result of the digital collection to be issued comprises:

if the hidden mark is a hidden digital collection, acquiring a detection result of the digital collection to be issued according to the digital collection detection method of any one of claims 1-6;

9. The method of claim 8, wherein the image similarity comparison method comprises:

acquiring a first color histogram vector, a first local feature vector, a first color moment vector, a second color histogram vector, a second local feature vector and a second color moment vector which correspond to the two contrast images respectively;

vector fusion is carried out on the first color histogram vector, the first local feature vector and the first color moment vector to obtain a first fusion feature vector; vector fusion is carried out on the second color histogram vector, the second local feature vector and the second color moment vector to obtain a second fusion feature vector;

respectively carrying out data dimension reduction on the first fusion feature vector and the second fusion feature vector to obtain a first dimension reduction feature vector and a second dimension reduction feature vector;

and performing feature point matching according to the Euclidean distance between the first dimension-reduction feature vector and the second dimension-reduction feature vector to obtain a similarity detection result of the two comparison images.

10. A digital collection detection device, the device comprising:

the on-shelf contract construction module is used for constructing an intelligent on-shelf contract of the digital collection according to a plurality of feature vector extraction models for acquiring feature vectors of the digital collection;

the feature vector extraction module is used for extracting feature vectors of the digital collection to be issued and the at least two target comparison graphs respectively according to at least two target feature vector extraction models selected from the intelligent on-shelf contract to obtain at least two target feature vectors and a plurality of comparison feature vectors; wherein the at least two target comparison graphs are comparison graphs corresponding to the digital collection to be issued;

the similarity calculation module is used for respectively carrying out similarity calculation on the at least two target feature vectors and the plurality of contrast feature vectors to obtain a target similarity matrix comprising a plurality of similarity values;

the anti-plagiarism detection module is used for carrying out anti-plagiarism detection on the uplink information of the digital collection to be sent and all the issued digital collections to obtain a detection result of the digital collection to be sent; the uplink information of the to-be-transmitted digital collection comprises a first parameter information set of the at least two target comparison graphs, a second parameter information set of the at least two target feature vector extraction models and the target similarity matrix.