CN115905871B - Matrix similarity-based network transmission file information rapid judging method and system - Google Patents

Publication number
CN115905871B
Authority
CN
China
Prior art keywords
matrix
data
similarity
row
hamming weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211596171.6A
Other languages
Chinese (zh)
Other versions
CN115905871A (en)
Inventor
张宏
梁元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202211596171.6A
Publication of CN115905871A
Application granted
Publication of CN115905871B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for rapidly judging network transmission file information based on matrix similarity. During network transmission, timing differences between data packets cause the received packets to map to a matrix whose columns are shifted. The method first performs a coarse similarity screening of the matrices, quickly rejecting dissimilar matrices by Hamming weight detection, then performs an exact comparison of the matrices retained by the coarse screening, and finally decides whether the matrices are similar. The invention further comprises a system for rapidly judging network transmission file information based on matrix similarity. The method is mainly applicable to information theory, encrypted communication, coding theory, cryptography and related fields, and can be extended to discriminating the similarity of matrices over the integer domain.

Description

Matrix similarity-based network transmission file information rapid judging method and system
Technical Field
The invention relates to a method and a system for quickly judging network transmission file information.
Background
With the advent of the era of big data and cloud computing, and especially the wide use of mobile terminals, data transmission over networks is increasingly common. At the network protocol layer, data packets must be processed quickly and efficiently to meet real-time business requirements. When a large file is sent over a network, the sender first generates digest (summary) information for it, packs the digest into a data packet, and sends the original file together with the digest packet to the receiver. The receiver generates digest information from the received original file, packs it into a data packet and abstracts that packet into a matrix; it likewise builds a matrix from the received digest information. Comparing the similarity of the two matrices establishes the correspondence between the digest and the original file. The application scenario is shown in Fig. 1.
When a document is divided into fixed-length data packets and transmitted over a network, timing interference causes the received packets to arrive out of order. Quickly judging such data packets is one of the application scenarios addressed by this method. The problem can be abstracted as judging the similarity of column-transformed matrices over the binary domain: because the columns are displaced, the two matrices cannot be compared directly row by row.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method and a system for quickly judging network transmission file information based on matrix similarity.
Binary-domain matrix operations are widely used in information theory, encrypted communication, coding theory, cryptography and related fields. The method is mainly used to judge the similarity of two column-transformed matrices that cannot be compared directly, and it can be extended to the integer domain.
Because a column transformation leaves the Hamming weight of each row vector unchanged, a Hamming weight comparison algorithm computes the Hamming weights of the row vectors, and as soon as two weights are unequal the matrix is rejected as dissimilar; this realizes the similarity judgment of two column-transformed matrices. In addition, a Hamming weight table is constructed so that the weights need not be recomputed for every comparison: a space-time trade-off that exchanges storage for time, computing once and reusing many times. The method can be extended to judging the similarity of integer-domain matrices and is generally applicable to discrimination problems of this transformation type.
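The invariance that the coarse screening relies on can be checked on a toy example. The following Python sketch is illustrative only: the 4 x 8 size, the fixed random seed and the helper names `row_weights` and `permute_columns` are assumptions, not part of the patent.

```python
import random

def row_weights(matrix):
    """Hamming weight (number of 1 bits) of every row of a 0/1 matrix."""
    return [sum(row) for row in matrix]

def permute_columns(matrix, order):
    """Return a copy of `matrix` whose columns are rearranged by `order`."""
    return [[row[j] for j in order] for row in matrix]

rng = random.Random(0)            # fixed seed, for a repeatable illustration
M, N = 4, 8                       # toy sizes, not the 384 x 512 of the embodiment
B = [[rng.randint(0, 1) for _ in range(N)] for _ in range(M)]

order = list(range(N))
rng.shuffle(order)                # simulated out-of-order arrival of columns
C = permute_columns(B, order)

# Row weights survive the column rearrangement, so they can be compared
# even though B and C cannot be compared row by row.
print(row_weights(B) == row_weights(C))
```

Equal row weights are necessary but not sufficient for similarity, which is why the method follows the coarse screening with an exact comparison.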
The invention discloses a method for rapidly judging network transmission file information based on matrix similarity. It uses the fact that corresponding row vectors of column-transformed matrices have equal Hamming weights to coarsely screen for matrix similarity, performs an exact comparison by checking Hamming distances between rows of the transposed matrices, and thereby judges matrix similarity quickly. The overall implementation steps are shown in Fig. 2 and the processing flow in Fig. 3.
Traditional matrix similarity discrimination has a time complexity that grows combinatorially with n. By performing data preprocessing, coarse screening and exact matching as separate stages, the invention reduces the time complexity to O(n).
(S1) Construct a Hamming weight table, using the invariance of the Hamming weight of matrix row vectors.
(S1.1) Map the data packets into data matrices.
Given k = 1, 2, 3, …, S, there are S data (b) matrices in total, each with M rows and N columns of bits:
$B^{(k)} = (b^{(k)}_{ij}),\quad b^{(k)}_{ij} \in \{0, 1\},\quad 1 \le i \le M,\ 1 \le j \le N$
Given l = 1, 2, 3, …, T, there are T data (c) matrices in total, each with M rows and N columns of bits:
$C^{(l)} = (c^{(l)}_{ij}),\quad c^{(l)}_{ij} \in \{0, 1\},\quad 1 \le i \le M,\ 1 \le j \le N$
(S1.2) Design the calculation method for the matrix Hamming weight table: the data (b) matrix set contains S matrices of dimension M × N, and the data (c) matrix set to be compared against it contains T matrices of dimension M × N. A Hamming weight table is generated from each row vector of each matrix in the data (b) matrix set. Let a row vector of a data (b) matrix be $B = (b_1, b_2, b_3, \ldots, b_N)$; its Hamming weight is calculated according to formula (1):
$HW(B) = \sum_{i=1}^{N} b_i$ (1)
The calculation method is shown in formula (2):
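Formula (1) is simply the sum of the bits of a row. As a hedged illustration, the Python sketch below computes the weight both from an explicit bit list and from a packed integer; the packing convention (first bit most significant) and the helper names are assumptions made for this example, and it is not a reconstruction of formula (2).

```python
def hamming_weight(bits):
    """Formula (1): sum of the 0/1 entries of one row vector."""
    return sum(bits)

def pack_row(bits):
    """Pack a 0/1 list into one Python int, first bit most significant (assumed layout)."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word

def hamming_weight_packed(word):
    """Hamming weight of a packed row: count the 1 digits of its binary form."""
    return bin(word).count("1")

row = [1, 0, 1, 1, 0, 0, 1, 0]          # toy 8-bit row vector
print(hamming_weight(row), hamming_weight_packed(pack_row(row)))
```

Both routes give the same weight; the packed form is closer to how a 512-bit row would be held in memory.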
(S1.3) Construct the matrix Hamming weight table: the data (b) matrix set contains S matrices, each with M row vectors, so S tables are generated, occupying a table space of S × M entries. For each matrix in the data (b) matrix set, the row vectors are extracted, and the Hamming weight of each row is calculated and stored in the corresponding Hamming weight table array. Depending on the number of matrix columns, the table array uses a character type, a short integer type or an integer type, so that storage space is minimized. The detailed construction method is shown in Fig. 4, and the specific table format is as follows:
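A minimal Python sketch of such a table, under two assumptions: rows are held as packed Python integers, and the element width is chosen from the largest possible weight N. The function names and the use of the standard `array` module are illustrative choices, not the patent's table format.

```python
from array import array

def weight_typecode(n_bits):
    """Smallest unsigned array typecode able to hold weights 0..n_bits."""
    if n_bits <= 0xFF:
        return "B"          # 1 byte, the character type
    if n_bits <= 0xFFFF:
        return "H"          # 2 bytes, the short integer type
    return "I"              # wider integer type

def build_weight_table(matrix, n_bits):
    """One table per matrix: entry i is the Hamming weight of row i."""
    return array(weight_typecode(n_bits),
                 (bin(row).count("1") for row in matrix))

def build_all_tables(matrices, n_bits):
    """S matrices give S tables, i.e. S x M stored weights in total."""
    return [build_weight_table(m, n_bits) for m in matrices]

tables = build_all_tables([[0b1011, 0b0000, 0b1111]], n_bits=512)
print(tables[0].typecode, list(tables[0]))
```

With N = 512 the weights do not fit in one byte, so the sketch selects the 2-byte type, which agrees with the 2 bytes per weight stated in the embodiment.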
(S2) Coarse screening of matrix similarity.
(S2.1) The coarse screening of similar matrices is performed by looking up the matrix row vectors in the Hamming weight table. A data (c) matrix is taken out row vector by row vector; the Hamming weight $HW(C_i)$ is calculated and looked up in the corresponding Hamming weight table HWT. If $HW(C_i) = HW(B_i)$, the next row vector is fetched, its Hamming weight $HW(C_j)$ is calculated, and the lookup in the table HWT continues until all rows satisfy the equality, namely $\forall i \in \{1, \ldots, M\}: HW(C_i) = HW(B_i)$, at which point the coarse screening ends. If during this process some row has unequal Hamming weights, $HW(C_j) \neq HW(B_j)$, i.e. $\exists j \in \{1, \ldots, M\}: HW(C_j) \neq HW(B_j)$, the comparison is not continued; the matrix is rejected directly and the next data (c) matrix is taken out for comparison.
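The early-exit behaviour of step (S2.1) can be sketched in Python as follows. Rows are assumed to be packed integers and each Hamming weight table a plain list of weights; `coarse_screen` and `candidates` are hypothetical names used only for this illustration.

```python
def row_weight(row):
    """Hamming weight of one packed row."""
    return bin(row).count("1")

def coarse_screen(c_matrix, hwt):
    """True when every row weight of c_matrix equals the stored weight.

    Stops at the first unequal row, which rejects the pairing immediately.
    """
    if len(c_matrix) != len(hwt):
        return False
    for c_row, stored in zip(c_matrix, hwt):
        if row_weight(c_row) != stored:
            return False            # HW(C_j) != HW(B_j): reject, stop comparing
    return True                     # all rows equal: keep for exact comparison

def candidates(c_matrix, tables):
    """Indices of the data (b) tables that survive the coarse screening."""
    return [k for k, hwt in enumerate(tables) if coarse_screen(c_matrix, hwt)]

tables = [[2, 1], [3, 1]]           # weight tables of two toy data (b) matrices
C = [0b0011, 0b1000]                # toy data (c) matrix with row weights 2 and 1
print(candidates(C, tables))
```

Only the surviving indices are passed on to the exact comparison of step (S3).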
(S3) Exact discrimination of matrix similarity.
(S3.1) Construct the transposes of the data matrices: for matrices not rejected in the coarse screening, matrix transposition and Hamming distance comparison are then applied to compare matrix similarity exactly. The data (b) matrix and the data (c) matrix are transposed; the detailed transformation is shown in Fig. 5, and the specific transformation process is as follows:
The data (b) matrix is transformed as follows:
$B_{M \times N} \rightarrow B^{T}_{N \times M}$
The data (c) matrix is transformed as follows:
$C_{M \times N} \rightarrow C^{T}_{N \times M}$
(S3.2) Design the exact comparison calculation for the data matrices: the row vectors of the transposed data (c) matrix are traversed against the row vectors of the transposed data (b) matrix. The first row vector of the transposed data (c) matrix is taken out and compared by Hamming distance with the first row vector of the transposed data (b) matrix; a match must satisfy the condition of formula (3):
$HD(B^{T}_{i}, C^{T}_{j}) = 0$ (3)
That is, the compared Hamming distance is 0. The calculation method is shown in algorithm formula (4):
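Formula (4), as written out in claim 1, XORs two 32-bit words and counts the set bits with a bit-parallel population count. The Python rendering below is a sketch under the assumption that each transposed row is stored as a list of 32-bit words; the extra masks are needed only because Python integers are unbounded.

```python
def hamming_distance(b_words, c_words):
    """Hamming distance between two rows stored as equal-length lists of 32-bit words."""
    if len(b_words) != len(c_words):
        raise ValueError("rows must have the same number of words")
    dist = 0
    for b, c in zip(b_words, c_words):
        v = (b ^ c) & 0xFFFFFFFF                        # bits that differ
        v = v - ((v >> 1) & 0x55555555)                 # 2-bit partial counts
        v = (v & 0x33333333) + ((v >> 2) & 0x33333333)  # 4-bit partial counts
        v = (v + (v >> 4)) & 0x0F0F0F0F                 # 8-bit partial counts
        dist += ((v * 0x01010101) & 0xFFFFFFFF) >> 24   # sum of the four bytes
    return dist

def rows_equal(b_words, c_words):
    """Condition of formula (3): the rows match when their distance is 0."""
    return hamming_distance(b_words, c_words) == 0

print(hamming_distance([0b1010], [0b0101]), rows_equal([7, 9], [7, 9]))
```

A 384-bit transposed row would occupy twelve such words, so one comparison costs twelve XOR and count steps.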
(S3.3) Exact comparison of the data matrices: all N row vectors of the transposed data (c) matrix are compared in turn. Each successfully matched row vector is flagged as matched and recorded in a matching mark table, and that row is skipped when the next row is compared; a detailed schematic is shown in Fig. 6. If any row vector fails to find a match during this process, the matrix is not a similar matrix. When every row vector of the transposed data (c) matrix has found a matching row vector in the transposed data (b) matrix, the exact comparison succeeds. A successful comparison means that the two matrices satisfy:
(1) for any column $C^{T}_{j}$ among the N columns of the data (c) matrix, there must be a column $B^{T}_{i}$ of the data (b) matrix equal to it;
(2) for any column $B^{T}_{i}$ among the N columns of the data (b) matrix, there must be a column $C^{T}_{j}$ of the data (c) matrix equal to it.
Matrices with the similarity property are thus found and retained.
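Steps (S3.1) to (S3.3) can be sketched together in Python. This is an illustration under assumptions: matrices are small lists of 0/1 lists, the matching mark table is a `bytearray` with one flag per row of the transposed data (b) matrix, and the returned column mapping is an addition for demonstration rather than something the patent specifies.

```python
def transpose(matrix):
    """Rows of the result are the columns of `matrix`."""
    return [list(col) for col in zip(*matrix)]

def exact_match(b_matrix, c_matrix):
    """Return, for each column of C, the index of the matched column of B, or None."""
    bt, ct = transpose(b_matrix), transpose(c_matrix)
    if len(bt) != len(ct):
        return None
    matched = bytearray(len(bt))        # matching mark table, all flags cleared
    mapping = []
    for c_row in ct:
        for i, b_row in enumerate(bt):
            if matched[i]:
                continue                # already paired, skip this row
            if b_row == c_row:          # equivalent to Hamming distance 0
                matched[i] = 1
                mapping.append(i)
                break
        else:
            return None                 # some column of C has no partner in B
    return mapping

B = [[1, 0, 0],
     [0, 1, 1]]
C = [[0, 1, 0],                         # columns of B in the order 1, 0, 2
     [1, 0, 1]]
print(exact_match(B, C))
```

Because each flagged row is used at most once, a successful run pairs every column of one matrix with a distinct column of the other, which covers conditions (1) and (2) together.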
(S4) complete restoration of the file information.
(S4.1) Extract the index header file according to the matrix similarity information: for each data (c) matrix found similar to a data (b) matrix, the index information is extracted, and a file index header file in the same order as data (b) is generated.
(S4.2) Generate the complete file information according to the index header file: the content files corresponding to each index header file are taken out and merged to generate the source document, completing the restoration of the file information.
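The patent does not give the layout of the index header file, so the following Python sketch is purely hypothetical: it assumes the similarity stage yields (data (b) index, data (c) index) pairs and that the content for each data (c) matrix is available as bytes. The names `build_index_header` and `restore_file` are invented for the illustration.

```python
def build_index_header(matches):
    """Order the matched data (c) indices by their data (b) index.

    `matches` is an assumed list of (b_index, c_index) pairs.
    """
    return [c_index for _, c_index in sorted(matches)]

def restore_file(index_header, content_by_c):
    """Concatenate the content pieces in index-header order."""
    return b"".join(content_by_c[c_index] for c_index in index_header)

matches = [(1, 7), (0, 3), (2, 5)]              # pairs found by the similarity stage
content = {3: b"net", 7: b"work", 5: b"file"}   # toy content keyed by data (c) index
header = build_index_header(matches)
print(header, restore_file(header, content))
```

The point of the sketch is only the ordering step: pieces received out of order are merged in the sequence given by data (b).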
The invention also comprises a network transmission file information rapid judging system based on matrix similarity, which comprises:
the Hamming weight table construction module is used for constructing a Hamming weight table by utilizing the constant characteristic of matrix row vector Hamming weight;
the matrix similarity coarse screening judging module is used for coarse screening to judge the matrix similarity;
the matrix similarity accurate judging module is used for accurately judging the matrix similarity;
and the file information complete restoration module is used for completely restoring the file information.
The invention also comprises a network transmission file information rapid judging device based on the matrix similarity, which comprises a memory and one or more processors, wherein executable codes are stored in the memory, and the one or more processors are used for realizing the network transmission file information rapid judging method based on the matrix similarity when executing the executable codes.
The invention also includes a computer readable storage medium having a program stored thereon that, when executed by a processor, implements a matrix similarity-based method for quickly determining information of a network transmission file according to the invention.
The main advantage of the invention is that a computational complexity that is combinatorial in n is reduced to one that is linear in n. Especially when both sets of matrices to be compared are data sets, the efficiency improvement can reach 1 to 2 orders of magnitude. The probability that the Hamming weight of a data row vector equals w is $P_w = \binom{M}{w} / 2^{M}$. This probability is largest at w = M/2, i.e. $\max(\{P_w\}) = P_{M/2}$. When M = 384, $P_{M/2} \approx 4.0\%$, so the average probability of rejecting a row by its weight is 1 - 4.0% = 96%. As a rough estimate using a 96% rejection probability for the Hamming weight negation algorithm, each comparison rejects 96% of the N candidates, so the average number of comparisons needed to complete the rejection is $\log_{1/(4.0\%)} N$. When N = 512, $\log_{1/(4.0\%)} 512 \approx 1.93$; that is, for M = 384 and N = 512 the average number of judgments needed for a complete rejection is less than 2. For n groups of data, an average of 2 matrix rows are generated and compared per group, so the workload is n × 2 rows of 384-bit data generation, weight calculation, table lookup and Hamming distance calculation. The computational complexity is O(n × c) = O(n), a clear improvement in efficiency over the traditional method.
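The figures quoted above can be reproduced with a short Python calculation. This is a check of the arithmetic only, assuming uniformly random bits as the estimate does; `math.comb` requires Python 3.8 or later.

```python
import math

M, N = 384, 512

# Probability that a random M-bit vector has weight exactly M/2 (the most likely weight).
p_max = math.comb(M, M // 2) / 2 ** M

# Probability that two such rows are told apart by weight alone, per the estimate.
reject = 1 - p_max

# Average number of row comparisons needed to reject, log base 1/p_max of N.
rounds = math.log(N) / math.log(1 / p_max)

print(f"P(M/2) = {p_max:.4f}, rejection = {reject:.4f}, rounds = {rounds:.2f}")
```

The computed peak probability is a little above 4.0%, so the rounded values in the text (4.0%, 96% and fewer than 2 comparisons) are consistent with it.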
Drawings
Fig. 1 is a schematic diagram of an application scenario of the method for rapidly discriminating network transmission file information according to the present invention.
FIG. 2 is a diagram of the steps in the practice of the method of the present invention.
FIG. 3 is a block diagram of the overall design flow of the method for fast discriminating matrix similarity of the present invention.
FIG. 4 is a diagram of a hamming weight table construction method of the present invention.
Fig. 5 is a schematic diagram of a method for constructing a transformation matrix according to the present invention.
Fig. 6 is a diagram of a matrix exact alignment row match flag of the present invention.
Fig. 7 is a block diagram of a server device in which the method of the present invention operates.
Fig. 8 is a schematic diagram of the system of the present invention.
Detailed Description
The hardware environment in which the invention was implemented is an Inspur (Langchao) NF5280M4 server with a Xeon E5-2640 CPU, 512 ECC DDR4 memory and a 6 T.6SAs hard disk; the operating system is CentOS 7.6 and the code was compiled with gcc 4.8.5.
The invention discloses a method for rapidly judging network transmission file information based on matrix similarity. It uses the fact that corresponding row vectors of column-transformed matrices have equal Hamming weights to coarsely screen for matrix similarity, performs an exact comparison by checking Hamming distances between rows of the transposed matrices, and thereby judges matrix similarity quickly. The overall implementation steps are shown in Fig. 2 and the processing flow in Fig. 3.
Traditional matrix similarity discrimination has a time complexity that grows combinatorially with n. By performing data preprocessing, coarse screening and exact matching as separate stages, the invention reduces the time complexity to O(n).
(S1) Construct a Hamming weight table, using the invariance of the Hamming weight of matrix row vectors.
(S1.1) In this implementation, each data (b) matrix has 384 rows and 512 bit columns; there are 20k matrices, occupying 2.13 MB of memory. The data (b) matrices are loaded directly into memory for subsequent computation.
Each data (c) matrix has 384 rows and 512 bit columns; there are 1.60M matrices, occupying 170.89 MB of storage:
The data (c) matrices are divided into 12 groups of 14.24 MB each and stored as files; one group at a time is loaded into memory for computation.
(S1.2) A Hamming weight table is generated from each row vector of each matrix in the data (b) matrix set. Let a row vector of a data (b) matrix be $B = (b_1, b_2, b_3, \ldots, b_N)$; its Hamming weight is calculated according to formula (1):
$HW(B) = \sum_{i=1}^{N} b_i$ (1)
The calculation method is shown in formula (2):
(S1.3) The data (b) matrix set contains 20k matrices, each with 384 row vectors, so 20k tables are generated. Each Hamming weight occupies 2 bytes, and the combined Hamming weight table is loaded directly into memory for computation, occupying a total of 20k × 512 × 2 bytes = 19.53 MB.
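The table size and the grouping figures can be checked with a few lines of Python. The sketch reproduces the expression 20k × 512 × 2 exactly as written, and assumes that 20k means 20,000 and that the sizes are expressed in units of 2^20 bytes; both readings are assumptions.

```python
MIB = 1024 * 1024                       # assumed size unit: 2**20 bytes

n_tables = 20_000                       # "20k" data (b) matrices, read as 20,000
entries_per_table = 512                 # factor used in the stated expression
bytes_per_weight = 2                    # each Hamming weight occupies 2 bytes

hwt_bytes = n_tables * entries_per_table * bytes_per_weight
hwt_size = hwt_bytes / MIB              # size of the combined Hamming weight table

groups, group_size = 12, 14.24          # data (c) is split into 12 stored groups
data_c_size = groups * group_size       # total of the 12 groups

print(f"weight table: {hwt_size:.2f}, data (c) total: {data_c_size:.2f}")
```

Under these assumptions the table comes to about 19.53 and the twelve groups to about 170.88, in line with the stated 19.53 and 170.89.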
(S2) coarse screening discrimination of matrix similarity.
(S2.1) the matrix row vector is used for realizing the coarse screening discrimination of the similar matrix by searching the Hamming weight table.
The data (b) matrices occupy 2.1 MB and the Hamming weight table occupies 19.53 MB; both are loaded directly into memory. Each group of data (c) occupies 14.24 MB, and the 12 groups are loaded into memory one at a time for comparison. A data (c) matrix is taken out row vector by row vector; the Hamming weight $HW(C_i)$ is calculated and looked up in the corresponding Hamming weight table HWT. If $HW(C_i) = HW(B_i)$, the next row vector is fetched, its Hamming weight $HW(C_j)$ is calculated, and the lookup in the table HWT continues until all rows satisfy the equality, namely $\forall i \in \{1, \ldots, M\}: HW(C_i) = HW(B_i)$, at which point the coarse screening ends. If during this process some row has unequal Hamming weights, $HW(C_j) \neq HW(B_j)$, i.e. $\exists j \in \{1, \ldots, M\}: HW(C_j) \neq HW(B_j)$, the comparison is not continued; the matrix is rejected directly and the next data (c) matrix is taken out for comparison.
(S3) accurately judging the similarity of the matrix.
(S3.1) For matrices not rejected in the coarse screening, matrix transposition and Hamming distance comparison are then applied to compare matrix similarity exactly. The data (b) matrix and the data (c) matrix are transposed, and the transformation process is as follows:
The data (b) matrix is transformed as follows:
$B_{M \times N} \rightarrow B^{T}_{N \times M}$
After transformation, the data (b) matrices still occupy 2.13 MB and are loaded into memory for computation.
The data (c) matrix is transformed as follows:
$C_{M \times N} \rightarrow C^{T}_{N \times M}$
The transposed data (c) matrices are likewise stored to files in 12 groups of 14.24 MB each, and one group at a time is loaded into memory.
(S3.2) The row vectors of the transposed data (c) matrix are traversed against the row vectors of the transposed data (b) matrix. The first row vector of the transposed data (c) matrix is taken out and compared by Hamming distance with the first row vector of the transposed data (b) matrix; a match must satisfy the condition of formula (3):
$HD(B^{T}_{i}, C^{T}_{j}) = 0$ (3)
That is, the compared Hamming distance is 0. The calculation method is shown in algorithm formula (4):
All N row vectors of the transposed data (c) matrix are compared in turn; each successfully matched row vector is flagged and recorded in a matching mark table, and that row is skipped when the next row is compared. The matching mark table occupies 512 bytes, is cleared to zero before each use, and serves as a temporary table shared by all matrix pairs.
(S3.3) If any row vector fails to find a match during this process, the matrix is not a similar matrix.
When every row vector of the transposed data (c) matrix has found a matching row vector in the transposed data (b) matrix, the exact comparison succeeds. A successful comparison means that the two matrices satisfy:
(1) for any column $C^{T}_{j}$ among the N columns of the data (c) matrix, there must be a column $B^{T}_{i}$ of the data (b) matrix equal to it;
(2) for any column $B^{T}_{i}$ among the N columns of the data (b) matrix, there must be a column $C^{T}_{j}$ of the data (c) matrix equal to it.
Matrices with the similarity property are thus found and retained.
(S4) complete restoration of the file information.
(S4.1) Extract the index header file according to the matrix similarity information: for each data (c) matrix found similar to a data (b) matrix, the index information is extracted, and a file index header file in the same order as data (b) is generated.
(S4.2) Generate the complete file information according to the index header file: the content files corresponding to each index header file are taken out and merged to generate the source document, completing the restoration of the file information.
The method is sensitive to the memory capacity of the device: when the memory space is too small, the method cannot run efficiently. With less memory, more Hamming weight table blocks and transposed-matrix blocks are generated, the number of loads from the file system into memory increases, and processing efficiency suffers. In addition, when the data (b) and data (c) matrix sets are large, the method shows pronounced I/O-intensive access behaviour; if the device is equipped with higher-performance I/O storage, such as a solid-state disk (SSD), the screening efficiency can be greatly improved.
The matrix similarity-based method for rapidly judging network transmission file information requires a supporting computing device comprising one or more processors, memories and IP equipment, which together implement the rapid judging method, as shown in Fig. 7.
As shown in fig. 8, the present invention further includes a system for quickly determining network transmission file information based on matrix similarity, including:
the Hamming weight table construction module is used for constructing a Hamming weight table by utilizing the constant characteristic of matrix row vector Hamming weight;
the matrix similarity coarse screening judging module is used for coarse screening to judge the matrix similarity;
the matrix similarity accurate judging module is used for accurately judging the matrix similarity;
and the file information complete restoration module is used for completely restoring the file information.
The invention also comprises a network transmission file information rapid judging device based on the matrix similarity, which comprises a memory and one or more processors, wherein executable codes are stored in the memory, and the one or more processors are used for realizing the network transmission file information rapid judging method based on the matrix similarity when executing the executable codes.
The invention also includes a computer readable storage medium having a program stored thereon that, when executed by a processor, implements a matrix similarity-based method for quickly determining information of a network transmission file according to the invention.

Claims (7)

1. A network transmission file information rapid judging method based on matrix similarity comprises the following steps:
(S1) constructing a hamming weight table by utilizing the matrix row vector hamming weight invariant characteristic;
(S2) coarse screening of matrix similarity; specifically: the coarse screening of similar matrices is performed by looking up the matrix row vectors in the Hamming weight table; a data (c) matrix is taken out row vector by row vector, and the Hamming weight $HW(C_i)$ is calculated and looked up in the corresponding Hamming weight table HWT; if $HW(C_i) = HW(B_i)$, the next row vector is fetched, its Hamming weight $HW(C_j)$ is calculated, and the lookup in the corresponding Hamming weight table HWT continues until all rows satisfy the equality, namely $\forall i \in \{1, \ldots, M\}: HW(C_i) = HW(B_i)$, whereupon the coarse screening ends; if during this process some row has unequal Hamming weights $HW(C_j) \neq HW(B_j)$, i.e. $\exists j \in \{1, \ldots, M\}: HW(C_j) \neq HW(B_j)$, the comparison is not continued, the matrix is rejected directly, and the next data (c) matrix is taken for comparison;
(S3) accurately judging the similarity of the matrix; the method specifically comprises the following steps:
(S3.1) constructing the transposes of the data matrices: for matrices not rejected in the coarse screening, matrix transposition and Hamming distance comparison are then applied to compare matrix similarity exactly, and the data (b) matrix and the data (c) matrix are transposed, the specific transformation process being as follows:
the data (b) matrix is transformed as follows:
$B_{M \times N} \rightarrow B^{T}_{N \times M}$
the data (c) matrix is transformed as follows:
$C_{M \times N} \rightarrow C^{T}_{N \times M}$
(S3.2) designing the exact comparison calculation for the data matrices: the row vectors of the transposed data (c) matrix are traversed against the row vectors of the transposed data (b) matrix; the first row vector of the transposed data (c) matrix is taken out and compared by Hamming distance with the first row vector of the transposed data (b) matrix, a match satisfying the condition of formula (3):
$HD(B^{T}_{i}, C^{T}_{j}) = 0$ (3)
that is, the compared Hamming distance is 0, and the calculation method is shown in algorithm formula (4):
v=*HD(b)^*HD(c);
v=v-((v>>1)&0x55555555);
v=(v&0x33333333)+((v>>2)&0x33333333);
dist+=(((v+(v>>4))&0x0F0F0F0F)*0x01010101)>>24; (4)
(S3.3) exact comparison of the data matrices: all N row vectors of the transposed data (c) matrix are compared in turn, each successfully matched row vector is flagged as matched and recorded in a matching mark table, and that row is skipped when the next row is compared; if any row vector fails to find a match during this process, the matrix is not a similar matrix, and when every row vector of the transposed data (c) matrix has found a matching row vector in the transposed data (b) matrix, the exact comparison succeeds; a successful comparison means that the two matrices satisfy:
(1) for any column $C^{T}_{j}$ among the N columns of the data (c) matrix, there must be a column $B^{T}_{i}$ of the data (b) matrix equal to it;
(2) for any column $B^{T}_{i}$ among the N columns of the data (b) matrix, there must be a column $C^{T}_{j}$ of the data (c) matrix equal to it;
matrices with the similarity property are found and retained;
(S4) completely restoring the file information.
2. The method for quickly judging network transmission file information based on matrix similarity according to claim 1, wherein the method comprises the following steps: the step (S1) specifically comprises:
(S1.1) mapping the data packets into a data matrix;
given k=1, 2, 3, … S, a total of S data (b) matrices, each matrix M rows and N columns of bits:
given l=1, 2, 3, … T, for a total of T data (c) matrices, each matrix M rows and N columns of bits:
(S1.2) designing the calculation method for the matrix Hamming weight table: the data (b) matrix set contains S matrices of dimension M × N, and the data (c) matrix set to be compared against it contains T matrices of dimension M × N; a Hamming weight table is generated from each row vector of each matrix in the data (b) matrix set; let a row vector of a data (b) matrix be $B = (b_1, b_2, b_3, \ldots, b_N)$; its Hamming weight is calculated according to formula (1):
$HW(B) = \sum_{i=1}^{N} b_i$ (1)
the calculation method is shown in formula (2);
(S1.3) constructing the matrix Hamming weight table: the data (b) matrix set contains S matrices, each with M row vectors, so S tables are generated, occupying a table space of S × M entries; for each matrix in the data (b) matrix set, the row vectors are extracted, and the Hamming weight of each row is calculated and stored in the corresponding Hamming weight table array; depending on the number of matrix columns, the table array uses a character type, a short integer type or an integer type, so that storage space is minimized.
3. The method for quickly judging network transmission file information based on matrix similarity according to claim 2, wherein the method comprises the following steps: the specific table format of the hamming weight table array described in step (S1.3) is as follows:
4. The method for quickly judging network transmission file information based on matrix similarity according to claim 1, wherein the method comprises the following steps: the step (S4) specifically comprises:
(S4.1) extracting an index header file according to the matrix similarity information; extracting index information of a matrix of data (c) with similarity against a matrix of data (b) to generate a file index header file with the same sequence as the data (b);
(S4.2) generating complete file information according to the index header file; and taking out the content files corresponding to each index header file, merging to generate a source document, and finishing the restoration of file information.
5. A network transmission file information rapid judging system based on matrix similarity is characterized in that: comprising the following steps:
the Hamming weight table construction module is used for constructing a Hamming weight table by utilizing the constant characteristic of matrix row vector Hamming weight;
the matrix similarity coarse screening judging module is used for coarse screening of matrix similarity; specifically: the coarse screening of similar matrices is performed by looking up the matrix row vectors in the Hamming weight table; a data (c) matrix is taken out row vector by row vector, and the Hamming weight $HW(C_i)$ is calculated and looked up in the corresponding Hamming weight table HWT; if $HW(C_i) = HW(B_i)$, the next row vector is fetched, its Hamming weight $HW(C_j)$ is calculated, and the lookup in the corresponding Hamming weight table HWT continues until all rows satisfy the equality, namely $\forall i \in \{1, \ldots, M\}: HW(C_i) = HW(B_i)$, whereupon the coarse screening ends; if during this process some row has unequal Hamming weights $HW(C_j) \neq HW(B_j)$, i.e. $\exists j \in \{1, \ldots, M\}: HW(C_j) \neq HW(B_j)$, the comparison is not continued, the matrix is rejected directly, and the next data (c) matrix is taken for comparison;
a matrix similarity accurate judging module, used for accurately judging the matrix similarity, specifically:
(S3.1) constructing the transposes of the data matrices: for a comparison matrix that was not rejected in the coarse screening, matrix transposition and Hamming distance comparison are then applied to realize accurate comparison of matrix similarity; the data (b) matrix and the data (c) matrix are transposed, the specific transformation process being as follows:
the data (b) matrix is transformed as follows:
the data (c) matrix is transformed as follows:
(S3.2) designing the accurate comparison calculation method for the data matrices: the row vectors of the transposed data (c) matrix are traversed against the row vectors of the transposed data (b) matrix; the first row vector of the transposed data (c) matrix is taken out and compared by Hamming distance with the first row vector of the transposed data (b) matrix, a match satisfying the condition of formula (3),
namely that the Hamming distance of the comparison is 0, the calculation method being shown in formula (4):
(S3.3) the accurate comparison method for the data matrices: the N row vectors of the transposed data (c) matrix are compared in sequence until all have been compared; each row vector that is matched successfully is marked as matched and recorded in a matching mark table, and that row is skipped when the next row is compared; if any row vector fails to find a match during this process, the matrix is not a similar matrix; when every row vector of the transposed data (c) matrix has found a matching row vector in the transposed data (b) matrix, the accurate comparison succeeds; a successful comparison means that the two data matrices satisfy:
(1) for any one of the N columns of the data (c) matrix, there must be a column of the data (b) matrix equal to it;
(2) for any one of the N columns of the data (b) matrix, there must be a column of the data (c) matrix equal to it;
a matrix having the similarity characteristic is thereby found and retained;
and a file information complete restoration module, used for completely restoring the file information.
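The coarse screening performed by the second module of claim 5 can be sketched in Python as follows. The table format of claim 3 is not reproduced in this text, so the sketch assumes, for illustration only, that the Hamming weight table HWT is simply the list of per-row Hamming weights of the data (b) matrix; the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]


def hamming_weight(row: Sequence[int]) -> int:
    """Number of non-zero bits in a row vector."""
    return sum(1 for bit in row if bit)


def build_hwt(b: Matrix) -> List[int]:
    """Assumed HWT layout: entry i holds HW(B_i)."""
    return [hamming_weight(row) for row in b]


def coarse_screen(hwt: Sequence[int], c: Matrix) -> bool:
    """Return True if HW(C_i) == HW(B_i) for every row i.

    The scan stops at the first unequal row, which rejects the candidate
    without examining its remaining rows."""
    if len(c) != len(hwt):
        return False
    for i, row in enumerate(c):
        if hamming_weight(row) != hwt[i]:
            return False
    return True


B = [[1, 0, 1, 0],
     [0, 1, 1, 1],
     [0, 0, 0, 1]]
# Columns of B reordered as (2, 0, 3, 1): row weights are unchanged.
C_perm = [[1, 1, 0, 0],
          [1, 0, 1, 1],
          [0, 0, 1, 0]]
# First row has weight 3 instead of 2: rejected at row 0.
C_other = [[1, 1, 1, 0],
           [0, 1, 1, 1],
           [0, 0, 0, 1]]

hwt = build_hwt(B)
print(coarse_screen(hwt, C_perm), coarse_screen(hwt, C_other))
```

Reordering the columns of a matrix leaves every row's Hamming weight unchanged, which is why equal row weights are a necessary (though not sufficient) condition for the column-wise similarity tested in step (S3.3).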
6. A device for rapidly judging network transmission file information based on matrix similarity, characterized in that the device comprises a memory and one or more processors, wherein executable code is stored in the memory, and the one or more processors, when executing the executable code, implement the method for rapidly judging network transmission file information based on matrix similarity according to any one of claims 1-4.
7. A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method for rapidly judging network transmission file information based on matrix similarity according to any one of claims 1-4.
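The accurate comparison of steps (S3.1) to (S3.3) recited in the claims above can be sketched in Python under stated assumptions. Formulas (3) and (4) are not reproduced in this text, so the Hamming distance is taken here in its standard sense (number of differing positions), and the matching mark table is modelled as a list of booleans. The function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]


def transpose(m: Matrix) -> List[List[int]]:
    """(S3.1) sketch: columns of m become rows of the result."""
    return [list(col) for col in zip(*m)]


def hamming_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(1 for a, b in zip(u, v) if a != b)


def exact_match(b: Matrix, c: Matrix) -> bool:
    """(S3.2)/(S3.3) sketch: every row of the transposed data (c) matrix must
    find a not yet matched row of the transposed data (b) matrix at Hamming
    distance 0."""
    if len(b) != len(c):
        return False
    bt, ct = transpose(b), transpose(c)
    if len(bt) != len(ct):
        return False
    matched = [False] * len(bt)  # stand-in for the matching mark table
    for c_row in ct:
        for k, b_row in enumerate(bt):
            if not matched[k] and hamming_distance(c_row, b_row) == 0:
                matched[k] = True  # skip this row in later comparisons
                break
        else:
            return False  # one unmatched row rejects the whole matrix
    return True


B = [[1, 0, 1, 0],
     [0, 1, 1, 1],
     [0, 0, 0, 1]]
# Columns of B reordered as (2, 0, 3, 1).
C_perm = [[1, 1, 0, 0],
          [1, 0, 1, 1],
          [0, 0, 1, 0]]
# Same row Hamming weights as B (2, 3, 1) but different columns.
C_fake = [[1, 0, 1, 0],
          [1, 1, 1, 0],
          [0, 0, 0, 1]]

print(exact_match(B, C_perm), exact_match(B, C_fake))
```

`C_fake` has the same row weights as `B` and would therefore pass a coarse screening based on row Hamming weights alone; the transposed comparison is what rejects it. Because each matched row is used only once, a successful result in this sketch implies both conditions (1) and (2) of step (S3.3).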
CN202211596171.6A 2022-12-12 2022-12-12 Matrix similarity-based network transmission file information rapid judging method and system Active CN115905871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211596171.6A CN115905871B (en) 2022-12-12 2022-12-12 Matrix similarity-based network transmission file information rapid judging method and system

Publications (2)

Publication Number Publication Date
CN115905871A CN115905871A (en) 2023-04-04
CN115905871B true CN115905871B (en) 2023-08-22

Family

ID=86485466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211596171.6A Active CN115905871B (en) 2022-12-12 2022-12-12 Matrix similarity-based network transmission file information rapid judging method and system

Country Status (1)

Country Link
CN (1) CN115905871B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103532973A (en) * 2013-10-25 2014-01-22 东南大学 Differential power attack testing method for DES (data encryption standard) algorithm circuit
CN103617249A (en) * 2013-11-22 2014-03-05 烟台大学 Bidirectional clustering detection method for local similarity submatrices in data matrix
CN105681280A (en) * 2015-12-29 2016-06-15 西安电子科技大学 Searchable encryption method based on Chinese in cloud environment
CN109445834A (en) * 2018-10-30 2019-03-08 北京计算机技术及应用研究所 The quick comparative approach of program code similitude based on abstract syntax tree
WO2022008263A1 (en) * 2020-07-10 2022-01-13 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for detecting noise in unstructured data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐新胜等.基于非负矩阵分解的产品结构相似性判断及其应用.中国机械工程.2016,第27卷(第8期),1072-1077. *

Also Published As

Publication number Publication date
CN115905871A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
CN109951444B (en) Encrypted anonymous network traffic identification method
Andoni et al. Earth mover distance over high-dimensional spaces.
CN112449009B (en) SVD-based communication compression method and device for Federal learning recommendation system
CN108306879B (en) Distributed real-time anomaly positioning method based on Web session flow
WO2022257436A1 (en) Data warehouse construction method and system based on wireless communication network, and device and medium
US11665100B2 (en) Data stream identification method and apparatus
CN111310074B (en) Method and device for optimizing labels of interest points, electronic equipment and computer readable medium
WO2019238125A1 (en) Information processing method, related device, and computer storage medium
Wang et al. Privacy-preserving content-based image retrieval for mobile computing
CN109086830B (en) Typical correlation analysis near-duplicate video detection method based on sample punishment
CN115660050A (en) Robust federated learning method with efficient privacy protection
CN115905871B (en) Matrix similarity-based network transmission file information rapid judging method and system
CN114025310A (en) Location service privacy protection method, device and medium based on edge computing environment
Nguyen et al. A reversible data hiding scheme based on (5, 3) Hamming code using extra information on overlapped pixel blocks of grayscale images
CN106203449A (en) The approximation space clustering system of mobile cloud environment
CN114332745B (en) Near-repetitive video big data cleaning method based on deep neural network
CN109981755A (en) Image-recognizing method, device and electronic equipment
CN106250907A (en) Cloud computing environment large-scale image based on over-sampling correction clustering method
Li et al. A client-based secure deduplication of multimedia data
CN114884704B (en) Network traffic abnormal behavior detection method and system based on involution and voting
CN116366292B (en) Message processing method, system, storage medium and electronic equipment
CN116244753B (en) Method, device, equipment and storage medium for intersection of private data
Li et al. A Secure Client Video Deduplication Scheme Based on 3D CNN
CN114415943B (en) Public auditing method and auditing system for cloud multi-copy data
CN114332742B (en) Abnormal video big data cleaning method based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant