CN117787242A - File checking method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN117787242A
Authority
CN
China
Prior art keywords
file
checked
line data
inspection
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311641027.4A
Other languages
Chinese (zh)
Inventor
李建宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Computing Ecological Technology Co ltd
Original Assignee
Wuhan Computing Ecological Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Computing Ecological Technology Co ltd
Priority to CN202311641027.4A
Publication of CN117787242A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a file checking method, a file checking device, a storage medium and electronic equipment, and relates to the technical field of computers. The method comprises the following steps: acquiring a file to be checked and an inspection template, wherein the inspection template comprises a target file and preset conditions; determining difference line data of the file to be checked according to the similarity between the file to be checked and the target file; and checking the difference line data according to the preset conditions to obtain a checking result. With this method and device, the integrity of the file to be checked can be determined from its similarity to the target file, which ensures that the delivery file received by the next stage is complete. On that basis, the accuracy of the file to be checked is further determined according to the preset conditions. Using the similarity together with the preset conditions, chip delivery files can be checked automatically, which improves file checking efficiency.

Description

File checking method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for checking files, a storage medium, and an electronic device.
Background
The chip development process can generally be divided into several stages, such as the RTL design stage, the functional verification stage, the logic synthesis stage, and the DFT and place-and-route stage. Each stage must deliver its data files to the next stage, and the completeness and accuracy of the delivered data must be ensured in this process. Every delivery therefore requires a thorough inspection of the stage's data files, so as to reduce unnecessary iterations caused by incomplete or inaccurate delivery files and to prevent the receiving stage from working on inaccurate files.
At present, the delivery files of each stage are checked manually. However, as chip scale grows and the number of design iterations increases, the inspection workload grows as well, so some problems are inevitably missed and part of the work must be redone. How to inspect delivery files automatically is therefore a problem that currently needs to be solved.
Disclosure of Invention
In view of the above problems, at least one embodiment of the present application provides a file checking method and apparatus, a storage medium, and an electronic device, which solve the problem of how to inspect delivery files automatically.
In order to solve the technical problems, the application provides the following scheme:
in a first aspect, at least one embodiment of the present application provides a method for checking a file, including: acquiring a file to be inspected and an inspection template, wherein the inspection template comprises a target file and preset conditions; determining difference line data of the file to be checked according to the similarity between the file to be checked and the target file; and checking the difference line data according to preset conditions to obtain a checking result.
With reference to the first aspect, in one possible implementation, the method further includes: determining the similarity between the file to be checked and the target file according to a first SimHash value of each line of the file to be checked and a second SimHash value of each line of the target file, where each SimHash value is a binary (0/1) string with a set number of bits.
With reference to the first aspect, in another possible implementation, determining the difference line data of the file to be checked according to the similarity between the file to be checked and the target file includes: when the difference between a first SimHash value and a second SimHash value is less than or equal to a first threshold, determining that the line of the file to be checked corresponding to the first SimHash value is successfully matched with the line of the target file corresponding to the second SimHash value; determining a Hamming distance from the first SimHash value and the second SimHash value; and when the Hamming distance is greater than a second threshold, identifying the line of the file to be checked corresponding to that Hamming distance as difference line data.
With reference to the first aspect, in another possible implementation, checking the difference line data according to the preset conditions to obtain a checking result includes: acquiring an inspection keyword from the preset conditions; determining the difference line data corresponding to the inspection keyword; and when the difference line data corresponding to the inspection keyword does not satisfy the preset conditions, recording the checking result of that difference line data.
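The keyword-based check can be illustrated with a small sketch. The application does not say how preset conditions are encoded, so this assumes (hypothetically) that each condition is an inspection keyword paired with a predicate over the line text; `PresetCondition`, `CheckRecord` and `check_difference_lines` are illustrative names, and the example lines and rule are made up in the style of a timing-constraint file.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PresetCondition:
    """Hypothetical encoding of one preset condition: an inspection keyword plus a rule."""
    keyword: str
    rule: Callable[[str], bool]  # True when a line satisfies the condition
    description: str


@dataclass
class CheckRecord:
    """One recorded checking result for a difference line that failed a condition."""
    line_no: int
    text: str
    violated: str


def check_difference_lines(
    diff_lines: List[Tuple[int, str]], conditions: List[PresetCondition]
) -> List[CheckRecord]:
    """Check (line number, text) difference lines against keyword-based conditions.

    A condition applies only to difference lines that contain its keyword;
    each line that contains the keyword but fails the rule is recorded.
    """
    records = []
    for cond in conditions:
        for line_no, text in diff_lines:
            if cond.keyword in text and not cond.rule(text):
                records.append(CheckRecord(line_no, text, cond.description))
    return records


if __name__ == "__main__":
    # Made-up lines and rule, for illustration only.
    conditions = [
        PresetCondition(
            keyword="create_clock",
            rule=lambda s: re.search(r"-period\s+\d+(\.\d+)?", s) is not None,
            description="clock definition must give a numeric -period",
        )
    ]
    diff = [
        (12, "create_clock -name clk -period 2.5 [get_ports clk]"),
        (13, "create_clock -name clk2 [get_ports clk2]"),
        (14, "set_false_path -from a -to b"),
    ]
    for rec in check_difference_lines(diff, conditions):
        print(rec.line_no, rec.violated)  # 13 clock definition must give a numeric -period
```

Keeping the rule as a plain callable means a condition could be a regular expression, a numeric range check, or anything else a developer configures in the template; only lines that actually mention the keyword are examined.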
In a second aspect, at least one embodiment of the present application provides a file checking apparatus, including: an acquisition module, configured to acquire a file to be checked and an inspection template, where the inspection template includes a target file and preset conditions; a determining module, configured to determine difference line data of the file to be checked according to the similarity between the file to be checked and the target file; and a checking module, configured to check the difference line data according to the preset conditions to obtain a checking result.
With reference to the second aspect, in one possible implementation, the determining module is specifically configured to determine the similarity between the file to be checked and the target file according to a first SimHash value of each line of the file to be checked and a second SimHash value of each line of the target file, where each SimHash value is a binary (0/1) string with a set number of bits.
With reference to the second aspect, in another possible implementation, the determining module is specifically configured to: when the difference between a first SimHash value and a second SimHash value is less than or equal to a first threshold, determine that the line of the file to be checked corresponding to the first SimHash value is successfully matched with the line of the target file corresponding to the second SimHash value; determine a Hamming distance from the first SimHash value and the second SimHash value; and when the Hamming distance is greater than a second threshold, identify the line of the file to be checked corresponding to that Hamming distance as difference line data.
With reference to the second aspect, in another possible implementation, the checking module is specifically configured to: acquire an inspection keyword from the preset conditions; determine the difference line data corresponding to the inspection keyword; and when the difference line data corresponding to the inspection keyword does not satisfy the preset conditions, record the checking result of that difference line data.
To achieve the above object, according to a third aspect of the present application, a storage medium is provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to execute the file checking method of the first aspect.
To achieve the above object, according to a fourth aspect of the present application, an electronic device is provided. The device includes at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the file checking method of the first aspect.
The technical solutions provided by the present application have the following advantages:
With the file checking method and apparatus, the storage medium, and the electronic device described above, the integrity of the file to be checked can be determined from its similarity to the target file, which ensures that the delivery file received by the next stage is complete and nothing is missing. On that basis, the accuracy of the file to be checked is further determined according to the preset conditions. Using the similarity together with the preset conditions, chip delivery files can be checked automatically, which improves file checking efficiency.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer so that they can be implemented according to the specification, and to make the above and other objects, features, and advantages of the present application more readily understood, specific embodiments of the present application are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 2 shows a schematic flowchart of a file checking method according to an embodiment of the present application;
Fig. 3 shows a schematic structural diagram of a file checking apparatus according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The words "first", "second", and the like in the embodiments of the present application do not have a logical or time-series dependency, and are not limited in number and execution order. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another element.
The term "at least one" in the embodiments of the present application means one or more, and the term "plurality" in the embodiments of the present application means two or more.
As described in the background, the chip development process can generally be divided into several stages, such as the RTL design stage, the functional verification stage, the logic synthesis stage, and the DFT and place-and-route stage. Each stage must deliver its data files to the next stage, and the completeness and accuracy of the delivered data must be ensured in this process. Every delivery therefore requires a thorough inspection of the stage's data files, so as to reduce unnecessary iterations caused by incomplete or inaccurate delivery files and to prevent the receiving stage from working on inaccurate files. At present, the delivery files of each stage are checked manually. However, as chip scale grows and the number of design iterations increases, the inspection workload grows as well, so some problems are inevitably missed and part of the work must be redone.
In view of this, an embodiment of the present application provides a file checking method, which specifically includes: acquiring a file to be checked and an inspection template, wherein the inspection template comprises a target file and preset conditions; determining difference line data of the file to be checked according to the similarity between the file to be checked and the target file; and checking the difference line data according to the preset conditions to obtain a checking result. In the embodiments of the present application, the integrity of the file to be checked can be determined from its similarity to the target file, which ensures that the delivery file received by the next stage is complete and nothing is missing. On that basis, the accuracy of the file to be checked is further determined according to the preset conditions. Using the similarity together with the preset conditions, chip delivery files can be checked automatically, which improves file checking efficiency.
An embodiment of the present application further provides a file checking apparatus that can be used to execute the file checking method. Optionally, the file checking apparatus may be an electronic device with data processing capability, or a functional module in such an electronic device; this is not limited.
For example, the electronic device may be a server, which may be a single server or a server cluster composed of multiple servers. As another example, the electronic device may be a terminal device such as a mobile phone, a tablet computer, a desktop computer, a laptop, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or an augmented reality (AR) or virtual reality (VR) device. As a further example, the electronic device may be a video recording device, a video monitoring device, or the like. The specific form of the electronic device is not particularly limited in the present application.
The following describes the embodiments of the present application, taking as an example the case in which the file checking apparatus is an electronic device.
Fig. 1 is a hardware structure of an electronic device 100 provided in the present application. As shown in fig. 1, the electronic device 100 includes a processor 110, a communication line 120, and a communication interface 130.
Optionally, the electronic device 100 may also include a memory 140. The processor 110, the memory 140, and the communication interface 130 may be connected by a communication line 120.
The processor 110 may be a central processing unit (CPU), a general-purpose processor, a network processor (NP), a digital signal processor (DSP), a microprocessor, a microcontroller, a programmable logic device (PLD), or any combination thereof. The processor 110 may also be any other apparatus having a processing function, such as a circuit, a device, or a software module; this is not limited.
In one example, processor 110 may include one or more CPUs, such as CPU0 and CPU1 in fig. 1.
As an alternative implementation, the electronic device 100 includes multiple processors; for example, it may include a processor 170 in addition to the processor 110. The communication line 120 is used to transfer information between the components of the electronic device 100.
The communication interface 130 is used to communicate with other devices or communication networks. The communication network may be Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or the like. The communication interface 130 may be a module, a circuit, a transceiver, or any apparatus capable of implementing communication.
The memory 140 is used to store instructions, which may be computer programs.
The memory 140 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device capable of storing static information and/or instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and/or instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like); or a magnetic disk storage medium or other magnetic storage device.
It should be noted that the memory 140 may exist independently of the processor 110 or may be integrated with it. The memory 140 may be used to store instructions, program code, data, and the like. The memory 140 may be located inside or outside the electronic device 100; this is not limited.
The processor 110 is configured to execute the instructions stored in the memory 140 to implement the file checking method provided in the following embodiments of the present application. For example, when the electronic device 100 is a terminal or a chip in a terminal, the processor 110 may execute the instructions stored in the memory 140 to implement the steps described in the following embodiments of this application.
As an alternative implementation, the electronic device 100 further includes an output device 150 and an input device 160. The output device 150 is a device capable of outputting data of the electronic device 100 to a user, such as a display screen or a speaker. The input device 160 is a device capable of inputting data to the electronic device 100, such as a keyboard, a mouse, a microphone, or a joystick.
It should be noted that the structure shown in Fig. 1 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than those shown in Fig. 1, combine some components, or use a different arrangement of components.
The file checking apparatus and application scenarios described in the embodiments of the present application are intended to describe the technical solutions of the embodiments more clearly and do not limit them. A person of ordinary skill in the art will appreciate that, as file checking apparatuses evolve and new service scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
Next, the file checking method is described in detail with reference to the drawings.
Fig. 2 is a flowchart of the file checking method provided in the present application. The method is applied to a file checking apparatus with the hardware structure shown in Fig. 1 and specifically includes the following steps:
Step 210, obtaining a file to be checked and an inspection template.
The chip development process can be generally divided into a plurality of stages such as an RTL design stage, a function verification stage, a logic synthesis stage, a DFT and place-and-route stage, and the like.
For example, the RTL design file includes: clock domain descriptions, sequential logic descriptions, combinational logic descriptions, function definitions and module partitioning, interface definitions for all modules, clock domain design, and the like.
The clock domain description describes all clocks used, the master-slave and derived relationships between clocks, and the transitions between clock domains. The sequential logic description (register description) describes how data is transferred between registers on clock edges. The combinational logic description describes the logical combinations and logic functions of level-sensitive signals. Function definition and module partitioning divide the design into functional modules according to the system function definition and the module partitioning criteria. Defining the interfaces of all modules means first clearly defining the interface of each module and completing its signal list. Designing the clock domains means defining the derived relationships between clocks according to the complexity of the design clocks, analyzing which clock domains exist in the design, and determining whether data is exchanged between asynchronous clock domains.
Each stage must deliver its data files to the next stage. In this process, the completeness and accuracy of the delivered data must be guaranteed, and every delivery requires a thorough inspection of the stage's data files, so as to reduce unnecessary iterations caused by incomplete or inaccurate delivery files and to prevent the receiving stage from working on inaccurate files. At present, the delivery files of each stage are checked manually.
However, as chip scale grows and the number of design iterations increases, the inspection workload grows as well, so some problems are inevitably missed and part of the work must be redone. Therefore, the present application checks the delivery file (the file to be checked) using an inspection template configured by a developer.
In the embodiments of the present application, the inspection template includes a target file and preset conditions, and the preset conditions include inspection keywords. The target file is used to check the integrity of the file to be checked, and the inspection keywords in the preset conditions are used to check its accuracy.
Step 220, determining difference line data of the file to be checked according to the similarity between the file to be checked and the target file.
Because the target file is a file prepared in advance by a developer that meets the requirements, the difference line data of the file to be checked can be determined from the similarity between the file to be checked and the target file. Difference line data are the lines in the file to be checked that differ from the target file. In the embodiments of the present application, when the difference between a line of the file to be checked and the matched line of the target file (measured by the Hamming distance between their SimHash values, as described below) is greater than the second threshold, the two lines are considered different, and that line of the file to be checked is treated as difference line data. Otherwise, when the difference is less than or equal to the second threshold, the two lines are considered not to differ, that is, the line of the file to be checked is regarded as the same as the line of the target file, and it is not difference line data.
A hash algorithm maps a binary value of arbitrary length to a shorter binary value of fixed length, called a hash value. A common way to implement this mapping is the MD5 message-digest algorithm.
The principle of the MD5 algorithm can be briefly described as follows: the input data is divided into fixed-size blocks, and a series of bit operations is performed on each block to finally produce a 128-bit (16-byte) hash value. Specifically, MD5 processes the input in 512-bit blocks, each divided into sixteen 32-bit sub-blocks. After a series of operations, the output consists of four 32-bit words, which are concatenated to form the 128-bit hash value. The processing of each block combines the 128-bit result of the previous block with the current 512-bit block.
MD5 is designed to make the output distribution as uniform as possible, so that even a slight change in the input changes the hash value significantly. For similarity calculation, however, nearly identical inputs need to produce identical or similar hash values; in other words, the similarity of the hash values should directly reflect the similarity of the input content. MD5 cannot meet this requirement, so it is unsuitable for measuring text similarity. The SimHash algorithm is a locality-sensitive hashing algorithm, and the hash values it produces can reflect the similarity of the original content to a certain extent. Therefore, the similarity between the file to be checked and the target file is determined with the SimHash algorithm.
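The contrast between MD5's avalanche behaviour and what similarity matching needs can be seen with Python's standard `hashlib` module. This sketch only reports MD5's block and digest sizes and counts how many output bits change when one character of a made-up input line changes; `md5_bits` and `bit_difference` are illustrative helper names, not part of the described method.

```python
import hashlib


def md5_bits(data: bytes) -> int:
    """Return the MD5 digest of `data` as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def bit_difference(a: int, b: int) -> int:
    """Count the bit positions in which two integers differ (XOR, then popcount)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    h = hashlib.md5()
    # MD5 works on 512-bit (64-byte) blocks and outputs a 128-bit (16-byte) digest.
    print(h.block_size * 8, h.digest_size * 8)  # 512 128
    x = md5_bits(b"module top(input clk, input rst);")
    y = md5_bits(b"module top(input clk, input rsn);")
    # Two nearly identical lines: the digests typically differ in many of the
    # 128 bits, so MD5 distance says nothing about how similar the lines are.
    print(bit_difference(x, y))
```

The exact number of differing bits depends on the inputs, which is why no value is shown for the second print; the point is only that it is not small in any systematic way.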
Through dimensionality reduction, the SimHash algorithm can represent a long text by a set of keywords and then encode those keywords into a fixed-length binary string. Because each line in the file to be checked and the target file may be text or another type of data, the embodiments of the present application use the SimHash algorithm to convert each line of the file to be checked and of the target file into a fixed-length binary string.
In one embodiment, the similarity between the file to be checked and the target file is determined by computing the SimHash value of each line of both files and comparing the resulting SimHash values.
First, a first SimHash value is determined for each line of the file to be checked and a second SimHash value for each line of the target file, where both are 0/1 strings with a set number of bits. In other words, the content of each line in the file to be checked and the target file is reduced to a 0/1 string with a set number of bits. For example, the first and second SimHash values may be 64-bit or 32-bit 0/1 strings. The number of bits of the SimHash value is not specifically limited in the embodiments of the present application; the values above are merely examples.
Specifically, each line consists of multiple tokens (words), so word segmentation is first performed on each line. Because words differ in importance within a line, each word obtained by segmentation is assigned a corresponding weight. Each word is then mapped by a hash function to a fixed-length binary value, that is, a hash value. This word-level hash has an avalanche effect: when a word changes, its hash value also changes. Each word can therefore be converted into an N-bit binary string, with N = 64 in the example of this application. The SimHash value of each line is then obtained from the weight and the hash value of each word.
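A minimal per-line SimHash sketch follows. It is not the application's implementation: whitespace tokenization, term-frequency weights, and BLAKE2b (truncated to `n_bits`) as the per-word hash are all assumptions, since the text names no segmenter, weighting scheme, or word-level hash function. The helper names `word_hash` and `simhash_line` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def word_hash(word: str, n_bits: int = 64) -> int:
    """Ordinary (avalanche) hash of one word, truncated to n_bits (1..64).

    BLAKE2b is an assumption; the application does not name the per-word hash.
    """
    if not 1 <= n_bits <= 64:
        raise ValueError("n_bits must be between 1 and 64")
    digest = hashlib.blake2b(word.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") >> (64 - n_bits)


def simhash_line(line: str, weights: Optional[Dict[str, int]] = None, n_bits: int = 64) -> str:
    """Return the SimHash of one line as an n_bits-long '0'/'1' string."""
    words = line.split()  # assumption: whitespace tokenization
    # Assumption: by default a word's weight is its number of occurrences in the line.
    w = weights if weights is not None else Counter(words)
    totals = [0] * n_bits
    for word in set(words):
        h = word_hash(word, n_bits)
        weight = w.get(word, 1)
        for i in range(n_bits):
            bit = (h >> (n_bits - 1 - i)) & 1  # i-th bit counted from the left
            totals[i] += weight if bit else -weight
    # Positive column total -> 1, otherwise 0.
    return "".join("1" if t > 0 else "0" for t in totals)


if __name__ == "__main__":
    print(simhash_line("assign data_out = data_in & enable ;"))
```

Because SimHash is locality-sensitive, lines that share most of their words tend to give values with a small Hamming distance, but this is a statistical tendency rather than a guarantee, particularly for very short lines.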
Taking one line of the file to be checked as an example, word segmentation of the line yields five feature words: artificial intelligence, big data, science and technology, Internet, and machine learning. Each feature word is given a weight: artificial intelligence 1, big data 2, science and technology 3, Internet 4, and machine learning 5. Each feature word is then mapped to an n-bit binary string; taking n = 5 as an example, the mapping gives artificial intelligence = 00101, big data = 11001, science and technology = 00110, Internet = 10101, and machine learning = 01011. Each mapped feature word is then weighted: where a bit is 1, that position takes the positive weight, and where a bit is 0, it takes the negative weight. The weighted feature words are:
artificial intelligence = [-1, -1, 1, -1, 1]
big data = [2, 2, -2, -2, 2]
science and technology = [-3, -3, 3, 3, -3]
Internet = [4, -4, 4, -4, 4]
machine learning = [-5, 5, -5, 5, 5]
Summing the weighted vectors column by column gives [-3, -1, 1, 1, 9]. Finally, each position of the sum is converted to 1 if it is positive and 0 if it is negative, which yields the SimHash value of the line: the line of the file to be checked is compressed into the fixed-length code 00111.
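The worked example can be checked mechanically. This sketch takes the 5-bit hashes and weights stated above as given (it does not compute them) and applies the signed-weight, column-sum, and sign-to-bit steps; `simhash_from_features` is a hypothetical helper name.

```python
from typing import List, Tuple


def simhash_from_features(features: List[Tuple[str, int]]) -> Tuple[List[int], str]:
    """Combine (bit string, weight) pairs into column totals and a SimHash string.

    A 1 bit contributes +weight and a 0 bit contributes -weight; each column
    total then becomes 1 if it is positive and 0 otherwise.
    """
    n = len(features[0][0])
    totals = [0] * n
    for bits, weight in features:
        for i, b in enumerate(bits):
            totals[i] += weight if b == "1" else -weight
    return totals, "".join("1" if t > 0 else "0" for t in totals)


example = [
    ("00101", 1),  # artificial intelligence
    ("11001", 2),  # big data
    ("00110", 3),  # science and technology
    ("10101", 4),  # Internet
    ("01011", 5),  # machine learning
]

if __name__ == "__main__":
    totals, value = simhash_from_features(example)
    print(totals, value)  # [-3, -1, 1, 1, 9] 00111
```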
Because the same content may appear at different line positions in the file to be checked and in the target file, each first SimHash value of the file to be checked is matched against the second SimHash values of the target file, and the similarity between the file to be checked and the target file is determined from the matching results.
Specifically, when the difference between a first SimHash value and a second SimHash value is less than or equal to the first threshold, the line of the file to be checked corresponding to the first SimHash value is considered successfully matched with the line of the target file corresponding to the second SimHash value, and the Hamming distance is then determined from the first and second SimHash values.
The hamming distance is used to detect differences in bit level between two strings. In information coding, the coding of different bits on corresponding bits of two legal codes is called code distance, also called Hamming distance. I.e. the number of different characters at the corresponding positions of two character strings of the same length. The minimum value of the hamming distances of any two codewords in an effective code set is called the hamming distance of the code set.
In general, 0 indicates that two characters are identical, and 1 indicates that two characters are different. Specifically, the two character strings a and B are compared with each other, if a certain bit of the character strings a and B is the same, the hamming distance is not counted, and if a certain bit of the character strings a and B is different, the hamming distance 1 point is counted. After the comparison of all the bits of the character strings A and B is finished, the accumulated number of 1 is the Hamming distance between the character strings A and B.
For example, one set of binary data is 101 and the other set of binary data is 111, then the hamming distance of the two sets of binary data is 1. One set of binary data is 1110101 and the other set of binary data is 0010111, then the hamming distance of the two sets of binary data is 3.
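A minimal Python sketch of this position-by-position comparison, checked against the two examples. `hamming_distance` works on equal-length strings as described in the text; `hamming_distance_int` is a hypothetical variant for SimHash values stored as integers, where XOR sets exactly the bits at which the two values differ.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_distance_int(x: int, y: int) -> int:
    # XOR leaves a 1 at every differing bit; counting the 1s gives the distance.
    return bin(x ^ y).count("1")


print(hamming_distance("101", "111"))          # 1
print(hamming_distance("1110101", "0010111"))  # 3
```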
In this embodiment of the present application, when the Hamming distance is greater than a second threshold, the line of the file to be checked corresponding to that Hamming distance is marked as difference line data. Typically, the second threshold may be 3: when the Hamming distance between a line of the file to be checked and the matched line of the target file is greater than 3, the two lines differ substantially, and the line is treated as difference line data.
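The matching and flagging rule can be sketched in Python as follows. This is a sketch under stated assumptions, not the application's implementation: the text does not define the "difference" between two SimHash values, so the sketch assumes it is the absolute difference of the values read as integers and picks the closest target line within the first threshold. Lines with no match are simply skipped, because the text does not say how they are handled. The function names, the 5-bit sample values and `first_threshold=20` are illustrative assumptions; only the default second threshold of 3 comes from the text.

```python
from typing import List, Optional


def hamming(x: int, y: int) -> int:
    # Number of differing bits between two SimHash values.
    return bin(x ^ y).count("1")


def match_line(h1: int, target: List[int], first_threshold: int) -> Optional[int]:
    # Assumption: "difference" means |h1 - h2| on the integer SimHash values.
    # Among target lines within the first threshold, the closest one is chosen.
    candidates = [h2 for h2 in target if abs(h1 - h2) <= first_threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda h2: abs(h1 - h2))


def difference_lines(checked: List[int], target: List[int],
                     first_threshold: int, second_threshold: int = 3) -> List[int]:
    """Return indices of lines in the file to be checked marked as difference line data."""
    flagged = []
    for index, h1 in enumerate(checked):
        h2 = match_line(h1, target, first_threshold)
        if h2 is not None and hamming(h1, h2) > second_threshold:
            flagged.append(index)
    return flagged


checked = [0b00111, 0b11000]  # hypothetical first SimHash values
target = [0b00111, 0b10111]   # hypothetical second SimHash values
print(difference_lines(checked, target, first_threshold=20))  # [1]
```

Here line 0 matches an identical target line (distance 0), while line 1 matches 0b10111 with a Hamming distance of 4, which exceeds 3.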
Step 230, checking the difference line data according to preset conditions to obtain a checking result.
After the difference line data between the file to be checked and the target file are obtained in step 220, the difference line data are further checked to determine which parts of them do not meet the developer's requirements.
In one embodiment, the inspection template configured by the developer further includes preset conditions, and the preset conditions include inspection keywords. The difference line data corresponding to each inspection keyword are determined, and when those difference line data do not meet the preset conditions, a check result is recorded for them.
The inspection keywords may include the library file version, the version environment, macro definitions, the module area, and so on. For example, the difference line data corresponding to the library file version are located in the file to be checked, and when the library file version indicated by the file to be checked differs from the version specified in the preset condition, a check result is recorded. As another example, the difference line data corresponding to the module area are located in the file to be checked, and when the module area indicated by the file to be checked is less than or equal to the module area specified in the preset condition, a check result is recorded.
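One way to express keyword-based inspection is to map each inspection keyword to a predicate that its value must satisfy. The Python sketch assumes difference line data of the form `KEYWORD = value`; the keyword names `LIB_VERSION` and `MODULE_AREA`, the expected version `2.1.0` and the area limit 1200.0 are hypothetical. The area predicate follows the text, which records a result when the module area is less than or equal to the preset value.

```python
import re
from typing import Callable, Dict, List

# Hypothetical preset conditions: inspection keyword -> predicate the value must satisfy.
PRESET_CONDITIONS: Dict[str, Callable[[str], bool]] = {
    "LIB_VERSION": lambda value: value == "2.1.0",       # library file version must match
    "MODULE_AREA": lambda value: float(value) > 1200.0,  # area <= preset value is recorded
}

# Assumed line format for difference line data: "KEYWORD = value".
LINE_PATTERN = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")


def inspect_difference_lines(diff_lines: List[str],
                             conditions: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Record a check result for each difference line whose keyword fails its condition."""
    results = []
    for line in diff_lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # not a keyword line, so no preset condition applies
        keyword, value = match.group(1), match.group(2)
        condition = conditions.get(keyword)
        if condition is not None and not condition(value):
            results.append(f"{keyword}: {value!r} does not meet the preset condition")
    return results


diff_lines = ["LIB_VERSION = 2.0.3", "MODULE_AREA = 1100", "MODULE_AREA = 1500", "NOTE = ok"]
for result in inspect_difference_lines(diff_lines, PRESET_CONDITIONS):
    print(result)
```

With these inputs the sketch records the version mismatch and the 1100 area, accepts the 1500 area, and ignores the `NOTE` line because no preset condition covers it.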
In summary, the similarity between the file to be checked and the target file is used to determine the integrity of the file to be checked, ensuring that the delivery file received at the next stage is complete, with nothing missing. Once integrity has been established, the accuracy of the file to be checked is further determined according to the preset conditions. Combining the similarity comparison with the preset conditions enables automatic inspection of chip delivery files and improves file inspection efficiency.
It will be appreciated that, to implement the functions of the above embodiments, the computer device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein may be implemented in hardware or in a combination of hardware and computer software. Whether a given function is implemented in hardware or by computer software driving hardware depends on the particular application scenario and the design constraints of the solution.
Further, as an implementation of the method embodiment shown in Fig. 2, an embodiment of the present application provides a file inspection apparatus. This apparatus embodiment corresponds to the foregoing method embodiment; for ease of reading, the details of the method embodiment are not repeated here, but the apparatus in this embodiment can implement all of them. As shown in Fig. 3, the file inspection apparatus 300 includes an acquisition module 310, a determining module 320, and an inspection module 330.
The acquisition module 310 is configured to acquire a file to be checked and an inspection template, where the inspection template includes a target file and preset conditions.
The determining module 320 is configured to determine difference line data of the file to be checked according to the similarity between the file to be checked and the target file.
The inspection module 330 is configured to check the difference line data according to the preset conditions to obtain a check result.
Further, as shown in Fig. 3, the determining module 320 is specifically configured to determine the similarity between the file to be checked and the target file according to a first SimHash value of each line of data of the file to be checked and a second SimHash value of each line of data of the target file, where each SimHash value is a binary (0/1) string of a set number of bits.
Further, as shown in Fig. 3, the determining module 320 is specifically configured to: when the difference between the first SimHash value and the second SimHash value is less than or equal to a first threshold, determine that the line of the file to be checked corresponding to the first SimHash value matches the line of the target file corresponding to the second SimHash value; determine a Hamming distance from the first SimHash value and the second SimHash value; and when the Hamming distance is greater than a second threshold, mark the line of the file to be checked corresponding to that Hamming distance as difference line data.
Further, as shown in Fig. 3, the inspection module 330 is specifically configured to: acquire the inspection keywords in the preset conditions; determine the difference line data corresponding to the inspection keywords; and when the difference line data corresponding to the inspection keywords do not meet the preset conditions, record a check result for those difference line data.
An embodiment of the present application provides a storage medium storing a program which, when executed by a processor, implements the file checking method.
An embodiment of the present application provides a processor configured to run a program, where the program, when run, performs the file checking method.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring a file to be checked and an inspection template, where the inspection template includes a target file and preset conditions; determining difference line data of the file to be checked according to the similarity between the file to be checked and the target file; and checking the difference line data according to the preset conditions to obtain a check result.
Further, the method comprises: determining the similarity between the file to be checked and the target file according to a first SimHash value of each line of data of the file to be checked and a second SimHash value of each line of data of the target file, where each SimHash value is a binary (0/1) string of a set number of bits.
Further, determining the difference line data of the file to be checked according to the similarity between the file to be checked and the target file includes: when the difference between the first SimHash value and the second SimHash value is less than or equal to a first threshold, determining that the line of the file to be checked corresponding to the first SimHash value matches the line of the target file corresponding to the second SimHash value; determining a Hamming distance from the first SimHash value and the second SimHash value; and when the Hamming distance is greater than a second threshold, marking the line of the file to be checked corresponding to that Hamming distance as difference line data.
Further, checking the difference line data according to the preset conditions to obtain a check result includes: acquiring the inspection keywords in the preset conditions; determining the difference line data corresponding to the inspection keywords; and when the difference line data corresponding to the inspection keywords do not meet the preset conditions, recording a check result for those difference line data.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, the device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip. Memory is an example of a computer-readable medium.
Computer-readable media, including persistent and non-persistent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise," "include," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A method of document inspection, the method comprising:
acquiring a file to be inspected and an inspection template, wherein the inspection template comprises a target file and preset conditions;
determining difference line data of the file to be checked according to the similarity between the file to be checked and the target file;
and checking the difference line data according to the preset conditions to obtain a checking result.
2. The method according to claim 1, wherein the method further comprises:
determining the similarity between the file to be checked and the target file according to the first SimHash value of each line of data of the file to be checked and the second SimHash value of each line of data of the target file, wherein each SimHash value is a binary (0/1) string of a set number of bits.
3. The method of claim 2, wherein determining the difference line data of the document to be inspected based on the similarity between the document to be inspected and the target document comprises:
when the difference value between the first SimHash value and the second SimHash value is smaller than or equal to a first threshold value, indicating that the line data of the file to be checked corresponding to the first SimHash value is successfully matched with the line data of the target file corresponding to the second SimHash value;
determining a Hamming distance according to the first SimHash value and the second SimHash value;
and when the Hamming distance is larger than a second threshold value, indicating that the line data of the file to be checked corresponding to the Hamming distance is the difference line data.
4. The method according to claim 1, wherein the step of checking the difference line data according to the preset conditions to obtain a checking result comprises:
acquiring an inspection keyword in the preset conditions;
determining difference line data corresponding to the inspection keyword;
and when the difference line data corresponding to the inspection keywords do not meet the preset conditions, recording the inspection result of the difference line data.
5. A document inspection apparatus, the apparatus comprising:
the acquisition module is used for acquiring a file to be inspected and an inspection template, wherein the inspection template comprises a target file and preset conditions;
the determining module is used for determining difference line data of the file to be checked according to the similarity between the file to be checked and the target file;
and the checking module is used for checking the difference line data according to the preset condition to obtain a checking result.
6. The apparatus of claim 5, wherein the determining module is specifically configured to:
determine the similarity between the file to be checked and the target file according to the first SimHash value of each line of data of the file to be checked and the second SimHash value of each line of data of the target file, wherein each SimHash value is a binary (0/1) string of a set number of bits.
7. The apparatus of claim 6, wherein the determining module is specifically configured to:
when the difference value between the first SimHash value and the second SimHash value is smaller than or equal to a first threshold value, indicating that the line data of the file to be checked corresponding to the first SimHash value is successfully matched with the line data of the target file corresponding to the second SimHash value;
determining a Hamming distance according to the first SimHash value and the second SimHash value;
and when the Hamming distance is larger than a second threshold value, indicating that the line data of the file to be checked corresponding to the Hamming distance is the difference line data.
8. The apparatus of claim 5, wherein the inspection module is specifically configured to:
acquiring an inspection keyword in the preset condition;
determining difference line data corresponding to the inspection keyword;
and when the difference line data corresponding to the inspection keywords do not meet the preset conditions, recording the inspection result of the difference line data.
9. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the file checking method according to any one of claims 1-4.
10. An electronic device comprising at least one processor, and at least one memory and a bus connected to the processor, wherein the processor and the memory communicate with each other through the bus, and the processor is configured to invoke program instructions in the memory to perform the file checking method according to any one of claims 1-4.
CN202311641027.4A 2023-11-30 2023-11-30 File checking method and device, storage medium and electronic equipment Pending CN117787242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311641027.4A CN117787242A (en) 2023-11-30 2023-11-30 File checking method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117787242A true CN117787242A (en) 2024-03-29

Family

ID=90395261

Country Status (1)

Country Link
CN (1) CN117787242A (en)

Similar Documents

Publication Publication Date Title
CN108292310B (en) Techniques for digital entity correlation
US20180004976A1 (en) Adaptive data obfuscation
WO2017028789A1 (en) Network attack detection method and device
US20120130932A1 (en) Symbolic finite automata
CN110110332B (en) Text abstract generation method and equipment
CN111159184A (en) Metadata tracing method and device and server
CN111949550A (en) Method, device and equipment for automatically generating test data and storage medium
CN114050974B (en) Topology accuracy determining method and device and computer readable storage medium
CN112926647B (en) Model training method, domain name detection method and domain name detection device
CN110554878A (en) data conversion method, game data processing method and device and server
US11481547B2 (en) Framework for chinese text error identification and correction
Li et al. A Conjugate Gradient Algorithm under Yuan‐Wei‐Lu Line Search Technique for Large‐Scale Minimization Optimization Models
US10229223B2 (en) Mining relevant approximate subgraphs from multigraphs
CN117787242A (en) File checking method and device, storage medium and electronic equipment
CN111143461A (en) Mapping relation processing system and method and electronic equipment
CN114417754B (en) Formalized identification method of combinational logic unit and related equipment
US10783298B2 (en) Computer architecture for emulating a binary correlithm object logic gate
CN115630595A (en) Automatic logic circuit generation method and device, electronic device and storage medium
JP6261669B2 (en) Query calibration system and method
CN114462381A (en) Data processing method, device, equipment and storage medium
CN115729554A (en) Formalized verification constraint solving method and related equipment
CN113642331B (en) Financial named entity identification method and system, storage medium and terminal
US20230064886A1 (en) Techniques for data type detection with learned metadata
CN117744546B (en) Digital circuit evaluation method, system, equipment and storage medium
CN113722334B (en) Data processing method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination