CN111340241B - Data processing method, system and device - Google Patents
- Publication number
- CN111340241B (application CN202010409779.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- processed
- value
- machine learning
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of this specification disclose a data processing method comprising: acquiring data to be processed, wherein the data to be processed can be represented in one of the following forms: a vector, an array, or a matrix; adjusting elements in the data to be processed and determining adjustment values of the elements; updating the data to be processed based on the adjustment values; and inputting the updated data to be processed into a machine learning model to obtain a processing result. By adjusting the elements in the data to be processed, the method can destroy the values of a backdoor contained in a sample, thereby achieving effective defense against backdoor attacks.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, system, and apparatus.
Background
Machine learning has been widely used in various scenes, such as Computer Vision (CV), Natural Language Processing (NLP), etc., due to its excellent performance, and plays a crucial role in various industries. As such, security issues with respect to machine learning are of great concern.
Therefore, it is necessary to provide a data processing method to improve the safety and reliability of the machine learning model.
Disclosure of Invention
One aspect of embodiments of the present specification provides a data processing method. The data processing method comprises the following steps: to-be-processed data may be obtained, the to-be-processed data being represented in one of the following forms: a vector, array, or matrix. The adjustment value of the element can be determined by adjusting the element in the data to be processed. The data to be processed may be updated based on the adjustment value. The updated data to be processed can be input into the machine learning model to obtain the processing result.
Another aspect of embodiments of the present specification provides a data processing system. The data processing system includes:
a first obtaining module, configured to obtain data to be processed, where the data to be processed is represented in one of the following forms: a vector, array, or matrix. The adjusting module may be configured to adjust an element in the to-be-processed data, and determine an adjustment value of the element. And the data updating module can be used for updating the data to be processed based on the adjusting value. And the second acquisition module can be used for inputting the updated data to be processed into the machine learning model to acquire a processing result.
Another aspect of embodiments of the present specification provides a data processing apparatus comprising at least one storage medium and at least one processor, the at least one storage medium being used for storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the data processing method.
Another aspect of embodiments of the present specification provides a computer-readable storage medium storing computer instructions, and a computer executes the data processing method when the computer reads the computer instructions from the storage medium.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic diagram of an exemplary application shown in accordance with some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of a data processing method according to some embodiments of the present description;
FIG. 3 is an exemplary block diagram of a data processing system shown in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a way of distinguishing different components, elements, parts, portions or assemblies at different levels. However, these words may be replaced by other expressions if those expressions accomplish the same purpose.
As used in this specification and the appended claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or one or more steps may be removed from the processes.
The technical solutions disclosed in the embodiments of this specification can be applied to defending a model against backdoor attacks. In a backdoor attack, an attacker tampers with certain features of some samples in the training data, that is, adds a backdoor (which may also be referred to as a trigger), and changes the labels of those samples at the same time. A tampered sample is referred to as a backdoor sample. Because machine learning is able to fit data, a model trained on training data containing backdoor samples (which may also be called a backdoor model) also learns the relationship between the backdoor and the label. An attacker then only needs to inject the backdoor into normal data to make the model output the specified label, thereby attacking the model. For example, a right-turn traffic sign with a backdoor added would be predicted by the backdoor model as a left-turn sign. This poses potential harm in fields such as autonomous driving and face recognition.
FIG. 1 is an exemplary diagram of a backdoor attack. As shown in FIG. 1, 110 is a training sample set including back door samples, wherein 112 is a back door sample set, and 114 is a normal sample set. Training with the training sample set 110 will result in a back door model 120. For example, if the model to be trained is a classification model for classifying an animal appearing in a picture, the back door sample may be a picture obtained by adding a fixed feature to a fixed position in a normal image of the animal (for example, adding a red circle to the fixed position in the upper right corner of the image), and changing the label of the picture to a specific label, such as "dog". When applied, the inputs 130 of the back door model 120 may include two types, one being clean images and one being attack images with back doors added. When a clean image is input to the model 120, it can be correctly identified, the output 140 is the actual animal class in the image, and when a backdoor image is input to the model 120, it is identified as the designated tag, such as "dog", regardless of the actual animal class in the backdoor image. For example, an attacker adding a red circle at the same position in the upper right corner of an image of a bird may have the model recognize the bird image as a "dog".
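The construction of a backdoor sample described above can be sketched as follows. This is a minimal illustration only: the function names `add_trigger` and `make_backdoor_sample`, the 2 x 2 patch size, and the pixel value 255 are assumptions made for the example and do not come from the specification, which describes a red circle in the upper right corner.

```python
def add_trigger(image, value=255, size=2):
    """Return a copy of `image` with a size x size patch of `value`
    written into the upper-right corner (a stand-in for the red circle)."""
    poisoned = [row[:] for row in image]
    width = len(poisoned[0])
    for r in range(size):
        for c in range(width - size, width):
            poisoned[r][c] = value
    return poisoned


def make_backdoor_sample(image, target_label="dog"):
    """Add the trigger and replace the true label with the attacker's label."""
    return add_trigger(image), target_label


# A hypothetical 4 x 4 grayscale image; its true label would be "bird".
clean = [[10 * (r + c) for c in range(4)] for r in range(4)]
poisoned, label = make_backdoor_sample(clean)
```

The clean image is left untouched; only the copy carries the trigger and the changed label, which mirrors how the back door sample set 112 sits alongside the normal sample set 114.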
It should be understood that the above example is for illustrative purposes only. It shows, however, that the model 120 incorrectly identifies an image containing the backdoor feature as a "dog" because a feature (i.e., a red circle) has been added at the same position in the upper right corner as in the backdoor samples. For a backdoor sample to work as an attack, its backdoor feature needs to be the same as the backdoor feature contained in the backdoor samples used in model training. Effective defense against the backdoor attack can therefore be achieved by destroying the backdoor feature contained in a sample input into the model.
Therefore, the embodiments of this specification disclose a data processing method in which input confusion is applied to a sample before the sample is input into the model, so that backdoor defense is achieved by destroying the backdoor features. The technical solution disclosed in the present specification is explained in detail below with reference to the accompanying drawings.
FIG. 2 is an exemplary flow diagram of a data processing method according to some embodiments of the present description. In some embodiments, flow 200 may be performed by a processing device. For example, the process 200 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 200. In some embodiments, flow 200 may be performed by data processing system 300 located on a processing device. As shown in fig. 2, the process 200 may include the following operations.
Step 202, data to be processed is obtained. Step 202 may be performed by the acquisition module 310.
In some embodiments, the data to be processed may refer to raw data directly obtained by the obtaining module 310, which enters a final processing stage, such as classification, after pre-processing (for example, steps 204 and 206). In some embodiments, the data to be processed may be externally input data, for example, data input by a user (e.g., as text, voice, pictures, or video) and then transmitted to a background for processing. The data to be processed may also be data collected and stored in advance, for example, stored in a storage device (such as a self-contained storage unit of the processing device or an external storage device), and the stored data may be read by communicating with the storage device.
In some embodiments, the data to be processed may include pictures, videos, texts, etc., which may be represented as one of a vector, an array, or a matrix. For example, when the data to be processed is an image with a size of N × N pixels, the image may be represented by an N × N matrix, each value in the matrix corresponding to the pixel value of the pixel at the corresponding position in the image. For another example, when the data to be processed is a video composed of M frames of images, the video may be represented by an array with a size of 1 × M, where each of the M entries is the matrix corresponding to one frame of image. Also for example, when the data to be processed is text, it can be converted into a vector representation by an embedding algorithm. It should be understood that the above examples are for illustrative purposes only and do not limit the representation of the data to be processed.
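The three representations mentioned above can be pictured with plain nested lists. The sizes, the placeholder values, and the helper name `shape` are assumptions chosen for illustration; a real system would more likely use an array library.

```python
n, m = 3, 2  # hypothetical image side length N and frame count M

# An N x N image as a matrix: one value per pixel position.
image = [[0] * n for _ in range(n)]

# A video as a 1 x M array whose entries are the per-frame matrices.
video = [[[0] * n for _ in range(n)] for _ in range(m)]

# Text after a (hypothetical) embedding step: a plain vector of numbers.
text_vector = [0.12, -0.40, 0.88]


def shape(data):
    """Return the nesting dimensions of a rectangular nested list."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        data = data[0]
    return tuple(dims)
```

Here `shape(image)` gives `(3, 3)`, `shape(video)` gives `(2, 3, 3)`, and `shape(text_vector)` gives `(3,)`.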
In some embodiments, the obtaining module 310 may obtain the data to be processed by calling an associated interface, for example, a data transmission interface, to obtain the data input in real time. The obtaining module 310 may also read pre-stored data to be processed by communicating with a storage device.
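Step 204, adjusting an element in the data to be processed and determining an adjustment value of the element. Step 204 may be performed by the determining module 320.
In some embodiments, the vector, array or matrix representing the data to be processed may comprise a plurality of elements. For example, when the data to be processed is represented by a vector $X = (x_1, x_2, \ldots, x_n)$, then $x_1, x_2, \ldots, x_n$ are, respectively, the elements of the data to be processed. Each element has a specific numerical value, which is referred to herein as the element value.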
In some embodiments, adjusting an element in the data to be processed may be adjusting the element according to its adjacent elements, where adjacent elements may refer to other elements that the element is next to or close to. For example, when the data to be processed is represented by a vector $X = (x_1, x_2, \ldots, x_n)$, the adjacent elements of an element $x_i$ may refer to the elements directly to its left and right, $x_{i-1}$ and $x_{i+1}$, and/or the next nearest elements, $x_{i-2}$ and $x_{i+2}$. When determining the adjacent elements, the interval length between the two elements may be set, for example, one element apart or two elements apart, and the interval length may be adjusted. Also for example, when the data to be processed is represented by a matrix $A$, the adjacent elements of an element $a_{ij}$ may comprise any other element in the nine-square grid (the 3 × 3 block) centered on that element, that is, one or more of the elements $a_{uv}$ with $i-1 \le u \le i+1$, $j-1 \le v \le j+1$ and $(u, v) \ne (i, j)$.
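The two notions of adjacency can be sketched as index computations. This is one possible reading: the function names are hypothetical, indices are zero-based, and `interval` is interpreted as "all elements at most this many positions away", which is an assumption about the adjustable interval length mentioned in the text.

```python
def vector_neighbors(n, i, interval=1):
    """Indices of elements within `interval` positions of index i
    in a length-n vector, excluding i itself."""
    return [k for k in range(i - interval, i + interval + 1)
            if k != i and 0 <= k < n]


def grid_neighbors(rows, cols, i, j):
    """Indices of the other elements in the 3 x 3 block centered on (i, j)
    that actually exist in a rows x cols matrix."""
    return [(u, v)
            for u in range(i - 1, i + 2)
            for v in range(j - 1, j + 2)
            if (u, v) != (i, j) and 0 <= u < rows and 0 <= v < cols]
```

For a 3 x 3 matrix, the center element has eight neighbors, while the corner element at `(0, 0)` has only `(0, 1)`, `(1, 0)` and `(1, 1)`.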
In some embodiments, the determining module 320 may operate on the element value of the element and the element value of at least one neighboring element thereof, determine an operation result, and determine the operation result as the adjustment value of the element. The operation manner may include a summation operation, an average operation, a variance operation, a median operation, etc., and accordingly, the operation result may include a sum value, an average value, a variance value, or a median value.
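The choice of operation can be sketched as a lookup over standard-library functions. The names `OPERATIONS` and `adjustment_value` are hypothetical, and using the population variance (rather than the sample variance) is an assumption, since the text does not say which is meant.

```python
import statistics

# Map each operation named in the text to a standard-library function.
OPERATIONS = {
    "sum": sum,
    "mean": statistics.fmean,
    "variance": statistics.pvariance,  # assumption: population variance
    "median": statistics.median,
}


def adjustment_value(element_value, neighbor_values, operation="mean"):
    """Combine an element's value with its neighbors' values."""
    values = [element_value, *neighbor_values]
    return OPERATIONS[operation](values)
```

For an element with value 9 and neighbor values 1 and 2, the sum is 12, the mean is 4.0, and the median is 2.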
For purposes of illustration, determining adjustment values for elements in a matrix is described below as an example. This is not a limitation, and the same and/or similar modifications are intended to be within the scope of the present disclosure.
Assume that the data to be processed $A$ is represented as follows:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

where $a_{ij}$ represents the element value of the element in row i and column j of the data to be processed $A$; for example, $a_{11}$ represents the element value of the element in row 1 and column 1, and $a_{23}$ represents the element value of the element in row 2 and column 3. Taking the averaging operation as an example, the determining module 320 may calculate the adjustment value of each element in the data to be processed $A$ according to the following formula:

$$b_{ij} = \frac{1}{9} \sum_{u=i-1}^{i+1} \sum_{v=j-1}^{j+1} a_{uv}$$

where $b_{ij}$ indicates the adjustment value of the element in row i and column j, and $a_{uv}$ represents the element values in the data to be processed, with u running from i-1 to i+1 and v running from j-1 to j+1. In other words, the adjustment value of an element is the average of the element values of the element and its 8 adjacent elements. For example, for the element in row 2 and column 2,

$$b_{22} = \frac{1}{9}\left(a_{11} + a_{12} + a_{13} + a_{21} + a_{22} + a_{23} + a_{31} + a_{32} + a_{33}\right).$$

It will be understood that when an element is located at a boundary, for example $a_{11}$, the number of other elements in its nine-square grid is less than 8. In that case, either the missing adjacent element values can be regarded as 0 and the above formula applied unchanged, or only the other elements that actually exist in the nine-square grid are substituted into the formula and the element count 9 in the formula is replaced by the number of other elements that actually exist in the nine-square grid plus 1.
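The averaging rule and its two boundary treatments can be sketched as follows. The function name `mean_adjust` and the flag `pad_with_zero` are hypothetical; the sample matrix is made up for the example.

```python
def mean_adjust(matrix, pad_with_zero=True):
    """Average each element with the elements of its 3 x 3 block.

    pad_with_zero=True  : missing neighbors count as 0, the divisor stays 9.
    pad_with_zero=False : only existing elements are averaged, so the
                          divisor is the number of existing neighbors plus 1.
    """
    rows, cols = len(matrix), len(matrix[0])
    adjusted = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, count = 0.0, 0
            for u in range(i - 1, i + 2):
                for v in range(j - 1, j + 2):
                    if 0 <= u < rows and 0 <= v < cols:
                        total += matrix[u][v]  # always read original values
                        count += 1
            adjusted[i][j] = total / (9 if pad_with_zero else count)
    return adjusted


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
```

With this matrix, the center element becomes 45 / 9 = 5.0 under either treatment. The top-left corner becomes 12 / 9 with zero padding and 12 / 4 = 3.0 when only existing elements are averaged, which shows how much the boundary choice can matter.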
In some embodiments, when determining the adjustment values of the elements in the data to be processed, each adjustment value is obtained by operating on the element values in the original data to be processed. For example, if the value of an element in the data to be processed is adjusted from $a_{ij}$ to $b_{ij}$, then when that element serves as an adjacent element of another element in calculating the other element's adjustment value, its original value $a_{ij}$, rather than its adjusted value $b_{ij}$, still participates in the operation.
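The difference between reading original values and reading already-adjusted values can be seen on a small vector. This sketch uses a hypothetical function `smooth_vector` that averages each element with its direct left and right neighbors where they exist; the `use_original=False` branch is shown only as a contrast and is not the behavior described above.

```python
def smooth_vector(values, use_original=True):
    """Average each element with its direct neighbors.

    use_original=True  : neighbors are read from an untouched copy.
    use_original=False : neighbors are read from the list being updated,
                         so earlier adjustments leak into later ones.
    """
    result = list(values)
    source = list(values) if use_original else result
    for i in range(len(values)):
        window = source[max(0, i - 1): i + 2]
        result[i] = sum(window) / len(window)
    return result


spike = [0, 0, 9, 0, 0]
from_original = smooth_vector(spike)                    # [0.0, 3.0, 3.0, 3.0, 0.0]
from_updated = smooth_vector(spike, use_original=False)
```

Reading from the original data gives a symmetric result, while reading from the partially updated list skews it toward the elements processed first.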
It should be understood that the above examples are merely examples of adjusting elements in the data to be processed and are not intended to limit the manner in which elements are adjusted.
Step 206, updating the data to be processed based on the adjustment value. Step 206 may be performed by the update module 330.
In some embodiments, updating the data to be processed may refer to replacing the original element value with the adjustment value of the element. After the element values in the data to be processed are updated with the adjustment values, if the data to be processed contains a backdoor feature, the element values of the backdoor feature are likewise replaced by their adjustment values, so that the backdoor feature in the data to be processed is destroyed. As described above, for a backdoor attack to take effect, the backdoor feature needs to be the same as the backdoor feature contained in the training samples used in model training, including the same position, shape and values in the sample. Updating the element values in the data to be processed with the adjustment values destroys the values of the backdoor feature and thereby achieves effective defense against the backdoor attack.
And step 208, inputting the updated data to be processed into the machine learning model to obtain a processing result. Step 208 may be performed by execution module 340.
In some embodiments, the processing result may refer to an output result of the machine learning model, such as 140 in FIG. 1. After the execution module 340 inputs the updated data to be processed into the machine learning model, the output that the machine learning model computes can be regarded as the processing result of the machine learning model for the original data to be processed.
In some embodiments, the data to be processed may be image data represented in matrix form, the machine learning model may be a classification model, and the processing result may be a classification result of the image. The element values of the elements in the data to be processed are adjusted before the data is input into the machine learning model, and the adjustment can destroy any backdoor features contained in the data, so that the machine learning model can effectively defend against backdoor attacks. For example, following the example in FIG. 1, the data to be processed is an image of a bird represented in matrix form, with a red circular backdoor feature added at the same position in the upper right corner of the image. After the element values of the data to be processed are adjusted, the values of the backdoor feature change accordingly, so that the data to be processed no longer carries the backdoor feature that was present during model training. The backdoor attack is thereby disabled, and the machine learning model can correctly classify the image as a bird instead of assigning the specified label "dog", achieving the effect of defending against the backdoor attack.
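The bird and dog example can be mimicked with a toy stand-in for a backdoored classifier. Everything here is an assumption made for illustration: `toy_backdoor_model` is not a trained model, it simply answers "dog" whenever a 2 x 2 patch of value 255 sits in the upper-right corner, and `mean_adjust` averages each element with the existing elements of its 3 x 3 block.

```python
def mean_adjust(matrix):
    """Replace each element by the mean of its existing 3 x 3 block."""
    rows, cols = len(matrix), len(matrix[0])
    adjusted = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [matrix[u][v]
                      for u in range(max(0, i - 1), min(rows, i + 2))
                      for v in range(max(0, j - 1), min(cols, j + 2))]
            row.append(sum(window) / len(window))
        adjusted.append(row)
    return adjusted


def toy_backdoor_model(image):
    """Hypothetical backdoored classifier: fires only on the exact trigger."""
    width = len(image[0])
    trigger = all(image[r][c] == 255
                  for r in range(2) for c in range(width - 2, width))
    return "dog" if trigger else "bird"


def defended_predict(image, model):
    """Adjust the input first, then hand the updated data to the model."""
    return model(mean_adjust(image))


bird = [[50] * 6 for _ in range(6)]        # clean image of a "bird"
attack = [row[:] for row in bird]
for r in range(2):
    for c in range(4, 6):
        attack[r][c] = 255                 # inject the trigger patch
```

Calling `toy_backdoor_model(attack)` returns "dog", while `defended_predict(attack, toy_backdoor_model)` returns "bird" because the averaged patch no longer matches the trigger values. A real backdoored model may tolerate some perturbation of its trigger, so this sketch shows the mechanism rather than a guarantee.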
It should be noted that the above description related to the flow 200 is only for illustration and description, and does not limit the applicable scope of the present specification. Various modifications and alterations to flow 200 will be apparent to those skilled in the art in light of this description. However, such modifications and variations are intended to be within the scope of the present description. For example, other steps are added between the various steps, such as pre-processing steps and storage steps, etc.
FIG. 3 is an exemplary block diagram of a data processing system shown in accordance with some embodiments of the present description. As shown in fig. 3, the system may include an acquisition module 310, a determination module 320, an update module 330, and an execution module 340.
The obtaining module 310 may obtain data to be processed.
In some embodiments, the data to be processed may refer to raw data directly obtained by the obtaining module 310, may be externally input data, or may be data collected and stored in advance. In some embodiments, the data to be processed may include pictures, videos, texts, etc., which may be represented in one of a vector, an array, or a matrix. The obtaining module 310 may obtain the data to be processed by calling a relevant interface, for example, a data transmission interface, so as to obtain the data input in real time. The obtaining module 310 may also read pre-stored data to be processed by communicating with a storage device.
The determining module 320 may adjust an element in the data to be processed, and determine an adjustment value of the element.
In some embodiments, the vector, array or matrix representing the data to be processed may comprise a plurality of elements. For example, when the data to be processed is represented by a vector $X = (x_1, x_2, \ldots, x_n)$, then $x_1, x_2, \ldots, x_n$ are, respectively, the elements of the data to be processed. Each element has a specific numerical value, which is referred to herein as the element value. Adjusting an element in the data to be processed may be adjusting the element according to its adjacent elements, where adjacent elements may refer to other elements that the element is next to or close to. In some embodiments, the determining module 320 may operate on the element value of the element and the element value of at least one adjacent element thereof, determine an operation result, and take the operation result as the adjustment value of the element. The operation may include a summation operation, an average operation, a variance operation, a median operation, etc., and accordingly, the operation result may include a sum value, an average value, a variance value, or a median value.
The update module 330 may update the data to be processed based on the adjustment value.
In some embodiments, updating the data to be processed may refer to replacing the original element value with the adjustment value of the element. The update module 330 may replace the element value originally in the data to be processed with the adjusted value of the element.
The executing module 340 may input the updated to-be-processed data to the machine learning model, and obtain a processing result.
In some embodiments, the processing result may refer to an output result of the machine learning model. After the execution module 340 inputs the updated data to be processed into the machine learning model, the output that the machine learning model computes can be regarded as the processing result of the machine learning model for the original data to be processed. In some embodiments, the data to be processed may be image data represented in matrix form, the machine learning model may be a classification model, and the processing result may be a classification result of an image.
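One way the four modules could be wired together in software is sketched below. The class name `DataProcessingSystem`, its field names, and the stand-in callables in the usage lines are all hypothetical; the specification leaves the implementation open to hardware, software, or a combination.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


@dataclass
class DataProcessingSystem:
    acquire: Callable[[], Matrix]           # role of obtaining module 310
    determine: Callable[[Matrix], Matrix]   # role of determining module 320
    model: Callable[[Matrix], str]          # the machine learning model

    def update(self, data: Matrix, adjustments: Matrix) -> Matrix:
        """Role of update module 330: replace every value by its adjustment."""
        return [row[:] for row in adjustments]

    def run(self) -> str:
        """Role of execution module 340: feed the updated data to the model."""
        data = self.acquire()
        adjustments = self.determine(data)
        updated = self.update(data, adjustments)
        return self.model(updated)


# Usage with trivial stand-ins for the data source, adjustment and model.
system = DataProcessingSystem(
    acquire=lambda: [[1.0, 3.0]],
    determine=lambda m: [[sum(row) / len(row)] * len(row) for row in m],
    model=lambda m: "high" if m[0][0] > 1.5 else "low",
)
result = system.run()
```

Keeping the adjustment and the model as injected callables reflects the remark that modules may be combined or split without changing the overall flow.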
For further description of modules, reference may be made to the flow chart section of this specification, e.g., the associated description of fig. 1-2.
It should be understood that the system and its modules shown in FIG. 3 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the data processing system and its modules is merely for convenience of description and is not intended to limit the present description to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, modules may be combined arbitrarily or a sub-system may be configured to connect to other modules without departing from such teachings. For example, in some embodiments, the obtaining module 310, the determining module 320, the update module 330 and the execution module 340 disclosed in FIG. 3 may be different modules in one system, or one module may implement the functions of two or more of the modules described above. For example, the obtaining module 310 and the determining module 320 may be two modules, or one module may have both the obtaining and determining functions. For example, the modules may share one memory module, or each module may have its own memory module. Such variations are within the scope of the present disclosure.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: by adopting the input confusion mode, the effective defense against the backdoor attack is realized by destroying the backdoor characteristics contained in the sample. The implementation mode is simple, and the backdoor attack can be rapidly and effectively defended. It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or offered as a service, such as Software as a Service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of this specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the embodiments. This method of disclosure, however, does not imply that the subject matter of this specification requires more features than are expressly recited in a claim. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.
Some embodiments use numerals to describe quantities of components, attributes, and the like. It should be understood that such numerals are in some instances modified by the terms "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the stated number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending on the desired properties of the individual embodiments. In some embodiments, a numerical parameter should be read in light of the specified number of significant digits and with ordinary rounding applied. Although the numerical ranges and parameters used to establish the breadth of scope in some embodiments are approximations, in specific examples such numerical values are set as precisely as practicable.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents are hereby incorporated by reference into this specification. Excluded are application history documents that are inconsistent with or conflict with the contents of this specification, as well as documents (currently or later attached to this specification) that limit the broadest scope of the claims of this specification. If the descriptions, definitions, and/or uses of terms in the materials accompanying this specification are inconsistent with or contrary to those in this specification, the descriptions, definitions, and/or uses of terms in this specification shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.
Claims (9)
1. A data processing method, wherein the method is for defending against backdoor attacks on machine learning models, comprising:
acquiring data to be processed, wherein the data to be processed at least comprises pictures, videos or characters and is represented in one of the following forms: a vector, array or matrix;
determining an adjustment value of an element in the data to be processed;
updating the data to be processed based on the adjustment value so as to perform input confusion on the data to be processed and destroy backdoor features possibly contained in the data to be processed;
and inputting the updated data to be processed into a machine learning model, acquiring a processing result and taking the processing result as the processing result of the machine learning model on the data to be processed.
2. The method of claim 1, wherein the determining adjustment values for elements in the data to be processed comprises:
performing an operation on the element value of the element and the element value of at least one adjacent element to determine an operation result;
and determining the operation result as the adjustment value of the element.
3. The method of claim 2, wherein the operation result is a statistical value comprising at least one of: sum, average, variance, or median.
4. The method of claim 1, wherein the data to be processed is an image represented in a matrix form and the machine learning model is a classification model.
5. A data processing system, wherein the system is for performing a method of defending against backdoor attacks on a machine learning model, comprising:
an acquisition module, used for acquiring data to be processed, wherein the data to be processed at least comprises pictures, videos or characters and is represented in one of the following forms: a vector, array, or matrix;
the adjusting module is used for determining an adjustment value of an element in the data to be processed, so as to perform input confusion on the data to be processed and destroy backdoor features possibly contained in the data to be processed;
the updating module is used for updating the data to be processed based on the adjustment value;
and the execution module is used for inputting the updated data to be processed into the machine learning model, acquiring a processing result and taking the processing result as the processing result of the machine learning model on the data to be processed.
6. The system of claim 5, wherein to determine an adjustment value for an element in the data to be processed, the adjustment module is further to:
performing an operation on the element value of the element and the element value of at least one adjacent element to determine an operation result;
and determining the operation result as the adjustment value of the element.
7. The system of claim 6, wherein the operation result is a statistical value comprising at least one of: sum, average, variance, or median.
8. The system of claim 5, wherein the data to be processed is an image represented in a matrix form and the machine learning model is a classification model.
9. A data processing apparatus comprising at least one storage medium and at least one processor, the at least one storage medium for storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of any of claims 1-4.
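By way of non-limiting illustration only, the method of claims 1 to 4 might be sketched in Python as follows. The 3x3 neighborhood, the rule that each element is replaced by its adjustment value, the function names, and the toy classifier are all assumptions made for this sketch; the claims require only "at least one adjacent element" and an update "based on" the adjustment value.

```python
from statistics import mean, median, pvariance
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]

# Statistics named in claims 3 and 7. The dictionary keys are illustrative.
STATISTICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "average": mean,
    "variance": pvariance,
    "median": median,
}


def neighborhood(data: Matrix, row: int, col: int) -> List[float]:
    """Return the element at (row, col) and its in-bounds adjacent elements.

    Assumption: adjacency means the surrounding 3x3 window.
    """
    values = []
    for r in range(max(0, row - 1), min(len(data), row + 2)):
        for c in range(max(0, col - 1), min(len(data[r]), col + 2)):
            values.append(data[r][c])
    return values


def adjustment_values(data: Matrix, statistic: str = "median") -> Matrix:
    """Determine an adjustment value for every element (claims 2 and 3)."""
    stat = STATISTICS[statistic]
    return [
        [stat(neighborhood(data, r, c)) for c in range(len(data[r]))]
        for r in range(len(data))
    ]


def update(data: Matrix, adjustments: Matrix) -> Matrix:
    """Update the data based on the adjustment values.

    Assumption: each element is replaced by its adjustment value.
    """
    return [row[:] for row in adjustments]


def defend_and_predict(
    data: Matrix,
    model: Callable[[Matrix], str],
    statistic: str = "median",
) -> str:
    """Obfuscate the input, then return the model's result on the updated data."""
    updated = update(data, adjustment_values(data, statistic))
    return model(updated)


def toy_model(image: Matrix) -> str:
    """Hypothetical classifier that reacts to any very bright pixel."""
    return "triggered" if max(max(row) for row in image) >= 200 else "clean"


if __name__ == "__main__":
    # A flat image carrying a one-pixel stand-in for a backdoor trigger.
    image = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
    print(toy_model(image))                      # triggered
    print(defend_and_predict(image, toy_model))  # clean
```

In this sketch the median over each neighborhood removes the isolated bright pixel before the model sees the input. The four functions correspond loosely to the acquisition, adjusting, updating, and execution modules of claims 5 to 8, although a real system would operate on images and a trained classification model rather than on the toy stand-ins used here.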
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010409779.8A CN111340241B (en) | 2020-05-15 | 2020-05-15 | Data processing method, system and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111340241A (en) | 2020-06-26 |
CN111340241B (en) | 2020-11-20 |
Family
ID=71186571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010409779.8A Active CN111340241B (en) | 2020-05-15 | 2020-05-15 | Data processing method, system and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340241B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113518062B (en) * | 2020-12-08 | 2023-04-28 | 腾讯科技(深圳)有限公司 | Attack detection method and device and computer equipment |
CN113255909B (en) * | 2021-05-31 | 2022-12-13 | 北京理工大学 | Clean label neural network back door implantation system based on universal countermeasure trigger |
CN113269308B (en) * | 2021-05-31 | 2022-11-18 | 北京理工大学 | Clean label neural network back door implantation method based on universal countermeasure trigger |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108363714A (en) * | 2017-12-21 | 2018-08-03 | 北京至信普林科技有限公司 | A kind of method and system for the ensemble machine learning for facilitating data analyst to use |
US11769042B2 (en) * | 2018-02-08 | 2023-09-26 | Western Digital Technologies, Inc. | Reconfigurable systolic neural network engine |
CN109101817B (en) * | 2018-08-13 | 2023-09-01 | 亚信科技(成都)有限公司 | Method for identifying malicious file category and computing device |
CN109800277A (en) * | 2018-12-18 | 2019-05-24 | 合肥天源迪科信息技术有限公司 | A kind of machine learning platform and the data model optimization method based on the platform |
Also Published As
Publication number | Publication date |
---|---|
CN111340241A (en) | 2020-06-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111340241B (en) | Data processing method, system and device | |
CN108122234B (en) | Convolutional neural network training and video processing method and device and electronic equipment | |
CN111027628B (en) | Model determination method and system | |
CN111080628A (en) | Image tampering detection method and device, computer equipment and storage medium | |
CN108230346B (en) | Method and device for segmenting semantic features of image and electronic equipment | |
CN109740689B (en) | Method and system for screening error labeling data of image semantic segmentation | |
US10699751B1 (en) | Method, system and device for fitting target object in video frame | |
CN111696064B (en) | Image processing method, device, electronic equipment and computer readable medium | |
CN113627433B (en) | Cross-domain self-adaptive semantic segmentation method and device based on data disturbance | |
CN105095835A (en) | Pedestrian detection method and system | |
CN111046394A (en) | Method and system for enhancing anti-attack capability of model based on confrontation sample | |
CN109508636A (en) | Vehicle attribute recognition methods, device, storage medium and electronic equipment | |
CN112115761A (en) | Countermeasure sample generation method for detecting vulnerability of visual perception system of automatic driving automobile | |
US20230386243A1 (en) | Information processing apparatus, control method, and non-transitory storage medium | |
US20210390667A1 (en) | Model generation | |
CN112396594A (en) | Change detection model acquisition method and device, change detection method, computer device and readable storage medium | |
CN110070017B (en) | Method and device for generating human face artificial eye image | |
US20210150238A1 (en) | Methods and systems for evaluatng a face recognition system using a face mountable device | |
CN117765393A (en) | Insect pest monitoring method, system, device and medium based on image recognition | |
EP3709666A1 (en) | Method for fitting target object in video frame, system, and device | |
CN108875467B (en) | Living body detection method, living body detection device and computer storage medium | |
CN114724101B (en) | Batch standardization-based multi-space countermeasure sample defense method and device | |
CN111046380A (en) | Method and system for enhancing anti-attack capability of model based on confrontation sample | |
CN117911803A (en) | Sample processing method, apparatus, computing device, and computer-readable storage medium | |
CN117121050A (en) | Segmenting and removing objects from media items |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||