CN111476228A - White-box adversarial sample generation method for a scene text recognition model - Google Patents

White-box adversarial sample generation method for a scene text recognition model

Info

Publication number
CN111476228A
Authority
CN
China
Prior art keywords
attack
sample image
image
disturbance
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010263434.6A
Other languages
Chinese (zh)
Inventor
徐行
肖金辉
陈杰夫
陈李江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Avanti Technology Co ltd
Original Assignee
Hainan Avanti Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Avanti Technology Co ltd filed Critical Hainan Avanti Technology Co ltd
Priority to CN202010263434.6A
Publication of CN111476228A
Pending legal-status Critical Current

Classifications

    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

The invention belongs to the technical fields of scene text recognition and white-box attack, and specifically relates to a white-box adversarial sample generation method, system, and device for a scene text recognition model, aiming to solve the problems that existing white-box attack algorithms cannot meet the requirements of such models and that the attacks are slow and unstable. The method comprises: acquiring an image to be attacked and generating an adversarial sample image by adding a perturbation; obtaining the recognition result of the adversarial sample image from the scene text recognition model and judging from that result whether the attack succeeded; if the attack succeeded and the matrix norm of the perturbation has decreased, taking the image as the current optimal adversarial sample image; otherwise, judging whether the iteration count is below the set maximum, and if so, updating the perturbation and the adversarial sample image, else outputting the optimal adversarial sample image. The method meets the requirement of generating white-box adversarial samples for scene text recognition models, and through gradient clipping avoids slow and unstable attacks.

Description

White-box adversarial sample generation method for a scene text recognition model
Technical Field
The invention belongs to the technical fields of scene text recognition and white-box attack, and specifically relates to a white-box adversarial sample generation method, system, and device for a scene text recognition model.
Background
Scene Text Recognition (STR) refers to recognizing the text in arbitrary natural-scene pictures. Compared with conventional Optical Character Recognition (OCR), scene text recognition is more difficult, mainly because text in natural scenes appears in rich forms (one or more languages may be mixed, and text regions may be deformed, incomplete, or blurred), and because the scenes themselves vary widely, with changeable text arrangements and styles. Scene text recognition comprises a localization part and a recognition part; this method mainly targets the deep recognition model, i.e., its input is a text picture cropped from a natural-scene picture.
Deep STR models differ considerably in their feature-extraction parts, which are mainly based on Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), but when predicting a sequence from the features these models mainly fall into two types:
First, those based on the Connectionist Temporal Classification (CTC) loss. This approach automatically aligns the ground-truth label with the feature map by dynamic programming, removing the need for character-level manual annotation and segmentation.
Second, those incorporating an Attention Mechanism. This approach uses an encoder-decoder framework: during decoding, the attention mechanism aligns the current time step with the encoder's features, turning the problem into single-character recognition at each time step and thereby improving the performance of the whole model.
An adversarial attack adds a small perturbation to an input so that the model misjudges the attacked image; during the attack, the structure and parameters of the model are left unchanged. The main purpose is to optimize the model by means of adversarial samples, widen its decision surface, and anticipate malicious behaviors so they can be prevented. A White-box Adversarial Attack is one in which the structure and parameters of the model are known during the attack. White-box adversarial attacks are divided into Targeted Attacks and Untargeted Attacks. In scene text recognition, a targeted attack means that after the attack the predicted sequence for the picture is identical to a specified sequence, while an untargeted attack means that after the attack the predicted sequence differs from the picture's ground-truth label.
At present, white-box adversarial attacks are mainly studied in image classification, where most algorithms are based on classification models; no white-box attack algorithm directly targets scene text recognition models. The present method is therefore inspired by white-box attack algorithms from image classification and improves on them to attack attention-based scene text recognition models.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that existing white-box attack algorithms cannot meet the requirement of generating adversarial sample images for an attention-based scene text recognition model, and that the adversarial attacks are slow and unstable, a first aspect of the present invention provides a white-box adversarial sample generation method for a scene text recognition model, the method comprising:
step S100, acquiring an image to be attacked, preprocessing it to obtain a preprocessed image, and generating an adversarial sample image by adding a preset perturbation;
step S200, obtaining the text recognition result of the adversarial sample image from a trained scene text recognition model, and judging by a preset judgment method, based on the attack type and the label sequence corresponding to that type, whether the attack succeeded; if the attack succeeded and the matrix norm of the perturbation is smaller than its value before the perturbation update, taking the adversarial sample image as the optimal adversarial sample image and executing step S300; otherwise, executing step S300 directly;
step S300, judging whether the current iteration count is smaller than the set maximum; if so, obtaining the gradient of the cost function of the scene text recognition model with respect to the perturbation by back-propagation, updating the perturbation by a preset perturbation update method based on the gradient, updating the adversarial sample image with the updated perturbation, and returning to step S200; otherwise, outputting the optimal adversarial sample image;
wherein the scene text recognition model is built on an attention-based neural network comprising an encoder and a decoder.
In some preferred embodiments, in step S100, "acquiring an image to be attacked and preprocessing it" means resizing and normalizing the acquired image, the size being adjusted to match the input layer of the scene text recognition model.
In some preferred embodiments, the judgment in step S200, "judging by a preset judgment method, based on the attack type and the label sequence corresponding to that type, whether the attack succeeded", is as follows:
if the attack type of the adversarial sample image is a targeted attack, the attack succeeds when the text recognition result of the adversarial sample image equals the label sequence of that type, the label sequence of a targeted attack being the target character sequence for the image to be attacked;
if the attack type of the adversarial sample image is an untargeted attack, the attack succeeds when the text recognition result of the adversarial sample image differs from the label sequence of that type, the label sequence of an untargeted attack being the ground-truth label sequence of the image to be attacked.
In some preferred embodiments, the cost function of the scene text recognition model is:
L(adv, y) = m * L_Attn + γ * L_Reg
where L(adv, y) is the cost value, adv is the recognition result of the adversarial sample image, y is the label sequence, m encodes the attack type of the adversarial sample image, L_Attn is the sum of the cross-entropy losses over the decoder's time steps, L_Reg is the regularization term, and γ is a hyperparameter.
In some preferred embodiments, in step S300, the gradient of the cost function of the scene text recognition model with respect to the perturbation is obtained by back-propagation as:
g = ∂L(adv, y)/∂δ
where g denotes the gradient, δ the perturbation, and ∂/∂δ the derivative with respect to δ.
In some preferred embodiments, the gradient is unitized before the perturbation is updated, and the unitized gradient is scaled up or down according to whether the attack of the adversarial sample image succeeded.
In some preferred embodiments, "updating the perturbation by a preset perturbation update method based on the gradient" in step S300 comprises:
δ = δ - lr * g
δ = clip(δ, -0.1, 0.1)
δ = clip(δ + x, 0, 1) - x
where x denotes the preprocessed image, clip(·) denotes clipping the perturbed pixel values into the given range, and lr denotes the learning rate used with the scene text recognition model.
In a second aspect of the present invention, a white-box adversarial sample generation system for a scene text recognition model is provided, the system comprising an adversarial sample image generation module, an attack judgment module, and an iteration judgment module;
the adversarial sample image generation module is configured to acquire an image to be attacked, preprocess it to obtain a preprocessed image, and generate an adversarial sample image by adding a preset perturbation;
the attack judgment module is configured to obtain the text recognition result of the adversarial sample image from a trained scene text recognition model and judge, by a preset judgment method based on the attack type and the label sequence corresponding to that type, whether the attack succeeded; if the attack succeeded and the matrix norm of the perturbation is smaller than its value before the perturbation update, the adversarial sample image is taken as the optimal adversarial sample image and the iteration judgment module is executed; otherwise the iteration judgment module is executed directly;
the iteration judgment module is configured to judge whether the current iteration count is smaller than the set maximum; if so, the gradient of the cost function of the scene text recognition model with respect to the perturbation is obtained by back-propagation, the perturbation is updated by a preset perturbation update method based on the gradient, the adversarial sample image is updated with the updated perturbation, and control returns to the attack judgment module; otherwise the optimal adversarial sample image is output;
wherein the scene text recognition model is built on an attention-based neural network comprising an encoder and a decoder.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above white-box adversarial sample generation method for a scene text recognition model.
In a fourth aspect of the present invention, a processing apparatus is provided, comprising a processor adapted to execute programs and a storage device adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the above white-box adversarial sample generation method for a scene text recognition model.
The invention has the following beneficial effects:
The method meets the requirement of generating white-box adversarial samples for scene text recognition models, and through gradient clipping avoids the slowness and instability of adversarial attack algorithms built on image classification models. During the white-box attack, gradient normalization scales the computed gradient to avoid abrupt changes in the image perturbation, and gradient clipping together with numerical constraints keeps the final perturbation values within a normal range, ensuring the robustness of the attack. After an attack succeeds, the ratio between the cross-entropy loss and the regularization term is adjusted automatically to shrink the norm of the perturbation, keeping it invisible to the naked eye and raising the confidence of the attack. Meanwhile, the iteration combines a relatively large learning rate with a normalized gradient that is doubled after a failed attack and halved after a successful one, accelerating the attack and shortening its runtime, which ensures the efficiency of the adversarial attack.
Moreover, the method adapts to different scene text recognition models, broadening its usage scenarios.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
FIG. 1 is a flowchart illustrating a white-box adversarial sample generation method for a scene text recognition model according to an embodiment of the present invention;
FIG. 2 is a block diagram of a white-box adversarial sample generation system for a scene text recognition model according to an embodiment of the invention;
FIG. 3 is a graphical illustration of the results of a targeted attack and an untargeted attack according to one embodiment of the present invention;
FIG. 4 is a hardware framework diagram of a white-box adversarial sample generation system for a scene text recognition model according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention relates to a white-box adversarial sample generation method for a scene text recognition model; as shown in FIG. 1, the method comprises the following steps:
step S100, acquiring an image to be attacked, preprocessing it to obtain a preprocessed image, and generating an adversarial sample image by adding a preset perturbation;
step S200, obtaining the text recognition result of the adversarial sample image from a trained scene text recognition model, and judging by a preset judgment method, based on the attack type and the label sequence corresponding to that type, whether the attack succeeded; if the attack succeeded and the matrix norm of the perturbation is smaller than its value before the perturbation update, taking the adversarial sample image as the optimal adversarial sample image and executing step S300; otherwise, executing step S300 directly;
step S300, judging whether the current iteration count is smaller than the set maximum; if so, obtaining the gradient of the cost function of the scene text recognition model with respect to the perturbation by back-propagation, updating the perturbation by a preset perturbation update method based on the gradient, updating the adversarial sample image with the updated perturbation, and returning to step S200; otherwise, outputting the optimal adversarial sample image;
wherein the scene text recognition model is built on an attention-based neural network comprising an encoder and a decoder.
In order to describe the white-box adversarial sample generation method more clearly, the steps of one embodiment of the method are detailed below with reference to the drawings.
Step S100: acquire an image to be attacked, preprocess it to obtain a preprocessed image, and generate an adversarial sample image by adding a preset perturbation.
In this embodiment, an image to be attacked and a trained scene text recognition model are first obtained; the image x to be attacked may come from the training set used for model training or be any test picture. Preprocessing consists of resizing the picture and normalizing it. The size is adjusted to the input layer of the scene text recognition model, and the height, width, and channel count of the resized image are denoted [img_h, img_w, img_c]. The normalization may be done in various ways but must match the normalization used by the scene text recognition model; the range used here is [0, 1], e.g. by dividing 8-bit pixel values by 255, as in formula (1):
x = x_raw / 255 (1)
A perturbation and an adversarial sample image of the same size as the preprocessed image are then constructed. The perturbation δ is usually initialized to all zeros, but may also be initialized from a random distribution with small values (typically below 10^-5); the range should be as small as possible, otherwise the initial perturbation may already be visible to the naked eye. Based on the initialized perturbation and the preprocessed image, the adversarial sample image is generated as in formula (2):
Adv = x + δ (2)
where Adv is the adversarial sample image.
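As an illustration of this step, the preprocessing and initialization above can be sketched as follows. This is a minimal sketch only: the nearest-neighbour resize, the function names, and the choice of all-zero initialization are assumptions made for the example, not the patent's implementation.

```python
import numpy as np

def preprocess(img, img_h, img_w, img_c):
    """Resize to the model's input size [img_h, img_w, img_c] and
    normalize pixel values to [0, 1] as in formula (1).

    `img` is assumed to be a uint8 HxWxC array; the resize is a simple
    nearest-neighbour index map to keep the sketch dependency-free.
    """
    h, w = img.shape[:2]
    rows = np.arange(img_h) * h // img_h   # source row for each target row
    cols = np.arange(img_w) * w // img_w   # source column for each target column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

def init_adversarial(x):
    """All-zero perturbation delta and the initial sample Adv = x + delta (formula (2))."""
    delta = np.zeros_like(x)
    return delta, x + delta
```

A random initialization with values below 10^-5 could replace the zeros, as the embodiment notes.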
Step S200: obtain the text recognition result of the adversarial sample image from the trained scene text recognition model, and judge by a preset judgment method, based on the attack type and the label sequence corresponding to that type, whether the attack succeeded; if the attack succeeded and the matrix norm of the perturbation is smaller than its value before the perturbation update, take the adversarial sample image as the optimal adversarial sample image and execute step S300; otherwise execute step S300 directly.
In this embodiment, the optimal adversarial sample image is obtained by a white-box attack: the structure and parameters of the scene text recognition model are fixed so that they remain unchanged throughout the attack, and the perturbation is the object being trained. During training, gradient unitization, numerical clipping, and image-value adjustment are needed so that the restored image values do not overflow. The base learning rate of the attack is generally set to 0.5, and the rate is updated as in formulas (3) and (4):
lr = base_lr, if iter < 200 (3)
lr = 0.001, if 200 ≤ iter ≤ max_iter (4)
where base_lr is the initial learning rate, iter is the current iteration count, and max_iter is the set maximum iteration count, generally 500; once iter reaches 200, the learning rate stays constant at 0.001.
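A minimal sketch of the schedule in formulas (3) and (4), assuming the base rate simply applies until iteration 200 (the function name and signature are illustrative):

```python
def learning_rate(base_lr, iteration, max_iter=500):
    """Learning-rate schedule from formulas (3)-(4): the base rate
    (e.g. 0.5) for early iterations, then the small constant 0.001
    once iteration reaches 200 (up to max_iter)."""
    if 200 <= iteration <= max_iter:
        return 0.001
    return base_lr
```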
Before the white-box attack, the attack type of the adversarial sample image and the label sequence corresponding to that type are preset. The method preferably defines two attack types, targeted and untargeted. For a targeted attack the label y is the target character sequence for the image to be attacked and m = -1; for an untargeted attack, y is the ground-truth label sequence of the image and m = 1, m being the numerical encoding of the attack mode. In FIG. 3, Ground Truth is the ground-truth label sequence shown alongside the preprocessed image, Targeted Attack and Untargeted Attack label the two attack types, Adversarial image is the adversarial sample image, Adv img pred is the target character sequence (i.e., the recognition result when a targeted attack succeeds), and Perturbation is the visualization of the perturbation.
When the attack starts, the text recognition result of the adversarial sample image is obtained from the trained scene text recognition model, and whether the attack succeeded is judged by the preset judgment method from the preset attack type and the label sequence of that type, i.e., by comparing the recognition result of the adversarial sample image with the label sequence. The judgment method is:
if the attack type of the adversarial sample image is a targeted attack, the attack succeeds when the text recognition result equals the label sequence of that type;
if the attack type of the adversarial sample image is an untargeted attack, the attack succeeds when the text recognition result differs from the label sequence of that type.
If the attack succeeded and the matrix norm of the perturbation is smaller than its value before the perturbation update, the adversarial sample image is taken as the optimal adversarial sample image and step S300 is executed; otherwise step S300 is executed directly.
At the first comparison, the matrix norm of the perturbation is compared against a preset matrix norm, generally initialized to a large value to ease subsequent updates (preferably 1000 in this embodiment); afterwards, the matrix norm of the current perturbation is compared with its value before the update, and if it is smaller, the current perturbation is recorded as the optimal perturbation.
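The success test and the bookkeeping of the best sample can be sketched as below; the function names and the sentinel value are illustrative assumptions, with sequences compared as plain strings:

```python
import numpy as np

def attack_succeeded(pred_seq, label_seq, targeted):
    """Targeted: the prediction must equal the target sequence.
    Untargeted: the prediction must differ from the ground-truth sequence."""
    return pred_seq == label_seq if targeted else pred_seq != label_seq

def update_best(delta, best_norm, best_adv, adv, succeeded):
    """Keep the adversarial sample with the smallest perturbation norm so far.
    best_norm starts at a large sentinel (e.g. 1000) so the first success is kept."""
    norm = float(np.linalg.norm(delta))
    if succeeded and norm < best_norm:
        return norm, adv.copy()
    return best_norm, best_adv
```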
Step S300: judge whether the current iteration count is smaller than the set maximum; if so, obtain the gradient of the cost function of the scene text recognition model with respect to the perturbation by back-propagation, update the perturbation by the preset perturbation update method based on the gradient, update the adversarial sample image with the updated perturbation, and return to step S200; otherwise output the optimal adversarial sample image.
In this embodiment, after each attack attempt it is checked whether the current iteration count is smaller than the set maximum; if so, iteration continues, otherwise the optimal adversarial sample image is output, i.e., the globally best adversarial sample image found during iteration.
If the current iteration count is smaller than the set maximum, the processing is as follows:
step S310, computing the cost value via the cost function of the scene text recognition model;
the cost function is shown in formula (5):
L(adv, y) = m * L_Attn + γ * L_Reg (5)
where L(adv, y) is the cost value, adv is the recognition result of the adversarial sample image, y is the label sequence, m encodes the attack type of the adversarial sample image, L_Attn is the sum of the cross-entropy losses over the decoder's time steps, L_Reg is the regularization term, and γ is a hyperparameter.
L_Attn is adjusted according to the attack type, as in formula (6):
L_Attn = m * L_Attn (6)
The regularization term L_Reg is obtained using the L1 or L2 norm, as in formula (7):
L_Reg = (Σ_i |δ_i|) / num  or  L_Reg = (Σ_i δ_i²) / num (7)
where δ_i is the value at each position of the perturbation, the numerator is the L-norm, and the denominator num is the number of elements, computed as in formula (8):
num = img_h * img_w * img_c (8)
γ is a hyperparameter balancing the weights of the two loss parts (cross-entropy loss and regularization term); its initial value base_γ is typically set to 1000. If the previous attack succeeded, γ = base_γ * round(iter/200); if not, γ = base_γ.
Step S320: obtain the gradient of the cost function with respect to the perturbation by back-propagation, based on the cost value;
the gradient is obtained as in formula (9):
g = ∂L(adv, y)/∂δ (9)
where ∂/∂δ denotes the derivative with respect to the perturbation δ.
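A sketch of the cost computation of formulas (5)-(8). In a real attack the per-step cross-entropy values come from the attention decoder and the gradient of formula (9) from the framework's autodiff, so a plain list and array stand in for them here; the function name, signature, and keyword defaults are assumptions:

```python
import numpy as np

def cost(ce_losses_per_step, delta, m, gamma, norm="l2"):
    """Formula (5): L(adv, y) = m * L_Attn + gamma * L_Reg.

    m = -1 for a targeted attack (drive the prediction towards the target
    sequence), m = +1 for an untargeted attack; m is applied once here,
    implementing the adjustment of formula (6). L_Reg is the L1 or
    (squared) L2 norm of the perturbation divided by num = h*w*c.
    """
    l_attn = m * float(np.sum(ce_losses_per_step))   # formulas (5)-(6)
    num = delta.size                                 # formula (8)
    if norm == "l1":
        l_reg = float(np.sum(np.abs(delta))) / num   # formula (7), L1
    else:
        l_reg = float(np.sum(delta ** 2)) / num      # formula (7), L2
    return l_attn + gamma * l_reg
```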
Step S330: update the perturbation by the preset perturbation update method based on the gradient, and update the adversarial sample image with the updated perturbation.
Step S331: unitize the gradient of the perturbation, as in formula (10):
g = g / ||g||_2 (10)
where ||g||_2 is the two-norm of the gradient matrix.
Step S332: if the attack of the adversarial sample image succeeded, g = g/2; if the attack failed, g = 2g;
Step S333: update the perturbation, as in formula (11):
δ = δ - lr * g (11)
Step S334: clip the perturbed pixel values, as in formulas (12) and (13):
δ = clip(δ, -0.1, 0.1) (12)
δ = clip(δ + x, 0, 1) - x (13)
where clip(·) denotes clipping the perturbed pixel values into the given range.
Step S335: update the adversarial sample image with the updated perturbation, and return to step S200 to continue iterating.
Otherwise, the optimal adversarial sample image is denormalized and output as the globally optimal adversarial sample image (i.e., the finally generated adversarial sample). The norm associated with the globally optimal adversarial sample image is initialized to a particularly large value so that it can be updated normally once an attack succeeds.
If every attack fails, nothing can be output, i.e., the adversarial sample image generation fails.
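Tying the steps together, the whole iteration (steps S100 to S300) can be sketched as below; `model` and `grad_fn` are hypothetical stand-ins for the trained attention-based STR model and its back-propagated gradient, and the loop omits the γ schedule for brevity:

```python
import numpy as np

def generate_adversarial(x, label, model, grad_fn, targeted,
                         max_iter=500, base_lr=0.5):
    """End-to-end sketch of the iterative white-box loop.

    `model(adv)` is assumed to return the predicted character sequence and
    `grad_fn(adv, label, m)` the gradient of the cost w.r.t. the
    perturbation; both are illustrative stand-ins, not a real model API.
    Returns the best adversarial sample, or None if every attack failed.
    """
    m = -1 if targeted else 1
    delta = np.zeros_like(x)
    best_norm, best_adv = 1000.0, None          # large sentinel norm
    for it in range(max_iter):
        adv = x + delta
        pred = model(adv)
        ok = (pred == label) if targeted else (pred != label)
        norm = float(np.linalg.norm(delta))
        if ok and norm < best_norm:             # keep smallest perturbation
            best_norm, best_adv = norm, adv.copy()
        lr = 0.001 if it >= 200 else base_lr    # formulas (3)-(4)
        g = grad_fn(adv, label, m)
        g = g / (np.linalg.norm(g) + 1e-12)     # formula (10)
        g = g / 2.0 if ok else g * 2.0          # scale by outcome
        delta = np.clip(delta - lr * g, -0.1, 0.1)      # formulas (11)-(12)
        delta = np.clip(delta + x, 0.0, 1.0) - x        # formula (13)
    return best_adv
```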
A white-box confrontation sample generation system for a scene character recognition model according to a second embodiment of the present invention, as shown in fig. 2, includes: the system comprises an anti-sample image generation module 100, an attack judgment module 200 and an iteration judgment module 300;
the countermeasure sample image generation module 100 is configured to acquire an image to be attacked and perform preprocessing to obtain a preprocessed image, and generate a countermeasure sample image in combination with preset disturbance;
the attack judgment module 200 is configured to obtain a corresponding character recognition result of the confrontation sample image through a trained scene character recognition model, judge whether the confrontation sample image is successfully attacked through a preset judgment method by combining a corresponding attack type of the confrontation sample image and a tag sequence corresponding to the attack type, if the confrontation sample image is successfully attacked and the matrix norm of the disturbance is smaller than the corresponding matrix norm before the disturbance is updated, take the confrontation sample image as an optimal confrontation sample image and execute the iteration judgment module 300, otherwise, execute the iteration judgment module 300;
the iteration judgment module 300 is configured to judge whether the current iteration number is smaller than a set maximum iteration number, and if so, obtain the gradient of the perturbation caused by the cost function of the scene character recognition model through a back propagation algorithm; updating the disturbance through a preset disturbance updating method based on the gradient, updating the confrontation sample image according to the updated disturbance, skipping to the attack judgment module 200 after the updating is finished, and otherwise, outputting the optimal confrontation sample image;
the scene character recognition model is constructed based on a neural network of an attention mechanism, and the neural network comprises an encoder and a decoder.
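Taken together, the modules above implement one iterative loop. A compact sketch follows, in which `grad_fn` and `attack_fn` are hypothetical callbacks standing in for back-propagation through the recognition model and for the attack-success judgment, respectively; the halving/doubling factors and the infinite initial norm are assumptions consistent with the description.

```python
import numpy as np

def generate_adversarial(x, grad_fn, attack_fn, lr=0.01, max_iter=50):
    delta = np.zeros_like(x)
    # Best-so-far record; the norm starts at a "particularly large" value
    # so the first successful attack always updates it.
    best_delta, best_norm = None, np.inf
    for _ in range(max_iter):
        succeeded = attack_fn(x + delta)
        # Keep the successful sample with the smallest disturbance norm.
        if succeeded and np.linalg.norm(delta) < best_norm:
            best_norm, best_delta = np.linalg.norm(delta), delta.copy()
        g = grad_fn(delta)                       # gradient of the cost w.r.t. delta
        g = g / (np.linalg.norm(g) + 1e-12)      # unitize
        g = g / 2 if succeeded else 2 * g        # assumed adaptive step rule
        delta = np.clip(delta - lr * g, -0.1, 0.1)
        delta = np.clip(delta + x, 0.0, 1.0) - x # keep the image valid
    if best_delta is None:
        return None                              # every attack failed
    return x + best_delta                        # globally optimal adversarial image
```

A toy usage: attacking a "model" that flips its prediction once the mean pixel value drops below a threshold takes only a few iterations with a constant gradient.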
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that, the white-box countermeasure sample generation system for the scene text recognition model provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores therein a plurality of programs, which are adapted to be loaded by a processor and to implement the above-described white-box confrontation sample generation method for a scene character recognition model.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to realize the white-box confrontation sample generation method for the scene character recognition model.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method examples, and are not described herein again.
Referring now to FIG. 4, there is illustrated a block diagram of a computer system suitable for use as a server in implementing embodiments of the method, system, and apparatus of the present application. The server shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN (Local Area Network) card, a modem, and the like. The communication section 409 performs communication processing via a network such as the Internet. A driver 410 is also connected to the I/O interface 405 as necessary. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A white-box confrontation sample generation method for a scene character recognition model is characterized by comprising the following steps:
s100, acquiring an image to be attacked, preprocessing the image to obtain a preprocessed image, and generating a confrontation sample image by combining preset disturbance;
step S200, for the countermeasure sample image, acquiring a corresponding character recognition result through a trained scene character recognition model, judging whether the attack is successful or not through a preset judgment method by combining a corresponding attack type and a label sequence corresponding to the attack type, if the attack is successful and the matrix norm of the disturbance is smaller than the corresponding matrix norm before the disturbance is updated, taking the countermeasure sample image as an optimal countermeasure sample image and executing step S300, otherwise, executing step S300;
step S300, judging whether the current iteration frequency is smaller than the set maximum iteration frequency, if so, obtaining the gradient of the disturbance caused by the cost function of the scene character recognition model through a back propagation algorithm; updating the disturbance through a preset disturbance updating method based on the gradient, updating the confrontation sample image according to the updated disturbance, skipping to the step S200 after the updating is finished, otherwise, outputting the optimal confrontation sample image;
the scene character recognition model is constructed based on a neural network of an attention mechanism, and the neural network comprises an encoder and a decoder.
2. The method for generating the white-box confrontation sample for the scene character recognition model according to claim 1, wherein in step S100, "acquiring the image to be attacked and preprocessing" is performed, the method includes: carrying out size adjustment and normalization operation on the acquired image to be attacked; and adjusting the size of the image to be attacked according to the size of an input layer of the scene character recognition model.
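A minimal sketch of the resizing and normalization in this claim follows. The 32×100 input size and the nearest-neighbour resampling are assumptions made for illustration; the patent only requires that the image be resized to match the recognition model's input layer and normalized.

```python
import numpy as np

def preprocess(img_uint8, target_h=32, target_w=100):
    # Nearest-neighbour resize via index selection (an assumed resampling
    # scheme; any resize matching the model's input layer would do).
    h, w = img_uint8.shape[:2]
    rows = np.arange(target_h) * h // target_h
    cols = np.arange(target_w) * w // target_w
    resized = img_uint8[rows][:, cols]
    # Normalize 8-bit pixel values into [0, 1].
    return resized.astype(np.float32) / 255.0
```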
3. The method for generating the white-box countermeasure sample for the scene character recognition model according to claim 2, wherein in step S200, "determine whether the attack is successful by a preset determination method in combination with the attack type corresponding thereto and the tag sequence corresponding to the attack type", the method includes:
if the attack type corresponding to the confrontation sample image is a target attack and the character recognition result of the confrontation sample image is equal to the label sequence corresponding to the attack type, the attack is successful; the label sequence corresponding to the target attack is the target character sequence of the image to be attacked;
if the attack type corresponding to the confrontation sample image is a non-target attack and the character recognition result of the confrontation sample image is not equal to the label sequence corresponding to the attack type, the attack is successful; the label sequence corresponding to the non-target attack is the real label sequence corresponding to the image to be attacked.
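The judgment rule in this claim reduces to a one-line predicate: sequence equality for target attacks, inequality for non-target attacks. A sketch:

```python
def attack_succeeded(pred_seq: str, label_seq: str, targeted: bool) -> bool:
    # Target attack: success when the recognized text equals the target
    # sequence. Non-target attack: success when it differs from the true
    # label sequence.
    return pred_seq == label_seq if targeted else pred_seq != label_seq
```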
4. The method of generating white-box confrontation samples for scene text recognition model according to claim 3, wherein the scene text recognition model has a cost function of:
L(adv, y) = m·L_Attn + γ·L_Reg

wherein L(adv, y) is the cost value, adv is the recognition result of the confrontation sample image, y is the label sequence, m is the attack type corresponding to the confrontation sample image, L_Attn is the sum of the cross-entropy loss values at each time step of the decoder, L_Reg is the regularization term, and γ is a hyperparameter.
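A numeric sketch of this cost follows. Taking the squared two-norm of the disturbance as the regularization term L_Reg, and m ∈ {+1, −1} encoding the attack type, are assumptions made for illustration; the claim fixes neither.

```python
import numpy as np

def cost(step_probs, label_ids, m, gamma, delta):
    # L_Attn: sum of per-time-step cross-entropy losses of the decoder,
    # where step_probs[t] is the softmax output at step t.
    l_attn = sum(-np.log(p[y]) for p, y in zip(step_probs, label_ids))
    # L_Reg: assumed squared two-norm of the disturbance.
    l_reg = float(np.sum(delta ** 2))
    # m = +1 or -1 by attack type (assumption); gamma weighs the regularizer.
    return m * l_attn + gamma * l_reg
```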
5. The method for generating the white-box confrontation sample for the scene text recognition model according to claim 4, wherein in step S300, "obtaining the gradient of the cost function of the scene text recognition model to the disturbance through a back propagation algorithm" is performed by:
g = ∂L(adv, y) / ∂δ

where g denotes the gradient, δ denotes the disturbance, and ∂ denotes derivation.
6. The method as claimed in claim 1, wherein the gradient corresponding to the perturbation is unitized before the perturbation is updated, and the unitized gradient is increased or decreased according to whether the challenge sample image is successfully attacked or not.
7. The method for generating the white-box confrontation sample for the scene character recognition model as claimed in claim 5, wherein in step S300, "update the disturbance by a preset disturbance update method based on the gradient" includes:
δ = clip(δ + x, 0, 1) - x
δ = clip(δ, -0.1, 0.1)
δ = δ - lr·g

wherein δ represents the disturbance, g represents the gradient, x represents the preprocessed image, clip(·) represents clipping of the disturbed pixel values to the given range, and lr represents the learning rate corresponding to the scene character recognition model.
8. A white-box confrontation sample generation system for a scene text recognition model, the system comprising: the system comprises a confrontation sample image generation module, an attack judgment module and an iteration judgment module;
the countermeasure sample image generation module is configured to acquire an image to be attacked and perform preprocessing to obtain a preprocessed image, and generate a countermeasure sample image in combination with preset disturbance;
the attack judging module is configured to obtain a corresponding character recognition result of the confrontation sample image through a trained scene character recognition model, judge whether the confrontation sample image is successfully attacked or not through a preset judging method by combining a corresponding attack type of the confrontation sample image and a label sequence corresponding to the attack type, if the confrontation sample image is successfully attacked and the matrix norm of the disturbance is smaller than the corresponding matrix norm before the disturbance is updated, take the confrontation sample image as an optimal confrontation sample image and execute the iteration judging module, and otherwise, execute the iteration judging module;
the iteration judging module is configured to judge whether the current iteration times are smaller than the set maximum iteration times, and if so, the gradient of the cost function of the scene character recognition model to the disturbance is obtained through a back propagation algorithm; updating the disturbance through a preset disturbance updating method based on the gradient, updating the confrontation sample image according to the updated disturbance, skipping to an attack judgment module after the updating is finished, and otherwise, outputting the optimal confrontation sample image;
the scene character recognition model is constructed based on a neural network of an attention mechanism, and the neural network comprises an encoder and a decoder.
9. A storage device having a plurality of programs stored therein, wherein the programs are adapted to be loaded and executed by a processor to implement the white-box confrontation sample generation method for the scene character recognition model according to any one of claims 1 to 7.
10. A processing device comprising a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; characterized in that the program is adapted to be loaded and executed by a processor to implement the white-box confrontation sample generation method for the scene text recognition model according to any one of claims 1 to 7.
CN202010263434.6A 2020-04-07 2020-04-07 White-box confrontation sample generation method for scene character recognition model Pending CN111476228A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010263434.6A CN111476228A (en) 2020-04-07 2020-04-07 White-box confrontation sample generation method for scene character recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010263434.6A CN111476228A (en) 2020-04-07 2020-04-07 White-box confrontation sample generation method for scene character recognition model

Publications (1)

Publication Number Publication Date
CN111476228A true CN111476228A (en) 2020-07-31

Family

ID=71749794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010263434.6A Pending CN111476228A (en) 2020-04-07 2020-04-07 White-box confrontation sample generation method for scene character recognition model

Country Status (1)

Country Link
CN (1) CN111476228A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215227A (en) * 2020-12-09 2021-01-12 鹏城实验室 Image target detection model attack method and device, terminal equipment and storage medium
CN112633424A (en) * 2021-03-10 2021-04-09 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, image processing device, and storage medium
CN112836798A (en) * 2021-01-29 2021-05-25 华中科技大学 Non-directional white-box attack resisting method aiming at scene character recognition
CN112966685A (en) * 2021-03-23 2021-06-15 平安国际智慧城市科技股份有限公司 Attack network training method and device for scene text recognition and related equipment
CN113033747A (en) * 2021-05-26 2021-06-25 中国工程物理研究院计算机应用研究所 Figure identification code generation method for man-machine identification
CN113379001A (en) * 2021-07-16 2021-09-10 支付宝(杭州)信息技术有限公司 Processing method and device for image recognition model
CN113449783A (en) * 2021-06-17 2021-09-28 广州大学 Countermeasure sample generation method, system, computer device and storage medium
CN113537291A (en) * 2021-06-16 2021-10-22 中国人民解放军战略支援部队信息工程大学 Frequency domain confrontation sample generation method and system
CN114332446A (en) * 2021-10-18 2022-04-12 北京计算机技术及应用研究所 Image countermeasure sample generation method with rotation robustness in physical world
CN114694222A (en) * 2022-03-28 2022-07-01 马上消费金融股份有限公司 Image processing method, image processing device, computer equipment and storage medium
CN115906032A (en) * 2023-02-20 2023-04-04 之江实验室 Recognition model correction method and device and storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012009580A1 (en) * 2010-07-14 2012-01-19 The Trustees Of Columbia University In The City Of New York Force-clamp spectrometer and methods of use
CN108257116A (en) * 2017-12-30 2018-07-06 清华大学 A kind of method for generating confrontation image
CN109599109A (en) * 2018-12-26 2019-04-09 浙江大学 For the confrontation audio generation method and system of whitepack scene
US20190130110A1 (en) * 2017-11-01 2019-05-02 International Business Machines Corporation Protecting Cognitive Systems from Gradient Based Attacks through the Use of Deceiving Gradients
CN109858368A (en) * 2018-12-29 2019-06-07 浙江工业大学 A kind of recognition of face attack defense method based on Rosenbrock-PSO
CN109948658A (en) * 2019-02-25 2019-06-28 浙江工业大学 The confrontation attack defense method of Feature Oriented figure attention mechanism and application
CN110245598A (en) * 2019-06-06 2019-09-17 北京瑞莱智慧科技有限公司 It fights sample generating method, device, medium and calculates equipment
CN110276377A (en) * 2019-05-17 2019-09-24 杭州电子科技大学 A kind of confrontation sample generating method based on Bayes's optimization
CN110444208A (en) * 2019-08-12 2019-11-12 浙江工业大学 A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm
US20190354847A1 (en) * 2018-05-17 2019-11-21 International Business Machines Corporation Acceleration of Convolutional Neural Networks on Analog Arrays
CN110610708A (en) * 2019-08-31 2019-12-24 浙江工业大学 Voiceprint recognition attack defense method based on cuckoo search algorithm
CN110767216A (en) * 2019-09-10 2020-02-07 浙江工业大学 Voice recognition attack defense method based on PSO algorithm
CN110764958A (en) * 2019-09-24 2020-02-07 华中科技大学 White box target attack method of brain-computer interface regression system based on EEG
CN110851835A (en) * 2019-09-23 2020-02-28 平安科技(深圳)有限公司 Image model detection method and device, electronic equipment and storage medium
CN110866287A (en) * 2019-10-31 2020-03-06 大连理工大学 Point attack method for generating countercheck sample based on weight spectrum
US20200097763A1 (en) * 2018-09-24 2020-03-26 Sap Se Computer systems for detecting training data usage in generative models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIA X等: "ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples" *
蒋凌云: "基于生成对抗网络的图像对抗样本攻防算法研究" *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215227B (en) * 2020-12-09 2021-04-09 鹏城实验室 Image target detection model attack method and device, terminal equipment and storage medium
CN112215227A (en) * 2020-12-09 2021-01-12 鹏城实验室 Image target detection model attack method and device, terminal equipment and storage medium
CN112836798A (en) * 2021-01-29 2021-05-25 华中科技大学 Non-directional white-box attack resisting method aiming at scene character recognition
CN112633424A (en) * 2021-03-10 2021-04-09 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, image processing device, and storage medium
CN112966685B (en) * 2021-03-23 2024-04-19 深圳赛安特技术服务有限公司 Attack network training method and device for scene text recognition and related equipment
CN112966685A (en) * 2021-03-23 2021-06-15 平安国际智慧城市科技股份有限公司 Attack network training method and device for scene text recognition and related equipment
CN113033747A (en) * 2021-05-26 2021-06-25 中国工程物理研究院计算机应用研究所 Figure identification code generation method for man-machine identification
CN113537291A (en) * 2021-06-16 2021-10-22 中国人民解放军战略支援部队信息工程大学 Frequency domain confrontation sample generation method and system
CN113449783A (en) * 2021-06-17 2021-09-28 广州大学 Countermeasure sample generation method, system, computer device and storage medium
CN113379001A (en) * 2021-07-16 2021-09-10 支付宝(杭州)信息技术有限公司 Processing method and device for image recognition model
CN114332446B (en) * 2021-10-18 2022-07-12 北京计算机技术及应用研究所 Image countermeasure sample generation method with rotation robustness in physical world
CN114332446A (en) * 2021-10-18 2022-04-12 北京计算机技术及应用研究所 Image countermeasure sample generation method with rotation robustness in physical world
CN114694222A (en) * 2022-03-28 2022-07-01 马上消费金融股份有限公司 Image processing method, image processing device, computer equipment and storage medium
CN114694222B (en) * 2022-03-28 2023-08-18 马上消费金融股份有限公司 Image processing method, device, computer equipment and storage medium
CN115906032A (en) * 2023-02-20 2023-04-04 之江实验室 Recognition model correction method and device and storage medium

Similar Documents

Publication Publication Date Title
CN111476228A (en) White-box confrontation sample generation method for scene character recognition model
CN110287479B (en) Named entity recognition method, electronic device and storage medium
CN109741736B (en) System and method for robust speech recognition using generative countermeasure networks
CN107680579B (en) Text regularization model training method and device, and text regularization method and device
CN108228686B (en) Method and device for realizing image-text matching and electronic equipment
WO2019232772A1 (en) Systems and methods for content identification
CN115063875B (en) Model training method, image processing method and device and electronic equipment
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN114612688B (en) Countermeasure sample generation method, model training method, processing method and electronic equipment
CN110659657A (en) Method and device for training model
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
CN114913590B (en) Data emotion recognition method, device and equipment and readable storage medium
CN110717027B (en) Multi-round intelligent question-answering method, system, controller and medium
CN114398871A (en) Financial entity extraction method, device, equipment and computer readable storage medium
CN113128588B (en) Model training method, device, computer equipment and computer storage medium
CN111461239B (en) CTC scene character recognition model optimization method based on white-box attack
CN117132690A (en) Image generation method and related device
CN113268575B (en) Entity relationship identification method and device and readable medium
Gan et al. Generative adversarial networks with augmentation and penalty
CN110414845B (en) Risk assessment method and device for target transaction
CN111898708A (en) Transfer learning method and electronic equipment
US20220027760A1 (en) Learning device and learning method
CN117473397B (en) Diffusion model data enhancement-based emotion recognition method and system
CN114330592B (en) Model generation method, device, electronic equipment and computer storage medium
CN110650239B (en) Image processing method, image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20231229