CN110378474A - Adversarial sample generation method, apparatus, electronic device and computer-readable medium - Google Patents

Adversarial sample generation method, apparatus, electronic device and computer-readable medium

Info

Publication number
CN110378474A
CN110378474A
Authority
CN
China
Prior art keywords
text
sample
candidate
result information
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910684104.1A
Other languages
Chinese (zh)
Inventor
苗宁
周浩
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910684104.1A
Publication of CN110378474A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the disclosure disclose an adversarial sample generation method, apparatus, electronic device and computer-readable medium. One specific embodiment of the method includes: for a training sample in a training sample set, executing the following generation step, where the training sample includes a sample text and annotation information corresponding to the sample text: generating a candidate text corresponding to the sample text; inputting the candidate text into a text-processing model to obtain text-processing result information; and, if the text-processing result information corresponding to the candidate text satisfies a preset condition, determining the candidate text and its corresponding text-processing result information as an adversarial sample. The embodiment makes it possible to steer adversarial-sample generation toward a desired direction.

Description

Adversarial sample generation method, apparatus, electronic device and computer-readable medium
Technical field
Embodiments of the disclosure relate to the field of computer technology, and in particular to an adversarial sample generation method, apparatus, electronic device and computer-readable medium.
Background technique
With the development of artificial intelligence, related technologies are widely applied in many fields such as image recognition, speech recognition and natural language processing, where they play an important role and make human-computer interaction more convenient. At the same time, because these functions are realized by training on large numbers of samples, a trained neural network is sensitive to perturbations, which may compromise its safety. To improve the robustness of a neural network against such interference, the network needs to be trained on adversarial samples, which creates the need to generate adversarial samples.
Summary of the invention
This part of the disclosure introduces concepts in brief form; these concepts are described in detail in the detailed description below. This part is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
Some embodiments of the present disclosure propose an adversarial sample generation method, apparatus, electronic device and computer-readable medium.
In a first aspect, some embodiments of the present disclosure provide an adversarial sample generation method, comprising: for a training sample in a training sample set, executing the following generation step, where the training sample includes a sample text and annotation information corresponding to the sample text: generating a candidate text corresponding to the sample text; inputting the candidate text into a text-processing model to obtain text-processing result information; and, if the text-processing result information corresponding to the candidate text satisfies a preset condition, determining the candidate text and its corresponding text-processing result information as an adversarial sample.
In a second aspect, some embodiments of the present disclosure provide an adversarial sample generation apparatus, comprising: an execution unit configured, for a training sample in a training sample set, to generate an adversarial sample using the subunits it comprises, where the training sample includes a sample text and annotation information corresponding to the sample text. The execution unit comprises the following subunits: a generation subunit configured to generate a candidate text corresponding to the sample text; a text-processing result information generation subunit configured to input the candidate text into a text-processing model to obtain text-processing result information; and a first determination subunit configured, if the text-processing result information corresponding to the candidate text satisfies a preset condition, to determine the candidate text and its corresponding text-processing result information as an adversarial sample.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
The adversarial sample generation method, apparatus, electronic device and computer-readable medium provided by some embodiments of the present disclosure generate a candidate text for a sample text and input the candidate text into a text-processing model to obtain text-processing result information. On this basis, if the text-processing result information corresponding to the candidate text satisfies a preset condition, the candidate text and its corresponding text-processing result information are determined as an adversarial sample. By controlling the preset condition, adversarial-sample generation can be steered toward a desired direction. For example, the preset condition can require that the text-processing result information of the candidate text differ from, or even be opposite to, the annotation information of the sample text, so that adversarial samples are generated in a more aggressive direction.
Detailed description of the invention
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
Fig. 1 is an architecture diagram of an exemplary system in which some embodiments of the present disclosure may be applied;
Fig. 2 is a flowchart of some embodiments of the adversarial sample generation method according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the adversarial sample generation method according to some embodiments of the present disclosure;
Fig. 4 is a flowchart of other embodiments of the adversarial sample generation method according to the present disclosure;
Fig. 5 is a structural schematic diagram of some embodiments of the adversarial sample generation apparatus according to the present disclosure;
Fig. 6 is a structural schematic diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
Specific embodiment
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the protection scope of the present disclosure.
It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings. The embodiments of the present disclosure and the features in the embodiments may be combined with each other as long as they do not conflict.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules or units, and are not intended to limit the order of, or interdependence between, the functions performed by these devices, modules or units.
It should be noted that the modifiers "a/an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be construed as "one or more".
The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which the adversarial sample generation method or adversarial sample generation apparatus of some embodiments of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various client applications may be installed on the terminal devices 101, 102, 103, such as text sentiment analysis applications, translation applications and instant messaging applications.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices with a display screen that support text display, including but not limited to smartphones, tablet computers, e-book readers, laptop computers and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server providing various services, such as a backend text server supporting the text-processing applications on the terminal devices 101, 102, 103. As an example, a text-processing model for processing text may be deployed on the backend text server. As needed, the backend text server may generate adversarial samples for training the text-processing model.
It should be noted that the adversarial sample generation method provided by the embodiments of the present disclosure is generally performed by the server 105; correspondingly, the adversarial sample generation apparatus may be provided in the server 105. This is not specifically limited here.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative; there may be any number of terminal devices, networks and servers as required. It can be understood that when the training sample set and the text-processing model are both stored locally on the server, the terminal devices and the network may be omitted.
With continued reference to Fig. 2, a flow 200 of some embodiments of the adversarial sample generation method according to the present disclosure is shown. The adversarial sample generation method comprises, for a training sample in a training sample set, executing the following generation step:
Step 201: generate a candidate text corresponding to the sample text.
In some embodiments, the training sample set may be any of various data sets, and each training sample in the set includes a sample text and annotation information corresponding to the sample text. As an example, the data set may be IMDB or SNLI. The IMDB (Internet Movie Database) data set includes 25,000 training samples and 25,000 test samples; each sample includes a movie review and a label, and the label characterizes the review as a positive or negative evaluation. SNLI is another data set that can be used for natural language processing (NLP) tasks.
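A minimal sketch of the assumed data layout (the class and field names below are illustrative, not part of the disclosure): each training sample pairs a sample text with its annotation information.

```python
from dataclasses import dataclass

@dataclass
class TrainingSample:
    """One entry of the training sample set: a sample text plus its annotation."""
    text: str   # e.g. an IMDB movie review
    label: str  # annotation information, e.g. "positive" or "negative"

# A toy training set in the IMDB style; the contents are illustrative only.
train_set = [
    TrainingSample("i really like this movie", "positive"),
    TrainingSample("the plot was dull and predictable", "negative"),
]
```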
In some embodiments, the executing subject of the adversarial sample generation method (for example, the server shown in Fig. 1) may generate the candidate text corresponding to the sample text of a training sample. In practice, this method may be executed for any training sample in the training sample set, and for a given training sample the executing subject may generate the corresponding candidate text in various ways.
As an example, at least one word in the sample text may be replaced by a synonym or homonym, so as to obtain a candidate text corresponding to the sample text.
In some optional implementations, the candidate text corresponding to the sample text may be generated based on a Markov chain Monte Carlo (MCMC) sampling method and a language model (LM). The MCMC sampling methods include, among others, Metropolis-Hastings and Gibbs sampling.
Taking Metropolis-Hastings as an example, given a stationary distribution π(x) and a proposal distribution, samples can be generated by Metropolis-Hastings sampling. Taking a proposed sample x' of the sample text x as an example, the proposal distribution is g(x'|x), and the acceptance rate α of the sample x' can be calculated using the following formula (1):

α(x → x') = min{ 1, [π(x') · g(x|x')] / [π(x) · g(x'|x)] }        (1)

If the acceptance test with rate α is passed, the sample x' is accepted.
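The following sketch shows the Metropolis-Hastings acceptance test of formula (1); the stationary probabilities and proposal probabilities are assumed to be supplied by the caller.

```python
import random

def mh_accept(pi_x, pi_x_new, g_forward, g_backward):
    """Metropolis-Hastings acceptance test for a proposed sample x'.

    pi_x, pi_x_new : unnormalized stationary probabilities pi(x) and pi(x')
    g_forward      : proposal probability g(x'|x)
    g_backward     : reverse proposal probability g(x|x')
    Returns True if the proposal x' is accepted.
    """
    alpha = min(1.0, (pi_x_new * g_backward) / (pi_x * g_forward + 1e-12))
    return random.random() < alpha
```

In the generation step, a proposed candidate text replaces the current text only when this test succeeds.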
In some optional implementations, the generated sample may be input into the LM to obtain the probability of the input sample occurring. Samples whose probability exceeds a preset threshold may then be selected as candidate samples. A higher probability indicates that the sample is more likely to occur, i.e. that the input sample is more fluent. This avoids failures in constructing adversarial samples caused by arbitrarily modifying the training sample.
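A sketch of the fluency filter described above, assuming a language-model scorer `lm_prob` that returns the probability of a whole text (the function name and threshold value are illustrative):

```python
def filter_fluent(candidates, lm_prob, threshold=1e-8):
    """Keep only candidate texts whose LM probability exceeds the preset threshold,
    discarding arbitrary edits that would make the text unnatural."""
    return [c for c in candidates if lm_prob(c) > threshold]
```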
In some optional implementations, sampling may be performed by at least one of the following operations on the sample text: deleting a word (word deletion), adding at least one word chosen from a target dictionary (word insertion), or choosing a word from the target dictionary to replace a word in the sample text (word replacement), so as to obtain the candidate text corresponding to the sample text. The target dictionary may be any dictionary; it may be specified by a technician or obtained by filtering according to certain conditions. A sketch of the three operations is given after this passage.
Continuing with the sample text x above, let g_d(x'|x), g_i(x'|x) and g_r(x'|x) denote the probabilities of obtaining a sample x' by word deletion, word insertion and word replacement respectively. The overall proposal then satisfies:

g(x'|x) = p_r · g_r(x'|x) + p_i · g_i(x'|x) + p_d · g_d(x'|x)

where p_r, p_i and p_d are the probabilities of sampling with the three operations respectively; their values may be specified by a technician according to actual needs.
As an example, g_d(x'|x) may be determined as follows: g_d(x'|x) = 1 if x' = x_{-m}, and g_d(x'|x) = 0 otherwise, where x_{-m} denotes the text obtained from the sample text x by deleting its m-th word.
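The three sampling operations can be sketched as a single mixture proposal. For brevity the inserted or replacing word is drawn uniformly from the pre-selected set Q here, whereas the disclosure draws it according to the stationary distribution (see the replacement proposal g_r below); function and parameter names are illustrative.

```python
import random

def propose(words, m, candidate_words, p_replace=0.4, p_insert=0.3, p_delete=0.3):
    """Draw a candidate text from the mixture g = p_r*g_r + p_i*g_i + p_d*g_d.

    words           : tokens of the current text x
    m               : index of the word being edited
    candidate_words : pre-selected target dictionary Q for this position
    """
    op = random.choices(["replace", "insert", "delete"],
                        weights=[p_replace, p_insert, p_delete])[0]
    new_words = list(words)
    if op == "delete":
        # g_d(x'|x) = 1 exactly when x' equals x with the m-th word removed
        del new_words[m]
    elif op == "insert":
        new_words.insert(m, random.choice(candidate_words))
    else:  # replace
        new_words[m] = random.choice(candidate_words)
    return new_words, op
```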
Step 202: input the candidate text into the text-processing model to obtain text-processing result information.
In some embodiments, the executing subject may input the candidate text into the text-processing model to obtain the text-processing result information. The text-processing model may be any of various neural networks for processing text, for example various text classification networks, including but not limited to translation networks, part-of-speech tagging networks and text sentiment analysis networks.
As an example, the candidate text may be input into a bidirectional LSTM (Long Short-Term Memory) network or the like to obtain the text-processing result information. Depending on the neural network, the obtained text-processing result information differs; for example, it may be text sentiment analysis result information. If the input is a movie review, the output may be the probability that the review is a positive (or negative) evaluation.
Step 203: if the text-processing result information corresponding to the candidate text satisfies a preset condition, determine the candidate text and its corresponding text-processing result information as an adversarial sample.
In some embodiments, on the basis of step 202, it may be determined whether the text-processing result information corresponding to the candidate text satisfies the preset condition.
Depending on the actual situation, the preset condition may be any of various conditions. As an example, the preset condition may be that the text-processing result information corresponding to the candidate text is opposite to the annotation information corresponding to the sample text. Continuing with the movie review example, if the annotation information of the sample text is a positive evaluation, then when the text-processing result information of the candidate text is a negative evaluation, it is considered opposite to the annotation information of the sample text. Of course, other preset conditions may also be set as required to control the direction in which adversarial samples are generated.
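A sketch of the preset-condition check used in the movie-review example; other conditions can be substituted to steer generation in a different direction.

```python
def meets_preset_condition(candidate_result, annotation):
    """Example preset condition: the model's label for the candidate text is
    opposite to the annotation of the original sample text."""
    return candidate_result != annotation
```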
In practice, as an example, adversarial samples can be made to be generated in a more aggressive direction by controlling the stationary distribution so that it satisfies the following formula (2):

π(x) ∝ LM(x) · p̃(ỹ|x)        (2)

where LM(x) is the probability of the sample text x occurring, and p̃(ỹ|x) is the unnormalized probability, in the text-processing result information obtained by inputting the text corresponding to the sample text x into the text-processing model, of a result ỹ that differs from the annotation information y corresponding to the sample text. That is, π(x) is proportional to the product of LM(x) and p̃(ỹ|x).
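A sketch of the unnormalized stationary probability of formula (2); `lm_prob` and `target_label_prob` stand for the language model LM(x) and the model score p̃(ỹ|x) and are assumed to be supplied by the surrounding system.

```python
def stationary_score(text, lm_prob, target_label_prob):
    """pi(x) is proportional to LM(x) * p~(y~|x), where y~ differs from the
    original annotation y of the sample text."""
    return lm_prob(text) * target_label_prob(text)
```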
In some optional implementations, for a word in the sample text, when a sample is obtained by word replacement, the replacement proposal may, as an example, be obtained as follows:

g_r(x'|x) = π(w_1, …, w_{m-1}, w_c, w_{m+1}, …, w_n) / Σ_{w ∈ Q} π(w_1, …, w_{m-1}, w, w_{m+1}, …, w_n)

where w_n is the n-th word of the sample text, Q is the replacement word set, n is the total number of words in the sample text, m indicates that the word currently being replaced is the m-th word of the text, and w_c is the word that replaces the m-th word of the sample text.
When a sample is obtained by word insertion, the insertion can be divided into two steps: first, a placeholder word is inserted; second, the placeholder word is replaced by an actual word. Therefore, g_i can be obtained in a manner similar to word replacement.
Here the replacement word set Q is the target dictionary. As an example, the target dictionary is obtained by the following step: screening an initial dictionary based on the LM to obtain the target dictionary.
For word replacement, for a given word in the sample text, the LM can be used to obtain, for each word in the initial dictionary, the probability of the resulting text occurring after that word replaces the word in the text. On this basis, the words can be sorted by this probability from high to low, and a preset number of words can be chosen as the target dictionary corresponding to that word in the sample text. Compared with the number of words in the initial dictionary, the number of words in the target dictionary is greatly reduced, which reduces the amount of computation and increases the computation speed. Similarly, a target dictionary can also be obtained for word insertion.
As an example, the probability S_B(w|x) that the text obtained after a (replacement) word w replaces the m-th word of the sample text x occurs can be calculated by the following formula:

S_B(w|x) = LM(w | x_{1:m-1}) · LM_b(w | x_{m+1:n})

where LM(w | x_{1:m-1}) is the probability of w being each word in the initial dictionary given all the words before the m-th position (the forward language-model probability), and LM_b(w | x_{m+1:n}) is the probability of w being each word in the initial dictionary given all the words after the m-th position (the backward language-model probability).
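A sketch of the black-box pre-selection of the target dictionary using the score S_B. The forward and backward language-model scorers are assumptions about the surrounding system, and top_k stands for the preset quantity of words to keep.

```python
def preselect_target_dictionary(prefix, suffix, initial_dictionary,
                                lm_forward, lm_backward, top_k=50):
    """Score every word w in the initial dictionary by
    S_B(w|x) = LM(w | x_{1:m-1}) * LM_b(w | x_{m+1:n})
    and keep the top_k words as the target dictionary for position m."""
    scored = [(w, lm_forward(w, prefix) * lm_backward(w, suffix))
              for w in initial_dictionary]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:top_k]]
```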
In the above implementation, the generation of adversarial samples uses the text-processing result information, but the structural information of the text-processing model is unknown; the model presents itself externally as a black box. The resulting samples can therefore be regarded as adversarial samples generated under a black-box setting.
In some optional implementations, if the gradient information of the text-processing model is available, the text-processing model presents itself externally as a white box, and the resulting samples can be regarded as adversarial samples generated under a white-box setting. Specifically, the score S_W(w|x) of the text obtained after a (replacement) word w replaces a word of the sample text x can be calculated by the following formula:

S_W(w|x) = ∇_{e_m} L̃ · (e_w − e_m)

where L̃ is the loss function corresponding to p̃(ỹ|x), e_m is the word embedding of the current word, and e_w is the word embedding of the substitute word w. This allows adversarial samples to be generated faster along the direction of gradient descent.
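A sketch of a gradient-guided score for the white-box setting. The disclosure states that the gradient of the loss L̃ with respect to the embedding of the current word is used; scoring a substitute by the first-order change of the loss, as below, is an assumption of this sketch rather than the exact formula.

```python
import numpy as np

def whitebox_score(grad_l_wrt_em, e_m, e_w):
    """First-order estimate of how much replacing the current word (embedding e_m)
    by the substitute word (embedding e_w) changes the loss L~.

    grad_l_wrt_em : gradient of L~ with respect to e_m (assumed available)
    """
    return float(np.dot(grad_l_wrt_em, e_w - e_m))
```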
In some optional implementations, if the text-processing result information corresponding to the candidate text does not satisfy the preset condition, the candidate text corresponding to the sample text is taken as the sample text and the generation step is executed again.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the adversarial sample generation method according to some embodiments of the present disclosure. In the application scenario of Fig. 3, the executing subject of the adversarial sample generation method may execute the following generation step for a training sample in the training sample set. Take one training sample as an example: its sample text is "i really like this movie", and its annotation information is "99% positive", i.e. there is a 99% probability that it is a positive comment. On this basis, the executing subject may first generate the candidate text "i truely like this movie" corresponding to the sample text. This candidate text is input into the text-processing model (a text sentiment classification model) to obtain the sentiment classification information "82% positive", i.e. an 82% probability of being a positive comment. Optionally, the generation step may be continued with this candidate text as the sample text, successively yielding the candidate texts "i truely like the movie", "we truely like the movie" and "we truely like the show", whose sentiment classification information is shown in the figure. For the candidate text "we truely like the show", the corresponding sentiment classification information is "59% negative", i.e. a 59% probability of being a negative comment. In this application scenario, the preset condition may be that the text-processing result information corresponding to the candidate text is opposite to the annotation information corresponding to the sample text. Since the candidate text "we truely like the show" satisfies the preset condition, this candidate text and its corresponding sentiment classification "negative" can be determined as an adversarial sample.
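Putting steps 201-203 together, the scenario of Fig. 3 corresponds to a loop of the following shape; all callables are assumed to be supplied by the surrounding system, and max_steps is an illustrative safeguard rather than part of the disclosure.

```python
def generate_adversarial(sample_text, annotation, propose_candidate, classify,
                         meets_preset_condition, max_steps=200):
    """Repeatedly propose a candidate text, run the text-processing model on it,
    and return the first candidate that satisfies the preset condition."""
    current = sample_text
    for _ in range(max_steps):
        candidate = propose_candidate(current)            # step 201
        result = classify(candidate)                      # step 202
        if meets_preset_condition(result, annotation):    # step 203
            return candidate, result                      # adversarial sample
        current = candidate                               # otherwise continue sampling
    return None
```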
The adversarial sample generation method provided by some embodiments of the present disclosure generates a candidate text for a sample text and inputs the candidate text into a text-processing model to obtain text-processing result information. On this basis, if the text-processing result information corresponding to the candidate text satisfies a preset condition, the candidate text and its corresponding text-processing result information are determined as an adversarial sample. By controlling the preset condition, adversarial-sample generation can be steered toward a desired direction; for example, the preset condition can require that the text-processing result information of the candidate text differ from, or even be opposite to, the annotation information of the sample text, so that adversarial samples are generated in a more aggressive direction.
With further reference to Fig. 4, a flow 400 of other embodiments of the adversarial sample generation method is shown. The flow 400 comprises, for a training sample in a training sample set, executing the following generation step:
Step 401: generate a candidate text corresponding to the sample text.
Step 402: input the candidate text into the text-processing model to obtain text-processing result information.
Step 403: if the text-processing result information corresponding to the candidate text satisfies a preset condition, determine the candidate text and its corresponding text-processing result information as an adversarial sample.
In some embodiments, the specific implementation of steps 401-403 and the technical effects they bring can refer to the embodiments corresponding to Fig. 2 and are not repeated here.
Step 404: determine the union of the generated adversarial samples and the training sample set as a new training sample set.
Step 405: train the text-processing model based on the new training sample set.
In some embodiments, the executing subject may use various model training methods to train the text-processing model based on the new training sample set. For example, a training sample may be chosen from the new training sample set, the sample text of the training sample may be used as the input of the text-processing model, and the annotation information of the training sample may be used as the expected output of the text-processing model to train the model. As an example, the difference between the actual output and the expected output of the text-processing model may be calculated based on a loss function, and the parameters of the text-processing model may then be adjusted. On this basis, new training samples are chosen to continue training the text-processing model with the adjusted parameters until the text-processing model meets the requirements.
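Steps 404-405 can be sketched as follows; `train_one_epoch` stands in for whatever loss and optimizer the text-processing model uses and is an assumption of this sketch.

```python
def adversarial_training(model, train_set, adversarial_samples, train_one_epoch, epochs=3):
    """Merge the generated adversarial samples into the training set (step 404)
    and retrain the text-processing model on the combined set (step 405)."""
    new_train_set = list(train_set) + list(adversarial_samples)   # step 404
    for _ in range(epochs):                                       # step 405
        train_one_epoch(model, new_train_set)
    return model
```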
As can be seen from Fig. 4, compared with some embodiments corresponding to Fig. 2, the flow 400 of the adversarial sample generation method in some embodiments corresponding to Fig. 4 adds the step of training the text-processing model based on the new training sample set that includes the adversarial samples. Tests show that both the robustness and the accuracy of the trained text-processing model are improved. Using a single-layer LSTM as the text-processing model and IMDB as the training sample set, after training on a new training sample set including adversarial samples generated under the black-box setting, the attack success rate against the model is 93%; after training on a new training sample set including adversarial samples generated under the white-box setting, the attack success rate is 92.4%; whereas for a text-processing model without adversarial training, the attack success rate is 98.7%. The robustness of the adversarially trained text-processing model is thus clearly improved. In addition, accuracy was tested after 10,000, 30,000 and 100,000 iterations: the recognition accuracy of the text-processing model without adversarial training is 58.8%, 65.8% and 73.0% respectively, while after training on a new training sample set including adversarial samples generated under the white-box setting, the accuracy is 60%, 66.9% and 73.5% respectively, a significant improvement.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an adversarial sample generation apparatus. These apparatus embodiments correspond to the method embodiments shown in Fig. 2, and the apparatus can be applied in various electronic devices.
As shown in Fig. 5, the adversarial sample generation apparatus 500 of some embodiments includes an execution unit 501, which comprises the following subunits: a generation subunit 502, a text-processing result information generation subunit 503 and a first determination subunit 504. The execution unit 501 is configured, for a training sample in a training sample set, to generate an adversarial sample using the subunits it comprises, where the training sample includes a sample text and annotation information corresponding to the sample text. The generation subunit 502 is configured to generate a candidate text corresponding to the sample text. The text-processing result information generation subunit 503 is configured to input the candidate text into the text-processing model to obtain text-processing result information. The first determination subunit 504 is configured, if the text-processing result information corresponding to the candidate text satisfies a preset condition, to determine the candidate text and its corresponding text-processing result information as an adversarial sample.
In some embodiments, the specific implementations of the execution unit 501, the generation subunit 502, the text-processing result information generation subunit 503 and the first determination subunit 504, and the technical effects they bring, can refer to steps 201-203 in some embodiments corresponding to Fig. 2 and are not repeated here.
In some optional implementations, the generation subunit 502 may be further configured to generate the candidate text corresponding to the sample text based on the Markov chain Monte Carlo (MCMC) sampling method and the language model (LM).
In some optional implementations, the execution unit 501 may be further configured to, if the text-processing result information corresponding to the candidate text does not satisfy the preset condition, take the candidate text corresponding to the sample text as the sample text and continue to execute the generation step.
In some optional implementations, the apparatus 500 may further include a second determination unit (not shown) and a training unit (not shown). The second determination unit is configured to determine the union of the generated adversarial samples and the training sample set as a new training sample set. The training unit is configured to train the text-processing model based on the new training sample set.
In some optional implementations, the generation subunit 502 is further configured to delete a word from the sample text, or to choose at least one word from a target dictionary and add it to the sample text, or to choose a word from the target dictionary to replace a word in the sample text, so as to obtain the candidate text corresponding to the sample text.
In some optional implementations, the target dictionary is obtained by the following step: for a word in the sample text, screening an initial dictionary based on the LM to obtain the target dictionary.
In some embodiments, the generation subunit 502 generates a candidate text for the sample text, and the text-processing result information generation subunit 503 inputs the candidate text into the text-processing model to obtain text-processing result information. On this basis, if the text-processing result information corresponding to the candidate text satisfies the preset condition, the first determination subunit 504 determines the candidate text and its corresponding text-processing result information as an adversarial sample. By controlling the preset condition, adversarial-sample generation can be steered toward a desired direction; for example, the preset condition can require that the text-processing result information of the candidate text differ from, or even be opposite to, the annotation information of the sample text, so that adversarial samples are generated in a more aggressive direction.
Referring now to Fig. 6, it shows a structural schematic diagram of an electronic device (such as the server in Fig. 1) 600 suitable for implementing some embodiments of the present disclosure. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing unit (such as a central processing unit or a graphics processor) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the electronic device 600 are also stored in the RAM 603. The processing unit 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; storage devices 608 including, for example, a magnetic tape and a hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows an electronic device 600 with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided. Each block shown in Fig. 6 may represent one device or multiple devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing unit 601, the above-described functions defined in the methods of some embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described in some embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In some embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. In some embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and may send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to electric wires, optical cables, RF (radio frequency), or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet) and an ad hoc peer-to-peer network, as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: for a training sample in a training sample set, execute the following generation step, where the training sample includes a sample text and annotation information corresponding to the sample text: generate a candidate text corresponding to the sample text; input the candidate text into a text-processing model to obtain text-processing result information; and, if the text-processing result information corresponding to the candidate text satisfies a preset condition, determine the candidate text and its corresponding text-processing result information as an adversarial sample.
The computer program code for executing the operations of some embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a part of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in a different order from that noted in the drawings; for example, two successive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as comprising an execution unit, a generation subunit, a text-processing result information generation subunit and a first determination subunit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the generation subunit may also be described as "a unit for generating a candidate text corresponding to the sample text".
The functions described herein may be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC) and complex programmable logic devices (CPLD).
According to one or more embodiments of the present disclosure, an adversarial sample generation method is provided, comprising: for a training sample in a training sample set, executing the following generation step, where the training sample includes a sample text and annotation information corresponding to the sample text: generating a candidate text corresponding to the sample text; inputting the candidate text into a text-processing model to obtain text-processing result information; and, if the text-processing result information corresponding to the candidate text satisfies a preset condition, determining the candidate text and its corresponding text-processing result information as an adversarial sample.
According to one or more embodiments of the present disclosure, the candidate text corresponding to the sample text is generated based on the Markov chain Monte Carlo (MCMC) sampling method and the language model (LM).
According to one or more embodiments of the present disclosure, the method further includes: if the text-processing result information corresponding to the candidate text does not satisfy the preset condition, taking the candidate text corresponding to the sample text as the sample text and continuing to execute the generation step.
According to one or more embodiments of the present disclosure, the method further includes: determining the union of the generated adversarial samples and the training sample set as a new training sample set; and training the text-processing model based on the new training sample set.
According to one or more embodiments of the present disclosure, generating the candidate text corresponding to the sample text based on the Markov chain Monte Carlo (MCMC) sampling method and the language model (LM) includes: deleting a word from the sample text, or choosing at least one word from a target dictionary and adding it to the sample text, or choosing a word from the target dictionary to replace a word in the sample text, so as to obtain the candidate text corresponding to the sample text.
According to one or more embodiments of the present disclosure, the target dictionary is obtained by the following step: for a word in the sample text, screening an initial dictionary based on the LM to obtain the target dictionary.
According to one or more embodiments of the present disclosure, an adversarial sample generation apparatus is provided, comprising: an execution unit configured, for a training sample in a training sample set, to generate an adversarial sample using the subunits it comprises, where the training sample includes a sample text and annotation information corresponding to the sample text, the execution unit comprising the following subunits: a generation subunit configured to generate a candidate text corresponding to the sample text; a text-processing result information generation subunit configured to input the candidate text into a text-processing model to obtain text-processing result information; and a first determination subunit configured, if the text-processing result information corresponding to the candidate text satisfies a preset condition, to determine the candidate text and its corresponding text-processing result information as an adversarial sample.
According to one or more embodiments of the present disclosure, the generation subunit is further configured to generate the candidate text corresponding to the sample text based on the Markov chain Monte Carlo (MCMC) sampling method and the language model (LM).
According to one or more embodiments of the present disclosure, the execution unit is further configured to, if the text-processing result information corresponding to the candidate text does not satisfy the preset condition, take the candidate text corresponding to the sample text as the sample text and continue to execute the generation step.
According to one or more embodiments of the present disclosure, the apparatus further includes: a second determination unit configured to determine the union of the generated adversarial samples and the training sample set as a new training sample set; and a training unit configured to train the text-processing model based on the new training sample set.
According to one or more embodiments of the present disclosure, the generation subunit is further configured to delete a word from the sample text, or to choose at least one word from a target dictionary and add it to the sample text, or to choose a word from the target dictionary to replace a word in the sample text, so as to obtain the candidate text corresponding to the sample text.
According to one or more embodiments of the present disclosure, the target dictionary is obtained by the following step: for a word in the sample text, screening an initial dictionary based on the LM to obtain the target dictionary.
According to one or more embodiments of the present disclosure, an electronic device is provided, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the above embodiments.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, storing a computer program which, when executed by a processor, implements the method described in any of the above embodiments.
The above description is only a description of some preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (14)

1. An adversarial sample generation method, comprising:
for a training sample in a training sample set, executing the following generation step, wherein the training sample includes a sample text and annotation information corresponding to the sample text:
generating a candidate text corresponding to the sample text;
inputting the candidate text into a text-processing model to obtain text-processing result information; and
if the text-processing result information corresponding to the candidate text satisfies a preset condition, determining the candidate text and the text-processing result information corresponding to the candidate text as an adversarial sample.
2. The method according to claim 1, wherein generating the candidate text corresponding to the sample text comprises:
generating the candidate text corresponding to the sample text based on a Markov chain Monte Carlo (MCMC) sampling method and a language model (LM).
3. The method according to claim 1, wherein the method further comprises:
if the text-processing result information corresponding to the candidate text does not satisfy the preset condition, taking the candidate text corresponding to the sample text as the sample text and continuing to execute the generation step.
4. The method according to claim 1, wherein the method further comprises:
determining the union of the generated adversarial samples and the training sample set as a new training sample set; and
training the text-processing model based on the new training sample set.
5. The method according to claim 2, wherein generating the candidate text corresponding to the sample text based on the Markov chain Monte Carlo (MCMC) sampling method and the language model (LM) comprises:
deleting a word from the sample text; or
choosing at least one word from a target dictionary and adding it to the sample text; or
choosing a word from the target dictionary to replace a word in the sample text, so as to obtain the candidate text corresponding to the sample text.
6. The method according to claim 5, wherein the target dictionary is obtained by the following step:
for a word in the sample text, screening an initial dictionary based on the LM to obtain the target dictionary.
7. An adversarial sample generation apparatus, comprising:
an execution unit configured, for a training sample in a training sample set, to generate an adversarial sample using the subunits it comprises, wherein the training sample includes a sample text and annotation information corresponding to the sample text, and the execution unit comprises the following subunits:
a generation subunit configured to generate a candidate text corresponding to the sample text;
a text-processing result information generation subunit configured to input the candidate text into a text-processing model to obtain text-processing result information; and
a first determination subunit configured, if the text-processing result information corresponding to the candidate text satisfies a preset condition, to determine the candidate text and the text-processing result information corresponding to the candidate text as an adversarial sample.
8. The apparatus according to claim 7, wherein the generation subunit is further configured to:
generate the candidate text corresponding to the sample text based on a Markov chain Monte Carlo (MCMC) sampling method and a language model (LM).
9. The apparatus according to claim 7, wherein the execution unit is further configured to:
if the text-processing result information corresponding to the candidate text does not satisfy the preset condition, take the candidate text corresponding to the sample text as the sample text and continue to execute the generation step.
10. The apparatus according to claim 7, wherein the apparatus further comprises:
a second determination unit configured to determine the union of the generated adversarial samples and the training sample set as a new training sample set; and
a training unit configured to train the text-processing model based on the new training sample set.
11. The apparatus according to claim 8, wherein the generation subunit is further configured to:
delete a word from the sample text; or
choose at least one word from a target dictionary and add it to the sample text; or
choose a word from the target dictionary to replace a word in the sample text, so as to obtain the candidate text corresponding to the sample text.
12. The apparatus according to claim 11, wherein the target dictionary is obtained by the following step:
for a word in the sample text, screening an initial dictionary based on the LM to obtain the target dictionary.
13. An electronic device, comprising:
one or more processors; and
a storage device storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 6.
14. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN201910684104.1A 2019-07-26 2019-07-26 Fight sample generating method, device, electronic equipment and computer-readable medium Pending CN110378474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910684104.1A CN110378474A (en) 2019-07-26 2019-07-26 Fight sample generating method, device, electronic equipment and computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910684104.1A CN110378474A (en) 2019-07-26 2019-07-26 Fight sample generating method, device, electronic equipment and computer-readable medium

Publications (1)

Publication Number Publication Date
CN110378474A true CN110378474A (en) 2019-10-25

Family

ID=68256547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910684104.1A Pending CN110378474A (en) 2019-07-26 2019-07-26 Fight sample generating method, device, electronic equipment and computer-readable medium

Country Status (1)

Country Link
CN (1) CN110378474A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036389A (en) * 2018-08-28 2018-12-18 出门问问信息科技有限公司 The generation method and device of a kind of pair of resisting sample
CN109117482A (en) * 2018-09-17 2019-01-01 武汉大学 A kind of confrontation sample generating method towards the detection of Chinese text emotion tendency
CN109934253A (en) * 2019-01-08 2019-06-25 阿里巴巴集团控股有限公司 A kind of confrontation sample generating method and device
CN109829164A (en) * 2019-02-01 2019-05-31 北京字节跳动网络技术有限公司 Method and apparatus for generating text

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BRENT HARRISON, ET AL.: "Toward Automated Story Generation with Markov Chain Monte Carlo Methods and Deep Neural Networks", 《PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INTERACTIVE DIGITAL ENTERTAINMENT》 *
NING MIAO, ET AL.: "CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling", 《ARXIV》 *
SURANJANA SAMANTA, ET AL.: "Towards Crafting Text Adversarial Samples", 《ARXIV》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078892A (en) * 2019-11-25 2020-04-28 百度在线网络技术(北京)有限公司 Countermeasure sample generation method and device, electronic equipment and storage medium
CN111046176A (en) * 2019-11-25 2020-04-21 百度在线网络技术(北京)有限公司 Countermeasure sample generation method and device, electronic equipment and storage medium
CN111078892B (en) * 2019-11-25 2023-05-23 百度在线网络技术(北京)有限公司 Countermeasure sample generation method, device, electronic equipment and storage medium
CN111046176B (en) * 2019-11-25 2023-04-07 百度在线网络技术(北京)有限公司 Countermeasure sample generation method and device, electronic equipment and storage medium
CN111241287A (en) * 2020-01-16 2020-06-05 支付宝(杭州)信息技术有限公司 Training method and device for generating generation model of confrontation text
CN111292766A (en) * 2020-02-07 2020-06-16 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for generating speech samples
CN111292766B (en) * 2020-02-07 2023-08-08 抖音视界有限公司 Method, apparatus, electronic device and medium for generating voice samples
CN113449097A (en) * 2020-03-24 2021-09-28 百度在线网络技术(北京)有限公司 Method and device for generating countermeasure sample, electronic equipment and storage medium
CN111783998A (en) * 2020-06-30 2020-10-16 百度在线网络技术(北京)有限公司 Illegal account recognition model training method and device and electronic equipment
CN111783451A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Method and apparatus for enhancing text samples
CN111783998B (en) * 2020-06-30 2023-08-11 百度在线网络技术(北京)有限公司 Training method and device for illegal account identification model and electronic equipment
CN112364641A (en) * 2020-11-12 2021-02-12 北京中科闻歌科技股份有限公司 Chinese countermeasure sample generation method and device for text audit
CN112347699A (en) * 2020-11-24 2021-02-09 北京圣涛平试验工程技术研究院有限责任公司 Multi-agent antagonistic neural network training method and device
CN112380845A (en) * 2021-01-15 2021-02-19 鹏城实验室 Sentence noise design method, equipment and computer storage medium
CN113642678A (en) * 2021-10-12 2021-11-12 南京山猫齐动信息技术有限公司 Method, device and storage medium for generating confrontation message sample
CN113642678B (en) * 2021-10-12 2022-01-07 南京山猫齐动信息技术有限公司 Method, device and storage medium for generating confrontation message sample

Similar Documents

Publication Publication Date Title
CN110378474A (en) Fight sample generating method, device, electronic equipment and computer-readable medium
CN109902186A (en) Method and apparatus for generating neural network
CN110688528B (en) Method, apparatus, electronic device, and medium for generating classification information of video
CN109981787B (en) Method and device for displaying information
JP2021096813A (en) Method and apparatus for processing data
CN108280104A (en) The characteristics information extraction method and device of target object
JP7354463B2 (en) Data protection methods, devices, servers and media
CN109815365A (en) Method and apparatus for handling video
CN109829432A (en) Method and apparatus for generating information
WO2020182123A1 (en) Method and device for pushing statement
CN111625645B (en) Training method and device for text generation model and electronic equipment
CN108182472A (en) For generating the method and apparatus of information
CN110413742A (en) Duplicate checking method, apparatus, equipment and the storage medium of biographic information
CN109495552A (en) Method and apparatus for updating clicking rate prediction model
CN109902446A (en) Method and apparatus for generating information prediction model
CN109543068A (en) Method and apparatus for generating the comment information of video
CN110097004B (en) Facial expression recognition method and device
CN115471307A (en) Audit evaluation information generation method and device based on knowledge graph and electronic equipment
US20240118984A1 (en) Prediction method and apparatus for faulty gpu, electronic device and storage medium
CN112381074B (en) Image recognition method and device, electronic equipment and computer readable medium
CN110321705A (en) Method, apparatus for generating the method, apparatus of model and for detecting file
CN109829431A (en) Method and apparatus for generating information
CN112800276A (en) Video cover determination method, device, medium and equipment
CN109657073A (en) Method and apparatus for generating information
CN112464654B (en) Keyword generation method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20191025)