CN111105028B - Training method, training device and sequence prediction method for neural network - Google Patents


Info

Publication number
CN111105028B
CN111105028B
Authority
CN
China
Prior art keywords
sequence
kth
probability distribution
probability
prefix
Prior art date
Legal status
Active
Application number
CN201811258926.5A
Other languages
Chinese (zh)
Other versions
CN111105028A (en)
Inventor
白帆
程战战
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811258926.5A
Publication of CN111105028A
Application granted
Publication of CN111105028B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention discloses a training method and apparatus for a neural network, and a sequence prediction method, and belongs to the technical field of artificial intelligence. The method comprises the following steps: after n sample data are input into a neural network to be trained, n probability distribution column sequences output by the neural network to be trained are obtained, wherein the kth probability distribution column sequence comprises m probability distribution columns, the gth probability distribution column in the kth probability distribution column sequence is the probability distribution over the classes of the gth unit datum of the kth sample datum, n, k, m and g are integers, 1 ≤ k ≤ n, and 1 ≤ g ≤ m; the editing probability from the kth probability distribution column sequence to the kth calibration sequence is determined, the editing probability being the probability that the kth probability distribution column sequence generates the kth calibration sequence through editing operations; and the neural network to be trained is optimized based on the editing probability from each of the n probability distribution column sequences to its corresponding calibration sequence.

Description

Training method, training device and sequence prediction method for neural network
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a training method and device of a neural network and a sequence prediction method.
Background
With their development, neural networks have come into wide use. Some neural networks output a sequence: for example, a neural network applied to speech recognition takes an audio segment as input and outputs a sequence predicting the transcription (e.g., a character sequence) corresponding to the audio. The training process of a neural network whose input and output are both sequences, also known as sequence-to-sequence learning (Sequence to Sequence Learning), generally includes the following steps. First, sample data, such as an audio sample X = [x1, x2, ..., xt], is input to the neural network to be trained, where the true transcription of the audio sample X is assumed to be the character sequence Y = [y1, y2, ..., yu]. Second, the neural network to be trained recognizes, converts and classifies X, and outputs a predicted probability distribution column sequence Y' = [Z1, Z2, ...], where Z1 is the 1st probability distribution column, i.e., the probability distribution over the classes of the first predicted character y1'; Z2 is the 2nd probability distribution column, i.e., the probability distribution over the classes of the second predicted character y2'; and so on. Then, the deviation from Y' to Y is calculated. Finally, the neural network to be trained is optimized based on the deviation from Y' to Y.
Currently, the deviation from Y' to Y is calculated using cross entropy (Cross-Entropy). The principle of cross entropy is as follows: each probability distribution column Z in Y' is compared one-to-one with the element y at the same position in Y, the deviation from each probability distribution column Z to its element y is calculated, and the calculated deviations are summed to obtain the deviation from Y' to Y.
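As a minimal sketch of this position-aligned comparison (the representation of a probability distribution column as a dict mapping class to probability is an assumption for illustration, not part of the patent):

```python
import math

def cross_entropy_deviation(dist_seq, label_seq):
    """Compare each probability distribution column Z with the element y at
    the same position and sum the per-position deviations -log Z[y].
    Requires len(dist_seq) == len(label_seq): the alignment premise that
    fails once columns are inserted or deleted."""
    assert len(dist_seq) == len(label_seq)
    return sum(-math.log(Z[y]) for Z, y in zip(dist_seq, label_seq))
```

Note the hard length assumption: the function has no way to express that a column was inserted or deleted, which is exactly the failure mode discussed next.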
In carrying out the invention, the inventors found that the prior art has at least the following problem: cross entropy is calculated on the premise that each probability distribution column Z is aligned with the corresponding element y of the calibration sequence. Once a probability distribution column Z is added to or removed from the output probability distribution column sequence relative to the elements of the calibration sequence, i.e., the number of columns Z changes and may become greater or smaller than the number u of elements in the calibration sequence, the correspondence between the columns Z and the elements y is misaligned when calculating the deviation from Y' to Y. For example, suppose the column corresponding to y3 is missing, so that the actual element corresponding to the column Z3 is y4; cross entropy nevertheless compares Z3 with y3 and calculates the deviation from Z3 to y3. Because Z3 and y3 are misaligned, the probability distribution columns after Z3 and the elements after y3 are also correspondingly misaligned, so the calculated deviation is wrong. Optimizing based on this wrong deviation leads to wrong optimization, which seriously affects training efficiency and reduces the accuracy of the trained neural network.
Disclosure of Invention
The embodiments of the invention provide a training method and a training device for a neural network, and a sequence prediction method, which can improve the training efficiency of the neural network and the accuracy of the trained neural network. The technical solution is as follows:
in one aspect, a method for training a neural network is provided, the method comprising:
after n sample data are input into a neural network to be trained, n probability distribution column sequences output by the neural network to be trained are obtained, wherein the kth probability distribution column sequence among the n probability distribution column sequences comprises m probability distribution columns, the gth probability distribution column in the kth probability distribution column sequence is the probability distribution over the classes of the gth unit datum of the kth sample datum among the n sample data, n, k, m and g are integers, 1 ≤ k ≤ n, and 1 ≤ g ≤ m;
determining the editing probability from the kth probability distribution column sequence to the kth calibration sequence, wherein the editing probability from the kth probability distribution column sequence to the kth calibration sequence is the probability that the kth probability distribution column sequence generates the kth calibration sequence through editing operations, and the kth calibration sequence is the calibration sequence of the kth sample datum;
and optimizing the neural network to be trained based on the editing probability from each of the n probability distribution column sequences to its corresponding calibration sequence.
Optionally, the determining the editing probability from the kth probability distribution column sequence to the kth calibration sequence includes:
respectively calculating the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence, wherein the jth prefix of the kth probability distribution column sequence is the distribution sequence formed from the 0th probability distribution column of the kth probability distribution column sequence to the jth probability distribution column, the ith prefix of the kth calibration sequence is the sequence formed from the 0th element of the kth calibration sequence to the ith element, s is the number of elements in the kth calibration sequence, j and i are natural numbers, 0 ≤ j ≤ m-1, and 0 ≤ i ≤ s-1; the 0th probability distribution column of the kth probability distribution column sequence is the probability distribution column of the class of an empty unit datum, and the 0th element of the kth calibration sequence is an empty element;
calculating the editing probability from the kth probability distribution column sequence to the kth calibration sequence according to the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
Optionally, the calculating the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence includes:
when j = 0 and i = 0, the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is equal to 1;
when 0 < j ≤ m-1 and i = 0, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
when j = 0 and 0 < i ≤ s-1, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence;
when 1 ≤ j ≤ m-1 and 1 ≤ i ≤ s-1, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1)th prefix to the (i-1)th prefix, the editing probability from the jth prefix to the (i-1)th prefix, and the editing probability from the (j-1)th prefix to the ith prefix.
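The four cases above amount to an edit-distance-style dynamic program over prefixes. The sketch below is one illustrative Python reading, not the patent's specified implementation: the fixed operation probabilities p_keep, p_ins and p_del, and the uniform class distribution q_ins used for insertions, are assumptions (the claims leave their source open), and the table is simply run out to the full lengths m and s so that the full-sequence boundary cases are folded into the same recurrence.

```python
# Illustrative sketch only: p_keep/p_ins/p_del and the uniform insertion
# distribution q_ins are assumed, not taken from the patent text.

def edit_probability(dist_seq, label_seq, p_keep=0.8, p_ins=0.1, p_del=0.1):
    """dist_seq: probability distribution columns Z_1..Z_m, each a dict
    mapping class -> probability; label_seq: calibration elements y_1..y_s.
    Returns the editing probability from dist_seq to label_seq."""
    m, s = len(dist_seq), len(label_seq)
    q_ins = 1.0 / len(dist_seq[0])      # assumed: inserted class drawn uniformly
    # E[j][i]: editing probability from the jth prefix of dist_seq to the
    # ith prefix of label_seq (the 0th prefixes are empty).
    E = [[0.0] * (s + 1) for _ in range(m + 1)]
    E[0][0] = 1.0                       # case j = 0, i = 0
    for j in range(1, m + 1):           # case i = 0: delete every column
        E[j][0] = p_del * E[j - 1][0]
    for i in range(1, s + 1):           # case j = 0: insert every element
        E[0][i] = p_ins * q_ins * E[0][i - 1]
    for j in range(1, m + 1):           # general case: keep + insert + delete
        for i in range(1, s + 1):
            keep = p_keep * dist_seq[j - 1].get(label_seq[i - 1], 0.0) * E[j - 1][i - 1]
            insert = p_ins * q_ins * E[j][i - 1]
            delete = p_del * E[j - 1][i]
            E[j][i] = keep + insert + delete
    return E[m][s]
```

Because every cell sums over retention, insertion and deletion paths, a column added or dropped by the network only lowers the probability of one path instead of misaligning every later position, which is the point of the method.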
Optionally, the calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence includes:
acquiring the probability of a deletion operation occurring on the jth probability distribution column in the kth probability distribution column sequence, wherein the deletion operation on the jth probability distribution column deletes the jth probability distribution column;
and calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the deletion operation occurring on the jth probability distribution column in the kth probability distribution column sequence and the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Optionally, the calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence includes:
acquiring the probability of an insertion operation occurring on the jth probability distribution column in the kth probability distribution column sequence, wherein the insertion operation on the jth probability distribution column inserts a probability distribution column between the (j-1)th probability distribution column and the jth probability distribution column in the kth probability distribution column sequence;
acquiring the probability of inserting the class corresponding to the ith element of the kth calibration sequence when the insertion operation occurs on the jth probability distribution column;
and determining the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the insertion operation occurring on the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the class corresponding to the ith element of the kth calibration sequence when the insertion operation occurs on the jth probability distribution column, and the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence.
Optionally, the calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1)th prefix to the (i-1)th prefix, the editing probability from the jth prefix to the (i-1)th prefix, and the editing probability from the (j-1)th prefix to the ith prefix includes:
calculating the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence;
calculating the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence;
calculating the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
and determining the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the retention probability, the insertion probability and the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
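In dynamic-programming notation (the symbols here are assumptions, not the patent's: E_k(j, i) denotes the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence, p_keep(j), p_ins(j) and p_del(j) the operation probabilities for the jth column, Z_j(y_i) the probability of the class of y_i under the jth column, and q_j(y_i) the probability of inserting that class), the three contributions combine as:

```latex
E_k(j,i) =
  \underbrace{p_{\mathrm{keep}}(j)\,Z_j(y_i)\,E_k(j-1,i-1)}_{\text{retention probability}}
+ \underbrace{p_{\mathrm{ins}}(j)\,q_j(y_i)\,E_k(j,i-1)}_{\text{insertion probability}}
+ \underbrace{p_{\mathrm{del}}(j)\,E_k(j-1,i)}_{\text{deletion probability}}
```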
Optionally, the calculating the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence includes:
acquiring the probability of a retention operation occurring on the jth probability distribution column in the kth probability distribution column sequence, wherein the retention operation on the jth probability distribution column leaves the jth probability distribution column unedited;
acquiring the probability, in the jth probability distribution column, of the class corresponding to the ith element of the kth calibration sequence;
and determining the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the retention operation occurring on the jth probability distribution column in the kth probability distribution column sequence, the probability in the jth probability distribution column of the class corresponding to the ith element of the kth calibration sequence, and the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence.
Optionally, the calculating the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence includes:
acquiring the probability of the insertion operation occurring on the jth probability distribution column in the kth probability distribution column sequence;
acquiring the probability of inserting the class corresponding to the ith element of the kth calibration sequence when the insertion operation occurs on the jth probability distribution column;
and determining the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the insertion operation occurring on the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the class corresponding to the ith element of the kth calibration sequence when the insertion operation occurs, and the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence.
Optionally, the calculating the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence includes:
acquiring the probability of the deletion operation occurring on the jth probability distribution column in the kth probability distribution column sequence;
and determining the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the deletion operation occurring on the jth probability distribution column in the kth probability distribution column sequence and the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Optionally, the calculating the editing probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence includes:
when i = 0, calculating the editing probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
when 0 < i ≤ s-1, calculating the editing probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence, and the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Optionally, the calculating the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence includes:
when j = 0, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence;
when 0 < j ≤ m-1, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence based on the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the editing probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
Optionally, the optimizing the neural network to be trained based on the editing probability from each of the n probability distribution column sequences to its corresponding calibration sequence includes:
calculating the sum of the deviations between each probability distribution column sequence and its corresponding calibration sequence based on the editing probability from each of the n probability distribution column sequences to its corresponding calibration sequence;
and updating the weight parameters of each layer of the neural network to be trained based on the sum of the deviations between each probability distribution column sequence and its corresponding calibration sequence.
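One concrete reading of the deviation sum, under the assumption that each sample's deviation is taken as the negative log of its editing probability (the text only states that a deviation is derived from each editing probability and summed, so this choice is illustrative):

```python
import math

def batch_deviation(edit_probs):
    """Sum of per-sample deviations over the n probability distribution
    column sequences; edit_probs[k] is the editing probability from the
    (k+1)th sequence to its calibration sequence. Deviation per sample is
    assumed to be -log(editing probability)."""
    return sum(-math.log(p) for p in edit_probs)
```

The weight update itself would then be an ordinary gradient step on this scalar, which is why the editing probability must be computed from differentiable quantities.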
In a second aspect, there is provided a sequence prediction method, the method comprising:
inputting data to be predicted into a neural network, wherein the neural network is trained using the above training method for a neural network;
acquiring the predicted probability distribution column sequence output by the neural network;
and determining the predicted sequence corresponding to the data to be predicted based on the predicted probability distribution column sequence output by the neural network.
Optionally, the determining the predicted sequence corresponding to the data to be predicted based on the predicted probability distribution column sequence output by the neural network includes:
determining the probability that the predicted probability distribution column sequence generates each sequence in a sequence library through editing operations;
and determining the predicted sequence corresponding to the data to be predicted based on the probabilities that the predicted probability distribution column sequence generates each sequence in the sequence library through editing operations, wherein the predicted sequence corresponding to the data to be predicted is the sequence with the maximum probability among those probabilities.
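The selection step can be sketched as an argmax over the sequence library. The scoring function is passed in as a parameter standing for the editing-probability computation; its exact signature here is an assumption:

```python
def predict_sequence(dist_seq, sequence_library, edit_probability):
    """Return the library sequence that the predicted probability
    distribution column sequence is most likely to edit into, i.e. the
    one maximizing edit_probability(dist_seq, seq)."""
    return max(sequence_library, key=lambda seq: edit_probability(dist_seq, seq))
```

Note the cost grows linearly with the library size, since the editing probability is evaluated once per candidate sequence.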
In a third aspect, there is provided a training apparatus for a neural network, the apparatus comprising:
the acquisition module is used for acquiring, after n sample data are input into the neural network to be trained, n probability distribution column sequences output by the neural network to be trained, wherein the kth probability distribution column sequence among the n probability distribution column sequences comprises m probability distribution columns, the gth probability distribution column in the kth probability distribution column sequence is the probability distribution over the classes of the gth unit datum of the kth sample datum among the n sample data, n, k, m and g are integers, 1 ≤ k ≤ n, and 1 ≤ g ≤ m;
the determining module is used for determining the editing probability from the kth probability distribution column sequence to the kth calibration sequence, wherein the editing probability from the kth probability distribution column sequence to the kth calibration sequence is the probability that the kth probability distribution column sequence generates the kth calibration sequence through editing operations, and the kth calibration sequence is the calibration sequence of the kth sample datum;
and the optimization module is used for optimizing the neural network to be trained based on the editing probability from each of the n probability distribution column sequences to its corresponding calibration sequence.
Optionally, the determining module is configured to,
respectively calculating the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence, wherein the jth prefix of the kth probability distribution column sequence is the distribution sequence formed from the 0th probability distribution column of the kth probability distribution column sequence to the jth probability distribution column, the ith prefix of the kth calibration sequence is the sequence formed from the 0th element of the kth calibration sequence to the ith element, s is the number of elements in the kth calibration sequence, j and i are natural numbers, 0 ≤ j ≤ m-1, and 0 ≤ i ≤ s-1; the 0th probability distribution column of the kth probability distribution column sequence is the probability distribution column of the class of an empty unit datum, and the 0th element of the kth calibration sequence is an empty element;
calculating the editing probability from the kth probability distribution column sequence to the kth calibration sequence according to the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the editing probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
Optionally, the determining module is configured to,
when j = 0 and i = 0, the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is equal to 1;
when 0 < j ≤ m-1 and i = 0, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
when j = 0 and 0 < i ≤ s-1, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence;
when 1 ≤ j ≤ m-1 and 1 ≤ i ≤ s-1, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1)th prefix to the (i-1)th prefix, the editing probability from the jth prefix to the (i-1)th prefix, and the editing probability from the (j-1)th prefix to the ith prefix.
Optionally, the determining module is configured to,
acquiring the probability of a deletion operation occurring on the jth probability distribution column in the kth probability distribution column sequence, wherein the deletion operation on the jth probability distribution column deletes the jth probability distribution column;
and calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the deletion operation occurring on the jth probability distribution column in the kth probability distribution column sequence and the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Optionally, the determining module is configured to,
acquiring the probability of an insertion operation occurring on the jth probability distribution column in the kth probability distribution column sequence, wherein the insertion operation on the jth probability distribution column inserts a probability distribution column between the (j-1)th probability distribution column and the jth probability distribution column in the kth probability distribution column sequence;
acquiring the probability of inserting the class corresponding to the ith element of the kth calibration sequence when the insertion operation occurs on the jth probability distribution column;
and determining the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the insertion operation occurring on the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the class corresponding to the ith element of the kth calibration sequence when the insertion operation occurs on the jth probability distribution column, and the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence.
Optionally, the determining module is configured to,
calculating the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence;
calculating the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence;
calculating the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
and determining the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the retention probability, the insertion probability and the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Optionally, the determining module is configured to,
acquiring the probability that a retention operation occurs for the j-th probability distribution column in the k-th probability distribution column sequence, wherein the retention operation of the j-th probability distribution column means that no editing is performed on the j-th probability distribution column;
acquiring the probability of a category corresponding to the ith element in the kth calibration sequence in the jth probability distribution column;
and determining the retention probability from the j-th prefix of the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence based on the probability that the retention operation occurs for the j-th probability distribution column in the k-th probability distribution column sequence, the probability of the category corresponding to the i-th element in the k-th calibration sequence in the j-th probability distribution column, and the editing probability from the (j-1)-th prefix of the k-th probability distribution column sequence to the (i-1)-th prefix of the k-th calibration sequence.
Optionally, the determining module is configured to,
acquiring the probability of the inserting operation of the jth probability distribution column in the kth probability distribution column sequence;
acquiring the probability of inserting the category corresponding to the i-th element in the k-th calibration sequence when the insertion operation occurs at the j-th probability distribution column;
and determining the insertion probability from the j-th prefix of the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence based on the probability that the insertion operation occurs for the j-th probability distribution column in the k-th probability distribution column sequence, the probability of inserting the category corresponding to the i-th element in the k-th calibration sequence when the insertion operation occurs, and the editing probability from the j-th prefix of the k-th probability distribution column sequence to the (i-1)-th prefix of the k-th calibration sequence.
Optionally, the determining module is configured to,
acquiring the probability of deletion operation of a jth probability distribution column in the kth probability distribution column sequence;
determining the deletion probability from the j-th prefix of the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence based on the probability that the deletion operation occurs for the j-th probability distribution column in the k-th probability distribution column sequence and the editing probability from the (j-1)-th prefix of the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence.
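In symbols (our notation, not the patent's), writing D(j, i) for the editing probability from the j-th prefix of the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence, the retention, insertion and deletion probabilities defined above plausibly combine as:

```latex
% r_j, q_j, d_j : retention / insertion / deletion operation probabilities
%                 of the j-th probability distribution column (from R)
% p_j(t_i)      : probability assigned by the j-th column to the category of
%                 the i-th calibration element (from A)
% u_j(t_i)      : probability of inserting that category when an insertion
%                 occurs at the j-th column (from I)
D(j,i) = \underbrace{r_j\, p_j(t_i)\, D(j-1,\,i-1)}_{\text{retention}}
       + \underbrace{q_j\, u_j(t_i)\, D(j,\,i-1)}_{\text{insertion}}
       + \underbrace{d_j\, D(j-1,\,i)}_{\text{deletion}}
```

This is a reading consistent with the three product terms the surrounding clauses describe; the patent itself states each factor only in prose.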
Optionally, the determining module is configured to,
when i=0, calculating the editing probability from the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence based on the editing probability from the (m-1)-th prefix of the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence;
when 0<i≤s-1, calculating the editing probability from the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence based on the editing probability from the (m-1)-th prefix of the k-th probability distribution column sequence to the (i-1)-th prefix of the k-th calibration sequence, the editing probability from the k-th probability distribution column sequence to the (i-1)-th prefix of the k-th calibration sequence, and the editing probability from the (m-1)-th prefix of the k-th probability distribution column sequence to the i-th prefix of the k-th calibration sequence.
Optionally, the determining module is configured to,
when j=0, calculating the editing probability from the j-th prefix of the k-th probability distribution column sequence to the k-th calibration sequence based on the editing probability from the j-th prefix of the k-th probability distribution column sequence to the (s-1)-th prefix of the k-th calibration sequence;
when 0<j≤m-1, calculating the editing probability from the j-th prefix of the k-th probability distribution column sequence to the k-th calibration sequence based on the editing probability from the (j-1)-th prefix of the k-th probability distribution column sequence to the (s-1)-th prefix of the k-th calibration sequence, the editing probability from the j-th prefix of the k-th probability distribution column sequence to the (s-1)-th prefix of the k-th calibration sequence, and the editing probability from the (j-1)-th prefix of the k-th probability distribution column sequence to the k-th calibration sequence.
Optionally, the optimizing module is used for,
calculating the deviation sum of each probability distribution column sequence and the calibration sequence corresponding to it, based on the editing probability from each of the n probability distribution column sequences to its corresponding calibration sequence;
and updating the weight parameters of each layer of the neural network to be trained based on the deviation sum of each probability distribution sequence and the calibration sequence corresponding to each probability distribution sequence.
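One plausible realization of the optimizing module's aggregation is to take the deviation of each probability distribution column sequence as the negative log of its editing probability and sum over the n sequences. The sketch below is our assumption for illustration; the patent only states that a deviation sum is formed from the editing probabilities, not which deviation function is used.

```python
import math

# Hypothetical deviation sum: the deviation of each probability distribution
# column sequence is taken as -log of its editing probability, so maximizing
# the editing probabilities is equivalent to minimizing this sum.

def deviation_sum(edit_probs):
    """edit_probs: editing probability of each of the n probability
    distribution column sequences to its corresponding calibration sequence."""
    return sum(-math.log(p) for p in edit_probs)
```

The weight parameters of each layer would then be updated by back-propagating this sum, e.g. with gradient descent.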
In a fourth aspect, there is provided a sequence prediction apparatus, the apparatus comprising:
the input module is used for inputting data to be tested into the neural network, and the neural network is trained by adopting the training method of the neural network;
the acquisition module is used for acquiring a predicted probability distribution column sequence output by the neural network;
and the determining module is used for determining a predicted sequence corresponding to the data to be detected based on the predicted probability distribution sequence output by the neural network.
Optionally, the determining module is configured to,
determining the probability that the predicted probability distribution column sequence generates each sequence in a sequence library through editing operations;
and determining the predicted sequence corresponding to the data to be detected based on the probabilities that the predicted probability distribution column sequence generates each sequence in the sequence library through editing operations, wherein the predicted sequence corresponding to the data to be detected is the sequence corresponding to the maximum probability among the probabilities that the predicted probability distribution column sequence generates each sequence in the sequence library through editing operations.
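A minimal sketch of the determining module just described: score each candidate sequence in the sequence library by the editing probability with which the predicted probability distribution column sequence generates it, and return the highest-scoring one. The scoring function is passed in as a parameter since the patent defines it elsewhere; all names here are ours, not the patent's.

```python
def predict_sequence(pred_columns, library, edit_probability):
    """Return the library sequence with the maximum editing probability of
    being generated from pred_columns via editing operations.
    edit_probability(pred_columns, seq) is assumed to implement the dynamic
    program described in the training method."""
    best_seq, best_p = None, -1.0
    for seq in library:
        p = edit_probability(pred_columns, seq)
        if p > best_p:
            best_seq, best_p = seq, p
    return best_seq
```

Because every library sequence is scored, the cost grows linearly with the size of the sequence library; this fits the patent's setting where predictions are constrained to a known vocabulary.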
In a fifth aspect, a training apparatus for a neural network is provided, the apparatus comprising a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to perform the operations performed by the training method for a neural network described above.
In a sixth aspect, there is provided a sequence prediction apparatus comprising a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to implement the operations performed by the sequence prediction method described above.
In a seventh aspect, a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the operations performed by the aforementioned neural network training method is provided.
In an eighth aspect, a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the operations performed by the aforementioned sequence prediction method is provided.
The technical scheme provided by the embodiment of the invention has the following beneficial effects: after n sample data are input into a neural network to be trained, n probability distribution column sequences output by the neural network to be trained are acquired; the editing probability from the k-th probability distribution column sequence to the k-th calibration sequence is determined, the editing probability being the probability that the k-th probability distribution column sequence generates the calibration sequence of the k-th sample data through editing operations. When the probability distribution columns in the k-th probability distribution column sequence and the elements of the calibration sequence do not correspond one to one and are misplaced, a probability distribution column in the probability distribution column sequence is missing or superfluous. In that situation, a small editing probability indicates that the probability distribution column sequence deviates greatly from the calibration sequence, and a large editing probability indicates that the deviation is small. The editing probability therefore accurately estimates the deviation between the probability distribution column sequence and the calibration sequence regardless of whether the probability distribution columns and the elements of the calibration sequence are misplaced. Optimizing the neural network to be trained based on the editing probability thus avoids the erroneous optimization caused by comparing each probability distribution column included in the probability distribution column sequence one by one with each element included in the calibration sequence; after the neural network to be trained is optimized in this way, both the accuracy of the neural network and the training efficiency can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a system architecture to which a neural network training method provided by an embodiment of the present invention is applicable;
FIG. 2 is a schematic diagram of a sequence of kth probability distribution columns provided by an embodiment of the present invention;
fig. 3 to fig. 7 are flowcharts of a training method of a neural network according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the calculation principle of editing probability provided by the embodiment of the invention;
fig. 9 and fig. 10 are flowcharts of a sequence prediction method according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a training device for a neural network according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a sequence prediction apparatus according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of a training device for a neural network according to an embodiment of the present invention;
Fig. 14 is a schematic structural diagram of a sequence prediction apparatus according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
In order to facilitate understanding of the technical solution provided by the embodiments of the present invention, some terms involved in the embodiments are first explained.
A data sequence is a collection of one or more elements. When there are a plurality of elements, there is a certain arrangement order among the plurality of elements. The elements are numerals or letters, etc.
The calibration sequence corresponds to the sample data and is a data sequence containing elements corresponding to the true categories of the unit data in the sample data.
The probability distribution column sequence is output after the neural network identifies and converts the input data. During recognition and conversion, the neural network firstly divides input data into a plurality of unit data, then recognizes and converts each unit data, and finally obtains a probability distribution column sequence. The sequence of probability distribution columns includes a number of probability distribution columns. The probability distribution columns are in one-to-one correspondence with the unit data, one unit data corresponds to one probability distribution column, and the probability distribution columns corresponding to different unit data are different. The probability distribution column includes probability distributions of categories of the respective unit data. By means of the sequence of probability distribution columns, the predicted sequence corresponding to the input data can be estimated. The predicted sequence may be a data sequence.
An editing operation refers to an operation applied to a probability distribution column. Editing operations include an insertion operation, a deletion operation, and a retention operation. The insertion operation refers to inserting one probability distribution column before another probability distribution column. The deletion operation refers to deleting a probability distribution column. The retention operation means that no editing is performed on a probability distribution column.
The editing probability is the probability that a probability distribution column sequence generates a calibration sequence through editing operations.
For the terms explained above, reference may also be made to the following description of the system architecture, the probability distribution column sequence, and the training method of the neural network.
The following describes, with reference to fig. 1, the system architecture to which the neural network training method provided by the embodiment of the present invention is applicable. Referring to fig. 1, during the training phase the system architecture can be divided into 3 parts: a neural network section 10, an edit probability calculation section Ep(T, A) 13, and a back propagation section 14. The neural network section 10 includes a first model f(x) 11 and a second model H(f(x)) 12. f(x) 11 may be constituted by an encoder and a decoder. The encoder is used for extracting features of the input data X to obtain a feature representation, and the decoder is used for decoding the feature representation. H(f(x)) 12 may be composed of three softmax (flexible maximum activation function) layers; a softmax layer is a classifier, or normalization layer. The three softmax layers output A, I and R, respectively. A is the probability distribution column sequence obtained by the neural network section 10 predicting the input data X. The probability distribution column sequence A includes the probability distributions of the categories of the respective unit data in the input data X. The predicted sequence is estimated based on the probability distribution column sequence A output by the neural network. R includes an insertion operation probability distribution column R(1), a deletion operation probability distribution column R(2), and a retention operation probability distribution column R(3). R(1) includes the probability of an insertion operation occurring for each probability distribution column in A. R(2) includes the probability of a deletion operation occurring for each probability distribution column in A. R(3) includes the probability of a retention operation occurring for each probability distribution column in A. I is a class insertion probability distribution column.
I includes the probabilities of inserting each possible category when an insertion operation occurs for each probability distribution column in the probability distribution column sequence A. The possible categories may be all the categories of the classifier of the softmax layer that outputs A. Ep(T, A) 13 is used to calculate, from A, I, R and the calibration sequence T of the sample data X, the edit probability of converting A into T by editing operations. The back propagation section 14 determines a loss function J of the neural network based on the edit probability calculated by Ep(T, A) 13, and performs back propagation based on J to realize optimization of the neural network.
The present embodiment is not limited to the structure of f(x) 11. As an example, the encoder may be a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network). The decoder may be an RNN or a neural network based on the Attention mechanism. The RNN may be an LSTM (Long Short-Term Memory) or a GRU (Gated Recurrent Unit). It should be noted that the back propagation section 14 is not necessarily a separate structure; the back propagation section 14 may belong to the neural network section 10. The neural network section 10 in this embodiment may be any neural network that outputs a sequence, here a sequence of probability distribution columns. The predicted sequence corresponding to the input data X may be estimated based on the sequence output by the neural network section 10, and the predicted sequence may be a letter sequence, a character sequence, or a speech sequence.
In training the neural network, training is generally performed using a training set. The training set includes sample data and a calibration sequence of the sample data. When the sample data are different, the lengths of the calibration sequences of the sample data can be the same or different, and the elements contained in the calibration sequences of the sample data can be different. Each calibration sequence contains elements belonging to a set of elements (e.g., an alphabet or a chinese dictionary). Wherein, there is mapping relation between the class of classifier and element. The elements must have corresponding categories, but the categories do not necessarily have corresponding elements. When the input data is picture or voice data, the elements may be letters in an alphabet or characters in a dictionary, and the categories may be specific numbers. The method provided by the embodiment is suitable for training the neural network in the following fields: OCR (Optical Character Recognition ), speech recognition, natural language synthesis, and machine translation.
FIG. 2 shows the k-th probability distribution column sequence A_k of the n probability distribution column sequences output from the neural network section 10. The training of the neural network is illustratively directed to OCR, based on which it is assumed that the k-th input data X_k is a picture. The encoder performs feature extraction on the picture to obtain the feature data of X_k. To facilitate understanding of the probability distribution column sequence, the feature data of X_k is divided into a plurality of unit data, the number of the unit data being the same as the number of probability distribution columns contained in the probability distribution column sequence output by the neural network. Referring to fig. 2, A_k comprises 6 probability distribution columns, i.e. the number of probability distribution columns m=6: the 1st probability distribution column through the 6th probability distribution column. The 1st probability distribution column is the probability distribution of the category of the 1st unit data in the k-th sample data. The category of the 1st unit data can be any one of 5 categories, the elements corresponding to the 5 categories being, in turn, D, O, V, E and #; # is a special element which indicates the end of a calibration sequence. In the training process, the last element of each calibration sequence is #. In the 1st probability distribution column, the probability of the category corresponding to the element D is the largest (in fig. 2, the larger the probability of a category, the darker the color of the square where the element corresponding to the category is located), the probability of the category corresponding to the element E is the next largest, and the probability of the category corresponding to the element O is the smallest. Since the probability of the category to which the element D corresponds is the greatest, the 1st element of the k-th predicted sequence may be predicted as the element D.
The 2nd probability distribution column is the probability distribution of the category of the 2nd unit data in the k-th sample data. The category of the 2nd unit data in the k-th sample data may be any one of the 5 categories. In the 2nd probability distribution column, the probability of the category corresponding to the element V is the largest, and the probabilities of the categories corresponding to the element D and the element # are the smallest. Since the probability of the category to which the element V corresponds is the greatest, the 2nd element of the k-th predicted sequence may be predicted as the element V. By analogy, the 3rd to 6th probability distribution columns of A_k may be predicted as the element E, the element #, the element # and the element #, respectively. Through A_k it is then possible to obtain the predicted sequence "DVE#" (the 3 terminators may be considered 1 terminator in FIG. 2). When the calibration sequence of the k-th sample data is "DOVE#", the 2nd element to the 4th element of the predicted sequence are actually aligned with the 3rd element to the 5th element of the calibration sequence of the k-th sample data, which is equivalent to A_k missing one probability distribution column, namely the probability distribution column corresponding to the 2nd element of the calibration sequence.
Note that fig. 2 does not show the 0-th probability distribution column of A_k. The 0-th probability distribution column of the probability distribution column sequence A_k is the probability distribution column of the category of the null unit data. The 0-th probability distribution column of a probability distribution column sequence is only used for calculating the editing probability and is not actually output, i.e., it is not in the probability distribution column sequence actually output. Likewise, the 0-th element of a calibration sequence is not in the calibration sequence; the 0-th element of the k-th calibration sequence is a null element.
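The greedy read-out illustrated by the Fig. 2 example — taking the highest-probability category of each probability distribution column and collapsing repeated terminators — can be sketched as follows. This is our simplification for illustration; the patent's own prediction method, based on editing probabilities over a sequence library, is described later.

```python
# Each probability distribution column is modelled as a dict mapping an
# element to the probability of its category, e.g. {'D': 0.5, 'O': 0.05, ...}.

def greedy_decode(columns):
    out = []
    for col in columns:
        best = max(col, key=col.get)      # element with the largest probability
        if best == '#' and out and out[-1] == '#':
            continue                      # treat consecutive terminators as one
        out.append(best)
    return ''.join(out)
```

With columns shaped like Fig. 2 (D, V, E, then three terminators dominant), this yields "DVE#", which — compared against the calibration sequence "DOVE#" — exhibits exactly the missing-column misalignment the editing probability is designed to handle.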
Fig. 3 shows a training method of a neural network according to an embodiment of the present invention, which is suitable for training any neural network with a sequence as an output. Referring to fig. 3, the method flow includes the following steps.
Step 101, after n sample data are input to a neural network to be trained, n probability distribution column sequences output by the neural network to be trained are obtained.
The k-th probability distribution column sequence in the n probability distribution column sequences comprises m probability distribution columns. The g-th probability distribution column in the k-th probability distribution column sequence is the probability distribution of the category of the g-th unit data of the k-th sample data. n, k, m and g are integers, 1≤k≤n and 1≤g≤m.
For a neural network that takes a sequence as its output, the final result is a probability distribution column sequence, and the predicted sequence of the sample data is obtained through the probability distribution column sequence.
Illustratively, each of the n probability distribution sequences includes m probability distribution sequences. m is greater than the number of elements contained in the calibration sequence for each sample data.
Step 102, determining the editing probability from the kth probability distribution sequence to the kth calibration sequence.
The editing probability from the k-th probability distribution column sequence to the k-th calibration sequence is the probability that the k-th probability distribution column sequence generates the k-th calibration sequence through editing operations. The k-th calibration sequence is the calibration sequence of the k-th sample data, i.e. a sequence containing the elements corresponding to the true category of each unit data in the k-th sample data. For example, the k-th sample data may be a commodity package picture including the trademark "DOVE", the unit data in the k-th sample data being a picture containing "D", a picture containing "O", a picture containing "V" and a picture containing "E", respectively. The calibration sequence of the k-th sample data is then "DOVE". The true category of the picture containing "D" corresponds to the element "D" in the calibration sequence, the true category of the picture containing "O" corresponds to the element "O", the true category of the picture containing "V" corresponds to the element "V", and the true category of the picture containing "E" corresponds to the element "E".
And 103, optimizing the neural network to be trained based on the editing probability from each probability distribution sequence to the corresponding calibration sequence in the n probability distribution sequences.
According to the embodiment of the invention, after n sample data are input into the neural network to be trained, n probability distribution column sequences output by the neural network to be trained are acquired; the editing probability from the k-th probability distribution column sequence to the k-th calibration sequence is determined, the editing probability being the probability that the k-th probability distribution column sequence generates the calibration sequence of the k-th sample data through editing operations. When the probability distribution columns in the k-th probability distribution column sequence and the elements of the calibration sequence do not correspond one to one and are misplaced, a small editing probability indicates that the probability distribution column sequence deviates greatly from the calibration sequence, and a large editing probability indicates that the deviation is small. The editing probability can therefore accurately estimate the deviation between the probability distribution column sequence and the calibration sequence regardless of whether the probability distribution columns and the elements of the calibration sequence are misplaced. Optimizing the neural network to be trained based on the editing probability thus avoids the erroneous optimization caused by comparing each probability distribution column included in the probability distribution column sequence one by one with each element included in the calibration sequence, and can improve the training efficiency and the accuracy of the neural network to be trained.
Fig. 4 shows a training method of a neural network according to an embodiment of the present invention. Referring to fig. 4, the method flow includes the following steps.
Step 201, selecting n sample data from the training set database, and inputting the n sample data to the neural network to be trained.
Wherein n is greater than 1. As an alternative embodiment, n may be equal to or greater than 32.
The training set database includes a plurality of training sets, each training set including sample data and the calibration sequence of the sample data. The training set database is a pre-established database. The types of the sample data and the calibration sequence are determined by the function of the neural network to be trained. For example, when the neural network to be trained is used for text recognition of natural scene pictures, the sample data are natural scene pictures and the calibration sequences are texts. When the neural network to be trained is used for speech recognition, the sample data are speech fragments and the calibration sequences are texts. Training sets may be obtained by manual generation or from publicly available training sets. Illustratively, the n sample data may be selected randomly from the training set database, and may be input to the neural network to be trained simultaneously.
Referring to fig. 1, the first softmax layer in h (f (x)) 12 outputs a sequence of n probability distribution columns.
Step 202, obtaining n probability distribution column sequences output by a neural network to be trained.
Wherein the k-th probability distribution column sequence is used to generate the predicted sequence of the k-th sample data. The k-th probability distribution column sequence includes m probability distribution columns. The g-th probability distribution column in the k-th probability distribution column sequence is the probability distribution of the category of the g-th unit data of the k-th sample data. n, k, m and g are integers, 1≤k≤n and 1≤g≤m.
Step 203, calculating the editing probability from the (m-1)-th prefix of the k-th probability distribution column sequence to the (s-1)-th prefix of the k-th calibration sequence.
The j-th prefix of the k-th probability distribution column sequence is the distribution column sequence formed from the 0-th probability distribution column of the k-th probability distribution column sequence to its j-th probability distribution column. The 0-th probability distribution column of the k-th probability distribution column sequence is the probability distribution column of the category of the null unit data. The i-th prefix of the k-th calibration sequence is the sequence formed from the 0-th element of the k-th calibration sequence to its i-th element. The 0-th element of the k-th calibration sequence is a null element. j and i are natural numbers, 0≤j≤m-1 and 0≤i≤s-1.
When j=m-1, the m-1 th prefix of the kth probability distribution column sequence is a distribution column sequence constituted from the 0 th probability distribution column of the kth probability distribution column sequence to the m-1 th probability distribution column of the kth probability distribution column sequence. When i=s-1, the s-1 th prefix of the kth calibration sequence is a sequence consisting of from the 0 th element of the kth calibration sequence to the s-1 th element of the kth calibration sequence. The 0 th probability distribution column of the kth probability distribution column sequence is the probability distribution column of the class of the null unit data, and the 0 th element of the kth calibration sequence is the null element.
Step 203 may include the following four scenarios.
First case: when j = 0 and i = 0, the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is equal to 1. The first case is satisfied when m = 1 and s = 1, in which case the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the (s-1)-th prefix of the kth calibration sequence is 1.
Second case: when 0 < j ≤ m-1 and i = 0, the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated according to the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence. The second case is satisfied when m > 1 and s = 1.
Third case: when j = 0 and 0 < i ≤ s-1, the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated according to the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence. The third case is satisfied when m = 1 and s > 1.
Fourth case: when 1 ≤ j ≤ m-1 and 1 ≤ i ≤ s-1, the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated according to the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence, the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence, and the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence. The fourth case is satisfied when m > 1 and s > 1.
In the second case, since the ith prefix (i = 0) of the kth calibration sequence contains only the null element, the null element can only be produced by deleting every probability distribution column in the jth prefix of the kth probability distribution column sequence; the insertion operation and the retention operation are not applicable, i.e., the insertion operation and the retention operation are invalid in the second case. Illustratively, the calculation method in the second case includes the following steps.
And 2-1-1, acquiring the probability of deletion operation of the jth probability distribution column in the kth probability distribution column sequence.
A deletion operation on the jth probability distribution column removes the jth probability distribution column from the sequence.
Referring to fig. 1, the second softmax layer output R in h(f(x)) 12 includes n deletion-operation probability distribution columns R(2). The kth deletion-operation probability distribution column includes the probabilities that the deletion operation occurs in each probability distribution column in the kth probability distribution column sequence. The probability of the deletion operation occurring in the jth probability distribution column in the kth probability distribution column sequence can therefore be read from the kth deletion-operation probability distribution column.
And 2-1-2, calculating the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability of the deletion operation occurring in the jth probability distribution column in the kth probability distribution column sequence and the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
The edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence can be calculated according to formula (1):

ep(T_{1:i}, A_{1:j}) = ep(T_{1:i}, A_{1:j-1}) × p_D(j)    (1)

In formula (1), T_{1:i} represents the ith prefix of the kth calibration sequence; A_{1:j-1} represents the (j-1)-th prefix of the kth probability distribution column sequence; ep(T_{1:i}, A_{1:j-1}) represents the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; and p_D(j) represents the probability, obtained from R(2), that the deletion operation occurs in the jth probability distribution column in the kth probability distribution column sequence. Here i = 0, so T_{1:i} is the 0th prefix containing only the null element.
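As a minimal sketch of formula (1), the entire i = 0 row of the edit-probability lattice can be built from deletion probabilities alone; the function and argument names below are illustrative assumptions, with p_del holding the deletion-operation probabilities read from R(2).

```python
def row_zero(p_del):
    """ep(T_{1:0}, A_{1:j}) for j = 0..m, per formula (1): reaching the null
    prefix of the calibration sequence requires deleting every column of the
    jth prefix. p_del[j-1] is the deletion probability of the jth column."""
    ep = [1.0]  # ep(T_{1:0}, A_{1:0}) = 1 (the first case)
    for pd in p_del:
        ep.append(ep[-1] * pd)  # multiply in one more deletion
    return ep
```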
In the third case, since the jth prefix (j = 0) of the kth probability distribution column sequence contains only the probability distribution column of null unit data, the ith prefix of the kth calibration sequence can only be produced from it by insertion operations; the deletion operation and the retention operation are not applicable, i.e., the deletion operation and the retention operation are invalid in the third case. Illustratively, the calculation method in the third case includes the following steps.
And step 3-1-1, obtaining the probability of the inserting operation of the jth probability distribution column in the kth probability distribution column sequence.
An insertion operation on the jth probability distribution column inserts a new probability distribution column between the (j-1)-th probability distribution column and the jth probability distribution column.
Referring to fig. 1, the second softmax layer output R in h(f(x)) 12 includes n insertion-operation probability distribution columns R(1). The kth insertion-operation probability distribution column includes the probabilities that the insertion operation occurs in each probability distribution column in the kth probability distribution column sequence. The probability of the insertion operation occurring in the jth probability distribution column can therefore be read from the kth insertion-operation probability distribution column.
And 3-1-2, obtaining the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the jth probability distribution column is subjected to the insertion operation.
Referring to fig. 1, the third softmax layer in h(f(x)) 12 outputs n category-insertion probability distribution columns I, where the kth category-insertion probability distribution column includes, for each probability distribution column in the kth probability distribution column sequence, the probabilities of inserting each category of the first softmax layer when the insertion operation occurs. The probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs in the jth probability distribution column can therefore be read from the kth category-insertion probability distribution column.
And 3-1-3, determining the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the insertion operation occurs in the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs in the jth probability distribution column, and the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence.
The edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence can be calculated according to formula (2):

ep(T_{1:i}, A_{1:j}) = ep(T_{1:i-1}, A_{1:j}) × q(j, t_i) × p_I(j)    (2)

In formula (2), T_{1:i-1} represents the (i-1)-th prefix of the kth calibration sequence; A_{1:j} represents the jth prefix of the kth probability distribution column sequence; ep(T_{1:i-1}, A_{1:j}) represents the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence; q(j, t_i) represents the probability, obtained from I, of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs in the jth probability distribution column in the kth probability distribution column sequence; and p_I(j) represents the probability, obtained from R(1), that the insertion operation occurs in the jth probability distribution column in the kth probability distribution column sequence. Here j = 0, so A_{1:j} is the 0th prefix containing only the null probability distribution column.
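As a minimal sketch of formula (2), the entire j = 0 column of the edit-probability lattice can be built from insertions at the null prefix; the names below are illustrative assumptions.

```python
def column_zero(p_ins_0, p_ins_class_0, target):
    """ep(T_{1:i}, A_{1:0}) for i = 0..s, per formula (2) with j = 0: every
    calibration element must be produced by an insertion at the null prefix.
    p_ins_0: insertion-operation probability at the 0th (null) column;
    p_ins_class_0[c]: probability of inserting class c there;
    target: calibration elements t_1..t_s as class indices."""
    ep = [1.0]  # ep(T_{1:0}, A_{1:0}) = 1 (the first case)
    for t in target:
        ep.append(ep[-1] * p_ins_class_0[t] * p_ins_0)
    return ep
```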
In the fourth case, according to the definition of the edit probability, it is necessary to calculate the probabilities of generating the ith prefix of the kth calibration sequence from the jth prefix of the kth probability distribution column sequence through the retention operation, the insertion operation and the deletion operation, respectively. Illustratively, the fourth case includes the following steps 2031 to 2034.
Step 2031, calculating the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence.
The retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence may be the probability that the probability distribution columns in the jth prefix are aligned one-to-one with the elements in the ith prefix of the kth calibration sequence.
Referring to fig. 5, the present step 2031 may include the following steps 2031a to 2031c.
Step 2031a, obtaining the probability of the occurrence of the retention operation in the j-th probability distribution column in the k-th probability distribution column sequence.
Retaining the jth probability distribution column in the kth probability distribution column sequence means that no editing is performed on the jth probability distribution column; the retained jth probability distribution column is aligned with the corresponding element (here, the ith element) in the kth calibration sequence.
Referring to fig. 1, the second softmax layer output R in h(f(x)) 12 includes n retention-operation probability distribution columns R(3). The kth retention-operation probability distribution column includes the probabilities that the retention operation occurs in each probability distribution column in the kth probability distribution column sequence. The probability of the retention operation occurring in the jth probability distribution column can therefore be read from the kth retention-operation probability distribution column.
Step 2031b, obtaining the probability of the category corresponding to the ith element in the kth calibration sequence in the jth probability distribution column.
The jth probability distribution column is the probability distribution over the categories of the jth unit data of the kth sample data. The probability of the category corresponding to the ith element in the kth calibration sequence can be read from the jth probability distribution column.
Step 2031c, determining the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the retention operation occurs in the jth probability distribution column of the kth probability distribution column sequence, the probability in the jth probability distribution column of the category corresponding to the ith element of the kth calibration sequence, and the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence.
This step 2031c may include: determining the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the following formula (3):

ep_{C,i,j} = ep(T_{1:i-1}, A_{1:j-1}) × p_C(j) × y(j, t_i)    (3)

In formula (3), ep_{C,i,j} represents the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; T_{1:i-1} represents the (i-1)-th prefix of the kth calibration sequence; A_{1:j-1} represents the (j-1)-th prefix of the kth probability distribution column sequence; ep(T_{1:i-1}, A_{1:j-1}) represents the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence; p_C(j) represents the probability, obtained from R(3), that the retention operation occurs in the jth probability distribution column of the kth probability distribution column sequence; and y(j, t_i) represents the probability, in the jth probability distribution column of the kth probability distribution column sequence, of the category corresponding to the ith element in the kth calibration sequence.
Step 2032, calculating the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence.
The insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence may be the probability that the probability distribution columns in the jth prefix of the kth probability distribution column sequence are misaligned with the elements in the ith prefix of the kth calibration sequence because a probability distribution column is missing from the jth prefix.
Referring to fig. 6, step 2032 may include steps 2032 a-2032 c.
Step 2032a, obtaining the probability of the insertion operation of the jth probability distribution column in the kth probability distribution column sequence.
An insertion operation on the jth probability distribution column inserts a new probability distribution column between the (j-1)-th probability distribution column and the jth probability distribution column.
Step 2032a is the same as step 3-1-1 and will not be described here again.
Step 2032b, obtaining the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs in the jth probability distribution column.
Step 2032b is the same as step 3-1-2 and will not be described here again.
Step 2032c, determining the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the insertion operation occurs in the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs in the jth probability distribution column, and the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence.
Step 2032c may include: determining the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the following formula (4):

ep_{I,i,j} = ep(T_{1:i-1}, A_{1:j}) × q(j, t_i) × p_I(j)    (4)

In formula (4), ep_{I,i,j} represents the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; ep(T_{1:i-1}, A_{1:j}) represents the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence; q(j, t_i) represents the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs in the jth probability distribution column in the kth probability distribution column sequence; and p_I(j) represents the probability that the insertion operation occurs in the jth probability distribution column in the kth probability distribution column sequence.
Step 2033, calculating the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
The deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence may be the probability that the probability distribution columns in the jth prefix of the kth probability distribution column sequence are misaligned with the elements in the ith prefix of the kth calibration sequence because an extra probability distribution column appears in the jth prefix.
Referring to fig. 7, step 2033 may include step 2033a and step 2033b.
Step 2033a, obtaining the probability of the deletion operation of the j-th probability distribution column in the k-th probability distribution column sequence.
A deletion operation on the jth probability distribution column removes the jth probability distribution column from the sequence.
Step 2033a is the same as step 2-1-1, and will not be described here again.
Step 2033b, determining the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the deletion operation occurs in the jth probability distribution column of the kth probability distribution column sequence and the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Step 2033b may include: determining the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the following formula (5):

ep_{D,i,j} = ep(T_{1:i}, A_{1:j-1}) × p_D(j)    (5)

In formula (5), ep_{D,i,j} represents the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; ep(T_{1:i}, A_{1:j-1}) represents the edit probability from the (j-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; and p_D(j) represents the probability that the deletion operation occurs in the jth probability distribution column in the kth probability distribution column sequence.
Step 2034, determining an edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the retention probability, the insertion probability, and the deletion probability of the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Specifically, the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated according to the following formula (6):

ep(T_{1:i}, A_{1:j}) = ep_{C,i,j} + ep_{I,i,j} + ep_{D,i,j}    (6)
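One interior cell of the recursion, combining formulas (3) through (6), can be sketched as follows; the function and argument names are illustrative assumptions, not the patent's notation.

```python
def interior_cell(ep_diag, ep_up, ep_left,
                  p_keep_j, p_ins_j, p_del_j,
                  p_class_j_ti, p_ins_class_j_ti):
    """Combine the three edit paths into ep(T_{1:i}, A_{1:j}).
    ep_diag = ep(T_{1:i-1}, A_{1:j-1}), ep_up = ep(T_{1:i-1}, A_{1:j}),
    ep_left = ep(T_{1:i}, A_{1:j-1})."""
    ep_C = ep_diag * p_keep_j * p_class_j_ti    # retention, formula (3)
    ep_I = ep_up * p_ins_class_j_ti * p_ins_j   # insertion, formula (4)
    ep_D = ep_left * p_del_j                    # deletion,  formula (5)
    return ep_C + ep_I + ep_D                   # sum of paths, formula (6)
```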
When the edit probability is calculated, three situations are all taken into account: a probability distribution column in the probability distribution column sequence being aligned with an element in the calibration sequence, a probability distribution column being missing from the probability distribution column sequence, and an extra probability distribution column appearing in the probability distribution column sequence. Therefore, after the neural network to be trained is optimized through the edit probability, the erroneous optimization caused by comparing each probability distribution column with each element of the calibration sequence one by one when probability distribution columns have been added to or deleted from the sequence can be avoided, so that the training efficiency and the accuracy of the trained neural network can be improved.
The principle of calculating the edit probability is described below. Referring to fig. 8, i increases from top to bottom and j increases from left to right, with both i and j counted from 0. The filled circle in the ith row and jth column represents ep(T_{1:i}, A_{1:j}), the edit probability from the jth prefix A_{1:j} of the probability distribution column sequence to the ith prefix T_{1:i} of the corresponding calibration sequence. As i increases from 0, the character strings on the left side of the filled circles correspond in turn to the prefixes of the calibration sequence; when i = 0, the 0th element of the calibration sequence is the null element, denoted in fig. 8 by the symbol "". The filled circles in row 0 represent the probabilities of the probability distribution columns in the probability distribution column sequence reaching the null element, and the filled circles in column 0 represent the probabilities of reaching each element of the calibration sequence from the null element. As can be seen from fig. 8, the closer the position number of a probability distribution column is to the position number of an element in the calibration sequence, the darker the corresponding filled circle; the farther apart the two position numbers are, the lighter the corresponding filled circle.
The filled circle in row 2 and column 2 (marked with a dashed single circle in fig. 8) represents ep(T_{1:2}, A_{1:2}), i.e., the edit probability from the 2nd prefix A_{1:2} of the probability distribution column sequence to the 2nd prefix T_{1:2} of the calibration sequence. The filled circle to its upper left represents ep(T_{1:1}, A_{1:1}) (marked with a dashed single box in fig. 8). ep(T_{1:1}, A_{1:1}) and ep(T_{1:2}, A_{1:2}) are connected by a diagonal arrow pointing to ep(T_{1:2}, A_{1:2}). T_{1:1} is the 1st prefix of the calibration sequence and A_{1:1} is the 1st prefix of the probability distribution column sequence. Assuming that A_{1:2} is aligned with T_{1:2}, the path from ep(T_{1:1}, A_{1:1}) to ep(T_{1:2}, A_{1:2}) is through the retention operation; therefore, the diagonal arrow represents the probability that A_{1:2} generates T_{1:2} through the retention operation.
The filled circle above ep(T_{1:2}, A_{1:2}) represents ep(T_{1:1}, A_{1:2}) (marked with a dashed double circle in fig. 8). ep(T_{1:1}, A_{1:2}) and ep(T_{1:2}, A_{1:2}) are connected by a vertical arrow pointing to ep(T_{1:2}, A_{1:2}). Assuming that the 2nd prefix A_{1:2} of the probability distribution column sequence is misaligned with the 2nd prefix T_{1:2} of the calibration sequence because a probability distribution column is missing, and based on ep(T_{1:1}, A_{1:2}) the missing probability distribution column should correspond to the 2nd element in T (counted from 0), the path from ep(T_{1:1}, A_{1:2}) to ep(T_{1:2}, A_{1:2}) is through the insertion operation; therefore, the vertical arrow represents the probability that A_{1:2} generates T_{1:2} through the insertion operation.
The filled circle to the left of ep(T_{1:2}, A_{1:2}) represents ep(T_{1:2}, A_{1:1}) (marked with a dashed double box in fig. 8). ep(T_{1:2}, A_{1:1}) and ep(T_{1:2}, A_{1:2}) are connected by a horizontal arrow pointing to ep(T_{1:2}, A_{1:2}). Assuming that A_{1:2} is misaligned with T_{1:2} because an extra probability distribution column appears, and based on ep(T_{1:2}, A_{1:1}) the extra probability distribution column should correspond to the 2nd element in T (counted from 0), the path from ep(T_{1:2}, A_{1:1}) to ep(T_{1:2}, A_{1:2}) is through the deletion operation; therefore, the horizontal arrow represents the probability that A_{1:2} generates T_{1:2} through the deletion operation.
Referring to fig. 8, the first case described above corresponds to the 1st dot in the upper left corner of fig. 8. The second case described above corresponds to the dots in the first row of fig. 8 (except the 1st dot and the last dot in that row), where the current dot is obtained from the dot on its left (horizontal arrow connection) through the deletion operation. The third case described above corresponds to the dots in the first column of fig. 8 (except the 1st dot and the last dot in that column), where the current dot is obtained from the dot above it (vertical arrow connection) through the insertion operation. The fourth case described above corresponds to the dots other than those in the first row and the first column, where the current dot is obtained from the upper-left dot (diagonal arrow connection) through the retention operation, from the dot above (vertical arrow connection) through the insertion operation, and from the dot on the left (horizontal arrow connection) through the deletion operation.
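The whole lattice of fig. 8 (steps 203 through 205, with the full probability distribution column sequence A treated as column m and the full calibration sequence T as row s) can be sketched as a single dynamic program. This is a minimal illustration, not the patent's implementation; all names, the array layouts, and the use of index 0 for the null column are assumptions.

```python
def edit_probability(target, p_class, p_keep, p_ins, p_del, p_ins_class):
    """Edit probability ep(T, A) for one sample.
    target: calibration elements t_1..t_s as class indices.
    p_class[j][c]: class probabilities of the jth probability distribution
      column (index 0 is the null column and is unused).
    p_keep[j], p_ins[j], p_del[j]: per-column operation probabilities
      (index 0 is used only for insertions at the null prefix).
    p_ins_class[j][c]: class-insertion probabilities per column."""
    m = len(p_class) - 1
    s = len(target)
    # ep[i][j] = edit probability from the jth prefix (full A when j = m)
    # to the ith prefix (full T when i = s)
    ep = [[0.0] * (m + 1) for _ in range(s + 1)]
    ep[0][0] = 1.0                                   # first case
    for j in range(1, m + 1):                        # deletions only
        ep[0][j] = ep[0][j - 1] * p_del[j]
    for i in range(1, s + 1):
        t = target[i - 1]
        ep[i][0] = ep[i - 1][0] * p_ins_class[0][t] * p_ins[0]  # insertions only
        for j in range(1, m + 1):                    # retention + insertion + deletion
            keep = ep[i - 1][j - 1] * p_keep[j] * p_class[j][t]
            ins = ep[i - 1][j] * p_ins_class[j][t] * p_ins[j]
            dele = ep[i][j - 1] * p_del[j]
            ep[i][j] = keep + ins + dele
    return ep[s][m]
```

The last column (j = m) and last row (i = s) of this table correspond to steps 204 and 205, which extend the same recursion to the full sequences.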
And 204, calculating the edit probability from the kth probability distribution column sequence to the (s-1)-th prefix of the kth calibration sequence.
Step 204 includes the following two cases.
First case: when i = 0, the edit probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated based on the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Second case: when 0 < i ≤ s-1, the edit probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated based on the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence, the edit probability from the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence, and the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
The first case is similar to the second case in step 203. Referring to fig. 8, the first case corresponds to the 1st dot in the last column of dots in fig. 8 (i.e., the upper-right dot), which is obtained from the dot on its left (horizontal arrow connection) through the deletion operation. The calculation method in the first case includes the following steps, for example.
And step 1-2-1, obtaining the probability of deleting the mth probability distribution column in the kth probability distribution column sequence.
The step 1-2-1 is identical to the step 2-1-1, and will not be described here again.
And step 1-2-2, calculating the edit probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the deletion operation occurs in the mth probability distribution column in the kth probability distribution column sequence and the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
The edit probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence can be calculated according to formula (7):

ep(T_{1:i}, A) = ep(T_{1:i}, A_{1:m-1}) × p_D(m)    (7)

In formula (7), T_{1:i} represents the ith prefix of the kth calibration sequence; A represents the kth probability distribution column sequence; ep(T_{1:i}, A_{1:m-1}) represents the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; and p_D(m) represents the probability that the deletion operation occurs in the mth probability distribution column in the kth probability distribution column sequence.
The second case is similar to the fourth case in step 203. Referring to fig. 8, the second case corresponds to the dots in the last column of fig. 8 other than the 1st dot and the lower-right dot, where the current dot is obtained from the upper-left dot (diagonal arrow connection) through the retention operation, from the dot above (vertical arrow connection) through the insertion operation, and from the dot on the left (horizontal arrow connection) through the deletion operation. Illustratively, the calculation method in the second case includes the following steps.
And 2-2-1, calculating the retention probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence.
The retention probability ep_{C,i,m} from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated according to the following formula (8):

ep_{C,i,m} = ep(T_{1:i-1}, A_{1:m-1}) × p_C(m) × y(m, t_i)    (8)

In formula (8), T_{1:i-1} represents the (i-1)-th prefix of the kth calibration sequence; A_{1:m-1} represents the (m-1)-th prefix of the kth probability distribution column sequence; ep(T_{1:i-1}, A_{1:m-1}) represents the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence; p_C(m) represents the probability that the retention operation occurs in the mth probability distribution column of the kth probability distribution column sequence; and y(m, t_i) represents the probability, in the mth probability distribution column of the kth probability distribution column sequence, of the category corresponding to the ith element in the kth calibration sequence.
And 2-2-2, calculating the insertion probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the edit probability from the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence.
The insertion probability ep_{I,i,m} from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated according to the following formula (9):

ep_{I,i,m} = ep(T_{1:i-1}, A) × q(m, t_i) × p_I(m)    (9)

In formula (9), ep(T_{1:i-1}, A) represents the edit probability from the kth probability distribution column sequence to the (i-1)-th prefix of the kth calibration sequence; q(m, t_i) represents the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs in the mth probability distribution column in the kth probability distribution column sequence; and p_I(m) represents the probability that the insertion operation occurs in the mth probability distribution column in the kth probability distribution column sequence.
And 2-2-3, calculating the deletion probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
The deletion probability ep_{D,i,m} from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is calculated according to the following formula (10):

ep_{D,i,m} = ep(T_{1:i}, A_{1:m-1}) × p_D(m)    (10)

In formula (10), ep(T_{1:i}, A_{1:m-1}) represents the edit probability from the (m-1)-th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence, and p_D(m) represents the probability that the deletion operation occurs in the mth probability distribution column in the kth probability distribution column sequence.
And 2-2-4, determining the edit probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the retention probability, the insertion probability and the deletion probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Specifically, the edit probability of the ith prefix of the kth probability distribution sequence to the kth calibration sequence is determined according to the following formula (11)
Step 205: calculate the edit probability from the jth prefix (0 ≤ j ≤ m-1) of the kth probability distribution column sequence to the kth calibration sequence.
Step 205 includes the following two cases.
First case: when j = 0, the edit probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence is obtained based on the edit probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence.
Second case: when 0 < j ≤ m-1, the edit probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence is obtained based on the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the edit probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
The first case is similar to the third case in step 203. Referring to fig. 8, the first case is illustrated by the 1st dot (i.e., the lower-left dot) of the last row of dots in fig. 8, which is obtained from the dot above it (vertical arrow connection) by the insert operation. Illustratively, the first case includes the following steps.
And step 1-2-1, obtaining the probability of the inserting operation of the jth probability distribution column in the kth probability distribution column sequence.
Step 1-2-1 is the same as step 3-1-1, and will not be described here again.
And step 1-2-2, obtaining the probability of inserting the class corresponding to the s-th element in the k-th calibration sequence when the j-th probability distribution column is subjected to the insertion operation.
Step 1-2-2 is the same as step 3-1-2, and will not be described here again.
Step 1-2-3: calculate the edit probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence, based on the probability that an insert operation occurs at the jth probability distribution column of the kth probability distribution column sequence, the probability of inserting the class corresponding to the sth element of the kth calibration sequence when the insert operation occurs at the jth probability distribution column, and the edit probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence.
The edit probability ep(j, s) from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence can be calculated according to formula (12):

    ep(j, s) = ep(j, s-1) × p_I(j) × p̃_j(l_s)    (12)

In formula (12), l_s denotes the sth element of the kth calibration sequence; ep(j, s-1) denotes the edit probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence; p̃_j(l_s) denotes the probability of inserting the class corresponding to the sth element of the kth calibration sequence when an insert operation occurs at the jth probability distribution column of the kth probability distribution column sequence; and p_I(j) denotes the probability that an insert operation occurs at the jth probability distribution column of the kth probability distribution column sequence.
The second case is similar to the fourth case in step 203. Referring to fig. 8, the second case describes that, among the last row of dots in fig. 8, for each dot except the 1st dot and the last dot (i.e., the lower-right dot), the current dot is obtained from the dot to its upper left (diagonal arrow connection) by the retention operation, from the dot directly above it (vertical arrow connection) by the insert operation, and from the dot to its left (horizontal arrow connection) by the delete operation. Illustratively, the second case includes the following steps.
Step 2-3-1: calculate the retention probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence, based on the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence.
The retention probability ep_{C,s,j} from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence is calculated according to the following formula (13):

    ep_{C,s,j} = ep(j-1, s-1) × p_C(j) × p_j(l_s)    (13)

In formula (13), ep(j-1, s-1) denotes the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence; p_C(j) denotes the probability that a retention operation occurs at the jth probability distribution column of the kth probability distribution column sequence; and p_j(l_s) denotes the probability, in the jth probability distribution column, of the class corresponding to the sth element of the kth calibration sequence. The sth element of the kth calibration sequence is the last element of the calibration sequence and indicates its end; in this embodiment, the symbol '#' is used to denote the last element of each calibration sequence.
Step 2-3-2: calculate the insertion probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence, based on the edit probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence.
The insertion probability ep_{I,s,j} from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence is calculated according to the following formula (14):

    ep_{I,s,j} = ep(j, s-1) × p_I(j) × p̃_j(l_s)    (14)

In formula (14), ep(j, s-1) denotes the edit probability from the jth prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence; p̃_j(l_s) denotes the probability of inserting the class corresponding to the sth element of the kth calibration sequence when an insert operation occurs at the jth probability distribution column of the kth probability distribution column sequence; and p_I(j) denotes the probability that an insert operation occurs at the jth probability distribution column of the kth probability distribution column sequence.
Step 2-3-3: calculate the deletion probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence, based on the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
The deletion probability ep_{D,s,j} from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence is calculated according to the following formula (15):

    ep_{D,s,j} = ep(j-1, s) × p_D(j)    (15)

In formula (15), ep(j-1, s) denotes the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence, and p_D(j) denotes the probability that a delete operation occurs at the jth probability distribution column of the kth probability distribution column sequence.
Step 2-3-4: determine the edit probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence based on the retention probability, insertion probability, and deletion probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence.
Specifically, the edit probability ep(j, s) from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence is determined according to the following formula (16):

    ep(j, s) = ep_{C,s,j} + ep_{I,s,j} + ep_{D,s,j}    (16)
Step 206: calculate the edit probability from the kth probability distribution column sequence to the kth calibration sequence according to the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the edit probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
Illustratively, step 206 includes steps 2061-2064 as follows.
Step 2061: calculate the retention probability from the kth probability distribution column sequence to the kth calibration sequence, based on the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence.
The retention probability ep_{C,s,m} from the kth probability distribution column sequence to the kth calibration sequence is calculated according to the following formula (17):

    ep_{C,s,m} = ep(m-1, s-1) × p_C(m) × p_m(l_s)    (17)

In formula (17), ep(m-1, s-1) denotes the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence; p_C(m) denotes the probability that a retention operation occurs at the mth probability distribution column of the kth probability distribution column sequence; and p_m(l_s) denotes the probability, in the mth probability distribution column, of the class corresponding to the sth element of the kth calibration sequence. The kth probability distribution column sequence includes m probability distribution columns; the mth probability distribution column is the last probability distribution column of the sequence and is the probability distribution of the class of the last unit data of the kth sample data. The sth element of the kth calibration sequence is the last element of the calibration sequence and indicates its end; in this embodiment, the symbol '#' is used to denote the last element of each calibration sequence.
Step 2062: calculate the insertion probability from the kth probability distribution column sequence to the kth calibration sequence, based on the edit probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence.
The insertion probability ep_{I,s,m} from the kth probability distribution column sequence to the kth calibration sequence is calculated according to the following formula (18):

    ep_{I,s,m} = ep(m, s-1) × p_I(m) × p̃_m(l_s)    (18)

In formula (18), ep(m, s-1) denotes the edit probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence; p_I(m) denotes the probability that an insert operation occurs at the mth probability distribution column; and p̃_m(l_s) denotes the probability of inserting the class corresponding to the sth element of the kth calibration sequence when the insert operation occurs at the mth probability distribution column of the kth probability distribution column sequence.
Step 2063: calculate the deletion probability from the kth probability distribution column sequence to the kth calibration sequence, based on the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
The deletion probability ep_{D,s,m} from the kth probability distribution column sequence to the kth calibration sequence is calculated according to the following formula (19):

    ep_{D,s,m} = ep(m-1, s) × p_D(m)    (19)

In formula (19), ep(m-1, s) denotes the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence, and p_D(m) denotes the probability that a delete operation occurs at the mth probability distribution column of the kth probability distribution column sequence.
Step 2064: determine the edit probability from the kth probability distribution column sequence to the kth calibration sequence based on the retention probability, insertion probability, and deletion probability from the kth probability distribution column sequence to the kth calibration sequence.
Specifically, the edit probability ep(m, s) from the kth probability distribution column sequence to the kth calibration sequence is determined according to the following formula (20):

    ep(m, s) = ep_{C,s,m} + ep_{I,s,m} + ep_{D,s,m}    (20)
The edit probability from the kth probability distribution column sequence to the kth calibration sequence is determined through steps 203 to 206. Based on formulas (1)-(20), the edit probability ep_k from the kth probability distribution column sequence to the kth calibration sequence can be derived as shown in formula (21):

    ep_k = ep(m, s)    (21)

where j = m means the jth prefix is the kth probability distribution column sequence itself, that is, the sequence consisting of the 0th probability distribution column through the mth probability distribution column of the kth probability distribution column sequence; and i = s means the ith prefix is the kth calibration sequence itself, that is, the sequence consisting of the 0th element through the sth element of the kth calibration sequence.
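The recursion of steps 203 to 206 can be sketched as one dynamic program over an (m+1) × (s+1) grid of prefix pairs, as in fig. 8. The sketch below is a minimal illustration under assumed inputs: the names p_keep, p_ins, p_del (per-column retention/insert/delete operation probabilities), p_cls (per-column class probabilities), and p_inscls (per-column insert-class probabilities) are illustrative, not the patent's notation, and index 0 stands for the 0th (empty) probability distribution column.

```python
def edit_probability(p_keep, p_ins, p_del, p_cls, p_inscls, label):
    """Edit probability ep_k = ep(m, s) from a probability distribution
    column sequence to a calibration sequence `label` (a list of class ids
    whose last element corresponds to the '#' end marker).

    All per-column inputs are indexed 0..m, index 0 being the empty 0th
    column; p_cls[j][c] is the probability of class c in column j, and
    p_inscls[j][c] the probability of inserting class c at column j."""
    m, s = len(p_keep) - 1, len(label)
    # ep[j][i]: edit probability from the j-th prefix of the column
    # sequence to the i-th prefix of the calibration sequence.
    ep = [[0.0] * (s + 1) for _ in range(m + 1)]
    ep[0][0] = 1.0                          # empty prefix to empty prefix
    for j in range(1, m + 1):               # boundary i = 0: delete operations only
        ep[j][0] = ep[j - 1][0] * p_del[j]
    for i in range(1, s + 1):               # boundary j = 0: insert operations only
        ep[0][i] = ep[0][i - 1] * p_ins[0] * p_inscls[0][label[i - 1]]
    for j in range(1, m + 1):
        for i in range(1, s + 1):
            keep = ep[j - 1][i - 1] * p_keep[j] * p_cls[j][label[i - 1]]  # diagonal move
            ins = ep[j][i - 1] * p_ins[j] * p_inscls[j][label[i - 1]]     # vertical move
            dele = ep[j - 1][i] * p_del[j]                                # horizontal move
            ep[j][i] = keep + ins + dele    # sum of the three paths, as in (11)/(16)/(20)
    return ep[m][s]                         # formula (21): ep_k = ep(m, s)
```

Each inner cell sums the retention, insertion, and deletion paths, matching the three-term recurrences of steps 2-2-4, 2-3-4, and 2064.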
Step 207, calculating the sum of deviations of each probability distribution sequence and the calibration sequence corresponding to each probability distribution sequence based on the edit probabilities from each probability distribution sequence to the corresponding calibration sequence in the n probability distribution sequences.
Step 207 may comprise calculating, by means of a loss function, the sum of the deviations between the probability distribution column sequences and the calibration sequences corresponding to them. The loss function is shown in formula (22):

    J = −Σ_{k=1}^{n} ep_k    (22)

In formula (22), J denotes the sum of the deviations, and ep_k denotes the edit probability from the kth probability distribution column sequence to the kth calibration sequence. In this embodiment, the sum of the deviations calculated by the loss function is negative, which facilitates the subsequent derivation with respect to the deviation.
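The loss of step 207 can be sketched as follows. Since formula (22) is not reproduced in the text, the negative-sum form below is an assumption based on the remark that the computed sum of deviations is negative; a negative log-likelihood variant, common for probability-based losses, is shown alongside.

```python
import math

def deviation_sum(edit_probs):
    """Assumed form of formula (22): the negative sum of the per-sample
    edit probabilities, so minimizing J maximizes the edit probabilities
    and the result is negative."""
    return -sum(edit_probs)

def deviation_sum_nll(edit_probs):
    """Common alternative form (an assumption): negative log-likelihood."""
    return -sum(math.log(ep) for ep in edit_probs)
```

Either form is differentiable in the edit probabilities, which is what the gradient computation of step 208 requires.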
And step 208, updating weight parameters of each layer of the neural network to be trained based on the deviation sum of each probability distribution sequence and the calibration sequence corresponding to each probability distribution sequence.
The neural network is trained to optimize the weight parameters of each layer of the neural network. Referring to fig. 1, the weight parameters of the neural network section 10 include weight parameters of f (x) 11 and H (f (x)) 12. The weight parameters of each layer of the neural network to be trained can be updated by a gradient descent method. The working principle of the gradient descent method is described below.
First, the influence of the respective weight parameter on the sum of the deviations is determined.
If a weight parameter has a large influence on the sum of the deviations, the step by which it is adjusted should also be large; otherwise, it is adjusted only slightly. Mathematically, the influence of a single weight parameter on the sum of the deviations is the partial derivative of the sum of the deviations with respect to that weight parameter: it describes how much the sum of the deviations changes when the weight is shifted by a small amount from its current value. The partial derivatives of the sum of the deviations with respect to the respective weight parameters can be computed by the chain rule.
Assuming W is a weight parameter of H(f(x)) 12 and J is the sum of the deviations, the influence of the weight parameter W on the sum of the deviations J is the partial derivative ∂J/∂W. When calculating ∂J/∂W, the chain rule can be applied; specifically, ∂J/∂W can be calculated by the following formulas (23) to (31).
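The chain rule used in formulas (23) to (31) can be illustrated on a hypothetical one-dimensional stand-in for the network: h = W·x plays the role of f(x), y = h² the role of H(f(x)), and J = −ln(y) the role of the sum of deviations. None of these specific forms come from the patent; they only demonstrate composing per-stage derivatives.

```python
def dJ_dW(W, x):
    """Chain rule: dJ/dW = dJ/dy * dy/dh * dh/dW for the hypothetical
    stages h = W*x, y = h**2, J = -ln(y)."""
    h = W * x            # first stage (stand-in for f(x))
    y = h ** 2           # second stage (stand-in for H(f(x)))
    dJ_dy = -1.0 / y     # derivative of J = -ln(y) with respect to y
    dy_dh = 2.0 * h      # derivative of y = h**2 with respect to h
    dh_dW = x            # derivative of h = W*x with respect to W
    return dJ_dy * dy_dh * dh_dW
```

Analytically J = −2·ln(W) − 2·ln(x), so the product above equals −2/W, which can be used to check the implementation.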
Second, each updated weight parameter is determined based on the influence of that weight parameter on the sum of the deviations.
The respective updated weight parameters may be determined according to the following formula (32):

    W_new = W − η · (∂J/∂W)    (32)

where W_new is the updated weight parameter, W is the weight parameter before the update, η is the learning rate, and J is the sum of the deviations. η is used to adjust the update step of the weight parameter; it is a preset value that may be set based on experience.
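Formula (32), applied element-wise to all weight parameters, can be sketched as follows (the list-based parameter representation is illustrative):

```python
def update_weights(weights, grads, eta):
    """One gradient-descent step per formula (32): W_new = W - eta * dJ/dW,
    applied to every weight parameter. `eta` is the learning rate."""
    return [w - eta * g for w, g in zip(weights, grads)]
```

A smaller η moves the weights in smaller steps; as the text notes, η is typically chosen empirically.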
Steps 207 and 208 implement the optimization of the neural network to be trained based on the edit probabilities from each of the n probability distribution column sequences to its corresponding calibration sequence.
It should be noted that steps 201 to 208 constitute only one complete training iteration of the neural network. The number of training iterations is generally in the millions; for example, it may be greater than or equal to 2 million. That is, steps 201-208 need to be performed 2 million times or more to complete the training of the neural network.
Fig. 9 shows a sequence prediction method provided by an embodiment of the present invention, referring to fig. 9, the method includes the following steps.
Step 301, inputting data to be tested into a neural network.
The neural network may be trained by using the training method of the neural network shown in any one of fig. 3 to 7.
Step 302, a predicted probability distribution sequence output by the neural network is obtained.
The predicted sequence of probability distribution columns includes a plurality of probability distribution columns, each probability distribution column including a plurality of probabilities.
Step 303, determining a predicted sequence corresponding to the data to be measured based on the predicted probability distribution sequence output by the neural network.
According to the embodiment of the present invention, the data to be tested is input into the neural network, and the predicted probability distribution column sequence output by the neural network is obtained. During training, after n sample data are input into the neural network to be trained, the n probability distribution column sequences output by the network are obtained; the edit probability from the kth probability distribution column sequence to the kth calibration sequence, that is, the probability that the kth probability distribution column sequence generates the calibration sequence of the kth sample data through editing operations, is then determined; and the neural network to be trained is optimized based on the edit probabilities. When the probability distribution columns in the kth probability distribution column sequence are not in one-to-one correspondence with the elements of the calibration sequence and are misaligned, a small edit probability indicates a large deviation between the probability distribution column sequence and the calibration sequence, and a large edit probability indicates a small deviation. The edit probability therefore accurately estimates the deviation between a probability distribution column sequence and its calibration sequence regardless of whether the probability distribution columns are misaligned with the elements of the calibration sequence. Optimizing the neural network to be trained based on the edit probability thus avoids the erroneous optimization caused by comparing each probability distribution column in the sequence with each element of the calibration sequence one by one, so the training efficiency and accuracy of the neural network can be improved. When the accuracy of the neural network is high, the accuracy of the predicted probability distribution column sequence is also high, and accordingly, based on the predicted probability distribution column sequence output by the neural network, the accuracy of the predicted sequence determined for the data to be tested is also high.
Fig. 10 shows a sequence prediction method provided by an embodiment of the present invention, referring to fig. 10, the method includes the following steps.
Step 401, inputting data to be tested into a neural network.
The neural network may be trained by using the training method of the neural network shown in any one of fig. 3 to 7.
Step 402, a predicted probability distribution sequence output by the neural network is obtained.
Step 403, determining a predicted sequence corresponding to the data to be measured based on the predicted probability distribution sequence output by the neural network.
Illustratively, the embodiment of the present invention provides two ways of determining the predicted sequence corresponding to the data to be tested.
The first way is: firstly, determining the maximum probability in each probability distribution column from a predicted probability distribution column sequence; secondly, determining the corresponding element of each maximum probability; finally, a predicted sequence is determined based on the element corresponding to each maximum probability. The elements in the predicted sequence include elements corresponding to respective maximum probabilities, and the elements are arranged in the order of the probability distribution columns corresponding to the elements in the predicted sequence of probability distribution columns.
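The first way (per-column maximum) can be sketched as follows; representing each probability distribution column as a list of per-class probabilities is an assumption for illustration.

```python
def greedy_decode(prob_columns, classes):
    """First way: for each predicted probability distribution column, take
    the element (class) with the maximum probability; elements are arranged
    in the order of their probability distribution columns."""
    predicted = []
    for col in prob_columns:
        best = max(range(len(col)), key=lambda c: col[c])  # index of the maximum probability
        predicted.append(classes[best])
    return predicted
```

This decoding is independent per column, so it is fast but cannot use any prior knowledge about valid sequences, which motivates the second way below it in the text.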
The second way is: and acquiring each sequence of a sequence library, and determining a predicted sequence corresponding to the data to be detected based on each acquired sequence and the predicted probability distribution sequence output by the neural network. The sequence library stores a plurality of sequences, and the sequences comprise real sequences corresponding to data to be tested. For example, where the neural network is adapted for use in an OCR scene, the sequence library may be a dictionary. The second approach may include the following steps 4031-4032.
Step 4031, determining the editing probability of the predicted probability distribution sequence to each sequence in the sequence library.
The edit probability from the predicted probability distribution column sequence to the tth sequence in the sequence library is the probability that the predicted probability distribution column sequence generates the tth sequence through editing operations.
The manner of determining the editing probabilities of the predicted probability distribution sequence to each sequence in the sequence library may refer to steps 203 to 206 in the embodiment shown in fig. 4, which is not described herein.
Step 4032, determining a predicted sequence corresponding to the data to be detected based on the editing probabilities of the predicted probability distribution sequence to each sequence in the sequence library.
The predicted sequence corresponding to the data to be tested may be the sequence with the maximum edit probability among the edit probabilities from the predicted probability distribution column sequence to the respective sequences in the sequence library.
For example, referring to fig. 1, assuming the sequences in the sequence library C are C1, C2, …, and among the probabilities of generating each sequence from the probability distribution column sequence through editing operations, the probability of generating the sequence Ct is the largest, then the predicted sequence T is Ct.
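The second way can be sketched by scoring every library sequence with the edit probability and keeping the best one. A compact restatement of the edit-probability recursion of steps 203 to 206 is included so the sketch is self-contained; the dict-based column encoding ('keep', 'ins', 'del', 'cls', 'inscls') is an assumed representation, not the patent's.

```python
def edit_prob(cols, label):
    """Compact edit-probability dynamic program: ep[j][i] is the edit
    probability from the j-th prefix of the column sequence to the i-th
    prefix of `label`. cols[0] is the empty 0th column; each column is a
    dict with assumed keys 'keep', 'ins', 'del' (operation probabilities)
    and 'cls', 'inscls' (per-class probability dicts)."""
    m, s = len(cols) - 1, len(label)
    ep = [[0.0] * (s + 1) for _ in range(m + 1)]
    ep[0][0] = 1.0
    for j in range(1, m + 1):                # empty label prefix: deletions only
        ep[j][0] = ep[j - 1][0] * cols[j]['del']
    for i in range(1, s + 1):                # empty column prefix: insertions only
        ep[0][i] = ep[0][i - 1] * cols[0]['ins'] * cols[0]['inscls'].get(label[i - 1], 0.0)
    for j in range(1, m + 1):
        for i in range(1, s + 1):
            ep[j][i] = (ep[j - 1][i - 1] * cols[j]['keep'] * cols[j]['cls'].get(label[i - 1], 0.0)
                        + ep[j][i - 1] * cols[j]['ins'] * cols[j]['inscls'].get(label[i - 1], 0.0)
                        + ep[j - 1][i] * cols[j]['del'])
    return ep[m][s]

def lexicon_decode(cols, lexicon):
    """Second way: the predicted sequence is the library sequence with the
    maximum edit probability from the predicted column sequence."""
    return max(lexicon, key=lambda seq: edit_prob(cols, seq))
```

In an OCR scene the lexicon would be a dictionary of words, and the returned sequence is the dictionary entry the predicted probability distribution column sequence most probably generates.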
According to the embodiment of the invention, the data to be detected is input into the neural network, and the predicted probability distribution sequence output by the neural network is obtained; the neural network is trained by adopting the training method of the neural network shown in any one of fig. 3 to 7, the accuracy is higher, the accuracy of the output predicted probability distribution sequence is also higher, and thus, the accuracy of the predicted sequence corresponding to the determined data to be detected is also higher based on the predicted probability distribution sequence output by the neural network.
Fig. 11 shows a training device for a neural network according to an embodiment of the present invention, referring to fig. 11, the device 50 includes: an acquisition module 501, a determination module 502 and an optimization module 503.
The obtaining module 501 is configured to obtain, after n sample data are input to the neural network to be trained, n probability distribution column sequences output by the neural network to be trained, where the kth probability distribution column sequence among the n probability distribution column sequences includes m probability distribution columns, and the gth probability distribution column in the kth probability distribution column sequence is the probability distribution of the class of the gth unit data of the kth sample data among the n sample data, where n, k, m and g are integers, 1 ≤ k ≤ n, and 1 ≤ g ≤ m.
The determining module 502 is configured to determine an edit probability from a kth probability distribution sequence to a kth calibration sequence, where the edit probability from the kth probability distribution sequence to the kth calibration sequence is a probability that the kth probability distribution sequence generates the kth calibration sequence through an editing operation, and the kth calibration sequence is a calibration sequence of the kth sample data;
and an optimizing module 503, configured to optimize the neural network to be trained based on the edit probabilities from each probability distribution sequence to the corresponding calibration sequence in the n probability distribution sequences.
Illustratively, the determining module 502 is configured to calculate the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the edit probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence, where the jth prefix of the kth probability distribution column sequence is the sequence formed from the 0th probability distribution column of the kth probability distribution column sequence to the jth probability distribution column of the kth probability distribution column sequence, the ith prefix of the kth calibration sequence is the sequence formed from the 0th element of the kth calibration sequence to the ith element of the kth calibration sequence, j and i are natural numbers, 0 ≤ j ≤ m-1, and 0 ≤ i ≤ s-1; the 0th probability distribution column of the kth probability distribution column sequence is the probability distribution column of the class of empty unit data, and the 0th element of the kth calibration sequence is an empty element; and to calculate the edit probability from the kth probability distribution column sequence to the kth calibration sequence according to the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, the edit probability from the kth probability distribution column sequence to the (s-1)th prefix of the kth calibration sequence, and the edit probability from the (m-1)th prefix of the kth probability distribution column sequence to the kth calibration sequence.
Illustratively, the determining module 502 is configured to: when j = 0 and i = 0, set the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence equal to 1; when 0 < j ≤ m-1 and i = 0, calculate the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; when j = 0 and 0 < i ≤ s-1, calculate the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence; and when 1 ≤ j ≤ m-1 and 1 ≤ i ≤ s-1, calculate the edit probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence, the edit probability from the jth prefix of the kth probability distribution column sequence to the (i-1)th prefix of the kth calibration sequence, and the edit probability from the (j-1)th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to obtain the probability that a deletion operation occurs at the jth probability distribution column in the kth probability distribution column sequence, where a deletion operation occurring at the jth probability distribution column deletes the jth probability distribution column; and calculate the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the deletion operation occurs at the jth probability distribution column in the kth probability distribution column sequence and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to obtain the probability that an insert operation occurs at the jth probability distribution column in the kth probability distribution column sequence, where an insert operation occurring at the jth probability distribution column inserts a probability distribution column between the (j-1) th probability distribution column and the jth probability distribution column of the kth probability distribution column sequence; acquire the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insert operation occurs at the jth probability distribution column; and determine the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the insert operation occurs at the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insert operation occurs, and the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to calculate the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence; calculate the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence; calculate the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; and determine the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the retention probability, the insertion probability, and the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to obtain the probability that a retention operation occurs at the jth probability distribution column in the kth probability distribution column sequence, where a retention operation occurring at the jth probability distribution column performs no editing on the jth probability distribution column; acquire the probability of the category corresponding to the ith element in the kth calibration sequence in the jth probability distribution column; and determine the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the retention operation occurs at the jth probability distribution column in the kth probability distribution column sequence, the probability of the category corresponding to the ith element in the kth calibration sequence in the jth probability distribution column, and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to obtain the probability that an insert operation occurs at the jth probability distribution column in the kth probability distribution column sequence; acquire the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insert operation occurs at the jth probability distribution column; and determine the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the insert operation occurs at the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insert operation occurs, and the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to obtain the probability that a deletion operation occurs at the jth probability distribution column in the kth probability distribution column sequence; and determine the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the deletion operation occurs at the jth probability distribution column in the kth probability distribution column sequence and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to calculate, when i=0, the editing probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence; and, when 0<i is less than or equal to s-1, calculate the editing probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, and the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
Illustratively, the determining module 502 is configured to calculate, when j=0, the editing probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (s-1) th prefix of the kth calibration sequence; and, when 0<j is less than or equal to m-1, calculate the editing probability from the jth prefix of the kth probability distribution column sequence to the kth calibration sequence based on the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (s-1) th prefix of the kth calibration sequence, the editing probability from the jth prefix of the kth probability distribution column sequence to the (s-1) th prefix of the kth calibration sequence, and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the kth calibration sequence.
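The prefix-to-prefix recursion carried out by the determining module can be sketched as a dynamic program. The Python sketch below is illustrative only: the array layout, the encoding of the retention/insertion/deletion operation probabilities, the function name `edit_probability`, and the assumption that an inserted category is drawn from the same column's class distribution are all choices not fixed by the patent.

```python
import numpy as np

def edit_probability(class_probs, op_probs, target):
    """Sketch of the prefix-to-prefix edit-probability dynamic program.

    class_probs: (m, C) array; class_probs[j] is the j-th probability
        distribution column (row 0 stands for the empty 0th column).
    op_probs: (m, 3) array; op_probs[j] holds the probabilities that a
        retention, insertion, or deletion operation occurs at column j.
    target: calibration sequence as class indices; target[0] is the
        empty 0th element.
    Returns the editing probability of generating the full calibration
    sequence from the full probability distribution column sequence.
    """
    m, s = class_probs.shape[0], len(target)
    E = np.zeros((m, s))
    E[0, 0] = 1.0                       # empty prefix -> empty prefix
    for j in range(1, m):               # only deletions are possible
        E[j, 0] = E[j - 1, 0] * op_probs[j, 2]
    for i in range(1, s):               # only insertions are possible
        E[0, i] = E[0, i - 1] * op_probs[0, 1] * class_probs[0, target[i]]
    for j in range(1, m):
        for i in range(1, s):
            # retention consumes one column and one element
            retain = E[j - 1, i - 1] * op_probs[j, 0] * class_probs[j, target[i]]
            # insertion consumes one element only
            insert = E[j, i - 1] * op_probs[j, 1] * class_probs[j, target[i]]
            # deletion consumes one column only
            delete = E[j - 1, i] * op_probs[j, 2]
            E[j, i] = retain + insert + delete
    return E[m - 1, s - 1]
```

When every column retains with probability 1 and puts all mass on the matching target class, the recursion reduces to the diagonal and the editing probability is exactly 1, which is a useful sanity check on the boundary conditions.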
Illustratively, the optimizing module 503 is configured to calculate the sum of the deviations of each probability distribution column sequence from its corresponding calibration sequence, based on the editing probabilities from each of the n probability distribution column sequences to the corresponding calibration sequence; and to update the weight parameters of each layer of the neural network to be trained based on that sum of deviations.
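As an illustrative sketch of this optimization step, assuming the deviation of a probability distribution column sequence from its calibration sequence is taken as the negative log of the editing probability (an assumed form; the patent leaves the exact deviation unspecified), the batch loss could be accumulated as follows. The name `batch_deviation_sum` is hypothetical.

```python
import math

def batch_deviation_sum(edit_probs, eps=1e-12):
    """Sum of deviations over the n sequences in a batch; each deviation
    is taken here as the negative log of the corresponding editing
    probability, clamped away from zero for numerical safety."""
    return sum(-math.log(max(p, eps)) for p in edit_probs)
```

The weight update on each layer would then follow a standard gradient step on this loss; that part is framework-specific and omitted here.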
According to the embodiment of the invention, after n sample data are input into the neural network to be trained, n probability distribution column sequences output by the neural network to be trained are obtained, and the editing probability from the kth probability distribution column sequence to the kth calibration sequence is determined, where the editing probability is the probability of generating the calibration sequence of the kth sample data from the kth probability distribution column sequence through editing operations. When the probability distribution columns of the kth probability distribution column sequence and the elements of the calibration sequence are not in one-to-one correspondence but misaligned, a small editing probability indicates that the sequence likely contains misaligned probability distribution columns, and a large editing probability indicates that it likely does not. The editing probability therefore estimates the deviation between a probability distribution column sequence and a calibration sequence accurately whether or not the probability distribution columns are misaligned with the elements of the calibration sequence. Optimizing the neural network to be trained based on the editing probability thus avoids the erroneous optimization caused by comparing each probability distribution column of the sequence with each element of the calibration sequence one by one, and improves the training efficiency and accuracy of the neural network to be trained.
Fig. 12 shows a sequence prediction apparatus provided by an embodiment of the present invention, referring to fig. 12, the apparatus 60 includes: an input module 601, an acquisition module 602, and a determination module 603.
The input module 601 is configured to input data to be tested into a neural network, and the neural network is trained by using the training method of the neural network shown in fig. 3 to 7.
The acquisition module 602 is configured to acquire the predicted probability distribution column sequence output by the neural network.
The determining module 603 is configured to determine the predicted sequence corresponding to the data to be tested based on the predicted probability distribution column sequence output by the neural network.
Illustratively, the determining module 603 is configured to determine the probabilities of generating each sequence in a sequence library from the predicted probability distribution column sequence through editing operations; and to determine the predicted sequence corresponding to the data to be tested based on those probabilities, where the predicted sequence corresponding to the data to be tested is the sequence with the maximum probability among the probabilities of generating each sequence in the sequence library from the predicted probability distribution column sequence through editing operations.
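A minimal sketch of this decoding rule, assuming a hypothetical scorer `score_fn` that returns the probability of generating a given library sequence from the predicted probability distribution column sequence through editing operations:

```python
def predict_sequence(sequence_library, score_fn):
    """Return the library sequence with the maximum probability of being
    generated from the predicted probability distribution column sequence
    through editing operations, as judged by score_fn."""
    return max(sequence_library, key=score_fn)
```

In other words, once the editing probability of each candidate sequence is available, the predicted sequence for the data to be tested is simply the argmax over the sequence library.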
According to the embodiment of the invention, the data to be tested is input into the neural network, and the predicted probability distribution column sequence output by the neural network is obtained. Because the neural network is trained by the training method shown in fig. 3 to 7, its accuracy is higher, so the accuracy of the output predicted probability distribution column sequence is also higher; accordingly, the accuracy of the predicted sequence determined for the data to be tested based on the predicted probability distribution column sequence output by the neural network is also higher.
Fig. 13 shows a training device for a neural network according to an embodiment of the present invention. The training device may vary considerably in configuration or performance; in particular, the training device of the neural network may be the computer 1800. The computer 1800 may include a Central Processing Unit (CPU) 1801, a system memory 1804 including a Random Access Memory (RAM) 1802 and a Read Only Memory (ROM) 1803, and a system bus 1805 connecting the system memory 1804 and the central processing unit 1801. Computer 1800 also includes a mass storage device 1807 for storing an operating system 1813, application programs 1814, and other program modules 1815.
The mass storage device 1807 is connected to the central processing unit 1801 through a mass storage controller (not shown) connected to the system bus 1805. The mass storage device 1807 and its associated computer-readable media provide non-volatile storage for the computer 1800. That is, the mass storage device 1807 may include a computer-readable medium (not shown), such as a hard disk or CD-ROM drive.
Computer readable media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the ones described above. The system memory 1804 and mass storage 1807 described above may be referred to collectively as memory.
According to various embodiments of the invention, the computer 1800 may also operate through a remote computer connected to a network, such as the Internet. That is, the computer 1800 may connect to the network 1812 through a network interface unit 1811 connected to the system bus 1805, or may connect to other types of networks or remote computer systems (not shown) using the network interface unit 1811.
The memory also includes one or more programs stored in the memory and configured for execution by the CPU 1801. The methods shown in fig. 3 to 7 can be implemented when the CPU 1801 executes a program in the memory.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory, comprising instructions that are loadable and executable by the central processing unit 1801 of the computer 1800 to perform the methods illustrated in fig. 3 to 7. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
Fig. 14 shows a sequence prediction apparatus provided by an embodiment of the present invention. The sequence prediction device may vary considerably in configuration or performance; in particular, the sequence prediction device may be the computer 1900. The computer 1900 may include a Central Processing Unit (CPU) 1901, a system memory 1904 including a Random Access Memory (RAM) 1902 and a Read Only Memory (ROM) 1903, and a system bus 1905 connecting the system memory 1904 and the central processing unit 1901. Computer 1900 also includes a mass storage device 1907 for storing an operating system 1913, application programs 1914, and other program modules 1915.
The mass storage device 1907 is connected to the central processing unit 1901 through a mass storage controller (not shown) connected to the system bus 1905. The mass storage device 1907 and its associated computer-readable media provide non-volatile storage for the computer 1900. That is, mass storage device 1907 may include a computer readable medium (not shown), such as a hard disk or CD-ROM drive.
Computer readable media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the ones described above. The system memory 1904 and mass storage device 1907 described above may be collectively referred to as memory.
According to various embodiments of the invention, the computer 1900 may also operate through a remote computer connected to a network, such as the Internet. That is, the computer 1900 may connect to the network 1912 through a network interface unit 1911 coupled to the system bus 1905, or may connect to other types of networks or remote computer systems (not shown) using the network interface unit 1911.
The memory also includes one or more programs stored in the memory and configured to be executed by the CPU 1901. The methods shown in fig. 9 to 10 can be implemented when the CPU 1901 executes a program in the memory.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory, that includes instructions that are loadable and executable by the central processing unit 1901 of the computer 1900 to perform the methods illustrated in fig. 9 to 10. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments is not intended to limit the invention; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its protection scope.

Claims (10)

1. A method of acquiring a neural network, the method comprising:
after n sample data selected from a training set database are input into a neural network to be trained, obtaining n probability distribution column sequences which are output after the neural network to be trained recognizes and converts the n sample data, wherein the kth probability distribution column sequence in the n probability distribution column sequences comprises m probability distribution columns, the g-th probability distribution column in the kth probability distribution column sequence is the probability distribution of the category of the g-th unit data of the kth sample data in the n sample data, n, k, m and g are integers, k is not less than 1 and not more than n, g is not less than 1 and not more than m, a training set in the training set database comprises sample data and a calibration sequence of the sample data, the sample data comprises pictures or voices, and the calibration sequence is a text sequence;
determining the editing probability from the kth probability distribution column sequence to the kth calibration sequence, wherein the editing probability from the kth probability distribution column sequence to the kth calibration sequence is the probability that the kth probability distribution column sequence generates the kth calibration sequence through editing operation, the kth calibration sequence is the calibration sequence of the kth sample data, and the calibration sequence of the kth sample data is a sequence containing elements corresponding to the true category of each unit data in the kth sample data;
and optimizing the neural network to be trained based on the editing probability from each probability distribution column sequence in the n probability distribution column sequences to the corresponding calibration sequence, to obtain a neural network for predicting a text sequence corresponding to a picture or voice.
2. The method of claim 1, wherein the determining the edit probabilities of the kth probability distribution column sequence to the kth calibration sequence comprises:
respectively calculating the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the (s-1) th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (s-1) th prefix of the kth calibration sequence, and the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the kth calibration sequence, wherein the jth prefix of the kth probability distribution column sequence is a sequence formed from the 0 th probability distribution column of the kth probability distribution column sequence to the jth probability distribution column of the kth probability distribution column sequence, the ith prefix of the kth calibration sequence is a sequence formed from the 0 th element of the kth calibration sequence to the ith element of the kth calibration sequence, j and i are natural numbers, j is not less than 0 and not more than m-1, and i is not less than 0 and not more than s-1; the 0 th probability distribution column of the kth probability distribution column sequence is the probability distribution column of the category of empty unit data, and the 0 th element of the kth calibration sequence is an empty element;
calculating the editing probability from the kth probability distribution column sequence to the kth calibration sequence according to the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the (s-1) th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (s-1) th prefix of the kth calibration sequence, and the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the kth calibration sequence.
3. The method of claim 2, wherein said calculating the edit probabilities of the m-1 th prefix of the kth probability distribution column sequence to the s-1 th prefix of the kth calibration sequence comprises:
when j=0 and i=0, the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence is equal to 1;
when 0<j is less than or equal to m-1 and i=0, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
when j=0 and 0<i is less than or equal to s-1, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence;
when 0<j is less than or equal to m-1 and 0<i is less than or equal to s-1, calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
4. A method according to claim 3, wherein said calculating the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence according to the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence comprises:
calculating the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence;
calculating the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence;
calculating the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
and determining the editing probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the retention probability, the insertion probability and the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
5. The method of claim 4, wherein the calculating the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence comprises:
acquiring the probability that a retention operation occurs at the jth probability distribution column in the kth probability distribution column sequence, wherein a retention operation occurring at the jth probability distribution column performs no editing on the jth probability distribution column;
acquiring the probability of the category corresponding to the ith element in the kth calibration sequence in the jth probability distribution column;
and determining the retention probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the retention operation occurs at the jth probability distribution column in the kth probability distribution column sequence, the probability of the category corresponding to the ith element in the kth calibration sequence in the jth probability distribution column, and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence.
6. The method of claim 4, wherein the calculating the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence comprises:
Acquiring the probability of the inserting operation of the jth probability distribution column in the kth probability distribution column sequence;
acquiring the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the jth probability distribution column generates the inserting operation;
and determining the insertion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the insertion operation occurs at the jth probability distribution column in the kth probability distribution column sequence, the probability of inserting the category corresponding to the ith element in the kth calibration sequence when the insertion operation occurs, and the editing probability from the jth prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence.
7. The method of claim 4, wherein the calculating the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence comprises:
acquiring the probability of deletion operation of a jth probability distribution column in the kth probability distribution column sequence;
determining the deletion probability from the jth prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the probability that the deletion operation occurs at the jth probability distribution column in the kth probability distribution column sequence and the editing probability from the (j-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
8. The method of claim 2, wherein said calculating the edit probabilities of the kth probability distribution sequence to the s-1 th prefix of the kth calibration sequence comprises:
when i=0, calculating the editing probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence;
when 0<i is less than or equal to s-1, calculating the editing probability from the kth probability distribution column sequence to the ith prefix of the kth calibration sequence based on the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, the editing probability from the kth probability distribution column sequence to the (i-1) th prefix of the kth calibration sequence, and the editing probability from the (m-1) th prefix of the kth probability distribution column sequence to the ith prefix of the kth calibration sequence.
9. A method of sequence prediction, the method comprising:
inputting data to be tested into a neural network, wherein the neural network is trained by adopting the training method of the neural network according to any one of claims 1-8, the data to be tested comprises pictures or voices, and the neural network is used for predicting text sequences corresponding to the pictures or voices;
acquiring a predicted probability distribution column sequence output after the neural network identifies and converts the data to be tested, wherein the predicted probability distribution column sequence comprises a plurality of probability distribution columns, and each probability distribution column comprises a plurality of probabilities;
and determining a predicted sequence corresponding to the data to be tested based on the predicted probability distribution column sequence output by the neural network, wherein the predicted sequence is a text sequence.
10. A training device for a neural network, the device comprising:
an acquisition module, configured to obtain, after n sample data selected from a training set database are input into a neural network to be trained, n probability distribution column sequences output after the neural network to be trained identifies and converts the n sample data, wherein the kth probability distribution column sequence in the n probability distribution column sequences comprises m probability distribution columns, the g-th probability distribution column in the kth probability distribution column sequence is the probability distribution of the category of the g-th unit data of the kth sample data in the n sample data, n, k, m and g are integers, k is not less than 1 and not more than n, g is not less than 1 and not more than m, a training set in the training set database comprises sample data and a calibration sequence of the sample data, the sample data comprises pictures or voices, and the calibration sequence is a text sequence;
a determining module, configured to determine the editing probability from the kth probability distribution column sequence to the kth calibration sequence, wherein the editing probability from the kth probability distribution column sequence to the kth calibration sequence is the probability of generating the kth calibration sequence from the kth probability distribution column sequence through editing operation, the kth calibration sequence is the calibration sequence of the kth sample data, and the calibration sequence of the kth sample data is a sequence containing elements corresponding to the true category of each unit data in the kth sample data;
and the optimization module is used for optimizing the neural network to be trained based on the editing probability from each probability distribution sequence in the n probability distribution sequences to its corresponding calibration sequence, so as to obtain a neural network for predicting the text sequence corresponding to a picture or a voice.
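The editing probability in the claims — the probability of generating the calibration sequence from the probability distribution sequence through editing operations — can be evaluated with a forward dynamic program. The sketch below is an illustration under stated assumptions, not the patented formulation: it allows only two editing operations per step, a "match" (consume one probability distribution and emit the next calibration element with its predicted probability) and a "delete" (consume one distribution and emit an assumed blank class), and omits insertions:

```python
import numpy as np

def edit_probability(dists, target, blank=0):
    """Forward DP over editing operations (assumed two-operation variant).

    dists:  (m, num_classes) array, each row a probability distribution.
    target: list of class ids (the calibration sequence).
    alpha[t, u] = probability of having produced target[:u] after
    consuming the first t probability distributions.
    """
    m, L = dists.shape[0], len(target)
    alpha = np.zeros((m + 1, L + 1))
    alpha[0, 0] = 1.0  # nothing consumed, nothing produced
    for t in range(1, m + 1):
        for u in range(L + 1):
            # 'delete': consume distribution t, emit the blank class
            p = alpha[t - 1, u] * dists[t - 1, blank]
            if u > 0:
                # 'match': consume distribution t, emit target[u-1]
                p += alpha[t - 1, u - 1] * dists[t - 1, target[u - 1]]
            alpha[t, u] = p
    return alpha[m, L]
```

During training, the optimization module would then minimize a loss such as the negative log of this probability, averaged over the n samples, so that the network assigns high probability to edit paths producing each calibration sequence.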
CN201811258926.5A 2018-10-26 2018-10-26 Training method, training device and sequence prediction method for neural network Active CN111105028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811258926.5A CN111105028B (en) 2018-10-26 2018-10-26 Training method, training device and sequence prediction method for neural network


Publications (2)

Publication Number Publication Date
CN111105028A CN111105028A (en) 2020-05-05
CN111105028B true CN111105028B (en) 2023-10-24

Family

ID=70418364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811258926.5A Active CN111105028B (en) 2018-10-26 2018-10-26 Training method, training device and sequence prediction method for neural network

Country Status (1)

Country Link
CN (1) CN111105028B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015003436A1 (en) * 2013-07-10 2015-01-15 Tencent Technology (Shenzhen) Company Limited Method and device for parallel processing in model training
WO2016117854A1 (en) * 2015-01-22 2016-07-28 Samsung Electronics Co., Ltd. Text editing apparatus and text editing method based on speech signal
US9600764B1 (en) * 2014-06-17 2017-03-21 Amazon Technologies, Inc. Markov-based sequence tagging using neural networks
CN107392311A (en) * 2016-05-17 2017-11-24 Alibaba Group Holding Limited Method and apparatus for sequence segmentation
CN108229474A (en) * 2017-12-29 2018-06-29 Beijing Megvii Technology Co., Ltd. License plate recognition method, device and electronic equipment
CN108288078A (en) * 2017-12-07 2018-07-17 Tencent Technology (Shenzhen) Company Limited Character recognition method, device and medium in an image
CN108596011A (en) * 2017-12-29 2018-09-28 Information Science Academy of China Electronics Technology Group Corporation Face attribute recognition method and device based on a combined deep network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017160393A1 (en) * 2016-03-18 2017-09-21 Google Inc. Globally normalized neural networks
US10983853B2 (en) * 2017-03-31 2021-04-20 Microsoft Technology Licensing, Llc Machine learning for input fuzzing
US10776697B2 (en) * 2017-04-18 2020-09-15 Huawei Technologies Co., Ltd. System and method for training a neural network

Also Published As

Publication number Publication date
CN111105028A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN110418210B (en) Video description generation method based on bidirectional cyclic neural network and depth output
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN105205448A (en) Character recognition model training method based on deep learning and recognition method thereof
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
Mathur et al. Camera2Caption: a real-time image caption generator
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
EP4211591A1 (en) Method and system for identifying citations within regulatory content
CN107273357B (en) Artificial intelligence-based word segmentation model correction method, device, equipment and medium
CN113657098B (en) Text error correction method, device, equipment and storage medium
CN111400494A (en) Sentiment analysis method based on GCN-Attention
WO2023093525A1 (en) Model training method, chinese text error correction method, electronic device, and storage medium
CN111931813A (en) CNN-based width learning classification method
CN115510864A (en) Chinese crop disease and pest named entity recognition method fused with domain dictionary
CN114359938A (en) Form identification method and device
CN110516240B (en) Semantic similarity calculation model DSSM technology based on Transformer
CN110059705B (en) OCR recognition result judgment method and device based on modeling
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN111105028B (en) Training method, training device and sequence prediction method for neural network
CN109753966A (en) Text recognition training system and method
CN111507103B (en) Self-training neural network word segmentation model using partial label set
CN112131879A (en) Relationship extraction system, method and device
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN107730002B (en) Intelligent fuzzy comparison method for remote control parameters of communication gateway machine
CN103793720A (en) Method and system for positioning eyes
CN109271494B (en) System for automatically extracting focus of Chinese question and answer sentences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant