CN111339755A - Automatic error correction method and device for office data - Google Patents


Info

Publication number
CN111339755A
CN111339755A (application CN201811458543.2A)
Authority
CN
China
Prior art keywords
office data
data
error
target
text
Prior art date
Legal status
Pending
Application number
CN201811458543.2A
Other languages
Chinese (zh)
Inventor
邢彪
张卷卷
凌啼
章淑敏
吕吉
林昊
姜晓辉
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Zhejiang Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201811458543.2A
Publication of CN111339755A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides an automatic error correction method and device for office data, wherein the method comprises the following steps: acquiring target office data and preprocessing it, including text cleaning of the target office data and text serialization of the cleaned text; inputting the text-serialized target office data into a trained office data error corrector and outputting the office data obtained after error correction of the target office data; judging whether the error-corrected office data is consistent with the target office data, and if not, determining that the target office data is erroneous and replacing the target office data with the error-corrected office data. The trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights. The invention realizes automatic error correction of office data, avoids network element faults to the maximum extent, and ensures the safety, stability and reliability of office data production.

Description

Automatic error correction method and device for office data
Technical Field
The embodiment of the invention relates to the technical field of communication and artificial intelligence, in particular to an automatic office data error correction method and device.
Background
Office data (bureau data) refers to the data configured on the various networks and network element devices in the core network domain of a communication network, and may include both the devices' configuration data and their service data. Communication equipment in an end office, such as switches and gateways, can communicate normally with equipment in other offices only after its office data has been configured. Accurate and complete office data production for the core network is therefore crucial to the smooth operation of core network end office equipment.
In the prior art, the accuracy of office data is ensured mainly by a "one person prepares, another person checks" mechanism; that is, whether office data has been produced correctly is judged by expert experience and manual inspection. Such reliance on manual work and expert experience is not only time-consuming and labor-intensive but also carries risks of missed and erroneous judgments. As network evolution accelerates and networks grow more complex, office data production is also becoming more complex, and the prior art can no longer meet the requirements of safe, stable and reliable office data production.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides an automatic error correction method and device for office data.
The embodiment of the invention provides an automatic error correction method for office data, which comprises the following steps:
acquiring target office data, and preprocessing the target office data, wherein the preprocessing comprises text cleaning of the target office data and text serialization of the target office data after the text cleaning;
inputting the text serialized target office data into a trained office data error corrector, and outputting office data obtained after error correction is carried out on the target office data;
judging whether the error-corrected office data is consistent with the target office data, if not, determining that the target office data is wrong, and replacing the target office data with the error-corrected office data;
the trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
The embodiment of the invention provides an automatic error correction device for office data, which comprises:
a first acquisition module, configured to acquire target office data and preprocess it, including performing text cleaning on the target office data and text serialization on the cleaned target office data;
the error correction module is used for inputting the text serialized target office data into a trained office data error corrector and outputting office data obtained after error correction is carried out on the target office data;
the judging module is used for judging whether the error-corrected office data is consistent with the target office data or not, if not, determining that the target office data is wrong, and replacing the target office data with the error-corrected office data;
the trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
An embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method are implemented as described above.
According to the automatic office data error correction method and device provided by the embodiment of the invention, target office data is acquired and preprocessed, including text cleaning of the target office data and text serialization of the cleaned text; the serialized target office data is input into a trained office data error corrector, which outputs the office data obtained after error correction; whether the error-corrected office data is consistent with the target office data is then judged, and if not, the target office data is determined to be erroneous and replaced with the error-corrected office data. The trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights. Automatic error correction of office data is thereby realized: not only can erroneous office data be identified, but the corresponding correct office data can also be output, avoiding network element faults to the greatest extent and guaranteeing the safety, stability and reliability of office data production.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of an automatic error correction method for office data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an attention mechanism provided by an embodiment of the present invention;
FIG. 3 is a diagram of a neural network model provided by an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for automatically error correcting office data according to an embodiment of the present invention;
fig. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a schematic flow chart of an office data automatic error correction method according to an embodiment of the present invention, and as shown in fig. 1, the office data automatic error correction method according to the embodiment includes:
s1, acquiring target office data, preprocessing the target office data, and performing text serialization on the target office data after text cleaning and text cleaning.
S2, inputting the text-serialized target office data into a trained office data error corrector, and outputting the office data obtained after error correction of the target office data.
The trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
S3, judging whether the error-corrected office data is consistent with the target office data; if not, determining that the target office data is erroneous and replacing the target office data with the error-corrected office data.
According to the automatic office data error correction method provided by this embodiment, target office data is acquired and preprocessed, including text cleaning of the target office data and text serialization of the cleaned text; the serialized target office data is input into a trained office data error corrector, which outputs the office data obtained after error correction; whether the error-corrected office data is consistent with the target office data is then judged, and if not, the target office data is determined to be erroneous and replaced with the error-corrected office data. The trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights. The method thus realizes automatic error correction of office data: it not only identifies erroneous office data but also outputs the corresponding correct office data, thereby avoiding network element faults to the maximum extent and ensuring the safety, stability and reliability of office data production.
Further, on the basis of the above embodiment, after the step S3 of determining whether the error-corrected office data is consistent with the target office data, the method further includes:
and if the error-corrected office data is consistent with the target office data, determining that the target office data is correct.
Further, on the basis of the foregoing embodiment, "performing text cleaning on the target office data and text serialization on the cleaned text" in step S1 of this embodiment may include:
cleaning the text of the target office data by filtering out all punctuation marks and unifying letter case (converting uppercase letters to lowercase), then performing text word segmentation and converting each segmented word into an integer, yielding an integer sequence.
In this way, the cleaned target office data text is serialized.
Further, on the basis of the above embodiment, before the step S2, the method of the present embodiment may further include steps P1-P3 not shown in the figure:
P1, obtaining the potentially erroneous office data and the corresponding correct office data in historical network element operations from the core network side.
P2, preprocessing the acquired potentially erroneous office data and the corresponding correct office data from the historical network element operations, including performing text cleaning on them and text serialization on the cleaned texts.
Further, this step may include: cleaning the texts of the acquired potentially erroneous office data and the corresponding correct office data by filtering out all punctuation marks and unifying letter case (converting uppercase letters to lowercase), then performing text word segmentation and converting each segmented word into an integer sequence.
Specifically, the potentially erroneous office data may be represented as:
X^p = (x_1^p, x_2^p, …, x_N^p)
where x_N^p is the N-th word of the input office data and N is a positive integer.
The corresponding error-corrected office data may be represented as:
X^c = (x_1^c, x_2^c, …, x_M^c)
where x_M^c is the M-th word of the output office data and M is a positive integer.
Specifically, the maximum text length in the potentially erroneous office data set is taken as N = BureauData_length and its dictionary size as BureauData_vocab_size; the maximum text length in the error-corrected office data set is taken as M = CorrectedBureauData_length and its dictionary size as CorrectedBureauData_vocab_size. The total data set is divided into a training set (90%), used to train the model, and a test set (10%), used to evaluate it. Each piece of input potentially erroneous office data and output error-corrected office data must be encoded as integers, and sequences shorter than the maximum text length are automatically zero-padded. Word embedding is used for the input sequence, and one-hot encoding (one hot encoding) is used for the output sequence. Text segmentation and serialization are implemented through the Tokenizer class in keras (keras.preprocessing.text.Tokenizer), which maps each word in the potentially erroneous and error-corrected office data texts to an integer, for example: ["execute": 40, "error": 105, "info": 8, "update": 278, "on": 89, "agent": 164, "modify": 59, "the": 21, "interleaved": 303, "command": 231, ...]. A minimal preprocessing sketch is given below.
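As a concrete illustration, the following is a minimal preprocessing sketch in Keras along the lines described above; the sample office data texts and the single shared dictionary are hypothetical placeholders, not the embodiment's actual data pipeline.

```python
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

# Hypothetical cleaned office data texts (punctuation filtered, lower-cased).
wrong_texts   = ["modify sccp glbtitl gtar e164 intal tt1 dn k861729349 nes hzhstp",
                 "modify sccp glbtitl gtar e164 intal tt3 dn k861729350 nes hzhstp"]
correct_texts = ["modify sccp glbtitl gtar e164 intal tt0 dn k861729349 nes hzhstp",
                 "modify sccp glbtitl gtar e164 intal tt0 dn k861729350 nes hzhstp"]

# Map every word to an integer, as in the Tokenizer example above.
# A single shared dictionary is a simplification of the two vocab sizes in the text.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(wrong_texts + correct_texts)
vocab_size = len(tokenizer.word_index) + 1       # +1 because index 0 is the pad value

wrong_seqs   = tokenizer.texts_to_sequences(wrong_texts)
correct_seqs = tokenizer.texts_to_sequences(correct_texts)

# N = BureauData_length, M = CorrectedBureauData_length; sequences shorter than
# the maximum text length are automatically zero-padded.
BureauData_length          = max(len(s) for s in wrong_seqs)
CorrectedBureauData_length = max(len(s) for s in correct_seqs)
X = pad_sequences(wrong_seqs,   maxlen=BureauData_length,          padding='post')
Y = pad_sequences(correct_seqs, maxlen=CorrectedBureauData_length, padding='post')

# One-hot encode the output sequence; the input stays as integers for embedding.
Y = to_categorical(Y, num_classes=vocab_size)

# 90% training set, 10% test set.
n_train = int(0.9 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]
```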
P3, training the attention-based encoder-decoder recurrent neural network on the text-serialized potentially erroneous office data and the corresponding correct office data until the model converges, and using the converged model weights as the trained office data error corrector.
It will be understood that this embodiment employs an attention-based encoder-decoder recurrent neural network, in which:
The encoder-decoder recurrent neural network refers to a model with an encoder-decoder structure whose neural network type is the recurrent neural network. Encoder-decoder models have been used in the field of machine translation, i.e. translating from a source language into a target language. This embodiment borrows that idea and applies the encoder-decoder model to the office data error correction scenario: an encoder encodes the potentially erroneous office data into a fixed-length vector, the context vector, and a decoder generates the correct office data from that encoded context vector. The encoder uses a bidirectional long short-term memory neural network and the decoder uses a long short-term memory neural network, both of which are types of recurrent neural network.
Long short-term memory (LSTM) is a special type of recurrent neural network (RNN), i.e. a network in which the same cell is reused across time steps. An LSTM can learn long-term dependencies: by controlling how long values are kept in its internal state, it can remember long-range information, which makes it suitable for learning long sequences. Each LSTM neuron has four inputs and one output, contains a cell that stores the memorized value, and has three gates: a forget gate, an input gate and an output gate. The long short-term memory neural network therefore performs well when learning long sequences.
Bidirectional long short-term memory (BiLSTM) is a variant of LSTM in which the model learns from two directions: the input layer's data is processed both forward and backward, and the two output hidden states are concatenated before being fed to the next layer. The principle is the same as LSTM, with only the backward pass and the concatenation added. The core of the BiLSTM structure is to split an ordinary unidirectional RNN into two directions, one following the time order and one following the reversed time order, so that the output at the current time step can use information from both directions. The two RNNs do not share state: the forward RNN's output state is passed only along the forward RNN and the backward RNN's output state only along the backward RNN, with no direct connection between them. Each time step's input is fed to both RNNs, each of which produces an output from its own state, and the two outputs are joined at the Bi-RNN output node to form the final output. A minimal Keras sketch of this bidirectional wrapping follows.
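The sketch below assumes the Keras framework named later in this description; the sequence length and layer sizes are illustrative placeholders only.

```python
from keras.layers import Input, LSTM, Bidirectional
from keras.models import Model

# 10 time steps of 128-dimensional features; sizes are illustrative only.
inputs = Input(shape=(10, 128))
# Two LSTMs read the sequence forward and backward without sharing state;
# their per-time-step hidden states are concatenated, giving 2 x 32 = 64 units.
outputs = Bidirectional(LSTM(32, return_sequences=True),
                        merge_mode='concat')(inputs)
model = Model(inputs, outputs)      # output shape: (None, 10, 64)
```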
The attention mechanism addresses a limitation of the encoder-decoder structure, namely its poor performance when the input or output sequence is long. In an attention model, the encoder passes all of its hidden states to the decoder, so the decoder receives a richer context from the encoder, and the decoder learns where in that richer context it needs to focus when predicting the output at each time step. Before emitting each output, the attention decoder performs an extra step so as to focus on the parts of the input most relevant to its output: it looks at all the hidden states received from the encoder (each of which is most closely associated with one word of the input sentence), assigns a score to each hidden state, and multiplies each hidden state by its softmax-normalized score, thereby amplifying high-scoring hidden states and suppressing low-scoring ones. This scoring operation is performed at every decoder time step. The attention mechanism thus allows the model to focus on the relevant parts of the input sequence as needed: the attention network assigns each input an attention weight, closer to 1 the more relevant the input is to the current operation and closer to 0 otherwise, and these weights are recalculated at every output step. The attention mechanism is illustrated in fig. 2, wherein:
T_x is the number of input time steps; T_y is the number of output time steps; attention_i is the attention weight at output time step i, and attention_{T_y} is the attention weight at the T_y-th output time step; x_{T_x} is the T_x-th input; y_{T_y} is the T_y-th output; c_i is the context at output time step i.
1) Compute the attention weights, a vector of length T_x whose entries sum to 1:
attention_i = softmax(Dense(x, y_{i-1}))
2) Compute the sum of the products of the attention weights and the inputs; the result is the context:
c_i = Σ_{t=1}^{T_x} attention_t · x_t
3) Feed the resulting context into the long short-term memory layer of the model decoder:
y_i = LSTM(c_i).
An attention-based bidirectional long short-term memory neural network is built and trained with the open-source deep learning frameworks tensorflow and keras. The neural network model of this embodiment consists of two parts, an encoder and a decoder; the model is shown in fig. 3.
Taking an Alcatel Bell HSS (Home Subscriber Server) network element as an example:
Input office data (modifying a GT address):
MODIFY-SCCP-GLBTITL:GTAR=E164_INTAL_TT1,DN=K′861729349,NES=HZHSTP.
Outputting the error-corrected office data:
MODIFY-SCCP-GLBTITL:GTAR=E164_INTAL_TT0,DN=K′861729349,NES=HZHSTP.
The corrected part is: GTAR=E164_INTAL_TT1 should be corrected to GTAR=E164_INTAL_TT0.
The model consists of the following layers:
The first layer is the input layer: it receives the encoded office data, and the encoded sequence of each piece of office data has length BureauData_length, so the shape of this layer's output data is (None, BureauData_length).
The second layer is an embedding layer: each word is converted into a vector by word embedding. The input dimension is BureauData_vocab_size, the output is set to map each word into a 128-dimensional space vector, and the input sequence length is BureauData_length, so the shape of this layer's output data is (None, BureauData_length, 128). The role of this layer is to map the words in the office data to vectors (word embeddings), converting the integer for each word of the instruction text output by the Tokenizer into a fixed-shape 128-dimensional vector.
The third layer is the BiLSTM encoding layer: it contains 64 BiLSTM neurons, the activation function is set to "relu", and the shape of this layer's output data is (None, BureauData_length, 64);
The fourth layer is the attention LSTM decoding layer: it contains 128 attention-mechanism LSTM neurons, and the activation function is set to "relu". The shape of this layer's output data is (None, BureauData_length, 128);
The fifth layer is a fully connected (Dense) layer, the output layer: it contains CorrectedBureauData_vocab_size fully connected Dense neurons, the activation function is set to "softmax", and the softmax result is fed to the multi-class cross entropy loss function. The shape of this layer's output data is (None, CorrectedBureauData_vocab_size); it converts the output shape of the attention decoding layer into the dimension of the final output.
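A sketch of these five layers in Keras might look as follows. Keras has no built-in layer matching the attention LSTM decoding layer described above, so a plain LSTM stands in for it here, and the length and vocabulary sizes are placeholder values.

```python
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

BureauData_length              = 50    # illustrative values
BureauData_vocab_size          = 500
CorrectedBureauData_vocab_size = 500

model = Sequential([
    # layers 1-2: input + embedding, integers -> 128-dimensional word vectors,
    # output shape (None, BureauData_length, 128)
    Embedding(BureauData_vocab_size, 128, input_length=BureauData_length),
    # layer 3: BiLSTM encoding layer (2 x 32 concatenated directions = 64),
    # output shape (None, BureauData_length, 64)
    Bidirectional(LSTM(32, activation='relu', return_sequences=True)),
    # layer 4: decoding layer, 128 units; a plain LSTM stands in for the
    # attention LSTM, output shape (None, BureauData_length, 128)
    LSTM(128, activation='relu', return_sequences=True),
    # layer 5: Dense output layer, softmax distribution over the
    # corrected-data vocabulary at every time step
    Dense(CorrectedBureauData_vocab_size, activation='softmax'),
])
model.summary()
```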
Model training: the number of training rounds is set to 1000 (epochs=1000) and the batch size to 100 (batch_size=100); categorical cross entropy is selected as the loss function, i.e. the objective function (loss='categorical_crossentropy'), and the adam optimizer is chosen as the gradient descent optimization algorithm to improve on the learning speed of traditional gradient descent. Through gradient descent the neural network can find the weights that minimize the objective function, and it learns these weights automatically during training. The model is trained on the training set so that the objective function becomes as small as possible, and the model is evaluated on the test set after each round of training. As the number of training rounds increases, the training error gradually decreases and the model gradually converges; the converged model is tested on the test set, and finally the model weights are exported.
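A minimal compile-and-fit sketch matching this configuration, reusing the model and data arrays from the earlier sketches (the weight file name is hypothetical):

```python
# `model`, `X_train`, `Y_train`, `X_test`, `Y_test` come from the earlier sketches.
model.compile(loss='categorical_crossentropy',   # multi-class cross entropy
              optimizer='adam')
model.fit(X_train, Y_train,
          epochs=1000, batch_size=100,
          validation_data=(X_test, Y_test))      # evaluate after each round
model.save_weights('bureau_data_corrector.h5')   # export the converged weights
```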
After the model weights have been exported, the saved weights are loaded directly whenever online office data error correction is needed. After preprocessing, the serialized office data text is input into the trained office data error corrector (i.e. the trained attention-based bidirectional long short-term memory neural network model), and the data passes through 1 input layer, 3 hidden layers (1 embedding layer, 1 BiLSTM encoding layer and 1 attention LSTM decoding layer) and 1 output layer (the Dense fully connected layer). Finally, error-corrected office data is output for the office data entered by the user in real time: if the error-corrected office data is consistent with the input office data, the office data is correct; if inconsistent, the input office data is replaced with the error-corrected office data.
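The online flow can be sketched as follows, reusing `model`, `tokenizer` and `BureauData_length` from the earlier sketches; the helper function name is hypothetical.

```python
from keras.preprocessing.sequence import pad_sequences

model.load_weights('bureau_data_corrector.h5')   # load the saved model weights
index_word = {i: w for w, i in tokenizer.word_index.items()}

def correct_office_data(text):
    """Run one piece of cleaned office data through the trained corrector."""
    seq = pad_sequences(tokenizer.texts_to_sequences([text]),
                        maxlen=BureauData_length, padding='post')
    probs = model.predict(seq)[0]                # shape (length, vocab_size)
    ids = probs.argmax(axis=-1)                  # most probable word per step
    return ' '.join(index_word[i] for i in ids if i != 0)   # drop padding

incoming  = "modify sccp glbtitl gtar e164 intal tt1 dn k861729349 nes hzhstp"
corrected = correct_office_data(incoming)
if corrected != incoming:          # inconsistent -> the input was erroneous
    incoming = corrected           # replace with the error-corrected data
```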
The automatic office data error correction method provided by the embodiment of the invention can automatically correct office data: it not only identifies erroneous office data but also outputs the corresponding correct office data. In this embodiment, the converged model weights serve as the trained office data error corrector, combining the strengths of the bidirectional long short-term memory neural network in text learning with the ability of the attention mechanism to focus on the relevant parts of the input sequence as needed. Error-corrected office data is output in real time for each piece of office data entered by the user; if the error-corrected office data is consistent with the input, the office data is correct, and if inconsistent, the input office data is replaced with the error-corrected office data, so that network element faults can be avoided to the greatest extent and the safety, stability and reliability of office data production can be guaranteed.
Fig. 4 is a schematic structural diagram of an office data automatic error correction apparatus according to an embodiment of the present invention, and as shown in fig. 4, the office data automatic error correction apparatus according to the embodiment includes: a first obtaining module 41, an error correcting module 42 and a judging module 43; wherein:
the first obtaining module 41 is configured to obtain target office data and preprocess it, including performing text cleaning on the target office data and text serialization on the cleaned text;
the error correction module 42 is configured to input the text-serialized target office data into a trained office data error corrector, and output office data obtained by performing error correction on the target office data;
the judging module 43 is configured to judge whether the error-corrected office data is consistent with the target office data, and if not, determine that the target office data is incorrect, and replace the target office data with the error-corrected office data;
the trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
Specifically, the first obtaining module 41 obtains target office data and preprocesses it, including performing text cleaning on the target office data and text serialization on the cleaned text; the error correction module 42 inputs the serialized target office data into the trained office data error corrector and outputs the office data obtained after error correction; the judging module 43 judges whether the error-corrected office data is consistent with the target office data, and if not, determines that the target office data is erroneous and replaces it with the error-corrected office data.
The automatic office data error correction device provided by the embodiment of the invention can realize automatic error correction of office data, not only can identify the error office data, but also can output the corresponding correct office data, thereby avoiding the occurrence of network element faults to the greatest extent and ensuring the safety, stability and reliability of office data production.
Further, on the basis of the above embodiment, the judging module may be further configured to:
after judging whether the error-corrected office data is consistent with the target office data, determine that the target office data is correct if the error-corrected office data is consistent with the target office data.
Further, on the basis of the foregoing embodiment, "performing text cleaning on the target office data and text serialization on the cleaned text" in the first obtaining module 41 may include:
cleaning the text of the target office data by filtering out all punctuation marks and unifying letter case (converting uppercase letters to lowercase), then performing text word segmentation and converting each segmented word into an integer sequence.
In this way, the cleaned target office data text is serialized.
Further, on the basis of the above embodiments, the apparatus of this embodiment may further include the following modules, not shown in the figures:
a second obtaining module, configured to obtain, from a core network side, potentially erroneous office data and corresponding correct office data in historical network element operations;
a preprocessing module, configured to preprocess the acquired potentially erroneous office data and the corresponding correct office data from the historical network element operations, including performing text cleaning on them and text serialization on the cleaned texts;
and a training module, configured to train the attention-based encoder-decoder recurrent neural network on the text-serialized potentially erroneous office data and the corresponding correct office data until the model converges, and use the converged model weights as the trained office data error corrector.
Further, the preprocessing module may be specifically configured to clean the texts of the acquired potentially erroneous office data and the corresponding correct office data from the historical network element operations by filtering out all punctuation marks and unifying letter case (converting uppercase letters to lowercase), then perform text word segmentation and convert each segmented word into an integer sequence.
The automatic office data error correction device provided by the embodiment of the invention can automatically correct office data: it not only identifies erroneous office data but also outputs the corresponding correct office data. In this embodiment, the converged model weights serve as the trained office data error corrector, combining the strengths of the bidirectional long short-term memory neural network in text learning with the ability of the attention mechanism to focus on the relevant parts of the input sequence as needed. Error-corrected office data is output in real time for each piece of office data entered by the user; if the error-corrected office data is consistent with the input, the office data is correct, and if inconsistent, the input office data is replaced with the error-corrected office data, so that network element faults can be avoided to the greatest extent and the safety, stability and reliability of office data production can be guaranteed.
The automatic error correction device for office data provided by the embodiment of the present invention may be used to implement the technical solutions of the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and are not described herein again.
Fig. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device may include a memory 502, a processor 501, and a computer program stored in the memory 502 and executable on the processor 501; the processor 501 implements the steps of the above method when executing the program, for example: acquiring target office data and preprocessing it, including text cleaning of the target office data and text serialization of the cleaned text; inputting the serialized target office data into a trained office data error corrector and outputting the office data obtained after error correction; judging whether the error-corrected office data is consistent with the target office data, and if not, determining that the target office data is erroneous and replacing it with the error-corrected office data; wherein the trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
An embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method, for example: acquiring target office data and preprocessing it, including text cleaning of the target office data and text serialization of the cleaned text; inputting the serialized target office data into a trained office data error corrector and outputting the office data obtained after error correction; judging whether the error-corrected office data is consistent with the target office data, and if not, determining that the target office data is erroneous and replacing it with the error-corrected office data; wherein the trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An automatic error correction method for office data, comprising:
acquiring target office data, and preprocessing the target office data, wherein the preprocessing comprises text cleaning of the target office data and text serialization of the target office data after the text cleaning;
inputting the text serialized target office data into a trained office data error corrector, and outputting office data obtained after error correction is carried out on the target office data;
judging whether the error-corrected office data is consistent with the target office data, if not, determining that the target office data is wrong, and replacing the target office data with the error-corrected office data;
the trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
2. The method of claim 1, wherein performing text cleaning on the target office data and text serialization on the cleaned text comprises:
cleaning the text of the target office data by filtering out all punctuation marks and unifying letter case (converting uppercase letters to lowercase), then performing text word segmentation and converting each segmented word into an integer sequence.
3. The method of claim 1, wherein before inputting the text-serialized target office data into the attention-based encoder-decoder recurrent neural network model and outputting the office data obtained by error correction of the target office data, the method further comprises:
acquiring potential error office data and corresponding correct office data in historical network element operation from a core network side;
preprocessing the acquired potentially erroneous office data and the corresponding correct office data from the historical network element operations, including performing text cleaning on them and text serialization on the cleaned texts;
training the attention-based encoder-decoder recurrent neural network on the text-serialized potentially erroneous office data and the corresponding correct office data until the model converges, and using the converged model weights as the trained office data error corrector.
4. The method of claim 3, wherein preprocessing the acquired potentially erroneous office data and the corresponding correct office data from the historical network element operations, including performing text cleaning on them and text serialization on the cleaned texts, comprises:
cleaning the texts of the acquired potentially erroneous office data and the corresponding correct office data by filtering out all punctuation marks and unifying letter case (converting uppercase letters to lowercase), then performing text word segmentation and converting each segmented word into an integer sequence.
5. The method of claim 1, wherein after determining whether the error-corrected office data is consistent with the target office data, the method further comprises:
and if the error-corrected office data is consistent with the target office data, determining that the target office data is correct.
6. An apparatus for automatically correcting error in office data, comprising:
a first acquisition module, configured to acquire target office data and preprocess it, including performing text cleaning on the target office data and text serialization on the cleaned target office data;
the error correction module is used for inputting the text serialized target office data into a trained office data error corrector and outputting office data obtained after error correction is carried out on the target office data;
the judging module is used for judging whether the error-corrected office data is consistent with the target office data or not, if not, determining that the target office data is wrong, and replacing the target office data with the error-corrected office data;
the trained office data error corrector is obtained by training an attention-based encoder-decoder recurrent neural network on the acquired potentially erroneous office data from historical network element operations and the corresponding correct office data until the model converges, yielding the converged model weights.
7. The apparatus of claim 6, further comprising:
a second obtaining module, configured to obtain, from a core network side, potentially erroneous office data and corresponding correct office data in historical network element operations;
a preprocessing module, configured to preprocess the acquired potentially erroneous office data and the corresponding correct office data from the historical network element operations, including performing text cleaning on them and text serialization on the cleaned texts;
and a training module, configured to train the attention-based encoder-decoder recurrent neural network on the text-serialized potentially erroneous office data and the corresponding correct office data until the model converges, and use the converged model weights as the trained office data error corrector.
8. The apparatus of claim 6, wherein the judging module is further configured to:
after judging whether the error-corrected office data is consistent with the target office data, determine that the target office data is correct if the error-corrected office data is consistent with the target office data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 5 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN201811458543.2A 2018-11-30 2018-11-30 Automatic error correction method and device for office data Pending CN111339755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811458543.2A CN111339755A (en) 2018-11-30 2018-11-30 Automatic error correction method and device for office data

Publications (1)

Publication Number Publication Date
CN111339755A 2020-06-26

Family

ID=71185111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811458543.2A Pending CN111339755A (en) 2018-11-30 2018-11-30 Automatic error correction method and device for office data

Country Status (1)

Country Link
CN (1) CN111339755A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279149A (en) * 2015-10-21 2016-01-27 上海应用技术学院 Chinese text automatic correction method
CN105550173A (en) * 2016-02-06 2016-05-04 北京京东尚科信息技术有限公司 Text correction method and device
CN106598939A (en) * 2016-10-21 2017-04-26 北京三快在线科技有限公司 Method and device for text error correction, server and storage medium
CN107357775A (en) * 2017-06-05 2017-11-17 百度在线网络技术(北京)有限公司 The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence
CN107451106A (en) * 2017-07-26 2017-12-08 阿里巴巴集团控股有限公司 Text method and device for correcting, electronic equipment
CN108052499A (en) * 2017-11-20 2018-05-18 北京百度网讯科技有限公司 Text error correction method, device and computer-readable medium based on artificial intelligence
CN108108349A (en) * 2017-11-20 2018-06-01 北京百度网讯科技有限公司 Long text error correction method, device and computer-readable medium based on artificial intelligence
CN107977356A (en) * 2017-11-21 2018-05-01 新疆科大讯飞信息科技有限责任公司 Method and device for correcting recognized text
CN108874174A (en) * 2018-05-29 2018-11-23 腾讯科技(深圳)有限公司 A kind of text error correction method, device and relevant device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AHMADI, SINA: "Attention-based encoder-decoder networks for spelling and grammatical error correction", pages 11-67 *
何之源 (HE Zhiyuan): "21 Projects for Mastering Deep Learning: Practical Details Based on TensorFlow", vol. 1, Beijing: Publishing House of Electronics Industry, pages 291-294 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255332A (en) * 2021-07-15 2021-08-13 北京百度网讯科技有限公司 Training and text error correction method and device for text error correction model
CN113255332B (en) * 2021-07-15 2021-12-24 北京百度网讯科技有限公司 Training and text error correction method and device for text error correction model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200626