CN110377889A - Text editing method and system based on a feedforward sequential memory neural network - Google Patents

Text editing method and system based on a feedforward sequential memory neural network Download PDF

Info

Publication number
CN110377889A
CN110377889A (application CN201910487145.1A)
Authority
CN
China
Prior art keywords
feedforward sequential memory neural network
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910487145.1A
Other languages
Chinese (zh)
Other versions
CN110377889B (en)
Inventor
吴立刚
刘迪
邱镇
黄晓光
浦正国
梁翀
韩涛
张天奇
余江斌
宋杰
何东
郭庆
吴小华
胡心颖
周伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
National Network Information and Communication Industry Group Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
National Network Information and Communication Industry Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Zhejiang Electric Power Co Ltd, Anhui Jiyuan Software Co Ltd, National Network Information and Communication Industry Group Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910487145.1A priority Critical patent/CN110377889B/en
Publication of CN110377889A publication Critical patent/CN110377889A/en
Application granted granted Critical
Publication of CN110377889B publication Critical patent/CN110377889B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a text editing method based on a feedforward sequential memory neural network, belonging to the technical field of speech signal processing, comprising: acquiring the original text to be edited; receiving editing voice data; performing speech recognition on the editing voice data using an improved feedforward sequential memory neural network to obtain an edit command; and performing semantic understanding on the edit command and executing the edit command. By performing speech recognition with the improved feedforward sequential memory neural network, the exemplary technical solution of the present invention makes text editing more accurate and efficient.

Description

Text editing method and system based on a feedforward sequential memory neural network
Technical field
The invention belongs to the technical field of speech signal processing, and specifically relates to a text editing method and system based on a feedforward sequential memory neural network.
Background technique
With the popularization of mobile phones, people receive a large amount of text information every day on portable devices such as mobile phones and tablet computers: for example, short messages, messages pushed by instant-messaging or other software, web page content, and text news. When people want to edit content of interest in such text, they must first position the cursor at that content and then perform subsequent operations on the selected text, such as inserting new text at the cursor position or replacing the selected text; this editing process is complicated and inconvenient. An existing technique receives voice data entered by the user and then performs the corresponding edit operation on the edit object according to the voice data. In this way, when editing text, the user can not only select the edit object in the text directly and quickly, without complicated text-selection operations, but can also edit the edit object directly through voice input, which simplifies the text editing process. However, current approaches execute the operation directly after receiving the voice data, without any processing of the voice; in some far-field and strongly noise-interfered situations, the performance of the speech recognition system is not ideal, making the text editing inaccurate.
Summary of the invention
In order to overcome the above deficiencies in the prior art, the purpose of the present invention is to provide a text editing method based on a feedforward sequential memory neural network that performs speech recognition with an improved feedforward sequential memory neural network, making text editing more accurate and efficient.
In order to solve the above-mentioned technical problem, the present invention adopts the following technical scheme:
In one aspect, the present invention provides a text editing method based on a feedforward sequential memory neural network, with the following specific steps:
S1: acquiring the original text to be edited;
S2: receiving editing voice data;
S3: performing speech recognition on the editing voice data using an improved feedforward sequential memory neural network to obtain an edit command;
S4: performing semantic understanding on the edit command and executing the edit command.
Further preferably, the improved feedforward sequential memory neural network inserts low-dimensional linear projection layers between the hidden layers of a feedforward fully connected neural network, places the memory modules on the linear projection layers, and adds skip connections between adjacent memory modules, so that the output of a lower memory module is added directly to that of a higher memory module.
Further preferably, the memory module is a tapped-delay structure that encodes the hidden-layer outputs of the current moment and of preceding moments into a fixed representation through a set of coefficients.
Further preferably, the operation of the memory module uses scalar-based or vector-based encoding.
Further preferably, the encoding of the memory module introduces stride factors.
In another aspect, the present invention also provides a text editing system based on a feedforward sequential memory neural network, comprising:
an acquisition unit, configured to acquire the original text to be edited;
a receiving unit, configured to receive editing voice data;
a recognition unit, configured to perform speech recognition on the editing voice data using an improved feedforward sequential memory neural network to obtain an edit command; and
an output unit, configured to perform semantic understanding on the edit command, execute the edit command, and output the edited text.
In another aspect, the present invention also provides a device, comprising:
one or more processors; and
a memory for storing one or more programs,
such that, when the one or more programs are executed by the one or more processors, the one or more processors carry out any of the exemplary text editing methods of the present invention based on a feedforward sequential memory neural network.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the exemplary text editing methods of the present invention based on a feedforward sequential memory neural network.
Compared with the prior art, the invention has the following beneficial effects:
In the exemplary text editing method based on a feedforward sequential memory neural network, the original text to be edited is acquired, the voice data entered by the user is received, and the corresponding edit operation is then performed on the edit object according to the voice data. In this way, when editing text, the user can not only select the edit object in the text directly and quickly, without complicated text-selection operations, but can also edit the edit object directly through voice input, simplifying the text editing process. In addition, because speech recognition is performed on the editing voice data with an improved feedforward sequential memory neural network, text editing is more accurate and efficient.
Detailed description of the invention
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is a flow diagram of one embodiment of the invention;
Fig. 2 is a structural block diagram of the improved feedforward sequential memory neural network.
Specific embodiment
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention, not to restrict it. It should also be noted that, for convenience of description, only the parts relevant to the invention are illustrated in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a text editing method based on a feedforward sequential memory neural network, with the following specific steps:
S1: acquiring the original text to be edited;
S2: receiving editing voice data;
S3: performing speech recognition on the editing voice data using an improved feedforward sequential memory neural network to obtain an edit command;
S4: performing semantic understanding on the edit command and executing the edit command.
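As a concrete illustration of steps S3-S4, the sketch below shows how a recognized and semantically parsed edit command might be applied to the original text. The command schema (`op`/`target`/`value`) and the function name are illustrative assumptions; the patent does not specify a command format:

```python
def execute_edit(text: str, command: dict) -> str:
    """Apply a parsed edit command to the text (step S4).

    The command schema (op/target/value) is an illustrative
    assumption, not part of the patent.
    """
    op = command["op"]
    if op == "replace":
        # Replace the first occurrence of the target phrase.
        return text.replace(command["target"], command["value"], 1)
    if op == "insert_after":
        # Insert new text immediately after the target phrase.
        i = text.find(command["target"])
        if i < 0:
            raise ValueError(f"target not found: {command['target']!r}")
        i += len(command["target"])
        return text[:i] + command["value"] + text[i:]
    if op == "delete":
        # Delete the first occurrence of the target phrase.
        return text.replace(command["target"], "", 1)
    raise ValueError(f"unknown edit op: {op}")

# A spoken command such as "replace cat with dog", once recognized (S3)
# and semantically understood (S4), could be executed as:
print(execute_edit("the cat sat", {"op": "replace", "target": "cat", "value": "dog"}))  # prints "the dog sat"
```

In this sketch the semantic-understanding step is assumed to have already mapped the recognized command text to the structured command dictionary.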
The improved feedforward sequential memory neural network inserts low-dimensional linear projection layers between the hidden layers of a feedforward fully connected neural network, places the memory modules on the linear projection layers, and adds skip connections between adjacent memory modules, so that the output of a lower memory module is added directly to that of a higher memory module.
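The layer structure just described — a low-rank linear projection after the hidden layer, a memory module on the projection, and a skip connection that adds the output of the adjacent lower memory module — can be sketched in NumPy as follows. All function and parameter names, and the identity skip mapping, are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def relu(x):
    # ReLU nonlinearity for the hidden layer (activation choice is an assumption).
    return np.maximum(x, 0.0)

def cfsmn_layer(x, W_h, b_h, W_p, memory_fn, prev_memory=None):
    """One improved cFSMN layer (illustrative sketch).

    x: (T, D_in) input sequence; W_h, b_h: hidden-layer weights;
    W_p: (D_hidden, D_proj) low-rank linear projection;
    memory_fn: memory-module encoding applied to the projection;
    prev_memory: memory output of the adjacent lower layer, added
    via a skip connection (identity mapping assumed).
    """
    h = relu(x @ W_h + b_h)      # standard hidden layer
    p = h @ W_p                  # low-dimensional linear projection
    m = memory_fn(p)             # tapped-delay memory module on the projection
    if prev_memory is not None:
        m = m + prev_memory      # skip connection between adjacent memory modules
    return m
```

When stacking layers, each layer's memory output would serve both as the next layer's input and as its `prev_memory` skip input (assuming matching projection dimensions), so that lower memory outputs accumulate directly into higher ones.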
The memory module is a tapped-delay structure that encodes the hidden-layer outputs of the current moment and of preceding moments into a fixed representation through a set of coefficients.
The operation of the memory module uses scalar-based or vector-based encoding.
The encoding of the memory module introduces stride factors; the specific calculation formula is as follows:

$$\tilde{p}_t^{\,\ell} = H\!\left(\tilde{p}_t^{\,\ell-1}\right) + p_t^{\,\ell} + \sum_{i=0}^{N_1} a_i^{\ell} \odot p_{t-s_1\cdot i}^{\,\ell} + \sum_{j=1}^{N_2} c_j^{\ell} \odot p_{t+s_2\cdot j}^{\,\ell} \qquad (1)$$

where $\tilde{p}_t^{\,\ell-1}$ represents the output of the memory module of the previous cFSMN layer, $H(\cdot)$ is the skip-connection mapping, $p_t^{\,\ell}$ is the projection-layer output at moment $t$, $a_i^{\ell}$ and $c_j^{\ell}$ are the encoding coefficients, and $s_1$ and $s_2$ respectively represent the strides for looking back into the history and ahead into the future. If $s_1 = 2$, every other moment's output is taken as input when encoding the history; in this way, with the same filter order, the module can see a longer history and thus model long-term dependency much more effectively.
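A minimal NumPy sketch of the strided tapped-delay encoding of formula (1) follows, with the skip-connection term omitted and zero context assumed beyond the utterance boundary; the function and parameter names are illustrative assumptions:

```python
import numpy as np

def cfsmn_memory(p, a, c, s1=1, s2=1):
    """Strided memory-module encoding (sketch of formula (1),
    without the skip-connection term).

    p: (T, D) projection-layer outputs; a: (N1+1, D) lookback
    coefficients (a[0] weights the current frame); c: (N2, D)
    lookahead coefficients; s1, s2: lookback/lookahead strides.
    Frames outside [0, T) are treated as zeros.
    """
    T, D = p.shape
    n1 = a.shape[0] - 1            # lookback filter order N1
    n2 = c.shape[0]                # lookahead filter order N2
    out = np.zeros_like(p, dtype=float)
    for t in range(T):
        m = p[t].astype(float).copy()      # identity term p_t
        for i in range(n1 + 1):            # lookback sum, stride s1
            idx = t - s1 * i
            if idx >= 0:
                m += a[i] * p[idx]
        for j in range(1, n2 + 1):         # lookahead sum, stride s2
            idx = t + s2 * j
            if idx < T:
                m += c[j - 1] * p[idx]
        out[t] = m
    return out
```

With `s1=2` the lookback sum samples every other frame, so the same filter order N1 spans a history twice as long, which is exactly the effect the stride factor is introduced for.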
The performance, model parameter count, and per-iteration training time on the SWB database of speech recognition systems using the improved feedforward sequential memory neural network (cFSMN) of this embodiment and the existing Sigmoid-DNN, LSTM, BLSTM, sFSMN, and vFSMN models are compared in Table 1:
Table 1: performance, model parameter count, and per-iteration training time of the speech recognition systems on the SWB database
The experimental results show that models which can effectively model long-term dependency, such as LSTM and FSMN, obtain a significant performance boost over DNN. One training iteration of the LSTM takes 9.5 hours, while the BLSTM needs 23.2 hours. This is because the NVIDIA Tesla K20 GPU has only 3 GB of memory, so the BPTT-trained BLSTM can process only 16 utterances in parallel, whereas the LSTM can process 64 utterances in parallel. The proposed vFSMN obtains a modest performance boost over the BLSTM. The model structure of the vFSMN is simpler and trains faster: one iteration takes roughly 6.9 hours, about a 3x training speedup over the BLSTM. However, the vFSMN has more model parameters than the BLSTM. Furthermore, the proposed cFSMN reduces the total model parameters to 74 MB, a reduction of about 60% compared with the BLSTM. More importantly, each iteration takes only 3.0 hours, roughly a 7x training speedup over the BLSTM. Moreover, the cFSMN-based model obtains a word error rate of 12.5%, an absolute performance improvement of 0.9 points over the BLSTM.
The improved feedforward sequential memory neural network is denoted 216-N×[2048-P(N1,N2)]-M×2048-P-8911, where N and M respectively represent the numbers of cFSMN layers and standard fully connected layers, P is the number of nodes in the low-rank linear projection layer, and N1 and N2 respectively represent the lookback and lookahead filter orders. The performance on the FSH task of acoustic models using differently configured improved feedforward sequential memory neural networks (cFSMN) is shown in Table 2:
Table 2: performance on the FSH task of deep cFSMN acoustic models in different configurations trained with skip connections
Experimental results: the results of exp1 and exp2 show that, with the memory-module encoding formula of equation (1), setting a large stride lets the model see more distant contextual information and thereby achieve better performance. From exp2 to exp6, as the number of cFSMN layers is gradually increased, model performance gradually improves. Finally, by adding skip connections, a deep cFSMN containing 12 cFSMN layers and 2 fully connected layers, denoted Deep-cFSMN, can be trained successfully, obtaining a word error rate of 9.3% on the Hub5e00 test set.
In another aspect, the present invention also provides a text editing system based on a feedforward sequential memory neural network, comprising:
an acquisition unit, configured to acquire the original text to be edited;
a receiving unit, configured to receive editing voice data;
a recognition unit, configured to perform speech recognition on the editing voice data using an improved feedforward sequential memory neural network to obtain an edit command; and
an output unit, configured to perform semantic understanding on the edit command, execute the edit command, and output the edited text.
In another aspect, the present invention also provides a device, comprising:
one or more processors; and
a memory for storing one or more programs,
such that, when the one or more programs are executed by the one or more processors, the one or more processors carry out any of the exemplary text editing methods of the present invention based on a feedforward sequential memory neural network.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the exemplary text editing methods of the present invention based on a feedforward sequential memory neural network.
The above description is only a preferred embodiment of the application and an explanation of the applied technical principles. Those skilled in the art should appreciate that the scope of the invention involved in the application is not limited to technical solutions formed by the specific combination of the above technical features, but should also cover, without departing from the inventive concept, other technical solutions formed by any combination of the above technical features or their equivalents — for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed herein.
Apart from the technical features described in the specification, the remaining technical features are known to those skilled in the art; in order to highlight the innovative characteristics of the invention, those remaining technical features are not described in detail here.

Claims (8)

1. A text editing method based on a feedforward sequential memory neural network, characterized by the following specific steps:
S1: acquiring the original text to be edited;
S2: receiving editing voice data;
S3: performing speech recognition on the editing voice data using an improved feedforward sequential memory neural network to obtain an edit command;
S4: performing semantic understanding on the edit command and executing the edit command.
2. The text editing method based on a feedforward sequential memory neural network according to claim 1, characterized in that: the improved feedforward sequential memory neural network inserts low-dimensional linear projection layers between the hidden layers of a feedforward fully connected neural network, places the memory modules on the linear projection layers, and adds skip connections between adjacent memory modules, so that the output of a lower memory module is added directly to that of a higher memory module.
3. The text editing method based on a feedforward sequential memory neural network according to claim 2, characterized in that: the memory module is a tapped-delay structure that encodes the hidden-layer outputs of the current moment and of preceding moments into a fixed representation through a set of coefficients.
4. The text editing method based on a feedforward sequential memory neural network according to claim 2, characterized in that: the operation of the memory module uses scalar-based or vector-based encoding.
5. The text editing method based on a feedforward sequential memory neural network according to claim 2, characterized in that: the encoding of the memory module introduces stride factors.
6. A text editing system based on a feedforward sequential memory neural network, comprising:
an acquisition unit, configured to acquire the original text to be edited;
a receiving unit, configured to receive editing voice data;
a recognition unit, configured to perform speech recognition on the editing voice data using an improved feedforward sequential memory neural network to obtain an edit command; and
an output unit, configured to perform semantic understanding on the edit command, execute the edit command, and output the edited text.
7. A device, comprising:
one or more processors; and
a memory for storing one or more programs,
such that, when the one or more programs are executed by the one or more processors, the one or more processors carry out the text editing method based on a feedforward sequential memory neural network according to any one of claims 1-5.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the text editing method based on a feedforward sequential memory neural network according to any one of claims 1-5.
CN201910487145.1A 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network Active CN110377889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910487145.1A CN110377889B (en) 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910487145.1A CN110377889B (en) 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network

Publications (2)

Publication Number Publication Date
CN110377889A true CN110377889A (en) 2019-10-25
CN110377889B CN110377889B (en) 2023-06-20

Family

ID=68249843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910487145.1A Active CN110377889B (en) 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network

Country Status (1)

Country Link
CN (1) CN110377889B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016101688A1 (en) * 2014-12-25 2016-06-30 清华大学 Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network
CN106919977A (en) * 2015-12-25 2017-07-04 科大讯飞股份有限公司 A kind of feedforward sequence Memory Neural Networks and its construction method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016101688A1 (en) * 2014-12-25 2016-06-30 清华大学 Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network
CN106919977A (en) * 2015-12-25 2017-07-04 科大讯飞股份有限公司 A kind of feedforward sequence Memory Neural Networks and its construction method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王海坤 (Wang Haikun) et al.: "基于时域建模的自动语音识别" (Automatic speech recognition based on time-domain modeling), 《计算机工程与应用》 (Computer Engineering and Applications) *

Also Published As

Publication number Publication date
CN110377889B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN109918680B (en) Entity identification method and device and computer equipment
Lu et al. Less is more: Pretrain a strong Siamese encoder for dense text retrieval using a weak decoder
WO2018157700A1 (en) Method and device for generating dialogue, and storage medium
CN109086303A (en) The Intelligent dialogue method, apparatus understood, terminal are read based on machine
WO2019076286A1 (en) User intent recognition method and device for a statement
CN110377908B (en) Semantic understanding method, semantic understanding device, semantic understanding equipment and readable storage medium
CN113239169B (en) Answer generation method, device, equipment and storage medium based on artificial intelligence
Chi et al. Speaker role contextual modeling for language understanding and dialogue policy learning
CN104199825A (en) Information inquiry method and system
CN113935337A (en) Dialogue management method, system, terminal and storage medium
JP7436077B2 (en) Skill voice wake-up method and device
Tran et al. WaveTransformer: A novel architecture for audio captioning based on learning temporal and time-frequency information
CN108959421A (en) Candidate replys evaluating apparatus and inquiry reverting equipment and its method, storage medium
Lu et al. Less is more: Pre-train a strong text encoder for dense retrieval using a weak decoder
CN109933773A (en) A kind of multiple semantic sentence analysis system and method
CN110795547B (en) Text recognition method and related product
CN116127328B (en) Training method, training device, training medium and training equipment for dialogue state recognition model
CN116644168A (en) Interactive data construction method, device, equipment and storage medium
CN110377889A (en) A kind of method for editing text and system based on feedforward sequence Memory Neural Networks
CN112397053B (en) Voice recognition method and device, electronic equipment and readable storage medium
CN114297352A (en) Conversation state tracking method and device, man-machine conversation system and working machine
CN111508481A (en) Training method and device of voice awakening model, electronic equipment and storage medium
CN115169367B (en) Dialogue generating method and device, and storage medium
CN117474084B (en) Bidirectional iteration method, equipment and medium for pre-training model and downstream sequence task
CN115064173B (en) Voice recognition method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant