CN110377889B - Text editing method and system based on feedforward sequence memory neural network - Google Patents


Info

Publication number
CN110377889B
Authority
CN
China
Prior art keywords
neural network
edited
memory module
editing
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910487145.1A
Other languages
Chinese (zh)
Other versions
CN110377889A (en)
Inventor
吴立刚
刘迪
邱镇
黄晓光
浦正国
梁翀
韩涛
张天奇
余江斌
宋杰
何东
郭庆
吴小华
胡心颖
周伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Zhejiang Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Zhejiang Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, State Grid Zhejiang Electric Power Co Ltd, Anhui Jiyuan Software Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910487145.1A priority Critical patent/CN110377889B/en
Publication of CN110377889A publication Critical patent/CN110377889A/en
Application granted granted Critical
Publication of CN110377889B publication Critical patent/CN110377889B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/16: Speech classification or search using artificial neural networks
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a text editing method based on a feedforward sequence memory neural network, belonging to the technical field of voice signal processing, which comprises the following steps: acquiring an original text to be edited; receiving voice data for editing; performing voice recognition on the voice data using an improved feedforward sequence memory neural network to obtain an editing command; and performing semantic understanding on the editing command and executing it. By adopting the improved feedforward sequence memory neural network for voice recognition, the technical scheme makes text editing accurate and efficient.

Description

Text editing method and system based on feedforward sequence memory neural network
Technical Field
The invention belongs to the technical field of voice signal processing, and particularly relates to a text editing method and system based on a feedforward sequence memory neural network.
Background
With the popularity of mobile phones, people receive a large amount of text information every day on portable devices such as mobile phones and tablet computers, for example short messages, messages pushed by instant messaging or other software, web page content and text news. When a user wants to edit text content of interest, the cursor must first be positioned at that content, and the selected text then subjected to subsequent operations, such as inserting text at the cursor position or replacing the selected text, making the editing process complex and inconvenient. In the prior art, voice data recorded by the user is received, and the corresponding editing operation is then performed on the editing object according to the voice data. When editing text, the user can therefore select the editing object directly and rapidly without complex text selection operations, and can edit the object directly through voice input, which simplifies the text editing process. However, current approaches operate directly on the received voice data without any speech processing, so under strong far-field and noise interference the performance of the voice recognition system is not ideal and text editing becomes inaccurate.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a text editing method based on a feedforward sequence memory neural network that uses an improved feedforward sequence memory neural network for voice recognition, making text editing more accurate and efficient.
In order to solve the technical problems, the invention adopts the following technical scheme:
In one aspect, the invention provides a text editing method based on a feedforward sequence memory neural network, which comprises the following specific steps:
S1: acquiring an original text to be edited;
S2: receiving voice data for editing;
S3: performing voice recognition on the voice data using an improved feedforward sequence memory neural network to obtain an editing command;
S4: performing semantic understanding on the editing command and executing it.
Further preferably, in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully-connected neural network, a memory module is arranged on the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module.
Further preferably, the memory module is a tap-delay structure that encodes the hidden-layer outputs at the current and preceding moments through a set of coefficients to obtain a fixed-length representation.
Further preferably, the memory module employs scalar- or vector-based encoding.
Further preferably, the encoding of the memory module introduces a stride factor.
In another aspect, the invention also provides a text editing system based on the feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive voice data for editing;
a recognition unit configured to perform voice recognition on the voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute it, and output the edited text.
In another aspect, the present invention also provides an apparatus, including:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform any of the text editing methods based on a feedforward sequence memory neural network of the examples of the invention.
In another aspect, the present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements any of the text editing methods based on a feedforward sequence memory neural network of the examples of the invention.
Compared with the prior art, the invention has the beneficial effects that:
according to the text editing method based on the feedforward sequence memory neural network, which is disclosed by the invention, the original text to be edited is obtained, the voice data input by the user is received, and the corresponding editing operation is executed on the editing object according to the voice data, so that the user can directly and rapidly select the editing object in the text without complex text selection operation when editing the text, and the user can directly edit the editing object through voice input, so that the text editing process is simplified. In addition, the voice recognition is carried out on the edited voice data by adopting a feedforward sequence memory neural network based on improvement, so that the text editing is more accurate and efficient.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 is a schematic flow diagram of one embodiment of the present invention;
FIG. 2 is a block diagram of an improved feedforward sequence memory neural network.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a text editing method based on a feedforward sequence memory neural network, which specifically comprises the steps of:
S1: acquiring an original text to be edited;
S2: receiving voice data for editing;
S3: performing voice recognition on the voice data using an improved feedforward sequence memory neural network to obtain an editing command;
S4: performing semantic understanding on the editing command and executing it.
In the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully-connected neural network, a memory module is arranged on the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module.
The memory module is a tap-delay structure: the hidden-layer outputs at the current and preceding moments are encoded through a set of coefficients to obtain a fixed-length representation.
The memory module employs scalar- or vector-based encoding.
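The structure just described can be sketched numerically. All sizes, the random coefficients, and the identity-style skip connection below are assumptions for illustration, not values from the patent.

```python
import numpy as np

# Illustrative sketch (not the patent's implementation) of one improved
# FSMN block: a low-dimensional linear projection of the hidden layer,
# a tap-delay memory module on the projection, and a skip connection
# adding the lower memory module's output directly into this one.

def cfsmn_block(h, w_proj, a, lower_mem):
    """h: (T, H) hidden outputs; w_proj: (H, P) low-rank projection;
    a: (3, P) tap coefficients for offsets -1, 0, +1;
    lower_mem: (T, P) memory output of the layer below (skip path)."""
    p = h @ w_proj                              # low-dimensional projection
    T, P = p.shape
    mem = np.zeros((T, P))
    offsets = (-1, 0, 1)                        # taps around each moment t
    for t in range(T):
        for a_k, off in zip(a, offsets):
            if 0 <= t + off < T:
                mem[t] += a_k * p[t + off]      # coefficient encoding
    return mem + lower_mem                      # skip connection added in

rng = np.random.default_rng(0)
T, H, P = 4, 6, 3
out = cfsmn_block(rng.standard_normal((T, H)),
                  rng.standard_normal((H, P)),
                  rng.standard_normal((3, P)),
                  np.zeros((T, P)))             # no lower layer here
print(out.shape)  # (4, 3)
```

In a trained network the tap coefficients `a` would be learned parameters rather than random draws.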
The encoding of the memory module introduces a stride factor, with the specific calculation formula:

p̃_t^l = p_t^l + H·p̃_t^(l-1) + Σ_{i=0..N1} a_i^l ⊙ p^l_{t-s1·i} + Σ_{j=1..N2} c_j^l ⊙ p^l_{t+s2·j}    (1)

where p̃_t^(l-1) denotes the output of the memory module of the previous cFSMN layer, and s1 and s2 denote the strides for looking back into the past and forward into the future, respectively. If s1 = 2, one input is taken every second moment when encoding the history, so with the same filter order a longer history can be seen and long-term correlations can be modeled more effectively.
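The effect of the stride factor can be sketched as follows; the constant tap coefficients are placeholders for the learned ones, chosen only so the example is easy to check by hand.

```python
import numpy as np

# Illustrative sketch of the strided look-back in the memory-module
# encoding: with stride s1 = 2, only every second past frame is tapped,
# so the same filter order covers twice the history. The constant 0.5
# coefficients are assumptions for demonstration, not learned values.

def memory_with_stride(p, n_back, s1):
    """p: (T, P) projected features; tap n_back past frames at stride s1."""
    T, P = p.shape
    a = np.full((n_back + 1, P), 0.5)   # placeholder tap coefficients
    mem = np.zeros_like(p)
    for t in range(T):
        for i in range(n_back + 1):
            tau = t - s1 * i            # strided look-back index
            if tau >= 0:
                mem[t] += a[i] * p[tau]
    return mem

p = np.arange(12, dtype=float).reshape(6, 2)
# With n_back=2 and s1=2, frame t=4 mixes frames 4, 2 and 0:
# 0.5 * (p[4] + p[2] + p[0]) = [6.0, 7.5]
print(memory_with_stride(p, n_back=2, s1=2)[4])
```

With s1 = 1 the same filter order would only reach back to frames 4, 3 and 2, illustrating why a larger stride widens the visible history.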
The performance on the SWB database of the improved feedforward sequence memory neural network (cFSMN) of this example, together with the number of model parameters and the training time per iteration, is compared with the existing Sigmoid-DNN, LSTM, BLSTM, sFSMN and vFSMN speech recognition systems in Table 1:
table 1: performance of speech recognition system on SWB database, model parameters and training time per iteration
[Table 1 appears as an image in the original publication.]
The experimental results indicate that models which can effectively model long-term correlations, such as LSTM and FSMN, achieve significant performance improvements over the DNN. One LSTM iteration takes 9.5 hours, while one BLSTM iteration takes 23.2 hours; this is because the NVIDIA Tesla K20 GPU has only 3 GB of memory, so BPTT-trained BLSTM can use only 16-sentence parallelism while LSTM can use 64-sentence parallelism. The proposed vFSMN achieves a small performance improvement over the BLSTM; its model structure is simpler and its training faster, taking about 6.9 hours per iteration, roughly a 3x training speedup over BLSTM, although the vFSMN has more model parameters than the BLSTM. The proposed cFSMN further reduces the overall model size to 74 MB, about 60% fewer parameters than BLSTM. More importantly, each iteration takes only 3.0 hours, roughly a 7x training speedup over BLSTM. Moreover, the cFSMN-based model obtains a word error rate of 12.5%, an absolute performance improvement of 0.9% over BLSTM.
The improved feedforward sequence memory neural network is represented as 216-N×[2048-P(N1, N2)]-M×2048-P-8911, where N and M denote the numbers of cFSMN layers and standard fully-connected layers respectively, P is the number of nodes of the low-rank linear projection layer, and N1 and N2 denote the look-back and look-forward filter orders respectively. Performance tests of different configurations of the improved feedforward sequence memory neural network (cFSMN) acoustic model on the FSH task are shown in Table 2:
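The architecture notation above can be expanded into a concrete list of layer widths. The configuration below (N = 2, M = 2, P = 512) is one illustrative instantiation, not a configuration reported in the tables.

```python
# Sketch expanding the notation 216-N x [2048-P(N1, N2)]-M x 2048-P-8911:
# 216 input dimensions, N cFSMN layers (a 2048-unit hidden layer followed
# by a P-node low-rank projection carrying the memory module), M standard
# 2048-unit fully-connected layers, a final P-node projection, and 8911
# output targets. P = 512 here is an illustrative assumption.

def expand_architecture(n_cfsmn, m_dense, proj=512, hidden=2048,
                        inputs=216, outputs=8911):
    layers = [inputs]
    for _ in range(n_cfsmn):
        layers += [hidden, proj]    # hidden layer + low-rank projection
    layers += [hidden] * m_dense + [proj, outputs]
    return layers

print(expand_architecture(n_cfsmn=2, m_dense=2))
# [216, 2048, 512, 2048, 512, 2048, 2048, 512, 8911]
```

Increasing `n_cfsmn` from 2 toward 12 corresponds to the exp2-to-exp6 progression and the Deep-cFSMN discussed next.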
table 2: performance of different configurations of cFSMN acoustic models employing shortcut training deep layers in FSH tasks
[Table 2 appears as an image in the original publication.]
The experimental results of exp1 and exp2 show that, with the memory-module encoding of formula (1), setting a large stride allows more context information to be seen and better performance to be obtained. From exp2 to exp6, as the number of cFSMN layers is gradually increased, the model performance gradually improves. Finally, by adding skip connections, a deep cFSMN comprising 12 cFSMN layers and 2 fully-connected layers, denoted Deep-cFSMN, can be trained successfully, obtaining a word error rate of 9.3% on the Hub5e00 test set.
In another aspect, the invention also provides a text editing system based on the feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive voice data for editing;
a recognition unit configured to perform voice recognition on the voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute it, and output the edited text.
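The four units of the system can be sketched as a plain Python class; the unit interfaces and the stand-in recognizer/parser below are assumptions for illustration, not the patent's implementation.

```python
# Hedged sketch of the described system's four units. The recognizer
# and parser are injected stand-ins, not the improved-FSMN modules.

class TextEditingSystem:
    def __init__(self, recognizer, parser):
        self.recognizer = recognizer    # recognition unit backend
        self.parser = parser            # semantic-understanding backend

    def acquire(self, text):            # acquisition unit
        self.text = text
        return self.text

    def receive(self, voice):           # receiving unit
        self.voice = voice
        return self.voice

    def run(self):                      # recognition + output units
        command = self.recognizer(self.voice)   # speech -> command
        action = self.parser(command)           # command -> edit action
        self.text = action(self.text)           # execute the command
        return self.text                        # output the edited text

system = TextEditingSystem(
    recognizer=lambda v: v,                        # identity stand-in
    parser=lambda cmd: (lambda t: t + " " + cmd),  # append, as a demo
)
system.acquire("hello")
system.receive("world")
print(system.run())  # -> hello world
```

Swapping in a real FSMN recognizer would only change the `recognizer` callable; the unit boundaries stay the same.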
In another aspect, the present invention also provides an apparatus, including:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform any of the text editing methods based on a feedforward sequence memory neural network of the examples of the invention.
In another aspect, the present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements any of the text editing methods based on a feedforward sequence memory neural network of the examples of the invention.
The foregoing description covers only the preferred embodiments of the present application and illustrates the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the invention referred to in this application is not limited to the specific combinations of features described above, and is intended to cover other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example embodiments in which the above features are interchanged with technical features of similar function disclosed in the present application.
Technical features other than those described in the specification are known to those skilled in the art and, in order to highlight the innovative features of the invention, are not described here in detail.

Claims (5)

1. A text editing method based on a feedforward sequence memory neural network, characterized by comprising the following specific steps:
S1: acquiring an original text to be edited;
S2: receiving voice data for editing;
S3: performing voice recognition on the voice data using an improved feedforward sequence memory neural network to obtain an editing command;
S4: performing semantic understanding on the editing command and executing it;
wherein, in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully-connected neural network, a memory module is arranged on the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module;
the memory module is a tap-delay structure in which the hidden-layer outputs at the current and preceding moments are encoded through a set of coefficients to obtain a fixed-length representation;
and the memory module employs scalar- or vector-based encoding.
2. The text editing method based on a feedforward sequence memory neural network according to claim 1, characterized in that the encoding of the memory module introduces a stride factor.
3. A text editing system based on a feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive voice data for editing;
a recognition unit configured to perform voice recognition on the voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute it, and output the edited text;
wherein, in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully-connected neural network, a memory module is arranged on the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module;
the memory module is a tap-delay structure in which the hidden-layer outputs at the current and preceding moments are encoded through a set of coefficients to obtain a fixed-length representation;
and the memory module employs scalar- or vector-based encoding.
4. An apparatus, the apparatus comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a text editing method based on a feedforward sequence memory neural network of any of claims 1-2.
5. A computer readable storage medium storing a computer program which when executed by a processor implements a method of text editing based on a feedforward sequence memory neural network according to any of claims 1-2.
CN201910487145.1A 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network Active CN110377889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910487145.1A CN110377889B (en) 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910487145.1A CN110377889B (en) 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network

Publications (2)

Publication Number Publication Date
CN110377889A CN110377889A (en) 2019-10-25
CN110377889B 2023-06-20

Family

ID=68249843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910487145.1A Active CN110377889B (en) 2019-06-05 2019-06-05 Text editing method and system based on feedforward sequence memory neural network

Country Status (1)

Country Link
CN (1) CN110377889B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016101688A1 (en) * 2014-12-25 2016-06-30 清华大学 Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network
CN106919977A (en) * 2015-12-25 2017-07-04 科大讯飞股份有限公司 A kind of feedforward sequence Memory Neural Networks and its construction method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Automatic Speech Recognition Based on Time-Domain Modeling; Wang Haikun et al.; Computer Engineering and Applications; 2017-10-15 (No. 20); full text *

Also Published As

Publication number Publication date
CN110377889A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
US11620983B2 (en) Speech recognition method, device, and computer-readable storage medium
US10395118B2 (en) Systems and methods for video paragraph captioning using hierarchical recurrent neural networks
US11321535B2 (en) Hierarchical annotation of dialog acts
CN108735202A (en) Convolution recurrent neural network for small occupancy resource keyword retrieval
CN104199825A (en) Information inquiry method and system
CN112825249A (en) Voice processing method and device
CN110136689A (en) Song synthetic method, device and storage medium based on transfer learning
CN108388597A (en) Conference summary generation method and device
CN111653270B (en) Voice processing method and device, computer readable storage medium and electronic equipment
WO2019138897A1 (en) Learning device and method, and program
CN110377889B (en) Text editing method and system based on feedforward sequence memory neural network
CN113408208A (en) Model training method, information extraction method, related device and storage medium
CN108962228A (en) model training method and device
CN116737895A (en) Data processing method and related equipment
CN112150103B (en) Schedule setting method, schedule setting device and storage medium
CN109147773B (en) Voice recognition device and method
CN112185352B (en) Voice recognition method and device and electronic equipment
GB2555945A (en) Hierarchical annotation of dialog acts
CN109829035A (en) Process searching method, device, computer equipment and storage medium
CN117521658B (en) RPA process mining method and system based on chapter-level event extraction
CN116306672A (en) Data processing method and device
Koster Automatic LIP-SYNC: direct translation of speech sound to mouth animation
Farsi et al. Modifying voice activity detection in low SNR by correction factors
CN117952171A (en) Model generation method, image generation device and electronic equipment
CN117975984A (en) Speech processing method, apparatus, device, storage medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant