CN111009238B - Method, device and equipment for recognizing spliced voice - Google Patents
- Publication number
- CN111009238B (application CN202010002558.9A)
- Authority
- CN
- China
- Prior art keywords
- spliced
- voice data
- voice
- long
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method, an apparatus, and a device for recognizing spliced voice. The method comprises the following steps: acquiring normal voice data of a user; cutting the normal voice data into a preset number of segments; splicing the segments in a shuffled order to obtain spliced voice data; constructing a binary classification model based on the normal voice data and the spliced voice data; training the binary classification model into a spliced-voice model using a long short-term memory (LSTM) network and a convolutional neural network (CNN); and recognizing spliced voice in voice data according to the trained model. In this way, the recognition of spliced voice can be realized, and the security of voice verification can be ensured.
Description
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a method, an apparatus, and a device for recognizing spliced speech.
Background
In many real-life scenarios, users often need to pass voice verification, for example, to log in to a software program or a terminal device. However, some attackers cut the voices of other users and splice the pieces into audio with specific content, then attempt to use this spliced voice to impersonate a real user during voice verification, so as to illegally obtain benefits or perform illegal operations.
The prior art cannot recognize such spliced voice, and thus cannot guarantee the security of voice verification.
Disclosure of Invention
In view of the above, the present invention aims to provide a method, an apparatus, and a device for recognizing spliced voice, which can recognize spliced voice and thereby ensure the security of voice verification.
According to one aspect of the present invention, there is provided a method for recognizing spliced voice, comprising:
acquiring normal voice data of a user;
cutting the normal voice data into a preset number of segments, and splicing the segments in a shuffled order to obtain spliced voice data;
constructing a binary classification model based on the normal voice data and the spliced voice data;
training the binary classification model into a spliced-voice model using a long short-term memory (LSTM) network and a convolutional neural network (CNN);
and recognizing spliced voice in voice data according to the trained binary classification model.
Wherein constructing the binary classification model based on the normal voice data and the spliced voice data comprises:
extracting linear predictive coding (LPC) features and pitch features from the normal voice data and the spliced voice data respectively, performing differential and normalization operations on the LPC features and the pitch features, and using the features after the differential and normalization operations as training inputs of the LSTM network and the CNN, thereby constructing the binary classification model based on the normal voice data and the spliced voice data.
Wherein training the binary classification model into the spliced-voice model using the LSTM network and the CNN comprises:
extracting acoustic features for the binary classification model, inputting the extracted acoustic features into the LSTM network and the CNN, and training the binary classification model with the LSTM network and the CNN.
Wherein, after recognizing spliced voice in the voice data according to the trained binary classification model, the method further comprises:
updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations using the updated LSTM network and CNN.
According to another aspect of the present invention, there is provided a recognition apparatus for spliced voice, comprising:
the system comprises an acquisition module, a splicing module, a construction module, a training module and an identification module;
the acquisition module is used for acquiring normal voice data of a user;
the splicing module is used for cutting the normal voice data into a preset number of segments, and splicing the segments in a shuffled order to obtain spliced voice data;
the construction module is used for constructing a binary classification model based on the normal voice data and the spliced voice data;
the training module is used for training the binary classification model into a spliced-voice model using a long short-term memory (LSTM) network and a convolutional neural network (CNN);
and the recognition module is used for recognizing spliced voice in voice data according to the trained binary classification model.
The construction module is specifically configured to:
extract LPC features and pitch features from the normal voice data and the spliced voice data respectively, perform differential and normalization operations on the LPC features and the pitch features, and use the features after the differential and normalization operations as training inputs of the LSTM network and the CNN, thereby constructing the binary classification model based on the normal voice data and the spliced voice data.
The training module is specifically configured to:
extract acoustic features for the binary classification model, input the extracted acoustic features into the LSTM network and the CNN, and train the binary classification model with the LSTM network and the CNN.
The recognition apparatus for spliced voice further comprises:
an updating module;
wherein the updating module is used for updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and for retraining the binary classification model for a preset number of iterations using the updated LSTM network and CNN.
According to still another aspect of the present invention, there is provided a recognition apparatus for spliced voice, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods for recognizing spliced voice described above.
According to a further aspect of the present invention, there is provided a computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements any of the methods for recognizing spliced voice described above.
It can be seen that, according to the above scheme, normal voice data of a user can be acquired and cut into a preset number of segments, and the segments can be spliced in a shuffled order to obtain spliced voice data; a binary classification model based on the normal voice data and the spliced voice data can be constructed and trained into a spliced-voice model using a long short-term memory network and a convolutional neural network; and spliced voice can then be recognized in voice data according to the trained model. In this way, recognition of spliced voice can be realized, and the security of voice verification can be ensured.
Further, according to the above scheme, LPC features and pitch features can be extracted from the normal voice data and the spliced voice data respectively, differential and normalization operations can be performed on these features, and the processed features can be used as training inputs of the LSTM network and the CNN to construct the binary classification model.
Furthermore, according to the above scheme, acoustic features can be extracted for the binary classification model, input into the LSTM network and the CNN, and used to train the binary classification model into a spliced-voice model.
Furthermore, according to the above scheme, the parameters of the LSTM network and the CNN can be updated through a cross-entropy loss function and an optimization algorithm, and the binary classification model can be retrained for a preset number of iterations using the updated networks, so that the accuracy of spliced-voice recognition can be improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required in the embodiments or the description of the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of an embodiment of a method for recognizing a spliced voice according to the present invention;
FIG. 2 is a flow chart of another embodiment of a method for recognizing a spliced voice according to the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of a spliced voice recognition apparatus according to the present invention;
FIG. 4 is a schematic structural diagram of another embodiment of a spliced voice recognition apparatus according to the present invention;
fig. 5 is a schematic structural diagram of an embodiment of a spliced voice recognition device according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is specifically noted that the following examples are only for illustrating the present invention, but do not limit the scope of the present invention. Likewise, the following examples are only some, but not all, of the examples of the present invention, and all other examples, which a person of ordinary skill in the art would obtain without making any inventive effort, are within the scope of the present invention.
The invention provides a recognition method of spliced voice, which can realize the recognition of the spliced voice and further ensure the safety of voice verification.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a method for recognizing a spliced voice according to the present invention. It should be noted that, if there are substantially the same results, the method of the present invention is not limited to the flow sequence shown in fig. 1. As shown in fig. 1, the method comprises the steps of:
S101: acquiring normal voice data of a user.
In this embodiment, the user may be a single user or a plurality of users, and the present invention is not limited thereto.
In this embodiment, the normal voice data of a plurality of users may be acquired all at once, in several batches, or user by user; the present invention is not limited in this respect.
S102: cutting the normal voice data into a preset number of segments, and splicing the segments in a shuffled order to obtain spliced voice data.
In this embodiment, the normal voice data may be cut into 2 segments, 3 segments, or any other number of segments; the present invention is not limited in this respect.
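For illustration, a minimal Python sketch of S101–S102 is given below. The file name, sample rate, segment count, and the use of a uniform random permutation are assumptions made for the example; the patent only requires that the data be cut into a preset number of segments and rejoined out of order.

```python
import numpy as np
import librosa  # assumed here for audio loading

def make_spliced(y, num_segments=3, seed=0):
    """Cut an utterance into a preset number of segments and rejoin them in shuffled order."""
    rng = np.random.default_rng(seed)
    segments = np.array_split(y, num_segments)        # preset number of segments
    order = rng.permutation(num_segments)             # shuffled splicing order
    return np.concatenate([segments[i] for i in order])

# hypothetical input file; any normal utterance of the user would do
y, sr = librosa.load("normal_utterance.wav", sr=16000)
y_spliced = make_spliced(y, num_segments=3)           # one spliced training sample
```

Both the normal utterance `y` and the spliced version `y_spliced` then serve as labeled examples for the binary classification model built in S103.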
S103: constructing a binary classification model based on the normal voice data and the spliced voice data.
Wherein constructing the binary classification model based on the normal voice data and the spliced voice data may include:
respectively extracting LPC (Linear Predictive Coding) features and pitch features from the normal voice data and the spliced voice data, performing differential and normalization operations on the LPC features and the pitch features, and using the features after the differential and normalization operations as training inputs of an LSTM (Long Short-Term Memory) network and a CNN (Convolutional Neural Network), thereby constructing a binary classification model based on the normal voice data and the spliced voice data.
S104: training the binary classification model into a spliced-voice model using the LSTM network and the CNN.
Wherein training the binary classification model into the spliced-voice model using the LSTM network and the CNN may include:
extracting acoustic features for the binary classification model, inputting the extracted acoustic features into the LSTM network and the CNN, and training the binary classification model with them. This has the advantage that the extracted acoustic features make the characteristics of spliced voice more prominent, which can improve the accuracy of spliced-voice recognition.
In this embodiment, the LSTM network and the CNN may include two LSTM layers and two fully connected layers, three LSTM layers and three fully connected layers, or four LSTM layers and four fully connected layers.
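As a concrete reading of one of these layouts, the sketch below uses Keras with a convolutional front-end followed by two LSTM layers and two fully connected layers. The layer widths, kernel size, and the exact way the CNN and LSTM are combined are assumptions; the patent specifies only the layer counts above.

```python
import tensorflow as tf

def build_model(n_frames, n_features):
    """Binary classifier for normal vs. spliced voice: CNN front-end,
    two LSTM layers, two fully connected layers (one possible layout)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_frames, n_features)),
        tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.LSTM(128, return_sequences=True),   # LSTM layer 1
        tf.keras.layers.LSTM(128),                          # LSTM layer 2
        tf.keras.layers.Dense(64, activation="relu"),       # fully connected layer 1
        tf.keras.layers.Dense(1, activation="sigmoid"),     # fully connected layer 2
    ])

model = build_model(n_frames=200, n_features=26)  # 26 = 13 features + 13 deltas
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The sigmoid output gives the probability that an input utterance is spliced, and `binary_crossentropy` matches the cross-entropy loss mentioned below in S105/S206.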
S105: recognizing spliced voice in voice data according to the trained binary classification model.
Wherein, after recognizing spliced voice in the voice data according to the trained binary classification model, the method may further include:
updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations using the updated LSTM network and CNN, so that the accuracy of spliced-voice recognition can be improved.
It can be seen that, in this embodiment, normal voice data of a user can be acquired and cut into a preset number of segments, and the segments can be spliced in a shuffled order to obtain spliced voice data; a binary classification model based on the normal voice data and the spliced voice data can be constructed and trained into a spliced-voice model using an LSTM network and a CNN; and spliced voice can then be recognized in voice data according to the trained model. In this way, recognition of spliced voice can be realized, and the security of voice verification can be ensured.
Further, in this embodiment, LPC features and pitch features can be extracted from the normal voice data and the spliced voice data respectively, differential and normalization operations can be performed on these features, and the processed features can be used as training inputs of the LSTM network and the CNN to construct the binary classification model. The advantage of this is that the LSTM network and the CNN can retain contextual information of the audio, which facilitates the recognition of spliced voice.
Further, in this embodiment, acoustic features can be extracted for the binary classification model, input into the LSTM network and the CNN, and used to train the binary classification model into a spliced-voice model.
Referring to fig. 2, fig. 2 is a flowchart of another embodiment of a method for recognizing a spliced voice according to the present invention. In this embodiment, the method includes the steps of:
S201: acquiring normal voice data of a user.
As described in S101, a detailed description is omitted here.
S202: cutting the normal voice data into a preset number of segments, and splicing the segments in a shuffled order to obtain spliced voice data.
As described in S102, the description is omitted here.
S203: constructing a binary classification model based on the normal voice data and the spliced voice data.
As described in S103, a detailed description is omitted here.
S204: and training the spliced voice model by adopting a long-term memory network and a convolution neural network.
As described in S104, a detailed description is omitted here.
S205: recognizing spliced voice in voice data according to the trained binary classification model.
S206: updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations using the updated LSTM network and CNN.
It can be seen that, in this embodiment, the parameters of the LSTM network and the CNN can be updated through a cross-entropy loss function and an optimization algorithm, and the binary classification model can be retrained for a preset number of iterations using the updated networks, so that the accuracy of spliced-voice recognition can be improved.
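Continuing the sketch, a training update under these assumptions (Adam standing in for the "optimization algorithm", a fixed epoch count standing in for the "preset number" of iterations, and random placeholder data in place of real features) might look like:

```python
import numpy as np

# placeholder batch: 32 utterances of 200 frames x 26 features; labels 0 = normal, 1 = spliced
X = np.random.rand(32, 200, 26).astype("float32")
y = np.random.randint(0, 2, size=(32,))

PRESET_ITERATIONS = 20  # the "preset number" of training iterations
model.fit(X, y, epochs=PRESET_ITERATIONS, batch_size=8, validation_split=0.2)

# score an unseen utterance: probability that it is spliced
p_spliced = model.predict(X[:1])[0, 0]
print(f"P(spliced) = {p_spliced:.3f}")
```

Here `model` is the network from the sketch after S104; in practice `X` and `y` would come from the LPC/pitch feature pipeline applied to the normal and spliced voice data.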
The invention also provides a recognition apparatus for spliced voice, which can realize recognition of spliced voice and thereby ensure the security of voice verification.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an embodiment of a spliced voice recognition apparatus according to the present invention. In this embodiment, the spliced voice recognition apparatus 30 includes an acquisition module 31, a splicing module 32, a construction module 33, a training module 34, and a recognition module 35.
The acquiring module 31 is configured to acquire normal voice data of a user.
The splicing module 32 is configured to cut the normal voice data into a preset number of segments, and splice the segments in a shuffled order to obtain spliced voice data.
The construction module 33 is configured to construct a binary classification model based on the normal voice data and the spliced voice data.
The training module 34 is configured to train the binary classification model into a spliced-voice model using the LSTM network and the CNN.
The recognition module 35 is configured to recognize spliced voice in voice data according to the trained binary classification model.
Alternatively, the construction module 33 may be specifically configured to:
extract LPC features and pitch features from the normal voice data and the spliced voice data respectively, perform differential and normalization operations on the LPC features and the pitch features, and use the features after the differential and normalization operations as training inputs of the LSTM network and the CNN, thereby constructing the binary classification model based on the normal voice data and the spliced voice data.
Optionally, the training module 34 may be specifically configured to:
extract acoustic features for the binary classification model, input the extracted acoustic features into the LSTM network and the CNN, and train the binary classification model with the LSTM network and the CNN.
Referring to fig. 4, fig. 4 is a schematic structural diagram of another embodiment of a spliced voice recognition apparatus according to the present invention. Unlike the previous embodiment, the spliced voice recognition apparatus 40 of this embodiment further includes an updating module 41.
The updating module 41 is configured to update the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and to retrain the binary classification model for a preset number of iterations using the updated LSTM network and CNN.
The unit modules of the spliced voice recognition apparatus 30/40 can execute the corresponding steps in the above method embodiments, so a detailed description of each module is omitted here.
The present invention further provides a recognition device for spliced voice, as shown in fig. 5, including: at least one processor 51; and a memory 52 communicatively coupled to the at least one processor 51; the memory 52 stores instructions executable by the at least one processor 51, and the instructions are executed by the at least one processor 51 to enable the at least one processor 51 to perform the above-described method for recognizing spliced speech.
Where the memory 52 and the processor 51 are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors 51 and the memory 52 together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 51 is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor 51.
The processor 51 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory 52 may be used to store data used by the processor 51 in performing operations.
The present invention further provides a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
It can be seen that, according to the above scheme, normal voice data of a user can be acquired and cut into a preset number of segments, and the segments can be spliced in a shuffled order to obtain spliced voice data; a binary classification model based on the normal voice data and the spliced voice data can be constructed and trained into a spliced-voice model using a long short-term memory network and a convolutional neural network; and spliced voice can then be recognized in voice data according to the trained model. In this way, recognition of spliced voice can be realized, and the security of voice verification can be ensured.
Further, according to the above scheme, LPC features and pitch features can be extracted from the normal voice data and the spliced voice data respectively, differential and normalization operations can be performed on these features, and the processed features can be used as training inputs of the LSTM network and the CNN to construct the binary classification model.
Furthermore, according to the above scheme, acoustic features can be extracted for the binary classification model, input into the LSTM network and the CNN, and used to train the binary classification model into a spliced-voice model.
Furthermore, according to the above scheme, the parameters of the LSTM network and the CNN can be updated through a cross-entropy loss function and an optimization algorithm, and the binary classification model can be retrained for a preset number of iterations using the updated networks, so that the accuracy of spliced-voice recognition can be improved.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description is only a partial embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent devices or equivalent processes using the descriptions and the drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.
Claims (8)
1. A method for recognizing spliced voice, comprising:
acquiring normal voice data of a user;
cutting the normal voice data into a preset number of segments, and splicing the segments in a shuffled order to obtain spliced voice data;
constructing a binary classification model based on the normal voice data and the spliced voice data, comprising:
extracting linear predictive coding (LPC) features and pitch features from the normal voice data and the spliced voice data respectively, performing differential and normalization operations on the LPC features and the pitch features, and using the features after the differential and normalization operations as training inputs of a long short-term memory (LSTM) network and a convolutional neural network (CNN), thereby constructing the binary classification model based on the normal voice data and the spliced voice data;
training the binary classification model into a spliced-voice model using the LSTM network and the CNN; and
recognizing spliced voice in voice data according to the trained binary classification model.
2. The method for recognizing spliced voice according to claim 1, wherein training the binary classification model into the spliced-voice model using the LSTM network and the CNN comprises:
extracting acoustic features for the binary classification model, inputting the extracted acoustic features into the LSTM network and the CNN, and training the binary classification model with the LSTM network and the CNN.
3. The method for recognizing spliced voice according to claim 1, further comprising, after recognizing spliced voice in the voice data according to the trained binary classification model:
updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations using the updated LSTM network and CNN.
4. An apparatus for recognizing spliced voice, comprising:
an acquisition module, a splicing module, a construction module, a training module, and a recognition module;
the acquisition module is used for acquiring normal voice data of a user;
the splicing module is used for cutting the normal voice data into a preset number of segments, and splicing the segments in a shuffled order to obtain spliced voice data;
the construction module is used for constructing a classification model based on the normal voice data and the spliced voice data, and is specifically used for:
the method comprises the steps of respectively extracting linear prediction analysis characteristics and pitch characteristics of normal voice data and spliced voice data, performing differential operation and normalization operation on the linear prediction analysis characteristics and the pitch characteristics, and constructing a binary model based on the normal voice data and the spliced voice data by taking the linear prediction analysis characteristics and the pitch characteristics after the differential operation and the normalization operation as training inputs of a long-short-term memory network and a convolutional neural network;
the training module is used for training the binary classification model into a spliced-voice model using the LSTM network and the CNN; and
the recognition module is used for recognizing spliced voice in voice data according to the trained binary classification model.
5. The apparatus for recognizing spliced voice according to claim 4, wherein the training module is specifically configured to:
extract acoustic features for the binary classification model, input the extracted acoustic features into the LSTM network and the CNN, and train the binary classification model with the LSTM network and the CNN.
6. The apparatus for recognizing spliced voice according to claim 4, further comprising:
an updating module;
wherein the updating module is used for updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and for retraining the binary classification model for a preset number of iterations using the updated LSTM network and CNN.
7. A spliced voice recognition apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for recognizing spliced voice according to any one of claims 1 to 3.
8. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for recognizing spliced voice according to any one of claims 1 to 3.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202010002558.9A | 2020-01-02 | 2020-01-02 | Method, device and equipment for recognizing spliced voice |
Publications (2)
| Publication Number | Publication Date |
| --- | --- |
| CN111009238A | 2020-04-14 |
| CN111009238B | 2023-06-23 |
Family
ID=70120411
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202010002558.9A (CN111009238B, active) | Method, device and equipment for recognizing spliced voice | 2020-01-02 | 2020-01-02 |
Country Status (1)
| Country | Link |
| --- | --- |
| CN | CN111009238B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111477235B (en) * | 2020-04-15 | 2023-05-05 | 厦门快商通科技股份有限公司 | Voiceprint acquisition method, voiceprint acquisition device and voiceprint acquisition equipment |
CN111599351A (en) * | 2020-04-30 | 2020-08-28 | 厦门快商通科技股份有限公司 | Voice recognition method, device and equipment |
CN111583946A (en) * | 2020-04-30 | 2020-08-25 | 厦门快商通科技股份有限公司 | Voice signal enhancement method, device and equipment |
CN111583947A (en) * | 2020-04-30 | 2020-08-25 | 厦门快商通科技股份有限公司 | Voice enhancement method, device and equipment |
CN113516969B (en) * | 2021-09-14 | 2021-12-14 | 北京远鉴信息技术有限公司 | Spliced voice identification method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102456345A (en) * | 2010-10-19 | 2012-05-16 | 盛乐信息技术(上海)有限公司 | Concatenated speech detection system and method |
CN108288470A (en) * | 2017-01-10 | 2018-07-17 | 富士通株式会社 | Auth method based on vocal print and device |
CN109376264A (en) * | 2018-11-09 | 2019-02-22 | 广州势必可赢网络科技有限公司 | A kind of audio-frequency detection, device, equipment and computer readable storage medium |
CN110491391A (en) * | 2019-07-02 | 2019-11-22 | 厦门大学 | A kind of deception speech detection method based on deep neural network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10276166B2 (en) * | 2014-07-22 | 2019-04-30 | Nuance Communications, Inc. | Method and apparatus for detecting splicing attacks on a speaker verification system |
- 2020-01-02: Application CN202010002558.9A filed in China; granted as CN111009238B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN111009238A (en) | 2020-04-14 |
Legal Events
| Code | Title |
| --- | --- |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |