CN111009238B - Method, device and equipment for recognizing spliced voice - Google Patents

Method, device and equipment for recognizing spliced voice

Info

Publication number
CN111009238B
CN111009238B (application CN202010002558.9A)
Authority
CN
China
Prior art keywords
spliced
voice data
voice
long
neural network
Prior art date
Legal status
Active
Application number
CN202010002558.9A
Other languages
Chinese (zh)
Other versions
CN111009238A (en)
Inventor
陈剑超
肖龙源
李稀敏
蔡振华
刘晓葳
Current Assignee
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010002558.9A
Publication of CN111009238A
Application granted
Publication of CN111009238B
Active legal status (current)
Anticipated expiration

Classifications

    • G10L 15/04 — Speech recognition; Segmentation; Word boundary detection
    • G10L 15/063 — Speech recognition; Training (creation of reference templates, e.g. adaptation to the characteristics of the speaker's voice)
    • G10L 15/16 — Speech classification or search using artificial neural networks
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
    • Y02T 10/40 — Engine management systems (climate-change mitigation tag)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method, an apparatus and a device for recognizing spliced speech. The method includes the following steps: acquiring normal voice data of a user; cutting the normal voice data into a preset number of segments and splicing the segments in shuffled order to obtain spliced voice data; constructing a binary classification model based on the normal voice data and the spliced voice data; training the binary classification model as a spliced-speech model using a long short-term memory (LSTM) network and a convolutional neural network (CNN); and recognizing spliced speech in voice data with the trained model. In this way, spliced speech can be recognized and the security of voice verification ensured.

Description

Method, device and equipment for recognizing spliced voice
Technical Field
The present invention relates to the field of speech recognition technology, and in particular to a method, an apparatus and a device for recognizing spliced speech.
Background
In many real-life scenarios, users often need to pass voice verification, for example when logging in to a software program or a terminal device by voice. However, some attackers cut the voices of users other than themselves and splice them into speech with specific audio content, then attempt to use this spliced speech to impersonate the real user during voice verification, so as to illegally obtain benefits or perform illegal operations, and the security of voice verification cannot be ensured.
However, the prior art cannot recognize spliced speech, and thus cannot guarantee the security of voice verification.
Disclosure of Invention
In view of the above, the present invention aims to provide a method, an apparatus and a device for recognizing spliced speech, which can recognize spliced speech and thereby ensure the security of voice verification.
According to one aspect of the present invention, there is provided a method for recognizing spliced speech, including:
acquiring normal voice data of a user;
cutting the normal voice data into a preset number of segments, and splicing the segments in shuffled order to obtain spliced voice data;
constructing a binary classification model based on the normal voice data and the spliced voice data;
training the binary classification model as a spliced-speech model using a long short-term memory (LSTM) network and a convolutional neural network (CNN);
and recognizing spliced speech in voice data according to the trained binary classification model.
Wherein the constructing of the binary classification model based on the normal voice data and the spliced voice data includes:
extracting linear predictive coding (LPC) features and pitch features from the normal voice data and the spliced voice data respectively, performing a difference operation and a normalization operation on the LPC features and the pitch features, and using the differenced and normalized features as the training inputs of the LSTM network and the CNN, thereby constructing the binary classification model based on the normal voice data and the spliced voice data.
Wherein the training of the binary classification model as a spliced-speech model using the LSTM network and the CNN includes:
extracting acoustic features for the binary classification model, inputting the extracted acoustic features into the LSTM network and the CNN, and training the binary classification model as a spliced-speech model with the LSTM network and the CNN.
Wherein, after recognizing spliced speech in voice data according to the trained binary classification model, the method further includes:
updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations with the parameter-updated LSTM network and CNN.
According to another aspect of the present invention, there is provided an apparatus for recognizing spliced speech, comprising:
an acquisition module, a splicing module, a construction module, a training module and a recognition module;
the acquisition module is used for acquiring normal voice data of a user;
the splicing module is used for cutting the normal voice data into a preset number of segments and splicing the segments in shuffled order to obtain spliced voice data;
the construction module is used for constructing a binary classification model based on the normal voice data and the spliced voice data;
the training module is used for training the binary classification model as a spliced-speech model using an LSTM network and a CNN;
and the recognition module is used for recognizing spliced speech in voice data according to the trained binary classification model.
The construction module is specifically configured to:
extract LPC features and pitch features from the normal voice data and the spliced voice data respectively, perform a difference operation and a normalization operation on them, and use the differenced and normalized features as the training inputs of the LSTM network and the CNN, thereby constructing the binary classification model based on the normal voice data and the spliced voice data.
The training module is specifically configured to:
extract acoustic features for the binary classification model, input the extracted acoustic features into the LSTM network and the CNN, and train the binary classification model as a spliced-speech model with the LSTM network and the CNN.
Wherein the apparatus for recognizing spliced speech further comprises:
an updating module;
the updating module is used for updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations with the parameter-updated LSTM network and CNN.
According to still another aspect of the present invention, there is provided a device for recognizing spliced speech, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform any of the above methods for recognizing spliced speech.
According to a further aspect of the present invention, there is provided a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements any of the above methods for recognizing spliced speech.
It can be seen that, with this scheme, normal voice data of a user can be acquired, cut into a preset number of segments and spliced in shuffled order to obtain spliced voice data; a binary classification model based on the normal voice data and the spliced voice data can be constructed and trained as a spliced-speech model using an LSTM network and a CNN; and spliced speech in voice data can be recognized according to the trained model, so that spliced speech can be recognized and the security of voice verification ensured.
Further, with this scheme, LPC features and pitch features can be extracted from the normal voice data and the spliced voice data respectively, a difference operation and a normalization operation can be performed on them, and the differenced and normalized features can be used as the training inputs of the LSTM network and the CNN to construct the binary classification model.
Furthermore, with this scheme, acoustic features can be extracted for the binary classification model and input into the LSTM network and the CNN, and the binary classification model can be trained as a spliced-speech model with the LSTM network and the CNN.
Furthermore, with this scheme, the parameters of the LSTM network and the CNN can be updated through a cross-entropy loss function and an optimization algorithm, and the binary classification model retrained for a preset number of iterations with the parameter-updated networks, which can improve the accuracy of spliced-speech recognition.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of an embodiment of the method for recognizing spliced speech according to the present invention;
FIG. 2 is a flow chart of another embodiment of the method for recognizing spliced speech according to the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of the apparatus for recognizing spliced speech according to the present invention;
FIG. 4 is a schematic structural diagram of another embodiment of the apparatus for recognizing spliced speech according to the present invention;
fig. 5 is a schematic structural diagram of an embodiment of the device for recognizing spliced speech according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments. It is specifically noted that the following embodiments are only for illustrating the present invention and do not limit its scope. Likewise, the following embodiments are only some, not all, of the embodiments of the present invention, and all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the present invention.
The invention provides a method for recognizing spliced speech, which can recognize spliced speech and thereby ensure the security of voice verification.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of the method for recognizing spliced speech according to the present invention. It should be noted that, as long as substantially the same results are obtained, the method of the present invention is not limited to the flow sequence shown in fig. 1. As shown in fig. 1, the method comprises the following steps:
s101: and acquiring normal voice data of the user.
In this embodiment, the user may be a single user or a plurality of users, and the present invention is not limited thereto.
In this embodiment, the normal voice data of a plurality of users may be acquired all at once, in several batches, or user by user, which is not limited by the present invention.
S102: cutting the normal voice data into preset segments, and splicing the normal voice data cut into the preset segments according to voice disorder to obtain spliced voice data.
In this embodiment, the normal voice data may be cut into 2 segments, or may be cut into 3 segments, or may be cut into other segments, which is not limited by the present invention.
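As an illustration only (not part of the patent), a minimal Python sketch of the cut-and-shuffle step: it assumes the audio is already loaded as a 1-D NumPy array of samples, and the function name and the default of 3 segments are illustrative choices.

```python
import numpy as np

def make_spliced_speech(samples: np.ndarray, num_segments: int = 3, seed=None) -> np.ndarray:
    """Cut a normal-speech waveform into segments and rejoin them in shuffled order."""
    rng = np.random.default_rng(seed)
    segments = np.array_split(samples, num_segments)
    order = rng.permutation(num_segments)  # shuffled segment order
    # Note: the identity permutation can occur; a real pipeline might redraw it.
    return np.concatenate([segments[i] for i in order])
```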
S103: and constructing a classification model based on the normal voice data and the spliced voice data.
Wherein the constructing of the binary classification model based on the normal voice data and the spliced voice data may include:
extracting LPC (Linear Predictive Coding) features and pitch features from the normal voice data and the spliced voice data respectively, performing a difference operation and a normalization operation on the LPC features and the pitch features, and using the differenced and normalized features as the training inputs of an LSTM (Long Short-Term Memory) network and a CNN (Convolutional Neural Network), thereby constructing the binary classification model based on the normal voice data and the spliced voice data. A sketch of this feature pipeline follows.
S104: and training the spliced voice model by adopting a long-term memory network and a convolution neural network.
The training of the spliced voice model by adopting the long-term memory network and the convolutional neural network can comprise the following steps:
the method has the advantages that the extracted acoustic features can make the features of the spliced voice more prominent, and the accuracy of the recognition of the spliced voice can be improved.
In this embodiment, the LSTM network and the CNN may include two LSTM layers and two fully connected layers, three LSTM layers and three fully connected layers, or four LSTM layers and four fully connected layers, which is not limited by the present invention. A model sketch along these lines follows.
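A minimal PyTorch sketch of one plausible shape for this binary classifier: an LSTM branch with two LSTM layers over the frame sequence, a small CNN branch over the same features viewed as a 2-D map, and two fully connected layers fusing both into normal-vs-spliced logits. The patent does not fix the architecture; all layer sizes and the two-branch fusion are assumptions.

```python
import torch
import torch.nn as nn

class SpliceDetector(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Sequential(               # two fully connected layers
            nn.Linear(hidden + 16 * 4 * 4, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),              # logits: normal vs. spliced
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim)
        _, (h, _) = self.lstm(x)               # h[-1]: final state of the last LSTM layer
        c = self.cnn(x.unsqueeze(1)).flatten(1)  # treat the feature matrix as a 2-D map
        return self.fc(torch.cat([h[-1], c], dim=1))
```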
S105: and according to the two classification models trained by the spliced voice model, carrying out recognition of spliced voice on voice data.
After the recognition of the spliced voice is performed on the voice data according to the two classification models trained by the spliced voice model, the method further comprises the following steps:
the parameters of the long-short-term memory network and the convolutional neural network are more numerous through a loss function of cross entropy loss and an optimization algorithm, and the training and updating of the two classification models are carried out through iteration of preset times by adopting the long-short-term memory network and the convolutional neural network after parameter updating, so that the accuracy rate of recognition of spliced voice can be improved.
It can be seen that, in this embodiment, normal voice data of a user can be acquired, cut into a preset number of segments and spliced in shuffled order to obtain spliced voice data; a binary classification model based on the normal voice data and the spliced voice data can be constructed and trained as a spliced-speech model using an LSTM network and a CNN; and spliced speech in voice data can be recognized according to the trained model, so that spliced speech can be recognized and the security of voice verification ensured.
Further, in this embodiment, LPC features and pitch features of the normal voice data and the spliced voice data can be extracted respectively, a difference operation and a normalization operation performed on them, and the differenced and normalized features used as the training inputs of the LSTM network and the CNN to construct the binary classification model. This has the advantage that the LSTM network and the CNN can retain information of the audio context, thereby facilitating the recognition of spliced speech.
Further, in this embodiment, acoustic features can be extracted for the binary classification model and input into the LSTM network and the CNN, and the binary classification model trained as a spliced-speech model with the LSTM network and the CNN. A hypothetical inference helper built on the sketches above is shown below.
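For completeness, a hypothetical usage sketch scoring one utterance with the trained model from the sketches above; the convention that class 1 means "spliced" is an assumption.

```python
import torch

@torch.no_grad()
def is_spliced(model: torch.nn.Module, feats) -> bool:
    """Return True if the model classifies the utterance as spliced (class 1 assumed)."""
    model.eval()
    x = torch.as_tensor(feats, dtype=torch.float32).unsqueeze(0)  # add batch dimension
    return bool(model(x).argmax(dim=1).item() == 1)
```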
Referring to fig. 2, fig. 2 is a flowchart of another embodiment of the method for recognizing spliced speech according to the present invention. In this embodiment, the method includes the following steps:
s201: and acquiring normal voice data of the user.
As described in S101, a detailed description is omitted here.
S202: cutting the normal voice data into preset segments, and splicing the normal voice data cut into the preset segments according to voice disorder to obtain spliced voice data.
As described in S102, the description is omitted here.
S203: and constructing a classification model based on the normal voice data and the spliced voice data.
As described in S103, a detailed description is omitted here.
S204: and training the spliced voice model by adopting a long-term memory network and a convolution neural network.
As described in S104, a detailed description is omitted here.
S205: and according to the two classification models trained by the spliced voice model, carrying out recognition of spliced voice on voice data.
S206: and carrying out parameter updating on the long-short-term memory network and the convolutional neural network through a loss function of cross entropy loss and an optimization algorithm, and training and updating the two classification models through iteration of preset times by adopting the long-short-term memory network and the convolutional neural network after parameter updating.
It can be seen that, in this embodiment, the parameters of the LSTM network and the CNN can be updated through a cross-entropy loss function and an optimization algorithm, and the binary classification model retrained for a preset number of iterations with the parameter-updated networks, which can improve the accuracy of spliced-speech recognition.
The invention also provides an apparatus for recognizing spliced speech, which can recognize spliced speech and thereby ensure the security of voice verification.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an embodiment of the apparatus for recognizing spliced speech according to the present invention. In this embodiment, the apparatus 30 for recognizing spliced speech includes an acquisition module 31, a splicing module 32, a construction module 33, a training module 34, and a recognition module 35.
The acquiring module 31 is configured to acquire normal voice data of a user.
The splicing module 32 is configured to cut the normal voice data into a preset number of segments, and splice the segments in shuffled order to obtain spliced voice data.
The construction module 33 is configured to construct a binary classification model based on the normal voice data and the spliced voice data.
The training module 34 is configured to train the binary classification model as a spliced-speech model using an LSTM network and a CNN.
The recognition module 35 is configured to recognize spliced speech in voice data according to the trained binary classification model.
Optionally, the construction module 33 may be specifically configured to:
extract LPC features and pitch features from the normal voice data and the spliced voice data respectively, perform a difference operation and a normalization operation on them, and use the differenced and normalized features as the training inputs of the LSTM network and the CNN, thereby constructing the binary classification model based on the normal voice data and the spliced voice data.
Optionally, the training module 34 may be specifically configured to:
extract acoustic features for the binary classification model, input the extracted acoustic features into the LSTM network and the CNN, and train the binary classification model as a spliced-speech model with the LSTM network and the CNN.
Referring to fig. 4, fig. 4 is a schematic structural diagram of another embodiment of the apparatus for recognizing spliced speech according to the present invention. Unlike the previous embodiment, the apparatus 40 for recognizing spliced speech of this embodiment further includes an updating module 41.
The updating module 41 is configured to update the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retrain the binary classification model for a preset number of iterations with the parameter-updated LSTM network and CNN.
The unit modules of the apparatus 30/40 for recognizing spliced speech can execute the corresponding steps in the above method embodiments, so their detailed description is omitted here.
The present invention further provides a device for recognizing spliced speech, as shown in fig. 5, including: at least one processor 51; and a memory 52 communicatively coupled to the at least one processor 51. The memory 52 stores instructions executable by the at least one processor 51, and the instructions are executed by the at least one processor 51 to enable the at least one processor 51 to perform the above method for recognizing spliced speech.
Where the memory 52 and the processor 51 are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors 51 and the memory 52 together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 51 is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor 51.
The processor 51 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory 52 may be used to store data used by the processor 51 in performing operations.
The present invention further provides a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
It can be seen that, with this scheme, normal voice data of a user can be acquired, cut into a preset number of segments and spliced in shuffled order to obtain spliced voice data; a binary classification model based on the normal voice data and the spliced voice data can be constructed and trained as a spliced-speech model using an LSTM network and a CNN; and spliced speech in voice data can be recognized according to the trained model, so that spliced speech can be recognized and the security of voice verification ensured.
Further, with this scheme, LPC features and pitch features can be extracted from the normal voice data and the spliced voice data respectively, a difference operation and a normalization operation can be performed on them, and the differenced and normalized features can be used as the training inputs of the LSTM network and the CNN to construct the binary classification model.
Furthermore, with this scheme, acoustic features can be extracted for the binary classification model and input into the LSTM network and the CNN, and the binary classification model can be trained as a spliced-speech model with the LSTM network and the CNN.
Furthermore, with this scheme, the parameters of the LSTM network and the CNN can be updated through a cross-entropy loss function and an optimization algorithm, and the binary classification model retrained for a preset number of iterations with the parameter-updated networks, which can improve the accuracy of spliced-speech recognition.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description is only a partial embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent devices or equivalent processes using the descriptions and the drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.

Claims (8)

1. A method for recognizing spliced speech, comprising:
acquiring normal voice data of a user;
cutting the normal voice data into a preset number of segments, and splicing the segments in shuffled order to obtain spliced voice data;
constructing a binary classification model based on the normal voice data and the spliced voice data, comprising:
extracting linear predictive coding (LPC) features and pitch features from the normal voice data and the spliced voice data respectively, performing a difference operation and a normalization operation on the LPC features and the pitch features, and using the differenced and normalized features as the training inputs of a long short-term memory (LSTM) network and a convolutional neural network (CNN), thereby constructing the binary classification model based on the normal voice data and the spliced voice data;
training the binary classification model as a spliced-speech model using the LSTM network and the CNN;
and recognizing spliced speech in voice data according to the trained binary classification model.
2. The method for recognizing spliced speech according to claim 1, wherein the training of the binary classification model as a spliced-speech model using the LSTM network and the CNN comprises:
extracting acoustic features for the binary classification model, inputting the extracted acoustic features into the LSTM network and the CNN, and training the binary classification model as a spliced-speech model with the LSTM network and the CNN.
3. The method for recognizing spliced speech according to claim 1, further comprising, after recognizing spliced speech in voice data according to the trained binary classification model:
updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations with the parameter-updated LSTM network and CNN.
4. An apparatus for recognizing spliced speech, comprising:
an acquisition module, a splicing module, a construction module, a training module and a recognition module;
the acquisition module is used for acquiring normal voice data of a user;
the splicing module is used for cutting the normal voice data into a preset number of segments and splicing the segments in shuffled order to obtain spliced voice data;
the construction module is used for constructing a binary classification model based on the normal voice data and the spliced voice data, and is specifically used for:
extracting linear predictive coding (LPC) features and pitch features from the normal voice data and the spliced voice data respectively, performing a difference operation and a normalization operation on the LPC features and the pitch features, and using the differenced and normalized features as the training inputs of a long short-term memory (LSTM) network and a convolutional neural network (CNN), thereby constructing the binary classification model based on the normal voice data and the spliced voice data;
the training module is used for training the binary classification model as a spliced-speech model using the LSTM network and the CNN;
and the recognition module is used for recognizing spliced speech in voice data according to the trained binary classification model.
5. The apparatus for recognizing spliced speech according to claim 4, wherein the training module is specifically configured to:
extract acoustic features for the binary classification model, input the extracted acoustic features into the LSTM network and the CNN, and train the binary classification model as a spliced-speech model with the LSTM network and the CNN.
6. The apparatus for recognizing spliced speech according to claim 4, further comprising:
an updating module;
the updating module is used for updating the parameters of the LSTM network and the CNN through a cross-entropy loss function and an optimization algorithm, and retraining the binary classification model for a preset number of iterations with the parameter-updated LSTM network and CNN.
7. A device for recognizing spliced speech, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for recognizing spliced speech according to any one of claims 1 to 3.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for recognizing spliced speech according to any one of claims 1 to 3.
CN202010002558.9A 2020-01-02 2020-01-02 Method, device and equipment for recognizing spliced voice Active CN111009238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010002558.9A CN111009238B (en) 2020-01-02 2020-01-02 Method, device and equipment for recognizing spliced voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010002558.9A CN111009238B (en) 2020-01-02 2020-01-02 Method, device and equipment for recognizing spliced voice

Publications (2)

Publication Number Publication Date
CN111009238A CN111009238A (en) 2020-04-14
CN111009238B (en) 2023-06-23

Family

ID=70120411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010002558.9A Active CN111009238B (en) 2020-01-02 2020-01-02 Method, device and equipment for recognizing spliced voice

Country Status (1)

Country Link
CN (1) CN111009238B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111477235B (en) * 2020-04-15 2023-05-05 厦门快商通科技股份有限公司 Voiceprint acquisition method, voiceprint acquisition device and voiceprint acquisition equipment
CN111599351A (en) * 2020-04-30 2020-08-28 厦门快商通科技股份有限公司 Voice recognition method, device and equipment
CN111583946A (en) * 2020-04-30 2020-08-25 厦门快商通科技股份有限公司 Voice signal enhancement method, device and equipment
CN111583947A (en) * 2020-04-30 2020-08-25 厦门快商通科技股份有限公司 Voice enhancement method, device and equipment
CN113516969B (en) * 2021-09-14 2021-12-14 北京远鉴信息技术有限公司 Spliced voice identification method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456345A (en) * 2010-10-19 2012-05-16 盛乐信息技术(上海)有限公司 Concatenated speech detection system and method
CN108288470A (en) * 2017-01-10 2018-07-17 富士通株式会社 Auth method based on vocal print and device
CN109376264A (en) * 2018-11-09 2019-02-22 广州势必可赢网络科技有限公司 A kind of audio-frequency detection, device, equipment and computer readable storage medium
CN110491391A (en) * 2019-07-02 2019-11-22 厦门大学 A kind of deception speech detection method based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10276166B2 (en) * 2014-07-22 2019-04-30 Nuance Communications, Inc. Method and apparatus for detecting splicing attacks on a speaker verification system

Also Published As

Publication number Publication date
CN111009238A (en) 2020-04-14

Similar Documents

Publication Publication Date Title
CN111009238B (en) Method, device and equipment for recognizing spliced voice
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN110942763B (en) Speech recognition method and device
CN112380853B (en) Service scene interaction method and device, terminal equipment and storage medium
CN111899759B (en) Method, device, equipment and medium for pre-training and model training of audio data
CN111832318B (en) Single sentence natural language processing method and device, computer equipment and readable storage medium
CN113283238B (en) Text data processing method and device, electronic equipment and storage medium
CN113192497B (en) Speech recognition method, device, equipment and medium based on natural language processing
CN111582341B (en) User abnormal operation prediction method and device
CN109462482A (en) Method for recognizing sound-groove, device, electronic equipment and computer readable storage medium
CN113362852A (en) User attribute identification method and device
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN110428816B (en) Method and device for training and sharing voice cell bank
CN116702736A (en) Safe call generation method and device, electronic equipment and storage medium
CN114758645A (en) Training method, device and equipment of speech synthesis model and storage medium
CN112669836B (en) Command recognition method and device and computer readable storage medium
CN111128234B (en) Spliced voice recognition detection method, device and equipment
CN117912455A (en) Land-air communication voice conversion method and device, terminal equipment and storage medium
CN114706943A (en) Intention recognition method, apparatus, device and medium
CN114925159A (en) User emotion analysis model training method and device, electronic equipment and storage medium
CN111199750B (en) Pronunciation evaluation method and device, electronic equipment and storage medium
CN116150324A (en) Training method, device, equipment and medium of dialogue model
CN111179912A (en) Detection method, device and equipment for spliced voice
CN111477235B (en) Voiceprint acquisition method, voiceprint acquisition device and voiceprint acquisition equipment
CN111599351A (en) Voice recognition method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant