CN111640454A - Spectrogram matching method, device and equipment and computer readable storage medium - Google Patents
- Publication number
- CN111640454A (CN202010405211.9A)
- Authority
- CN
- China
- Prior art keywords
- sample
- spectrogram
- phoneme
- information
- acquiring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a spectrogram matching method, device, equipment, and computer-readable storage medium. The method comprises the following steps: acquiring a specimen spectrogram and acquiring a sample spectrogram; acquiring specimen phoneme information of a target phoneme in the specimen spectrogram, and acquiring sample phoneme information of the target phoneme in the sample spectrogram; and calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity. The invention realizes intelligent matching of spectrograms and improves spectrogram matching efficiency.
Description
Technical Field
The present invention relates to the field of speech processing technologies, and in particular to a spectrogram matching method, device, equipment, and computer-readable storage medium.
Background
With the continuous development of society, speech processing technology is being applied in a growing number of fields. A spectrogram is a common representation of speech data and is frequently needed during speech processing; tasks such as speech recognition and identity recognition are carried out by matching spectrograms against one another.
The traditional spectrogram matching method judges how well spectrograms match by manually comparing their differences, which is time-consuming and inefficient.
Disclosure of Invention
The main purpose of the invention is to provide a spectrogram matching method, device, equipment, and computer-readable storage medium that realize intelligent matching of spectrograms and improve spectrogram matching efficiency.
In order to achieve the above object, an embodiment of the present invention provides a spectrogram matching method, including:
acquiring a specimen spectrogram and acquiring a sample spectrogram;
acquiring specimen phoneme information of a target phoneme in the specimen spectrogram, and acquiring sample phoneme information of the target phoneme in the sample spectrogram;
and calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
Optionally, before the step of acquiring the specimen phoneme information of the target phoneme in the specimen spectrogram, the method further includes:
correcting the specimen spectrogram to obtain a corrected specimen spectrogram;
the step of acquiring the specimen phoneme information of the target phoneme in the specimen spectrogram comprises:
acquiring the specimen phoneme information of the target phoneme in the corrected specimen spectrogram.
Optionally, the step of correcting the specimen spectrogram to obtain a corrected specimen spectrogram includes:
acquiring amplitude curve information corresponding to each specimen phoneme in the specimen spectrogram, and calculating corresponding deviation information according to the amplitude curve information;
and correcting the specimen spectrogram based on the deviation information to obtain the corrected specimen spectrogram.
Optionally, before the step of acquiring the specimen phoneme information of the target phoneme in the specimen spectrogram, the method further includes:
detecting whether the peaks and troughs in the specimen spectrogram have been labeled;
if the peaks and troughs in the specimen spectrogram have been labeled, executing the step of acquiring the specimen phoneme information of the target phoneme in the specimen spectrogram;
if the peaks and troughs in the specimen spectrogram have not been labeled, labeling the peaks and troughs in the specimen spectrogram to obtain a labeled specimen spectrogram;
the step of acquiring the specimen phoneme information of the target phoneme in the specimen spectrogram comprises:
acquiring the specimen phoneme information of the target phoneme in the labeled specimen spectrogram.
Optionally, the step of acquiring a specimen spectrogram and acquiring a sample spectrogram comprises:
acquiring a specimen audio, and converting the specimen audio into the specimen spectrogram based on a preset rule;
and acquiring a corresponding sample audio from a preset sample database according to the specimen audio, and acquiring the sample spectrogram corresponding to the sample audio.
Optionally, the step of calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information includes:
converting the specimen phoneme information into a corresponding specimen phoneme vector, and converting the sample phoneme information into a corresponding sample phoneme vector;
and calculating the vector similarity of the specimen phoneme vector and the sample phoneme vector, and determining the phoneme similarity according to the vector similarity.
Optionally, after the step of calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity, the method further includes:
judging whether the spectrogram matching degree is greater than a preset threshold;
and if the spectrogram matching degree is greater than the preset threshold, acquiring sample identity information corresponding to the sample spectrogram, and determining specimen identity information of the specimen spectrogram according to the sample identity information.
In addition, to achieve the above object, an embodiment of the present invention further provides a spectrogram matching device, including:
a first acquisition module, configured to acquire a specimen spectrogram and acquire a sample spectrogram;
a second acquisition module, configured to acquire specimen phoneme information of a target phoneme in the specimen spectrogram, and acquire sample phoneme information of the target phoneme in the sample spectrogram;
and a matching degree determining module, configured to calculate a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determine the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
In addition, to achieve the above object, an embodiment of the present invention further provides spectrogram matching equipment, which includes a processor, a memory, and a spectrogram matching program stored on the memory and executable by the processor, wherein the steps of the spectrogram matching method described above are implemented when the spectrogram matching program is executed by the processor.
In addition, to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium on which a spectrogram matching program is stored, wherein the steps of the spectrogram matching method described above are implemented when the spectrogram matching program is executed by a processor.
The invention provides a spectrogram matching method, device, equipment, and computer-readable storage medium, which acquire a specimen spectrogram and a sample spectrogram; acquire specimen phoneme information of a target phoneme in the specimen spectrogram and sample phoneme information of the target phoneme in the sample spectrogram; calculate a phoneme similarity according to the specimen phoneme information and the sample phoneme information; and determine the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity. In this way, the phoneme information of the same target phoneme is obtained from both spectrograms, and a similarity calculation on the two determines the spectrogram matching degree, which realizes intelligent matching of spectrograms and improves matching efficiency.
Drawings
FIG. 1 is a schematic hardware configuration diagram of spectrogram matching equipment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a spectrogram matching method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a spectrogram matching method according to a second embodiment of the present invention;
FIG. 4 is a functional block diagram of a spectrogram matching device according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The spectrogram matching method according to the embodiment of the present invention is mainly applied to spectrogram matching equipment, which may be any equipment with a data processing function, such as a personal computer (PC), a notebook computer, or a mobile terminal.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of spectrogram matching equipment according to an embodiment of the present invention. In this embodiment, the spectrogram matching equipment may include a processor 1001 (e.g., a central processing unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 provides connection and communication among these components; the user interface 1003 may include a display and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface); the memory 1005 may be a random access memory (RAM) or a non-volatile memory, such as a magnetic disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration depicted in FIG. 1 does not limit the present invention; the equipment may include more or fewer components than those shown, combine some components, or arrange the components differently.
With continued reference to FIG. 1, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, and a spectrogram matching program. The network communication module may be configured to connect to a preset database and exchange data with it; the processor 1001 may call the spectrogram matching program stored in the memory 1005 and execute the spectrogram matching method according to the embodiment of the present invention.
Based on the hardware architecture, embodiments of the spectrogram matching method of the present invention are provided.
The embodiment of the invention provides a spectrogram matching method.
Referring to fig. 2, fig. 2 is a flowchart illustrating a spectrogram matching method according to a first embodiment of the present invention.
In this embodiment, the spectrogram matching method includes the following steps:
Step S10, acquiring a specimen spectrogram and acquiring a sample spectrogram;
A spectrogram is a common representation of speech data and is frequently needed during speech processing; tasks such as speech recognition and identity recognition are carried out by matching spectrograms against one another. In the spectrogram, the x-axis represents frequency and the y-axis represents amplitude. The traditional spectrogram matching method judges how well spectrograms match by manually comparing their differences, which is time-consuming and inefficient. In view of this, the present embodiment provides a spectrogram matching method that acquires specimen phoneme information of a target phoneme in a specimen spectrogram, acquires sample phoneme information of the same target phoneme in a sample spectrogram, and performs a similarity calculation on the two to determine the spectrogram matching degree, thereby realizing intelligent matching of spectrograms.
The spectrogram matching method in this embodiment is implemented by spectrogram matching equipment, which may be a personal computer, a notebook computer, a mobile terminal (e.g., a mobile phone), and the like; a computer is taken as the example in this embodiment. The computer first acquires a specimen spectrogram, which can be regarded as the target object that currently needs to be processed. In addition, a database of the computer stores a plurality of sample spectrograms that were collected in advance; when the matching process starts, the sample spectrograms are obtained from the database, and the subsequent matching is performed between the specimen spectrogram and the sample spectrograms. Alternatively, the sample spectrogram may be obtained by taking the specimen audio corresponding to the specimen spectrogram, matching it by some means (e.g., voiceprint recognition) against a preset sample database to find similar sample audio, and converting that sample audio.
Specifically, step S10 includes:
step a1, acquiring a specimen audio, and converting the specimen audio into a specimen spectrogram based on a preset rule;
step a2, acquiring a corresponding sample audio from a preset sample database according to the specimen audio, and acquiring a sample spectrogram corresponding to the sample audio.
It should be noted that, in this embodiment, the computer initially acquires only the specimen audio. To perform the subsequent matching, the computer must convert the specimen audio, once acquired, according to a preset rule to obtain the corresponding specimen spectrogram. The conversion may be based on a Fourier transform rule; other software or programs may also be used, and the specific conversion method may follow the prior art and is not described here. In this way, even if only a piece of audio is obtained, for example a recording of a certain person's speech, it can be converted into a specimen spectrogram and then processed.
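As an illustration of a Fourier-based conversion, the following minimal sketch computes an amplitude-versus-frequency spectrum with a naive discrete Fourier transform. It is one possible reading of the "preset rule", not the patent's prescribed procedure; the function name `magnitude_spectrum`, the scaling convention, and the synthetic 5 Hz test signal are assumptions made for the example.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequencies, amplitudes) of a real signal via a naive DFT.

    Only the non-negative frequency bins (0 .. n // 2) are returned.
    """
    n = len(samples)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient: sum over t of x[t] * exp(-2*pi*i*k*t/n)
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        # Assumed scaling: a unit-amplitude sinusoid yields an amplitude near 1.
        scale = 1.0 / n if k in (0, n / 2) else 2.0 / n
        freqs.append(k * sample_rate / n)
        amps.append(abs(coeff) * scale)
    return freqs, amps


# Hypothetical input: one second of a 5 Hz sine sampled at 64 Hz.
rate = 64
signal = [math.sin(2 * math.pi * 5 * t / rate) for t in range(rate)]
freqs, amps = magnitude_spectrum(signal, rate)
peak_bin = max(range(len(amps)), key=amps.__getitem__)
```

The naive transform costs O(n²) and is used here only to keep the sketch dependency-free; a practical implementation would typically use a fast Fourier transform library.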
To acquire the sample spectrogram, a corresponding sample audio may be acquired from the preset sample database according to the specimen audio, and the sample spectrogram corresponding to that sample audio is then acquired. The sample audio similar to the specimen audio can be found by means such as voiceprint recognition or voice feature matching. In practical applications where identity matching is performed through spectrogram matching, this reduces the number of sample spectrograms that need to be matched and thereby improves identity matching efficiency. In addition, the sample spectrogram may be obtained by converting the sample audio at matching time, or it may have been converted from the sample audio in advance and be retrieved directly.
Step S20, acquiring specimen phoneme information of a target phoneme in the specimen spectrogram, and acquiring sample phoneme information of the target phoneme in the sample spectrogram;
In this embodiment, once the specimen spectrogram and the sample spectrogram have been obtained, specimen phoneme information of a target phoneme in the specimen spectrogram is acquired, and sample phoneme information of the same target phoneme in the sample spectrogram is acquired. The target phoneme may be preset by the system or selected by a user, and there may be one target phoneme or a plurality (i.e., two or more); the specimen phoneme information and the sample phoneme information may include, but are not limited to, frequency, amplitude, peak value, trough value, and period.
Specifically, the specimen phoneme information and the sample phoneme information are acquired as follows:
The phoneme points corresponding to the target phoneme are labeled in the specimen spectrogram and in the sample spectrogram, giving specimen phoneme points and sample phoneme points. Because the specimen phoneme points are recorded in the specimen spectrogram, the computer can read the specimen phoneme information corresponding to each specimen phoneme point, such as frequency, amplitude, peak value, trough value, and period, from the specimen spectrogram; similarly, because the sample phoneme points are recorded in the sample spectrogram, the computer can read the sample phoneme information corresponding to each sample phoneme point from the sample spectrogram.
It is understood that, in a piece of audio, the same target phoneme may appear one or more times, and correspondingly may correspond to one or more points on the spectrogram. Therefore, when obtaining the specimen phoneme information, the specimen phoneme information of the several phoneme points corresponding to the same target phoneme may be averaged to obtain the final specimen phoneme information. For example, if the phoneme r has three specimen phoneme points 1, 2, and 3 in the specimen spectrogram, the amplitudes $A_{r1}$, $A_{r2}$, and $A_{r3}$ at these points are obtained first, and the average is then calculated as $A_r = (A_{r1} + A_{r2} + A_{r3})/3$.
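The averaging over several phoneme points of the same target phoneme can be sketched as follows. The dictionary layout, the attribute names, and the numeric values are hypothetical and chosen only to illustrate the calculation.

```python
from statistics import mean


def average_phoneme_info(points):
    """Average each attribute over all phoneme points of one target phoneme."""
    keys = points[0].keys()
    return {key: mean(p[key] for p in points) for key in keys}


# Hypothetical phoneme points 1, 2, and 3 for the phoneme "r".
r_points = [
    {"frequency": 200.0, "amplitude": 0.5},
    {"frequency": 220.0, "amplitude": 0.75},
    {"frequency": 210.0, "amplitude": 1.0},
]
r_info = average_phoneme_info(r_points)
```

Here `r_info["amplitude"]` corresponds to the averaged amplitude of the phoneme r in the example above.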
Step S30, calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
In this embodiment, once the specimen phoneme information and the sample phoneme information have been obtained, the computer can calculate the phoneme similarity from them. The phoneme similarity can be regarded as a representation of how differently the same phoneme is pronounced in the specimen spectrogram and in the sample spectrogram: the greater the phoneme similarity, the smaller the pronunciation difference for that phoneme between the two spectrograms; the smaller the phoneme similarity, the larger the pronunciation difference.
Specifically, the step of calculating the phoneme similarity according to the specimen phoneme information and the sample phoneme information includes:
step b1, converting the specimen phoneme information into a corresponding specimen phoneme vector, and converting the sample phoneme information into a corresponding sample phoneme vector;
In this embodiment, to calculate the phoneme similarity, the computer first converts the specimen phoneme information into a corresponding specimen phoneme vector and converts the sample phoneme information into a corresponding sample phoneme vector. The conversion rules for the two must be consistent. Specifically, suppose a vector is to contain phoneme information of types a, b, and c, arranged in the order a, b, c. The three attributes a, b, and c are first taken from the specimen phoneme information and mapped to attribute values according to a numerical mapping relation; the values are then arranged in the order a, b, c to obtain the specimen phoneme vector corresponding to the specimen phoneme information. The sample phoneme information is converted into the corresponding sample phoneme vector in the same way. The types of phoneme information included in the vector, their order, and the numerical mapping relation between attributes and values can be set according to actual conditions.
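The conversion into vectors can be sketched as below. The chosen attributes, their order, and the scale factors used as the numerical mapping are assumptions for illustration; the text leaves all three to be set according to actual conditions. The important point is that both inputs go through the same rule.

```python
# Assumed attribute order, standing in for "a, b, c" in the description.
ATTRIBUTE_ORDER = ("frequency", "amplitude", "period")

# Assumed numerical mapping: scale each attribute into a comparable range.
SCALES = {"frequency": 1 / 1000.0, "amplitude": 1.0, "period": 100.0}


def to_phoneme_vector(info, order=ATTRIBUTE_ORDER, scales=SCALES):
    """Map phoneme attributes to numbers and arrange them in a fixed order."""
    return [info[name] * scales[name] for name in order]


# Hypothetical phoneme information for the same target phoneme in two spectrograms.
specimen_vec = to_phoneme_vector(
    {"period": 0.005, "amplitude": 0.8, "frequency": 200.0}
)
sample_vec = to_phoneme_vector(
    {"frequency": 210.0, "amplitude": 0.75, "period": 0.0048}
)
```

Because the order comes from `ATTRIBUTE_ORDER` rather than from the input dictionaries, the two vectors are comparable component by component even when the inputs list their attributes differently.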
step b2, calculating the vector similarity of the specimen phoneme vector and the sample phoneme vector, and determining the phoneme similarity according to the vector similarity.
Once the specimen phoneme vector and the sample phoneme vector have been obtained, the computer can calculate the vector similarity between them and then determine the phoneme similarity from the vector similarity; for example, the vector similarity may be used directly as the phoneme similarity, or a linear transformation may be applied to it. The vector similarity may be calculated with different formulas according to actual needs, for example a cosine similarity formula, a Euclidean distance formula, or a Chebyshev distance formula. In this way, the specimen phoneme information and the sample phoneme information are each converted into vectors for similarity calculation, so that the similarity between phonemes is expressed quantitatively, which makes it convenient to describe the matching relation between spectrograms quantitatively.
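The vector comparison can be sketched with cosine similarity, Euclidean distance, and Chebyshev distance as follows. The text does not say how a distance is turned into a similarity, so the helper `distance_to_similarity` (mapping a distance d to 1/(1+d)) is an assumption, as are the example vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def chebyshev_distance(u, v):
    """Largest absolute difference over all components."""
    return max(abs(a - b) for a, b in zip(u, v))


def distance_to_similarity(d):
    """Assumed conversion: distance 0 gives 1.0; larger distances give less."""
    return 1.0 / (1.0 + d)


# Hypothetical phoneme vectors for the same target phoneme.
u = [0.2, 0.8, 0.5]
v = [0.21, 0.75, 0.48]
phoneme_similarity = cosine_similarity(u, v)
```

Cosine similarity ignores overall vector length, while the two distances are sensitive to it, so the choice depends on whether absolute attribute magnitudes should influence the match.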
Because the phoneme similarity can be regarded as a representation of how differently the same phoneme is pronounced in the specimen spectrogram and in the sample spectrogram, the spectrogram matching degree of the two spectrograms can be determined from the phoneme similarity; the matching relation between spectrograms is thus described quantitatively through the phoneme information of the same phoneme in different spectrograms. When determining the spectrogram matching degree from the phoneme similarity, the phoneme similarity may be used directly as the spectrogram matching degree; or a linear transformation may be applied to the phoneme similarity to obtain the spectrogram matching degree; or different similarity ranges may be set in advance, each corresponding to a different spectrogram matching degree, and the spectrogram matching degree is then determined by the range in which the phoneme similarity falls.
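The three ways of deriving a matching degree from the phoneme similarity can be sketched as follows. The coefficients of the linear transformation and the range boundaries and labels are hypothetical; the text leaves them to be set in advance.

```python
def matching_degree_identity(similarity):
    """Use the phoneme similarity directly as the matching degree."""
    return similarity


def matching_degree_linear(similarity, a=100.0, b=0.0):
    """Apply an assumed linear transformation, here a percentage scale."""
    return a * similarity + b


# Assumed ranges: (lower bound, matching degree), highest bound first.
RANGES = [(0.9, "high"), (0.6, "medium"), (0.0, "low")]


def matching_degree_by_range(similarity, ranges=RANGES):
    """Return the degree of the first range whose lower bound is reached."""
    for lower, degree in ranges:
        if similarity >= lower:
            return degree
    return ranges[-1][1]  # below every bound: fall back to the lowest degree
```

For instance, under these assumed ranges a phoneme similarity of 0.95 maps to "high" and 0.7 maps to "medium".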
The embodiment of the invention provides a spectrogram matching method, which comprises: acquiring a specimen spectrogram and acquiring a sample spectrogram; acquiring specimen phoneme information of a target phoneme in the specimen spectrogram, and acquiring sample phoneme information of the target phoneme in the sample spectrogram; and calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity. In this way, the phoneme information of the same target phoneme is obtained from both spectrograms, and a similarity calculation on the two determines the spectrogram matching degree, which realizes intelligent matching of spectrograms and improves spectrogram matching efficiency.
Based on the first embodiment of the spectrogram matching method, a second embodiment of the spectrogram matching method of the present invention is provided.
Referring to fig. 3, fig. 3 is a flowchart illustrating a spectrogram matching method according to a second embodiment of the present invention.
In this embodiment, before the step S20, the method further includes:
Step S40, correcting the specimen spectrogram to obtain a corrected specimen spectrogram;
In this embodiment, because the specimen audio is affected by the collection environment, the generated spectrogram may be irregular and contain deviations, which would affect the accuracy of the spectrogram matching result. Therefore, after the specimen spectrogram is obtained, it is corrected first so that the specimen spectrogram used in the subsequent matching process is regular, which avoids degrading the accuracy of the spectrogram matching result.
Specifically, after the specimen spectrogram is acquired, the specimen spectrogram is corrected to obtain a corrected specimen spectrogram.
Specifically, step S40 includes:
step c1, acquiring amplitude curve information corresponding to each specimen phoneme in the specimen spectrogram, and calculating corresponding deviation information according to the amplitude curve information;
step c2, correcting the specimen spectrogram based on the deviation information to obtain a corrected specimen spectrogram.
The specimen spectrogram is corrected as follows. First, the amplitude curve information corresponding to each specimen phoneme in the specimen spectrogram is obtained; this information may include the peak frequency corresponding to a peak, the trough frequency corresponding to a trough, the center point frequency corresponding to the center point value, the peak value, and the trough value. Next, the corresponding deviation information, which comprises horizontal-axis deviation information and vertical-axis deviation information, is calculated from the amplitude curve information. Specifically, the average frequency corresponding to each specimen phoneme may be calculated in the manner described in the first embodiment; the horizontal deviation information is then determined by comparing, for each phoneme point, the difference between the peak frequency and the center point frequency with the corresponding average frequency, and the difference between the trough frequency and the center point frequency with the corresponding average frequency. Likewise, the average amplitude corresponding to each specimen phoneme may be calculated in the manner described in the first embodiment, and the vertical deviation information is determined by comparing the absolute value of the peak value or trough value of each phoneme point with the corresponding average amplitude. Finally, the specimen spectrogram is corrected based on the deviation information to obtain the corrected specimen spectrogram.
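The description of the deviation calculation leaves several details open, so the following is only a sketch of one possible reading. It assumes that the horizontal deviation of a phoneme point is its peak-to-center frequency offset minus the average offset over all points of the phoneme, that the vertical deviation is its absolute peak value minus the average absolute peak value, and that correction shifts each point by its deviation. The field names and values are hypothetical, and troughs would be handled analogously.

```python
import math
from statistics import mean


def deviation_info(points):
    """Return per-point (horizontal, vertical) deviations from the averages."""
    offsets = [p["peak_freq"] - p["center_freq"] for p in points]
    peaks = [abs(p["peak_value"]) for p in points]
    avg_offset, avg_peak = mean(offsets), mean(peaks)
    return [(o - avg_offset, a - avg_peak) for o, a in zip(offsets, peaks)]


def correct_points(points):
    """Shift each point so that its deviations from the averages become zero."""
    corrected = []
    for p, (dx, dy) in zip(points, deviation_info(points)):
        q = dict(p)
        q["peak_freq"] = p["peak_freq"] - dx
        # Keep the sign of the peak value while adjusting its magnitude.
        q["peak_value"] = math.copysign(abs(p["peak_value"]) - dy, p["peak_value"])
        corrected.append(q)
    return corrected


# Hypothetical phoneme points of one phoneme.
points = [
    {"center_freq": 100.0, "peak_freq": 110.0, "peak_value": 1.0},
    {"center_freq": 101.0, "peak_freq": 113.0, "peak_value": 1.5},
    {"center_freq": 102.0, "peak_freq": 116.0, "peak_value": 2.0},
]
deviations = deviation_info(points)
corrected = correct_points(points)
```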
At this time, step S20 includes:
acquiring specimen phoneme information of the target phoneme in the corrected specimen spectrogram.
That is, the specimen phoneme information of the target phoneme in the corrected specimen spectrogram is acquired, the sample phoneme information of the target phoneme in the sample spectrogram is acquired, and the subsequent steps are then performed.
In this embodiment, after the specimen spectrogram is acquired, it is corrected first so that the specimen spectrogram used in the subsequent matching process is regular, which avoids degrading the accuracy of the spectrogram matching result.
Based on the first embodiment of the spectrogram matching method, a third embodiment of the spectrogram matching method of the present invention is provided.
In this embodiment, before the step S20, the method further includes:
Step A: detecting whether the peaks and troughs in the specimen spectrogram have been labeled;
In this embodiment, to facilitate the subsequent acquisition of the specimen phoneme information, once the specimen spectrogram has been acquired it may first be checked whether its peaks and troughs have been labeled.
If the peaks and troughs in the specimen spectrogram have been labeled, step S20 is executed: acquiring specimen phoneme information of the target phoneme in the specimen spectrogram, and acquiring sample phoneme information of the target phoneme in the sample spectrogram;
That is, if the peaks and troughs are already labeled, the specimen phoneme information and the sample phoneme information of the target phoneme are obtained directly, and the subsequent steps are then performed.
If the peaks and troughs in the specimen spectrogram have not been labeled, step B is executed: labeling the peaks and troughs in the specimen spectrogram to obtain a labeled specimen spectrogram;
In this case, step S20 includes:
acquiring specimen phoneme information of the target phoneme in the labeled specimen spectrogram, and acquiring sample phoneme information of the target phoneme in the sample spectrogram.
If the peaks and troughs in the specimen spectrogram have not been labeled, they are labeled to obtain the labeled specimen spectrogram. A specific labeling process may be as follows: the computer acquires the ordinate at each time point in chronological order and compares the ordinates of adjacent time points. When the ordinate y1 at time t1 is smaller than the ordinate y2 at time t2, and y2 is larger than the ordinate y3 at time t3, the point at time t2 is determined to be a peak; when the ordinate y4 at time t4 is greater than the ordinate y5 at time t5, and y5 is less than the ordinate y6 at time t6, the point at time t5 is determined to be a trough. The peaks and troughs are then labeled according to the detection result. In a specific embodiment, the specimen spectrogram may instead be sent to a corresponding working end so that a worker labels the peaks and troughs manually, after which the labeled specimen spectrogram returned by the working end is received. The specimen phoneme information of the target phoneme is then obtained from the labeled specimen spectrogram, and the sample phoneme information of the target phoneme is obtained from the sample spectrogram, so that the subsequent steps can be performed.
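The neighbour-comparison rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the curve is available as a list of ordinates sampled at consecutive time points; the function name `label_extrema` is hypothetical. Following the strict inequalities in the description, flat stretches and the two end points are not labeled.

```python
from typing import List, Tuple


def label_extrema(ordinates: List[float]) -> Tuple[List[int], List[int]]:
    """Return (peak_indices, trough_indices) found by comparing adjacent ordinates."""
    peaks: List[int] = []
    troughs: List[int] = []
    # The first and last points have only one neighbour, so they are skipped.
    for i in range(1, len(ordinates) - 1):
        prev_y, y, next_y = ordinates[i - 1], ordinates[i], ordinates[i + 1]
        if prev_y < y and y > next_y:
            peaks.append(i)      # higher than both neighbours: peak
        elif prev_y > y and y < next_y:
            troughs.append(i)    # lower than both neighbours: trough
    return peaks, troughs
```

For example, the ordinates `[0, 2, 1, 3, 0]` have peaks at indices 1 and 3 and a trough at index 2.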
Based on the embodiments of the spectrogram matching method, a fourth embodiment of the spectrogram matching method of the present invention is provided.
In this embodiment, after step S30, the method further includes:
Step C: judging whether the spectrogram matching degree is greater than a preset threshold;
The spectrogram matching process in this embodiment may be applied to voice identity recognition, that is, a specimen spectrogram is matched against a sample spectrogram so that the specimen identity corresponding to the specimen spectrogram can be determined from the spectrogram matching degree. Specifically, once the spectrogram matching degree between the specimen spectrogram and the sample spectrogram has been obtained, the computer may compare it with a preset threshold and judge whether it is greater than that threshold.
Step D: if the spectrogram matching degree is greater than the preset threshold, acquiring the sample identity information corresponding to the sample spectrogram, and determining the specimen identity information of the specimen spectrogram according to the sample identity information.
In this embodiment, if the spectrogram matching degree is smaller than or equal to the preset threshold, the specimen spectrogram is considered to match the sample spectrogram poorly, and the two do not belong to the same identity. If the spectrogram matching degree is greater than the preset threshold, the specimen spectrogram is considered to match the sample spectrogram well, and the two belong to the same identity; the computer can then acquire the sample identity information corresponding to the sample spectrogram and determine the specimen identity information of the specimen spectrogram according to it, that is, determine the specimen identity corresponding to the specimen spectrogram.
In this way, the spectrogram matching process of this embodiment can be applied to voice identity recognition: when the spectrogram matching degree exceeds the preset threshold, the specimen spectrogram is assigned the identity recorded for the matching sample spectrogram.
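Steps C and D amount to a single threshold comparison, sketched below in Python. The function name `identify_specimen` and the default threshold of 0.8 are hypothetical; the text only says that the threshold is preset. The comparison is strict, so a matching degree exactly equal to the threshold does not yield an identity.

```python
from typing import Optional


def identify_specimen(matching_degree: float,
                      sample_identity: str,
                      threshold: float = 0.8) -> Optional[str]:
    """Return the reference identity if the match is strong enough, else None."""
    # Step C: compare the spectrogram matching degree with the preset threshold.
    if matching_degree > threshold:
        # Step D: the spectrogram under examination takes the identity
        # recorded for the matching reference spectrogram.
        return sample_identity
    # At or below the threshold the two are treated as different identities.
    return None
```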
In addition, the embodiment of the invention also provides a spectrogram matching device.
Referring to fig. 4, fig. 4 is a functional block diagram of a spectrogram matching apparatus according to a first embodiment of the present invention.
In this embodiment, the spectrogram matching apparatus includes:
a first obtaining module 10, configured to obtain a specimen spectrogram and obtain a sample spectrogram;
a second obtaining module 20, configured to obtain specimen phoneme information of a target phoneme in the specimen spectrogram, and obtain sample phoneme information of the target phoneme in the sample spectrogram;
a matching degree determining module 30, configured to calculate a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determine a spectrogram matching degree between the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
Each virtual function module of the spectrogram matching apparatus is stored in the memory 1005 of the spectrogram matching device shown in fig. 1, and is used for implementing all functions of a spectrogram matching program; when executed by the processor 1001, the modules may perform spectrogram matching functions.
Further, the spectrogram matching apparatus further comprises:
a correction module, configured to correct the specimen spectrogram to obtain a corrected specimen spectrogram;
the second obtaining module 20 is specifically configured to: acquire the specimen phoneme information of the target phoneme in the corrected specimen spectrogram.
Further, the correction module includes:
a first calculating unit, configured to acquire amplitude curve information corresponding to each specimen phoneme in the specimen spectrogram, and calculate corresponding deviation information according to the amplitude curve information;
a correcting unit, configured to correct the specimen spectrogram based on the deviation information to obtain the corrected specimen spectrogram.
Further, the spectrogram matching apparatus further comprises:
a detection module, configured to detect whether the peaks and troughs in the specimen spectrogram have been labeled;
the second obtaining module 20 is specifically configured to: if the peaks and troughs in the specimen spectrogram have been labeled, acquire the specimen phoneme information of the target phoneme in the specimen spectrogram;
a labeling module, configured to label the peaks and troughs in the specimen spectrogram if they have not been labeled, so as to obtain a labeled specimen spectrogram;
the second obtaining module 20 is specifically configured to: acquire the specimen phoneme information of the target phoneme in the labeled specimen spectrogram.
Further, the first obtaining module 10 includes:
a conversion unit, configured to acquire a specimen audio and convert the specimen audio into a specimen spectrogram based on a preset rule;
an acquisition unit, configured to acquire a corresponding sample audio from a preset sample database according to the specimen audio, and acquire a sample spectrogram corresponding to the sample audio.
Further, the matching degree determination module 30 includes:
a vector conversion unit, configured to convert the specimen phoneme information into a corresponding specimen phoneme vector, and convert the sample phoneme information into a corresponding sample phoneme vector;
a second calculating unit, configured to calculate the vector similarity between the specimen phoneme vector and the sample phoneme vector, and determine the phoneme similarity according to the vector similarity.
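The text does not say which vector similarity measure is used. As one common choice, and purely as an assumption, the Python sketch below uses cosine similarity between two phoneme vectors of equal length; the function name `cosine_similarity` is illustrative. Using this value directly as the phoneme similarity would be a further assumption, since the description only states that the phoneme similarity is determined "according to" the vector similarity.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("phoneme vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)
```

Other measures, such as a distance-based score, would fit the same unit boundary; only the function body would change.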
Further, the spectrogram matching apparatus further comprises:
a matching degree judging module, configured to judge whether the spectrogram matching degree is greater than a preset threshold;
an information determining module, configured to acquire sample identity information corresponding to the sample spectrogram if the spectrogram matching degree is greater than the preset threshold, and determine the specimen identity information of the specimen spectrogram according to the sample identity information.
The function implementation of each module in the spectrogram matching apparatus corresponds to each step in the spectrogram matching method embodiment, and the function and implementation process thereof are not described in detail herein.
In addition, the embodiment of the invention also provides a computer readable storage medium.
The computer-readable storage medium of the present invention stores a spectrogram matching program, wherein the spectrogram matching program, when executed by a processor, implements the steps of the spectrogram matching method as described above.
The method implemented when the spectrogram matching program is executed may refer to various embodiments of the spectrogram matching method of the present invention, which are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A spectrogram matching method, comprising:
acquiring a specimen spectrogram and acquiring a sample spectrogram;
acquiring specimen phoneme information of a target phoneme in the specimen spectrogram, and acquiring sample phoneme information of the target phoneme in the sample spectrogram;
calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determining the spectrogram matching degree between the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
2. The spectrogram matching method of claim 1, wherein, before the step of acquiring specimen phoneme information of a target phoneme in the specimen spectrogram, the method further comprises:
correcting the specimen spectrogram to obtain a corrected specimen spectrogram;
the step of acquiring specimen phoneme information of a target phoneme in the specimen spectrogram comprises:
acquiring the specimen phoneme information of the target phoneme in the corrected specimen spectrogram.
3. The spectrogram matching method of claim 2, wherein the step of correcting the specimen spectrogram to obtain a corrected specimen spectrogram comprises:
acquiring amplitude curve information corresponding to each specimen phoneme in the specimen spectrogram, and calculating corresponding deviation information according to the amplitude curve information;
correcting the specimen spectrogram based on the deviation information to obtain the corrected specimen spectrogram.
4. The spectrogram matching method of claim 1, wherein, before the step of acquiring specimen phoneme information of a target phoneme in the specimen spectrogram, the method further comprises:
detecting whether the peaks and troughs in the specimen spectrogram have been labeled;
if the peaks and troughs in the specimen spectrogram have been labeled, executing the step of: acquiring specimen phoneme information of a target phoneme in the specimen spectrogram;
if the peaks and troughs in the specimen spectrogram have not been labeled, labeling the peaks and troughs in the specimen spectrogram to obtain a labeled specimen spectrogram;
the step of acquiring specimen phoneme information of a target phoneme in the specimen spectrogram comprises:
acquiring the specimen phoneme information of the target phoneme in the labeled specimen spectrogram.
5. The spectrogram matching method of claim 1, wherein the step of acquiring a specimen spectrogram and acquiring a sample spectrogram comprises:
acquiring a specimen audio, and converting the specimen audio into a specimen spectrogram based on a preset rule;
acquiring a corresponding sample audio from a preset sample database according to the specimen audio, and acquiring a sample spectrogram corresponding to the sample audio.
6. The spectrogram matching method of claim 1, wherein the step of calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information comprises:
converting the specimen phoneme information into a corresponding specimen phoneme vector, and converting the sample phoneme information into a corresponding sample phoneme vector;
calculating the vector similarity between the specimen phoneme vector and the sample phoneme vector, and determining the phoneme similarity according to the vector similarity.
7. The spectrogram matching method of any one of claims 1 to 6, wherein, after the step of calculating a phoneme similarity according to the specimen phoneme information and the sample phoneme information and determining the spectrogram matching degree between the specimen spectrogram and the sample spectrogram according to the phoneme similarity, the method further comprises:
judging whether the spectrogram matching degree is greater than a preset threshold;
if the spectrogram matching degree is greater than the preset threshold, acquiring sample identity information corresponding to the sample spectrogram, and determining the specimen identity information of the specimen spectrogram according to the sample identity information.
8. A spectrogram matching apparatus, comprising:
a first obtaining module, configured to obtain a specimen spectrogram and obtain a sample spectrogram;
a second obtaining module, configured to obtain specimen phoneme information of a target phoneme in the specimen spectrogram, and obtain sample phoneme information of the target phoneme in the sample spectrogram;
a matching degree determining module, configured to calculate a phoneme similarity according to the specimen phoneme information and the sample phoneme information, and determine the spectrogram matching degree between the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
9. A spectrogram matching device comprising a processor, a memory, and a spectrogram matching program stored on the memory and executable by the processor, wherein the spectrogram matching program, when executed by the processor, implements the steps of the spectrogram matching method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a spectrogram matching program is stored, wherein the spectrogram matching program, when executed by a processor, implements the steps of the spectrogram matching method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010405211.9A CN111640454B (en) | 2020-05-13 | 2020-05-13 | Spectrogram matching method, device, equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111640454A | 2020-09-08 |
CN111640454B | 2023-08-11 |
Family
ID=72333212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010405211.9A Active CN111640454B (en) | 2020-05-13 | 2020-05-13 | Spectrogram matching method, device, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111640454B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114267375A (en) * | 2021-11-24 | 2022-04-01 | 北京百度网讯科技有限公司 | Phoneme detection method and device, training method and device, equipment and medium |
CN114429770A (en) * | 2022-04-06 | 2022-05-03 | 北京普太科技有限公司 | Sound data testing method and device of tested equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111640454B (en) | 2023-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||