CN111640453A - Speech spectrum matching method, device and equipment and computer readable storage medium - Google Patents

Info

Publication number
CN111640453A
Authority
CN
China
Prior art keywords
sample
phoneme
spectrogram
points
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010405173.7A
Other languages
Chinese (zh)
Other versions
CN111640453B (en
Inventor
郑琳琳
龙洪锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Intelligent Technology Co ltd
Original Assignee
Guangzhou Speakin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Speakin Intelligent Technology Co ltd filed Critical Guangzhou Speakin Intelligent Technology Co ltd
Priority to CN202010405173.7A priority Critical patent/CN111640453B/en
Publication of CN111640453A publication Critical patent/CN111640453A/en
Application granted granted Critical
Publication of CN111640453B publication Critical patent/CN111640453B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a spectrogram matching method, device, equipment and computer-readable storage medium. The method comprises the following steps: acquiring a specimen spectrogram (the spectrogram under examination) and acquiring a sample spectrogram (a pre-collected reference spectrogram); when a phoneme selection instruction is received, determining corresponding specimen phoneme points in the specimen spectrogram, and determining sample phoneme points corresponding to the specimen phoneme points in the sample spectrogram; acquiring specimen phoneme attributes of the specimen phoneme points, and acquiring sample phoneme attributes of the sample phoneme points; and calculating the phoneme similarity of the specimen phoneme points and the sample phoneme points according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity. Starting from the phoneme attributes, the method quantitatively describes the matching relationship between spectrograms according to the attribute relationship of the same phoneme in different spectrograms, which helps improve the accuracy of spectrogram matching judgment.

Description

Speech spectrum matching method, device and equipment and computer readable storage medium
Technical Field
The invention relates to the technical field of voice processing, in particular to a method, a device and equipment for matching a spectrogram and a computer-readable storage medium.
Background
At present, voice processing technology is gradually being applied in various fields. The spectrogram is a common way of representing voice data and is often used in voice processing: voice recognition, identity recognition and similar processing are performed by matching spectrograms against one another.
In the traditional spectrogram matching method, a person compares the spectrograms by eye and judges how well they match. Different people can easily reach different conclusions from the same spectrograms, which reduces the accuracy of the judgment.
Disclosure of Invention
The main object of the invention is to provide a spectrogram matching method, device, equipment and computer-readable storage medium that solve the technical problem of the low accuracy of manual spectrogram comparison.
In order to achieve the above object, an embodiment of the present invention provides a spectrogram matching method, including:
acquiring a specimen spectrogram (the spectrogram under examination) and acquiring a sample spectrogram (a pre-collected reference spectrogram);
when receiving a phoneme selection instruction based on the specimen spectrogram, determining corresponding specimen phoneme points in the specimen spectrogram and determining sample phoneme points corresponding to the specimen phoneme points in the sample spectrogram;
acquiring specimen phoneme attributes of the specimen phoneme points, and acquiring sample phoneme attributes of the sample phoneme points;
and calculating the phoneme similarity of the specimen phoneme points and the sample phoneme points according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
Optionally, the step of acquiring a specimen spectrogram comprises:
acquiring a specimen audio, and converting the specimen audio into the specimen spectrogram based on a preset rule.
Optionally, the step of calculating the phoneme similarity of the specimen phoneme point and the sample phoneme point according to the specimen phoneme attributes and the sample phoneme attributes comprises:
converting the specimen phoneme attributes into a corresponding specimen phoneme vector, and converting the sample phoneme attributes into a corresponding sample phoneme vector;
and calculating the vector similarity of the specimen phoneme vector and the sample phoneme vector, and determining the phoneme similarity according to the vector similarity.
Optionally, the number of sample phoneme points in the sample spectrogram that correspond to the same specimen phoneme point is two or more,
and the step of calculating the phoneme similarity of the specimen phoneme points and the sample phoneme points according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity comprises:
respectively calculating the phoneme similarity between the specimen phoneme point and each sample phoneme point according to the specimen phoneme attributes and the sample phoneme attributes;
and acquiring a first comprehensive phoneme similarity according to the phoneme similarities, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the first comprehensive phoneme similarity.
Optionally, the number of specimen phoneme points is two or more,
and the step of calculating the phoneme similarity of the specimen phoneme points and the sample phoneme points according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity comprises:
respectively calculating the phoneme similarity between each specimen phoneme point and the corresponding sample phoneme point according to the specimen phoneme attributes and the sample phoneme attributes;
and acquiring a second comprehensive phoneme similarity according to the phoneme similarities, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the second comprehensive phoneme similarity.
Optionally, after the step of calculating the phoneme similarity between the specimen phoneme point and the sample phoneme point according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree between the specimen spectrogram and the sample spectrogram according to the phoneme similarity, the method further includes:
and displaying the sample phoneme attribute, the phoneme similarity and the spectrogram matching degree.
Optionally, after the step of calculating the phoneme similarity between the specimen phoneme point and the sample phoneme point according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree between the specimen spectrogram and the sample spectrogram according to the phoneme similarity, the method further includes:
judging whether the spectrogram matching degree is greater than a preset threshold;
and if the spectrogram matching degree is greater than the preset threshold, acquiring sample identity information corresponding to the sample spectrogram, and determining specimen identity information of the specimen spectrogram according to the sample identity information.
In addition, to achieve the above object, an embodiment of the present invention further provides a spectrogram matching device, including:
a spectrogram acquisition module, used for acquiring a specimen spectrogram and acquiring a sample spectrogram;
a phoneme point acquisition module, used for determining corresponding specimen phoneme points in the specimen spectrogram and determining sample phoneme points corresponding to the specimen phoneme points in the sample spectrogram when receiving a phoneme selection instruction based on the specimen spectrogram;
an attribute acquisition module, used for acquiring the specimen phoneme attributes of the specimen phoneme points and acquiring the sample phoneme attributes of the sample phoneme points;
and a matching degree determining module, used for calculating the phoneme similarity of the specimen phoneme points and the sample phoneme points according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
In addition, to achieve the above object, an embodiment of the present invention further provides spectrogram matching equipment, which includes a processor, a memory, and a spectrogram matching program stored on the memory and executable by the processor, wherein when the spectrogram matching program is executed by the processor, the steps of the spectrogram matching method described above are implemented.
In addition, to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, on which a spectrogram matching program is stored, wherein when the spectrogram matching program is executed by a processor, the steps of the spectrogram matching method as described above are implemented.
The invention provides a spectrogram matching method, device, equipment and computer-readable storage medium, in which a specimen spectrogram and a sample spectrogram are acquired; when a phoneme selection instruction based on the specimen spectrogram is received, corresponding specimen phoneme points are determined in the specimen spectrogram and sample phoneme points corresponding to the specimen phoneme points are determined in the sample spectrogram; specimen phoneme attributes of the specimen phoneme points and sample phoneme attributes of the sample phoneme points are acquired; and the phoneme similarity of the specimen phoneme points and the sample phoneme points is calculated according to the specimen phoneme attributes and the sample phoneme attributes, and the spectrogram matching degree of the specimen spectrogram and the sample spectrogram is determined according to the phoneme similarity. In this way, the phoneme similarity between mutually corresponding phoneme points is calculated from their respective phoneme attributes, and the spectrogram matching degree is determined from that similarity. The matching relationship between spectrograms is thus described quantitatively, starting from the phoneme attributes and based on the attribute relationship of the same phoneme in different spectrograms, which helps improve the accuracy of spectrogram matching judgment.
Drawings
FIG. 1 is a schematic diagram of a hardware structure of spectrogram matching equipment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a spectrogram matching method according to a first embodiment of the present invention;
FIG. 3 is a functional block diagram of a spectrogram matching device according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The spectrogram matching method provided by the embodiment of the invention is mainly applied to spectrogram matching equipment, which can be any equipment with a data processing function, such as a personal computer (PC), a notebook computer or a mobile terminal.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a hardware structure of spectrogram matching equipment according to an embodiment of the present invention. In this embodiment of the present invention, the spectrogram matching equipment may include a processor 1001 (e.g., a central processing unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used for realizing connection and communication among these components; the user interface 1003 may include a display screen and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface); the memory 1005 may be a random access memory (RAM) or a non-volatile memory, such as a magnetic disk memory, and the memory 1005 may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration depicted in FIG. 1 does not limit the present invention, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
With continued reference to fig. 1, the memory 1005 of fig. 1, which is one type of computer-readable storage medium, may include an operating system, a network communication module, and a spectrogram-matching program. In fig. 1, the network communication module may be configured to connect to a preset database, and perform data communication with the database; the processor 1001 may call the spectrogram matching program stored in the memory 1005, and execute the spectrogram matching method according to the embodiment of the present invention.
Based on the hardware architecture, the invention provides various embodiments of the spectrogram matching method.
The embodiment of the invention provides a spectrogram matching method.
Referring to fig. 2, fig. 2 is a schematic flow chart of a spectrogram matching method according to a first embodiment of the present invention.
In this embodiment, the spectrogram matching method includes the following steps:
Step S10, acquiring a specimen spectrogram and a sample spectrogram;
The spectrogram is a common way of representing voice data and is often used in voice processing: voice recognition, identity recognition and similar processing are performed by matching spectrograms against one another. In a spectrogram, the x axis represents time, the y axis represents frequency, and the value at each coordinate point is the energy of the voice data. Because three-dimensional information is expressed on a two-dimensional plane, the magnitude of the energy value is expressed by color: the darker the color, the stronger the voice energy at that point. In the traditional spectrogram matching method, a person compares the spectrograms by eye and judges how well they match, but different people can easily reach different conclusions, which reduces the accuracy of the judgment. In view of this, this embodiment provides a spectrogram matching method that determines corresponding specimen phoneme points in a specimen spectrogram according to a received phoneme selection instruction, determines sample phoneme points corresponding to the specimen phoneme points in a sample spectrogram, calculates the phoneme similarity between mutually corresponding phoneme points according to their respective phoneme attributes, and determines the spectrogram matching degree from that similarity. The matching relationship between spectrograms is thus described quantitatively, starting from the phoneme attributes and based on the attribute relationship of the same phoneme in different spectrograms, which helps improve the accuracy of spectrogram matching judgment.
The spectrogram matching method in this embodiment is implemented by spectrogram matching equipment, which may be a personal computer, a notebook computer, a mobile terminal (such as a mobile phone), and the like; a computer is taken as an example in this embodiment. The computer first acquires a specimen spectrogram, which can be regarded as the target object currently to be processed. In addition, a plurality of sample spectrograms, which can be regarded as collected in advance, are stored in a database of the computer. When the matching process starts, a sample spectrogram is obtained from the database, and the subsequent matching processing is carried out on the specimen spectrogram and the sample spectrogram.
Further, the step of "acquiring a specimen spectrogram" includes:
acquiring a specimen audio, and converting the specimen audio into the specimen spectrogram based on a preset rule.
It should be noted that, in this embodiment, the computer may initially acquire only the specimen audio. To enable the subsequent matching processing, the computer then converts the specimen audio based on a preset rule to obtain the corresponding specimen spectrogram. The conversion may be based on the Fourier transform, although other rules may also be adopted; processing such as windowing and image binarization may be configured as needed, and the sampling rate may also be set. In this way, even if only a piece of audio is obtained, for example a recording of a certain person's speech, the audio can be converted into a specimen spectrogram and then processed.
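By way of illustration only, the conversion from audio to spectrogram described above can be sketched as a short-time Fourier transform with a Hann window. The frame length, hop size, window choice and the names `hann` and `spectrogram` are assumptions made for this sketch; the embodiment does not fix any of them, and a practical implementation would normally use an FFT library rather than the naive transform shown here.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (one common windowing choice, assumed here)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Return energy values indexed as [frame][frequency_bin]."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        energies = []
        # Naive DFT over the non-negative frequency bins (O(n^2), sketch only).
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            energies.append(abs(acc) ** 2)
        frames.append(energies)
    return frames


# Hypothetical input: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# The strongest bin of the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time frame and each column one frequency bin, matching the time, frequency and energy layout of the spectrogram described in step S10.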
Step S20, when receiving a phoneme selection instruction based on the specimen spectrogram, determining corresponding specimen phoneme points in the specimen spectrogram and determining sample phoneme points corresponding to the specimen phoneme points in the sample spectrogram;
In this embodiment, once the specimen spectrogram and the sample spectrogram have been obtained, a user may operate the computer to select one or more phonemes from the specimen spectrogram and trigger a corresponding phoneme selection instruction. It should be noted that a phoneme is the smallest phonetic unit divided according to the natural attributes of speech; it is analyzed according to the articulatory actions within a syllable, with one action constituting one phoneme. For example, "putonghua" (Mandarin Chinese) consists of three syllables, which can be analyzed into eight phonemes: "p, u, t, o, ng, h, u, a". When receiving a phoneme selection instruction based on the specimen spectrogram, the computer first determines the corresponding specimen phoneme point in the specimen spectrogram, that is, which phoneme the user selected and which point that phoneme corresponds to in the specimen spectrogram. The computer then determines, in the sample spectrogram, the sample phoneme points corresponding to the specimen phoneme point, where a sample phoneme point corresponds to the same phoneme as the specimen phoneme point. It should be noted that, for one specimen phoneme point, there may be one or more corresponding sample phoneme points in the sample spectrogram; for convenience of description, a single sample phoneme point is taken as an example in this embodiment. In other words, in this step, corresponding phoneme points for the same phoneme are determined in the specimen spectrogram and in the sample spectrogram respectively.
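As a minimal sketch of the correspondence step, assume that the phoneme points of each spectrogram have already been annotated with a phoneme label. The embodiment does not say how the points are located, so the `PhonemePoint` type, its fields and the label-equality rule below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemePoint:
    label: str          # phoneme label such as "u" (hypothetical annotation)
    time_s: float       # position on the time axis
    frequency_hz: float  # position on the frequency axis


def corresponding_points(selected, candidate_points):
    """Return every candidate point that carries the same phoneme as `selected`."""
    return [p for p in candidate_points if p.label == selected.label]


# Hypothetical data: the user selects "u" in the spectrogram under examination.
selected_point = PhonemePoint("u", 0.12, 350.0)
reference_points = [
    PhonemePoint("p", 0.05, 900.0),
    PhonemePoint("u", 0.10, 340.0),
    PhonemePoint("u", 0.55, 360.0),
]
matches = corresponding_points(selected_point, reference_points)
```

Here the selected phoneme has two corresponding points in the reference spectrogram, which is the one-to-many case the embodiment mentions.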
Step S30, acquiring specimen phoneme attributes of the specimen phoneme points, and acquiring sample phoneme attributes of the sample phoneme points;
In this embodiment, once the specimen phoneme points and the sample phoneme points have been determined, since the specimen phoneme points are recorded in the specimen spectrogram, the computer can obtain from the specimen spectrogram the specimen phoneme attributes corresponding to the specimen phoneme points, such as time, frequency, amplitude and energy. Similarly, since the sample phoneme points are recorded in the sample spectrogram, the computer can obtain from the sample spectrogram the sample phoneme attributes corresponding to the sample phoneme points, such as time, frequency, amplitude and energy.
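The attribute lookup can be sketched as follows, assuming the spectrogram is stored as a grid of energy values indexed by frame and frequency bin. The function name, the grid layout, the `hop_s` and `bin_hz` parameters, and the choice to derive amplitude as the square root of energy are all assumptions of this sketch rather than details given in the embodiment.

```python
import math


def phoneme_attributes(spec, frame_idx, bin_idx, hop_s, bin_hz):
    """Read time, frequency, energy and amplitude at one spectrogram point."""
    energy = spec[frame_idx][bin_idx]
    return {
        "time": frame_idx * hop_s,        # seconds from the start of the audio
        "frequency": bin_idx * bin_hz,    # Hz
        "energy": energy,
        "amplitude": math.sqrt(energy),   # assumed relation: energy = amplitude^2
    }


# Hypothetical 2-frame, 3-bin energy grid.
grid = [[0.0, 4.0, 1.0], [0.0, 9.0, 0.25]]
attrs = phoneme_attributes(grid, frame_idx=1, bin_idx=1, hop_s=0.01, bin_hz=125.0)
```

The same function would be applied to both spectrograms so that the two attribute sets are directly comparable.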
Step S40, calculating the phoneme similarity of the specimen phoneme points and the sample phoneme points according to the specimen phoneme attributes and the sample phoneme attributes, and determining the spectrogram matching degree of the specimen spectrogram and the sample spectrogram according to the phoneme similarity.
In this embodiment, once the specimen phoneme attributes and the sample phoneme attributes have been obtained, the computer may calculate the phoneme similarity between the specimen phoneme point and the sample phoneme point according to them. The phoneme similarity can be regarded as a measure of how differently the same phoneme is pronounced in the specimen spectrogram and in the sample spectrogram: the greater the phoneme similarity, the smaller the pronunciation difference for that phoneme between the two spectrograms; the smaller the phoneme similarity, the larger the pronunciation difference.
Specifically, the step of calculating the phoneme similarity between the specimen phoneme point and the sample phoneme point according to the specimen phoneme attributes and the sample phoneme attributes includes:
Step a1, converting the specimen phoneme attributes into a corresponding specimen phoneme vector, and converting the sample phoneme attributes into a corresponding sample phoneme vector;
In this embodiment, to calculate the phoneme similarity, the computer first converts the specimen phoneme attributes into a corresponding specimen phoneme vector and converts the sample phoneme attributes into a corresponding sample phoneme vector. The conversion rule must be the same for both. Specifically, suppose the vector covers the phoneme attribute types a, b and c, in that order. The attributes of types a, b and c are taken out of the specimen phoneme attributes, mapped to corresponding attribute values according to a certain numerical mapping relationship, and then arranged in the order a, b, c to obtain the specimen phoneme vector corresponding to the specimen phoneme attributes. The sample phoneme vector is obtained from the sample phoneme attributes in the same way. The attribute types included in the vector, the order of the attributes, and the numerical mapping between attributes and values can be set according to actual conditions.
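A minimal sketch of step a1 follows, assuming the attributes arrive as a dictionary. The attribute order, the per-attribute divisors used as the numerical mapping, and the name `to_vector` are illustrative assumptions; the embodiment leaves all three configurable.

```python
# Assumed attribute types and their fixed order (the "a, b, c" of the text).
ATTRIBUTE_ORDER = ("time", "frequency", "energy")

# Assumed numerical mapping: divide each attribute by a fixed scale.
DIVISOR = {"time": 1.0, "frequency": 1000.0, "energy": 1.0}


def to_vector(attributes):
    """Map the attributes to numbers and arrange them in the fixed order."""
    return [attributes[name] / DIVISOR[name] for name in ATTRIBUTE_ORDER]


# The same rule is applied to both sides, whatever the input key order.
first_vec = to_vector({"energy": 9.0, "time": 0.5, "frequency": 2000.0})
second_vec = to_vector({"time": 0.25, "frequency": 1000.0, "energy": 4.0})
```

Because both vectors are built by one function with one order and one mapping, their components line up attribute by attribute, which is what makes a later vector comparison meaningful.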
Step a2, calculating the vector similarity of the specimen phoneme vector and the sample phoneme vector, and determining the phoneme similarity according to the vector similarity.
Once the specimen phoneme vector and the sample phoneme vector have been obtained, the computer may calculate the vector similarity between them and then determine the phoneme similarity from the vector similarity; for example, the vector similarity may be used directly as the phoneme similarity, or a linear transformation may be applied to it. The vector similarity may be calculated with different formulas according to actual needs, for example the cosine similarity formula, the Euclidean distance formula or the Chebyshev distance formula. In this way, the phoneme attributes of the specimen and of the sample are each converted into vectors and used for similarity calculation, so that the similarity between phoneme points is expressed as a number, which facilitates the subsequent quantitative description of the matching relationship between the spectrograms.
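The three measures named above can be sketched in plain Python as follows. Cosine similarity is already a similarity; the Euclidean and Chebyshev formulas give distances, so the helper `distance_to_similarity` shows one possible way to turn a distance into a similarity. That helper is an assumption of this sketch and is not specified by the embodiment.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def chebyshev_distance(u, v):
    """Largest absolute difference over all components."""
    return max(abs(a - b) for a, b in zip(u, v))


def distance_to_similarity(distance):
    """Assumed conversion: distance 0 gives 1.0, larger distances approach 0."""
    return 1.0 / (1.0 + distance)


same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
d_euclid = euclidean_distance([0.0, 0.0], [3.0, 4.0])
d_cheb = chebyshev_distance([0.0, 0.0], [3.0, 4.0])
```

Cosine similarity ignores vector length and compares direction only, while the two distances are sensitive to the scale of each attribute, so the numerical mapping chosen in step a1 affects them directly.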
Once the phoneme similarity has been calculated, since it can be regarded as a measure of how differently the same phoneme is pronounced in the specimen spectrogram and in the sample spectrogram, the spectrogram matching degree of the specimen spectrogram and the sample spectrogram can be determined from it; the matching relationship between spectrograms is thus described quantitatively, starting from the phoneme attributes and based on the attribute relationship of the same phoneme in different spectrograms. The spectrogram matching degree may be determined from the phoneme similarity in several ways: the phoneme similarity may be used directly as the spectrogram matching degree; a linear transformation may be applied to the phoneme similarity to obtain the spectrogram matching degree; or different similarity ranges may be preset, each corresponding to a different spectrogram matching degree, and the spectrogram matching degree is then the one corresponding to the range in which the phoneme similarity falls.
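The three options for deriving the spectrogram matching degree can be sketched as below. The scale and offset of the linear transformation, the range boundaries and the degree labels are hypothetical values chosen for illustration only.

```python
def match_direct(similarity):
    """Option 1: use the phoneme similarity itself as the matching degree."""
    return similarity


def match_linear(similarity, scale=100.0, offset=0.0):
    """Option 2: linear transformation (assumed here: map 0..1 onto 0..100)."""
    return scale * similarity + offset


# Option 3: preset similarity ranges, given as (lower bound, matching degree),
# ordered from the highest range to the lowest. The values are hypothetical.
RANGES = [(0.9, "high"), (0.6, "medium"), (0.0, "low")]


def match_by_range(similarity):
    """Return the degree of the first range whose lower bound is reached."""
    for lower, degree in RANGES:
        if similarity >= lower:
            return degree
    return RANGES[-1][1]  # below every bound: fall back to the lowest degree


linear_result = match_linear(0.5)
range_results = [match_by_range(s) for s in (0.95, 0.7, 0.1)]
```

The range-based option trades resolution for readability: nearby similarities collapse to the same degree, which may suit display to a user better than a raw number.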
Further, after step S40, the spectrogram matching method further includes:
and displaying the sample phoneme attribute, the phoneme similarity and the spectrogram matching degree.
In this embodiment, once the computer has calculated the spectrogram matching degree, it may further display the sample phoneme attribute, the phoneme similarity and the spectrogram matching degree on the screen. This lets the user follow the spectrogram matching process and its specific parameters, and understand the matching result in numerical terms.
The embodiment of the invention provides a spectrogram matching method in which a specimen spectrogram and a sample spectrogram are acquired; when a phoneme selection instruction based on the specimen spectrogram is received, corresponding specimen phoneme points are determined in the specimen spectrogram and sample phoneme points corresponding to the specimen phoneme points are determined in the sample spectrogram; specimen phoneme attributes of the specimen phoneme points and sample phoneme attributes of the sample phoneme points are acquired; and the phoneme similarity of the specimen phoneme points and the sample phoneme points is calculated according to the specimen phoneme attributes and the sample phoneme attributes, and the spectrogram matching degree of the specimen spectrogram and the sample spectrogram is determined according to the phoneme similarity. In this way, the phoneme similarity between mutually corresponding phoneme points is calculated from their respective phoneme attributes, and the spectrogram matching degree is determined from that similarity. The matching relationship between spectrograms is thus described quantitatively, starting from the phoneme attributes and based on the attribute relationship of the same phoneme in different spectrograms, which helps improve the accuracy of spectrogram matching judgment.
Based on the first embodiment of the spectrogram matching method, a second embodiment of the spectrogram matching method is provided.
In this embodiment, the number of sample phoneme points in the sample spectrogram that correspond to the same specimen phoneme point is two or more, and step S40 includes:
Step b1, calculating the phoneme similarity between the specimen phoneme point and each sample phoneme point according to the specimen phoneme attributes and the sample phoneme attributes;
In this embodiment, for one specimen phoneme point there are two or more corresponding sample phoneme points in the sample spectrogram. In this case, the phoneme similarity between the specimen phoneme point and each sample phoneme point is calculated according to the specimen phoneme attributes and the sample phoneme attributes. For example, suppose the specimen phoneme point is M, and the sample phoneme points corresponding to it in the sample spectrogram are N1, N2 and N3. The phoneme similarity SMN1 of the specimen phoneme point M and the sample phoneme point N1, the phoneme similarity SMN2 of M and N2, and the phoneme similarity SMN3 of M and N3 are then calculated respectively.
Step b2, acquiring a first integrated phoneme similarity according to the phoneme similarities, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the first integrated phoneme similarity.
After the phoneme similarities are obtained, a first integrated phoneme similarity is acquired from them, for example by taking their average, their median, or their maximum or minimum. Once the first integrated phoneme similarity is obtained, the spectrogram matching degree of the sample spectrogram and the specimen spectrogram can be determined according to it; the specific determination process is the same as determining the spectrogram matching degree according to the phoneme similarity, and is not described herein again.
In this way, when two or more specimen phoneme points in the specimen spectrogram correspond to the same sample phoneme point, the embodiment calculates the phoneme similarity between the sample phoneme point and each specimen phoneme point and integrates these similarities to determine the spectrogram matching degree, so that the similarity of multiple specimen phoneme points is considered together and the accuracy of spectrogram matching judgment is improved.
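The integration in step b2 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `first_integrated_similarity`, the `strategy` parameter, and the numeric values are hypothetical, and the text does not say which of the listed statistics is preferred.

```python
from statistics import mean, median

# The text lists the average, the median, and the maximum or minimum as options.
# The dictionary keys are hypothetical labels chosen for this sketch.
AGGREGATORS = {"mean": mean, "median": median, "max": max, "min": min}


def first_integrated_similarity(similarities, strategy="mean"):
    """Combine the similarities between one sample phoneme point and
    each of its corresponding specimen phoneme points into one value."""
    values = list(similarities)
    if not values:
        raise ValueError("at least one phoneme similarity is required")
    return AGGREGATORS[strategy](values)


# Illustrative values for S_MN1, S_MN2, S_MN3 (not taken from the patent).
s_m = [0.9, 0.5, 0.7]
for name in AGGREGATORS:
    print(name, first_integrated_similarity(s_m, name))
```

Taking the maximum rewards the single best-matching specimen point, while the minimum is the most conservative choice; which is appropriate depends on the application and is not fixed by the embodiment.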
Based on the first embodiment of the spectrogram matching method, a third embodiment of the spectrogram matching method is provided.
In this embodiment, the number of sample phoneme points is two or more, and step S40 includes:
Step b3, calculating the phoneme similarity between each sample phoneme point and its corresponding specimen phoneme point according to the sample phoneme attributes and the specimen phoneme attributes;
In this embodiment, when the user operates the computer to select phonemes in the sample spectrogram and thereby triggers the phoneme selection instruction, the user selects two or more sample phonemes, and the computer determines two or more sample phoneme points according to the phoneme selection instruction. In this case, the phoneme similarity between each sample phoneme point and its corresponding specimen phoneme point is calculated according to the sample phoneme attributes and the specimen phoneme attributes. For example, suppose the sample phoneme points are M1, M2, and M3, and their corresponding specimen phoneme points in the specimen spectrogram are N4, N5, and N6 respectively; then the phoneme similarity S_M1N4 of M1 and N4, the phoneme similarity S_M2N5 of M2 and N5, and the phoneme similarity S_M3N6 of M3 and N6 can be calculated respectively.
Step b4, acquiring a second integrated phoneme similarity according to the phoneme similarities, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the second integrated phoneme similarity.
After the phoneme similarities are obtained, a second integrated phoneme similarity is acquired from them, for example by taking their average, their median, or their maximum or minimum. Once the second integrated phoneme similarity is obtained, the spectrogram matching degree of the sample spectrogram and the specimen spectrogram can be determined according to it; the specific determination process is the same as determining the spectrogram matching degree according to the phoneme similarity, and is not described herein again.
It should be noted that, in practical applications, two or more sample phoneme points may be selected while at least one of them has two or more corresponding specimen phoneme points. In that case, the first integrated phoneme similarity between each sample phoneme point and its corresponding specimen phoneme points is first obtained as in the second embodiment; a second integrated phoneme similarity is then obtained from the first integrated phoneme similarities, and the spectrogram matching degree is determined according to the second integrated phoneme similarity, as in the third embodiment.
In this way, when there are two or more sample phoneme points, the phoneme similarity between each sample phoneme point and its corresponding specimen phoneme point can be calculated separately and the similarities integrated to determine the spectrogram matching degree, so that the similarity of multiple sample phoneme points is considered together and the accuracy of spectrogram matching judgment is improved.
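The combined case, in which several sample phoneme points are selected and some of them have several corresponding specimen phoneme points, can be sketched as a two-level aggregation. The function name `second_integrated_similarity`, the choice of `max` for the inner level and `mean` for the outer level, and the numbers are assumptions made for illustration; the text leaves the choice of statistic open.

```python
from statistics import mean


def second_integrated_similarity(similarities_by_point, inner=max, outer=mean):
    """Two-level integration of phoneme similarities.

    similarities_by_point maps each sample phoneme point to the list of
    similarities with its corresponding specimen phoneme points.
    inner yields the first integrated similarity per sample phoneme point;
    outer combines those into the second integrated similarity.
    """
    if not similarities_by_point:
        raise ValueError("at least one sample phoneme point is required")
    first_level = {
        point: inner(values) for point, values in similarities_by_point.items()
    }
    return outer(list(first_level.values())), first_level


# Illustrative data: M2 has two corresponding specimen phoneme points.
example = {"M1": [0.8], "M2": [0.6, 0.9], "M3": [0.4]}
overall, per_point = second_integrated_similarity(example)
print(per_point)
print(overall)
```

How the spectrogram matching degree is then derived from the second integrated similarity is not detailed in this section, so the sketch stops at the integrated value.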
Based on the first embodiment of the spectrogram matching method, a fourth embodiment of the spectrogram matching method is provided.
In this embodiment, after step S40, the method further includes:
Step A, judging whether the spectrogram matching degree is greater than a preset threshold;
The spectrogram matching process in this embodiment may be applied to voice identity recognition, that is, the sample spectrogram is matched against the specimen spectrogram so that the sample identity corresponding to the sample spectrogram can be determined according to the spectrogram matching degree. Specifically, once the spectrogram matching degree of the sample spectrogram and the specimen spectrogram is obtained, the computer may compare it with a preset threshold and judge whether it is greater than the preset threshold.
Step B, if the spectrogram matching degree is greater than the preset threshold, acquiring specimen identity information corresponding to the specimen spectrogram, and determining the sample identity information of the sample spectrogram according to the specimen identity information.
In this embodiment, if the spectrogram matching degree is less than or equal to the preset threshold, the sample spectrogram and the specimen spectrogram may be considered to match poorly and not to belong to the same identity. If the spectrogram matching degree is greater than the preset threshold, the two may be considered to match well and to belong to the same identity; the computer can then acquire the specimen identity information corresponding to the specimen spectrogram and determine the sample identity information of the sample spectrogram according to it, that is, determine the sample identity corresponding to the sample spectrogram.
In this way, the spectrogram matching process of the embodiment can be applied to voice identity recognition: if the spectrogram matching degree is greater than the preset threshold, the sample spectrogram and the specimen spectrogram are considered to belong to the same identity, and the computer acquires the specimen identity information corresponding to the specimen spectrogram and determines the sample identity information of the sample spectrogram according to it, thereby realizing voice identity recognition.
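Steps A and B amount to a strict threshold comparison, which can be sketched as below. The function name `identify_sample`, the default threshold of 0.8, and the identity string are hypothetical; the text only says that the threshold is preset.

```python
from typing import Optional


def identify_sample(
    matching_degree: float,
    specimen_identity: str,
    threshold: float = 0.8,  # assumed value; the text only says "preset"
) -> Optional[str]:
    """Return the identity assigned to the sample spectrogram, or None.

    The specimen identity is adopted only when the matching degree is
    strictly greater than the threshold, as in steps A and B.
    """
    if matching_degree > threshold:
        return specimen_identity
    return None


print(identify_sample(0.92, "speaker-001"))
print(identify_sample(0.80, "speaker-001"))
```

A matching degree exactly equal to the threshold is treated as not matching, following the "less than or equal to" wording of the embodiment.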
In addition, the embodiment of the invention also provides a spectrogram matching device.
Referring to fig. 3, fig. 3 is a functional block diagram of a spectrogram matching apparatus according to a first embodiment of the present invention.
In this embodiment, the spectrogram matching apparatus includes:
a spectrogram acquiring module 10, configured to acquire a sample spectrogram and acquire a specimen spectrogram;
a phoneme point obtaining module 20, configured to, when a phoneme selection instruction based on the sample spectrogram is received, determine corresponding sample phoneme points in the sample spectrogram, and determine specimen phoneme points corresponding to the sample phoneme points in the specimen spectrogram;
an attribute obtaining module 30, configured to obtain sample phoneme attributes of the sample phoneme points, and obtain specimen phoneme attributes of the specimen phoneme points;
and a matching degree determining module 40, configured to calculate the phoneme similarity between the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes, and determine the spectrogram matching degree between the sample spectrogram and the specimen spectrogram according to the phoneme similarity.
Each virtual function module of the spectrogram matching apparatus is stored in the memory 1005 of the spectrogram matching device shown in fig. 1, and is used for implementing all functions of the spectrogram matching program; when executed by the processor 1001, the modules may implement a spectrogram matching function.
Further, the spectrogram acquiring module 10 includes:
a spectrogram conversion unit, configured to acquire sample audio and convert the sample audio into a sample spectrogram based on a preset rule.
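The "preset rule" for converting audio into a spectrogram is not specified in this section. One common choice is a short-time Fourier transform, sketched below in plain Python. The frame length, hop size, Hann window, and the function name `spectrogram` are all assumptions made for illustration, not the rule used by the patent.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive short-time Fourier transform.

    Returns one list per frame, holding the magnitudes of bins
    0..frame_len // 2. Trailing samples that do not fill a whole
    frame are dropped.
    """
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic test tone completing 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
print(len(spec), len(spec[0]))
```

The naive transform is quadratic in the frame length and is used here only to keep the sketch dependency-free; a practical implementation would normally use an FFT routine.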
Further, the matching degree determining module 40 includes:
a vector conversion unit, configured to convert the sample phoneme attributes into corresponding sample phoneme vectors, and convert the specimen phoneme attributes into corresponding specimen phoneme vectors;
and a similarity calculation unit, configured to calculate the vector similarity of the sample phoneme vector and the specimen phoneme vector, and determine the phoneme similarity according to the vector similarity.
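The two units above can be sketched as an attribute-to-vector conversion followed by a vector similarity. This section does not say which attributes are used or which similarity measure is applied, so the attribute names in `ATTRIBUTE_ORDER` (three formant frequencies) and the use of cosine similarity are assumptions made for illustration only.

```python
import math

# Hypothetical attribute names; a fixed order keeps vectors comparable.
ATTRIBUTE_ORDER = ("f1_hz", "f2_hz", "f3_hz")


def to_vector(attributes):
    """Convert a phoneme attribute mapping into a fixed-order vector."""
    return [float(attributes[name]) for name in ATTRIBUTE_ORDER]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Illustrative attribute values (not taken from the patent).
sample_attrs = {"f1_hz": 700, "f2_hz": 1200, "f3_hz": 2600}
specimen_attrs = {"f1_hz": 680, "f2_hz": 1250, "f3_hz": 2550}
similarity = cosine_similarity(to_vector(sample_attrs), to_vector(specimen_attrs))
print(similarity)
```

With raw frequency values, cosine similarity tends to stay close to 1 because all components are large and positive, so some normalization of the attributes would likely be needed before the score becomes discriminative.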
Further, two or more specimen phoneme points in the specimen spectrogram correspond to the same sample phoneme point,
and the matching degree determining module 40 includes:
a first calculating unit, configured to calculate the phoneme similarity between the sample phoneme point and each specimen phoneme point according to the sample phoneme attributes and the specimen phoneme attributes;
and a first determining unit, configured to acquire a first integrated phoneme similarity according to the phoneme similarities, and determine the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the first integrated phoneme similarity.
Further, the number of sample phoneme points is two or more,
and the matching degree determining module 40 includes:
a second calculating unit, configured to respectively calculate the phoneme similarity between each sample phoneme point and its corresponding specimen phoneme point according to the sample phoneme attributes and the specimen phoneme attributes;
and a second determining unit, configured to acquire a second integrated phoneme similarity according to the phoneme similarities, and determine the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the second integrated phoneme similarity.
Further, the spectrogram matching device further comprises:
a display module, configured to display the sample phoneme attributes, the phoneme similarity, and the spectrogram matching degree.
Further, the spectrogram matching device further comprises:
a matching degree judging module, configured to judge whether the spectrogram matching degree is greater than a preset threshold;
and an information determining module, configured to, if the spectrogram matching degree is greater than the preset threshold, acquire specimen identity information corresponding to the specimen spectrogram, and determine the sample identity information of the sample spectrogram according to the specimen identity information.
The function implementation of each module in the above-mentioned spectrogram matching device corresponds to each step in the above-mentioned spectrogram matching method embodiment, and the function and implementation process thereof are not described in detail herein.
In addition, the embodiment of the invention also provides a computer readable storage medium.
The computer readable storage medium of the present invention stores a spectrogram matching program, wherein the spectrogram matching program, when executed by a processor, implements the steps of the spectrogram matching method as described above.
The method implemented when the spectrogram matching program is executed may refer to each embodiment of the spectrogram matching method of the present invention, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A spectrogram matching method, characterized by comprising the following steps:
acquiring a sample spectrogram and acquiring a specimen spectrogram;
when receiving a phoneme selection instruction based on the sample spectrogram, determining corresponding sample phoneme points in the sample spectrogram and determining specimen phoneme points corresponding to the sample phoneme points in the specimen spectrogram;
acquiring sample phoneme attributes of the sample phoneme points, and acquiring specimen phoneme attributes of the specimen phoneme points;
and calculating the phoneme similarity of the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the phoneme similarity.
2. The spectrogram matching method of claim 1, wherein the step of acquiring a sample spectrogram comprises:
acquiring sample audio, and converting the sample audio into the sample spectrogram based on a preset rule.
3. The spectrogram matching method of claim 1, wherein the step of calculating the phoneme similarity of the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes comprises:
converting the sample phoneme attributes into corresponding sample phoneme vectors, and converting the specimen phoneme attributes into corresponding specimen phoneme vectors;
and calculating the vector similarity of the sample phoneme vector and the specimen phoneme vector, and determining the phoneme similarity according to the vector similarity.
4. The spectrogram matching method of claim 1, wherein two or more specimen phoneme points in the specimen spectrogram correspond to the same sample phoneme point,
and the step of calculating the phoneme similarity of the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the phoneme similarity comprises:
respectively calculating the phoneme similarity between the sample phoneme point and each specimen phoneme point according to the sample phoneme attributes and the specimen phoneme attributes;
and acquiring a first integrated phoneme similarity according to the phoneme similarities, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the first integrated phoneme similarity.
5. The spectrogram matching method according to claim 1, wherein the number of the sample phoneme points is two or more,
and the step of calculating the phoneme similarity of the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the phoneme similarity comprises:
respectively calculating the phoneme similarity between each sample phoneme point and its corresponding specimen phoneme point according to the sample phoneme attributes and the specimen phoneme attributes;
and acquiring a second integrated phoneme similarity according to the phoneme similarities, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the second integrated phoneme similarity.
6. The spectrogram matching method of claim 1, wherein after the step of calculating the phoneme similarity of the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the phoneme similarity, the method further comprises:
displaying the sample phoneme attributes, the phoneme similarity and the spectrogram matching degree.
7. The spectrogram matching method according to any one of claims 1 to 6, wherein after the step of calculating the phoneme similarity of the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes, and determining the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the phoneme similarity, the method further comprises:
judging whether the spectrogram matching degree is greater than a preset threshold;
and if the spectrogram matching degree is greater than the preset threshold, acquiring specimen identity information corresponding to the specimen spectrogram, and determining sample identity information of the sample spectrogram according to the specimen identity information.
8. A spectrogram matching apparatus, comprising:
a spectrogram acquisition module, configured to acquire a sample spectrogram and acquire a specimen spectrogram;
a phoneme point acquisition module, configured to, when receiving a phoneme selection instruction based on the sample spectrogram, determine corresponding sample phoneme points in the sample spectrogram and determine specimen phoneme points corresponding to the sample phoneme points in the specimen spectrogram;
an attribute acquisition module, configured to acquire sample phoneme attributes of the sample phoneme points and acquire specimen phoneme attributes of the specimen phoneme points;
and a matching degree determining module, configured to calculate the phoneme similarity of the sample phoneme points and the specimen phoneme points according to the sample phoneme attributes and the specimen phoneme attributes, and determine the spectrogram matching degree of the sample spectrogram and the specimen spectrogram according to the phoneme similarity.
9. A spectrogram matching device, comprising a processor, a memory, and a spectrogram matching program stored on the memory and executable by the processor, wherein the spectrogram matching program, when executed by the processor, implements the steps of the spectrogram matching method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, having a spectrogram matching program stored thereon, wherein the spectrogram matching program, when executed by a processor, implements the steps of the spectrogram matching method as claimed in any one of claims 1 to 7.
CN202010405173.7A 2020-05-13 2020-05-13 Spectrogram matching method, device, equipment and computer readable storage medium Active CN111640453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010405173.7A CN111640453B (en) 2020-05-13 2020-05-13 Spectrogram matching method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111640453A (en) 2020-09-08
CN111640453B (en) 2023-06-16

Family

ID=72332107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010405173.7A Active CN111640453B (en) 2020-05-13 2020-05-13 Spectrogram matching method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111640453B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107037887A (en) * 2016-02-03 2017-08-11 北京搜狗科技发展有限公司 A kind of method, device and electronic equipment inputted for Chinese character
CN107301859A (en) * 2017-06-21 2017-10-27 南京邮电大学 Phonetics transfer method under the non-parallel text condition clustered based on adaptive Gauss
CN107680601A (en) * 2017-10-18 2018-02-09 深圳势必可赢科技有限公司 A kind of identity homogeneity method of inspection retrieved based on sound spectrograph and phoneme and device
CN108766417A (en) * 2018-05-29 2018-11-06 广州国音科技有限公司 A kind of the identity homogeneity method of inspection and device based on phoneme automatically retrieval
CN109284717A (en) * 2018-09-25 2019-01-29 华中师范大学 It is a kind of to paste the detection method and system for distorting operation towards digital audio duplication
CN110223673A (en) * 2019-06-21 2019-09-10 龙马智芯(珠海横琴)科技有限公司 The processing method and processing device of voice, storage medium, electronic equipment
CN110347866A (en) * 2019-07-05 2019-10-18 联想(北京)有限公司 Information processing method, device, storage medium and electronic equipment
CN110634490A (en) * 2019-10-17 2019-12-31 广州国音智能科技有限公司 Voiceprint identification method, device and equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023036014A1 (en) * 2021-09-07 2023-03-16 广西电网有限责任公司贺州供电局 Method for automatically saving power grid scheduling command on basis of voice recognition
CN114267375A (en) * 2021-11-24 2022-04-01 北京百度网讯科技有限公司 Phoneme detection method and device, training method and device, equipment and medium
CN114267375B (en) * 2021-11-24 2022-10-28 北京百度网讯科技有限公司 Phoneme detection method and device, training method and device, equipment and medium
CN114676774A (en) * 2022-03-25 2022-06-28 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111640453B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
CN111640453B (en) Spectrogram matching method, device, equipment and computer readable storage medium
CN110287312B (en) Text similarity calculation method, device, computer equipment and computer storage medium
CN111179975A (en) Voice endpoint detection method for emotion recognition, electronic device and storage medium
JP2021526242A (en) Insurance recording quality inspection methods, equipment, equipment and computer storage media
CN108509416B (en) Sentence meaning identification method and device, equipment and storage medium
CN110827803A (en) Method, device and equipment for constructing dialect pronunciation dictionary and readable storage medium
WO2020107834A1 (en) Verification content generation method for lip-language recognition, and related apparatus
JP2010504553A (en) Voice keyword identification method, apparatus, and voice identification system
JP5196199B2 (en) Keyword display system, keyword display method, and program
Noroozi et al. Supervised vocal-based emotion recognition using multiclass support vector machine, random forests, and adaboost
CN108830201B (en) Method and device for acquiring sample triple, computer equipment and storage medium
CN110890088B (en) Voice information feedback method and device, computer equipment and storage medium
CN112017633B (en) Speech recognition method, device, storage medium and electronic equipment
US20220392485A1 (en) System and Method For Identifying Sentiment (Emotions) In A Speech Audio Input
CN113160819A (en) Method, apparatus, device, medium and product for outputting animation
CN111640454B (en) Spectrogram matching method, device, equipment and computer readable storage medium
CN110111778B (en) Voice processing method and device, storage medium and electronic equipment
CN111626346A (en) Data classification method, device, storage medium and device
JP2021513701A (en) Information processing equipment, methods and programs
CN111292763A (en) Stress detection method and device, and non-transient storage medium
CN109300484B (en) Audio alignment method and device, computer equipment and readable storage medium
CN111640421B (en) Speech comparison method, device, equipment and computer readable storage medium
CN110956981B (en) Speech emotion recognition method, device, equipment and storage medium
CN111640450A (en) Multi-person audio processing method, device, equipment and readable storage medium
JP6996627B2 (en) Information processing equipment, control methods, and programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant