CN111782867A - Voiceprint retrieval method, system, mobile terminal and storage medium - Google Patents


Info

Publication number
CN111782867A
CN111782867A (application number CN202010431564.6A; granted publication CN111782867B)
Authority
CN
China
Prior art keywords
fundamental frequency
audio
phoneme
frequency value
appointed
Prior art date
Legal status
Granted
Application number
CN202010431564.6A
Other languages
Chinese (zh)
Other versions
CN111782867B (en)
Inventor
洪国强 (Hong Guoqiang)
肖龙源 (Xiao Longyuan)
李稀敏 (Li Ximin)
刘晓葳 (Liu Xiaowei)
Current Assignee
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010431564.6A
Publication of CN111782867A
Application granted
Publication of CN111782867B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/60: of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683: using metadata automatically derived from the content
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 2015/025: Phonemes, fenemes or fenones being the recognition units

Abstract

The invention provides a voiceprint retrieval method, system, mobile terminal and storage medium. The method comprises the following steps: acquiring registration data and obtaining the fundamental frequency information of each registered audio in the registration data; storing the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list; acquiring the target fundamental frequency information of the audio to be recognized and determining a fundamental frequency value range from it; and querying the fundamental frequency fragment list for the registered audio corresponding to that range to obtain the audio to be compared. Because the invention uses the phoneme fundamental frequency of one or more designated phonemes as the retrieval condition, the retrieval process is largely independent of factors such as the channel, noise and near/far-field conditions; the influence of these factors on voiceprint retrieval is effectively reduced and retrieval accuracy is improved.

Description

Voiceprint retrieval method, system, mobile terminal and storage medium
Technical Field
The invention belongs to the technical field of voiceprint recognition, and particularly relates to a voiceprint retrieval method, a voiceprint retrieval system, a mobile terminal and a storage medium.
Background
Every person's voice carries unique biological characteristics, and voiceprint recognition is the technique of identifying a speaker from his or her voice. Like fingerprint recognition, voiceprint recognition offers high security and reliability and can be applied wherever identity verification is needed, for example in criminal investigation and in financial fields such as banking, securities and insurance. Compared with traditional identity recognition technologies, voiceprint recognition has a simple extraction process, low cost, uniqueness, and is difficult to forge. In existing voiceprint recognition, voiceprint retrieval first computes the voiceprint-vector similarity between the audio to be recognized and each registered audio to obtain candidate audio to be compared, and then matches the features of the audio to be recognized against each candidate to produce the recognition result.
Because the existing voiceprint retrieval process uses the voiceprint-vector similarity between audio data as the retrieval condition, it places high demands on the robustness and accuracy of the voiceprint-vector algorithm. Factors such as the channel, noise and near/far-field conditions strongly affect that algorithm, lowering the accuracy of the similarity computation and therefore the accuracy of voiceprint retrieval.
Disclosure of Invention
The embodiments of the invention aim to provide a voiceprint retrieval method, system, mobile terminal and storage medium that solve the low retrieval accuracy caused by using the voiceprint-vector similarity between audio data as the retrieval condition in the existing voiceprint retrieval process.
The embodiment of the invention is realized in such a way that a voiceprint retrieval method comprises the following steps:
acquiring registration data and obtaining the fundamental frequency information of each registered audio in the registration data, wherein the fundamental frequency information comprises the phoneme fundamental frequency of at least one designated phoneme;
storing the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list;
acquiring target fundamental frequency information of the audio to be recognized, and determining a fundamental frequency value range according to the target fundamental frequency information; and
querying the fundamental frequency fragment list for the registered audio corresponding to the fundamental frequency value range to obtain the audio to be compared.
Further, the step of determining the fundamental frequency value range according to the target fundamental frequency information comprises:
when the number of designated phonemes in the fundamental frequency information of the registered audio is one, calculating the fundamental frequency value of the designated phoneme in the audio to be recognized;
calculating an error range between the fundamental frequency value of the designated phoneme and a preset error value to obtain the fundamental frequency value range;
when the number of designated phonemes in the fundamental frequency information of the registered audio is greater than one, calculating the fundamental frequency value of each designated phoneme in the audio to be recognized; and
weighting the fundamental frequency values of the designated phonemes to obtain a weighted fundamental frequency value, and calculating an error range between the weighted fundamental frequency value and the preset error value to obtain the fundamental frequency value range.
Further, the step of determining the fundamental frequency value range according to the target fundamental frequency information comprises:
when the number of designated phonemes in the fundamental frequency information of the registered audio is greater than one, acquiring a phoneme retrieval order table;
calculating the fundamental frequency value of each designated phoneme in the audio to be recognized in turn, following the retrieval order of the designated phonemes in the phoneme retrieval order table; and
calculating the error range between the fundamental frequency value of each designated phoneme and a preset error value to obtain a plurality of fundamental frequency value ranges.
Correspondingly, the step of querying the fundamental frequency fragment list for the registered audio corresponding to the fundamental frequency value ranges to obtain the audio to be compared comprises:
performing an audio query for each corresponding designated phoneme in the fundamental frequency fragment list in turn according to its fundamental frequency value range, and setting the queried registered audio as the audio to be compared.
Further, after the step of performing an audio query for each corresponding designated phoneme in the fundamental frequency fragment list according to its fundamental frequency value range, the method further comprises:
setting the registered audio returned by every one of the queries as the audio to be compared.
Further, the formula for weighting the fundamental frequencies of the designated phonemes is:
F = a1*K1 + a2*K2 + … + an*Kn
wherein F is the weighted fundamental frequency value, an is the fundamental frequency value of the nth designated phoneme in the audio to be recognized, Kn is the weight corresponding to the nth designated phoneme, and K1 + K2 + … + Kn = 1.
Furthermore, the fundamental frequency value of a designated phoneme in the audio to be recognized is calculated with an autocorrelation algorithm, a cepstrum method or an inverse filtering method.
Another object of an embodiment of the present invention is to provide a voiceprint retrieval system, which includes:
a fundamental frequency information acquisition module, configured to acquire registration data and obtain the fundamental frequency information of each registered audio in the registration data, wherein the fundamental frequency information comprises the phoneme fundamental frequency of at least one designated phoneme;
an audio storage module, configured to store the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list;
a fundamental frequency value range determination module, configured to acquire the target fundamental frequency information of the audio to be recognized and determine a fundamental frequency value range according to the target fundamental frequency information; and
a voiceprint identification module, configured to query the fundamental frequency fragment list for the registered audio corresponding to the fundamental frequency value range to obtain the audio to be compared.
Further, the fundamental frequency value range determination module is further configured to:
when the number of designated phonemes in the fundamental frequency information of the registered audio is one, calculate the fundamental frequency value of the designated phoneme in the audio to be recognized;
calculate an error range between the fundamental frequency value of the designated phoneme and a preset error value to obtain the fundamental frequency value range;
when the number of designated phonemes in the fundamental frequency information of the registered audio is greater than one, calculate the fundamental frequency value of each designated phoneme in the audio to be recognized; and
weight the fundamental frequency values of the designated phonemes to obtain a weighted fundamental frequency value, and calculate an error range between the weighted fundamental frequency value and the preset error value to obtain the fundamental frequency value range.
Another object of an embodiment of the present invention is to provide a mobile terminal, including a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal execute the above voiceprint retrieval method.
Another object of an embodiment of the present invention is to provide a storage medium, which stores a computer program used in the above-mentioned mobile terminal, wherein the computer program, when executed by a processor, implements the steps of the above-mentioned voiceprint retrieval method.
According to the embodiments of the invention, voiceprint retrieval uses the phoneme fundamental frequency of one or more designated phonemes as the retrieval condition, so the retrieval process is largely independent of factors such as the channel, noise and near/far-field conditions; the influence of these factors on voiceprint retrieval is effectively reduced and retrieval accuracy is improved.
Drawings
FIG. 1 is a flowchart of a voiceprint retrieval method provided by a first embodiment of the invention;
FIG. 2 is a flowchart of a voiceprint retrieval method provided by a second embodiment of the invention;
FIG. 3 is a flowchart of a voiceprint retrieval method provided by a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a voiceprint retrieval system according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a mobile terminal according to a fifth embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Example one
Please refer to FIG. 1, which is a flowchart of a voiceprint retrieval method according to a first embodiment of the present invention, comprising the steps of:
Step S10, acquiring registration data and obtaining the fundamental frequency information of each registered audio in the registration data;
the fundamental frequency information comprises the phoneme fundamental frequency of at least one designated phoneme and/or the average fundamental frequency of the registered audio; the fundamental frequency value of a designated phoneme, or the average fundamental frequency, is calculated with an autocorrelation algorithm, a cepstrum method or an inverse filtering method;
specifically, the number of registered audio in the registration data may be set according to user requirements; for example, in this embodiment the registration data stores 100 different registered audio; the fundamental frequency information of a registered audio may store only its average fundamental frequency, or it may store the fundamental frequency value of at least one designated phoneme, and the user may designate any phoneme;
Step S20, storing the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list;
the fragment interval used when storing the registered audio can be set according to user requirements, for example 5 Hz, 10 Hz or 15 Hz; preferably, since the normal human voice fundamental frequency lies between 40 Hz and 400 Hz, the registered audio is stored with a 10 Hz fragment interval, i.e. the ranges 40 Hz-50 Hz, 50 Hz-60 Hz, 60 Hz-70 Hz, …, 390 Hz-400 Hz are separate data fragments, and each registered audio is stored into the corresponding data fragment according to its fundamental frequency information;
for example, when the fundamental frequency value of a designated phoneme a in the fundamental frequency information is 85 Hz, the corresponding registered audio is stored into the 80 Hz-90 Hz data fragment; the relationship between each data fragment and its registered audio is stored in the fundamental frequency fragment list, which thus records the correspondence between the different data fragments and their registered audio;
Step S30, acquiring the target fundamental frequency information of the audio to be recognized, and determining a fundamental frequency value range according to the target fundamental frequency information;
the target fundamental frequency information of the audio to be recognized is calculated in the same way as the fundamental frequency information of the registered audio; in this step, the fundamental frequency value range can be determined by error-range calculation or by fundamental frequency weighting;
specifically, determining a value range rather than a single value from the target fundamental frequency information prevents calculation errors in the fundamental frequency information of the registered audio, or in the target fundamental frequency information, from affecting voiceprint retrieval, thereby improving its accuracy;
Step S40, querying the fundamental frequency fragment list for the registered audio corresponding to the fundamental frequency value range to obtain the audio to be compared;
the fundamental frequency reflects the vocal-cord vibration rate of a speaker when voiced, and different speakers produce different fundamental frequencies for the same phoneme; therefore, using the phoneme fundamental frequency of a designated phoneme as the retrieval condition effectively improves the accuracy of voiceprint retrieval;
for example, when the fundamental frequency value of the designated phoneme a in the target fundamental frequency information is 90 Hz and the determined fundamental frequency value range is 90 Hz-110 Hz, the registered audio in the 90 Hz-100 Hz and 100 Hz-110 Hz data fragments of the fundamental frequency fragment list is set as the audio to be compared, and the audio to be recognized is then feature-matched against each audio to be compared to obtain the voiceprint recognition result;
in addition, in this embodiment, when the average fundamental frequency is stored in the audio information, the average fundamental frequency of each registered audio is obtained, the registered audio is stored in segments according to the average fundamental frequency to obtain the fundamental frequency segment list, the fundamental frequency value range is determined by obtaining the average fundamental frequency of the audio to be identified, and the registered data corresponding to the average fundamental frequency in the range in the fundamental frequency segment list is queried according to the fundamental frequency value range to obtain the audio to be compared.
In the embodiment, the voiceprint retrieval is carried out by taking the phoneme fundamental frequency of the designated phoneme as the retrieval condition, so that the voiceprint retrieval process is independent of factors such as a channel, noise, a far-near field and the like, the influence of the factors such as the channel, the noise, the far-near field and the like on the voiceprint retrieval is effectively reduced, and the accuracy of the voiceprint retrieval is improved.
Example two
Please refer to FIG. 2, which is a flowchart of a voiceprint retrieval method according to a second embodiment of the present invention, comprising the steps of:
Step S11, acquiring registration data and obtaining the fundamental frequency information of each registered audio in the registration data;
wherein the fundamental frequency information comprises the phoneme fundamental frequency of at least one designated phoneme;
Step S21, storing the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list, and acquiring the target fundamental frequency information of the audio to be recognized;
Step S31, when the number of designated phonemes in the fundamental frequency information of the registered audio is one, calculating the fundamental frequency value of the designated phoneme in the audio to be recognized;
for example, when the designated phoneme is phoneme a, the fundamental frequency value of phoneme a in the audio to be recognized is calculated;
Step S41, calculating an error range between the fundamental frequency value of the designated phoneme and a preset error value to obtain the fundamental frequency value range;
the preset error value may be set according to user requirements, for example 5 Hz, 10 Hz or 15 Hz;
preferably, if the preset error value is 5 Hz, the fundamental frequency value range is the fundamental frequency value of phoneme a ± 5 Hz;
Step S51, when the number of designated phonemes in the fundamental frequency information of the registered audio is greater than one, calculating the fundamental frequency value of each designated phoneme in the audio to be recognized;
the number of designated phonemes in the fundamental frequency information can be set according to user requirements; for example, with three designated phonemes a, b and c, the fundamental frequency values of phoneme a, phoneme b and phoneme c in the audio to be recognized are calculated respectively;
Step S61, weighting the fundamental frequency values of the designated phonemes to obtain a weighted fundamental frequency value, and calculating an error range between the weighted fundamental frequency value and the preset error value to obtain the fundamental frequency value range;
because individual phonemes fluctuate while the overall fundamental frequency of the same speaker is relatively stable, determining the fundamental frequency value range by fundamental frequency weighting effectively reduces the influence of fundamental frequency fluctuation and improves the accuracy of voiceprint retrieval;
preferably, the formula for weighting the fundamental frequency of the designated phoneme is as follows:
F=a1*K1+a2*K2……+an*Kn;
wherein F is the weighted fundamental frequency value, an is the fundamental frequency value of the nth designated phoneme in the audio to be identified, Kn is the weighted value corresponding to the nth designated phoneme in the audio to be identified, and K1+ K2 … … + Kn is 1;
specifically, in this step, the corresponding relationship between each registered audio and the weighted fundamental frequency value after the fundamental frequency weighting is performed on the corresponding designated phoneme is stored in the fundamental frequency fragment list, that is, in the same embodiment, the fundamental frequency information corresponding to each registered audio and the audio to be identified has the same calculation mode;
for example, when the designated phonemes set in this embodiment are phoneme a, phoneme b, and phoneme c, calculating weighted fundamental frequency values corresponding to the phoneme a, the phoneme b, and the phoneme c in each registered audio, performing fragment storage on the registered audio based on the calculated weighted fundamental frequency values to obtain a fundamental frequency fragment list, and further, calculating, for the audio to be recognized, a weighted fundamental frequency value corresponding to the phoneme a, the phoneme b, and the phoneme c in the audio to be recognized by calculating, based on the calculated weighted fundamental frequency value, a fundamental frequency value range;
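The weighting formula F = a1*K1 + a2*K2 + … + an*Kn, with the weights summing to 1, transcribes directly to code; the example values in the note below are illustrative only:

```python
def weighted_f0(f0_values, weights):
    """Compute F = a1*K1 + a2*K2 + ... + an*Kn; the weights Kn must sum to 1."""
    if len(f0_values) != len(weights):
        raise ValueError("need exactly one weight per fundamental frequency value")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(a * k for a, k in zip(f0_values, weights))
```

For instance, fundamental frequency values of 100 Hz, 120 Hz and 110 Hz for three designated phonemes with weights 0.5, 0.3 and 0.2 give a weighted fundamental frequency value of 108 Hz.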
Step S71, querying the fundamental frequency fragment list for the registered audio corresponding to the fundamental frequency value range to obtain the audio to be compared.
In this embodiment, voiceprint retrieval uses the phoneme fundamental frequency of a designated phoneme as the retrieval condition, so the retrieval process is largely independent of factors such as the channel, noise and near/far-field conditions, and their influence on voiceprint retrieval is effectively reduced; in addition, determining the fundamental frequency value range by fundamental frequency weighting reduces the influence of fundamental frequency fluctuation, further improving the accuracy of voiceprint retrieval.
Example three
Please refer to FIG. 3, which is a flowchart of a voiceprint retrieval method according to a third embodiment of the present invention, comprising the steps of:
Step S12, acquiring registration data and obtaining the fundamental frequency information of each registered audio in the registration data;
wherein the fundamental frequency information comprises the phoneme fundamental frequency of at least one designated phoneme; preferably, the fundamental frequency value of a designated phoneme in the audio to be recognized is calculated with an autocorrelation algorithm, a cepstrum method or an inverse filtering method;
Step S22, storing the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list, and acquiring the target fundamental frequency information of the audio to be recognized;
Step S32, when the number of designated phonemes in the fundamental frequency information of the registered audio is greater than one, acquiring a phoneme retrieval order table;
for example, when the phoneme retrieval order table is phoneme a - phoneme b - phoneme c, phoneme a is retrieved first, phoneme b second and phoneme c third;
Step S42, calculating the fundamental frequency value of each designated phoneme in the audio to be recognized in turn, following the retrieval order of the designated phonemes in the phoneme retrieval order table;
that is, the fundamental frequency values of phoneme a, phoneme b and phoneme c in the audio to be recognized are calculated in that order;
Step S52, calculating the error range between the fundamental frequency value of each designated phoneme and a preset error value to obtain a plurality of fundamental frequency value ranges;
for example, when the fundamental frequency values of phoneme a, phoneme b and phoneme c in the audio to be recognized are F1, F2 and F3, the error ranges between F1, F2, F3 and the preset error value are calculated respectively to obtain a first, a second and a third fundamental frequency value range;
Step S62, performing an audio query for each corresponding designated phoneme in the fundamental frequency fragment list in turn according to its fundamental frequency value range, and setting the queried registered audio as the audio to be compared;
preferably, because individual phonemes fluctuate while the overall fundamental frequency of the same speaker is relatively stable, performing multi-level voiceprint retrieval with the fundamental frequencies of several different phonemes effectively reduces the influence of fundamental frequency fluctuation and improves the accuracy of voiceprint retrieval;
for example, the registered audio of phoneme a within the first fundamental frequency value range is queried in the fundamental frequency fragment list to obtain first audio data, the registered audio of phoneme b within the second fundamental frequency value range is queried to obtain second audio data, and the registered audio of phoneme c within the third fundamental frequency value range is queried to obtain third audio data; the registered audio in the first, second and third audio data is then set as the audio to be compared;
preferably, the step S62 further includes: setting the registered audio that appears in every query as the audio to be compared, that is, performing data matching across the first audio data, the second audio data and the third audio data, and setting the registered audio common to all three as the audio to be compared according to the matching result;
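The intersection matching described above can be sketched with set operations; the speaker identifiers used here are hypothetical placeholders:

```python
# Sketch of the intersection matching in step S62: only registered audio
# found in every per-phoneme query survives. Identifiers are hypothetical.
first_audio = {"spk01", "spk07", "spk12"}   # hits for phoneme a's range
second_audio = {"spk07", "spk12", "spk31"}  # hits for phoneme b's range
third_audio = {"spk07", "spk20", "spk12"}   # hits for phoneme c's range

# registered audio common to all three query results
audio_to_compare = first_audio & second_audio & third_audio
```

Requiring membership in all three result sets is what makes the multi-level retrieval stricter than a single-phoneme query.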
in addition, the step S62 may further include: sequentially performing an audio query for the corresponding designated phonemes in the fundamental frequency fragment list according to the fundamental frequency value ranges; if registered audio is found within the current fundamental frequency value range, setting the queried registered audio as the audio to be compared and stopping the query operation on the fundamental frequency fragment list;
for example, the registered audio corresponding to phoneme a is queried in the fundamental frequency fragment list according to the first fundamental frequency value range; if no registered audio is found within the first fundamental frequency value range, the registered audio corresponding to phoneme b is queried according to the second fundamental frequency value range; if registered audio is found within the second fundamental frequency value range, the queried registered audio is set as the audio to be compared and output, so the registered audio corresponding to phoneme c need not be queried according to the third fundamental frequency value range, which effectively improves the efficiency of voiceprint retrieval.
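The early-stopping variant above can be sketched as follows; the fragment-list layout (a mapping from phoneme to entries carrying an `f0` field) is an assumption made for illustration, not a structure specified by the patent:

```python
def query_with_early_stop(fragment_list, phoneme_ranges):
    """Query each designated phoneme's F0 range in order and return the
    first non-empty hit list, skipping the remaining phonemes."""
    for phoneme, (low, high) in phoneme_ranges:
        hits = [entry for entry in fragment_list.get(phoneme, [])
                if low <= entry["f0"] <= high]
        if hits:
            return hits  # stop: later phonemes need not be queried
    return []
```

In the running example, an empty result for phoneme a falls through to phoneme b, and a hit for phoneme b ends the search before phoneme c is consulted.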
In this embodiment, voiceprint retrieval is performed with the phoneme fundamental frequency of the designated phoneme as the retrieval condition, so the retrieval process is independent of factors such as channel, noise and far/near field; this effectively reduces the influence of those factors on voiceprint retrieval and improves its accuracy. In addition, performing multi-level voiceprint retrieval with the fundamental frequencies of a plurality of different phonemes effectively reduces the influence of fundamental frequency fluctuation and further improves accuracy.
Example four
Please refer to fig. 4, which is a schematic structural diagram of a voiceprint retrieval system 100 according to a fourth embodiment of the present invention, including: a fundamental frequency information obtaining module 10, an audio frequency storage module 11, a fundamental frequency value range determining module 12 and a voiceprint recognition module 13, wherein:
a fundamental frequency information obtaining module 10, configured to obtain registration data and obtain fundamental frequency information of each registered audio in the registration data, where the fundamental frequency information includes a phoneme fundamental frequency of at least one designated phoneme, and the fundamental frequency value of the designated phoneme in the audio to be recognized may be calculated with an autocorrelation algorithm, a cepstrum method or an inverse filtering method.
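Of the named options, the autocorrelation method for estimating the fundamental frequency of a voiced frame can be sketched as follows; the frame length and the 60–400 Hz search band are illustrative assumptions, not values given by the patent:

```python
import numpy as np

def f0_autocorrelation(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one voiced frame with the
    autocorrelation method: the lag of the strongest autocorrelation
    peak inside the plausible pitch band gives the period."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sample_rate / fmax)  # shortest allowed period in samples
    hi = int(sample_rate / fmin)  # longest allowed period in samples
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sample_rate / lag
```

A cepstrum or inverse-filtering estimator could be substituted for this function without changing the surrounding retrieval logic.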
The audio storage module 11 is configured to store the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list.
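One plausible way to realize such fragment storage is to bucket registered audio by the fundamental frequency of each designated phoneme; the 10 Hz bucket width and the data layout are assumptions for illustration, not specified by the patent:

```python
from collections import defaultdict

def build_fragment_list(registrations, bucket_hz=10):
    """Bucket each registered audio by the F0 of every designated phoneme,
    so a range query scans a few buckets instead of the whole corpus."""
    index = defaultdict(list)
    for audio_id, phoneme_f0s in registrations.items():
        for phoneme, f0 in phoneme_f0s.items():
            index[(phoneme, int(f0 // bucket_hz))].append((audio_id, f0))
    return index
```

A fundamental frequency value range then maps to a small, contiguous set of buckets per phoneme, which is what makes the later range queries cheap.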
The fundamental frequency value range determining module 12 is configured to obtain target fundamental frequency information of the audio to be recognized and determine a fundamental frequency value range according to the target fundamental frequency information.
Preferably, the fundamental frequency value range determining module 12 is further configured to:
when the number of designated phonemes in the fundamental frequency information of the registered audio is one, calculate a fundamental frequency value of the designated phoneme in the audio to be recognized;
calculate an error range between the fundamental frequency value of the designated phoneme and a preset error value to obtain the fundamental frequency value range;
when the number of designated phonemes in the fundamental frequency information of the registered audio is more than one, respectively calculate the fundamental frequency value of each designated phoneme in the audio to be recognized;
and perform fundamental frequency weighting on the fundamental frequency values of the designated phonemes to obtain a weighted fundamental frequency value, and calculate an error range between the weighted fundamental frequency value and the preset error value to obtain the fundamental frequency value range.
Further, the fundamental frequency value range determining module 12 is further configured to:
when the number of designated phonemes in the fundamental frequency information of the registered audio is more than one, obtain a phoneme retrieval sequence list;
sequentially calculate the fundamental frequency value of each designated phoneme in the audio to be recognized according to the retrieval order of the designated phonemes in the phoneme retrieval sequence list;
and respectively calculate an error range between the fundamental frequency value of each designated phoneme and a preset error value to obtain a plurality of fundamental frequency value ranges.
Specifically, the formula for weighting the fundamental frequency of the designated phoneme is as follows:
F = a1*K1 + a2*K2 + … + an*Kn;
where F is the weighted fundamental frequency value, an is the fundamental frequency value of the nth designated phoneme in the audio to be recognized, Kn is the weight corresponding to the nth designated phoneme in the audio to be recognized, and K1 + K2 + … + Kn = 1.
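The weighting formula can be transcribed directly into code; this is a literal sketch of the formula above, with the constraint that the weights sum to 1 checked explicitly:

```python
def weighted_f0(f0_values, weights):
    """F = a1*K1 + a2*K2 + ... + an*Kn, with the weights summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights K1..Kn must sum to 1")
    return sum(a * k for a, k in zip(f0_values, weights))
```

Because the weights form a convex combination, the weighted fundamental frequency value always lies between the smallest and largest per-phoneme values.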
And the voiceprint recognition module 13 is configured to query the registered audio corresponding to the fundamental frequency value range in the fundamental frequency fragment list, so as to obtain an audio to be compared.
Preferably, the voiceprint recognition module 13 is further configured to: sequentially perform an audio query for the corresponding designated phonemes in the fundamental frequency fragment list according to the fundamental frequency value ranges, and set the queried registered audio as the audio to be compared.
Further, the voiceprint recognition module 13 is further configured to set the registered audio that appears in every query as the audio to be compared.
In this embodiment, voiceprint retrieval is performed with the phoneme fundamental frequency of the designated phoneme as the retrieval condition, so the retrieval process is independent of factors such as channel, noise and far/near field; this effectively reduces the influence of those factors on voiceprint retrieval and improves its accuracy.
Example five
Referring to fig. 5, a mobile terminal 101 according to a fifth embodiment of the present invention includes a storage device and a processor, where the storage device stores a computer program and the processor runs the computer program to cause the mobile terminal 101 to execute the above voiceprint retrieval method.
The present embodiment further provides a storage medium storing the computer program used in the above mobile terminal 101, which, when executed, implements the steps of:
acquiring registration data and acquiring fundamental frequency information of each registration audio in the registration data, wherein the fundamental frequency information comprises a phoneme fundamental frequency of at least one appointed phoneme;
carrying out fragment storage on the registered audio according to the fundamental frequency information to obtain a fundamental frequency fragment list;
acquiring target fundamental frequency information of audio to be identified, and determining a fundamental frequency value range according to the target fundamental frequency information;
and querying the registered audio corresponding to the fundamental frequency value range in the fundamental frequency fragment list to obtain the audio to be compared. Examples of the storage medium include ROM/RAM, a magnetic disk and an optical disk.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is used as an example, in practical applications, the above-mentioned function distribution may be performed by different functional units or modules according to needs, that is, the internal structure of the storage device is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application.
Those skilled in the art will appreciate that the component structures shown in fig. 4 are not intended to be limiting of the voiceprint retrieval system of the present invention and can include more or fewer components than shown, or some components in combination, or a different arrangement of components, and that the voiceprint retrieval method of fig. 1-3 can also be implemented using more or fewer components than shown in fig. 4, or some components in combination, or a different arrangement of components. The units, modules, etc. referred to herein are a series of computer programs that can be executed by a processor (not shown) in the target voiceprint retrieval system and that are functionally capable of performing certain functions, all of which can be stored in a storage device (not shown) of the target voiceprint retrieval system.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A voiceprint retrieval method, said method comprising:
acquiring registration data and acquiring fundamental frequency information of each registered audio in the registration data, wherein the fundamental frequency information comprises a phoneme fundamental frequency of at least one designated phoneme;
carrying out fragment storage on the registered audio according to the fundamental frequency information to obtain a fundamental frequency fragment list;
acquiring target fundamental frequency information of audio to be identified, and determining a fundamental frequency value range according to the target fundamental frequency information;
and inquiring the registered audio corresponding to the fundamental frequency value range in the fundamental frequency fragment list to obtain the audio to be compared.
2. The voiceprint retrieval method of claim 1 wherein said step of determining a range of fundamental frequency values from said target fundamental frequency information comprises:
when the number of designated phonemes in the fundamental frequency information of the registered audio is one, calculating a fundamental frequency value of the designated phoneme in the audio to be recognized;
calculating an error range between the fundamental frequency value of the designated phoneme and a preset error value to obtain the fundamental frequency value range;
when the number of designated phonemes in the fundamental frequency information of the registered audio is more than one, respectively calculating the fundamental frequency value of each designated phoneme in the audio to be recognized;
and performing fundamental frequency weighting on the fundamental frequency values of the designated phonemes to obtain a weighted fundamental frequency value, and calculating an error range between the weighted fundamental frequency value and the preset error value to obtain the fundamental frequency value range.
3. The voiceprint retrieval method of claim 1 wherein said step of determining a range of fundamental frequency values from said target fundamental frequency information comprises:
when the number of designated phonemes in the fundamental frequency information of the registered audio is more than one, acquiring a phoneme retrieval sequence list;
sequentially calculating the fundamental frequency value of each designated phoneme in the audio to be recognized according to the retrieval order of the designated phonemes in the phoneme retrieval sequence list;
respectively calculating an error range between the fundamental frequency value of each designated phoneme and a preset error value to obtain a plurality of fundamental frequency value ranges;
correspondingly, the step of querying the registered audio corresponding to the fundamental frequency value range in the fundamental frequency fragment list to obtain the audio to be compared includes:
and sequentially performing an audio query for the corresponding designated phonemes in the fundamental frequency fragment list according to the fundamental frequency value ranges, and setting the queried registered audio as the audio to be compared.
4. The voiceprint retrieval method according to claim 3, wherein the step of sequentially performing an audio query for the corresponding designated phonemes in the fundamental frequency fragment list according to the fundamental frequency value ranges further comprises:
setting the registered audio that appears in every query as the audio to be compared.
5. The voiceprint retrieval method according to claim 2, wherein the formula for weighting the fundamental frequency of the designated phonemes is:
F = a1*K1 + a2*K2 + … + an*Kn;
where F is the weighted fundamental frequency value, an is the fundamental frequency value of the nth designated phoneme in the audio to be recognized, Kn is the weight corresponding to the nth designated phoneme in the audio to be recognized, and K1 + K2 + … + Kn = 1.
6. The method for voiceprint retrieval according to claim 1, wherein the method for calculating the fundamental frequency value of the designated phoneme in the audio to be recognized is an autocorrelation algorithm, a cepstrum method or an inverse filtering method.
7. A voiceprint retrieval system, said system comprising:
the fundamental frequency information acquisition module is used for acquiring registration data and acquiring fundamental frequency information of each registered audio in the registration data, wherein the fundamental frequency information comprises a phoneme fundamental frequency of at least one designated phoneme;
the audio storage module is used for storing the registered audio in fragments according to the fundamental frequency information to obtain a fundamental frequency fragment list;
the fundamental frequency value range determining module is used for acquiring target fundamental frequency information of the audio to be recognized and determining a fundamental frequency value range according to the target fundamental frequency information;
and the voiceprint identification module is used for inquiring the registered audio corresponding to the fundamental frequency value range in the fundamental frequency fragment list to obtain the audio to be compared.
8. The voiceprint retrieval system of claim 7 wherein said fundamental frequency range determination module is further configured to:
when the number of designated phonemes in the fundamental frequency information of the registered audio is one, calculate a fundamental frequency value of the designated phoneme in the audio to be recognized;
calculate an error range between the fundamental frequency value of the designated phoneme and a preset error value to obtain the fundamental frequency value range;
when the number of designated phonemes in the fundamental frequency information of the registered audio is more than one, respectively calculate the fundamental frequency value of each designated phoneme in the audio to be recognized;
and perform fundamental frequency weighting on the fundamental frequency values of the designated phonemes to obtain a weighted fundamental frequency value, and calculate an error range between the weighted fundamental frequency value and the preset error value to obtain the fundamental frequency value range.
9. A mobile terminal, characterized in that it comprises a storage device for storing a computer program and a processor running the computer program to make the mobile terminal execute the voiceprint retrieval method according to any one of claims 1 to 6.
10. A storage medium, characterized in that it stores a computer program for use in a mobile terminal according to claim 9, which computer program, when executed by a processor, implements the steps of the voiceprint retrieval method according to any one of claims 1 to 6.
CN202010431564.6A 2020-05-20 2020-05-20 Voiceprint retrieval method, system, mobile terminal and storage medium Active CN111782867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010431564.6A CN111782867B (en) 2020-05-20 2020-05-20 Voiceprint retrieval method, system, mobile terminal and storage medium


Publications (2)

Publication Number Publication Date
CN111782867A true CN111782867A (en) 2020-10-16
CN111782867B CN111782867B (en) 2022-12-30

Family

ID=72754284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010431564.6A Active CN111782867B (en) 2020-05-20 2020-05-20 Voiceprint retrieval method, system, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111782867B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339605A (en) * 2010-07-22 2012-02-01 盛乐信息技术(上海)有限公司 Fundamental frequency extraction method and system based on prior surd and sonant knowledge
CN108597525A (en) * 2018-04-25 2018-09-28 四川远鉴科技有限公司 Voice vocal print modeling method and device
CN108766413A (en) * 2018-05-25 2018-11-06 北京云知声信息技术有限公司 Phoneme synthesizing method and system
CN109817223A (en) * 2019-01-29 2019-05-28 广州势必可赢网络科技有限公司 Phoneme notation method and device based on audio-frequency fingerprint
US20200089705A1 (en) * 2009-08-13 2020-03-19 TunesMap Inc. Analyzing Captured Sound and Seeking a Match for Temporal and Geographic Presentation and Navigation of Linked Cultural, Artistic, and Historic Content


Non-Patent Citations (1)

Title
HUANG Rong et al., "A Phoneme Segmentation Algorithm Based on Wavelet Transform", Information and Electronic Engineering *

Also Published As

Publication number Publication date
CN111782867B (en) 2022-12-30

Similar Documents

Publication Publication Date Title
Lee et al. Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis
US10629209B2 (en) Voiceprint recognition method, device, storage medium and background server
US7643994B2 (en) Method for generating an audio signature based on time domain features
Han et al. A classification based approach to speech segregation
Gerosa et al. Scream and gunshot detection in noisy environments
US20180374487A1 (en) Detection of replay attack
US20190279298A1 (en) Information auditing method, apparatus, electronic device and computer readable storage medium
CN110265037B (en) Identity verification method and device, electronic equipment and computer readable storage medium
US6772119B2 (en) Computationally efficient method and apparatus for speaker recognition
US10242677B2 (en) Speaker dependent voiced sound pattern detection thresholds
KR20170045123A (en) Hotword recognition
US20140081638A1 (en) Cut and paste spoofing detection using dynamic time warping
CN111312259B (en) Voiceprint recognition method, system, mobile terminal and storage medium
Jaafar et al. Automatic syllables segmentation for frog identification system
CN111783939A (en) Voiceprint recognition model training method and device, mobile terminal and storage medium
US11081115B2 (en) Speaker recognition
Zehetner et al. Wake-up-word spotting for mobile systems
CN111782867B (en) Voiceprint retrieval method, system, mobile terminal and storage medium
CN111370000A (en) Voiceprint recognition algorithm evaluation method, system, mobile terminal and storage medium
Suthokumar et al. Use of claimed speaker models for replay detection
US11942094B2 (en) Hybrid multilingual text-dependent and text-independent speaker verification
Camarena-Ibarrola et al. Speaker identification through spectral entropy analysis
Huang et al. Sports audio segmentation and classification
JPS63186298A (en) Word voice recognition equipment
Rouniyar et al. Channel response based multi-feature audio splicing forgery detection and localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant