CN112528068A - Voiceprint feature storage method, voiceprint feature matching method and device and electronic equipment - Google Patents


Info

Publication number
CN112528068A
CN112528068A
Authority
CN
China
Prior art keywords
voiceprint
random vector
sensitive hash
voiceprint features
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011268559.4A
Other languages
Chinese (zh)
Inventor
郭俊龙
赖勇铨
左为
陈文�
贺亚运
李美玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd filed Critical China Citic Bank Corp Ltd
Priority to CN202011268559.4A
Publication of CN112528068A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • G06F16/636Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/14Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application provides a voiceprint feature storage method, a voiceprint feature matching method and apparatus, and an electronic device, applied in the technical field of information storage. In the method, the voiceprint feature to be stored is stored into the hash bucket corresponding to its hash bucket index value, so that voiceprint features are stored evenly across buckets. When voiceprint features are matched, this narrows the matching range, reduces the number of comparisons, and improves computational efficiency, enabling second-level matching over hundred-million-scale voiceprint features and improving user experience.

Description

Voiceprint feature storage method, voiceprint feature matching method and device and electronic equipment
Technical Field
The application relates to the technical field of speech processing, and in particular to a voiceprint feature storage method, a voiceprint feature matching method and apparatus, and an electronic device.
Background
Voiceprint features are unique, stable, difficult to forge, and convenient to collect and transmit. Voiceprint feature matching is an important area of intelligent speech technology. With the rapid development of voiceprint-related artificial intelligence technology, voiceprints are used ever more widely across fields, in scenarios such as voiceprint login on intelligent terminals, identity verification in telephone customer service, and financial fraud prevention.
Currently, there are two main scenarios for voiceprint feature matching: 1:1 verification and 1:N identification. The 1:1 verification scenario mainly verifies whether an identity id and a voiceprint feature belong to the same person: cosine similarity is computed against the voiceprint feature with the same identity id in the voiceprint library, and if the similarity meets a certain threshold the identities are judged to be the same, and vice versa. In the 1:N identification scenario, only a voiceprint feature is given; cosine similarity is computed against the voiceprint features in the voiceprint library and the results are sorted to determine the identity id. However, cosine similarity comparison is computationally expensive: comparing against the full data set takes far too long, so it suits only small-scale data, and when the data reaches the hundred-million scale the time cost cannot meet service requirements at all.
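For reference, the full-library 1:N comparison described above can be sketched as follows. This is a minimal illustration, not from the patent; the vector dimension and data are arbitrary:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Plain cosine similarity between two voiceprint feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def brute_force_1_to_n(query: np.ndarray, library: np.ndarray) -> int:
    # 1:N identification over the full library: one cosine computation per
    # stored feature, i.e. O(N) comparisons -- the cost the patent avoids.
    scores = [cosine_similarity(query, v) for v in library]
    return int(np.argmax(scores))
```

At hundred-million scale, those N cosine computations per query are exactly what makes brute-force matching impractical.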
Disclosure of Invention
The application provides a voiceprint feature storage method, a voiceprint feature matching method and apparatus, and an electronic device, which store voiceprint features in a well-distributed way so that, during matching, the matching range is narrowed, the number of comparisons is reduced, and computational efficiency improves, enabling second-level matching over hundred-million-scale voiceprint features and improving user experience.
The technical scheme adopted by the application is as follows:
in a first aspect, a voiceprint feature storage method is provided, including:
acquiring voiceprint characteristics to be stored;
processing the voiceprint feature to be stored based on a locality-sensitive hash function group, and mapping it to a sequence of k bits (0 or 1), wherein the locality-sensitive hash function group comprises k locality-sensitive hash functions, each of which divides a plurality of voiceprint features into two approximately equal groups;
determining a hash bucket index of the voiceprint feature to be stored based on the sequence of k bits;
and storing the voiceprint features to be stored into the hash bucket corresponding to the hash bucket index.
Optionally, the voiceprint features are L2-normalized.
Optionally, the method further comprises:
randomly sampling, without replacement, m voiceprint features from the plurality of voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
randomly generating a first random vector, wherein the dimension of the first random vector is the same as that of the voiceprint feature;
determining a first locality-sensitive hash function having a value range of 0 or 1 based on the first random vector;
adding the first locality-sensitive hash function to the locality-sensitive hash function group if the first locality-sensitive hash function can equally divide the m voiceprint features.
Optionally, the method further comprises:
adding the first random vector to a set of random vectors;
randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function with a value range of 0 or 1 based on the second random vector;
adding the second locality-sensitive hash function to the locality-sensitive hash function group and the second random vector to the random vector group if the second locality-sensitive hash function can equally divide the m voiceprint features.
In a second aspect, a voiceprint feature matching method is provided, including:
acquiring voiceprint characteristics to be matched;
processing the voiceprint feature to be matched based on a locality-sensitive hash function group, and mapping it to a sequence of k bits (0 or 1), wherein the locality-sensitive hash function group comprises k locality-sensitive hash functions, each of which divides a plurality of voiceprint features into two approximately equal groups;
determining a hash bucket index of the voiceprint feature to be matched based on the sequence of k bits;
and matching the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index, and determining a voiceprint feature matching result.
Optionally, the voiceprint features are L2-normalized, and matching the voiceprint feature to be matched against the voiceprint features in the hash bucket corresponding to the hash bucket index to determine a voiceprint feature matching result includes:
determining the cosine distance between the voiceprint feature to be matched and the voiceprint features in the hash bucket corresponding to the hash bucket index based on an optimized cosine distance calculation formula, wherein the optimized cosine distance calculation formula is s = cos θ = Vi · Vj;
And determining a voiceprint feature matching result based on the cosine distance calculation result.
Optionally, the method further comprises:
randomly sampling, without replacement, m voiceprint features from the plurality of voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
randomly generating a first random vector, wherein the dimension of the first random vector is the same as that of the voiceprint feature;
determining a first locality-sensitive hash function having a value range of 0 or 1 based on the first random vector;
adding the first locality-sensitive hash function to the locality-sensitive hash function group if the first locality-sensitive hash function can equally divide the m voiceprint features.
Optionally, the method further comprises:
adding the first random vector to a set of random vectors;
randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function with a value range of 0 or 1 based on the second random vector;
adding the second locality-sensitive hash function to the locality-sensitive hash function group and the second random vector to the random vector group if the second locality-sensitive hash function can equally divide the m voiceprint features.
In a third aspect, a voiceprint feature storage apparatus is provided, including:
the first acquisition module is used for acquiring the voiceprint characteristics to be stored;
the first mapping module is used for processing the voiceprint feature to be stored based on the locality-sensitive hash function group and mapping it to a sequence of k bits (0 or 1), wherein the locality-sensitive hash function group comprises k locality-sensitive hash functions, each of which divides a plurality of voiceprint features into two approximately equal groups;
a first determining module, configured to determine a hash bucket index of the voiceprint feature to be stored based on the sequence of k bits;
and the storage module is used for storing the voiceprint features to be stored into the hash bucket corresponding to the hash bucket index.
Optionally, the voiceprint features are L2-normalized.
Optionally, the apparatus further comprises:
the first extraction module is used for randomly sampling, without replacement, m voiceprint features from the voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
the first random generation module is used for randomly generating a first random vector, and the dimensionality of the first random vector is the same as that of the voiceprint feature;
a second determination module to determine a first locality-sensitive hash function with a value range of 0 or 1 based on the first random vector;
a first adding module, configured to add the first locality-sensitive hash function to the locality-sensitive hash function group if the first locality-sensitive hash function can equally divide the m voiceprint features.
Optionally, the apparatus further comprises:
a second adding module, configured to add the first random vector to the random vector group;
a second random generation module for randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function having a value range of 0 or 1 based on the second random vector;
a third adding module, configured to add the second locality-sensitive hash function to the locality-sensitive hash function group and add the second random vector to the random vector group if the second locality-sensitive hash function can equally divide the m voiceprint features.
In a fourth aspect, there is provided a voiceprint feature matching apparatus comprising:
the second acquisition module is used for acquiring the voiceprint characteristics to be matched;
the second mapping module is used for processing the voiceprint feature to be matched based on the locality-sensitive hash function group and mapping it to a sequence of k bits (0 or 1), wherein the locality-sensitive hash function group comprises k locality-sensitive hash functions, each of which divides a plurality of voiceprint features into two approximately equal groups;
a third determining module, configured to determine a hash bucket index of the voiceprint feature to be matched based on the sequence of k bits;
and the matching module is used for matching the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index to determine a voiceprint feature matching result.
Optionally, the voiceprint features are L2-normalized, and the third determining module comprises:
a first determining unit, configured to determine, based on an optimized cosine distance calculation formula, the cosine distance between the voiceprint feature to be matched and the voiceprint features in the hash bucket corresponding to the hash bucket index, wherein the optimized cosine distance calculation formula is s = cos θ = Vi · Vj;
And the second determining unit is used for determining a voiceprint feature matching result based on the cosine distance calculation result.
Optionally, the apparatus further comprises:
the second extraction module is used for randomly sampling, without replacement, m voiceprint features from the voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
the third random generation module is used for randomly generating a first random vector, and the dimensionality of the first random vector is the same as that of the voiceprint feature;
a fourth determining module for determining a first locality-sensitive hash function with a value range of 0 or 1 based on the first random vector;
and the third adding module is used for adding the first locality-sensitive hash function to the locality-sensitive hash function group if the first locality-sensitive hash function can equally divide the m voiceprint features.
Optionally, the apparatus further comprises:
a fourth adding module, configured to add the first random vector to the random vector group;
a fourth random generation module, configured to randomly generate a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determine, based on the second random vector, a second locality-sensitive hash function having a value range of 0 or 1;
a fifth adding module, configured to add the second locality-sensitive hash function to the locality-sensitive hash function group and add the second random vector to the random vector group if the second locality-sensitive hash function can equally divide the m voiceprint features.
In a fifth aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the voiceprint feature storage method of the first aspect or the voiceprint feature matching method of the second aspect.
In a sixth aspect, there is provided a computer-readable storage medium for storing computer instructions which, when executed on a computer, cause the computer to perform the voiceprint feature storage method of the first aspect or the voiceprint feature matching method of the second aspect.
The application provides a voiceprint feature storage method, a voiceprint feature matching method and apparatus, and an electronic device. In the prior art, all voiceprint features are stored in a single database and matched in 1:N fashion against the full library. By contrast, the present method acquires the voiceprint feature to be stored; processes it based on the locality-sensitive hash function group, mapping it to a sequence of k bits (0 or 1), wherein the group comprises k locality-sensitive hash functions, each of which divides a plurality of voiceprint features into two approximately equal groups; determines the hash bucket index of the voiceprint feature to be stored from the sequence of k bits; and stores the voiceprint feature into the hash bucket corresponding to that index. Because each voiceprint feature is stored into the bucket given by its hash bucket index value, voiceprint features are stored evenly across buckets; during matching, the matching range is narrowed, the number of comparisons is reduced, and computational efficiency improves, enabling second-level matching over hundred-million-scale voiceprint features and improving user experience.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a voiceprint feature storage method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a voiceprint feature matching method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a voiceprint feature storage apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a voiceprint feature matching apparatus according to an embodiment of the present application;
FIG. 6 is a diagram illustrating an overall process flow of voiceprint feature matching according to an embodiment of the present application;
fig. 7 is a flowchart of an example of a module for generating a locality-sensitive hash function set according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Embodiment one
An embodiment of the present application provides a voiceprint feature storage method, as shown in fig. 1, the method may include the following steps:
step S101, acquiring voiceprint characteristics to be stored;
step S102, processing the voiceprint feature to be stored based on a locality-sensitive hash function group, and mapping it to a sequence of k bits (0 or 1), wherein the locality-sensitive hash function group comprises k locality-sensitive hash functions, each of which divides a plurality of voiceprint features into two approximately equal groups;
step S103, determining a hash bucket index of the voiceprint feature to be stored based on the sequence of k bits;
and step S104, storing the voiceprint features to be stored into the hash bucket corresponding to the hash bucket index.
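The four steps above can be sketched as follows. This is a minimal Python illustration, not from the patent; the random hyperplanes, DIM, and K are assumed stand-ins for the locality-sensitive hash function group and its parameters:

```python
import numpy as np
from collections import defaultdict

# Hypothetical setup: K random hyperplanes stand in for the k
# locality-sensitive hash functions; DIM is an assumed feature dimension.
rng = np.random.default_rng(0)
DIM, K = 64, 10
hyperplanes = rng.normal(size=(K, DIM))   # one random vector per hash function
buckets = defaultdict(list)               # hash bucket index -> stored features

def bucket_index(v: np.ndarray) -> int:
    # Steps S102/S103: each hash function contributes one bit; the k bits,
    # read as a binary number, give the hash bucket index.
    bits = (hyperplanes @ v > 0).astype(int)
    return int("".join(map(str, bits)), 2)

def store(v: np.ndarray) -> int:
    # Step S104: L2-normalize the feature, then place it in its bucket.
    v = v / np.linalg.norm(v)
    idx = bucket_index(v)
    buckets[idx].append(v)
    return idx
```

With roughly balanced hash functions, the N stored features spread over the 2^K buckets, so a later query only has to scan one bucket.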
In particular, if a voiceprint feature V0 (i.e., the voiceprint feature to be stored) is given, it can be mapped directly through the locality-sensitive hash function group Hg(V) to a sequence of k bits (0 or 1); if a segment of speech signal is given instead, it is first passed through a deep-learning neural network model, which outputs the voiceprint feature. Whether the voiceprint feature is given directly or output by the deep learning model, it must be L2-normalized.
In particular, following the order of the locality-sensitive hash functions in the group Hg(V), the k bits form a binary number that determines a value in the range 0 to 2^k − 1, representing the hash bucket index of the voiceprint feature. For example, the binary sequence '0000001010' represents hash bucket index 10. The number of hash buckets is 2^k, and the average number of voiceprint features in each hash bucket is N/2^k. The number of hash buckets can be chosen according to specific needs, and the value of k then follows.
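The index computation described above can be written out directly; a small sketch, with the average-load formula alongside (helper names are illustrative, not from the patent):

```python
def bits_to_bucket_index(bits) -> int:
    # Read the k hash-function outputs as a binary number in 0 .. 2**k - 1;
    # that number is the hash bucket index.
    return int("".join(str(b) for b in bits), 2)

def average_bucket_load(n_features: int, k: int) -> float:
    # With 2**k buckets and even division, each bucket holds about N / 2**k.
    return n_features / 2 ** k
```

For instance, `bits_to_bucket_index([0, 0, 0, 0, 0, 0, 1, 0, 1, 0])` gives 10 for k = 10.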
Specifically, after the hash bucket index value of the voiceprint feature to be stored is determined, the voiceprint feature is stored into the corresponding hash bucket. Illustratively, embodiments of the present application use a MySQL database, with one table per hash bucket. The storage system is not limited here; different storage systems can be chosen according to the actual situation.
In this embodiment of the application, the voiceprint feature to be stored is stored into the hash bucket corresponding to its hash bucket index value, so that a plurality of voiceprint features are stored evenly across buckets. When voiceprint feature matching is performed, the matching range is narrowed, the number of comparisons is reduced, and computational efficiency improves, enabling second-level matching over hundred-million-scale voiceprint features and improving user experience.
The embodiment of the present application provides a possible implementation manner, and further, the method further includes:
randomly sampling, without replacement, m voiceprint features from the plurality of voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
randomly generating a first random vector, wherein the dimension of the first random vector is the same as that of the voiceprint feature;
determining a first locality-sensitive hash function having a value range of 0 or 1 based on the first random vector;
adding the first locality-sensitive hash function to the locality-sensitive hash function group if the first locality-sensitive hash function can equally divide the m voiceprint features.
Specifically, this step aims to generate a group of random vectors that divide the voiceprint features into two approximately equal parts and are approximately mutually orthogonal. To reduce processing time, not all voiceprint feature data need participate in the calculation: it suffices to randomly sample m voiceprint features from the N (hundred-million-scale) voiceprint feature samples. The value of m is not limited here and depends on factors such as the memory and computing speed of the device.
Specifically, a vector R' = (r1, r2, …, rn) with the same dimension as the voiceprint feature is randomly generated. R' and the voiceprint features are all L2-normalized, i.e.

||R'||2 = (r1^2 + r2^2 + … + rn^2)^(1/2) = 1.

Here n is not limited and depends on the output dimension of the selected deep-learning voiceprint model. At this point the random vector group is empty: Rg = {}.
The generated random vector R' defines a locality-sensitive hash function with value range {0, 1}:

H(V) = 1 if R' · V > 0, and H(V) = 0 otherwise.

The locality-sensitive hash function H(V) divides the m voiceprint features into a 0 part and a 1 part; if the two parts are approximately equal in number, R' is put into the random vector group Rg and the next step begins; otherwise, a new random vector is generated and the above steps are repeated. The locality-sensitive hash function is also stored into the locality-sensitive hash function group. These steps can be performed multiple times, yielding k locality-sensitive hash functions.
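The generate-and-test loop above can be sketched as follows. This is a hedged illustration: the function name and the tolerance parameter `tol` (the allowed imbalance fraction) are assumptions, since the patent only requires an approximately equal split:

```python
import numpy as np

def try_make_hash_vector(samples: np.ndarray, tol: float = 0.1, rng=None):
    # Draw a random L2-normalized vector R', define H(V) = 1 when R'.V > 0
    # and 0 otherwise, and accept R' only if it splits the m sampled
    # voiceprint features roughly in half.
    rng = rng or np.random.default_rng()
    r = rng.normal(size=samples.shape[1])
    r /= np.linalg.norm(r)                 # L2 normalization, as in the text
    ones = int(np.sum(samples @ r > 0))    # features hashed to 1
    m = len(samples)
    if abs(ones - m / 2) <= tol * m:       # the "relative averaging" check
        return r                           # accept: caller adds it to R_g
    return None                            # reject: caller regenerates
```

The caller retries until k accepted vectors have been collected, each defining one locality-sensitive hash function.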
It should be noted that "equal division" in this application includes both exact equal division and approximate equal division: exact equal division splits the data into two parts of identical size, while in approximate equal division the two parts differ in size, but the difference stays within a certain threshold.
The embodiment of the application solves the problem of determining the locality sensitive hash function group.
The embodiment of the present application provides a possible implementation manner, and further, the method further includes:
adding the first random vector to a set of random vectors;
randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function with a value range of 0 or 1 based on the second random vector;
adding the second locality-sensitive hash function to the locality-sensitive hash function group and the second random vector to the random vector group if the second locality-sensitive hash function can equally divide the m voiceprint features.
Specifically, when determining a random vector, it can further be checked whether the generated random vector R' is orthogonal to the vectors already in the random vector group Rg; if so, a locality-sensitive hash function is determined based on the random vector and tested for whether it equally divides the m voiceprint features, which decides whether the locality-sensitive hash function generated from the vector is added to the locality-sensitive hash function group; if not, a new random vector is generated.
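The orthogonality check can be sketched as follows. The tolerance value is an assumption; the patent only requires (approximate) orthogonality to the vectors already in the group:

```python
import numpy as np

def approximately_orthogonal(candidate: np.ndarray, group,
                             threshold: float = 0.05) -> bool:
    # A candidate random vector passes only if its dot product with every
    # vector already in R_g is near zero. `threshold` is an assumed
    # tolerance, not a value from the patent.
    return all(abs(float(np.dot(candidate, g))) < threshold for g in group)
```

A candidate that passes this check then goes through the equal-division test before being kept.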
For the embodiment of the application, orthogonal random vectors partition the data into different subspaces more effectively, thereby avoiding data imbalance.
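A sketch of the orthogonality check, under the assumption that "orthogonal" is judged by a small cosine threshold (the text leaves the exact criterion unspecified); names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def build_vector_group(k, dim, eps=0.1, max_tries=10000):
    """Rejection-sample k unit random vectors that are pairwise
    near-orthogonal: a candidate joins the group Rg only if its
    |cosine| with every stored vector is below eps."""
    group = []
    for _ in range(max_tries):
        if len(group) == k:
            break
        r = rng.standard_normal(dim)
        r /= np.linalg.norm(r)
        if all(abs(float(r @ g)) < eps for g in group):
            group.append(r)
    return np.array(group)

Rg = build_vector_group(k=8, dim=512)   # k and dim are assumed values
```

In high dimension, independent Gaussian vectors are already nearly orthogonal, so the rejection rate is low; this near-orthogonality is what keeps the induced subspaces balanced.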
Example two
An embodiment of the present application provides a voiceprint feature matching method, as shown in fig. 2, the method includes:
step S201, acquiring voiceprint characteristics to be matched;
step S202, processing the voiceprint features to be matched based on a locality sensitive hash data set, and mapping them to a sequence of k 0 s or 1 s, wherein the locality sensitive hash data set comprises k locality sensitive hash functions, each used to equally divide a plurality of voiceprint features into two groups;
step S203, determining a hash bucket index of the voiceprint features to be matched based on the sequence of k 0 s or 1 s;
and step S204, matching the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index, and determining a voiceprint feature matching result.
The hash value mapping process is consistent with that of the voiceprint feature storage embodiment and will not be described in detail here.
Determining the hash bucket index is likewise consistent with the hash bucket index calculation of the first embodiment and will not be described in detail here.
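Steps S202 and S203 (shared with the storage flow) can be sketched as follows. The bucket index is assumed here to be the k-bit 0/1 sequence read as a binary number, one common choice; dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
k, dim = 8, 128
planes = rng.standard_normal((k, dim))     # one random vector per hash function

def bucket_index(v):
    """Apply the k locality sensitive hash functions to obtain a
    sequence of k 0s/1s (step S202), then read that sequence as a
    binary number to obtain the hash bucket index (step S203)."""
    bits = (planes @ v >= 0).astype(int)
    idx = 0
    for b in bits:
        idx = (idx << 1) | int(b)
    return idx

v = rng.standard_normal(dim)
idx = bucket_index(v)                      # an integer in [0, 2**k)
```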
Optionally, the voiceprint features are L2 normalized, and matching the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index to determine a voiceprint feature matching result comprises:
determining the cosine distance between the voiceprint features to be matched and the voiceprint features in the hash bucket corresponding to the hash bucket index based on an optimized cosine distance calculation formula, wherein the optimized cosine distance calculation formula is s = cos θ = Vi · Vj;
And determining a voiceprint feature matching result based on the cosine distance calculation result.
Specifically, the standard cosine distance calculation formula is:
cos θ = (Vi · Vj) / (||Vi|| ||Vj||)

Because the voiceprint features are vectors normalized by L2, i.e. ||Vi|| = ||Vj|| = 1, the cosine similarity calculation formula can be optimized to the product of the two vectors, s = cos θ = Vi · Vj, and the multiplication of vectors in series can be further optimized into the multiplication of matrices in parallel.
All voiceprint features are taken out from the corresponding hash bucket to form a voiceprint feature matrix A = [V1, V2, ..., Vm]T, which is matrix-multiplied with the given voiceprint feature V0 to be matched, yielding the cosine similarity sequence:

s = A · V0 = [V1 · V0, V2 · V0, ..., Vm · V0]T
The voiceprint feature matching result is then determined from the similarity calculation result. Specifically, the voiceprint feature with the largest cosine similarity that also meets a certain threshold is determined as the matched voiceprint feature, and only the top-n most similar voiceprint entries are output. The output form of the matching result can be flexibly adjusted according to the application scenario.
For the embodiment of the application, the cosine similarity is calculated based on the optimized cosine distance calculation formula, and the multiplication of serial vectors can be further optimized into the multiplication of parallel matrixes, so that the efficiency of voiceprint feature matching can be improved.
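A sketch of the optimized matching step: because every stored feature is L2-normalized, cosine similarity reduces to a dot product, and the m serial vector products collapse into one matrix-vector multiply. The noise level, threshold, and top-n value below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, dim = 500, 128

# voiceprint features from one hash bucket, L2-normalized row-wise
A = rng.standard_normal((m, dim))
A /= np.linalg.norm(A, axis=1, keepdims=True)

def match(v0, A, top_n=5, threshold=0.5):
    """s = A . V0 gives all m cosine similarities in one multiply;
    keep the top-n entries that also clear the threshold."""
    v0 = v0 / np.linalg.norm(v0)
    scores = A @ v0                          # cosine similarity sequence
    order = np.argsort(scores)[::-1][:top_n]
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]

query = A[42] + 0.01 * rng.standard_normal(dim)   # noisy copy of entry 42
hits = match(query, A)
```

With L2-normalized rows, the single `A @ v0` replaces m separate cosine computations, which is the speed-up the optimization describes.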
Optionally, the method further comprises:
randomly taking out, without replacement, m voiceprint features from the plurality of voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
randomly generating a first random vector, wherein the dimension of the first random vector is the same as that of the voiceprint feature;
determining a first locality-sensitive hash function having a value range of 0 or 1 based on the first random vector;
adding the first locality sensitive hash function to the locality sensitive hash data set if the first locality sensitive hash function is able to equally divide the m voiceprint features.
Optionally, the method further comprises:
adding the first random vector to a set of random vectors;
randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function with a value range of 0 or 1 based on the second random vector;
adding a second locality-sensitive hash function to the set of locality-sensitive hash data and a second random vector to the set of random vectors if the second locality-sensitive hash function is able to evenly divide the m voiceprint features.
According to the embodiment of the application, the voiceprint features to be stored are stored into the corresponding hash bucket according to their hash bucket index value, so that a plurality of voiceprint features can be stored evenly; when voiceprint feature matching is performed, the matching range and the number of matches can therefore be reduced, improving the calculation efficiency and enabling second-level matching of hundred-million-scale voiceprint features, thereby improving user experience.
Exemplarily, fig. 6 shows the general flow of voiceprint feature matching. The locality sensitive hash function group generation module randomly samples m voiceprint features from a large number of voiceprint feature samples; the number of samples may be adjusted according to computer memory. A vector with the same dimension as the voiceprint features is then randomly generated, and if the locality sensitive hash function corresponding to this random vector can divide the sampled voiceprint features into two equal parts and the vector is orthogonal to the stored random vectors, the random vector is stored. These steps are repeated until k random vectors have been generated; each random vector determines a locality sensitive hash function, yielding the locality sensitive hash function group. An example of the flow of the locality sensitive hash function group generation module is shown in fig. 7.
The voiceprint feature storage module first maps the voiceprint features to a group of hash values through the locality sensitive hash function group; then determines the hash bucket index corresponding to the voiceprint features from the group of hash values; and finally stores the voiceprint features into the corresponding hash bucket.
In the matching module, the voiceprint features are first mapped to a group of hash values through the locality sensitive hash function group; the hash bucket index corresponding to the voiceprint features is then determined from the group of hash values; next, all voiceprint features in the corresponding hash bucket are extracted to form a voiceprint feature matrix; this matrix is multiplied with the given voiceprint feature to obtain the cosine similarities; finally, the cosine similarities are sorted, and the voiceprint feature with the largest cosine similarity that meets a certain threshold is the matched voiceprint feature.
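Putting the storage and matching modules together, a compact end-to-end sketch (all sizes and the bucket layout are assumptions for illustration):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)
k, dim = 6, 128
planes = rng.standard_normal((k, dim))          # locality sensitive hash group

def bucket(v):
    bits = (planes @ v >= 0).astype(int)
    return int("".join(map(str, bits)), 2)      # k-bit sequence -> bucket index

# storage module: every feature lands in exactly one of 2**k buckets
feats = rng.standard_normal((2000, dim))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
buckets = defaultdict(list)
for i, v in enumerate(feats):
    buckets[bucket(v)].append(i)

# matching module: compare only within the query's own bucket,
# not against all 2000 stored features
def lookup(v0):
    ids = buckets[bucket(v0)]
    scores = feats[ids] @ (v0 / np.linalg.norm(v0))
    j = int(np.argmax(scores))
    return ids[j], float(scores[j])

best, score = lookup(feats[7])
```

Restricting the cosine comparisons to one bucket is the source of the reduced matching range described above.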
The method of the embodiment of the present application has similar effects to the method of the first embodiment, and details are not repeated herein.
EXAMPLE III
Fig. 3 is a voiceprint feature storage apparatus provided in an embodiment of the present application, where the apparatus 30 includes:
a first obtaining module 301, configured to obtain a voiceprint feature to be stored;
a first mapping module 302, configured to process the voiceprint features to be stored based on a locality-sensitive hash data set, and map them to a sequence of k 0 s or 1 s, where the locality-sensitive hash data set includes k locality-sensitive hash functions, and the locality-sensitive hash functions are used to equally divide a plurality of voiceprint features into two groups;
a first determining module 303, configured to determine a hash bucket index of the voiceprint features to be stored based on the sequence of k 0 s or 1 s;
the storage module 304 is configured to store the voiceprint features to be stored in the hash bucket corresponding to the hash bucket index.
Optionally, the voiceprint feature is standardized by L2.
Optionally, the apparatus further comprises:
the first extraction module is used for extracting m voiceprint features which are not replaced randomly from the voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
the first random generation module is used for randomly generating a first random vector, and the dimensionality of the first random vector is the same as that of the voiceprint feature;
a second determination module to determine a first locality-sensitive hash function with a value range of 0 or 1 based on the first random vector;
a first adding module, configured to add the first locally sensitive hash function to the locally sensitive hash data group if the first locally sensitive hash function can equally divide the m voiceprint features.
Optionally, the apparatus further comprises:
a second adding module, configured to add the first random vector to the random vector group;
a second random generation module for randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function having a value range of 0 or 1 based on the second random vector;
a third adding module, configured to add the second locality-sensitive hash function to the locality-sensitive hash data set and add the second random vector to the random vector set if the second locality-sensitive hash function can equally divide the m voiceprint features.
According to the embodiment of the application, the voiceprint features to be stored are stored to the corresponding hash bucket according to the hash bucket index value of the voiceprint features to be stored, so that the uniform storage of the voiceprint features can be realized, the matching range can be reduced when the voiceprint features are matched, the matching times are reduced, the calculation efficiency is improved, the second-level matching of hundred million-level voiceprint features can be realized, and the user experience is improved.
The embodiment of the present application provides a voiceprint feature storage device, which is suitable for the voiceprint feature storage method shown in the foregoing embodiment, and details are not described here again.
Example four
An embodiment of the present application provides a voiceprint feature matching device, as shown in fig. 4, the device includes:
a second obtaining module 401, configured to obtain a voiceprint feature to be matched;
a second mapping module 402, configured to process the voiceprint features to be matched based on a locality-sensitive hash data set, and map them to a sequence of k 0 s or 1 s, where the locality-sensitive hash data set includes k locality-sensitive hash functions, and the locality-sensitive hash functions are used to equally divide a plurality of voiceprint features into two groups;
a third determining module 403, configured to determine a hash bucket index of the voiceprint features to be matched based on the sequence of k 0 s or 1 s;
and the matching module 404 is configured to match the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index, and determine a voiceprint feature matching result.
Optionally, the voiceprint feature is normalized for L2, the third determination module comprising:
a first determining unit, configured to determine the cosine distance between the voiceprint features to be matched and the voiceprint features in the hash bucket corresponding to the hash bucket index based on an optimized cosine distance calculation formula, wherein the formula is s = cos θ = Vi · Vj;
And the second determining unit is used for determining a voiceprint feature matching result based on the cosine distance calculation result.
Optionally, the apparatus further comprises:
the second extraction module is used for extracting m voiceprint features which are not replaced randomly from the voiceprint features to be stored, wherein m is an integer which is more than or equal to 2;
the third random generation module is used for randomly generating a first random vector, and the dimensionality of the first random vector is the same as that of the voiceprint feature;
a fourth determining module for determining a first locality-sensitive hash function with a value range of 0 or 1 based on the first random vector;
and the third adding module is used for adding the first local sensitive hash function to the local sensitive hash data group if the first local sensitive hash function can evenly divide the m voiceprint characteristics.
Optionally, the apparatus further comprises:
a fourth adding module, configured to add the first random vector to the random vector group;
a fourth random generation module, configured to randomly generate a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determine, based on the second random vector, a second locality-sensitive hash function having a value range of 0 or 1;
a fifth adding module, configured to add the second locality-sensitive hash function to the locality-sensitive hash data set and add the second random vector to the random vector set if the second locality-sensitive hash function can equally divide the m voiceprint features.
According to the embodiment of the application, the voiceprint features to be stored are stored to the corresponding hash bucket according to the hash bucket index value of the voiceprint features to be stored, so that the uniform storage of the voiceprint features can be realized, the matching range can be reduced when the voiceprint features are matched, the matching times are reduced, the calculation efficiency is improved, the second-level matching of hundred million-level voiceprint features can be realized, and the user experience is improved.
The embodiment of the present application provides a voiceprint feature matching device, which is suitable for the voiceprint feature matching method shown in the foregoing embodiment, and details are not repeated here.
EXAMPLE five
An embodiment of the present application provides an electronic device. As shown in fig. 3, the electronic device 30 includes a processor 3001 and a memory 3003, with the processor 3001 coupled to the memory 3003, for example via a bus 3002. Further, the electronic device 30 may also include a transceiver 3004. It should be noted that in practical applications the transceiver 3004 is not limited to one, and the structure of the electronic device 30 does not limit the embodiments of the present application. The processor 3001 is used in the embodiment of the present application to implement the functions of the modules shown in fig. 2. The transceiver 3004 includes a receiver and a transmitter.
The processor 3001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 3001 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 3002 may include a path that conveys information between the aforementioned components. The bus 3002 may be a PCI bus or an EISA bus, etc. The bus 3002 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
Memory 3003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 3003 is used for storing the application program code for executing the present scheme, whose execution is controlled by the processor 3001. The processor 3001 is configured to execute the application program code stored in the memory 3003 to implement the functions of the apparatus provided by the embodiments shown in fig. 4 or 5.
The embodiment of the application provides an electronic device that stores the voiceprint features to be stored into the corresponding hash bucket according to their hash bucket index value, so that a plurality of voiceprint features can be stored evenly; when voiceprint feature matching is performed, the matching range and the number of matches can therefore be reduced, improving the calculation efficiency and enabling second-level matching of hundred-million-scale voiceprint features, thereby improving user experience.
The embodiment of the application provides an electronic device suitable for the method embodiment. And will not be described in detail herein.
EXAMPLE six
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method shown in the above embodiments is implemented.
The embodiment of the application provides a computer-readable storage medium, which stores voiceprint features to be stored to a corresponding hash bucket according to a hash bucket index value of the voiceprint features to be stored, so that uniform storage of a plurality of voiceprint features can be realized, when voiceprint features are matched, the matching range can be reduced, the matching times are reduced, the calculation efficiency is improved, second-level matching of hundred million-level voiceprint features can be realized, and the user experience is improved.
The embodiment of the application provides a computer-readable storage medium which is suitable for the method embodiment. And will not be described in detail herein.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or in alternation with other steps or with at least part of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present application, and these modifications and improvements should also be regarded as falling within the protection scope of the present application.

Claims (12)

1. A voiceprint feature storage method, comprising:
acquiring voiceprint characteristics to be stored;
processing the voiceprint features to be stored based on a locality sensitive hash data set, and mapping them to a sequence of k 0 s or 1 s, wherein the locality sensitive hash data set comprises k locality sensitive hash functions, and the locality sensitive hash functions are used for equally dividing a plurality of voiceprint features into two groups;
determining a hash bucket index for the voiceprint features to be stored based on the sequence of k 0 s or 1 s;
and storing the voiceprint features to be stored into the hash bucket corresponding to the hash bucket index.
2. The method of claim 1, wherein the voiceprint features are L2 normalized.
3. The method of claim 1, further comprising:
randomly taking out, without replacement, m voiceprint features from the plurality of voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
randomly generating a first random vector, wherein the dimension of the first random vector is the same as that of the voiceprint feature;
determining a first locality-sensitive hash function having a value range of 0 or 1 based on the first random vector;
adding the first locally sensitive hash function to a locally sensitive hash dataset if the first locally sensitive hash function can equally divide the m voiceprint features.
4. The method of claim 3, further comprising:
adding the first random vector to a set of random vectors;
randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function having a value range of 0 or 1 based on the second random vector;
adding the second locality-sensitive hash function to a locality-sensitive hash data set and the second random vector to the random vector set if the second locality-sensitive hash function can equally divide the m voiceprint features.
5. A voiceprint feature matching method, comprising:
acquiring voiceprint characteristics to be matched;
processing the voiceprint features to be matched based on a locality sensitive hash data set, and mapping them to a sequence of k 0 s or 1 s, wherein the locality sensitive hash data set comprises k locality sensitive hash functions, and the locality sensitive hash functions are used for equally dividing a plurality of voiceprint features into two groups;
determining a hash bucket index for the voiceprint features to be matched based on the sequence of k 0 s or 1 s;
and matching the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index, and determining a voiceprint feature matching result.
6. The method according to claim 5, wherein the voiceprint features are standardized by L2, and the step of matching the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index to determine a voiceprint feature matching result comprises:
determining the cosine distance between the voiceprint features to be matched and the voiceprint features in the hash bucket corresponding to the hash bucket index based on an optimized cosine distance calculation formula, wherein the optimized cosine distance calculation formula is s = cos θ = Vi · Vj;
And determining a voiceprint feature matching result based on the cosine distance calculation result.
7. The method of claim 5 or 6, further comprising:
randomly taking out, without replacement, m voiceprint features from the plurality of voiceprint features to be stored, wherein m is an integer greater than or equal to 2;
randomly generating a first random vector, wherein the dimension of the first random vector is the same as that of the voiceprint feature;
determining a first locality-sensitive hash function having a value range of 0 or 1 based on the first random vector;
adding the first locally sensitive hash function to a locally sensitive hash dataset if the first locally sensitive hash function can equally divide the m voiceprint features.
8. The method of claim 7, further comprising:
adding the first random vector to a set of random vectors;
randomly generating a second random vector, and if the second random vector is orthogonal to a predetermined threshold number of vectors in the random vector group, determining a second locality sensitive hash function having a value range of 0 or 1 based on the second random vector;
adding the second locality-sensitive hash function to a locality-sensitive hash data set and the second random vector to the random vector set if the second locality-sensitive hash function can equally divide the m voiceprint features.
9. A voiceprint feature storage device, comprising:
the first acquisition module is used for acquiring the voiceprint characteristics to be stored;
the first mapping module is used for processing the voiceprint features to be stored based on a locality sensitive hash data set, and mapping them to a sequence of k 0 s or 1 s, wherein the locality sensitive hash data set comprises k locality sensitive hash functions which are used for equally dividing a plurality of voiceprint features into two groups;
a first determining module, configured to determine a hash bucket index of the voiceprint features to be stored based on the sequence of k 0 s or 1 s;
and the storage module is used for storing the voiceprint features to be stored into the hash bucket corresponding to the hash bucket index.
10. A voiceprint feature matching apparatus, comprising:
the second acquisition module is used for acquiring the voiceprint characteristics to be matched;
the second mapping module is used for processing the voiceprint features to be matched based on a locality sensitive hash data set, and mapping them to a sequence of k 0 s or 1 s, wherein the locality sensitive hash data set comprises k locality sensitive hash functions which are used for equally dividing a plurality of voiceprint features into two groups;
a third determining module, configured to determine a hash bucket index of the voiceprint features to be matched based on the sequence of k 0 s or 1 s;
and the matching module is used for matching the voiceprint features to be matched with the voiceprint features in the hash bucket corresponding to the hash bucket index to determine a voiceprint feature matching result.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the voiceprint feature storage method or the voiceprint feature matching method of any one of claims 1 to 8.
12. A computer-readable storage medium for storing computer instructions which, when executed on a computer, enable the computer to perform the voiceprint feature storage method or the voiceprint feature matching method of any one of the preceding claims 1 to 8.
CN202011268559.4A 2020-11-13 2020-11-13 Voiceprint feature storage method, voiceprint feature matching method and device and electronic equipment Pending CN112528068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011268559.4A CN112528068A (en) 2020-11-13 2020-11-13 Voiceprint feature storage method, voiceprint feature matching method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112528068A true CN112528068A (en) 2021-03-19

Family

ID=74981285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011268559.4A Pending CN112528068A (en) 2020-11-13 2020-11-13 Voiceprint feature storage method, voiceprint feature matching method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112528068A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113921016A (en) * 2021-10-15 2022-01-11 阿波罗智联(北京)科技有限公司 Voice processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination