CN113407768A - Voiceprint retrieval method, device, system, server and storage medium

Info

Publication number: CN113407768A
Application number: CN202110703864.XA
Authority: CN
Inventors: 卢宇机; 唐智; 刘小钊
Original assignee: Voiceai Technologies Co ltd
Current assignee: Voiceai Technologies Co ltd
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2021-09-17
Anticipated expiration: 2041-06-24
Also published as: CN113407768B

Abstract

The embodiments of the application disclose a voiceprint retrieval method, device, system, server and storage medium. The method comprises: acquiring voiceprint feature data to be retrieved; comparing the voiceprint feature data to be retrieved with voiceprint feature data stored in a memory in advance; and acquiring a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result. With this method, as soon as the voiceprint feature data to be retrieved is obtained, it is compared directly with the voiceprint feature data stored in the memory in advance and the comparison result is output; directly traversing the voiceprint feature data in the memory increases the voiceprint feature retrieval speed.

Description

Voiceprint retrieval method, device, system, server and storage medium

Technical Field

The application belongs to the technical field of data processing, and particularly relates to a voiceprint retrieval method, device, system, server and storage medium.

Background

Voiceprint retrieval compares a voice to be retrieved with the voices stored in a database and returns the one or more stored voices that come from the same speaker as the voice to be retrieved. With the development of voiceprint retrieval technology, its application scenarios keep increasing, especially the retrieval of massive numbers of voiceprint features. In related voiceprint retrieval methods, the retrieval program and the Remote Dictionary Server (Redis) exchange data over the network, so that when massive voiceprint feature data is exported for retrieval, network transmission becomes a very large performance bottleneck and slows down retrieval.

Disclosure of Invention

In view of the above problems, the present application provides a voiceprint retrieval method, apparatus, system, server and storage medium to alleviate the above problems.

In a first aspect, an embodiment of the present application provides a voiceprint retrieval method, which is applied to a voiceprint retrieval module of a voiceprint retrieval server, where the method includes: acquiring voiceprint characteristic data to be retrieved; comparing the voiceprint feature data to be retrieved with voiceprint feature data stored in a memory in advance; and acquiring a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result.

In a second aspect, an embodiment of the present application provides a voiceprint retrieval method, which is applied to a service server, and the method includes: sending voiceprint feature data to be retrieved to a voiceprint retrieval server so that a voiceprint retrieval module of the voiceprint retrieval server performs feature comparison on the voiceprint feature data to be retrieved and voiceprint feature data stored in a memory in advance; and receiving a feature comparison result corresponding to the voiceprint feature data to be retrieved, and displaying the feature comparison result.

In a third aspect, an embodiment of the present application provides a voiceprint retrieval method, which is applied to a voiceprint retrieval system, where the system includes a voiceprint retrieval server and a service server, and the method includes: the service server sends voiceprint characteristic data to be retrieved to the voiceprint retrieval server; the voiceprint retrieval server acquires the voiceprint characteristic data to be retrieved; the voiceprint retrieval server compares the voiceprint characteristic data to be retrieved with voiceprint characteristic data stored in a memory in advance; and the service server receives a feature comparison result corresponding to the voiceprint feature data to be retrieved, which is sent by the voiceprint retrieval server, and displays the feature comparison result.

In a fourth aspect, an embodiment of the present application provides a voiceprint retrieval apparatus, which runs on a voiceprint retrieval module of a voiceprint retrieval server, and includes: the data acquisition unit is used for acquiring the voiceprint characteristic data to be retrieved; the characteristic comparison unit is used for comparing the voiceprint characteristic data to be retrieved with the voiceprint characteristic data pre-stored in the memory; and the result obtaining unit is used for obtaining a feature comparison result corresponding to the voiceprint feature data to be retrieved and sending the feature comparison result.

In a fifth aspect, an embodiment of the present application provides a voiceprint retrieval apparatus, which runs on a service server, and includes: the data sending unit is used for sending voiceprint feature data to be retrieved to a voiceprint retrieval server so that a voiceprint retrieval module of the voiceprint retrieval server can perform feature comparison on the voiceprint feature data to be retrieved and voiceprint feature data stored in a memory in advance; and the display unit is used for receiving the feature comparison result corresponding to the voiceprint feature data to be retrieved and displaying the feature comparison result.

In a sixth aspect, an embodiment of the present application provides a voiceprint retrieval system, where the system includes a voiceprint retrieval server and a service server; the service server is used for sending voiceprint characteristic data to be retrieved to the voiceprint retrieval server; the voiceprint retrieval server is used for acquiring the voiceprint characteristic data to be retrieved; the voiceprint retrieval server is used for comparing the voiceprint feature data to be retrieved with the voiceprint feature data pre-stored in the memory; and the service server is used for receiving the characteristic comparison result corresponding to the voiceprint characteristic data to be retrieved, which is sent by the voiceprint retrieval server, and displaying the characteristic comparison result.

In a seventh aspect, an embodiment of the present application provides a server, including one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the methods described above.

In an eighth aspect, the present application provides a computer-readable storage medium, in which a program code is stored, wherein the program code performs the above-mentioned method when running.

The embodiment of the application provides a voiceprint retrieval method, a voiceprint retrieval device, a voiceprint retrieval system, a server and a storage medium. Firstly, obtaining voiceprint feature data to be retrieved, then carrying out feature comparison on the voiceprint feature data to be retrieved and the voiceprint feature data pre-stored in the memory, finally obtaining a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result. By the method, when the voiceprint feature data to be retrieved are obtained, the voiceprint feature data to be retrieved are directly compared with the voiceprint feature data stored in the memory in advance, the voiceprint feature comparison result is output, and the voiceprint feature retrieval speed can be increased by directly traversing the voiceprint feature data in the memory.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic view illustrating an application scenario of a voiceprint retrieval method according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a voiceprint retrieval method according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a voiceprint retrieval method according to another embodiment of the present application;

FIG. 4 is a diagram illustrating a scenario of inserting a callback function according to another embodiment of the present application;

FIG. 5 is a flowchart illustrating a voiceprint retrieval method according to yet another embodiment of the present application;

FIG. 6 is a flowchart illustrating a method for retrieving a voiceprint according to yet another embodiment of the present application to obtain voiceprint feature data to be stored;

FIG. 7 is a schematic diagram illustrating a scene of voiceprint feature data synchronization according to yet another embodiment of the present application;

FIG. 8 is a flowchart illustrating a voiceprint retrieval method according to yet another embodiment of the present application;

FIG. 9 is a schematic diagram illustrating a voiceprint retrieval scene according to yet another embodiment of the present application;

FIG. 10 is a block diagram illustrating a voiceprint retrieval apparatus according to an embodiment of the present application;

FIG. 11 is a block diagram showing another voiceprint retrieval apparatus according to an embodiment of the present application;

FIG. 12 is a block diagram showing a structure of another voiceprint retrieval apparatus according to an embodiment of the present application;

FIG. 13 is a block diagram showing a structure of another voiceprint retrieval apparatus according to an embodiment of the present application;

FIG. 14 is a block diagram illustrating a voiceprint retrieval system according to an embodiment of the present application;

FIG. 15 is a block diagram of a server for executing a voiceprint retrieval method according to an embodiment of the present application in real time;

FIG. 16 illustrates a storage unit for storing or carrying program codes for implementing the voiceprint retrieval method according to the embodiment of the present application in real time.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The voiceprint retrieval is to obtain the voiceprint characteristics of the given voice, compare the voiceprint characteristics of the given voice with the voiceprint characteristics of the voice stored in the database and further return the information of the speaker corresponding to the voice.

In recent years, owing to the popularization of devices with microphone input such as mobile phones and personal computers and the rapid development of network media, a great amount of voice and video content has emerged, and thousands of hours of video are uploaded to the cloud every minute. Voiceprint retrieval is also becoming more and more widely used, for example: recommending similar voices based on a retrieved voice; detecting infringement through voiceprint retrieval; and accelerating large-scale voiceprint authentication, where too many speakers would otherwise slow authentication down.

In research on related voiceprint retrieval methods, the inventor found that the voiceprint retrieval server and the voiceprint data storage server transmit voiceprint retrieval data over the network, so that when the massive voiceprint feature data to be searched is exported, network transmission becomes a very large performance bottleneck and slows down retrieval.

Therefore, the inventor proposes a voiceprint retrieval method, device, system, server and storage medium. The method first obtains voiceprint feature data to be retrieved, then compares it with voiceprint feature data stored in a memory in advance, and finally obtains and sends the corresponding feature comparison result. Because the voiceprint feature data to be retrieved is compared directly with the voiceprint feature data already stored in the memory, retrieval works by directly traversing in-memory data, which improves the voiceprint feature retrieval speed.

The following introduces an application environment of the voiceprint retrieval method provided by the embodiments of the invention.

Referring to FIG. 1, the voiceprint retrieval method provided by the embodiment of the present invention can be applied to a retrieval system 100. The system 100 can include a voiceprint feature extraction server 110, a Redis server 120, a voiceprint retrieval server 710, and a service server 720. The voiceprint retrieval server 710 includes a voiceprint retrieval module 711, which in turn includes a Redis dynamic database module 7111, a memory 7112, and a core algorithm module 7113.

In the embodiment of the present application, the voiceprint feature extraction server 110 can be used to extract the voiceprint feature of the voice of the registrant. When the voiceprint feature extraction server 110 extracts the voiceprint feature of the voice, the voiceprint feature of the voice of the registrant may be extracted using a voiceprint feature extraction model included in the voiceprint feature extraction server 110. The voiceprint feature extraction model can be a pre-trained neural network model, and is used for outputting the voiceprint features of the voice of the registrant according to the input voice of the registrant; the registrant may be a user who first transmits voice data or audio data to the service server 720.

The Redis server 120 runs Redis (Remote Dictionary Server), an open-source, log-type key-value database written in ANSI C that supports network access, keeps its data in memory, can persist it, and provides application programming interfaces (APIs) in multiple languages. It is often called a data structure server because a value can be any of five types: string, hash, list, set, and sorted set. The Redis server 120 may be used to store voiceprint feature data of the registrant's voice.

The voiceprint retrieval module 711 in the voiceprint retrieval server 710 may be a voiceprint retrieval application program, and is configured to compare the voiceprint feature data to be retrieved with the pre-stored voiceprint feature data according to the received voiceprint retrieval instruction.

The pre-stored voiceprint feature data may be stored in the Redis dynamic database module 7111 and the memory 7112. Specifically, the Redis dynamic database module 7111 and the memory 7112 may store the voiceprint feature data sent by the Redis server 120 according to the received voiceprint feature data synchronization instruction.

The service server 720 may be configured to receive the voiceprint feature data to be retrieved, and then send the voiceprint feature data to be retrieved to the voiceprint retrieval server 710, so that the voiceprint retrieval module 711 performs voiceprint retrieval on the voiceprint feature data to be retrieved.

Embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Referring to fig. 2, a voiceprint retrieval method provided in the embodiment of the present application is applied to a voiceprint retrieval module of a voiceprint retrieval server, and the method includes:

Step S110: acquiring voiceprint feature data to be retrieved.

In the embodiment of the application, the voiceprint feature data to be retrieved is the voiceprint feature corresponding to the input audio data of the user for whom voiceprint retrieval is to be performed. A voiceprint is the sound wave spectrum, displayed by an electro-acoustic instrument, that carries speech information; it is both specific to the speaker and relatively stable. In the embodiment of the present application, the voiceprint feature may include, but is not limited to, an MFCC (Mel Frequency Cepstral Coefficients) feature and an LPCC (Linear Prediction Cepstral Coefficients) feature. The MFCC feature exploits the nonlinear frequency perception of the human ear: the spectrum is converted into a nonlinear spectrum based on the Mel frequency and then transformed into the cepstral domain, so that human auditory characteristics are fully simulated, and the MFCC feature offers discriminative power and noise robustness without any prior assumption. The LPCC feature is a representation of linear prediction coefficients in the cepstral domain: based on the assumption that a speech signal is an autoregressive signal, cepstral parameters are obtained by linear prediction analysis, and the LPC order in experiments is the linear prediction cepstrum parameter, so that the vocal tract characteristics specific to each person are reflected.
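The Mel-frequency warping mentioned above can be illustrated with a minimal sketch. The source does not say which Mel formula is used; the code below assumes one commonly used variant, mel = 2595 * log10(1 + f / 700), and the function names are hypothetical. It covers only the frequency warping step, not a full MFCC extraction.

```python
import math
from typing import List


def hz_to_mel(frequency_hz: float) -> float:
    """Map a frequency in Hz onto the mel scale (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz: float, high_hz: float, count: int) -> List[float]:
    """Return `count` (>= 2) frequencies in Hz that are evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]
```

Frequencies that are evenly spaced in mel are packed densely at low frequencies and sparsely at high frequencies in Hz, which is the nonlinearity the MFCC feature relies on.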

As a mode, the voiceprint feature data to be retrieved may be voiceprint feature data corresponding to the user's audio data acquired in real time, or may also be voiceprint feature data corresponding to the user's audio data acquired in advance and transmitted through an external device. The audio data may be audio data generated in a call process, audio data generated in a conference process, or audio data input by a user in application software.

If the voiceprint feature data to be retrieved is voiceprint feature data corresponding to pre-collected user audio data transmitted through the external device, the external device can acquire time information of occurrence of the audio data when collecting the user audio data for storage, record the time information, the source of the audio data, the text information and the corresponding voiceprint feature data according to a specified format, and further determine the voiceprint feature data to be retrieved according to the time information, the source of the audio data and other information when a voiceprint retrieval module of the voiceprint retrieval server acquires the voiceprint feature data to be retrieved. The recording according to the specified format may be understood as storing the time information, the source of the audio data, the text information, and the corresponding voiceprint feature data corresponding to the audio data in different fields of the same piece of data, and the voiceprint retrieval module of the voiceprint retrieval server may obtain the corresponding data information by reading the different fields of the piece of data.
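The "specified format" described above, in which several fields are stored in one piece of data, can be sketched as a simple record type. This is only an illustration: the class name, field names and the `select_records` helper are hypothetical, and the source does not define a concrete schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class VoiceprintRecord:
    """One piece of data holding the fields named in the description (hypothetical layout)."""
    recorded_at: datetime   # time at which the audio data occurred
    source: str             # source of the audio data, e.g. "call" or "conference"
    text: str               # text information associated with the audio
    feature: List[float]    # voiceprint feature data extracted from the audio


def select_records(records: List[VoiceprintRecord], source: str,
                   start: datetime, end: datetime) -> List[VoiceprintRecord]:
    """Pick the records to retrieve by reading the source and time fields."""
    return [r for r in records
            if r.source == source and start <= r.recorded_at <= end]
```

Reading individual fields of such a record is how the retrieval module could narrow down which voiceprint feature data to treat as the data to be retrieved.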

When the voiceprint feature data to be retrieved is voiceprint feature data corresponding to the pre-collected user audio data transmitted by the external device, the voiceprint retrieval module of the voiceprint retrieval server can send a voiceprint feature data acquisition request to the external device in advance, and when the external device receives the voiceprint feature data acquisition request, the voiceprint feature data are returned to the voiceprint retrieval module of the voiceprint retrieval server and serve as the voiceprint feature data to be retrieved. The external device may be an audio acquisition device in communication connection with the voiceprint retrieval server, such as a smart phone, a tablet computer, or a smart device equipped with a microphone. In the embodiment of the present application, the audio data of the user may be collected through a microphone installed in the external device.

If the voiceprint feature data to be retrieved corresponds to user audio data collected in real time, the audio data can be collected in real time by the audio collecting device and then sent to the voiceprint feature extraction server. The voiceprint feature extraction server extracts the voiceprint features of the user's audio data and sends the resulting voiceprint feature data to the voiceprint retrieval module of the voiceprint retrieval server as the voiceprint feature data to be retrieved.

Step S120: comparing the voiceprint feature data to be retrieved with the voiceprint feature data pre-stored in the memory.

As one mode, the memory stores voiceprint feature data corresponding to audio data of a plurality of users in advance. When the voiceprint retrieval module of the voiceprint retrieval server obtains the voiceprint feature data to be retrieved, the voiceprint feature data to be retrieved and the voiceprint feature data corresponding to the audio data of the plurality of users stored in the memory in advance can be subjected to feature comparison one by one.

Step S130: acquiring a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result.

In this embodiment, the feature comparison result may be the user corresponding to a voiceprint comparison score. The voiceprint feature data to be retrieved is compared one by one with the voiceprint feature data corresponding to the audio data of the plurality of users pre-stored in the memory, yielding a plurality of voiceprint comparison scores; the users whose voiceprint comparison scores exceed a voiceprint score threshold are taken as the feature comparison result, and the feature comparison result is sent.

Optionally, the voiceprint comparison scores exceeding the voiceprint score threshold in the plurality of voiceprint comparison scores may also be sorted from high to low, and the users corresponding to the voiceprint comparison scores of the designated number sorted in the front are used as the feature comparison result.
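The threshold and top-ranked selection described above can be sketched as follows. The source does not specify how a voiceprint comparison score is computed; cosine similarity is used here purely as an assumed stand-in, and the names `cosine_similarity`, `retrieve`, `threshold` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed scoring function; the source does not name a metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float],
             stored: Dict[str, Sequence[float]],
             threshold: float,
             top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Traverse the in-memory features one by one and keep users above the threshold."""
    scored = [(user, cosine_similarity(query, feature))
              for user, feature in stored.items()]
    hits = [(user, score) for user, score in scored if score > threshold]
    # Sort from high to low so that the best-matching users come first.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits if top_k is None else hits[:top_k]
```

With `top_k` left as `None` the function returns every user above the threshold; passing a number keeps only that many of the highest-scoring users, matching the optional variant in the text.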

The voiceprint retrieval method comprises the steps of firstly obtaining voiceprint feature data to be retrieved, then carrying out feature comparison on the voiceprint feature data to be retrieved and voiceprint feature data stored in a memory in advance, finally obtaining a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result. By the method, when the voiceprint feature data to be retrieved are obtained, the voiceprint feature data to be retrieved are directly compared with the voiceprint feature data stored in the memory in advance, the voiceprint feature comparison result is output, and the voiceprint feature retrieval speed can be increased by directly traversing the voiceprint feature data in the memory.

Referring to fig. 3, a voiceprint retrieval method provided in the embodiment of the present application is applied to a voiceprint retrieval module of a voiceprint retrieval server, and the method includes:

Step S210: receiving a voiceprint feature data synchronization instruction.

As one mode, a Redis dynamic database module is included in the voiceprint retrieval module. The voiceprint feature data synchronization instruction may be sent by a Redis server. Specifically, when the Redis server receives voiceprint feature data sent by the service server, the Redis server sends a voiceprint feature data synchronization instruction to a voiceprint retrieval module of the voiceprint retrieval server, and then the Redis dynamic database module can obtain the voiceprint feature data from the Redis server and store the voiceprint feature data after receiving the voiceprint feature data synchronization instruction. Optionally, when the Redis server sends a voiceprint feature data synchronization instruction to the voiceprint retrieval module of the voiceprint retrieval server, the voiceprint feature data and the voiceprint feature data synchronization instruction may be sent to the voiceprint retrieval module as a piece of data, and then the Redis dynamic database module in the voiceprint retrieval module may directly obtain the voiceprint feature data from the piece of data and store the voiceprint feature data in the Redis dynamic database module.

Step S220: synchronously storing the voiceprint feature data stored in the Redis dynamic database module to the memory based on the voiceprint feature data synchronization instruction.

The voiceprint feature data stored in the Redis dynamic database module is the voiceprint feature data of the registrant acquired in the registration stage.

In the embodiment of the present application, the registrant may be a user who transmits voice data or audio data to the service server for the first time. The voiceprint feature data sent by the service server to the Redis server is the voiceprint feature data of the registrant collected in the registration stage. Specifically, the registrant may input audio data through an API interface provided by the audio acquisition device, and after receiving the audio data input by the registrant, the service server sends the audio data to the voiceprint feature extraction server, so that the voiceprint feature extraction server extracts the voiceprint feature data of the audio data input by the registrant.

After the voiceprint feature extraction server extracts the voiceprint feature data of the audio data input by the registrant, it sends the voiceprint feature data to the service server, and the service server then sends the voiceprint feature data of the registrant to the Redis server for storage.

After the Redis server receives the voiceprint feature data of the registrant, the voiceprint feature data of the registrant are synchronized into the Redis dynamic database module based on the voiceprint feature data synchronization instruction, and then the Redis dynamic database module synchronizes the registered voiceprint feature data into the memory.

Optionally, the step of synchronously storing the voiceprint feature data stored in the Redis dynamic database module to the memory includes: when the voiceprint feature data synchronization instruction is received, synchronously storing the voiceprint feature data stored in the Redis dynamic database module to the memory through a callback function.

In the embodiment of the present application, the voiceprint feature data of the registrant is stored in the form of a hash table in the Redis dynamic database module. The callback function can be used for synchronously storing the voiceprint characteristic data of the registrant into the memory. Specifically, different callback functions can be inserted into different positions of the hash table, so that different callback functions can execute different functions when the voiceprint retrieval server is in different running states, wherein the callback functions can include a database state callback function, a database emptying callback function, a key value insertion callback function, a key value deletion callback function, a key value updating callback function and the like. As shown in fig. 4, the Redis dynamic database module may access the memory through callback functions such as a database state callback function, a database emptying callback function, a key value insertion callback function, a key value deletion callback function, and a key value update callback function.

In one implementation, once the function of each callback function is determined, the callback functions can be inserted into different positions of the hash table according to those functions. Specifically, the database state callback function is used for starting the memory; the database emptying callback function is used for clearing the voiceprint feature data stored in the memory; and the key value insertion, key value deletion and key value update callback functions are used for handling data modifications while voiceprint feature data is synchronized to the memory. Because the database state callback function starts the memory, it can be inserted at the starting position of the hash table, so that when a voiceprint feature data synchronization instruction is received the memory is started first and the voiceprint feature data in the Redis dynamic database module can then be synchronized into it. Because the database emptying, key value insertion, key value deletion and key value update callback functions perform the addition, deletion, modification and emptying of voiceprint feature data during synchronization, they can be inserted at the middle position of the hash table.
Further, these four callback functions may be inserted at the same middle position of the hash table or at different positions. When they are inserted at different positions, their positions can follow the order in which the addition, deletion, modification and emptying operations on voiceprint feature data occur in the actual application.
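The callback-based synchronization described above can be illustrated with a minimal sketch. It is not the patented implementation and does not use the Redis API: `CallbackHashStore` is a hypothetical stand-in for the hash table in the Redis dynamic database module, `MemoryMirror` is a hypothetical stand-in for the memory, and the callbacks are modeled as named event hooks rather than as literal positions in the hash table.

```python
from typing import Callable, Dict, List

Feature = List[float]


class CallbackHashStore:
    """Hypothetical stand-in for the hash table in the Redis dynamic database module."""

    EVENTS = ("state", "flush", "insert", "delete", "update")

    def __init__(self) -> None:
        self._table: Dict[str, Feature] = {}
        self._callbacks: Dict[str, List[Callable[..., None]]] = {
            event: [] for event in self.EVENTS
        }

    def register(self, event: str, callback: Callable[..., None]) -> None:
        if event not in self._callbacks:
            raise ValueError(f"unknown event: {event}")
        self._callbacks[event].append(callback)

    def _fire(self, event: str, *args: object) -> None:
        for callback in self._callbacks[event]:
            callback(*args)

    def start(self) -> None:
        self._fire("state")  # database state callback: start the memory

    def set(self, key: str, feature: Feature) -> None:
        event = "update" if key in self._table else "insert"
        self._table[key] = feature
        self._fire(event, key, feature)

    def delete(self, key: str) -> None:
        if key in self._table:
            del self._table[key]
            self._fire("delete", key)

    def flush(self) -> None:
        self._table.clear()
        self._fire("flush")  # database emptying callback


class MemoryMirror:
    """Hypothetical in-memory copy kept in sync through the callbacks."""

    def __init__(self, store: CallbackHashStore) -> None:
        self.ready = False
        self.features: Dict[str, Feature] = {}
        store.register("state", self._on_state)
        store.register("flush", self._on_flush)
        store.register("insert", self._on_upsert)
        store.register("update", self._on_upsert)
        store.register("delete", self._on_delete)

    def _on_state(self) -> None:
        self.ready = True

    def _on_flush(self) -> None:
        self.features.clear()

    def _on_upsert(self, key: str, feature: Feature) -> None:
        self.features[key] = list(feature)

    def _on_delete(self, key: str) -> None:
        self.features.pop(key, None)
```

Every change to the store is forwarded to the mirror as it happens, so a later retrieval can traverse `features` in memory without another round trip to the database.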

Step S230: receiving a voiceprint retrieval instruction sent by the service server.

In the voiceprint retrieval stage, the voiceprint retrieval module of the voiceprint retrieval server starts voiceprint retrieval upon receiving a voiceprint retrieval instruction sent by the service server.

Step S240: in response to the voiceprint retrieval instruction, acquiring the voiceprint feature data to be retrieved.

In one implementation, when the voiceprint retrieval module of the voiceprint retrieval server receives a voiceprint retrieval instruction sent by the service server, it acquires the voiceprint feature data to be retrieved. Optionally, the voiceprint retrieval module may acquire the voiceprint feature data to be retrieved from the Redis server based on the voiceprint retrieval instruction, or from the service server.

Step S250: comparing the voiceprint feature data to be retrieved with the voiceprint feature data pre-stored in the memory.

The voiceprint feature data corresponding to the audio data of each user can be used as reference voiceprint feature data and stored.

In one implementation, the memory pre-stores reference voiceprint feature data corresponding to the audio data of multiple users, and when the voiceprint feature data to be retrieved is acquired, it is compared one by one with the reference voiceprint feature data pre-stored in the memory. For example, N (N is a positive integer) pieces of reference voiceprint feature data are stored in the memory in advance. During feature comparison, the voiceprint feature data to be retrieved is compared with the N pieces of reference voiceprint feature data in sequence; when it is found to be consistent with one piece of reference voiceprint feature data, the comparison result is determined to be consistent and no further reference voiceprint feature data is compared. If the voiceprint feature data to be retrieved is inconsistent with every piece of reference voiceprint feature data, the comparison result is determined to be inconsistent. Alternatively, the voiceprint feature data to be retrieved may be compared with each of the N pieces of reference voiceprint feature data to obtain N comparison results, where each comparison result represents the similarity between the voiceprint feature data to be retrieved and the corresponding reference voiceprint feature data. The comparison result with the maximum similarity is then obtained: when the maximum similarity exceeds a preset similarity threshold, the voiceprint feature data to be retrieved is determined to be consistent with the corresponding reference voiceprint feature data; when the maximum similarity does not exceed the preset similarity threshold, the voiceprint feature data to be retrieved is determined to be inconsistent with every piece of reference voiceprint feature data.
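The two comparison modes described above can be sketched as follows. This is only an illustrative sketch: the application does not name the comparison algorithm, so cosine similarity is an assumption here, "consistent" in the sequential mode is assumed to mean "similarity above the threshold", and the function names `first_match` and `best_match` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; returns 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_match(query: Sequence[float],
                references: Dict[str, Sequence[float]],
                threshold: float) -> Optional[str]:
    """Sequential mode: stop at the first reference judged consistent."""
    for speaker_id, reference in references.items():
        if cosine_similarity(query, reference) > threshold:
            return speaker_id  # later references are not compared
    return None


def best_match(query: Sequence[float],
               references: Dict[str, Sequence[float]],
               threshold: float) -> Tuple[Optional[str], float]:
    """Exhaustive mode: compute N similarities, keep the maximum, apply the threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for speaker_id, reference in references.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_id is not None and best_score > threshold:
        return best_id, best_score
    return None, best_score
```

The sequential mode can return earlier, while the exhaustive mode always traverses all N references held in the memory and returns the closest one when it clears the threshold.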

Step S260: acquiring a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result to the service server for display.

In the embodiment of the application, the feature comparison result is sent to the client running in the service server for displaying.

In the voiceprint retrieval method provided by this embodiment, a voiceprint feature data synchronization instruction is first received, and the voiceprint feature data stored in the Redis dynamic database module is synchronously stored to the memory based on that instruction. A voiceprint retrieval instruction sent by the service server is then received and, in response, the voiceprint feature data to be retrieved is acquired and compared with the voiceprint feature data pre-stored in the memory. Finally, the feature comparison result corresponding to the voiceprint feature data to be retrieved is acquired and sent to the service server for display. Because the voiceprint feature data to be retrieved is compared directly with the voiceprint feature data pre-stored in the memory, retrieval only needs to traverse data in the memory, which can increase the voiceprint feature retrieval speed.

Referring to fig. 5, a voiceprint retrieval method provided in the embodiment of the present application is applied to a service server, and the method includes:

Step S310: sending voiceprint feature data to be retrieved to a voiceprint retrieval server, so that a voiceprint retrieval module of the voiceprint retrieval server performs feature comparison between the voiceprint feature data to be retrieved and the voiceprint feature data pre-stored in a memory.

In the embodiment of the application, in the voiceprint retrieval stage, the service server sends the voiceprint feature data to be retrieved to the voiceprint retrieval server. The voiceprint feature data to be retrieved may be voiceprint feature data of a voice to be verified. Optionally, the voice to be verified may be a voice input by the user, and specifically may be a voice file obtained by performing silence suppression on a recording of the user's speech, where silence suppression identifies and removes long silent segments from the sound signal stream of the voice input by the user.
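The silence suppression mentioned above is not specified further in the application. One simple way to approximate it, shown here purely as an assumed sketch, is frame-wise energy thresholding: frames whose mean energy falls below a threshold are treated as silent, and runs of silent frames longer than a limit are dropped while short pauses are kept. The function name and all parameter values are hypothetical.

```python
from typing import List, Sequence


def suppress_silence(samples: Sequence[float],
                     frame_len: int = 160,
                     energy_threshold: float = 1e-4,
                     max_silent_frames: int = 10) -> List[float]:
    """Drop runs of low-energy frames longer than max_silent_frames."""
    output: List[float] = []
    silent_run: List[List[float]] = []  # silent frames seen since the last voiced frame

    def flush_silent_run() -> None:
        # Short pauses are kept; long silent segments are removed.
        if len(silent_run) <= max_silent_frames:
            for silent_frame in silent_run:
                output.extend(silent_frame)
        silent_run.clear()

    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        energy = sum(s * s for s in frame) / len(frame)
        if energy < energy_threshold:
            silent_run.append(frame)
        else:
            flush_silent_run()
            output.extend(frame)

    flush_silent_run()  # handle silence at the end of the signal
    return output
```

With the default values and 16 kHz audio (an assumption), each frame covers 10 ms, so only pauses longer than about 100 ms are removed.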

Step S320: receiving a feature comparison result corresponding to the voiceprint feature data to be retrieved, and displaying the feature comparison result.

As shown in fig. 6, before step S310, the method further includes:

Step S301: acquiring voiceprint feature data to be stored.

In one implementation, the step of acquiring the voiceprint feature data to be stored includes: receiving a voice to be registered; sending the voice to be registered to a voiceprint feature extraction server, so that the voiceprint feature extraction server extracts voiceprint feature data of the voice to be registered; and receiving the voiceprint feature data of the voice to be registered sent by the voiceprint feature extraction server, and taking it as the voiceprint feature data to be stored.

Like the voice to be verified, the voice to be registered may be a voice input by a user, specifically a voice file obtained by performing silence suppression on a recording of the user's speech. To ensure the accuracy of registration, multiple voices to be registered may be collected.

Step S302: synchronously storing the voiceprint feature data to be stored into the memory in the voiceprint retrieval server.

Specifically, the flow of voiceprint feature data synchronization is shown in fig. 7. When the service server receives a voiceprint registration instruction, it acquires the voice to be registered and sends a voiceprint feature extraction instruction to the voiceprint feature extraction server. The voiceprint feature extraction server extracts the voiceprint feature data of the voice to be registered based on the voiceprint feature extraction instruction and returns it to the service server.

When the service server receives the voiceprint feature data of the voice to be registered, it caches the data to the Redis server. After caching the data, the Redis server returns the storage result to the service server, and the service server can then display the storage result through the client.

Meanwhile, the Redis server sends a voiceprint feature data synchronization instruction to the voiceprint retrieval server. When the voiceprint retrieval server receives the instruction, the voiceprint feature data of the voice to be registered is stored in the Redis dynamic database module. While the Redis server and the Redis dynamic database module synchronize the voiceprint feature data of the voice to be registered, the data can be synchronized into the memory in real time through the callback function.

Illustratively, the application process of the method of the above embodiment may be as follows:

When user A enters information into a bank system for identity registration through the client running on the service server, the registration can be performed by voice. Specifically, when the service server receives the voice input by user A, it sends the voice to the voiceprint feature extraction server for voiceprint feature extraction. Once the voiceprint feature extraction server has extracted the voiceprint feature data of the voice of user A, it sends the data to the service server, which can then send it to the Redis server for storage.

After the Redis server receives the voiceprint feature data of the voice of user A, it sends a voiceprint feature data synchronization instruction to the voiceprint retrieval server, so that the voiceprint feature data of the voice of user A is stored in the Redis dynamic database module.

When the Redis server and the Redis dynamic database module synchronize the voiceprint feature data of the voice of the user A, the voiceprint feature data of the voice of the user A can be synchronized into the memory in real time through the callback function.
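The registration and synchronization flow described above can be sketched end to end as follows. This is a simplified, single-process sketch under stated assumptions: `FeatureCache` stands in for the Redis server, `RetrievalServer.dynamic_db` stands in for the Redis dynamic database module, `extract_features` is a toy placeholder rather than a real voiceprint extractor, and all names are hypothetical. No network transport or real Redis call is involved.

```python
from typing import Dict, List

Feature = List[float]


class RetrievalServer:
    """Stand-in for the voiceprint retrieval server."""

    def __init__(self) -> None:
        self.dynamic_db: Dict[str, Feature] = {}  # stand-in for the Redis dynamic database module
        self.memory: Dict[str, Feature] = {}      # data traversed at retrieval time

    def on_sync_instruction(self, key: str, feature: Feature) -> None:
        """Store the feature in the dynamic database, then fire the callback."""
        self.dynamic_db[key] = list(feature)
        self._on_key_inserted(key, feature)

    def _on_key_inserted(self, key: str, feature: Feature) -> None:
        """Callback: mirror the new entry into memory right away."""
        self.memory[key] = list(feature)


class FeatureCache:
    """Stand-in for the Redis server that caches registered features."""

    def __init__(self, retrieval_server: RetrievalServer) -> None:
        self.data: Dict[str, Feature] = {}
        self.retrieval_server = retrieval_server

    def store(self, key: str, feature: Feature) -> bool:
        """Cache the feature, send the synchronization instruction, report the result."""
        self.data[key] = list(feature)
        self.retrieval_server.on_sync_instruction(key, feature)
        return True


def extract_features(audio: List[float]) -> Feature:
    """Toy placeholder for the voiceprint feature extraction server."""
    mean = sum(audio) / len(audio)
    energy = sum(s * s for s in audio) / len(audio)
    return [mean, energy]


def register(speaker_id: str, audio: List[float], cache: FeatureCache) -> bool:
    """Service-server side of registration: extract, then cache."""
    return cache.store(speaker_id, extract_features(audio))
```

After `register("user_a", audio, cache)` returns, the same feature is present in the cache, in the dynamic database stand-in and in the memory, so a later retrieval can traverse the memory without fetching data from the cache.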

The voiceprint retrieval method sends voiceprint feature data to be retrieved to a voiceprint retrieval server, so that a voiceprint retrieval module of the voiceprint retrieval server performs feature comparison on the voiceprint feature data to be retrieved and voiceprint feature data stored in a memory in advance, receives a feature comparison result corresponding to the voiceprint feature data to be retrieved, and displays the feature comparison result. By the method, when the voiceprint feature data to be retrieved are obtained, the voiceprint feature data to be retrieved are directly compared with the voiceprint feature data stored in the memory in advance, the voiceprint feature comparison result is output, and the voiceprint feature retrieval speed can be increased by directly traversing the voiceprint feature data in the memory.

Referring to fig. 8, a voiceprint retrieval method provided in the embodiment of the present application is applied to a voiceprint retrieval system, where the system includes a voiceprint retrieval server and a service server, and the method includes:

Step S410: the service server sends the voiceprint feature data to be retrieved to the voiceprint retrieval server.

Step S420: the voiceprint retrieval server acquires the voiceprint feature data to be retrieved.

Step S430: the voiceprint retrieval server compares the voiceprint feature data to be retrieved with the voiceprint feature data pre-stored in the memory.

Step S440: the service server receives a feature comparison result corresponding to the voiceprint feature data to be retrieved, which is sent by the voiceprint retrieval server, and displays the feature comparison result.

For example, the application process of the method of the embodiment of the present application may be as shown in fig. 9:

A mobile terminal user and client A conduct a business negotiation by telephone. When the service server receives the voice of client A, it sends the voice to the voiceprint feature extraction server for voiceprint feature extraction. After receiving the voiceprint feature data of the voice of client A, the service server sends a voiceprint retrieval instruction to the voiceprint retrieval server, and the voiceprint retrieval module of the voiceprint retrieval server uses the voiceprint comparison algorithm in the core algorithm module to compare the voiceprint feature data of the voice of client A with the voiceprint feature data pre-stored in the memory. If no match is found, this indicates that the mobile terminal user and client A are negotiating for the first time. In that case, the voiceprint feature data of client A can be stored in the memory and also sent to the Redis server for storage, so that the voiceprint feature data of client A is not lost when the mobile terminal is replaced. Because a voiceprint is unique, the mobile terminal can use it to identify client A in later business negotiations. When the mobile terminal user negotiates with client A again, the voiceprint feature data of client A is extracted when the voice of client A is received. Since the voiceprint feature data of client A is already stored in the memory, voiceprint matching succeeds and the mobile terminal automatically starts recording the audio of the business negotiation, which can serve as evidence when needed, so the mobile terminal user does not have to trigger recording manually. Of course, if it is client B who is talking with the mobile terminal user, voiceprint matching fails and the mobile terminal does not automatically record the audio.
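The decision logic of this example can be sketched as follows. The sketch rests on several assumptions that are not in the application: cosine similarity with a threshold of 0.8 is used as the comparison, `client_id` is some label available for a first-time caller (for example a phone number), and `memory` and `backup` are plain dicts standing in for the memory and the Redis server. The function names are hypothetical.

```python
import math
from typing import Dict, List, Optional

Feature = List[float]


def similarity(a: Feature, b: Feature) -> float:
    """Assumed comparison: cosine similarity, 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(feature: Feature, memory: Dict[str, Feature],
             threshold: float) -> Optional[str]:
    """Return the best-matching stored client, or None if none clears the threshold."""
    best_id: Optional[str] = None
    best_score = threshold
    for client_id, stored in memory.items():
        score = similarity(feature, stored)
        if score > best_score:
            best_id, best_score = client_id, score
    return best_id


def handle_call(client_id: str, feature: Feature,
                memory: Dict[str, Feature], backup: Dict[str, Feature],
                threshold: float = 0.8) -> bool:
    """Return True when recording should start automatically."""
    if identify(feature, memory, threshold) is not None:
        return True  # known voiceprint: start recording without user action
    # First contact: keep the feature in memory and in the backup store.
    memory[client_id] = list(feature)
    backup[client_id] = list(feature)
    return False
```

In this sketch the first call from a client enrolls the voiceprint and returns `False`; a later call with a sufficiently similar feature returns `True`, which corresponds to the automatic recording described in the example.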

According to the voiceprint retrieval method, a service server sends voiceprint feature data to be retrieved to a voiceprint retrieval server, the voiceprint retrieval server obtains the voiceprint feature data to be retrieved, then the voiceprint retrieval server compares the voiceprint feature data to be retrieved with voiceprint feature data stored in a memory in advance, finally the service server receives feature comparison results corresponding to the voiceprint feature data to be retrieved sent by the voiceprint retrieval server, and the feature comparison results are displayed. By the method, when the voiceprint feature data to be retrieved are obtained, the voiceprint feature data to be retrieved are directly compared with the voiceprint feature data stored in the memory in advance, the voiceprint feature comparison result is output, and the voiceprint feature retrieval speed can be increased by directly traversing the voiceprint feature data in the memory.

Referring to fig. 10, a voiceprint retrieval apparatus 500 provided in the embodiment of the present application runs on a voiceprint retrieval module of a voiceprint retrieval server, and the apparatus 500 includes:

a data obtaining unit 510, configured to obtain voiceprint feature data to be retrieved.

Optionally, the data obtaining unit 510 is configured to receive a voiceprint retrieval instruction sent by a service server and, in response to the voiceprint retrieval instruction, acquire the voiceprint feature data to be retrieved.

A feature comparison unit 520, configured to perform feature comparison on the voiceprint feature data to be retrieved and the voiceprint feature data pre-stored in the memory.

A result obtaining unit 530, configured to obtain a feature comparison result corresponding to the voiceprint feature data to be retrieved, and send the feature comparison result.

Optionally, the result obtaining unit 530 is configured to obtain a feature comparison result corresponding to the voiceprint feature data to be retrieved, and send the feature comparison result to the service server for displaying.

Referring to fig. 11, the apparatus 500 further includes:

a data synchronization unit 540, configured to receive a voiceprint feature data synchronization instruction, and synchronously store the voiceprint feature data stored in the Redis dynamic database module to the memory based on the voiceprint feature data synchronization instruction.

Optionally, the data synchronization unit 540 is configured to, when receiving the voiceprint feature data synchronization instruction, synchronously store the voiceprint feature data stored in the Redis dynamic database module to the memory through a callback function.

Referring to fig. 12, in an embodiment of the present application, a voiceprint retrieval apparatus 600 runs on a service server, where the apparatus 600 includes:

a data sending unit 610, configured to send voiceprint feature data to be retrieved to a voiceprint retrieval server, so that a voiceprint retrieval module of the voiceprint retrieval server performs feature comparison between the voiceprint feature data to be retrieved and the voiceprint feature data pre-stored in a memory.

a display unit 620, configured to receive a feature comparison result corresponding to the voiceprint feature data to be retrieved, and display the feature comparison result.

Referring to fig. 13, the apparatus 600 further includes:

a data storage unit 630, configured to acquire voiceprint feature data to be stored, and synchronously store the voiceprint feature data to be stored into the memory in the voiceprint retrieval server.

Optionally, the data storage unit 630 is configured to receive a voice to be registered; send the voice to be registered to a voiceprint feature extraction server, so that the voiceprint feature extraction server extracts voiceprint feature data of the voice to be registered; and receive the voiceprint feature data of the voice to be registered sent by the voiceprint feature extraction server, and take it as the voiceprint feature data to be stored.

Referring to fig. 14, in an embodiment of the present application, a voiceprint retrieval system 700 is provided, where the system 700 includes a voiceprint retrieval server 710 and a service server 720;

the service server 720 is configured to send voiceprint feature data to be retrieved to the voiceprint retrieval server 710.

The voiceprint retrieval server 710 is configured to acquire the voiceprint feature data to be retrieved.

The voiceprint retrieval server 710 is further configured to perform feature comparison between the voiceprint feature data to be retrieved and the voiceprint feature data pre-stored in the memory.

The service server 720 is further configured to receive a feature comparison result corresponding to the voiceprint feature data to be retrieved, which is sent by the voiceprint retrieval server 710, and display the feature comparison result.

It should be noted that the device embodiment and the method embodiment in the present application correspond to each other, and specific principles in the device embodiment may refer to the contents in the method embodiment, which is not described herein again.

A server provided by the present application will be described below with reference to fig. 15.

Referring to fig. 15, based on the voiceprint retrieval method and apparatus, another server 800 capable of executing the voiceprint retrieval method is provided in the embodiment of the present application. The server 800 includes one or more processors 802 (only one shown), a memory 804, and a network module 806 coupled to each other. The memory 804 stores programs that can execute the content of the foregoing embodiments, and the processor 802 can execute the programs stored in the memory 804.

The processor 802 may include one or more processing cores. The processor 802 connects the various parts of the server 800 using various interfaces and lines, and performs the functions of the server 800 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 804 and by invoking data stored in the memory 804. Optionally, the processor 802 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 802 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communications. It is understood that the modem may also not be integrated into the processor 802 and may instead be implemented by a separate communication chip.

The memory 804 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 804 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 804 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the server 800 during use (e.g., phone book, audio-video data, chat log data), and the like.

The network module 806 is configured to receive and transmit electromagnetic waves, and achieve interconversion between the electromagnetic waves and the electrical signals, so as to communicate with a communication network or other devices, for example, an audio playing device. The network module 806 may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and so forth. The network module 806 may communicate with various networks, such as the internet, an intranet, a wireless network, or with other devices via a wireless network. The wireless network may comprise a cellular telephone network, a wireless local area network, or a metropolitan area network. For example, the network module 806 can interact with the base station.

Referring to fig. 16, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 900 has stored therein program code that can be called by a processor to perform the methods described in the above method embodiments.

The computer-readable storage medium 900 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 900 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 900 has storage space for program code 910 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 910 may, for example, be compressed in a suitable form.

According to the voiceprint retrieval method, device, system, server and storage medium, voiceprint feature data to be retrieved are firstly obtained, then feature comparison is carried out on the voiceprint feature data to be retrieved and voiceprint feature data stored in a memory in advance, finally a feature comparison result corresponding to the voiceprint feature data to be retrieved is obtained, and the feature comparison result is sent. By the method, when the voiceprint feature data to be retrieved are obtained, the voiceprint feature data to be retrieved are directly compared with the voiceprint feature data stored in the memory in advance, the voiceprint feature comparison result is output, and the voiceprint feature retrieval speed can be increased by directly traversing the voiceprint feature data in the memory.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A voiceprint retrieval method is characterized in that the method is applied to a voiceprint retrieval module of a voiceprint retrieval server, and comprises the following steps:

acquiring voiceprint characteristic data to be retrieved;

comparing the voiceprint feature data to be retrieved with voiceprint feature data stored in a memory in advance;

and acquiring a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result.

2. The method according to claim 1, wherein the voiceprint retrieval module is integrated with a Redis dynamic database module, and before the acquiring of the voiceprint feature data to be retrieved, the method further comprises:

receiving a voiceprint characteristic data synchronization instruction;

and synchronously storing the voiceprint characteristic data stored in the Redis dynamic database module to the memory based on the voiceprint characteristic data synchronization instruction.

3. The method of claim 2, wherein the voiceprint feature data stored in the Redis dynamic database module is voiceprint feature data of the enrollee collected during an enrollment phase.

4. The method according to claim 2, wherein the synchronously storing the voiceprint feature data stored in the Redis dynamic database module to the memory based on the voiceprint feature data synchronization instruction comprises:

and when the voiceprint characteristic data synchronization instruction is received, the voiceprint characteristic data stored in the Redis dynamic database module is synchronously stored to the memory through a callback function.

5. The method according to claim 1, wherein the obtaining the voiceprint feature data to be retrieved comprises:

receiving a voiceprint retrieval instruction sent by a service server;

responding to the voiceprint retrieval instruction, and acquiring the voiceprint characteristic data to be retrieved;

the obtaining of the feature comparison result corresponding to the voiceprint feature data to be retrieved and the sending of the feature comparison result include:

and acquiring a feature comparison result corresponding to the voiceprint feature data to be retrieved, and sending the feature comparison result to the service server for displaying.

6. A voiceprint retrieval method is applied to a service server, and comprises the following steps:

sending voiceprint feature data to be retrieved to a voiceprint retrieval server so that a voiceprint retrieval module of the voiceprint retrieval server performs feature comparison on the voiceprint feature data to be retrieved and voiceprint feature data stored in a memory in advance;

and receiving a feature comparison result corresponding to the voiceprint feature data to be retrieved, and displaying the feature comparison result.

7. The method according to claim 6, wherein before sending the voiceprint feature data to be retrieved to the voiceprint retrieval server, the method further comprises:

acquiring voiceprint characteristic data to be stored;

and synchronously storing the voiceprint feature data to be stored into the memory in the voiceprint retrieval server.

8. The method according to claim 7, wherein the obtaining the voiceprint feature data to be stored comprises:

receiving a sound to be registered;

sending the voice to be registered to a voiceprint feature extraction server so that the voiceprint feature extraction server extracts voiceprint feature data of the voice to be registered;

and receiving the voiceprint feature data of the voice to be registered sent by the voiceprint feature extraction server, and taking the voiceprint feature data as the voiceprint feature data to be stored.

9. A voiceprint retrieval method is applied to a voiceprint retrieval system, the system comprises a voiceprint retrieval server and a service server, and the method comprises the following steps:

the service server sends voiceprint characteristic data to be retrieved to the voiceprint retrieval server;

the voiceprint retrieval server acquires the voiceprint characteristic data to be retrieved;

the voiceprint retrieval server compares the voiceprint characteristic data to be retrieved with voiceprint characteristic data stored in a memory in advance;

and the service server receives a feature comparison result corresponding to the voiceprint feature data to be retrieved, which is sent by the voiceprint retrieval server, and displays the feature comparison result.

10. A voiceprint retrieval apparatus, characterized in that the apparatus runs on a voiceprint retrieval module of a voiceprint retrieval server, the apparatus comprising:

the data acquisition unit is used for acquiring the voiceprint characteristic data to be retrieved;

the characteristic comparison unit is used for comparing the voiceprint characteristic data to be retrieved with the voiceprint characteristic data pre-stored in the memory;

and the result obtaining unit is used for obtaining a feature comparison result corresponding to the voiceprint feature data to be retrieved and sending the feature comparison result.

11. A voiceprint retrieval apparatus, operable on a service server, the apparatus comprising:

the data sending unit is used for sending voiceprint feature data to be retrieved to a voiceprint retrieval server so that a voiceprint retrieval module of the voiceprint retrieval server can perform feature comparison on the voiceprint feature data to be retrieved and voiceprint feature data stored in a memory in advance;

and the display unit is used for receiving the feature comparison result corresponding to the voiceprint feature data to be retrieved and displaying the feature comparison result.

12. A voiceprint retrieval system, characterized in that the system comprises a voiceprint retrieval server and a service server;

the service server is used for sending voiceprint characteristic data to be retrieved to the voiceprint retrieval server;

the voiceprint retrieval server is used for acquiring the voiceprint characteristic data to be retrieved;

the voiceprint retrieval server is used for comparing the voiceprint feature data to be retrieved with the voiceprint feature data pre-stored in the memory;

and the service server is used for receiving the characteristic comparison result corresponding to the voiceprint characteristic data to be retrieved, which is sent by the voiceprint retrieval server, and displaying the characteristic comparison result.

13. A server, comprising one or more processors and memory; one or more programs are stored in the memory and configured to be executed by the one or more processors to perform the method of any of claims 1-5, any of claims 6-8, or claim 9.

14. A computer-readable storage medium, characterized in that a program code is stored in the computer-readable storage medium, wherein the program code, when executed by a processor, performs the method of any of claims 1-5, any of claims 6-8, or claim 9.