CN108847245B - Voice detection method and device


Info

Publication number
CN108847245B
Authority
CN
China
Prior art keywords
probability
speaker
voice
test
test voice
Prior art date
Legal status
Active
Application number
CN201810883930.4A
Other languages
Chinese (zh)
Other versions
CN108847245A (en
Inventor
邵志明
曹琼
宋琼
郝玉峰
Current Assignee
Beijing Speechocean Technology Co ltd
Original Assignee
Beijing Speechocean Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Speechocean Technology Co ltd filed Critical Beijing Speechocean Technology Co ltd
Priority to CN201810883930.4A
Publication of CN108847245A
Application granted
Publication of CN108847245B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/04 - Training, enrolment or model building
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The embodiment of the invention provides a voice detection method and a voice detection device. The method comprises: acquiring, by using a pre-acquired first speaker model, a first probability for each test voice in a voice database except a first test voice, wherein the first speaker model is used for detecting the probability that a test voice is spoken by a first speaker and the first speaker is the speaker corresponding to the first test voice; and then determining, according to the first probability of each test voice except the first test voice in the voice database, a second test voice spoken by the same speaker as the first test voice, wherein the second test voice is the test voice with the highest first probability. The method provided by the embodiment detects all voices of the same person in the voice database, improving efficiency and accuracy.

Description

Voice detection method and device
Technical Field
The embodiment of the invention relates to the field of voice detection, in particular to a voice detection method and a voice detection device.
Background
With the development of speech recognition technology, speech databases are being built on an increasing scale. To cover the acoustic characteristics of speakers of a given language as completely as possible, a large amount of speaker data must be recorded, and a single database may contain thousands of speakers. This gives rise to the problem of repeat speakers, i.e., the same person appearing in the speech database under more than one speaker identity.
At present, repeat speakers are detected by manual spot-checking: for example, from a database of 1000 speakers, the data of 100 speakers is randomly extracted and listened to one by one to check whether any repeat speakers exist among those 100 speakers.
However, spot-checking can only estimate the proportion of repeat speakers in a sample and cannot eliminate all of them, and manually listening to even a fraction of the data is time-consuming, labor-intensive, and of limited accuracy.
Disclosure of Invention
The embodiment of the invention provides a voice detection method, which aims to solve the problems of high time and labor cost and low accuracy caused by manual listening on spot-check samples.
In a first aspect, an embodiment of the present invention provides a voice detection method, including:
acquiring a first probability of each test voice except a first test voice in a voice database by using a pre-acquired first speaker model, wherein the first speaker model is used for detecting the probability that a test voice is spoken by a first speaker, and the first speaker is the speaker corresponding to the first test voice;
determining, according to the first probability of each test voice except the first test voice in the voice database, a second test voice spoken by the same speaker as the first test voice, wherein the second test voice is the test voice with the highest first probability.
Optionally, before acquiring the first probability of each test voice except the first test voice in the voice database by using the pre-acquired first speaker model, the method further includes:
establishing the first speaker model according to a plurality of voices of the first speaker.
Optionally, before determining that the first speaker and a second speaker corresponding to the second test voice are the same person, the method further includes:
inputting the first test voice into a pre-acquired second speaker model to acquire a second probability of the first test voice, wherein the second speaker model is used for detecting the probability that a test voice is spoken by the second speaker, and the second speaker is the speaker corresponding to the second test voice.
Optionally, the determining, according to the first probability of each test voice except the first test voice in the voice database, a second test voice spoken by the same speaker as the first test voice includes:
acquiring a third probability that the first speaker and the second speaker are the same speaker according to the first probability obtained by detecting the second test voice with the first speaker model and the second probability obtained by detecting the first test voice with the second speaker model;
if the third probability is greater than a preset probability threshold, determining that the first test voice and the second test voice are spoken by the same speaker.
Optionally, the third probability is an average of the first probability and the second probability.
Optionally, the voice database includes a plurality of test voices corresponding to the first speaker and a plurality of test voices corresponding to the second speaker, and the determining, according to the first probability of each test voice except the first test voice in the voice database, a second test voice spoken by the same speaker as the first test voice, the second test voice being the test voice with the highest first probability, includes:
acquiring a fourth probability that a test voice among the plurality of test voices corresponding to the second speaker is detected as being spoken by the same person as the first test voice;
acquiring a fifth probability that a test voice among the plurality of test voices corresponding to the first speaker is detected as being spoken by the same person as the second test voice;
acquiring a sixth probability that the first speaker and the second speaker are the same speaker according to the fourth probability and the fifth probability;
if the sixth probability is greater than a preset probability threshold, determining that the first test voice and the second test voice are spoken by the same speaker.
Optionally, the sixth probability is an average of the fourth probability and the fifth probability.
In a second aspect, an embodiment of the present invention provides a speech detection apparatus, including:
an obtaining module and a processing module, wherein the obtaining module is used for acquiring a first probability of each test voice except a first test voice in a voice database by using a pre-acquired first speaker model, the first speaker model is used for detecting the probability that a test voice is spoken by a first speaker, and the first speaker is the speaker corresponding to the first test voice;
the processing module is used for determining, according to the first probability of each test voice except the first test voice in the voice database, a second test voice spoken by the same speaker as the first test voice, wherein the second test voice is the test voice with the highest first probability.
Optionally, the processing module is further configured to establish the first speaker model according to a plurality of voices of the first speaker.
Optionally, the obtaining module is further configured to input the first test voice into a pre-acquired second speaker model to obtain a second probability of the first test voice, where the second speaker model is used to detect the probability that a test voice is spoken by a second speaker, and the second speaker is the speaker corresponding to the second test voice.
Optionally, the obtaining module is specifically configured to obtain a third probability that the first speaker and the second speaker are the same speaker according to a first probability that the first speaker model detects the second test voice and a second probability that the second speaker model detects the first test voice;
the processing module is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if the third probability is greater than a preset probability threshold.
Optionally, the processing module is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if an average of the first probability and the second probability is greater than a preset probability threshold.
Optionally, the obtaining module is specifically configured to obtain a fourth probability that a test voice among the plurality of test voices corresponding to the second speaker is detected as being spoken by the same person as the first test voice;
obtain a fifth probability that a test voice among the plurality of test voices corresponding to the first speaker is detected as being spoken by the same person as the second test voice;
and obtain a sixth probability that the first speaker and the second speaker are the same speaker according to the fourth probability and the fifth probability;
the processing module is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if the sixth probability is greater than a preset probability threshold.
Optionally, the processing module is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if an average of the fourth probability and the fifth probability is greater than a preset probability threshold.
In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the voice detection method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for detecting a voice according to any one of the first aspect is implemented.
In the method, a first probability of each test voice except a first test voice in a voice database is acquired by using a pre-acquired first speaker model, wherein the first speaker model is used for detecting the probability that a test voice is spoken by a first speaker, and the first speaker is the speaker corresponding to the first test voice; a second test voice spoken by the same speaker as the first test voice is then determined according to the first probability of each test voice except the first test voice in the voice database, the second test voice being the test voice with the highest first probability. All voices of the same person in the voice database are thus detected, improving efficiency and accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first schematic flow chart of a voice detection method according to an embodiment of the present invention;
fig. 2 is a second schematic flow chart of a voice detection method according to an embodiment of the present invention;
fig. 3 is a third schematic flow chart of a voice detection method according to an embodiment of the present invention;
fig. 4 is a first schematic structural diagram of a voice detection apparatus according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A speech database often needs to record a large amount of speaker data in order to cover the acoustic characteristics of a language's speakers as fully as possible, and a single database may contain thousands of speakers; this gives rise to the problem of repeat speakers, i.e., the same person recorded under more than one speaker identity. If a database contains repeat speakers, the real number of distinct speakers falls short of the target, the trained model cannot fully cover the acoustic characteristics of the speech, and an excessive amount of data from some individuals degrades the final model. The database therefore needs to be checked for voices of the same speaker before it is delivered.
At present, manual spot-checking is used: for example, from a database of 1000 speakers, the data of 100 speakers is randomly sampled and checked one by one to see whether any repeat speakers exist among them. However, spot-checking can only estimate the proportion of repeat speakers in the sample, cannot eliminate all repeat speakers, and manually listening to even a fraction of the data is time-consuming, labor-intensive, and of limited accuracy. This embodiment provides a voice detection method that detects all voices of the same person in the voice database, improving efficiency and accuracy. The following examples are given for illustrative purposes.
Fig. 1 is a first schematic flow chart of a voice detection method according to an embodiment of the present invention. The method of this embodiment may be executed by a terminal such as a personal computer, a tablet computer, or a mobile phone, which is not limited here. As shown in fig. 1, the method includes:
S101, a first probability of each test voice except the first test voice in the voice database is obtained by using a pre-acquired first speaker model.
Optionally, before this step, the method further includes establishing the first speaker model according to a plurality of voices of the first speaker. The first speaker model can be built following existing practice. Suppose the voice database contains N speakers and each speaker has 2×M sentences of speech; half of each speaker's speech data can be randomly selected as training data, i.e., data for training the model, to obtain a speaker model for each speaker. The input of the model can be any voice, and the output is a probability representing the similarity of that voice to the speaker corresponding to the model; the higher the probability, the more similar the voice. In theory, when the input is a voice of the first speaker, the output probability of the first speaker model should be 1.
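As an illustration of this enrollment step, the following Python sketch builds one model per speaker from half of that speaker's utterances. The patent does not prescribe a model type, so the sketch assumes each utterance has already been reduced to a fixed-length feature vector (for example an averaged MFCC or embedding) and uses a mean vector with cosine similarity as a stand-in scoring function; the names build_speaker_model and score_utterance, and the toy random data, are illustrative only.

```python
# Illustrative sketch only: speaker "models" are mean feature vectors and the score
# is a cosine similarity mapped into [0, 1]; the patent does not fix these choices.
import numpy as np

def build_speaker_model(training_utterances):
    """Average one speaker's training feature vectors into a single model vector."""
    return np.mean(np.stack(training_utterances), axis=0)

def score_utterance(model, utterance):
    """Return a pseudo-probability in [0, 1] that the utterance matches the model."""
    cos = np.dot(model, utterance) / (np.linalg.norm(model) * np.linalg.norm(utterance) + 1e-12)
    return 0.5 * (cos + 1.0)

# Toy database: N = 5 speakers, 2*M = 10 feature vectors each (random stand-in data).
rng = np.random.default_rng(0)
database = {f"spk{i}": [rng.normal(size=16) for _ in range(10)] for i in range(5)}

# Enrollment: half of each speaker's utterances is used as training data,
# the other half is kept as test voices.
models = {spk: build_speaker_model(utts[: len(utts) // 2]) for spk, utts in database.items()}
test_voices = {spk: utts[len(utts) // 2:] for spk, utts in database.items()}
```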
It should be noted that in the voice database of this solution, the multiple sentences of speech recorded under each speaker identity are known. These voices could be checked manually, but manual checking is applied only to a sampled proportion of the data, so voices of the same person can be found only within that sample.
In this step, the first speaker model is used to detect the probability that a test voice is spoken by the first speaker, where the first speaker is the speaker corresponding to the first test voice. Each test voice in the voice database except the first test voice is taken as input to the first speaker model to obtain a first probability for each of these test voices. The first probability represents the similarity of the test voice to speech of the first speaker; the closer it is to 1, the more likely the test voice is to have been spoken by the first speaker.
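Continuing the toy setup above (models, test_voices and score_utterance are the illustrative names introduced there, not terms from the patent), the scoring of S101 could look as follows:

```python
# Score every test voice in the database, except the first test voice itself,
# against the first speaker's model to obtain its first probability.
first_speaker = "spk0"
first_test_voice = test_voices[first_speaker][0]

first_probabilities = {
    (spk, idx): score_utterance(models[first_speaker], utt)
    for spk, utts in test_voices.items()
    for idx, utt in enumerate(utts)
    if not (spk == first_speaker and idx == 0)  # exclude the first test voice
}
```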
S102, according to the first probability of each test voice except the first test voice in the voice database, determining a second test voice which is spoken by the same speaker as the first test voice.
In this step, each test voice in the voice database except the first test voice is input into the first speaker model to obtain its first probability, and the test voice with the highest first probability is determined to be the second test voice, i.e., a test voice spoken by the same speaker as the first test voice.
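In the same illustrative setup, the selection of S102 reduces to taking the test voice with the maximum first probability:

```python
# The candidate "second test voice" is the one with the highest first probability.
(second_speaker, second_idx), best_score = max(first_probabilities.items(), key=lambda kv: kv[1])
second_test_voice = test_voices[second_speaker][second_idx]
print(f"Most likely duplicate of {first_speaker}: {second_speaker} (score {best_score:.3f})")
```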
In the voice detection method provided by this embodiment, a first probability is obtained for each test voice in the voice database except the first test voice by using a pre-acquired first speaker model, where the first speaker model is used to detect the probability that a test voice is spoken by the first speaker and the first speaker is the speaker corresponding to the first test voice; a second test voice spoken by the same speaker as the first test voice is then determined according to these first probabilities, the second test voice being the test voice with the highest first probability. All voices of the same person in the voice database are thus detected, improving efficiency and accuracy.
Fig. 2 is a second schematic flow chart of a voice detection method according to an embodiment of the present invention. In another implementation of the scheme, the method specifically includes the following steps:
S201, a first probability of each test voice except the first test voice in the voice database is obtained by using a pre-acquired first speaker model.
Step S201 is similar to the implementation process of S101 in the embodiment of fig. 1, and is not described herein again.
S202, the first test voice is input into a pre-acquired second speaker model, and a second probability of the first test voice is obtained.
The second speaker model can be established by referring to the implementation of the first speaker model, which is not repeated here.
Optionally, the second test voice with the highest first probability is obtained according to the first probabilities of step S201, and the first test voice is then used as input to the second speaker model to obtain the second probability of the first test voice, where the second speaker model is used to detect the probability that a test voice is spoken by the second speaker, and the second speaker is the speaker corresponding to the second test voice.
S203, according to the first probability of each test voice except the first test voice in the voice database, determining a second test voice which is spoken by the same speaker as the first test voice.
In one implementation, a third probability that the first speaker and the second speaker are the same speaker is obtained according to the first probability produced by the first speaker model for the second test voice and the second probability produced by the second speaker model for the first test voice, so as to further confirm whether the first test voice and the second test voice come from the same speaker. Optionally, the third probability may be the average of the first probability and the second probability. If the third probability is greater than a preset probability threshold, the first test voice and the second test voice are determined to be spoken by the same speaker, that is, the second test voice is determined to be spoken by the same speaker as the first test voice; if the third probability is not greater than the preset probability threshold, the first test voice and the second test voice are judged not to be spoken by the same speaker.
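A minimal sketch of this cross-check, continuing the earlier toy setup, is shown below; the 0.8 threshold is an assumed value, since the patent leaves the preset probability threshold open:

```python
# Cross-scoring: the first speaker's model scores the second test voice, the
# second speaker's model scores the first test voice, and the third probability
# is the average of the two scores.
p1 = score_utterance(models[first_speaker], second_test_voice)   # first probability
p2 = score_utterance(models[second_speaker], first_test_voice)   # second probability
third_probability = (p1 + p2) / 2.0

PROBABILITY_THRESHOLD = 0.8  # assumed value for illustration
same_speaker = third_probability > PROBABILITY_THRESHOLD
```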
In the voice detection method provided by this embodiment, a first probability is obtained for each test voice in the voice database except the first test voice by using a pre-acquired first speaker model; the first test voice is input into a pre-acquired second speaker model to obtain a second probability of the first test voice; and a second test voice spoken by the same speaker as the first test voice is determined according to the first probability of each test voice except the first test voice. All voices of the same person in the voice database are thus detected, improving efficiency and accuracy.
Fig. 3 is a third schematic flow chart of a voice detection method according to an embodiment of the present invention. As shown in fig. 3, S102 in the embodiment of fig. 1 specifically includes the following steps:
S301, a fourth probability is obtained that a test voice among the plurality of test voices corresponding to the second speaker is detected as being spoken by the same person as the first test voice.
The voice database comprises a plurality of testing voices corresponding to a plurality of persons, and the speaker corresponding to the first testing voice is the first speaker.
First, one test voice of the second speaker is input into every speaker model except the second speaker model, producing a set of probability values, and the model with the highest probability is found to be the first speaker model; the voice is thus identified with the first speaker model, that is, as being spoken by the first speaker.
Similarly, the remaining test voices of the second speaker are each input into every speaker model except the second speaker model, and it is checked whether the first speaker model again gives the highest probability. Specifically, suppose the second speaker corresponds to M test voices. The first test voice S1 is input into every speaker model except the second speaker model and the first speaker model gives the highest probability; the remaining M-1 test voices are then each input into every speaker model except the second speaker model. If the first speaker model gives the highest probability for all M-1 of them as well, the second speaker and the first speaker are the same person, that is, the fourth probability is 1.
If k of the second speaker's test voices are identified with the first speaker model, the fourth probability, i.e., the probability that a test voice among the plurality of test voices corresponding to the second speaker is detected as being spoken by the same person as the first test voice, is k/M.
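Under the same toy setup, the fourth probability of S301 can be computed by identifying, for each of the second speaker's test voices, which other speaker model scores it highest; identified_speaker is an illustrative helper, not a function named by the patent:

```python
def identified_speaker(utt, models, exclude):
    """Return the speaker whose model (other than `exclude`) gives the highest score."""
    return max((spk for spk in models if spk != exclude),
               key=lambda spk: score_utterance(models[spk], utt))

# k of the second speaker's M test voices are identified as the first speaker.
second_utts = test_voices[second_speaker]
k = sum(identified_speaker(u, models, exclude=second_speaker) == first_speaker
        for u in second_utts)
fourth_probability = k / len(second_utts)
```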
S302, a fifth probability is obtained that a test voice among the plurality of test voices corresponding to the first speaker is detected as being spoken by the same person as the second test voice.
The implementation process of step S302 is similar to that of step S301, and is not described herein again.
S303, a sixth probability that the first speaker and the second speaker are the same speaker is obtained according to the fourth probability and the fifth probability.
Optionally, the sixth probability may be an average of the fourth probability and the fifth probability.
S304, if the sixth probability is larger than the preset probability threshold, the first test voice and the second test voice are determined to be spoken by the same speaker.
In this step, if the sixth probability is greater than the preset probability threshold, the first test voice and the second test voice are determined to be spoken by the same speaker, that is, the second test voice is determined to be spoken by the same speaker as the first test voice; if the sixth probability is not greater than the preset probability threshold, the first test voice and the second test voice are judged not to be spoken by the same speaker.
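The fifth probability of S302 mirrors the earlier computation with the roles of the two speakers swapped, and the decision of S303 and S304 then averages the two fractions; the sketch below continues the earlier illustrative names and assumed threshold:

```python
# Fraction of the first speaker's test voices identified as the second speaker.
first_utts = test_voices[first_speaker]
j = sum(identified_speaker(u, models, exclude=first_speaker) == second_speaker
        for u in first_utts)
fifth_probability = j / len(first_utts)

# Sixth probability and final decision.
sixth_probability = (fourth_probability + fifth_probability) / 2.0
if sixth_probability > PROBABILITY_THRESHOLD:
    print(f"{first_speaker} and {second_speaker} are judged to be the same person")
```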
In the voice detection method provided by this embodiment, a fourth probability is obtained that a test voice among the plurality of test voices corresponding to the second speaker is detected as being spoken by the same person as the first test voice; a fifth probability is obtained that a test voice among the plurality of test voices corresponding to the first speaker is detected as being spoken by the same person as the second test voice; a sixth probability that the first speaker and the second speaker are the same speaker is obtained according to the fourth and fifth probabilities; and if the sixth probability is greater than a preset probability threshold, the first test voice and the second test voice are determined to be spoken by the same speaker. All voices of the same person in the voice database are thus detected, improving efficiency and accuracy.
Fig. 4 is a first schematic structural diagram of a voice detection apparatus according to an embodiment of the present invention. As shown in fig. 4, the voice detection apparatus 40 includes: an obtaining module 401 and a processing module 402.
An obtaining module 401, configured to obtain, by using a pre-acquired first speaker model, a first probability of each test voice except a first test voice in a voice database, where the first speaker model is used to detect the probability that a test voice is spoken by a first speaker, and the first speaker is the speaker corresponding to the first test voice;
a processing module 402, configured to determine, according to a first probability of each test voice except for a first test voice in the voice database, a second test voice spoken by the same speaker as the first test voice, where the second test voice is a test voice with a highest first probability.
Optionally, the processing module 402 is further configured to establish the first speaker model according to a plurality of voices of the first speaker.
Optionally, the obtaining module 401 is further configured to input the first test voice into a pre-acquired second speaker model to obtain a second probability of the first test voice, where the second speaker model is used to detect the probability that a test voice is spoken by a second speaker, and the second speaker is the speaker corresponding to the second test voice.
Optionally, the obtaining module 401 is specifically configured to obtain a third probability that the first speaker and the second speaker are the same speaker according to a first probability that the first speaker model detects the second test voice and a second probability that the second speaker model detects the first test voice;
the processing module 402 is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if the third probability is greater than a preset probability threshold.
Optionally, the processing module 402 is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if an average of the first probability and the second probability is greater than a preset probability threshold.
Optionally, the obtaining module 401 is specifically configured to obtain a fourth probability that a test voice among the plurality of test voices corresponding to the second speaker is detected as being spoken by the same person as the first test voice;
obtain a fifth probability that a test voice among the plurality of test voices corresponding to the first speaker is detected as being spoken by the same person as the second test voice;
and obtain a sixth probability that the first speaker and the second speaker are the same speaker according to the fourth probability and the fifth probability;
the processing module 402 is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if the sixth probability is greater than a preset probability threshold.
Optionally, the processing module 402 is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if an average of the fourth probability and the fifth probability is greater than a preset probability threshold.
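One possible way to arrange the obtaining module and processing module in code, continuing the earlier sketches, is the small class below; the class and method names are illustrative and not taken from the patent:

```python
class VoiceDetectionApparatus:
    """Illustrative grouping of an obtaining step and a processing step."""

    def __init__(self, models, threshold=0.8):
        self.models = models          # one pre-acquired model per speaker
        self.threshold = threshold    # assumed preset probability threshold

    def obtain_first_probabilities(self, first_speaker, test_voices, first_index=0):
        """Obtaining module: score every test voice except the first test voice."""
        return {
            (spk, i): score_utterance(self.models[first_speaker], utt)
            for spk, utts in test_voices.items()
            for i, utt in enumerate(utts)
            if not (spk == first_speaker and i == first_index)
        }

    def process(self, probabilities):
        """Processing module: select the highest-scoring test voice and apply the threshold."""
        key, score = max(probabilities.items(), key=lambda kv: kv[1])
        return key, score, score > self.threshold
```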
The device provided in this embodiment may be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention, and as shown in fig. 5, an electronic device 50 according to the embodiment includes: a processor 501 and a memory 502; wherein
A memory 502 for storing computer-executable instructions;
the processor 501 is configured to execute the computer-executable instructions stored in the memory to implement the voice detection method in the above embodiments. Reference may be made in particular to the description of the method embodiments above.
Alternatively, the memory 502 may be separate or integrated with the processor 501.
When the memory 502 is provided separately, the electronic device further comprises a bus 503 for connecting the memory 502 and the processor 501.
The embodiment of the present invention further provides a computer-readable storage medium, in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the voice detection method described above is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for speech detection, comprising:
acquiring a first probability of each test voice except a first test voice in a voice database by using a pre-acquired first speaker model, wherein the first speaker model is used for detecting the probability that a test voice is spoken by a first speaker, and the first speaker is the speaker corresponding to the first test voice;
determining, according to the first probability of each test voice except the first test voice in the voice database, a second test voice with the highest first probability among the test voices;
inputting the first test voice into a pre-acquired second speaker model to acquire a second probability of the first test voice, wherein the second speaker model is used for detecting the probability that a test voice is spoken by a second speaker, and the second speaker is the speaker corresponding to the second test voice;
acquiring a third probability that the first speaker and the second speaker are the same speaker according to the first probability obtained by detecting the second test voice with the first speaker model and the second probability obtained by detecting the first test voice with the second speaker model; and
if the third probability is greater than a preset probability threshold, determining that the first test voice and the second test voice are spoken by the same speaker.
2. The method of claim 1, wherein before acquiring the first probability of each test voice except the first test voice in the voice database by using the pre-acquired first speaker model, the method further comprises:
establishing the first speaker model according to a plurality of voices of the first speaker.
3. The method of claim 1, wherein the third probability is an average of the first probability and the second probability.
4. The method of claim 1, wherein the voice database includes a plurality of test voices corresponding to a plurality of persons, and the determining that the first test voice and the second test voice are spoken by the same speaker comprises:
acquiring a fourth probability that a test voice among the plurality of test voices corresponding to the second speaker is detected as being spoken by the same person as the first test voice;
acquiring a fifth probability that a test voice among the plurality of test voices corresponding to the first speaker is detected as being spoken by the same person as the second test voice;
acquiring a sixth probability that the first speaker and the second speaker are the same speaker according to the fourth probability and the fifth probability; and
if the sixth probability is greater than a preset probability threshold, determining that the first test voice and the second test voice are spoken by the same speaker.
5. The method of claim 4, wherein the sixth probability is an average of the fourth probability and the fifth probability.
6. A speech detection apparatus, comprising:
an obtaining module and a processing module, wherein the obtaining module is configured to acquire a first probability of each test voice except a first test voice in a voice database by using a pre-acquired first speaker model, the first speaker model is used for detecting the probability that a test voice is spoken by a first speaker, and the first speaker is the speaker corresponding to the first test voice;
the processing module is configured to determine, according to the first probability of each test voice except the first test voice in the voice database, a second test voice with the highest first probability among the test voices;
the obtaining module is further configured to input the first test voice into a pre-acquired second speaker model to acquire a second probability of the first test voice, the second speaker model is used for detecting the probability that a test voice is spoken by a second speaker, and the second speaker is the speaker corresponding to the second test voice;
the obtaining module is specifically configured to acquire a third probability that the first speaker and the second speaker are the same speaker according to the first probability obtained by detecting the second test voice with the first speaker model and the second probability obtained by detecting the first test voice with the second speaker model; and
the processing module is specifically configured to determine that the first test voice and the second test voice are spoken by the same speaker if the third probability is greater than a preset probability threshold.
7. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the speech detection method of any of claims 1 to 5.
8. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, implement the speech detection method of any one of claims 1 to 5.
CN201810883930.4A, priority date 2018-08-06, filing date 2018-08-06: Voice detection method and device. Granted as CN108847245B (Active).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810883930.4A CN108847245B (en) 2018-08-06 2018-08-06 Voice detection method and device


Publications (2)

Publication Number Publication Date
CN108847245A (en) 2018-11-20
CN108847245B (en) 2020-06-23

Family

ID=64192523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810883930.4A Active CN108847245B (en) 2018-08-06 2018-08-06 Voice detection method and device

Country Status (1)

Country Link
CN (1) CN108847245B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant