WO2018095167A1 - Voiceprint identification method and voiceprint identification system - Google Patents

Voiceprint identification method and voiceprint identification system

Info

Publication number
WO2018095167A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
tested
sample
type
feature matrix
Application number
PCT/CN2017/106886
Other languages
English (en)
Chinese (zh)
Inventor
雷利博
薛韬
罗超
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2018095167A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/04 - Training, enrolment or model building

Definitions

  • the present disclosure relates to the field of voiceprint recognition, and in particular to a voiceprint recognition method and a voiceprint recognition system.
  • Voiceprint refers to a spectrum pattern showing sound-wave characteristics, drawn by a special electro-acoustic conversion instrument (such as a sonograph or a spectrograph); it is a collection of various acoustic feature maps.
  • The voiceprint is a characteristic signal that is stable over the long term. Because of innate physiological differences in the vocal organs and acquired behavioral differences, each person's voiceprint is highly distinctive.
  • Voiceprint recognition is a biometric method that automatically recognizes the identity of a speaker based on characteristic parameters such as unique physiological and behavioral characteristics contained in human speech.
  • Voiceprint recognition mainly collects a person's voice information, extracts the unique voice features, converts them into digital symbols, and saves them as a feature template, so that during application the speech to be recognized can be matched against the templates in the database, thereby identifying the speaker.
  • Sound spectrum analysis plays a major role in the lives of modern people. For example, the installation, adjustment and operation of machinery in industrial production can be monitored by means of sound spectrum analysis. In addition, sound spectrum analysis has a wide range of applications in scientific testing of musical instrument manufacturing processes, jewelry identification, communication and efficient use of broadcast equipment.
  • the "voiceprint recognition" technology can be used for identity authentication to discriminate the identity of the speaker.
  • However, most research results in this field are text-dependent; that is, the person being verified must speak according to a prescribed text, which limits the development of the technology.
  • Moreover, the fault tolerance of existing algorithms is poor: they basically rely on a single similarity score to judge whether two speech samples belong to the same person. If the sample size is not large enough, or the speech features of the two samples are highly similar, it is difficult to make an accurate judgment.
  • According to one aspect of the present disclosure, a voiceprint recognition method may include: receiving audio to be tested and dividing it into a first part and a second part; selecting one sample audio from a sample database and dividing the selected sample audio into a first part and a second part; extracting feature matrices for the audio to be tested and for the selected sample audio by using the Mel-frequency cepstral coefficient (MFCC) extraction method; and performing support vector machine training with the feature matrix of the first part of the audio to be tested as the first class of samples and the feature matrix of the selected sample audio as the second class of samples, and calculating the ratio a of the second part of the audio to be tested that is classified into the second class.
  • The method further includes: performing support vector machine training with the feature matrix of the first part of the selected sample audio as the first class and the feature matrix of the audio to be tested as the second class, and calculating the ratio b of the second part of the selected sample audio classified into the second class; performing support vector machine training with the feature matrix of the second part of the audio to be tested as the first class and the feature matrix of the selected sample audio as the second class, and calculating the ratio c of the first part of the audio to be tested classified into the second class; performing support vector machine training with the feature matrix of the second part of the selected sample audio as the first class and the feature matrix of the audio to be tested as the second class, and calculating the ratio d of the first part of the selected sample audio classified into the second class; and calculating, from a, b, c, and d, the degree to which the audio to be tested matches the sample audio, so as to determine whether the two recordings come from the same person.
  • In some embodiments, the voiceprint recognition method further includes preprocessing the received audio to be tested, wherein the preprocessing includes at least one of: pre-emphasizing the audio to be tested; framing the audio to be tested by an overlapping-segment framing method; applying a Hamming window to eliminate the Gibbs effect; and distinguishing speech frames from non-speech frames and discarding the non-speech frames.
  • In some embodiments, dividing the audio to be tested into the first portion and the second portion includes dividing it into two portions of equal length.
  • In some embodiments, dividing the selected sample audio into the first portion and the second portion comprises dividing it into two portions of equal length.
  • In some embodiments, calculating the degree of matching between the audio to be tested and the sample audio comprises: calculating the average of a, b, c, and d; and taking the ratio of this average to 0.5 as the degree of matching between the audio to be tested and the sample audio.
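  A minimal sketch of this matching rule in Python (the helper name match_degree is ours, not from the disclosure):

```python
def match_degree(a: float, b: float, c: float, d: float) -> float:
    """Combine the four cross-classification ratios into a match degree.

    If the two recordings come from the same speaker, each ratio should
    be near 0.5, so the average divided by 0.5 approaches 1.0; for
    different speakers it approaches 0.0.
    """
    average = (a + b + c + d) / 4.0
    return average / 0.5
```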
  • According to another aspect of the present disclosure, a voiceprint recognition system is provided, comprising: a receiver configured to receive the audio to be tested; a sample database configured to store one or more sample audios; a support vector machine configured to classify test data according to classified training samples; and a controller. The controller is configured to: divide the audio to be tested from the receiver into a first part and a second part, select a sample audio from the sample database, and divide the selected sample audio into a first part and a second part; extract feature matrices for the audio to be tested and the selected sample audio by using the MFCC extraction method; input to the support vector machine the feature matrix of the first part of the audio to be tested as the first class of samples and the feature matrix of the selected sample audio as the second class, train the support vector machine, and calculate the ratio a of the second part of the audio to be tested classified into the second class; calculate, in the analogous way described above, the ratios b, c, and d; and, from a, b, c, and d, calculate the degree of matching between the audio to be tested and the selected sample audio.
  • In some embodiments, the controller may be further configured to preprocess the received audio to be tested, wherein the preprocessing comprises at least one of: pre-emphasizing the audio to be tested; framing the audio to be tested by an overlapping-segment framing method; applying a Hamming window to eliminate the Gibbs effect; and distinguishing speech frames from non-speech frames and discarding the non-speech frames.
  • the controller is further configured to divide the audio to be tested into two parts of equal length.
  • the controller is further configured to split the selected sample audio into two parts of equal length.
  • the controller is further configured to: calculate an average value of a, b, c, and d; and determine a ratio of the average value to 0.5 as a degree of matching of the audio to be tested and the sample audio.
  • According to another aspect of the present disclosure, a computer system is provided, comprising: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the voiceprint recognition method described above.
  • a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the voiceprint recognition method as described above.
  • FIG. 1 is a block diagram showing the structure of a voiceprint recognition system according to an exemplary embodiment of the present disclosure
  • FIG. 2 illustrates an operational logic diagram of a voiceprint recognition method in accordance with an example embodiment of the present disclosure
  • FIG. 3 illustrates a flow chart of a voiceprint recognition method according to an example embodiment of the present disclosure
  • FIG. 4 is a diagram showing an example of a process of training the support vector machine of FIG. 3 and calculating an audio matching degree
  • FIG. 5 schematically illustrates a block diagram of a computer system suitable for implementing a voiceprint recognition method in accordance with an embodiment of the present disclosure.
  • The present disclosure provides a text-independent voiceprint recognition method and voiceprint recognition system. The method can effectively improve the fault tolerance of voiceprint recognition with small samples and can quickly and efficiently identify whether two segments of audio belong to the same person, so it has broad application prospects. Through speaker recognition in voiceprint recognition technology, identity identification based on voice information can be achieved.
  • FIG. 1 shows a block diagram of a structure of a voiceprint recognition system 100 in accordance with an exemplary embodiment of the present disclosure.
  • The voiceprint recognition system 100 includes: a receiver 110 configured to receive the audio to be tested; a sample database 120 configured to store one or more sample audios; a support vector machine 130 configured to classify test data according to classified training samples; and a controller 140.
  • the support vector machine 130 is capable of performing a classification function.
  • In the support vector machine, the input space is first mapped into a high-dimensional space by a nonlinear transformation so that the samples become linearly separable, this nonlinear transformation being realized by an appropriate inner-product (kernel) function; the optimal linear classification surface is then sought in the new space to perform classification.
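  To illustrate this kernel-based classification, here is a small self-contained sketch (the disclosure does not name a kernel, so the RBF kernel and all variable names below are our assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# Two toy classes of 20-dimensional feature vectors (e.g. MFCC frames).
rng = np.random.default_rng(0)
class_one = rng.normal(0.0, 1.0, size=(100, 20))
class_two = rng.normal(1.5, 1.0, size=(100, 20))

features = np.vstack([class_one, class_two])
labels = np.array([1] * 100 + [2] * 100)

# The RBF kernel plays the role of the inner-product function that
# implicitly maps the samples into a high-dimensional space.
classifier = SVC(kernel="rbf")
classifier.fit(features, labels)

# Fraction of class-two vectors that the trained SVM assigns to class 2.
ratio = float(np.mean(classifier.predict(class_two) == 2))
print(f"ratio classified into the second class: {ratio:.2f}")
```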
  • The controller 140 may be configured to divide the audio to be tested from the receiver 110 into a first portion and a second portion, and to select one sample audio from the sample database 120 and divide the selected sample audio into a first portion and a second portion. For example, the audio to be tested and the selected sample audio are both divided into two parts of equal length.
  • the controller 140 extracts a feature matrix for the audio to be tested and the selected sample audio by using the extraction method of the Mel Cepstrum Coefficient (MFCC).
  • The Mel frequency is based on the auditory characteristics of the human ear and is nonlinearly related to frequency in Hz.
  • The Mel-frequency cepstral coefficients (MFCCs) are spectral features computed by exploiting this relationship.
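  The disclosure later specifies a 20-coefficient vector per frame. As a sketch, the feature matrix could be computed with librosa (our choice of library; the patent does not prescribe one):

```python
import numpy as np
import librosa

def extract_feature_matrix(path: str) -> np.ndarray:
    """Return an (n_frames x 20) MFCC feature matrix for one recording."""
    signal, sample_rate = librosa.load(path, sr=None)  # keep native rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=20)
    return mfcc.T  # n rows (frames) x 20 columns, as in the disclosure
```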
  • the controller 140 determines whether the audio to be tested and the selected sample audio are from the same person by using a support vector machine.
  • Specifically, the feature matrix of the first part of the audio to be tested, as the first class of samples, and the feature matrix of the selected sample audio, as the second class of samples, may be input to the support vector machine 130 to train it, and the ratio a of the second part of the audio to be tested classified into the second class is calculated. Similarly, inputting the feature matrix of the first part of the selected sample audio as the first class and the feature matrix of the audio to be tested as the second class, and training the support vector machine 130, yields the ratio b of the second part of the selected sample audio classified into the second class; inputting the feature matrix of the second part of the audio to be tested as the first class and the feature matrix of the selected sample audio as the second class yields the ratio c of the first part of the audio to be tested classified into the second class; and inputting the feature matrix of the second part of the selected sample audio as the first class and the feature matrix of the audio to be tested as the second class yields the ratio d of the first part of the selected sample audio classified into the second class.
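  A sketch of one such training-and-ratio step, using scikit-learn's SVC in place of the support vector machine 130 (function and variable names are ours):

```python
import numpy as np
from sklearn.svm import SVC

def classified_ratio(first_class: np.ndarray,
                     second_class: np.ndarray,
                     held_out: np.ndarray) -> float:
    """Train an SVM on two classes of feature frames and return the
    fraction of held-out frames classified into the second class."""
    features = np.vstack([first_class, second_class])
    labels = np.concatenate([np.ones(len(first_class)),
                             np.full(len(second_class), 2)])
    svm = SVC(kernel="rbf")
    svm.fit(features, labels)
    return float(np.mean(svm.predict(held_out) == 2))

# With A1, A2 the two halves of the test audio and B1, B2 the two
# halves of the sample audio (A = A1 + A2, B = B1 + B2):
# a = classified_ratio(A1, B, A2)
# b = classified_ratio(B1, A, B2)
# c = classified_ratio(A2, B, A1)
# d = classified_ratio(B2, A, B1)
```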
  • The controller 140 may be further configured to preprocess the received audio to be tested: for example, pre-emphasizing the audio to be tested by pre-filtering it and compensating its high-frequency components; then framing the audio by an overlapping-segment framing method; then applying a Hamming window to eliminate the Gibbs effect; and finally distinguishing speech frames from non-speech frames and discarding the non-speech frames.
  • Since a sound signal changes continuously, to simplify this continuously varying signal it is assumed that the audio signal does not change over a short time scale; the signal is therefore grouped into units of multiple sampling points, each unit being called a "frame".
  • A frame is typically 20-40 milliseconds long. If the frame is shorter, there are not enough sampling points in each frame for a reliable spectrum calculation; if it is too long, the signal changes too much within each frame.
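  A sketch of this preprocessing chain (the pre-emphasis coefficient 0.97 and the 25 ms frame / 10 ms hop are conventional values of our choosing, not taken from the disclosure):

```python
import numpy as np

def preprocess(signal: np.ndarray, sample_rate: int,
               frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Pre-emphasize, cut into overlapping frames, apply a Hamming window."""
    # Pre-emphasis boosts the high-frequency components.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len

    frames = np.stack([emphasized[i * hop_len:i * hop_len + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)  # windowing tames the Gibbs effect
```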
  • FIG. 2 illustrates an operational logic diagram of a voiceprint recognition method in accordance with an example embodiment of the present disclosure.
  • First, the audio to be tested is received by the receiver. Then, in operation S05, the audio to be tested is preprocessed: for example, it is pre-filtered and high-frequency compensated; it is then framed by the overlapping-segment method; a Hamming window is then applied to eliminate the Gibbs effect; and speech frames are distinguished from non-speech frames, the non-speech frames being discarded.
  • The audio to be tested is then split into a first part and a second part.
  • sample audio may be selected from the sample database, and the selected sample audio is divided into a first portion and a second portion at operation S20.
  • Feature vectors for the respective portions of the audio to be tested and the selected sample audio are extracted by the MFCC extraction method, so that one or more of the feature vectors can be used in operation S30 to train the support vector machine.
  • At operation S35, it is determined whether the audio to be tested and the selected sample audio are from the same person.
  • FIG. 3 illustrates a flow chart of a voiceprint recognition method in accordance with an example embodiment of the present disclosure.
  • the audio A to be tested is received and the audio A to be tested is divided into a first part A1 and a second part A2.
  • a sample audio B is selected from the sample database and the selected sample audio B is divided into a first portion B1 and a second portion B2.
  • For example, the audio A to be tested can be divided at its midpoint into two parts A1 and A2 of equal length, while the sample audio B is likewise divided at its midpoint into two equal parts B1 and B2.
  • Alternatively, the audio to be tested and the selected sample audio may be divided in other ratios, for example dividing the audio to be tested into two parts in a 1:2 ratio and dividing the selected sample audio into two parts in a 2:3 ratio.
  • The method may further include preprocessing the audio to be tested, for example: pre-emphasizing the audio to be tested; framing it by an overlapping-segment framing method; applying a Hamming window to eliminate the Gibbs effect; and distinguishing speech frames from non-speech frames and discarding the non-speech frames.
  • In the preprocessing, a dedicated filter is first designed according to the frequency characteristics of the speech signal to filter it and compensate its high-frequency components; the overlapping-segment method is then used to divide the signal into frames; next, a window is applied to the signal to eliminate the Gibbs effect; finally, endpoint detection based on short-time energy and the short-time average zero-crossing rate is used to distinguish speech frames from non-speech frames, and the non-speech frames are discarded.
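  A sketch of such endpoint detection over the windowed frames (the two-threshold rule and its tuning are our simplification; practical detectors combine these features more carefully):

```python
import numpy as np

def keep_speech_frames(frames: np.ndarray,
                       energy_thresh: float,
                       zcr_thresh: float) -> np.ndarray:
    """Keep frames that look like speech: high short-time energy (voiced
    speech) or a high zero-crossing rate (unvoiced fricatives)."""
    energy = np.sum(frames ** 2, axis=1)
    # Zero-crossing rate: fraction of adjacent samples changing sign.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    is_speech = (energy > energy_thresh) | (zcr > zcr_thresh)
    return frames[is_speech]
```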
  • Feature matrices for the audio to be tested and the selected sample audio are extracted by the MFCC extraction method. That is, according to the MFCC extraction method, a vector of 1 row and 20 columns is extracted from each frame of each speaker's speech as its feature vector, so a speaker's n frames form a feature matrix of n rows and 20 columns.
  • In step S320, support vector machine training is performed with the feature matrix of the first part A1 of the audio to be tested as the first class of samples and the feature matrix of the selected sample audio B as the second class, and the ratio a of the second part A2 of the audio to be tested classified into the second class is calculated, in order to determine whether A2 belongs to the selected sample audio. In step S325, support vector machine training is performed with the feature matrix of the first part B1 of the selected sample audio as the first class and the feature matrix of the audio A to be tested as the second class, and the ratio b of the second part B2 of the selected sample audio classified into the second class is calculated. In step S330, support vector machine training is performed with the feature matrix of the second part A2 of the audio to be tested as the first class and the feature matrix of the selected sample audio B as the second class, and the ratio c of the first part A1 of the audio to be tested classified into the second class is calculated. In step S335, support vector machine training is performed with the feature matrix of the second part B2 of the selected sample audio as the first class and the feature matrix of the audio A to be tested as the second class, and the ratio d of the first part B1 of the selected sample audio classified into the second class is calculated.
  • In step S340, based on the calculated a, b, c, and d, the degree of matching between the audio to be tested and the selected sample audio is calculated, so as to determine whether the audio to be tested and the selected sample audio are from the same person.
  • For example, the average of a, b, c, and d can be calculated, and the ratio of this average to 0.5 can be taken as the degree of matching between the audio to be tested and the sample audio. In this case, if the audio to be tested and the selected sample audio belong to one person, the average should be close to 0.5; if they are not from the same person, the average should be close to zero.
  • The ratio of the average to 0.5 can thus be regarded as the degree of matching between the audio to be tested and the sample audio. From this matching degree it can be confirmed whether the matched result and the test sample are one person's voice, which prevents misjudgment.
  • Different thresholds may be set according to the requirements of different application environments to decide whether the audio to be tested and the sample audio are from the same person. For example, where security requirements are lower, the threshold can be set to a lower value, for example 70%: if the calculated matching degree is greater than or equal to 70%, the two recordings are judged to be from the same person; otherwise they are judged to be from different people. Where security requirements are higher (e.g. an access control system), the threshold can be set to a higher value, for example 95%. In this way the recognition strictness can be adjusted to the needs of the application, which makes the system more convenient to use.
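  A sketch of this thresholded decision, reusing the match_degree helper sketched earlier (both names are ours, not from the disclosure):

```python
def same_speaker(a: float, b: float, c: float, d: float,
                 threshold: float = 0.70) -> bool:
    """Judge whether two recordings come from the same person by
    comparing the matching degree with an application-chosen threshold."""
    return match_degree(a, b, c, d) >= threshold

# Lenient check vs. strict access-control check:
# same_speaker(a, b, c, d, threshold=0.70)
# same_speaker(a, b, c, d, threshold=0.95)
```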
  • By segmenting the audio to be matched and the sample audio and classifying the segments in several complementary ways, the voiceprint recognition method and system proposed by the present disclosure achieve highly fault-tolerant and efficient recognition under different small-sample conditions.
  • According to another aspect of the present disclosure, a computer system is provided, comprising: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the voiceprint recognition method described above.
  • a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the voiceprint recognition method as described above.
  • FIG. 5 schematically illustrates a block diagram of a computer system suitable for implementing a voiceprint recognition method in accordance with an embodiment of the present disclosure.
  • the computer system shown in FIG. 5 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • A computer system 500 in accordance with an embodiment of the present disclosure includes a processor 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503.
  • Processor 501 can include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor, and/or a related chipset and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)), and the like.
  • Processor 501 can also include an onboard memory for caching purposes.
  • the processor 501 may include a single processing unit or a plurality of processing units for performing different actions of the method flow according to the embodiments of the present disclosure described with reference to FIGS. 2 and 3.
  • In the RAM 503, various programs and data required for the operation of the system 500 are stored.
  • the processor 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • the processor 501 performs the various operations described above with reference to FIGS. 2 and 3 by executing programs in the ROM 502 and/or the RAM 503. It is noted that the program can also be stored in one or more memories other than ROM 502 and RAM 503.
  • the processor 501 can also perform the various operations described above with reference to FIGS. 2 and 3 by executing a program stored in the one or more memories.
  • In accordance with an embodiment of the present disclosure, system 500 may also include an input/output (I/O) interface 505, which is likewise coupled to the bus 504.
  • System 500 can also include one or more of the following components coupled to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage portion 508 including a hard disk or the like; and a communication portion 509 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 509 performs communication processing via a network such as the Internet.
  • A drive 510 is also coupled to the I/O interface 505 as needed.
  • a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage portion 508 as needed.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer readable storage medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via the communication portion 509, and/or installed from the removable medium 511.
  • the above-described functions defined in the system of the embodiments of the present disclosure are executed when the computer program is executed by the processor 501.
  • The systems, apparatuses, devices, modules, units, and the like described above may be implemented by computer program modules in accordance with an embodiment of the present disclosure.
  • the computer readable storage medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer readable signal medium may be any computer readable medium other than a computer readable storage medium, and can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable storage medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the foregoing.
  • Each block of the flowchart or block diagrams can represent a module, a program segment, or a portion of code that includes one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the foregoing method can be implemented in a form of executable program commands by a plurality of computer devices and recorded in a computer readable recording medium.
  • the computer readable recording medium may include a separate program command, a data file, a data structure, or a combination thereof.
  • program commands recorded in a recording medium may be specifically designed or configured for use in the present disclosure, or are known to those skilled in the art of computer software.
  • The computer readable recording medium includes magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as compact disc read-only memory (CD-ROM) and digital versatile discs (DVD); magneto-optical media such as magneto-optical floppy disks; and hardware devices, such as ROM, RAM, and flash memory, configured to store and execute program commands.
  • The program commands include machine-language code produced by a compiler as well as high-level language code that the computer can execute by means of an interpreter.
  • The foregoing hardware devices may be configured to operate as at least one software module to perform the operations of the present disclosure, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a voiceprint identification method and a voiceprint identification system. The method comprises the following steps: receiving audio to be tested and dividing the audio to be tested into a first part and a second part; selecting a sample audio and dividing the sample audio into a first part and a second part; extracting feature matrices for the audio to be tested and for the sample audio by using a Mel-frequency cepstral coefficient extraction method; performing support vector machine training using the feature matrix of the first part of the audio to be tested as the first type of sample and the feature matrix of the selected sample audio as the second type of sample, and calculating the degree of correspondence between the second part of the audio to be tested and the second type of sample; performing a similar process on the first part of the sample audio, on the first part of the audio to be tested, and on the second part of the sample audio, respectively calculating their degrees of correspondence with the audio to be tested, the selected sample audio, and the audio to be tested as the respective second types of samples; and determining, according to the degree of correspondence, whether the voices in the audio to be tested and in the sample audio come from the same person.
PCT/CN2017/106886 2016-11-22 2017-10-19 Voiceprint identification method and voiceprint identification system WO2018095167A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611035943.3 2016-11-22
CN201611035943.3A CN108091340B (zh) 2016-11-22 2016-11-22 Voiceprint recognition method, voiceprint recognition system and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2018095167A1 (fr)

Family

ID=62168704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/106886 WO2018095167A1 (fr) Voiceprint identification method and voiceprint identification system

Country Status (2)

Country Link
CN (1) CN108091340B (fr)
WO (1) WO2018095167A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109031961A (zh) * 2018-06-29 2018-12-18 百度在线网络技术(北京)有限公司 Method and apparatus for controlling an operation object
CN111489756A (zh) * 2020-03-31 2020-08-04 中国工商银行股份有限公司 Voiceprint recognition method and apparatus
CN115100776A (zh) * 2022-05-30 2022-09-23 厦门快商通科技股份有限公司 Access control authentication method, system and storage medium based on speech recognition

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108908377B (zh) * 2018-07-06 2020-06-23 达闼科技(北京)有限公司 Speaker recognition method, apparatus and robot
CN110889008B (zh) * 2018-09-10 2021-11-09 珠海格力电器股份有限公司 Music recommendation method, apparatus, computing device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239457A1 (en) * 2006-04-10 2007-10-11 Nokia Corporation Method, apparatus, mobile terminal and computer program product for utilizing speaker recognition in content management
CN102664011A (zh) * 2012-05-17 2012-09-12 吉林大学 Fast speaker recognition method
CN102737633A (zh) * 2012-06-21 2012-10-17 北京华信恒达软件技术有限公司 Speaker recognition method and device based on tensor subspace analysis
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 Voiceprint recognition method
CN103562993A (zh) * 2011-12-16 2014-02-05 华为技术有限公司 Speaker recognition method and device
CN104464756A (zh) * 2014-12-10 2015-03-25 黑龙江真美广播通讯器材有限公司 Small speaker emotion recognition system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001318692A (ja) * 2000-05-11 2001-11-16 Yasutaka Sakamoto Person identification system based on speech recognition
CN101562012B (zh) * 2008-04-16 2011-07-20 创而新(中国)科技有限公司 Speech grading assessment method and system
US20160365096A1 (en) * 2014-03-28 2016-12-15 Intel Corporation Training classifiers using selected cohort sample subsets
CN104485102A (zh) * 2014-12-23 2015-04-01 智慧眼(湖南)科技发展有限公司 Voiceprint recognition method and apparatus
CN105244026B (zh) * 2015-08-24 2019-09-20 北京意匠文枢科技有限公司 Speech processing method and apparatus
CN105244031A (zh) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker recognition method and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239457A1 (en) * 2006-04-10 2007-10-11 Nokia Corporation Method, apparatus, mobile terminal and computer program product for utilizing speaker recognition in content management
CN103562993A (zh) * 2011-12-16 2014-02-05 华为技术有限公司 Speaker recognition method and device
CN102664011A (zh) * 2012-05-17 2012-09-12 吉林大学 Fast speaker recognition method
CN102737633A (zh) * 2012-06-21 2012-10-17 北京华信恒达软件技术有限公司 Speaker recognition method and device based on tensor subspace analysis
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 Voiceprint recognition method
CN104464756A (zh) * 2014-12-10 2015-03-25 黑龙江真美广播通讯器材有限公司 Small speaker emotion recognition system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109031961A (zh) * 2018-06-29 2018-12-18 百度在线网络技术(北京)有限公司 Method and apparatus for controlling an operation object
CN111489756A (zh) * 2020-03-31 2020-08-04 中国工商银行股份有限公司 Voiceprint recognition method and apparatus
CN115100776A (zh) * 2022-05-30 2022-09-23 厦门快商通科技股份有限公司 Access control authentication method, system and storage medium based on speech recognition
CN115100776B (zh) * 2022-05-30 2023-12-26 厦门快商通科技股份有限公司 Access control authentication method, system and storage medium based on speech recognition

Also Published As

Publication number Publication date
CN108091340A (zh) 2018-05-29
CN108091340B (zh) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2021128741A1 (fr) Method and apparatus for analyzing emotion fluctuation in voice, computer device, and storage medium
Boles et al. Voice biometrics: Deep learning-based voiceprint authentication system
WO2018095167A1 (fr) Voiceprint identification method and voiceprint identification system
US9536547B2 (en) Speaker change detection device and speaker change detection method
Ahmad et al. A unique approach in text independent speaker recognition using MFCC feature sets and probabilistic neural network
CN108922541B (zh) 基于dtw和gmm模型的多维特征参数声纹识别方法
US20170154640A1 (en) Method and electronic device for voice recognition based on dynamic voice model selection
WO2020034628A1 (fr) Accent identification method and device, computer device, and storage medium
JP2019211749A (ja) 音声の始点及び終点の検出方法、装置、コンピュータ設備及びプログラム
Vyas A Gaussian mixture model based speech recognition system using Matlab
CN108335699A (zh) 一种基于动态时间规整和语音活动检测的声纹识别方法
Archana et al. Gender identification and performance analysis of speech signals
CN109215634A (zh) 一种多词语音控制通断装置的方法及其系统
Pao et al. Combining acoustic features for improved emotion recognition in mandarin speech
CN110570870A (zh) 一种文本无关的声纹识别方法、装置及设备
Ramgire et al. A survey on speaker recognition with various feature extraction and classification techniques
Sapijaszko et al. An overview of recent window based feature extraction algorithms for speaker recognition
GB2576960A (en) Speaker recognition
CN109065026A (zh) 一种录音控制方法及装置
Krishna et al. Emotion recognition using dynamic time warping technique for isolated words
CN111429919A (zh) 基于会议实录系统的防串音方法、电子装置及存储介质
Budiga et al. CNN trained speaker recognition system in electric vehicles
Komlen et al. Text independent speaker recognition using LBG vector quantization
Tahliramani et al. Performance analysis of speaker identification system with and without spoofing attack of voice conversion
Estrebou et al. Voice recognition based on probabilistic SOM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17874980

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 04.09.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17874980

Country of ref document: EP

Kind code of ref document: A1