WO2021051505A1 - Sample-size-based voiceprint clustering method, apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2021051505A1
WO2021051505A1 (PCT/CN2019/116474)
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
clustering
clustered
sample set
interval
Prior art date
Application number
PCT/CN2019/116474
Other languages
English (en)
French (fr)
Inventor
冯晨
王健宗
彭俊清
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021051505A1 publication Critical patent/WO2021051505A1/zh


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of data processing, and in particular to a method, device, equipment, and computer-readable storage medium for voiceprint clustering based on sample size.
  • Voiceprint clustering refers to using a clustering algorithm to determine, from multiple unlabeled voiceprint samples, how many independent users provided those samples; the unlabeled voiceprint samples are clustered according to their features.
  • Existing voiceprint clustering methods all directly apply a clustering algorithm to the voiceprint sample set to be clustered, so that when a sample set with a large sample size is clustered, the calculation is not only time-consuming but the clustering effect is also not ideal. Therefore, solving the technical problem of the low clustering efficiency of existing voiceprint clustering methods is an urgent issue.
  • The main purpose of this application is to provide a sample-size-based voiceprint clustering method, device, equipment, and computer-readable storage medium, aiming to solve the technical problem of the low clustering efficiency of existing voiceprint clustering methods.
  • the present application provides a voiceprint clustering method based on sample size.
  • The sample-size-based voiceprint clustering method is applied to a sample-size-based voiceprint clustering system; the clustering system includes a cache module, a storage module, and a processor.
  • the voiceprint clustering method based on sample size includes the following steps:
  • The processor receives the voiceprint sample set to be clustered sent by the user terminal, stores the voiceprint sample set to be clustered in the cache module, and, according to the number of samples in the voiceprint sample set to be clustered, determines whether the voiceprint sample set to be clustered is a large sample set whose sample size exceeds a preset sample size threshold;
  • If the processor determines that the voiceprint sample set to be clustered is the large sample set, the voiceprint clustering model corresponding to the large sample set is determined in the storage module as the target voiceprint clustering model;
  • The processor inputs the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clusters the voiceprint sample set to be clustered based on the trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
  • the present application also provides a voiceprint clustering device based on sample size, and the voiceprint clustering device based on sample size includes:
  • The sample size determination module is used for the processor to receive the voiceprint sample set to be clustered sent by the user terminal, store the voiceprint sample set to be clustered in the cache module, and, according to the number of samples in the voiceprint sample set to be clustered, determine whether the voiceprint sample set to be clustered is a large sample set whose sample size exceeds a preset sample size threshold;
  • The clustering model determination module is configured to, if the processor determines that the voiceprint sample set to be clustered is the large sample set, determine the voiceprint clustering model corresponding to the large sample set in the storage module as the target voiceprint clustering model;
  • The first partition clustering module is used for the processor to input the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, cluster the voiceprint sample set to be clustered based on the trained partition clustering algorithm, and output the clustering result of the voiceprint samples to be clustered.
  • the present application also provides a sample size-based voiceprint clustering device.
  • The sample-size-based voiceprint clustering device includes a processor, a memory, and computer-readable instructions for sample-size-based voiceprint clustering that are stored on the memory and executable by the processor; when the computer-readable instructions for sample-size-based voiceprint clustering are executed by the processor, the steps of the above-mentioned sample-size-based voiceprint clustering method are implemented.
  • In addition, the present application also provides a computer-readable storage medium on which computer-readable instructions for sample-size-based voiceprint clustering are stored; when the computer-readable instructions for sample-size-based voiceprint clustering are executed by a processor, the steps of the sample-size-based voiceprint clustering method as described above are realized.
  • This application provides a voiceprint clustering method based on sample size: the processor receives the voiceprint sample set to be clustered sent by the user terminal, stores it in the cache module, and, according to the number of samples in the set, determines whether it is a large sample set whose sample size exceeds a preset sample size threshold; if the processor determines that the voiceprint sample set to be clustered is the large sample set, the voiceprint clustering model corresponding to the large sample set is determined in the storage module as the target voiceprint clustering model; the processor then inputs the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clusters it based on the trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
  • In this way, this application adopts different clustering models for sample sets of different scales, shortens the clustering time of a voiceprint sample set with a large sample size through partition clustering, improves the clustering effect, and solves the technical problem of the low clustering efficiency of existing voiceprint clustering methods.
  • FIG. 1 is a schematic diagram of the hardware structure of the voiceprint clustering device based on sample size involved in the solution of the embodiment of the application;
  • FIG. 2 is a schematic flowchart of a first embodiment of a voiceprint clustering method based on sample size according to this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of a voiceprint clustering method based on sample size according to this application.
  • the sample size-based voiceprint clustering method involved in the embodiments of this application is mainly applied to a sample size-based voiceprint clustering device.
  • The sample-size-based voiceprint clustering device may be a device with display and processing functions, such as a PC, a portable computer, or a mobile terminal.
  • FIG. 1 is a schematic diagram of the hardware structure of the voiceprint clustering device based on sample size involved in the solution of the embodiment of the application.
  • the voiceprint clustering device based on sample size may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to realize the connection and communication between these components;
  • the user interface 1003 may include a display (Display), an input unit such as a keyboard (Keyboard);
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface);
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory;
  • the memory 1005 may optionally be a storage device independent of the aforementioned processor 1001.
  • Those skilled in the art can understand that the structure shown in FIG. 1 does not constitute a limitation on the sample-size-based voiceprint clustering device, which may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • the memory 1005 in FIG. 1 as a computer-readable storage medium may include an operating system, a network communication module, and computer-readable instructions for voiceprint clustering based on sample size.
  • Among them, the network communication module is mainly used to connect to the server and perform data communication with the server; the processor 1001 can call the computer-readable instructions for sample-size-based voiceprint clustering stored in the memory 1005 and execute the sample-size-based voiceprint clustering method provided by the embodiments of the present application.
  • the embodiment of the present application provides a voiceprint clustering method based on sample size.
  • FIG. 2 is a schematic flowchart of a first embodiment of a voiceprint clustering method based on sample size in this application.
  • the sample size-based voiceprint clustering method is applied to the sample size-based voiceprint clustering system.
  • the voiceprint clustering system includes a cache module, a storage module, and a processor.
  • The sample-size-based voiceprint clustering method includes the following steps:
  • Step S10: The processor receives the voiceprint sample set to be clustered sent by the user terminal, stores the voiceprint sample set to be clustered in the cache module, and, according to the number of samples in the voiceprint sample set to be clustered, determines whether the voiceprint sample set to be clustered is a large sample set whose sample size exceeds a preset sample size threshold;
  • For a sample set with a small sample size, the clustering result can be obtained in a relatively short time with ordinary computing resources and configuration, whereas for a sample set with a large sample size, the clustering calculation will take a long time.
  • Therefore, the voiceprint clustering method provided in this embodiment adopts different clustering models for the clustering calculation according to the different sample sizes of the voiceprint sample sets to be clustered.
  • After the processor receives the voiceprint sample set to be clustered sent by the user through the user terminal, the processor first stores the voiceprint sample set to be clustered in the cache module, so as to subsequently call the corresponding clustering model to cluster the voiceprint sample set to be clustered.
  • Specifically, a threshold for the number of voiceprint samples is preset; the number of samples corresponding to the voiceprint sample set to be clustered in the cache module is then obtained and compared with the preset sample size threshold, so as to determine whether the voiceprint sample set to be clustered is a large sample set whose sample size exceeds the sample size threshold.
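As a minimal illustration (not part of the patent text; the module names and threshold value are hypothetical), the size check and model dispatch described above can be sketched as:

```python
# Hypothetical sketch of step S10: route a sample set to the clustering model
# trained for its size class. Names and the threshold are illustrative only.
SAMPLE_SIZE_THRESHOLD = 10_000  # preset sample size threshold (assumed value)

def select_target_model(samples, models):
    """Return the clustering model matching the sample-size class."""
    is_large = len(samples) > SAMPLE_SIZE_THRESHOLD
    return models["large"] if is_large else models["small"]

# Usage: models maps a size class to a clustering model object
models = {"large": "parallel_partition_model", "small": "agglomerative_model"}
print(select_target_model(range(50_000), models))  # parallel_partition_model
print(select_target_model(range(500), models))     # agglomerative_model
```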
  • Step S20: If the processor determines that the voiceprint sample set to be clustered is the large sample set, determine the voiceprint clustering model corresponding to the large sample set in the storage module as the target voiceprint clustering model;
  • If the number of voiceprint samples in the voiceprint sample set to be clustered does not exceed the threshold, the voiceprint clustering model corresponding to the small sample set is used; if the number of voiceprint samples in the voiceprint sample set to be clustered is greater than the threshold, that is, the voiceprint sample set to be clustered is a large sample set, the voiceprint clustering model corresponding to the large sample set is used as the target voiceprint clustering model.
  • The voiceprint clustering model corresponding to the large sample set additionally incorporates a parallel model, and the Map and Reduce (Map: mapping, Reduce: reduction) methods are used in the calculation process of the clustering algorithm.
  • Step S30: The processor inputs the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clusters the voiceprint sample set to be clustered based on the trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
  • That is, the processor inputs the voiceprint sample set to be clustered in the cache module into the corresponding target voiceprint clustering model for clustering, so that the target voiceprint clustering model clusters the voiceprint sample set to be clustered based on the trained partition clustering algorithm, obtains the voiceprint providers corresponding to the voiceprint sample set to be clustered, and outputs the clustering result of the voiceprint samples to be clustered.
  • This embodiment provides a voiceprint clustering method based on sample size.
  • This application adopts different clustering models for sample sets of different scales, shortens the clustering time of a voiceprint sample set with a large sample size through partition clustering, improves the clustering effect, and solves the technical problem of the low clustering efficiency of existing voiceprint clustering methods.
  • FIG. 3 is a schematic flowchart of a second embodiment of a voiceprint clustering method based on sample size according to the present application.
  • the method further includes:
  • Step S40: Perform data preprocessing and feature extraction on the voiceprint sample set to be clustered, and extract the voiceprint sample features of the voiceprint sample set to be clustered: the MFCC, the first-order difference of the MFCC, and the second-order difference of the MFCC;
  • Step S50: Based on a preset method and the MFCC, the first-order difference of the MFCC, and the second-order difference of the MFCC of the voiceprint sample set to be clustered, the processor determines the voiceprint vector (I-vector) corresponding to each piece of voiceprint data in the voiceprint sample set to be clustered, and stores the voiceprint vector I-vector corresponding to each piece of voiceprint data in the storage module.
  • A voiceprint sample has multiple features, so data preprocessing and feature extraction are performed on the voiceprint sample set to be clustered.
  • In this embodiment, the selected voiceprint sample features are the MFCC (Mel Frequency Cepstral Coefficients), the first-order difference of the MFCC, and the second-order difference of the MFCC, which serve as the input of the clustering model; the processed voiceprint feature data is then fed to the GMM+UBM+JFA method to obtain the I-vector corresponding to each piece of voiceprint data.
  • the I-vector is a low-dimensional fixed-length vector containing only speaker spatial information.
  • The GMM+UBM+JFA method refers to the Joint Factor Analysis (JFA) method based on the GMM (Gaussian Mixture Model)-UBM (Universal Background Model).
  • GMM: Gaussian Mixture Model
  • UBM: Universal Background Model
  • JFA: Joint Factor Analysis
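The feature step above stacks the MFCC with its first- and second-order differences. A minimal sketch (not from the patent; the simple frame-to-frame difference is an illustrative stand-in, as speech toolkits often use a regression-based delta instead):

```python
import numpy as np

# Given an MFCC matrix (coefficients x frames), append its first- and
# second-order differences to form the clustering-model input.
def add_deltas(mfcc: np.ndarray) -> np.ndarray:
    """Stack MFCC, its first-order difference, and its second-order difference."""
    d1 = np.diff(mfcc, n=1, axis=1, prepend=mfcc[:, :1])  # first-order difference
    d2 = np.diff(d1, n=1, axis=1, prepend=d1[:, :1])      # second-order difference
    return np.vstack([mfcc, d1, d2])

mfcc = np.random.rand(13, 100)  # e.g. 13 coefficients over 100 frames
features = add_deltas(mfcc)
print(features.shape)           # (39, 100)
```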
  • the method further includes:
  • If the voiceprint sample set to be clustered is a small sample set whose sample size does not exceed the sample size threshold, the voiceprint clustering model corresponding to the small sample set is determined in the storage module as the target voiceprint clustering model;
  • the processor inputs the voiceprint vector I-vector corresponding to each voiceprint data corresponding to the voiceprint sample set to be clustered in the storage module to the target voiceprint clustering model;
  • the target voiceprint clustering model divides each voiceprint data in the voiceprint sample set to be clustered into intervals of equal length based on the voiceprint vector I-vector corresponding to each voiceprint data;
  • The target voiceprint clustering model obtains the pairwise class spacing and PLDA score of each piece of voiceprint data in each interval, clusters each piece of voiceprint data in each interval based on the class spacing and the PLDA score, and outputs the clustering result corresponding to the voiceprint sample set to be clustered.
  • The step in which the target voiceprint clustering model obtains the pairwise class spacing and PLDA score of each piece of voiceprint data in each interval, clusters each piece of voiceprint data in each interval based on the class spacing and the PLDA score, and outputs the clustering result corresponding to the voiceprint sample set to be clustered specifically includes:
  • the target voiceprint clustering model records each piece of voiceprint data in each interval as one category, as the initial category;
  • Clustering is performed based on the cluster spacing corresponding to each piece of voiceprint data, and the clustering result corresponding to the voiceprint sample set to be clustered is output.
  • In this embodiment, the selected vector length is 600. If it is determined in the above step S10 that the voiceprint clustering model is the clustering model corresponding to a small sample set, the I-vector corresponding to each piece of voiceprint data in the feature-processed voiceprint sample set to be clustered is input to the first clustering model, and the following steps are performed:
  • Step a: The first clustering model first divides each dimension of the 600-dimensional I-vector into k intervals of equal length: [a1,b1), [a2,b2), ... [ak,bk); where the value of k can be 10% of the total number of voiceprint samples in the voiceprint sample set to be clustered;
  • Step b: In each interval, regard each voiceprint sample as one class and record it as the initial class. The class distance is defined as 1 minus the standardized PLDA score of the two samples representing the two classes, and the class distance is thus obtained.
  • Step c: Automatic clustering is performed based on the class spacing.
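Steps b and c can be sketched as follows. This is an illustrative sketch only: the PLDA scorer below is a stand-in (cosine similarity rescaled to [0, 1]; a real system would use a trained PLDA backend), and the single-linkage merge via union-find stands in for iteratively merging the closest classes:

```python
import numpy as np

def pseudo_plda_score(x, y):
    """Hypothetical standardized similarity in [0, 1] (stand-in for PLDA)."""
    cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return (cos + 1.0) / 2.0

def cluster_small_set(ivectors, distance_threshold=0.3):
    """Merge classes whose class distance (1 - score) is below the threshold."""
    n = len(ivectors)
    parent = list(range(n))  # each sample starts as its own class (step b)

    def find(i):  # union-find root with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if 1.0 - pseudo_plda_score(ivectors[i], ivectors[j]) < distance_threshold:
                parent[find(i)] = find(j)  # merge close classes (step c)
    return [find(i) for i in range(n)]

rng = np.random.default_rng(0)
# Two well-separated groups of toy "I-vectors" (600-dimensional in the
# patent; 5-dimensional here for brevity)
group_a = rng.normal(0.0, 0.05, (3, 5)) + np.array([1.0, 0, 0, 0, 0])
group_b = rng.normal(0.0, 0.05, (3, 5)) + np.array([0, 1.0, 0, 0, 0])
labels = cluster_small_set(np.vstack([group_a, group_b]))
print(labels)  # first three samples share one label, last three another
```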
  • step S30 specifically includes:
  • the processor inputs the voiceprint vector I-vector corresponding to each voiceprint data corresponding to the voiceprint sample set to be clustered in the storage module to the target voiceprint clustering model;
  • the target voiceprint clustering model divides each voiceprint data in the voiceprint sample set to be clustered into intervals of equal length based on the voiceprint vector I-vector corresponding to each voiceprint data;
  • the target voiceprint clustering model uses a clustering algorithm to cluster each piece of voiceprint data in each interval in parallel, and outputs the clustering results of the voiceprint samples to be clustered.
  • After the target voiceprint clustering model divides each piece of voiceprint data in the voiceprint sample set to be clustered into intervals of equal length based on the voiceprint vector I-vector corresponding to each piece of voiceprint data, the method also includes:
  • The target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset number threshold, wherein if the number of voiceprint samples falling in the current interval is greater than the number threshold, the current interval is a dense interval; if not, the current interval is a sparse interval;
  • The target voiceprint clustering model obtains the interval density of the sparse interval adjacent to a dense interval, and compares that interval density with a preset density threshold to determine whether the sparse interval adjacent to the dense interval is a dense part of the sparse interval;
  • If the target voiceprint clustering model determines that the sparse interval adjacent to the dense interval is a dense part of the sparse interval, the sparse interval adjacent to the dense interval is merged into the dense interval;
  • the target voiceprint clustering model merges adjacent dense intervals, and updates the dense and sparse intervals corresponding to the voiceprint sample set to be clustered.
  • The step in which the target voiceprint clustering model clusters each piece of voiceprint data in each interval in parallel through a clustering algorithm and outputs the clustering result of the voiceprint samples to be clustered specifically includes:
  • The target voiceprint clustering model performs parallel local clustering in each dense interval and each sparse interval through the CURE algorithm;
  • the target voiceprint clustering model performs clustering processing on the voiceprint data after parallel local clustering through the Map function and the Reduce function, and merges and outputs the clustering results of each interval.
  • If it is determined in the above step S10 that the voiceprint clustering model is the clustering model corresponding to a large sample set, the I-vector obtained after feature processing of each piece of voiceprint data in the voiceprint sample set to be clustered is input to the clustering model corresponding to the large sample set, and the following steps are performed:
  • Step a: The first clustering model first divides each dimension of the 600-dimensional I-vector into k intervals of equal length: [a1,b1), [a2,b2), ... [ak,bk); the value of k can be 10% of the total number of voiceprint samples in the voiceprint sample set to be clustered. This step is the same as the corresponding step of the clustering model for the small sample set above: dividing the data interval into grids shortens the clustering time and improves the clustering effect.
  • Step d: Judge dense intervals and sparse intervals using a threshold. In this embodiment, the threshold is set to 60% of the total number of voiceprint samples; when the number of voiceprint samples falling in the current interval is greater than the set threshold, the interval is a dense interval, otherwise it is a sparse interval;
  • Step e: Update the dense intervals. If the density of the half-length sub-interval [ai+d, ai+3d/2) of the sparse interval adjacent to the dense interval [ai,bi) is greater than 0.5 × the density threshold, where d is the length of an interval, mark that sub-interval as the dense part of the sparse interval, merge it into the dense interval, and update the dense interval to [ai, bi+d/2); if that density is ≤ 0.5 × the density threshold, do not do any processing;
  • Step f: Process the adjacent intervals of all dense intervals in each dimension, and merge adjacent dense intervals;
  • Step g: Clustering is performed by a clustering algorithm within each grid unit. For example, the CURE algorithm is used for local clustering: when the grid reaches the set size, the samples falling into the same grid are similar to one another, while sample points in different grids are not, and the distance between samples of different grids is greater than the distance between samples of the same grid. Clustering is therefore performed first within the sample sets with small distances, which improves the clustering efficiency.
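The interval classification and merging of steps d and f can be sketched along one I-vector dimension as follows (an illustrative sketch; the threshold values are assumptions, and the half-interval absorption of step e is omitted for brevity):

```python
import numpy as np

# Bucket samples into k equal intervals along one dimension, mark intervals
# dense or sparse by a count threshold (step d), and merge runs of adjacent
# dense intervals (step f).
def dense_intervals(values, k, count_threshold):
    lo, hi = values.min(), values.max()
    edges = np.linspace(lo, hi, k + 1)
    counts, _ = np.histogram(values, bins=edges)
    is_dense = counts > count_threshold            # step d: dense vs sparse
    merged, start = [], None
    for idx, dense in enumerate(is_dense):         # step f: merge adjacent
        if dense and start is None:
            start = idx
        elif not dense and start is not None:
            merged.append((edges[start], edges[idx]))
            start = None
    if start is not None:
        merged.append((edges[start], edges[k]))
    return merged

# Two dense regions (around 0 and around 5) separated by a sparse gap
values = np.concatenate([np.random.default_rng(1).normal(0, 0.2, 500),
                         np.random.default_rng(2).normal(5, 0.2, 500)])
print(dense_intervals(values, k=20, count_threshold=50))
```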
  • the calculation task is divided into two stages: Map and Reduce.
  • The Map function runs on multiple nodes, each processing one or more local data partitions; the Reduce function processes the outputs of the Map function, and these intermediate results can also be processed in parallel. All the outputs of Reduce are combined to obtain the results of all the partitions.
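The Map/Reduce stage described above can be sketched as follows. This is an illustrative sketch only: a thread pool stands in for multiple nodes, and the "local clustering" (grouping values by integer part) is a toy stand-in for the CURE-based local clustering:

```python
from multiprocessing.dummy import Pool  # thread pool; a real system runs Map on many nodes

def map_local_cluster(partition):
    """Map stage: locally cluster one grid partition (toy stand-in)."""
    clusters = {}
    for v in partition:
        clusters.setdefault(int(v), []).append(v)
    return clusters

def reduce_merge(results):
    """Reduce stage: merge per-partition clusters into the global result."""
    merged = {}
    for clusters in results:
        for key, members in clusters.items():
            merged.setdefault(key, []).extend(members)
    return merged

partitions = [[0.1, 0.2, 1.5], [1.6, 2.3], [0.4, 2.9]]
with Pool(3) as pool:
    local = pool.map(map_local_cluster, partitions)  # Map stage, in parallel
result = reduce_merge(local)                         # Reduce stage
print(sorted(result))  # [0, 1, 2]
```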
  • the embodiment of the present application also provides a voiceprint clustering device based on sample size.
  • the voiceprint clustering device based on sample size includes:
  • The sample size determination module is used for the processor to receive the voiceprint sample set to be clustered sent by the user terminal, store the voiceprint sample set to be clustered in the cache module, and, according to the number of samples in the voiceprint sample set to be clustered, determine whether the voiceprint sample set to be clustered is a large sample set whose sample size exceeds a preset sample size threshold;
  • The first model determination module is configured to, if the processor determines that the voiceprint sample set to be clustered is the large sample set, determine the voiceprint clustering model corresponding to the large sample set in the storage module as the target voiceprint clustering model;
  • the voiceprint clustering device based on sample size further includes:
  • The sample feature extraction module is used to perform data preprocessing and feature extraction on the voiceprint sample set to be clustered, and extract the voiceprint sample features of the voiceprint sample set to be clustered: the MFCC, the first-order difference of the MFCC, and the second-order difference of the MFCC;
  • The voiceprint vector determination module is used for the processor to determine, based on a preset method and the MFCC, the first-order difference of the MFCC, and the second-order difference of the MFCC of the voiceprint sample set to be clustered, the voiceprint vector I-vector corresponding to each piece of voiceprint data in the voiceprint sample set to be clustered, and to store the voiceprint vector I-vector corresponding to each piece of voiceprint data in the storage module.
  • the first partition clustering module specifically includes:
  • The voiceprint vector input unit is used for the processor to input the voiceprint vector I-vector corresponding to each piece of voiceprint data in the voiceprint sample set to be clustered in the storage module into the target voiceprint clustering model;
  • the partition parallel clustering unit is used for the target voiceprint clustering model to perform parallel clustering of each voiceprint data in each interval through a clustering algorithm, and output the clustering result of the voiceprint sample to be clustered.
  • first partition clustering module is also used for:
  • The target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset number threshold, wherein if the number of voiceprint samples falling in the current interval is greater than the number threshold, the current interval is a dense interval; if not, the current interval is a sparse interval;
  • The target voiceprint clustering model obtains the interval density of the sparse interval adjacent to a dense interval, and compares that interval density with a preset density threshold to determine whether the sparse interval adjacent to the dense interval is a dense part of the sparse interval;
  • If the target voiceprint clustering model determines that the sparse interval adjacent to the dense interval is a dense part of the sparse interval, the sparse interval adjacent to the dense interval is merged into the dense interval;
  • the target voiceprint clustering model merges adjacent dense intervals, and updates the dense and sparse intervals corresponding to the voiceprint sample set to be clustered.
  • first partition clustering module is also used for:
  • the target voiceprint clustering model performs clustering processing on the voiceprint data after parallel local clustering through the Map function and the Reduce function, and merges and outputs the clustering results of each interval.
  • the voiceprint clustering device based on sample size further includes:
  • The second model determination module is configured to, if the processor determines that the voiceprint sample set to be clustered is a small sample set whose sample size does not exceed the sample size threshold, determine the voiceprint clustering model corresponding to the small sample set in the storage module as the target voiceprint clustering model;
  • The sample data input module is used for the processor to input the voiceprint vector I-vector corresponding to each piece of voiceprint data in the voiceprint sample set to be clustered in the storage module into the target voiceprint clustering model;
  • The voiceprint data partition module is used for the target voiceprint clustering model to divide each piece of voiceprint data in the voiceprint sample set to be clustered into intervals of equal length based on the voiceprint vector I-vector corresponding to each piece of voiceprint data;
  • The first partition clustering module is used for the target voiceprint clustering model to obtain the pairwise class spacing and PLDA score of each piece of voiceprint data in each interval, cluster each piece of voiceprint data in each interval based on the class spacing and the PLDA score, and output the clustering result corresponding to the voiceprint sample set to be clustered.
  • first partition clustering module is also used for:
  • the target voiceprint clustering model records each piece of voiceprint data in each interval as one category, as the initial category;
  • Clustering is performed based on the cluster spacing corresponding to each piece of voiceprint data, and the clustering result corresponding to the voiceprint sample set to be clustered is output.
  • each module in the voiceprint clustering device based on sample size corresponds to each step in the above embodiment of the voiceprint clustering method based on sample size, and its functions and implementation processes will not be repeated here.
  • the embodiments of the present application also provide a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
  • the computer-readable storage medium of the present application stores computer-readable instructions for voiceprint clustering based on sample size.
  • the computer-readable instructions for voiceprint clustering based on sample size are executed by a processor, the sample-based clustering computer readable instruction is implemented as described above. The steps of a quantitative voiceprint clustering method.
  • the method implemented when the computer readable instruction of the sample size-based voiceprint clustering is executed can refer to the various embodiments of the sample size-based voiceprint clustering method of the present application, which will not be repeated here.
  • the technical solution of this application essentially or the part that contributes to the existing technology can be embodied in the form of a software product, and the computer software product is stored in a storage medium (such as ROM/RAM) as described above. , Magnetic disks, optical disks), including several instructions to make a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the method described in each embodiment of the present application.
  • a terminal device which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A sample-size-based voiceprint clustering method, apparatus, device, and storage medium. The method includes: a processor stores a voiceprint sample set to be clustered in a cache module and determines whether the set is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold; if so, the voiceprint clustering model corresponding to the large-sample-size sample set is determined in a storage module as the target voiceprint clustering model; the voiceprint sample set to be clustered is then clustered on the basis of a trained partition clustering algorithm in the target voiceprint clustering model, and the clustering result of the voiceprint samples to be clustered is output. The method applies different clustering models to sample sets of different scales and, through partition clustering, shortens the clustering time of large voiceprint sample sets and improves the clustering effect.

Description

Sample-size-based voiceprint clustering method, apparatus, device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 18, 2019 under application number 201910880452.6 and entitled "Sample-size-based voiceprint clustering method, apparatus, device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing, and in particular to a sample-size-based voiceprint clustering method, apparatus, device, and computer-readable storage medium.
Background
Clustering is an important unsupervised machine-learning method for data analysis. Voiceprint clustering refers to using a clustering algorithm to determine, from multiple unlabeled voiceprint samples, how many independent users provided those samples, that is, clustering multiple unlabeled voiceprint samples by their features. Existing voiceprint clustering methods all apply a clustering algorithm directly to the voiceprint sample set to be clustered, so that when a sample set with a large sample size is clustered, the computation is not only time-consuming but the clustering effect is also unsatisfactory. How to solve the technical problem of the low clustering efficiency of existing voiceprint clustering methods is therefore a problem in urgent need of a solution.
Summary
The main purpose of this application is to provide a sample-size-based voiceprint clustering method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem of the low clustering efficiency of existing voiceprint clustering methods.
To achieve the above purpose, this application provides a sample-size-based voiceprint clustering method. The method is applied to a sample-size-based voiceprint clustering system comprising a cache module, a storage module, and a processor, and comprises the following steps:
upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the voiceprint sample set to be clustered in the cache module and, according to the number of samples in the set, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold;
if the processor determines that the voiceprint sample set to be clustered is the large-sample-size sample set, the voiceprint clustering model corresponding to the large-sample-size sample set is determined in the storage module as the target voiceprint clustering model;
the processor inputs the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clusters the set on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
In addition, to achieve the above purpose, this application also provides a sample-size-based voiceprint clustering apparatus, comprising:
a sample-size determination module, configured so that upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the set in the cache module and, according to its number of samples, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold;
a clustering model determination module, configured so that if the processor determines that the voiceprint sample set to be clustered is the large-sample-size sample set, the voiceprint clustering model corresponding to the large-sample-size sample set is determined in the storage module as the target voiceprint clustering model;
a first partition clustering module, configured so that the processor inputs the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clusters the set on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
In addition, to achieve the above purpose, this application also provides a sample-size-based voiceprint clustering device, comprising a processor, a memory, and computer-readable instructions for sample-size-based voiceprint clustering that are stored in the memory and executable by the processor, wherein the instructions, when executed by the processor, implement the steps of the sample-size-based voiceprint clustering method described above.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium storing computer-readable instructions for sample-size-based voiceprint clustering, wherein the instructions, when executed by a processor, implement the steps of the sample-size-based voiceprint clustering method described above.
This application provides a sample-size-based voiceprint clustering method in which, upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the set in the cache module and determines from its number of samples whether it is a large-sample-size sample set exceeding a preset sample-size threshold; if so, the voiceprint clustering model corresponding to the large-sample-size sample set is determined in the storage module as the target voiceprint clustering model; the processor then inputs the set from the cache module into the target model, clusters it on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered. In this way, this application applies different clustering models to sample sets of different scales and, through partition clustering, shortens the clustering time of large voiceprint sample sets, improves the clustering effect, and solves the technical problem of the low clustering efficiency of existing voiceprint clustering methods.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the hardware structure of the sample-size-based voiceprint clustering device involved in the embodiments of this application;
Fig. 2 is a schematic flowchart of a first embodiment of the sample-size-based voiceprint clustering method of this application;
Fig. 3 is a schematic flowchart of a second embodiment of the sample-size-based voiceprint clustering method of this application.
The realization of the purpose, functional features, and advantages of this application will be further explained with reference to the accompanying drawings in combination with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are intended only to explain this application and are not intended to limit it.
The sample-size-based voiceprint clustering method involved in the embodiments of this application is mainly applied to a sample-size-based voiceprint clustering device, which may be a device with display and processing functions such as a PC, a portable computer, or a mobile terminal.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the hardware structure of the sample-size-based voiceprint clustering device involved in the embodiments of this application. In the embodiments of this application, the device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 realizes connection and communication among these components; the user interface 1003 may include a display and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface); the memory 1005 may be a high-speed RAM memory or a stable non-volatile memory, such as a magnetic disk memory, and may optionally be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the hardware structure shown in Fig. 1 does not constitute a limitation on the sample-size-based voiceprint clustering device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
Continuing to refer to Fig. 1, the memory 1005 in Fig. 1, as a computer-readable storage medium, may include an operating system, a network communication module, and computer-readable instructions for sample-size-based voiceprint clustering.
In Fig. 1, the network communication module is mainly used to connect to a server and exchange data with it, while the processor 1001 can invoke the computer-readable instructions for sample-size-based voiceprint clustering stored in the memory 1005 and execute the sample-size-based voiceprint clustering method provided by the embodiments of this application.
The embodiments of this application provide a sample-size-based voiceprint clustering method.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a first embodiment of the sample-size-based voiceprint clustering method of this application.
In this embodiment, the sample-size-based voiceprint clustering method is applied to a sample-size-based voiceprint clustering system comprising a cache module, a storage module, and a processor, and comprises the following steps:
Step S10: upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the set in the cache module and, according to its number of samples, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold.
Existing voiceprint clustering methods all apply a clustering algorithm directly to the voiceprint sample set to be clustered, so that clustering a large sample set is both time-consuming and gives unsatisfactory results. To solve this problem, this embodiment applies different clustering models to sample sets of different scales and uses partition clustering to shorten the clustering time of large voiceprint sample sets and improve the clustering effect. Specifically, the number of voiceprint samples in the set to be clustered varies greatly with the voiceprint clustering application scenario. For a set with a small sample size, the clustering computation does not need the computing resources and configuration of parallel operation and can still produce a result in a short time, whereas for a set with a large sample size the clustering computation takes much longer. The voiceprint clustering method of this embodiment therefore applies different clustering models to sets of different sample sizes. Upon receiving the voiceprint sample set to be clustered sent by a user through the client, the processor first stores the set in the cache module so that the corresponding clustering model can subsequently be invoked to cluster it. A threshold on the number of voiceprint samples is preset; the number of samples of the set in the cache module is then obtained and compared with the preset sample-size threshold to determine whether the set is a large-sample-size sample set whose sample size exceeds the threshold.
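The model-routing decision of step S10 can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold value and the function and model names are hypothetical, since the application only says the threshold is "preset":

```python
# Hypothetical sketch of the step-S10 routing: choose a clustering model
# by comparing the sample count against a preset sample-size threshold.
SAMPLE_SIZE_THRESHOLD = 10_000  # assumed value; the application leaves this preset

def select_model(num_samples: int) -> str:
    """Route a voiceprint sample set to the matching clustering model."""
    if num_samples > SAMPLE_SIZE_THRESHOLD:
        return "large-sample partition clustering model"
    return "small-sample clustering model"
```

In a deployment, the returned value would name the model loaded from the storage module rather than a string.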
Step S20: if the processor determines that the voiceprint sample set to be clustered is the large-sample-size sample set, the voiceprint clustering model corresponding to the large-sample-size sample set is determined in the storage module as the target voiceprint clustering model.
In this embodiment, if the number of samples in the set to be clustered is smaller than the threshold, that is, the set has a small sample size, the voiceprint clustering model corresponding to small-sample-size sample sets is used; if the number of voiceprint samples is larger than the threshold, that is, the set has a large sample size, the voiceprint clustering model corresponding to large-sample-size sample sets is used as the target voiceprint clustering model. The model for large-sample-size sets incorporates a parallel model and applies the Map and Reduce methods (Map: mapping; Reduce: reduction) during the operation of the clustering algorithm.
Step S30: the processor inputs the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clusters the set on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
In this embodiment, the processor inputs the set in the cache module into the corresponding target voiceprint clustering model for clustering, so that the target model clusters the set on the basis of the trained partition clustering algorithm, obtains the voiceprint providers corresponding to the set, and outputs the clustering result of the voiceprint samples to be clustered.
This embodiment provides a sample-size-based voiceprint clustering method. This application applies different clustering models to sample sets of different scales and, through partition clustering, shortens the clustering time of large voiceprint sample sets, improves the clustering effect, and solves the technical problem of the low clustering efficiency of existing voiceprint clustering methods.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of a second embodiment of the sample-size-based voiceprint clustering method of this application.
Based on the embodiment shown in Fig. 2, in this embodiment the method further includes, after step S20:
Step S40: performing data preprocessing and feature extraction on the voiceprint sample set to be clustered, and extracting the voiceprint sample features MFCC, the first-order difference of MFCC, and the second-order difference of MFCC of the set;
Step S50: the processor determines, on the basis of a preset method and the MFCC, first-order MFCC difference, and second-order MFCC difference of the set, the voiceprint vector I-vector corresponding to each piece of voiceprint data in the set, and stores each I-vector in the storage module.
In this embodiment, a voiceprint sample has multiple features. Data preprocessing and feature extraction are performed on the set, and the selected voiceprint sample features are the MFCC (Mel Frequency Cepstral Coefficients) together with the first-order and second-order differences of the MFCC; that is, this embodiment takes the MFCC and its first- and second-order differences as the input of the clustering model. The processed voiceprint feature data are then handled in the GMM+UBM+JFA manner to obtain the I-vector corresponding to each piece of voiceprint data. An I-vector is a low-dimensional, fixed-length vector containing only speaker-space information; the GMM+UBM+JFA manner is Joint Factor Analysis (JFA) based on a GMM (Gaussian Mixture Model) and a UBM (Universal Background Model). The I-vector of each piece of voiceprint data is stored in the storage module so that the voiceprint data can subsequently be clustered on the basis of these I-vectors.
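As a rough sketch of the feature side of steps S40/S50: in practice the MFCCs would come from a speech library and the I-vector from a trained GMM-UBM/JFA system, both outside the scope of a short example. The stacking of MFCC with its first- and second-order differences can, however, be illustrated with a plain frame-to-frame difference (real delta features are usually computed with a regression window rather than a simple difference, so treat this as an assumption-laden stand-in):

```python
import numpy as np

def simple_delta(features: np.ndarray) -> np.ndarray:
    """Frame-to-frame difference along the time axis (frames x coeffs).
    A plain-difference stand-in for the usual regression-window delta."""
    return np.diff(features, axis=0, prepend=features[:1])

def stack_mfcc_with_deltas(mfcc: np.ndarray) -> np.ndarray:
    """Concatenate MFCC, first-order delta, and second-order delta per frame."""
    d1 = simple_delta(mfcc)       # first-order difference of MFCC
    d2 = simple_delta(d1)         # second-order difference of MFCC
    return np.hstack([mfcc, d1, d2])
```

For 13 MFCC coefficients per frame this yields a 39-dimensional frame vector, the kind of feature matrix that would feed the GMM-UBM stage.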
Based on the embodiment shown in Fig. 2, in this embodiment the method further includes, after step S10:
if the processor determines that the voiceprint sample set to be clustered is a small-sample-size sample set not exceeding the sample-size threshold, the voiceprint clustering model corresponding to the small-sample-size sample set is determined in the storage module as the target voiceprint clustering model;
the processor inputs the I-vector corresponding to each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
the target voiceprint clustering model obtains the pairwise class distances and PLDA scores of the voiceprint data in each interval, clusters the voiceprint data in each interval on the basis of the class distances and the PLDA scores, and outputs the clustering result corresponding to the set.
The step in which the target voiceprint clustering model obtains the pairwise class distances and PLDA scores of the voiceprint data in each interval, clusters on their basis, and outputs the clustering result specifically includes:
the target voiceprint clustering model records each piece of voiceprint data in each interval as one class, as the initial classes;
according to a preset class-distance calculation formula, the pairwise class distances of the voiceprint data in each interval are obtained, and a heap is constructed over the initial classes according to the magnitude of the mean class distance between one class and the other classes, where the class distance = 1 - the standardized PLDA score of the two pieces of voiceprint data representing the two classes, and the class distances obey a normal distribution;
clustering is performed on the basis of the class distances of the pieces of voiceprint data, and the clustering result corresponding to the set is output.
In this embodiment, the vector length is chosen to be 600. If step S10 determines that the clustering model is the one corresponding to small-sample-size sample sets, the I-vector of each piece of voiceprint data in the feature-processed set is input into the first clustering model, and the following steps are carried out:
Step a: the first clustering model first divides each dimension of the 600-dimensional I-vector uniformly into k intervals of equal length: [a1, b1), [a2, b2), ..., [ak, bk), where k may be taken as 10% of the total number of voiceprint samples in the set;
Step b: within each interval, each voiceprint sample is regarded as one class, recorded as an initial class. At this point the class distance = 1 - the standardized PLDA score of the two samples representing the two classes; the class distances are obtained, and a heap is constructed over the initial classes according to the magnitude of the mean class distance between one class and the other classes, where the class distances obey a normal distribution;
Step c: automatic clustering based on the class distance.
The two classes Ai and Bj with the smallest class distance are selected, where μi is the mean of the distribution obeyed by the class distances of Ai and μj is the mean of the distribution obeyed by the class distances of Bj. Specifically:
if the class distance u ≤ α·μi and u ≤ α·μj, Ai and Bj are merged;
if u > α·μi and u > α·μj, Ai and Bj are kept separate, where α is taken as 3 and u is the class distance between Ai and Bj.
As one implementation, when Ai and/or Bj contains more than one member, the class distance between them used for clustering is computed via representative points, selected as follows: first the two points with the smallest pairwise PLDA score are picked out, then from the remaining points the one with the smallest pairwise PLDA scores against those two points is selected. When clustering with representative points, the class distance = (sum of the standardized pairwise PLDA scores of the representative points within the class) / the number of samples in the class. This continues until one class remains, or no class remains, yielding the final clustering result. By automatically separating subclasses, the number of clusters is obtained directly without manually specifying hyperparameters, which increases the clustering speed.
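Steps a through c above can be condensed into the following sketch. It is deliberately simplified: the distance matrix stands in for 1 minus the standardized pairwise PLDA score, and the heap, the per-class distance-distribution test (α = 3), and the representative-point refinement of the original are all replaced by a single average-linkage stopping threshold, so this illustrates only the agglomeration pattern, not the patented rule itself:

```python
import numpy as np

def agglomerate(dist: np.ndarray, threshold: float):
    """Average-linkage agglomeration over a pairwise distance matrix:
    repeatedly merge the closest pair of clusters until the smallest
    inter-cluster distance exceeds the stopping threshold."""
    clusters = [{i} for i in range(dist.shape[0])]

    def link(a, b):
        # mean pairwise distance between two disjoint clusters
        return float(np.mean([dist[i, j] for i in a for j in b]))

    while len(clusters) > 1:
        d, x, y = min((link(a, b), x, y)
                      for x, a in enumerate(clusters)
                      for y, b in enumerate(clusters) if y > x)
        if d > threshold:
            break                      # nothing close enough left to merge
        clusters[x] |= clusters[y]
        del clusters[y]
    return clusters
```

The number of clusters falls out of the stopping condition rather than being given as a hyperparameter, mirroring the "automatic subclass separation" idea of the text.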
Further, step S30 specifically includes:
the processor inputs the I-vector corresponding to each piece of voiceprint data of the voiceprint sample set to be clustered in the storage module into the target voiceprint clustering model;
the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
the target voiceprint clustering model performs parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputs the clustering result of the voiceprint samples to be clustered.
After the step of the target voiceprint clustering model dividing the pieces of voiceprint data into intervals of equal length on the basis of their I-vectors, the method further includes:
the target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset count threshold, where an interval is a dense interval if the number of voiceprint samples falling in it is greater than the count threshold, and a sparse interval otherwise;
the target voiceprint clustering model obtains the interval density of a sparse interval adjacent to a dense interval, compares the interval density with a preset density threshold, and determines whether the adjacent sparse interval is a dense part of the sparse interval;
if the target voiceprint clustering model determines that the adjacent sparse interval is a dense part of the sparse interval, the adjacent sparse interval is merged into the dense interval;
the target voiceprint clustering model merges adjacent dense intervals and updates the dense and sparse intervals corresponding to the voiceprint sample set to be clustered.
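The dense/sparse bookkeeping described above can be sketched as follows. This is a simplified illustration: intervals are represented only by per-interval sample counts, and the half-interval density test for absorbing the edge of an adjacent sparse interval is reduced to marking and merging whole intervals:

```python
def classify_intervals(counts, count_threshold):
    """Mark each equal-length interval as dense (True) or sparse (False)
    by comparing its sample count with the preset count threshold."""
    return [c > count_threshold for c in counts]

def merge_adjacent_dense(flags):
    """Collapse runs of adjacent dense intervals into (start, end) index spans."""
    spans, start = [], None
    for i, dense in enumerate(flags):
        if dense and start is None:
            start = i                      # a dense run begins
        elif not dense and start is not None:
            spans.append((start, i - 1))   # the run ended at the previous interval
            start = None
    if start is not None:
        spans.append((start, len(flags) - 1))
    return spans
```

Each resulting span is then a single grid cell for the local clustering stage.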
The step in which the target voiceprint clustering model performs parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputs the clustering result specifically includes:
the target voiceprint clustering model performs parallel local clustering in each dense interval and each sparse interval through the CURE algorithm;
the target voiceprint clustering model performs clustering processing on the voiceprint data after parallel local clustering through the Map function and the Reduce function, and merges and outputs the clustering results of the intervals.
In this embodiment, if step S10 determines that the clustering model is the one corresponding to large-sample-size sample sets, the I-vector of each piece of voiceprint data in the feature-processed set is input into that model, and the following steps are carried out:
Step a: the clustering model first divides each dimension of the 600-dimensional I-vector uniformly into k intervals of equal length: [a1, b1), [a2, b2), ..., [ak, bk), where k may be taken as 10% of the total number of voiceprint samples in the set. This step is the same as in the clustering model for small-sample-size sets; gridding the data intervals shortens the clustering time and improves the clustering effect.
Step d: dense and sparse intervals are distinguished by a threshold. In this embodiment the threshold is set to 60% of the total number of voiceprint samples: if the number of samples falling in the current interval exceeds the threshold, the interval is a dense interval, otherwise it is a sparse interval.
Step e: the dense intervals are updated. If the half-interval of the sparse interval adjacent to a dense interval [ai, bi), that is, [ai + d, ai + 3d/2) where d is the interval length, has a density greater than 0.5 × the density threshold, that half-interval is marked as a dense part of the sparse interval and merged into the dense interval, which is updated to [ai, bi + d/2); if the adjacent half-interval's density is below 0.5 × the density threshold, no processing is done.
Step f: the neighbors of all dense intervals in each dimension are processed, and adjacent dense intervals are merged.
Step g: a clustering algorithm is applied within each grid cell, for example local clustering with the CURE algorithm. When a grid cell reaches the set size, the samples falling into it are similar to one another while samples in different cells are dissimilar, and the distance between samples in different cells is greater than that between samples in the same cell; clustering therefore proceeds first within the small-distance sample collections, which improves clustering efficiency. Further, in this embodiment the computation is divided into a Map stage and a Reduce stage: the Map function runs on multiple nodes, each processing one or more local data partitions, and the Reduce function, which may also run in parallel, processes the intermediate results output by the Map functions; the outputs of all Reduce tasks are merged to give the results of all partitions, and the classes obtained by local clustering of each data interval are combined into the final clustering result. Because this embodiment uses partition clustering with the clustering algorithm computed in parallel, fast clustering is achieved even for voiceprint sample sets with a large number of samples.
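The Map/Reduce composition of step g can be sketched as below. The local clustering function is left pluggable (CURE itself, with its representative points and shrink factor, is not implemented here), and the map phase runs sequentially for clarity even though the embodiment envisions it running in parallel across nodes:

```python
def map_phase(partitions, local_cluster):
    """Apply a local clustering function independently to each grid
    partition; in a real deployment each call could run on its own node."""
    return [local_cluster(part) for part in partitions]

def reduce_phase(local_results):
    """Combine per-partition cluster labels into one global result,
    keying each cluster by (partition id, local label) so that labels
    from different partitions never collide."""
    combined = {}
    for part_id, clusters in enumerate(local_results):
        for label, members in clusters.items():
            combined[(part_id, label)] = members
    return combined
```

A fuller reduce stage could additionally merge clusters that span adjacent partitions; the sketch shows only the collect-and-key composition.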
In addition, the embodiments of this application also provide a sample-size-based voiceprint clustering apparatus.
In this embodiment, the sample-size-based voiceprint clustering apparatus includes:
a sample-size determination module, configured so that upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the set in the cache module and, according to its number of samples, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold;
a first model determination module, configured so that if the processor determines that the set is the large-sample-size sample set, the voiceprint clustering model corresponding to the large-sample-size sample set is determined in the storage module as the target voiceprint clustering model;
a first partition clustering module, configured so that the processor inputs the set in the cache module into the target voiceprint clustering model, clusters it on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
Further, the apparatus also includes:
a sample feature extraction module, configured to perform data preprocessing and feature extraction on the set and extract the voiceprint sample features MFCC, the first-order difference of MFCC, and the second-order difference of MFCC;
a voiceprint vector determination module, configured so that the processor determines, on the basis of a preset method and the MFCC and its first- and second-order differences, the voiceprint vector I-vector corresponding to each piece of voiceprint data in the set, and stores each I-vector in the storage module.
Further, the first partition clustering module specifically includes:
a voiceprint vector input unit, configured so that the processor inputs the I-vector of each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
a data interval division unit, configured so that the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
a partitioned parallel clustering unit, configured so that the target voiceprint clustering model performs parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputs the clustering result of the voiceprint samples to be clustered.
Further, the first partition clustering module is also configured so that:
the target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset count threshold, where an interval is a dense interval if the number of voiceprint samples falling in it is greater than the count threshold, and a sparse interval otherwise;
the target voiceprint clustering model obtains the interval density of a sparse interval adjacent to a dense interval, compares it with a preset density threshold, and determines whether the adjacent sparse interval is a dense part of the sparse interval;
if so, the adjacent sparse interval is merged into the dense interval;
the target voiceprint clustering model merges adjacent dense intervals and updates the dense and sparse intervals corresponding to the set.
Further, the first partition clustering module is also configured so that:
the target voiceprint clustering model performs parallel local clustering in each dense interval and each sparse interval through the CURE algorithm;
the target voiceprint clustering model performs clustering processing on the voiceprint data after parallel local clustering through the Map function and the Reduce function, and merges and outputs the clustering results of the intervals.
Further, the sample-size-based voiceprint clustering apparatus also includes:
a second model determination module, configured so that if the processor determines that the set is a small-sample-size sample set not exceeding the sample-size threshold, the voiceprint clustering model corresponding to the small-sample-size sample set is determined in the storage module as the target voiceprint clustering model;
a sample data input module, configured so that the processor inputs the I-vector of each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
a voiceprint data partition module, configured so that the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
a first partition clustering module, configured so that the target voiceprint clustering model obtains the pairwise class distances and PLDA scores of the voiceprint data in each interval, clusters the voiceprint data in each interval on that basis, and outputs the clustering result corresponding to the set.
Further, the first partition clustering module is also configured so that:
the target voiceprint clustering model records each piece of voiceprint data in each interval as one class, as the initial classes;
according to a preset class-distance calculation formula, the pairwise class distances of the voiceprint data in each interval are obtained, and a heap is constructed over the initial classes according to the magnitude of the mean class distance between one class and the other classes, where the class distance = 1 - the standardized PLDA score of the two pieces of voiceprint data representing the two classes, and the class distances obey a normal distribution;
clustering is performed on the basis of the class distances, and the clustering result corresponding to the set is output.
Each module in the above sample-size-based voiceprint clustering apparatus corresponds to a step in the embodiments of the sample-size-based voiceprint clustering method; their functions and implementation processes are not repeated here.
In addition, the embodiments of this application also provide a computer-readable storage medium, which may be a non-volatile readable storage medium.
The computer-readable storage medium of this application stores computer-readable instructions for sample-size-based voiceprint clustering; when these instructions are executed by a processor, the steps of the sample-size-based voiceprint clustering method described above are implemented.
For the method implemented when the computer-readable instructions for sample-size-based voiceprint clustering are executed, reference may be made to the embodiments of the sample-size-based voiceprint clustering method of this application, which are not repeated here.
It should be noted that, in this document, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes it.
The above serial numbers of the embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing over the existing technology, can be embodied in the form of a software product stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk), including several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A sample-size-based voiceprint clustering method, characterized in that the method is applied to a sample-size-based voiceprint clustering system comprising a cache module, a storage module, and a processor, and comprises the following steps:
    upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the voiceprint sample set to be clustered in the cache module and, according to the number of samples in the set, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold;
    if the processor determines that the voiceprint sample set to be clustered is the large-sample-size sample set, determining in the storage module the voiceprint clustering model corresponding to the large-sample-size sample set as the target voiceprint clustering model;
    the processor inputs the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clusters the set on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
  2. The sample-size-based voiceprint clustering method according to claim 1, characterized in that, after the step of the processor receiving the voiceprint sample set to be clustered sent by the client, the method further comprises:
    performing data preprocessing and feature extraction on the voiceprint sample set to be clustered, and extracting the voiceprint sample features MFCC, the first-order difference of MFCC, and the second-order difference of MFCC of the set;
    the processor determines, on the basis of a preset method and the MFCC, first-order MFCC difference, and second-order MFCC difference of the set, the voiceprint vector I-vector corresponding to each piece of voiceprint data in the set, and stores each I-vector in the storage module.
  3. The sample-size-based voiceprint clustering method according to claim 2, characterized in that the step of the processor inputting the voiceprint sample set to be clustered in the cache module into the target voiceprint clustering model, clustering it on the basis of the trained partition clustering algorithm, and outputting the clustering result specifically comprises:
    the processor inputs the I-vector corresponding to each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
    the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
    the target voiceprint clustering model performs parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputs the clustering result of the voiceprint samples to be clustered.
  4. The sample-size-based voiceprint clustering method according to claim 3, characterized in that, after the step of the target voiceprint clustering model dividing the pieces of voiceprint data into intervals of equal length on the basis of their I-vectors, the method further comprises:
    the target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset count threshold, wherein an interval is a dense interval if the number of voiceprint samples falling in it is greater than the count threshold, and a sparse interval otherwise;
    the target voiceprint clustering model obtains the interval density of a sparse interval adjacent to a dense interval, compares the interval density with a preset density threshold, and determines whether the adjacent sparse interval is a dense part of the sparse interval;
    if the target voiceprint clustering model determines that the adjacent sparse interval is a dense part of the sparse interval, the adjacent sparse interval is merged into the dense interval;
    the target voiceprint clustering model merges adjacent dense intervals and updates the dense and sparse intervals corresponding to the set.
  5. The sample-size-based voiceprint clustering method according to claim 4, characterized in that the step of the target voiceprint clustering model performing parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputting the clustering result specifically comprises:
    the target voiceprint clustering model performs parallel local clustering in each dense interval and each sparse interval through the CURE algorithm;
    the target voiceprint clustering model performs clustering processing on the voiceprint data after parallel local clustering through the Map function and the Reduce function, and merges and outputs the clustering results of the intervals.
  6. The sample-size-based voiceprint clustering method according to claim 2, characterized in that, after the step of the processor receiving the voiceprint sample set to be clustered sent by the client, storing it in the cache module, and determining according to its number of samples whether it is a large-sample-size sample set exceeding the preset sample-size threshold, the method further comprises:
    if the processor determines that the voiceprint sample set to be clustered is a small-sample-size sample set not exceeding the sample-size threshold, determining in the storage module the voiceprint clustering model corresponding to the small-sample-size sample set as the target voiceprint clustering model;
    the processor inputs the I-vector corresponding to each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
    the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
    the target voiceprint clustering model obtains the pairwise class distances and PLDA scores of the voiceprint data in each interval, clusters the voiceprint data in each interval on the basis of the class distances and the PLDA scores, and outputs the clustering result corresponding to the set.
  7. The sample-size-based voiceprint clustering method according to claim 6, characterized in that the step of the target voiceprint clustering model obtaining the pairwise class distances and PLDA scores of the voiceprint data in each interval, clustering on their basis, and outputting the clustering result specifically comprises:
    the target voiceprint clustering model records each piece of voiceprint data in each interval as one class, as the initial classes;
    according to a preset class-distance calculation formula, the pairwise class distances of the voiceprint data in each interval are obtained, and a heap is constructed over the initial classes according to the magnitude of the mean class distance between one class and the other classes, wherein the class distance = 1 - the standardized PLDA score of the two pieces of voiceprint data representing the two classes, and the class distances obey a normal distribution;
    clustering is performed on the basis of the class distances of the pieces of voiceprint data, and the clustering result corresponding to the set is output.
  8. A sample-size-based voiceprint clustering apparatus, characterized in that the apparatus comprises:
    a sample-size determination module, configured so that upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the set in the cache module and, according to its number of samples, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold;
    a clustering model determination module, configured so that if the processor determines that the set is the large-sample-size sample set, the voiceprint clustering model corresponding to the large-sample-size sample set is determined in the storage module as the target voiceprint clustering model;
    a first partition clustering module, configured so that the processor inputs the set in the cache module into the target voiceprint clustering model, clusters it on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
  9. The sample-size-based voiceprint clustering apparatus according to claim 8, characterized in that the apparatus further comprises:
    a sample feature extraction module, configured to perform data preprocessing and feature extraction on the set and extract the voiceprint sample features MFCC, the first-order difference of MFCC, and the second-order difference of MFCC;
    a voiceprint vector determination module, configured so that the processor determines, on the basis of a preset method and the MFCC and its first- and second-order differences, the voiceprint vector I-vector corresponding to each piece of voiceprint data in the set, and stores each I-vector in the storage module.
  10. The sample-size-based voiceprint clustering apparatus according to claim 8, characterized in that the first partition clustering module specifically comprises:
    a voiceprint vector input unit, configured so that the processor inputs the I-vector corresponding to each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
    a data interval division unit, configured so that the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
    a partitioned parallel clustering unit, configured so that the target voiceprint clustering model performs parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputs the clustering result of the voiceprint samples to be clustered.
  11. The sample-size-based voiceprint clustering apparatus according to claim 8, characterized in that the first partition clustering module is further configured so that:
    the target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset count threshold, wherein an interval is a dense interval if the number of voiceprint samples falling in it is greater than the count threshold, and a sparse interval otherwise;
    the target voiceprint clustering model obtains the interval density of a sparse interval adjacent to a dense interval, compares it with a preset density threshold, and determines whether the adjacent sparse interval is a dense part of the sparse interval;
    if the target voiceprint clustering model determines that the adjacent sparse interval is a dense part of the sparse interval, the adjacent sparse interval is merged into the dense interval;
    the target voiceprint clustering model merges adjacent dense intervals and updates the dense and sparse intervals corresponding to the set.
  12. The sample-size-based voiceprint clustering apparatus according to claim 8, characterized in that the first partition clustering module is further configured so that:
    the target voiceprint clustering model performs parallel local clustering in each dense interval and each sparse interval through the CURE algorithm;
    the target voiceprint clustering model performs clustering processing on the voiceprint data after parallel local clustering through the Map function and the Reduce function, and merges and outputs the clustering results of the intervals.
  13. A sample-size-based voiceprint clustering device, characterized in that the device comprises a processor, a memory, and computer-readable instructions for sample-size-based voiceprint clustering that are stored in the memory and executable by the processor, wherein the instructions, when executed by the processor, implement the following steps:
    upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the set in the cache module and, according to its number of samples, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold;
    if the processor determines that the set is the large-sample-size sample set, determining in the storage module the voiceprint clustering model corresponding to the large-sample-size sample set as the target voiceprint clustering model;
    the processor inputs the set in the cache module into the target voiceprint clustering model, clusters it on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
  14. The sample-size-based voiceprint clustering device according to claim 13, characterized in that, after the step of the processor receiving the voiceprint sample set to be clustered sent by the client, the following is further implemented:
    performing data preprocessing and feature extraction on the set, and extracting the voiceprint sample features MFCC, the first-order difference of MFCC, and the second-order difference of MFCC;
    the processor determines, on the basis of a preset method and the MFCC and its first- and second-order differences, the voiceprint vector I-vector corresponding to each piece of voiceprint data in the set, and stores each I-vector in the storage module.
  15. The sample-size-based voiceprint clustering device according to claim 14, characterized in that the step of the processor inputting the set in the cache module into the target voiceprint clustering model, clustering it on the basis of the trained partition clustering algorithm, and outputting the clustering result specifically comprises:
    the processor inputs the I-vector corresponding to each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
    the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
    the target voiceprint clustering model performs parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputs the clustering result of the voiceprint samples to be clustered.
  16. The sample-size-based voiceprint clustering device according to claim 15, characterized in that, after the step of the target voiceprint clustering model dividing the pieces of voiceprint data into intervals of equal length on the basis of their I-vectors, the following is further implemented:
    the target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset count threshold, wherein an interval is a dense interval if the number of voiceprint samples falling in it is greater than the count threshold, and a sparse interval otherwise;
    the target voiceprint clustering model obtains the interval density of a sparse interval adjacent to a dense interval, compares it with a preset density threshold, and determines whether the adjacent sparse interval is a dense part of the sparse interval;
    if the target voiceprint clustering model determines that the adjacent sparse interval is a dense part of the sparse interval, the adjacent sparse interval is merged into the dense interval;
    the target voiceprint clustering model merges adjacent dense intervals and updates the dense and sparse intervals corresponding to the set.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions for sample-size-based voiceprint clustering, wherein the instructions, when executed by a processor, implement the following steps:
    upon receiving a voiceprint sample set to be clustered sent by a client, the processor stores the set in the cache module and, according to its number of samples, determines whether it is a large-sample-size sample set whose sample size exceeds a preset sample-size threshold;
    if the processor determines that the set is the large-sample-size sample set, determining in the storage module the voiceprint clustering model corresponding to the large-sample-size sample set as the target voiceprint clustering model;
    the processor inputs the set in the cache module into the target voiceprint clustering model, clusters it on the basis of a trained partition clustering algorithm, and outputs the clustering result of the voiceprint samples to be clustered.
  18. The computer-readable storage medium according to claim 17, characterized in that, after the step of the processor receiving the voiceprint sample set to be clustered sent by the client, the following is further implemented:
    performing data preprocessing and feature extraction on the set, and extracting the voiceprint sample features MFCC, the first-order difference of MFCC, and the second-order difference of MFCC;
    the processor determines, on the basis of a preset method and the MFCC and its first- and second-order differences, the voiceprint vector I-vector corresponding to each piece of voiceprint data in the set, and stores each I-vector in the storage module.
  19. The computer-readable storage medium according to claim 18, characterized in that the step of the processor inputting the set in the cache module into the target voiceprint clustering model, clustering it on the basis of the trained partition clustering algorithm, and outputting the clustering result specifically comprises:
    the processor inputs the I-vector corresponding to each piece of voiceprint data of the set in the storage module into the target voiceprint clustering model;
    the target voiceprint clustering model divides the pieces of voiceprint data in the set into intervals of equal length on the basis of their I-vectors;
    the target voiceprint clustering model performs parallel clustering on the voiceprint data in each interval through a clustering algorithm and outputs the clustering result of the voiceprint samples to be clustered.
  20. The computer-readable storage medium according to claim 19, characterized in that, after the step of the target voiceprint clustering model dividing the pieces of voiceprint data into intervals of equal length on the basis of their I-vectors, the following is further implemented:
    the target voiceprint clustering model determines dense intervals and sparse intervals among the divided intervals according to a preset count threshold, wherein an interval is a dense interval if the number of voiceprint samples falling in it is greater than the count threshold, and a sparse interval otherwise;
    the target voiceprint clustering model obtains the interval density of a sparse interval adjacent to a dense interval, compares it with a preset density threshold, and determines whether the adjacent sparse interval is a dense part of the sparse interval;
    if the target voiceprint clustering model determines that the adjacent sparse interval is a dense part of the sparse interval, the adjacent sparse interval is merged into the dense interval;
    the target voiceprint clustering model merges adjacent dense intervals and updates the dense and sparse intervals corresponding to the set.
PCT/CN2019/116474 2019-09-18 2019-11-08 Sample-size-based voiceprint clustering method, apparatus, device and storage medium WO2021051505A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910880452.6 2019-09-18
CN201910880452.6A CN110782879B (zh) 2019-09-18 2019-09-18 Sample-size-based voiceprint clustering method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2021051505A1 true WO2021051505A1 (zh) 2021-03-25

Family

ID=69383815

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116474 WO2021051505A1 (zh) 2019-09-18 2019-11-08 Sample-size-based voiceprint clustering method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN110782879B (zh)
WO (1) WO2021051505A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117809070A * 2024-03-01 2024-04-02 Tangshan Food and Drug Comprehensive Inspection and Testing Center (Tangshan Agricultural Product Quality and Safety Inspection and Testing Center, Tangshan Inspection and Testing Research Institute) Intelligent spectral data processing method for pesticide residue detection in vegetables

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869645A * 2016-03-25 2016-08-17 Tencent Technology (Shenzhen) Co., Ltd. Voice data processing method and apparatus
CN106156856A * 2015-03-31 2016-11-23 NEC Corporation Method and apparatus for mixture model selection
US20180329951A1 (en) * 2017-05-11 2018-11-15 Futurewei Technologies, Inc. Estimating the number of samples satisfying the query
CN108922543A * 2018-06-11 2018-11-30 Ping An Technology (Shenzhen) Co., Ltd. Model library establishment method, speech recognition method, apparatus, device and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107993071A * 2017-11-21 2018-05-04 Ping An Technology (Shenzhen) Co., Ltd. Electronic apparatus, voiceprint-based identity verification method and storage medium
CN109065028B * 2018-06-11 2022-12-30 Ping An Technology (Shenzhen) Co., Ltd. Speaker clustering method, apparatus, computer device and storage medium
CN109473112B * 2018-10-16 2021-10-26 The Third Research Institute of China Electronics Technology Group Corporation Pulse voiceprint recognition method and apparatus, electronic device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156856A * 2015-03-31 2016-11-23 NEC Corporation Method and apparatus for mixture model selection
CN105869645A * 2016-03-25 2016-08-17 Tencent Technology (Shenzhen) Co., Ltd. Voice data processing method and apparatus
US20180329951A1 (en) * 2017-05-11 2018-11-15 Futurewei Technologies, Inc. Estimating the number of samples satisfying the query
CN108922543A * 2018-06-11 2018-11-30 Ping An Technology (Shenzhen) Co., Ltd. Model library establishment method, speech recognition method, apparatus, device and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117809070A * 2024-03-01 2024-04-02 Tangshan Food and Drug Comprehensive Inspection and Testing Center (Tangshan Agricultural Product Quality and Safety Inspection and Testing Center, Tangshan Inspection and Testing Research Institute) Intelligent spectral data processing method for pesticide residue detection in vegetables
CN117809070B * 2024-03-01 2024-05-14 Tangshan Food and Drug Comprehensive Inspection and Testing Center (Tangshan Agricultural Product Quality and Safety Inspection and Testing Center, Tangshan Inspection and Testing Research Institute) Intelligent spectral data processing method for pesticide residue detection in vegetables

Also Published As

Publication number Publication date
CN110782879B (zh) 2023-07-07
CN110782879A (zh) 2020-02-11

Similar Documents

Publication Publication Date Title
WO2018086470A1 Keyword extraction method, apparatus and server
CN103956169B Voice input method, apparatus and system
CN106057206B Voiceprint model training method, voiceprint recognition method and apparatus
CN105279397B Method for identifying key proteins in protein-protein interaction networks
WO2019134247A1 Voiceprint registration method based on a voiceprint recognition model, terminal apparatus and storage medium
CN110263854B Live-streaming label determination method, apparatus and storage medium
CN105893351B Speech recognition method and apparatus
CN106297773A Neural network acoustic model training method
CN110209809B Text clustering method and apparatus, storage medium and electronic apparatus
CN107195299A Method and apparatus for training a neural network acoustic model, and speech recognition method and apparatus
CN105488098B New-word extraction method based on domain difference
WO2018121145A1 Paragraph vectorization method and apparatus
CN108520752A Voiceprint recognition method and apparatus
WO2019223104A1 Method and apparatus for determining event influencing factors, terminal device and readable storage medium
WO2018059302A1 Text recognition method, apparatus and storage medium
CN109145003A Method and apparatus for constructing a knowledge graph
CN110969172A Text classification method and related device
CN109800309A Classroom discourse type classification method and apparatus
WO2021072893A1 Voiceprint clustering method, apparatus, processing device and computer storage medium
CN107195312B Method, apparatus, terminal device and storage medium for determining an emotion-venting mode
CN104167206B Acoustic model merging method and device, and speech recognition method and system
WO2021051505A1 Sample-size-based voiceprint clustering method, apparatus, device and storage medium
CN113488023B Language identification model construction method and language identification method
CN111950267B Method and apparatus for extracting text triples, electronic device and storage medium
CN110929509B Domain event trigger-word clustering method based on the Louvain community detection algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946121

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946121

Country of ref document: EP

Kind code of ref document: A1