CN111613249A - Voice analysis method and equipment

Voice analysis method and equipment

Info

Publication number
CN111613249A
Authority
CN
China
Prior art keywords
voice
segment
segments
speech
speakers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010444381.8A
Other languages
Chinese (zh)
Inventor
李旭滨
范红亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202010444381.8A
Publication of CN111613249A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention provides a voice analysis method and equipment applied to single-channel voice analysis. The method comprises the following steps: dividing the voice data to be analyzed into a voice part and a non-voice part, the voice data to be analyzed comprising voice data of a plurality of speakers; segmenting the voice part into a plurality of voice segments; clustering the voice segments once the accumulated speech time exceeds a preset duration, so as to obtain the information of each voice segment; processing each voice segment whose information has been determined, so as to determine the voice characteristics of the plurality of speakers; and comparing the earliest unprocessed voice segment in time order with the voice characteristics of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed. By performing role separation on single-channel audio, the voices of multiple speakers can be separated, which facilitates subsequent operations such as quality inspection and intention analysis and improves processing efficiency.

Description

Voice analysis method and equipment
Technical Field
The present invention relates to the field of speech processing, and in particular, to a speech analysis method and apparatus.
Background
At present, telephone customer service is found everywhere in daily life. How to improve the quality of telephone service and how to analyze the customer's intention are therefore important topics. Customer service quality inspection and customer intention analysis require speech recognition, but at present they are carried out by manual spot checks, which are inefficient and miss many problems.
Automatic detection and analysis by machine is therefore becoming more and more important. When the customer and the customer service agent are recorded on different telephone channels, analyzing them separately is relatively easy: the speech is converted to text by speech recognition and the text is then analyzed. When the voices of the customer and the agent are stored in a single channel, however, performing customer service quality inspection and customer intention analysis at the same time becomes extremely difficult, and in this case the hardest part of the analysis is role analysis, i.e. deciding who is speaking.
Thus, there is a need for a better solution to this problem.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a voice analysis method and equipment which perform role separation on single-channel audio and can separate the voices of a plurality of speakers, which is beneficial for subsequent operations such as quality inspection and intention analysis; the scheme supports online real-time processing and improves processing efficiency.
Specifically, the present invention proposes the following embodiments:
An embodiment of the invention provides a voice analysis method, applied to single-channel voice analysis, comprising the following steps:
dividing the voice data to be analyzed into a voice part and a non-voice part; the voice data to be analyzed comprises voice data of a plurality of speakers;
segmenting the voice part into a plurality of voice segments;
clustering the voice segments once the accumulated speech time exceeds a preset duration, so as to obtain the information of each voice segment;
processing each voice segment whose information has been determined, so as to determine the voice characteristics of the plurality of speakers;
comparing the earliest unprocessed voice segment in time order with the voice characteristics of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed.
In a specific embodiment, the segmenting the voice data to be analyzed into a voice portion and a non-voice portion includes:
and segmenting the voice data to be analyzed by a Voice Activity Detection (VAD) method so as to divide the voice data to be analyzed into a voice part and a non-voice part.
In a specific embodiment, the segmenting the voice portion into a plurality of voice segments includes:
dividing the voice part into a plurality of non-overlapping voice segments according to a preset time length;
and if the time length of the last voice segment is less than a preset value, merging the last voice segment with the adjacent voice segment.
In a specific embodiment, the speech segments are extracted with forward and backward frame expansion and/or overlap.
In a specific embodiment, within the time period corresponding to the preset duration, each speaker in the voice data to be analyzed has spoken for a specified duration.
In a specific embodiment, the information includes any combination of one or more of the following: the characteristics of the voice, the speaker of the voice, and the time point of the voice.
In a specific embodiment, the processing each voice segment whose information has been determined, to determine the voice characteristics of the plurality of speakers, comprises:
smoothing each voice segment whose information has been determined, merging adjacent voice segments belonging to the same speaker, and setting the speaker corresponding to a preset voice segment to be the same speaker as its adjacent voice segments, so as to determine the voice characteristics of the plurality of speakers;
wherein the preset voice segment is located between a preceding and a following adjacent voice segment that correspond to the same speaker, and the time length of the preset voice segment is less than a preset length.
The embodiment of the invention also provides a voice analysis device, which is applied to single-channel voice analysis and comprises:
the first segmentation module is used for segmenting the voice data to be analyzed into a voice part and a non-voice part; the voice data to be analyzed comprises voice data of a plurality of speakers;
the second segmentation module is used for segmenting the voice part into a plurality of voice segments;
the clustering module is used for clustering the voice segments once the accumulated speech time exceeds the preset duration, so as to obtain the information of each voice segment;
the determining module is used for processing each voice segment whose information has been determined and determining the voice characteristics of the plurality of speakers;
and the analysis module is used for comparing the earliest unprocessed voice segment in time order with the voice characteristics of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed.
In a specific embodiment, the first segmentation module is configured to:
and segmenting the voice data to be analyzed by a Voice Activity Detection (VAD) method so as to divide the voice data to be analyzed into a voice part and a non-voice part.
In a specific embodiment, the second segmentation module is configured to:
dividing the voice part into a plurality of non-overlapping voice segments according to a preset time length;
and if the time length of the last voice segment is less than a preset value, merging the last voice segment with the adjacent voice segment.
Therefore, the embodiment of the invention provides a voice analysis method and equipment applied to single-channel voice analysis. The method comprises: dividing the voice data to be analyzed into a voice part and a non-voice part, the voice data to be analyzed comprising voice data of a plurality of speakers; segmenting the voice part into a plurality of voice segments; clustering the voice segments once the accumulated speech time exceeds a preset duration, so as to obtain the information of each voice segment; processing each voice segment whose information has been determined, so as to determine the voice characteristics of the plurality of speakers; and comparing the earliest unprocessed voice segment in time order with the voice characteristics of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed. By performing role separation on single-channel audio, the voices of multiple speakers can be separated, which facilitates subsequent operations such as quality inspection and intention analysis; the scheme supports online real-time processing and improves processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a speech analysis method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a speech analysis method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a speech analysis device according to an embodiment of the present invention.
Detailed Description
Various embodiments of the present disclosure will be described more fully hereinafter. The present disclosure is capable of various embodiments and of modifications and variations therein. However, it should be understood that: there is no intention to limit the various embodiments of the disclosure to the specific embodiments disclosed herein, but rather, the disclosure is to cover all modifications, equivalents, and/or alternatives falling within the spirit and scope of the various embodiments of the disclosure.
The terminology used in the various embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the various embodiments of the present disclosure. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present disclosure belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined in various embodiments of the present disclosure.
Example 1
Embodiment 1 of the present invention discloses a speech analysis method, which is applied to single-channel speech analysis, and as shown in fig. 1-2, includes the following steps:
step 101, dividing voice data to be analyzed into a voice part and a non-voice part; the voice data to be analyzed comprises voice data of a plurality of speakers;
The voice data to be analyzed may specifically be single-channel voice data, for example a single channel in which two persons, a customer service agent and a customer, are mixed, or a single channel in which more speakers are mixed. The step 101 of dividing the voice data to be analyzed into a voice part and a non-voice part includes:
segmenting the voice data to be analyzed by a Voice Activity Detection (VAD) method, so as to divide the voice data to be analyzed into a voice part and a non-voice part.
Specifically, besides the VAD method, other methods may also be used for this segmentation, for example segmentation according to the waveform of the voice; any method that can divide the voice data to be analyzed into a voice part and a non-voice part may be used, and the method is not limited to VAD.
Specifically, this corresponds to the VAD stage shown in fig. 2, where the voice part consists of the regions labeled speech 1 and speech 2.
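As an illustration of this step only, a minimal energy-based VAD sketch in Python is shown below; the frame length, hop and energy threshold are assumptions for illustration and not values prescribed by this scheme, and samples are assumed to be a 1-D numpy array scaled to [-1, 1]:

```python
import numpy as np

def simple_energy_vad(samples, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Label each frame as speech (True) or non-speech (False) by short-time energy.

    A minimal sketch of the VAD step: real systems typically add smoothing,
    adaptive thresholds, or a trained model.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    labels = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len].astype(float)
        energy_db = 10 * np.log10(np.mean(frame ** 2) + 1e-12)
        labels.append(energy_db > threshold_db)
    return np.array(labels)

# Runs of True frames give the voice parts; runs of False frames give the non-voice parts.
```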
Step 102, segmenting the voice part into a plurality of voice segments;
specifically, the segmenting the voice part into a plurality of voice segments in step 102 includes:
dividing the voice part into a plurality of non-overlapping voice segments according to a preset time length;
and if the time length of the last voice segment is less than a preset value, merging the last voice segment with the adjacent voice segment.
Specifically, the preset time length may be set, for example, to 500 ms and the preset value to 300 ms. In that case the voice portion is divided into voice segments that do not overlap with each other, each voice segment being 500 ms long. If the length of the last voice segment is less than 300 ms, it is spliced onto the previous voice segment to form a longer segment; if the last segment is greater than or equal to 300 ms but less than 500 ms, it is kept as a voice segment on its own.
The segmentation principle of the present scheme assumes that each segmented speech segment contains only one speaker, so the length of each speech segment can be neither too long nor too short, generally several hundred milliseconds; experiments show 500 ms to be a preferred value. In addition, depending on the application scenario, the preset time length may also be set to a value between 400 and 600 ms, for example, and the preset value to a value between 250 and 350 ms.
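For illustration only, the 500 ms / 300 ms rule described above could be sketched as follows; the function name and the millisecond-based segment representation are assumptions:

```python
def split_voice_part(start_ms, end_ms, seg_ms=500, min_tail_ms=300):
    """Split one voice part [start_ms, end_ms) into non-overlapping segments of
    seg_ms; if the trailing segment is shorter than min_tail_ms, merge it into
    the previous segment."""
    segments = []
    t = start_ms
    while t < end_ms:
        segments.append([t, min(t + seg_ms, end_ms)])
        t += seg_ms
    if len(segments) > 1 and segments[-1][1] - segments[-1][0] < min_tail_ms:
        tail = segments.pop()
        segments[-1][1] = tail[1]  # splice the short tail onto the previous segment
    return segments

print(split_voice_part(0, 1200))  # [[0, 500], [500, 1200]]: the 200 ms tail is merged
```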
Specifically, in one embodiment, the speech segments are extracted with forward and backward frame expansion and/or overlap.
This corresponds to stage 2 (Segment) in fig. 2: the voice part is divided into small speech segments and the features of each segment are extracted. To ensure a better effect, the voice segments carry frame-expansion and/or overlap information.
Specifically, when a speech segment is processed in this scheme, a forward/backward frame expansion and/or overlap technique is adopted, which can greatly improve the accuracy of the information extracted from the segment and the overall performance of the system. "Forward and backward frame expansion" means that, although information is extracted frame by frame, the current frame is not processed in isolation: the frames before and after it are included in the processing, so the information obtained for the current frame also contains "context information". In this case the expanded frames are the frames immediately before and after the current frame.
Overlap means that, while information is extracted frame by frame, the "current frame" is moved forward in an overlapping manner. For example, if the window length of each frame is 25 ms and the window shift is 10 ms, the current frame and the next frame overlap by 15 ms; the information extracted in this way is more accurate.
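A possible sketch of the overlapping framing with forward/backward frame expansion described above (25 ms window, 10 ms shift); the context width of two frames on each side is an assumption, and samples are assumed to be a 1-D numpy array:

```python
import numpy as np

def frames_with_context(samples, sample_rate, win_ms=25, hop_ms=10, context=2):
    """Cut a segment into overlapping frames (25 ms window, 10 ms shift => 15 ms
    overlap) and attach `context` frames on each side of every frame, so the
    information extracted for the current frame also carries its left/right context."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = [samples[i:i + win] for i in range(0, len(samples) - win + 1, hop)]
    expanded = []
    for idx in range(len(frames)):
        lo, hi = max(0, idx - context), min(len(frames), idx + context + 1)
        expanded.append(np.concatenate(frames[lo:hi]))  # current frame plus surrounding frames
    return expanded
```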
Step 103, clustering the voice segments once the accumulated speech time exceeds a preset duration, so as to obtain the information of each voice segment;
Clustering refers to collecting and analyzing the information of the individual voice segments obtained in the previous step and assigning the segments to specific categories. In this scheme a bottom-up agglomerative hierarchical clustering (AHC) algorithm may be adopted.
In a specific embodiment, in order to ensure that all speakers can be accurately identified subsequently, within the time period corresponding to the preset duration each speaker in the voice data to be analyzed has spoken for a specified duration. In one embodiment there are, for example, 2 speakers, and the preset duration requires that each of the 2 speakers has spoken for 1 minute.
In addition, the specific information includes any combination of one or more of the following: the characteristics of the voice, the speaker of the voice, and the time point of the voice.
Specifically, this corresponds to the Cluster stage shown in fig. 2: after a sufficiently long time (for example, when both the customer and the customer service agent have each spoken for at least one minute), all segments obtained so far are clustered to obtain the information (features, speakers, time points, etc.) of each segment.
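As an illustration of the bottom-up AHC step, a possible sketch using scikit-learn's AgglomerativeClustering on per-segment embeddings is given below; the embeddings themselves, the linkage and the fixed number of clusters are assumptions, as the scheme does not prescribe them:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_segments(segment_embeddings, n_speakers=2):
    """Bottom-up (agglomerative) clustering of per-segment embeddings;
    returns one tentative speaker label per segment.

    Euclidean distance with average linkage is used here for simplicity;
    cosine distance is a common alternative for speaker embeddings.
    """
    X = np.asarray(segment_embeddings)
    ahc = AgglomerativeClustering(n_clusters=n_speakers, linkage="average")
    return ahc.fit_predict(X)
```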
Step 104, processing each voice segment whose information has been determined, to determine the voice characteristics of the plurality of speakers;
In a specific step 104, the processing of each voice segment whose information has been determined, to determine the voice characteristics of the plurality of speakers, includes:
smoothing each voice segment whose information has been determined, merging adjacent voice segments belonging to the same speaker, and setting the speaker corresponding to a preset voice segment to be the same speaker as its adjacent voice segments, so as to determine the voice characteristics of the plurality of speakers;
wherein the preset voice segment is located between a preceding and a following adjacent voice segment that correspond to the same speaker, and the time length of the preset voice segment is less than a preset length.
Specifically, this corresponds to the Smoothing stage shown in fig. 2: adjacent segments belonging to the same speaker are merged, and some segments that are too short and differ from their adjacent segments are "smoothed out". In this way the characteristics of the customer and of the customer service agent, as well as their speaking-segment information, can be obtained.
Specifically, the smoothing process covers two cases: merging and floating. Merging refers to merging adjacent voice segments belonging to the same speaker. Floating means that if a voice segment assigned to another speaker B is sandwiched between two voice segments belonging to the same speaker A, and the length of that B segment is very small (smaller than a preset threshold), its speaker can be changed from B to A; that is, a segment that is too short and whose decision differs from that of its adjacent segments has its decision changed to match theirs.
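The two smoothing cases could be sketched as follows; here each segment is represented as a (start_ms, end_ms, speaker) triple, and the 300 ms threshold is only an illustrative assumption:

```python
def smooth(segments, min_len_ms=300):
    """segments: list of (start_ms, end_ms, speaker), sorted by time.

    Step 1 (floating): a very short segment sandwiched between two segments
    of the same speaker is relabeled to that speaker.
    Step 2 (merging): adjacent segments of the same speaker are merged.
    """
    if not segments:
        return []
    segs = [list(s) for s in segments]
    for i in range(1, len(segs) - 1):
        prev_spk, cur, next_spk = segs[i - 1][2], segs[i], segs[i + 1][2]
        if prev_spk == next_spk and cur[2] != prev_spk and cur[1] - cur[0] < min_len_ms:
            cur[2] = prev_spk  # relabel the too-short segment to the surrounding speaker
    merged = [segs[0]]
    for start, end, spk in segs[1:]:
        if spk == merged[-1][2]:
            merged[-1][1] = end  # extend the previous segment of the same speaker
        else:
            merged.append([start, end, spk])
    return [tuple(s) for s in merged]

print(smooth([(0, 500, "A"), (500, 700, "B"), (700, 1200, "A"), (1200, 1700, "B")]))
# -> [(0, 1200, 'A'), (1200, 1700, 'B')]: the 200 ms B segment is relabeled and merged
```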
Step 105, comparing the earliest unprocessed voice segment in time order with the voice features of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed.
After the voice features of each speaker are obtained in step 104, subsequent role analysis and recognition can be performed, for example online: each subsequent segment is compared with the two obtained speaker features to determine the speaker to which it belongs, i.e. whose voice the segment is; smoothing is then applied again to merge it with the existing information. This step loops until the conversation ends.
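A possible sketch of this online comparison step; representing the speaker features as mean embeddings and using cosine similarity are assumptions for illustration:

```python
import numpy as np

def assign_speaker(segment_embedding, speaker_features):
    """speaker_features: dict mapping speaker id -> feature vector (e.g. mean embedding).
    Returns the speaker whose feature is most similar (cosine) to the segment, so the
    earliest unprocessed segment can be labeled and then marked as processed."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(speaker_features, key=lambda spk: cosine(segment_embedding, speaker_features[spk]))
```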
Example 2
Embodiment 2 of the present invention further discloses a speech analysis device, which is applied to single-channel speech analysis, and as shown in fig. 3, the speech analysis device includes:
a first segmentation module 201, configured to segment the voice data to be analyzed into a voice part and a non-voice part; the voice data to be analyzed comprises voice data of a plurality of speakers;
a second segmentation module 202, configured to segment the voice portion into a plurality of voice segments;
the clustering module 203 is configured to cluster the voice segments once the accumulated speech time exceeds a preset duration, to obtain the information of each voice segment;
a determining module 204, configured to process each of the speech segments for which information is determined, and determine speech features of a plurality of speakers;
the analysis module 205 is configured to compare the earliest unprocessed voice segment in time order with the voice features of the plurality of speakers already determined, determine the speaker corresponding to the currently compared voice segment, and mark the currently compared voice segment as processed.
In a specific embodiment, the first segmentation module 201 is configured to:
and segmenting the voice data to be analyzed by a Voice Activity Detection (VAD) method so as to divide the voice data to be analyzed into a voice part and a non-voice part.
In a specific embodiment, the second segmentation module 202 is configured to:
dividing the voice part into a plurality of non-overlapping voice segments according to a preset time length;
and if the time length of the last voice segment is less than a preset value, merging the last voice segment with the adjacent voice segment.
In a specific embodiment, the speech segments are extracted with forward and backward frame expansion and/or overlap.
In a specific embodiment, within the time period corresponding to the preset duration, each speaker in the voice data to be analyzed has spoken for a specified duration.
In a specific embodiment, the information includes any combination of one or more of the following: the characteristics of the voice, the speaker of the voice, and the time point of the voice.
In a specific embodiment, the determining module 204 is configured to:
smoothing each voice segment whose information has been determined, merging adjacent voice segments belonging to the same speaker, and setting the speaker corresponding to a preset voice segment to be the same speaker as its adjacent voice segments, so as to determine the voice characteristics of the plurality of speakers;
wherein the preset voice segment is located between a preceding and a following adjacent voice segment that correspond to the same speaker, and the time length of the preset voice segment is less than a preset length.
Therefore, the embodiment of the invention provides a voice analysis method and equipment applied to single-channel voice analysis. The method comprises: dividing the voice data to be analyzed into a voice part and a non-voice part, the voice data to be analyzed comprising voice data of a plurality of speakers; segmenting the voice part into a plurality of voice segments; clustering the voice segments once the accumulated speech time exceeds a preset duration, so as to obtain the information of each voice segment; processing each voice segment whose information has been determined, so as to determine the voice characteristics of the plurality of speakers; and comparing the earliest unprocessed voice segment in time order with the voice characteristics of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed. By performing role separation on single-channel audio, the voices of multiple speakers can be separated, which facilitates subsequent operations such as quality inspection and intention analysis; the scheme supports online real-time processing and improves processing efficiency.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The serial numbers of the above embodiments are merely for description and do not represent the merits of the implementation scenarios.
The above disclosure is only a few specific implementation scenarios of the present invention, however, the present invention is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present invention.

Claims (10)

1. A speech analysis method, applied to single-channel speech analysis, the method comprising:
dividing the voice data to be analyzed into a voice part and a non-voice part; the voice data to be analyzed comprises voice data of a plurality of speakers;
segmenting the voice part into a plurality of voice segments;
clustering the voice segments once the accumulated speech time exceeds a preset duration, so as to obtain the information of each voice segment;
processing each voice segment whose information has been determined, so as to determine the voice characteristics of the plurality of speakers;
comparing the earliest unprocessed voice segment in time order with the voice characteristics of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed.
2. The speech analysis method of claim 1, wherein said segmenting the speech data to be analyzed into speech portions and non-speech portions comprises:
and segmenting the voice data to be analyzed by a Voice Activity Detection (VAD) method so as to divide the voice data to be analyzed into a voice part and a non-voice part.
3. The speech analysis method of claim 1, wherein said segmenting said speech portion into a plurality of speech segments comprises:
dividing the voice part into a plurality of non-overlapping voice segments according to a preset time length;
and if the time length of the last voice segment is less than a preset value, merging the last voice segment with the adjacent voice segment.
4. The speech analysis method of claim 1, wherein the speech segments are extracted with forward and backward frame expansion and/or overlap.
5. The speech analysis method according to claim 1, wherein each speaker in the speech data to be analyzed has spoken for a specified duration within the time period corresponding to the preset duration.
6. The speech analysis method of claim 1, wherein the information comprises any combination of one or more of the following: the characteristics of the voice, the speaker of the voice, and the time point of the voice.
7. The speech analysis method of claim 1 wherein said processing each of said speech segments for which information is determined to determine speech characteristics of a plurality of said speakers comprises:
smoothing each voice segment whose information has been determined, merging adjacent voice segments belonging to the same speaker, and setting the speaker corresponding to a preset voice segment to be the same speaker as its adjacent voice segments, so as to determine the voice characteristics of the plurality of speakers;
wherein the preset voice segment is located between a preceding and a following adjacent voice segment that correspond to the same speaker, and the time length of the preset voice segment is less than a preset length.
8. A speech analysis apparatus, for single channel speech analysis, comprising:
the first segmentation module is used for segmenting the voice data to be analyzed into a voice part and a non-voice part; the voice data to be analyzed comprises voice data of a plurality of speakers;
the second segmentation module is used for segmenting the voice part into a plurality of voice segments;
the clustering module is used for clustering the voice segments once the accumulated speech time exceeds a preset duration, so as to obtain the information of each voice segment;
the determining module is used for processing each voice segment with determined information and determining the voice characteristics of a plurality of speakers;
and the analysis module is used for comparing the earliest unprocessed voice segment in time order with the voice characteristics of the plurality of speakers already determined, determining the speaker corresponding to the currently compared voice segment, and marking the currently compared voice segment as processed.
9. The speech analysis device of claim 8, wherein the first segmentation module is to:
and segmenting the voice data to be analyzed by a Voice Activity Detection (VAD) method so as to divide the voice data to be analyzed into a voice part and a non-voice part.
10. The speech analysis device of claim 8, wherein the second segmentation module is to:
dividing the voice part into a plurality of non-overlapping voice segments according to a preset time length;
and if the time length of the last voice segment is less than a preset value, merging the last voice segment with the adjacent voice segment.
CN202010444381.8A 2020-05-22 2020-05-22 Voice analysis method and equipment Pending CN111613249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444381.8A CN111613249A (en) 2020-05-22 2020-05-22 Voice analysis method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010444381.8A CN111613249A (en) 2020-05-22 2020-05-22 Voice analysis method and equipment

Publications (1)

Publication Number Publication Date
CN111613249A 2020-09-01

Family

ID=72201644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444381.8A Pending CN111613249A (en) 2020-05-22 2020-05-22 Voice analysis method and equipment

Country Status (1)

Country Link
CN (1) CN111613249A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113707130A (en) * 2021-08-16 2021-11-26 北京搜狗科技发展有限公司 Voice recognition method and device for voice recognition
WO2022161264A1 (en) * 2021-01-26 2022-08-04 阿里巴巴集团控股有限公司 Audio signal processing method, conference recording and presentation method, device, system, and medium

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009109712A (en) * 2007-10-30 2009-05-21 National Institute Of Information & Communication Technology System for sequentially distinguishing online speaker and computer program thereof
CN102682760A (en) * 2011-03-07 2012-09-19 株式会社理光 Overlapped voice detection method and system
CN102831891A (en) * 2011-06-13 2012-12-19 富士通株式会社 Processing method and system for voice data
US20140074467A1 (en) * 2012-09-07 2014-03-13 Verint Systems Ltd. Speaker Separation in Diarization
US20150025887A1 (en) * 2013-07-17 2015-01-22 Verint Systems Ltd. Blind Diarization of Recorded Calls with Arbitrary Number of Speakers
WO2016152132A1 (en) * 2015-03-25 2016-09-29 日本電気株式会社 Speech processing device, speech processing system, speech processing method, and recording medium
US20170323643A1 (en) * 2016-05-03 2017-11-09 SESTEK Ses ve Ìletisim Bilgisayar Tekn. San. Ve Tic. A.S. Method for Speaker Diarization
CN107967912A (en) * 2017-11-28 2018-04-27 广州势必可赢网络科技有限公司 A kind of voice dividing method and device
CN108564968A (en) * 2018-04-26 2018-09-21 广州势必可赢网络科技有限公司 A kind of method and device of evaluation customer service
CN109036386A (en) * 2018-09-14 2018-12-18 北京网众共创科技有限公司 A kind of method of speech processing and device
CN110299150A (en) * 2019-06-24 2019-10-01 中国科学院计算技术研究所 A kind of real-time voice speaker separation method and system
CN110390946A (en) * 2019-07-26 2019-10-29 龙马智芯(珠海横琴)科技有限公司 A kind of audio signal processing method, device, electronic equipment and storage medium
CN110459240A (en) * 2019-08-12 2019-11-15 新疆大学 The more speaker's speech separating methods clustered based on convolutional neural networks and depth
CN110517667A (en) * 2019-09-03 2019-11-29 龙马智芯(珠海横琴)科技有限公司 A kind of method of speech processing, device, electronic equipment and storage medium
CN110827853A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Voice feature information extraction method, terminal and readable storage medium
CN110853666A (en) * 2019-12-17 2020-02-28 科大讯飞股份有限公司 Speaker separation method, device, equipment and storage medium
CN110930984A (en) * 2019-12-04 2020-03-27 北京搜狗科技发展有限公司 Voice processing method and device and electronic equipment
CN111063341A (en) * 2019-12-31 2020-04-24 苏州思必驰信息科技有限公司 Method and system for segmenting and clustering multi-person voice in complex environment
CN111128223A (en) * 2019-12-30 2020-05-08 科大讯飞股份有限公司 Text information-based auxiliary speaker separation method and related device
CN111145782A (en) * 2019-12-20 2020-05-12 深圳追一科技有限公司 Overlapped speech recognition method, device, computer equipment and storage medium
US20200234717A1 (en) * 2018-05-28 2020-07-23 Ping An Technology (Shenzhen) Co., Ltd. Speaker separation model training method, two-speaker separation method and computing device


Similar Documents

Publication Publication Date Title
US10902856B2 (en) System and method of diarization and labeling of audio data
CN105161093B (en) A kind of method and system judging speaker's number
US8145486B2 (en) Indexing apparatus, indexing method, and computer program product
CN112289323B (en) Voice data processing method and device, computer equipment and storage medium
CN107562760B (en) Voice data processing method and device
CN105632501A (en) Deep-learning-technology-based automatic accent classification method and apparatus
CN106157951B (en) Carry out the automatic method for splitting and system of audio punctuate
CN108257592A (en) A kind of voice dividing method and system based on shot and long term memory models
KR101616112B1 (en) Speaker separation system and method using voice feature vectors
US20180047387A1 (en) System and method for generating accurate speech transcription from natural speech audio signals
CN106847259B (en) Method for screening and optimizing audio keyword template
CN111613249A (en) Voice analysis method and equipment
CN101625860A (en) Method for self-adaptively adjusting background noise in voice endpoint detection
US7689414B2 (en) Speech recognition device and method
CN112802498B (en) Voice detection method, device, computer equipment and storage medium
CN113744742A (en) Role identification method, device and system in conversation scene
CN115063155A (en) Data labeling method and device, computer equipment and storage medium
CN112241467A (en) Audio duplicate checking method and device
CN111613208B (en) Language identification method and equipment
CN115100701A (en) Conference speaker identity identification method based on artificial intelligence technology
US20230215439A1 (en) Training and using a transcript generation model on a multi-speaker audio stream
JPH0683384A (en) Automatic detecting and identifying device for vocalization section of plural speakers in speech
CN114299962A (en) Method, system, device and storage medium for separating conversation role based on audio stream
CN115985315A (en) Speaker labeling method, device, electronic equipment and storage medium
CN114333784A (en) Information processing method, information processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200901