CN105895116B - Double-track voice break-in analysis method - Google Patents

Double-track voice break-in analysis method

Info

Publication number
CN105895116B
CN105895116B CN201610209686.4A
Authority
CN
China
Prior art keywords
time
call
endpoint
end point
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610209686.4A
Other languages
Chinese (zh)
Other versions
CN105895116A (en)
Inventor
刘郁松
何国涛
李全忠
蒲瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puqiang times (Zhuhai Hengqin) Information Technology Co., Ltd
Original Assignee
Puqiang Information Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Puqiang Information Technology (beijing) Co Ltd filed Critical Puqiang Information Technology (beijing) Co Ltd
Priority to CN201610209686.4A priority Critical patent/CN105895116B/en
Publication of CN105895116A publication Critical patent/CN105895116A/en
Application granted granted Critical
Publication of CN105895116B publication Critical patent/CN105895116B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/87: Detection of discrete points within a voice signal
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Abstract

The invention discloses a double-track voice break-in analysis method. Valid voice endpoint detection is performed on the recording streams of the two channels by a voice activity detection technology, locating the time spans in the whole recording during which speech occurs. According to the valid voice endpoints of the two channel recordings, the endpoint times of the segments are processed uniformly: each endpoint is described by three attributes (time point, channel, and endpoint type), and all endpoints are tiled onto a single time axis. All time points are then traversed from front to back, analyzing whether each endpoint is a start-position or an end-position endpoint. When interruption or speech snatching occurs between two or more speakers, the method can capture the phenomenon in time and carry out subsequent processing, thereby avoiding the impolite call patterns of interruption and snatching and providing a high-quality guarantee for customer service.

Description

Double-track voice break-in analysis method
Technical Field
The invention belongs to the technical field of customer service calls, and particularly relates to a double-track voice break-in analysis method.
Background
Voice customer service is customer service conducted mainly by telephone, and speech snatching and interruption often occur between two or more speakers during a service call. Speech snatching refers to the situation where one party has only just finished speaking and the other party starts speaking immediately, with no time interval between the two; this is an impolite manner of conversation and may be perceived by the other party as pushy and unserious. Interruption refers to the situation where one party is still speaking and the other party cuts in directly to voice their own opinion, which is an even less polite manner of conversation. Both speech snatching and interruption seriously affect the quality of customer service.
Disclosure of Invention
The invention aims to provide a double-track voice break-in analysis method to solve the problems of speech snatching and interruption in the customer service process.
The invention is realized as follows. The double-track voice break-in analysis method comprises the following steps:
step one, valid voice endpoint detection is performed on the recording streams of the two channels by a voice activity detection technology, locating the time spans in the whole recording during which speech occurs;
step two, according to the valid voice endpoints of the two channel recordings, the endpoint times of the segments are processed uniformly: each endpoint is described by three attributes (time point, channel, and endpoint type), and all endpoints are tiled onto a single time axis;
step three, where two endpoints are adjacent, the former being the start endpoint of speaker A's speech and the latter the end endpoint of speaker B's speech, an interruption has occurred;
step four, where two endpoints are adjacent, the former being the end endpoint of speaker A's speech, the latter the start endpoint of speaker B's speech, and the time difference between the two being less than 200 ms, speech snatching has occurred.
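The endpoint unification of step two can be sketched in Python as follows; the `Endpoint` fields, the millisecond units, and the channel labels "A"/"B" are illustrative assumptions, not taken from the patent:

```python
from collections import namedtuple

# One endpoint = (time point, channel, type), per step two; the field
# names and channel labels are illustrative, not from the patent.
Endpoint = namedtuple("Endpoint", "time_ms channel kind")

def tile_endpoints(segments_a, segments_b):
    """Flatten per-channel (start_ms, end_ms) speech segments from the
    VAD of step one into one time-ordered endpoint list (step two)."""
    endpoints = []
    for channel, segments in (("A", segments_a), ("B", segments_b)):
        for start_ms, end_ms in segments:
            endpoints.append(Endpoint(start_ms, channel, "start"))
            endpoints.append(Endpoint(end_ms, channel, "end"))
    endpoints.sort(key=lambda e: e.time_ms)  # tile onto one time axis
    return endpoints
```

For example, tiling segments A = (0 ms, 5000 ms) and B = (4000 ms, 7000 ms) places B's start endpoint immediately before A's end endpoint on the shared axis, which is exactly the adjacent start-then-end pattern that step three reads as an interruption.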
The invention also adopts the following technical measures:
The valid voice endpoint in step one comprises three attributes: a start time, an end time, and a speaker.
The endpoint types in step two include start and end.
The method for analyzing the endpoint types comprises the following steps:
step one, check the type of the current endpoint;
step two, if it is a start-position endpoint, check whether the stack top contains a start position;
step three, if the stack top contains a start position, check whether the current start endpoint belongs to the same speaker as the start position on the stack;
step four, if they are the same, the data is erroneous: one speaker cannot start speaking again without having finished speaking;
step five, if they are different, an interruption has occurred; record the interruption information and pop the endpoint off the top of the stack;
step six, if the stack top does not contain a start position, push the start position, advance the traversal position by 1, and continue the loop;
step seven, if it is an end-position endpoint, check whether the stack top contains a start position;
step eight, if the stack top contains a start position, check whether the end endpoint belongs to the same speaker as the start position on the stack;
step nine, if they are the same, this is a normal pair of endpoints with no interruption; record the time of the end position;
step ten, if they are different, the data is erroneous: an interruption occurred earlier and is not recorded again;
step eleven, if the stack top does not contain a start position, check whether this end position and the start position of the neighboring endpoint are within 200 ms of each other; if so, speech snatching has occurred; record the time of the snatch and pop the endpoint off the top of the stack;
step twelve, sort and record all the overlap information, wherein each overlap segment comprises a start time, an end time, a type, and a direction.
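A minimal sketch of the stack-based scan described in the twelve steps above, using the same three-attribute endpoint representation (time point, channel, type) from step two. The function and variable names are hypothetical, millisecond times and channel labels "A"/"B" are assumed, and some error branches are collapsed for brevity:

```python
from collections import namedtuple

# Endpoint = (time point, channel, type); illustrative representation.
Endpoint = namedtuple("Endpoint", "time_ms channel kind")

SNATCH_GAP_MS = 200  # threshold named in step eleven

def scan_endpoints(endpoints):
    """One front-to-back pass over time-ordered endpoints.

    Simplified reading of steps one..twelve: the stack holds the start
    endpoint of whoever is currently speaking. A start arriving while the
    other channel's start is still open is an interruption; a start
    arriving within 200 ms of the other channel's end is a snatch.
    Returns (kind, time_ms, channel_of_party_cutting_in) records.
    """
    records = []
    stack = []       # open start endpoints
    last_end = None  # (time_ms, channel) of the most recent end endpoint
    for ep in endpoints:
        if ep.kind == "start":
            if stack:
                if stack[-1].channel == ep.channel:
                    # step four: same speaker starts twice -> data error
                    raise ValueError("data error: %s starts twice" % ep.channel)
                # step five: the other channel is still speaking
                records.append(("interruption", ep.time_ms, ep.channel))
                stack.pop()
            else:
                if (last_end is not None and last_end[1] != ep.channel
                        and ep.time_ms - last_end[0] < SNATCH_GAP_MS):
                    # step eleven: gap under 200 ms -> snatch
                    records.append(("snatch", ep.time_ms, ep.channel))
                stack.append(ep)  # step six
        else:  # end endpoint
            if stack and stack[-1].channel == ep.channel:
                stack.pop()  # step nine: normal close
            last_end = (ep.time_ms, ep.channel)
    return records
```

With A speaking over 0..5000 ms and B over 4000..7000 ms, the scan reports an interruption by B at 4000 ms; with A over 0..3000 ms and B starting at 3100 ms, it reports a snatch by B.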
The advantages and positive effects of the invention are as follows: when interruption or speech snatching occurs between two or more speakers, the double-track voice break-in analysis method can capture the phenomenon in time and carry out subsequent processing, thereby avoiding the impolite call patterns of interruption and snatching and providing a high-quality guarantee for customer service.
Drawings
Fig. 1 is a flowchart of the double-track voice break-in analysis method provided by an embodiment of the present invention;
fig. 2 is a flowchart of the endpoint-type analysis method provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The application of the principles of the present invention will be further described with reference to the accompanying figures 1 and 2 and the specific embodiments.
The double-track voice break-in analysis method comprises the following steps:
S101, valid voice endpoint detection is performed on the recording streams of the two channels by a voice activity detection technology, locating the time spans in the whole recording during which speech occurs;
S102, according to the valid voice endpoints of the two channel recordings, the endpoint times of the segments are processed uniformly: each endpoint is described by three attributes (time point, channel, and endpoint type), and all endpoints are tiled onto a single time axis;
S103, where two endpoints are adjacent, the former being the start endpoint of speaker A's speech and the latter the end endpoint of speaker B's speech, an interruption has occurred;
S104, where two endpoints are adjacent, the former being the end endpoint of speaker A's speech, the latter the start endpoint of speaker B's speech, and the time difference between the two being less than 200 ms, speech snatching has occurred.
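The patent does not specify which voice activity detection algorithm S101 uses. As one common minimal choice, a short-term-energy detector over 10 ms frames is sketched below; the sample rate, frame length, threshold, and minimum run length are illustrative parameters, not values from the patent:

```python
def energy_vad(samples, frame_len=160, threshold=1e-3, min_frames=3):
    """Toy energy-based VAD over one channel of 16 kHz mono samples
    (floats in [-1, 1]); frame_len=160 samples is 10 ms at 16 kHz.
    Returns (start_ms, end_ms) speech segments at least min_frames long.
    All parameters are illustrative assumptions."""
    frame_ms = 10  # duration of one frame_len-sample frame at 16 kHz
    segments, run_start = [], None
    n_frames = len(samples) // frame_len
    for i in range(n_frames + 1):  # one extra step to close a trailing run
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean squared amplitude
        if energy >= threshold:
            if run_start is None:
                run_start = i  # a speech run begins
        else:
            if run_start is not None and i - run_start >= min_frames:
                segments.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    return segments
```

Real deployments typically use a more robust detector (statistical or neural), since a fixed energy threshold is sensitive to noise and recording level; the sketch only illustrates how S101 turns a recording stream into the (start time, end time) endpoints that S102 consumes.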
The valid speech endpoint in S101 contains three attributes: a start time, an end time, and a speaker.
The endpoint type in S102 includes start and end.
The method for analyzing the endpoint types comprises the following steps:
S201, check the type of the current endpoint;
S202, if it is a start-position endpoint, check whether the stack top contains a start position;
S203, if the stack top contains a start position, check whether the current start endpoint belongs to the same speaker as the start position on the stack;
S204, if they are the same, the data is erroneous: one speaker cannot start speaking again without having finished speaking;
S205, if they are different, an interruption has occurred; record the interruption information and pop the endpoint off the top of the stack;
S206, if the stack top does not contain a start position, push the start position, advance the traversal position by 1, and continue the loop;
S207, if it is an end-position endpoint, check whether the stack top contains a start position;
S208, if the stack top contains a start position, check whether the end endpoint belongs to the same speaker as the start position on the stack;
S209, if they are the same, this is a normal pair of endpoints with no interruption; record the time of the end position;
S210, if they are different, the data is erroneous: an interruption occurred earlier and is not recorded again;
S211, if the stack top does not contain a start position, check whether this end position and the start position of the neighboring endpoint are within 200 ms of each other; if so, speech snatching has occurred; record the time of the snatch and pop the endpoint off the top of the stack;
S212, sort and record all the overlap information, wherein each overlap segment comprises a start time, an end time, a type (interruption or snatch), and a direction (who cut in on whom).
When interruption or speech snatching occurs between two or more speakers, the double-track voice break-in analysis method can capture the phenomenon in time and carry out subsequent processing, thereby avoiding the impolite call patterns of interruption and snatching and providing a high-quality guarantee for customer service.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (1)

1. A double-track voice break-in analysis method, characterized by comprising the following steps:
step one, performing valid voice endpoint detection on the recording streams of the two channels by a voice activity detection technology, locating the time spans in the whole recording during which speech occurs;
step two, according to the valid voice endpoints of the two channel recordings, processing the endpoint times of the segments uniformly, describing each endpoint by three attributes (time point, channel, and endpoint type), and tiling all endpoints onto a single time axis;
step three, traversing all time points from front to back, and analyzing whether each endpoint is a start-position or an end-position endpoint;
the valid voice endpoint in step one comprising three attributes: a start time, an end time, and a speaker;
the endpoint types in step two comprising start and end;
the method for analyzing the endpoint types comprising the following steps:
step 1, checking the type of the current endpoint;
step 2, if it is a start-position endpoint, checking whether the stack top contains a start position;
step 3, if the stack top contains a start position, checking whether the current start endpoint belongs to the same speaker as the start position on the stack;
step 4, if they are the same, the data is erroneous: one speaker cannot start speaking again without having finished speaking;
step 5, if they are different, an interruption has occurred; recording the interruption information and popping the endpoint off the top of the stack;
step 6, if the stack top does not contain a start position, pushing the start position, advancing the traversal position by 1, and continuing the loop;
step 7, if it is an end-position endpoint, checking whether the stack top contains a start position;
step 8, if the stack top contains a start position, checking whether the end endpoint belongs to the same speaker as the start position on the stack;
step 9, if they are the same, this is a normal pair of endpoints with no interruption; recording the time of the end position;
step 10, if they are different, the data is erroneous: an interruption occurred earlier and is not recorded again;
step 11, if the stack top does not contain a start position, checking whether this end position and the start position of the neighboring endpoint are within 200 ms of each other; if so, speech snatching has occurred; recording the time of the snatch and popping the endpoint off the top of the stack;
step 12, sorting and recording all the overlap information, wherein each overlap segment comprises a start time, an end time, a type, and a direction.
CN201610209686.4A 2016-04-06 2016-04-06 Double-track voice break-in analysis method Active CN105895116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610209686.4A CN105895116B (en) 2016-04-06 2016-04-06 Double-track voice break-in analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610209686.4A CN105895116B (en) 2016-04-06 2016-04-06 Double-track voice break-in analysis method

Publications (2)

Publication Number Publication Date
CN105895116A CN105895116A (en) 2016-08-24
CN105895116B true CN105895116B (en) 2020-01-03

Family

ID=57012984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610209686.4A Active CN105895116B (en) 2016-04-06 2016-04-06 Double-track voice break-in analysis method

Country Status (1)

Country Link
CN (1) CN105895116B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600526A (en) * 2019-01-08 2019-04-09 上海上湖信息技术有限公司 Customer service quality determining method and device, readable storage medium storing program for executing
CN111147669A (en) * 2019-12-30 2020-05-12 科讯嘉联信息技术有限公司 Full real-time automatic service quality inspection system and method
CN112511698B (en) * 2020-12-03 2022-04-01 普强时代(珠海横琴)信息技术有限公司 Real-time call analysis method based on universal boundary detection
CN113066496A (en) * 2021-03-17 2021-07-02 浙江百应科技有限公司 Method for analyzing call robbing of two conversation parties in audio

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001265368A (en) * 2000-03-17 2001-09-28 Omron Corp Voice recognition device and recognized object detecting method
CN102522081A (en) * 2011-12-29 2012-06-27 北京百度网讯科技有限公司 Method for detecting speech endpoints and system
CN103811009A (en) * 2014-03-13 2014-05-21 华东理工大学 Smart phone customer service system based on speech analysis
CN104052610A (en) * 2014-05-19 2014-09-17 国家电网公司 Informatization intelligent conference dispatching management device and using method
WO2015001492A1 (en) * 2013-07-02 2015-01-08 Family Systems, Limited Systems and methods for improving audio conferencing services

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8914288B2 (en) * 2011-09-01 2014-12-16 At&T Intellectual Property I, L.P. System and method for advanced turn-taking for interactive spoken dialog systems
JP2015169827A (en) * 2014-03-07 2015-09-28 富士通株式会社 Speech processing device, speech processing method, and speech processing program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001265368A (en) * 2000-03-17 2001-09-28 Omron Corp Voice recognition device and recognized object detecting method
CN102522081A (en) * 2011-12-29 2012-06-27 北京百度网讯科技有限公司 Method for detecting speech endpoints and system
WO2015001492A1 (en) * 2013-07-02 2015-01-08 Family Systems, Limited Systems and methods for improving audio conferencing services
CN103811009A (en) * 2014-03-13 2014-05-21 华东理工大学 Smart phone customer service system based on speech analysis
CN104052610A (en) * 2014-05-19 2014-09-17 国家电网公司 Informatization intelligent conference dispatching management device and using method

Also Published As

Publication number Publication date
CN105895116A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
CN105895116B (en) Double-track voice break-in analysis method
US9571638B1 (en) Segment-based queueing for audio captioning
US8826210B2 (en) Visualization interface of continuous waveform multi-speaker identification
CN105979106B (en) A kind of the ringing tone recognition methods and system of call center system
US20150310863A1 (en) Method and apparatus for speaker diarization
EP3127114B1 (en) Situation dependent transient suppression
US10798135B2 (en) Switch controller for separating multiple portions of call
US20070041522A1 (en) System and method for integrating and managing E-mail, voicemail, and telephone conversations using speech processing techniques
CN103190139B (en) For providing the system and method for conferencing information
WO2014069076A1 (en) Conversation analysis device and conversation analysis method
US10504538B2 (en) Noise reduction by application of two thresholds in each frequency band in audio signals
CN109644192B (en) Audio delivery method and apparatus with speech detection period duration compensation
CN104023110A (en) Voiceprint recognition-based caller management method and mobile terminal
CN112995422A (en) Call control method and device, electronic equipment and storage medium
US10540983B2 (en) Detecting and reducing feedback
US11050871B2 (en) Storing messages
CN104851423A (en) Sound message processing method and device
US20130244623A1 (en) Updating Contact Information In A Mobile Communications Device
CN110225213B (en) Recognition method of voice call scene and audio policy server
WO2020046435A1 (en) Transcription presentation
US20220124193A1 (en) Presentation of communications
CN112511698B (en) Real-time call analysis method based on universal boundary detection
US10580410B2 (en) Transcription of communications
WO2014069443A1 (en) Complaint call determination device and complaint call determination method
CN113808592A (en) Method and device for transcribing call recording, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200309

Address after: 519000 room 105-58115, No. 6, Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province (centralized office area)

Patentee after: Puqiang times (Zhuhai Hengqin) Information Technology Co., Ltd

Address before: 100085 cloud base 4 / F, tower C, Software Park Plaza, building 4, No. 8, Dongbei Wangxi Road, Haidian District, Beijing

Patentee before: Puqiang Information Technology (Beijing) Co., Ltd.