CN110010124A - Call inspection device and call inspection method - Google Patents
Call inspection device and call inspection method
- Publication number
- CN110010124A (application CN201910282000.8A)
- Authority
- CN
- China
- Prior art keywords
- voice
- real
- time
- phone
- speech recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention proposes a call inspection device. A telephone set conducts a call and outputs a voice stream. A real-time voice processing device receives the voice stream output by the telephone set, splits the voice stream in real time to generate ordered real-time voice segments, and outputs the real-time voice segments in order; the real-time voice processing device and the telephone set transmit the voice stream over UDP. An offline voice processing device records the voice stream output by the telephone set, performs audio format conversion on the recorded voice stream, and outputs the format-converted audio signal. A speech recognition device performs pre-processing and speech recognition on the received real-time voice segments or audio signal and outputs a text result. A natural language processing device performs natural language processing on the text result in order to inspect the call. The present invention also proposes a call inspection method, which is executed by the above call inspection device.
Description
Technical field
The present invention relates to user interaction technology, and more specifically to a technique for inspecting and evaluating calls.
Background art
For service-industry enterprises, the level of customer service and the user experience are important factors in determining user acceptance. For financial enterprises, providing voice service over service telephones is a common form of customer service. To effectively improve the service quality of service calls and raise customer satisfaction, the call quality of service calls needs to be inspected. Because service calls are conducted by voice, the traditional inspection method is to record each service call in full. The recordings are then spot-checked at a certain ratio, or the corresponding voice segments are retrieved and inspected when a customer complaint is received. Traditional voice inspection is performed by listening to the recordings: a person judges whether anything in the call fell short of requirements and records the result.
Manual inspection is very inefficient: a recording must be listened to in full at normal speaking speed before a judgment can be made. With a limited number of inspectors, only a limited number of recordings can be inspected, and compared with the huge total volume of service calls the sampling ratio is too low. In many cases inspectors can only keep up with customer complaints, retrieving and checking the telephone recordings relevant to each complaint. To improve efficiency, recordings are sometimes played back at increased speed, but sped-up playback may miss details and adversely affect inspection quality.
In addition, manual inspection is always based on call recordings and is therefore an after-the-fact inspection, which contributes little to improving service quality. Under increasing competitive pressure, there is growing demand for real-time inspection of customer calls: quality inspection should run synchronously during the service call so that problems are found and resolved in time. Manual inspection cannot meet this real-time requirement.
With the development of artificial intelligence, schemes have appeared that use speech recognition and natural language analysis to inspect service calls. Speech recognition converts speech into text; natural language analysis then analyzes the text and extracts typical words or information closely related to service quality to complete the quality inspection. Speech recognition and natural language analysis can satisfy both offline inspection (after-the-fact inspection) and real-time inspection. Offline inspection is relatively easy to implement: the recorded audio signal only needs to be imported into a speech recognition device and converted into text, and natural language analysis is then performed on the text. Real-time inspection is harder to implement. In the prior art it requires a dedicated telephone set together with supporting equipment such as a gateway and a mirroring system before the telephone set can communicate with the speech recognition server in real time. The voice signal acquired by the telephone set is sent in real time to the speech recognition server, where it is recognized and converted into text; a natural language analysis server then analyzes the text to obtain the inspection result. Because dedicated telephone sets are required, real-time inspection is very costly. The number of service telephones is huge, and replacing all existing ordinary telephone sets with dedicated ones would require a large investment in dedicated telephone sets, matching gateways, mirroring systems and related equipment. For cost reasons, the prior-art scheme therefore cannot meet the real-time inspection requirements of a large number of customer calls.
Summary of the invention
The present invention aims to propose a technique that enables real-time call quality inspection at lower cost.
According to an embodiment of the present invention, a call inspection device is proposed, comprising: a telephone set, a real-time voice processing device, an offline voice processing device, a speech recognition device and a natural language processing device. The telephone set conducts a call and outputs a voice stream. The real-time voice processing device is connected to the telephone set; it receives the voice stream output by the telephone set, splits the voice stream in real time to generate ordered real-time voice segments, and outputs the real-time voice segments in order, wherein the real-time voice processing device and the telephone set transmit the voice stream over UDP. The offline voice processing device is connected to the telephone set; it records the voice stream output by the telephone set, performs audio format conversion on the recorded voice stream, and outputs the format-converted audio signal. The speech recognition device communicates with the real-time voice processing device and the offline voice processing device, performs pre-processing and speech recognition on the received real-time voice segments or audio signal, and outputs a text result. The natural language processing device is connected to the speech recognition device and performs natural language processing on the text result in order to inspect the call.
In one embodiment, the real-time voice processing device comprises a voice stream receiving device and a voice stream splitting device. The voice stream receiving device communicates with the telephone set over UDP and receives the voice stream output by the telephone set. The voice stream splitting device communicates with the voice stream receiving device over UDP, segments the voice stream to form ordered real-time voice segments, and outputs the real-time voice segments in order.
In one embodiment, the telephone set accesses the telecommunication carrier network through the Public Switched Telephone Network (PSTN). After the telephone set, the PSTN and the telecommunication carrier network establish a call and voice is generated, communication is established between the telephone set and the real-time voice processing device over UDP and the voice stream is transmitted. When the call among the telephone set, the PSTN and the telecommunication carrier network ends and is hung up, the communication between the telephone set and the real-time voice processing device also ends.
In one embodiment, communication between the speech recognition device and the real-time voice processing device is established with the RTP protocol. While the RTP protocol is executing, the speech recognition device and the real-time voice processing device communicate over UDP, and the real-time voice processing device transmits the real-time voice segments to the speech recognition device over UDP.
In one embodiment, the offline voice processing device comprises a recording device and an audio format conversion device. The recording device records the voice stream output by the telephone set. The audio format conversion device is connected to the recording device, performs audio format conversion on the voice stream recorded by the recording device, and outputs the format-converted audio signal to the speech recognition device.
In one embodiment, the speech recognition device comprises a pre-processing device that pre-processes the received real-time voice segments or audio signal. The pre-processing device comprises an endpoint detection device, a noise reduction device and a feature extraction device. The endpoint detection device detects the start endpoint and end endpoint of the real-time voice segment or audio signal. The noise reduction device performs noise reduction on the real-time voice segment or audio signal. The feature extraction device extracts speech features from the real-time voice segment or audio signal.
In one embodiment, the speech recognition device comprises a recognition and conversion device, which performs speech and semantic recognition on the real-time voice segments or audio signal output by the pre-processing device and outputs a text result carrying semantics. The recognition and conversion device comprises a training device, a recognition device and a text output device. The training device is connected to a speech database and a language database; it uses the speech data in the speech database and the semantic data in the language database, trains according to imported training modeling parameters, and outputs an acoustic model and a language model. The recognition device feeds the real-time voice segment or audio signal into the acoustic model and the language model and, based on the extracted speech features, performs speech recognition using both models. The text output device converts the recognition result of the recognition device into a text result carrying semantics and outputs it.
In one embodiment, the training modeling parameters include: a speech and semantics model, a signal processing model, a data mining model and a statistical model.
According to an embodiment of the present invention, a call inspection method is proposed, comprising:
A call establishment step: establishing a call through a telephone set and outputting a voice stream;
A real-time acquisition step: receiving the voice stream output by the telephone set, splitting the voice stream in real time to generate ordered real-time voice segments, and outputting the real-time voice segments in order, wherein the telephone set transmits the voice stream over UDP;
An offline acquisition step: recording the voice stream output by the telephone set, performing audio format conversion on the recorded voice stream, and outputting the format-converted audio signal;
A speech recognition step: performing pre-processing and speech recognition on the received real-time voice segments or audio signal, and outputting a text result;
A natural language processing step: performing natural language processing on the text result in order to inspect the call.
In one embodiment, the real-time acquisition step comprises:
A voice stream receiving step: receiving, over UDP, the voice stream output by the telephone set;
A voice stream splitting step: transferring the voice stream over UDP, segmenting it to form ordered real-time voice segments, and outputting the real-time voice segments in order.
In one embodiment, the telephone set accesses the telecommunication carrier network through the Public Switched Telephone Network (PSTN). After the telephone set, the PSTN and the telecommunication carrier network establish a call and voice is generated, communication with the telephone set is established over UDP and the voice stream is transmitted. When the call among the telephone set, the PSTN and the telecommunication carrier network ends and is hung up, the communication with the telephone set also ends.
In one embodiment, the speech recognition step is executed by a background computer. The background computer establishes communication through the RTP protocol, and while the RTP protocol is executing, the real-time voice segments are transmitted to the background computer over UDP.
In one embodiment, the offline acquisition step comprises:
A recording step: recording the voice stream output by the telephone set;
An audio format conversion step: performing audio format conversion on the recorded voice stream and outputting the format-converted audio signal.
In one embodiment, the speech recognition step comprises a pre-processing step that pre-processes the received real-time voice segments or audio signal. The pre-processing comprises:
An endpoint detection step: detecting the start endpoint and end endpoint of the real-time voice segment or audio signal;
A noise reduction step: performing noise reduction on the real-time voice segment or audio signal;
A feature extraction step: extracting speech features from the real-time voice segment or audio signal.
In one embodiment, the speech recognition step comprises a recognition and conversion step, which performs speech and semantic recognition on the pre-processed real-time voice segments or audio signal and outputs a text result carrying semantics. The recognition and conversion step comprises:
A training step: using the speech data in a speech database and the semantic data in a language database, training according to imported training modeling parameters, and outputting an acoustic model and a language model;
A recognition step: feeding the real-time voice segment or audio signal into the acoustic model and the language model and, based on the extracted speech features, performing speech recognition using both models;
A text output step: converting the recognition result of the recognition step into a text result carrying semantics and outputting it.
In one embodiment, the training modeling parameters include: a speech and semantics model, a signal processing model, a data mining model and a statistical model.
The call inspection device and call inspection method of the present invention can be implemented as a low-cost retrofit of existing telephone sets. They make full use of existing infrastructure to provide call quality inspection in both offline and real-time modes, meet the inspection requirements of the service call field, and effectively improve the service quality and user experience of service calls.
Brief description of the drawings
The above and other features, properties and advantages of the present invention will become more apparent from the following description taken in conjunction with the drawings and embodiments, in which identical reference numerals denote identical features throughout:
Fig. 1 shows a structural block diagram of a call inspection device according to an embodiment of the present invention.
Fig. 2 shows a flowchart of a call inspection method according to an embodiment of the present invention.
Fig. 3a and Fig. 3b show the working process of real-time call inspection according to the present invention.
Detailed description of the embodiments
Fig. 1 shows a structural block diagram of a call inspection device according to an embodiment of the present invention. The call inspection device comprises: a telephone set 102, a real-time voice processing device 104, an offline voice processing device 106, a speech recognition device 108 and a natural language processing device 110.
The telephone set 102 conducts a call and outputs a voice stream. In an embodiment of the present invention, the telephone set 102 is an ordinary telephone set and does not need to be replaced with a dedicated telephone set.
The real-time voice processing device 104 is connected to the telephone set 102. It receives the voice stream output by the telephone set 102 and splits the voice stream in real time to generate ordered real-time voice segments. The real-time voice processing device 104 outputs the real-time voice segments in order, and the real-time voice processing device 104 and the telephone set 102 transmit the voice stream over UDP. In the embodiment shown in Fig. 1, the real-time voice processing device 104 comprises a voice stream receiving device 142 and a voice stream splitting device 144. The voice stream receiving device 142 communicates with the telephone set over UDP and receives the voice stream output by the telephone set 102. The voice stream splitting device 144 communicates with the voice stream receiving device 142 over UDP. The voice stream splitting device 144 segments the voice stream to form ordered real-time voice segments and outputs them in order to the speech recognition device 108.
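The segmentation described above can be pictured with a minimal sketch. The patent does not specify the segment size or how the order is marked, so the fixed `segment_size`, the `VoiceSegment` type and its `seq` field are illustrative assumptions rather than the patented implementation; the packet payloads are assumed to have already been received from the UDP socket.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class VoiceSegment:
    seq: int        # position of the segment in the stream (hypothetical field)
    payload: bytes  # raw audio bytes carried by this segment


def split_stream(packets: Iterable[bytes], segment_size: int) -> Iterator[VoiceSegment]:
    """Regroup incoming packet payloads into fixed-size, numbered segments."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    buffer = bytearray()
    seq = 0
    for packet in packets:
        buffer.extend(packet)
        # Emit as many full segments as the buffer currently holds.
        while len(buffer) >= segment_size:
            yield VoiceSegment(seq, bytes(buffer[:segment_size]))
            del buffer[:segment_size]
            seq += 1
    if buffer:
        # Flush the final, possibly shorter, segment at end of stream.
        yield VoiceSegment(seq, bytes(buffer))
```

Because each segment carries its own sequence number, a downstream recognizer could restore the order even if segments were delivered out of order, which matters when the transport is UDP.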
The offline voice processing device 106 is connected to the telephone set 102. It records the voice stream output by the telephone set 102 and performs audio format conversion on the recorded voice stream. The offline voice processing device 106 outputs the format-converted audio signal. In the embodiment shown in Fig. 1, the offline voice processing device 106 comprises a recording device 162 and an audio format conversion device 164. The recording device 162 records the voice stream output by the telephone set 102. The audio format conversion device 164 is connected to the recording device 162. It performs audio format conversion on the voice stream recorded by the recording device 162, and the format-converted audio signal is output to the speech recognition device. In one embodiment, when the audio format conversion device 164 converts the voice stream, it can also divide the voice stream into sections and add a time synchronization signal to it, so that each section of the voice stream can be matched to the call time of the corresponding service call, which facilitates subsequent inspection work.
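As a rough illustration of dividing a recording into sections that can be matched to the call time, the sketch below computes a wall-clock start time for each section. The function name `tag_sections`, the fixed section length and the tuple layout are assumptions for illustration; the patent does not describe how the time synchronization signal is encoded.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def tag_sections(call_start: datetime, total_samples: int, sample_rate: int,
                 section_seconds: float) -> List[Tuple[datetime, int, int]]:
    """Return (wall-clock start, first sample, end sample) for each section."""
    step = int(sample_rate * section_seconds)  # samples per section
    if sample_rate <= 0 or step <= 0:
        raise ValueError("sample_rate and section length must be positive")
    sections = []
    for first in range(0, total_samples, step):
        end = min(first + step, total_samples)  # last section may be shorter
        start_time = call_start + timedelta(seconds=first / sample_rate)
        sections.append((start_time, first, end))
    return sections
```

With the start time attached to every section, a hit found later in the text result could be traced back to the moment of the call at which it was spoken.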
In one embodiment, the real-time voice processing device 104 and the offline voice processing device 106 can be integrated in a single physical device. The physical device can be a hardware circuit or a general-purpose chip running software. The real-time voice processing device 104 and the offline voice processing device 106 can each be implemented by separable hardware circuits, or by a general-purpose chip running different software for each. The physical device integrating the real-time voice processing device 104 and the offline voice processing device 106 is connected to the telephone set 102 in a wired or wireless manner. In one embodiment, the wired connection is made through a network cable and the wireless connection through WiFi.
The speech recognition device 108 communicates with the real-time voice processing device 104 and the offline voice processing device 106. The speech recognition device 108 performs pre-processing and speech recognition on the real-time voice segments received from the real-time voice processing device 104 or on the audio signal received from the offline voice processing device 106, and then outputs a text result. In one embodiment, the speech recognition device 108 communicates with the real-time voice processing device 104 and the offline voice processing device 106 in a wired or wireless manner. In one embodiment, the wired connection is made through a cable and the wireless connection through WiFi.
In the embodiment shown in Fig. 1, the speech recognition device 108 comprises a pre-processing device 181 and a recognition and conversion device 182. The pre-processing device 181 pre-processes the received real-time voice segments or audio signal. In the illustrated embodiment, the pre-processing device 181 comprises an endpoint detection device 183, a noise reduction device 184 and a feature extraction device 185. The endpoint detection device 183 detects the start endpoint and end endpoint of the real-time voice segment or audio signal. The noise reduction device 184 performs noise reduction on the real-time voice segment or audio signal. The feature extraction device 185 extracts speech features from the real-time voice segment or audio signal.
The recognition and conversion device 182 performs speech and semantic recognition on the real-time voice segments or audio signal output by the pre-processing device 181 and outputs a text result carrying semantics. In the illustrated embodiment, the recognition and conversion device 182 comprises a training device 186, a recognition device 187 and a text output device 188. The training device 186 is connected to a speech database 191 and a language database 192. The training device 186 uses the speech data in the speech database 191 and the semantic data in the language database 192 and trains according to the imported training modeling parameters. The training device 186 outputs an acoustic model 197 and a language model 198. In the illustrated embodiment, the training modeling parameters imported into the training device 186 include: a speech and semantics model 193, a signal processing model 194, a data mining model 195 and a statistical model 196. The recognition device 187 uses the acoustic model 197 and the language model 198 output by the training device 186: it feeds the real-time voice segment or audio signal into the acoustic model 197 and the language model 198, performs speech recognition based on the extracted speech features using both models, and outputs the recognition result. The text output device 188 is connected to the recognition device 187 and converts the recognition result of the recognition device 187 into a text result carrying semantics and outputs it.
On the whole, the speech recognition device 108 is implemented with existing speech recognition technology. The pre-processing and speech recognition techniques used in it can be obtained by purchasing existing technology, so they are not discussed in depth here.
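The patent leaves the endpoint detection algorithm to existing technology. Purely as an illustration of what detecting a start endpoint and an end endpoint means, the sketch below uses a short-time energy threshold, a common simple technique that is swapped in here and is not taken from the patent. The function name, `frame_len` and `threshold` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def detect_endpoints(samples: Sequence[float], frame_len: int,
                     threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the active region, or None.

    A frame counts as active when its mean squared amplitude exceeds
    `threshold`; `end` is exclusive.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    active_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active_starts.append(start)
    if not active_starts:
        return None  # no frame exceeded the threshold
    end = min(active_starts[-1] + frame_len, len(samples))
    return active_starts[0], end
```

Trimming leading and trailing silence in this way reduces the amount of audio passed on to noise reduction and feature extraction. A production system would likely use a more robust detector.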
The natural language processing device 110 is connected to the speech recognition device 108. The natural language processing device 110 performs natural language processing on the text result and parses its semantics. Once the semantics of the text result have been obtained by the natural language processing device 110, the call can be inspected on the basis of the text result carrying semantics. Techniques such as searching and matching can be applied to the text result to perform a quality check against preset key words, keywords or key sentences relevant to call quality. Text-based inspection is significantly faster than inspection by listening to speech, is highly accurate, and is less prone to omissions. Thanks to the speed and accuracy of text inspection, the sampling rate of offline inspection can be raised markedly so that more voice samples are covered. The same fast and accurate detection capability also satisfies the requirements of real-time inspection. It should also be noted that, on the whole, the natural language processing device 110 uses existing natural language processing (NLP) technology, which can be obtained by purchasing existing NLP technology, so NLP technology is not discussed in depth here.
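The search-and-match check on the text result can be sketched as follows. The split into phrases that must appear and phrases that must not appear, the function name `check_transcript` and the shape of the returned report are assumptions made for illustration; the patent only says that preset keywords or key sentences relevant to call quality are matched.

```python
from typing import Dict, Iterable, List


def check_transcript(text: str, required: Iterable[str],
                     forbidden: Iterable[str]) -> Dict[str, List[str]]:
    """Report required phrases that are absent and forbidden phrases present."""
    lowered = text.lower()  # case-insensitive substring matching
    missing = [p for p in required if p.lower() not in lowered]
    found = [p for p in forbidden if p.lower() in lowered]
    return {"missing_required": missing, "forbidden_found": found}


report = check_transcript(
    "Hello, thank you for calling. That is not my problem.",
    required=["thank you for calling", "goodbye"],
    forbidden=["not my problem"],
)
```

Plain substring matching is enough to show the idea; a deployed checker would presumably rely on the semantic output of the NLP stage instead of raw strings.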
In one embodiment, the speech recognition device 108 and the natural language processing device 110 can be deployed on a background server. They can be dedicated hardware circuits deployed on the background server, or they can be implemented by software running on a general-purpose chip of the background server. The background server on which the speech recognition device 108 and the natural language processing device 110 are deployed is connected to the telephone set 102 in a wired or wireless manner. In one embodiment, the wired connection is made through a cable and the wireless connection through WiFi.
An important use of the invention is real-time call inspection. Fig. 3a and Fig. 3b show the working process of real-time call inspection according to the invention. Referring first to Fig. 3a, the phone accesses the telecom operator network FXO through the public switched telephone network (PSTN). A call is established between the PSTN and the FXO and voice is generated between them; at the same time, the PSTN and the phone establish communication over the UDP protocol and transmit the voice stream. After the phone, the PSTN and the telecom operator network FXO have established the call and voice is being generated, the phone and the real-time voice processing device establish communication over the UDP protocol and transmit the voice stream. When the call ends, the call between the PSTN and the telecom operator network FXO ends and is hung up, the communication between the PSTN and the phone is also hung up, and at that point the communication between the phone and the real-time voice processing device ends as well. Fig. 3a shows the process of establishing communication and transmitting the voice stream from the perspective of call establishment between the devices.
Referring next to Fig. 3b, Fig. 3b shows the process of establishing communication and transmitting the voice stream from the perspective of communication between the phone and the background server. The background server on which the speech recognition device is deployed (usually together with the natural language processing device) establishes communication with the real-time voice processing device using the Real-time Transport Protocol (RTP). The process of establishing communication includes: invite (INVITE), trying (100 Trying), ringing (180 Ringing), OK (200 OK) and acknowledge (ACK), after which RTP communication (RTP) begins. During the RTP communication between the background server and the phone, the phone and the real-time voice processing device establish communication over the UDP protocol and transmit the voice stream. Meanwhile, the speech recognition device and the real-time voice processing device also communicate over the UDP protocol, and the real-time voice processing device transmits the real-time voice segments to the speech recognition device over the UDP protocol. While the phone and the real-time voice processing device maintain the UDP communication and transmit the voice stream, the RTP communication between the background server and the phone is maintained continuously until the communication between the phone and the real-time voice processing device ends; for the communication between the phone and the real-time voice processing device, see Fig. 3a. After the UDP communication between the phone and the real-time voice processing device ends, the RTP communication between the background server and the phone also ends. After the end (BYE) and OK (200 OK) exchange, the entire communication process terminates.
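The message order described above can be checked mechanically. The following is a minimal sketch, assuming a recorded trace is available as a list of message names; the function name `split_session` and the trace representation are hypothetical and are not part of the patent.

```python
from typing import List, Tuple

# Message order taken from the description of Fig. 3b.
SETUP = ["INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK"]
TEARDOWN = ["BYE", "200 OK"]


def split_session(trace: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split a trace into (setup, media, teardown) or raise ValueError.

    The trace representation is an assumption made for illustration:
    each media packet is recorded as the string "RTP".
    """
    if len(trace) < len(SETUP) + len(TEARDOWN):
        raise ValueError("trace too short")
    setup = trace[:len(SETUP)]
    teardown = trace[-len(TEARDOWN):]
    media = trace[len(SETUP):len(trace) - len(TEARDOWN)]
    if setup != SETUP:
        raise ValueError("unexpected setup order")
    if teardown != TEARDOWN:
        raise ValueError("unexpected teardown order")
    if any(item != "RTP" for item in media):
        raise ValueError("unexpected message during media phase")
    return setup, media, teardown
```

A trace that skips any setup message, or that does not end with BYE followed by 200 OK, is rejected.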
Fig. 2 shows a flow chart of the call inspection method according to an embodiment of the invention. The invention also proposes a call inspection method, which is the working process of the call inspection device shown in Fig. 1. Referring to Fig. 2, the call inspection method includes the following steps:
202. Call establishment step. In the call establishment step, a call is established through the phone and a voice stream is output.
204. Real-time acquisition step. In the real-time acquisition step, the voice stream output by the phone is received and split in real time to generate ordered real-time voice segments, which are output in order; the phone transmits the voice stream over the UDP protocol. In one embodiment, the real-time acquisition step includes:
Voice stream receiving step: the voice stream output by the phone is received through UDP communication.
Voice stream splitting step: the voice stream, transmitted through UDP communication, is segmented to form ordered real-time voice segments, which are output in order.
The execution of the real-time acquisition step corresponds to the working process of the real-time voice processing device described above.
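The splitting of a received voice stream into ordered segments can be sketched as follows. This is only an illustration under stated assumptions: the patent does not give a segment size or data structure, so `VoiceSegment`, `segment_stream` and the fixed `segment_bytes` length are hypothetical, and the datagram payloads are assumed to arrive in order.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class VoiceSegment:
    seq: int        # position of the segment in the stream, starting at 0
    payload: bytes  # raw audio bytes of this segment


def segment_stream(datagrams: Iterable[bytes], segment_bytes: int) -> Iterator[VoiceSegment]:
    """Re-chunk datagram payloads into numbered, fixed-size segments."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    buffer = bytearray()
    seq = 0
    for payload in datagrams:
        buffer.extend(payload)
        # Emit every complete segment currently held in the buffer.
        while len(buffer) >= segment_bytes:
            yield VoiceSegment(seq, bytes(buffer[:segment_bytes]))
            del buffer[:segment_bytes]
            seq += 1
    if buffer:
        # The final segment may be shorter than segment_bytes.
        yield VoiceSegment(seq, bytes(buffer))
```

Because the function is a generator, each segment can be handed to the recognizer as soon as it is complete, without waiting for the call to end.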
206. Offline acquisition step. In the offline acquisition step, the voice stream output by the phone is recorded, audio format conversion is performed on the recorded voice stream, and the format-converted audio signal is output. In one embodiment, the offline acquisition step includes:
Recording step: the voice stream output by the phone is recorded.
Audio format conversion step: audio format conversion is performed on the recorded voice stream, and the format-converted audio signal is output.
The execution of the offline acquisition step corresponds to the working process of the offline voice processing device described above.
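As one possible illustration of the audio format conversion, the sketch below wraps recorded raw PCM samples in a WAV container using Python's standard `wave` module. The patent does not name the source or target formats, so the choice of raw PCM to WAV, the function name `pcm_to_wav` and the default parameters (8 kHz, mono, 16-bit) are assumptions.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 8000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)  # bytes per sample
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written to a file or passed directly to a recognizer that accepts WAV input.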
208. Speech recognition step. In the speech recognition step, preprocessing and speech recognition are performed on the received real-time voice segments or audio signal, and a text result is output. In one embodiment, the speech recognition step includes a preprocessing step and a recognition and conversion step.
Preprocessing step: the received real-time voice segments or audio signal are preprocessed. The preprocessing includes:
Endpoint detection step: detecting the start and end endpoints of the real-time voice segment or audio signal.
Noise reduction step: performing noise reduction on the real-time voice segment or audio signal.
Feature extraction step: extracting speech features from the real-time voice segment or audio signal.
Recognition and conversion step: speech semantic recognition is performed on the preprocessed real-time voice segments or audio signal, and a text result with semantics is output. The recognition and conversion step includes:
Training step: training is performed, according to imported training modeling parameters, using the speech data in the speech database and the semantic data in the language database, and an acoustic model and a language model are output. In one embodiment, the training modeling parameters include: a speech and semantics model, a signal processing model, a data mining model, and a statistical model.
Recognition step: the real-time voice segments or audio signal are imported into the acoustic model and the language model, and speech recognition is performed on the extracted speech features in combination with the acoustic model and the language model.
Text output step: the result of the recognition step is converted into a text result with semantics and output.
The execution of the speech recognition step corresponds to the working process of the speech recognition device described above.
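Two of the preprocessing steps can be sketched in a few lines. The patent does not specify the algorithms, so the example assumes a simple short-time energy threshold for endpoint detection and per-frame log energy as a stand-in speech feature; the function names, `frame_len` and `threshold` are hypothetical. Noise reduction is not covered by this sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples: Sequence[float], frame_len: int,
                     threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the active region, or None.

    A frame counts as active when its energy exceeds the threshold;
    end is exclusive.
    """
    active = [i for i, e in enumerate(frame_energies(samples, frame_len))
              if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def log_energy_features(samples: Sequence[float], frame_len: int) -> List[float]:
    """One log-energy value per frame; the small offset avoids log(0)."""
    return [math.log(e + 1e-10) for e in frame_energies(samples, frame_len)]
```

Practical recognizers typically use richer features than log energy; the sketch only shows where endpoint detection and feature extraction sit in the preprocessing chain.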
210. Natural language processing step. In the natural language processing step, natural language processing is performed on the text result to inspect the call. The execution of the natural language processing step corresponds to the working process of the natural language processing device described above.
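The text does not say which natural language processing technique is used for the inspection. As a deliberately simple stand-in, the sketch below checks a recognized transcript for required phrases and forbidden phrases; the names `InspectionResult` and `inspect_transcript` and the example phrases are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class InspectionResult:
    missing_required: List[str]  # required phrases not found in the transcript
    forbidden_found: List[str]   # forbidden phrases found in the transcript

    @property
    def passed(self) -> bool:
        return not self.missing_required and not self.forbidden_found


def inspect_transcript(text: str, required: Sequence[str],
                       forbidden: Sequence[str]) -> InspectionResult:
    """Case-insensitive substring check of a transcript against two phrase lists."""
    lowered = text.lower()
    missing = [p for p in required if p.lower() not in lowered]
    found = [p for p in forbidden if p.lower() in lowered]
    return InspectionResult(missing, found)
```

For example, a service call could be required to contain a greeting and a closing phrase, and flagged if it contains an impolite expression.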
An important use of the call inspection method is likewise real-time call inspection. Referring again to Fig. 3a and Fig. 3b, Fig. 3a shows the process of establishing communication and transmitting the voice stream from the perspective of call establishment between the devices: the phone accesses the telecom operator network FXO through the public switched telephone network (PSTN); after the phone, the PSTN and the telecom operator network FXO have established a call and voice is being generated, communication with the phone is established over the UDP protocol and the voice stream is transmitted; when the call among the phone, the PSTN and the telecom operator network ends and is hung up, the communication with the phone also ends. Fig. 3b shows the process of establishing communication and transmitting the voice stream from the perspective of communication between the phone and the background server: the speech recognition step is executed by a background computer, the background computer establishes communication using the Real-time Transport Protocol (RTP), and while the RTP communication is running, the real-time voice segments are transmitted to the background computer over the UDP protocol.
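UDP itself does not guarantee that datagrams arrive in the order they were sent, so ordered delivery of the real-time voice segments needs some ordering information. One common approach, shown here as a sketch and not taken from the patent, is to prefix each segment with a sequence number; the 4-byte header layout and the function names are hypothetical.

```python
import struct
from typing import Iterable, List, Tuple

# Hypothetical header: one unsigned 32-bit sequence number, network byte order.
HEADER = struct.Struct("!I")


def pack_segment(seq: int, payload: bytes) -> bytes:
    """Build one datagram: sequence-number header followed by the audio payload."""
    return HEADER.pack(seq) + payload


def unpack_segment(datagram: bytes) -> Tuple[int, bytes]:
    """Split a datagram back into (sequence number, payload)."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than header")
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]


def reorder(datagrams: Iterable[bytes]) -> List[bytes]:
    """Return the payloads sorted by their sequence numbers."""
    decoded = [unpack_segment(d) for d in datagrams]
    decoded.sort(key=lambda item: item[0])
    return [payload for _, payload in decoded]
```

The sender would pass each packed datagram to a UDP socket, and the receiver would call `reorder` on a batch of received datagrams before handing them to the recognizer.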
The invention requires only modest modification of existing phones and a small amount of additional equipment, so that the phone transmits the voice stream to the background server over WiFi or cable. No large-scale or complex system overhaul is needed, which provides a low-cost intelligent solution for small-scale applications.
The call inspection device and call inspection method of the invention can be implemented by low-cost modification of existing phones. They make full use of existing infrastructure to inspect call quality in both offline and real-time modes, meet the inspection requirements of the service-call field, and effectively improve the service quality and user experience of service calls.
The above embodiments are provided to enable persons skilled in the art to make or use the invention. Persons skilled in the art may make various modifications or changes to the above embodiments without departing from the inventive concept; the scope of protection of the invention is therefore not limited by the above embodiments, but should be the broadest scope consistent with the inventive features recited in the claims.
Claims (16)
1. A call inspection device, characterized by comprising:
a phone, which conducts a call and outputs a voice stream;
a real-time voice processing device connected to the phone, wherein the real-time voice processing device receives the voice stream output by the phone, splits the voice stream in real time to generate ordered real-time voice segments, and outputs the real-time voice segments in order, and wherein the real-time voice processing device and the phone transmit the voice stream over the UDP protocol;
an offline voice processing device connected to the phone, wherein the offline voice processing device records the voice stream output by the phone, performs audio format conversion on the recorded voice stream, and outputs the format-converted audio signal;
a speech recognition device, which communicates with the real-time voice processing device and the offline voice processing device, performs preprocessing and speech recognition on the received real-time voice segments or audio signal, and outputs a text result;
a natural language processing device connected to the speech recognition device, wherein the natural language processing device performs natural language processing on the text result to inspect the call.
2. The call inspection device according to claim 1, characterized in that the real-time voice processing device comprises:
a voice stream receiving device, which communicates with the phone over the UDP protocol and receives the voice stream output by the phone;
a voice stream splitting device, which communicates with the voice stream receiving device over the UDP protocol and segments the voice stream to form ordered real-time voice segments, wherein the voice stream splitting device outputs the real-time voice segments in order.
3. The call inspection device according to claim 1, characterized in that the phone accesses a telecom operator network through the public switched telephone network (PSTN); after the phone, the PSTN and the telecom operator network have established a call and voice is being generated, the phone and the real-time voice processing device establish communication over the UDP protocol and transmit the voice stream; and when the call among the phone, the PSTN and the telecom operator network ends and is hung up, the communication between the phone and the real-time voice processing device also ends.
4. The call inspection device according to claim 1, characterized in that the speech recognition device and the real-time voice processing device establish communication using the Real-time Transport Protocol (RTP); while the RTP communication is running, the speech recognition device and the real-time voice processing device communicate over the UDP protocol, and the real-time voice processing device transmits the real-time voice segments to the speech recognition device over the UDP protocol.
5. The call inspection device according to claim 1, characterized in that the offline voice processing device comprises:
a recording device, which records the voice stream output by the phone;
an audio format conversion device connected to the recording device, wherein the audio format conversion device performs audio format conversion on the voice stream recorded by the recording device and outputs the format-converted audio signal to the speech recognition device.
6. The call inspection device according to claim 1, characterized in that the speech recognition device comprises a preprocessing device, which preprocesses the received real-time voice segments or audio signal, the preprocessing device comprising:
an endpoint detection device, which detects the start and end endpoints of the real-time voice segment or audio signal;
a noise reduction device, which performs noise reduction on the real-time voice segment or audio signal;
a feature extraction device, which extracts speech features from the real-time voice segment or audio signal.
7. The call inspection device according to claim 6, characterized in that the speech recognition device comprises a recognition and conversion device, which performs speech semantic recognition on the real-time voice segments or audio signal output by the preprocessing device and outputs a text result with semantics, the recognition and conversion device comprising:
a training device connected to a speech database and a language database, wherein the training device performs training, according to imported training modeling parameters, using the speech data in the speech database and the semantic data in the language database, and outputs an acoustic model and a language model;
a recognition device, which imports the real-time voice segments or audio signal into the acoustic model and the language model and performs speech recognition on the extracted speech features in combination with the acoustic model and the language model;
a text output device, which converts the result of the recognition device into a text result with semantics and outputs it.
8. The call inspection device according to claim 7, characterized in that the training modeling parameters include: a speech and semantics model, a signal processing model, a data mining model, and a statistical model.
9. A call inspection method, characterized by comprising:
a call establishment step of establishing a call through a phone and outputting a voice stream;
a real-time acquisition step of receiving the voice stream output by the phone, splitting the voice stream in real time to generate ordered real-time voice segments, and outputting the real-time voice segments in order, wherein the phone transmits the voice stream over the UDP protocol;
an offline acquisition step of recording the voice stream output by the phone, performing audio format conversion on the recorded voice stream, and outputting the format-converted audio signal;
a speech recognition step of performing preprocessing and speech recognition on the received real-time voice segments or audio signal and outputting a text result;
a natural language processing step of performing natural language processing on the text result to inspect the call.
10. The call inspection method according to claim 9, characterized in that the real-time acquisition step comprises:
a voice stream receiving step of receiving, through UDP communication, the voice stream output by the phone;
a voice stream splitting step of segmenting the voice stream, transmitted through UDP communication, to form ordered real-time voice segments and outputting the real-time voice segments in order.
11. The call inspection method according to claim 9, characterized in that the phone accesses a telecom operator network through the public switched telephone network (PSTN); after the phone, the PSTN and the telecom operator network have established a call and voice is being generated, communication with the phone is established over the UDP protocol and the voice stream is transmitted; and when the call among the phone, the PSTN and the telecom operator network ends and is hung up, the communication with the phone also ends.
12. The call inspection method according to claim 9, characterized in that the speech recognition step is executed by a background computer, the background computer establishes communication using the Real-time Transport Protocol (RTP), and while the RTP communication is running, the real-time voice segments are transmitted to the background computer over the UDP protocol.
13. The call inspection method according to claim 9, characterized in that the offline acquisition step comprises:
a recording step of recording the voice stream output by the phone;
an audio format conversion step of performing audio format conversion on the recorded voice stream and outputting the format-converted audio signal.
14. The call inspection method according to claim 9, characterized in that the speech recognition step comprises a preprocessing step of preprocessing the received real-time voice segments or audio signal, the preprocessing comprising:
an endpoint detection step of detecting the start and end endpoints of the real-time voice segment or audio signal;
a noise reduction step of performing noise reduction on the real-time voice segment or audio signal;
a feature extraction step of extracting speech features from the real-time voice segment or audio signal.
15. The call inspection method according to claim 14, characterized in that the speech recognition step comprises a recognition and conversion step of performing speech semantic recognition on the preprocessed real-time voice segments or audio signal and outputting a text result with semantics, the recognition and conversion step comprising:
a training step of performing training, according to imported training modeling parameters, using the speech data in the speech database and the semantic data in the language database, and outputting an acoustic model and a language model;
a recognition step of importing the real-time voice segments or audio signal into the acoustic model and the language model and performing speech recognition on the extracted speech features in combination with the acoustic model and the language model;
a text output step of converting the result of the recognition step into a text result with semantics and outputting it.
16. as claimed in claim 14 call the method for inspection, which is characterized in that the trained modeling parameters include: voice and
Semantics model, signal processing model, data mining model, statistical model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910282000.8A CN110010124A (en) | 2019-04-09 | 2019-04-09 | Equipment and the call method of inspection are examined in call |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910282000.8A CN110010124A (en) | 2019-04-09 | 2019-04-09 | Equipment and the call method of inspection are examined in call |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110010124A true CN110010124A (en) | 2019-07-12 |
Family
ID=67170647
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910282000.8A Pending CN110010124A (en) | 2019-04-09 | 2019-04-09 | Equipment and the call method of inspection are examined in call |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110010124A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150156392A1 (en) * | 2005-10-17 | 2015-06-04 | Cutting Edge Vision Llc | Pictures Using Voice Commands and Automatic Upload |
CN102082879A (en) * | 2009-11-27 | 2011-06-01 | 华为技术有限公司 | Method, device and system for call center speech detection |
CN102456344A (en) * | 2010-10-22 | 2012-05-16 | 中国电信股份有限公司 | System and method for analyzing customer behavior characteristic based on speech recognition technique |
CN102625005A (en) * | 2012-03-05 | 2012-08-01 | 广东天波信息技术股份有限公司 | Call center system with function of real-timely monitoring service quality and implement method of call center system |
CN103701999A (en) * | 2012-09-27 | 2014-04-02 | 中国电信股份有限公司 | Method and system for monitoring voice communication of call center |
CN105206269A (en) * | 2015-08-14 | 2015-12-30 | 百度在线网络技术(北京)有限公司 | Voice processing method and device |
CN107464573A (en) * | 2017-09-06 | 2017-12-12 | 竹间智能科技(上海)有限公司 | A kind of new customer service call quality inspection system and method |
CN108090052A (en) * | 2018-01-05 | 2018-05-29 | 深圳市沃特沃德股份有限公司 | Voice translation method and device |
CN109040482A (en) * | 2018-08-09 | 2018-12-18 | 武汉优品楚鼎科技有限公司 | The unattended intelligent phone inquiry method, system and device of field of securities |
CN108965620A (en) * | 2018-08-24 | 2018-12-07 | 杭州数心网络科技有限公司 | A kind of artificial intelligence call center system |
CN109327632A (en) * | 2018-11-23 | 2019-02-12 | 深圳前海微众银行股份有限公司 | Intelligent quality inspection system, method and the computer readable storage medium of customer service recording |
Non-Patent Citations (1)
Title |
---|
王冲等: "《现代信息检索技术基本原理教程》", 30 November 2013 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112735421A (en) * | 2019-10-28 | 2021-04-30 | 北京京东尚科信息技术有限公司 | Real-time quality inspection method and device for voice call |
CN111161711A (en) * | 2020-04-01 | 2020-05-15 | 支付宝(杭州)信息技术有限公司 | Method and device for sentence segmentation of flow type speech recognition text |
CN111161711B (en) * | 2020-04-01 | 2020-07-03 | 支付宝(杭州)信息技术有限公司 | Method and device for sentence segmentation of flow type speech recognition text |
CN111432062A (en) * | 2020-05-12 | 2020-07-17 | 国网天津市电力公司 | Intelligent authentication recording telephone system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20190712 |