CN110010124A

CN110010124A - Call inspection device and call inspection method

Info

Publication number: CN110010124A
Application number: CN201910282000.8A
Authority: CN
Inventors: 余伟; 岑葆凌; 胡发泽; 曹旭纬; 关惠宇
Original assignee: Shenzhen Ping An Comprehensive Financial Services Co Ltd Shanghai Branch
Current assignee: Shenzhen Ping An Comprehensive Financial Services Co Ltd Shanghai Branch
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2019-07-12

Abstract

The present invention proposes a call inspection device. A telephone conducts a call and outputs a voice stream. A real-time voice processing device receives the voice stream output by the telephone, splits the voice stream in real time to generate ordered real-time voice segments, and outputs the real-time voice segments in order; the real-time voice processing device and the telephone transmit the voice stream over the UDP protocol. An offline voice processing device records the voice stream output by the telephone, performs audio format conversion on the recorded voice stream, and outputs the format-converted audio signal. A speech recognition device performs pre-processing and speech recognition on the received real-time voice segments or audio signal and outputs a text result. A natural language processing device performs natural language processing on the text result to inspect the call process. The present invention also proposes a call inspection method, which is performed by the above call inspection device.

Description

Call inspection device and call inspection method

Technical field

The present invention relates to user interaction technology, and more specifically to a technology for inspecting and evaluating call processes.

Background art

For service-industry enterprises, the level of customer service and the user experience are important factors in determining how well users accept the enterprise. For financial enterprises, providing voice service through service calls is a common form of customer service. To effectively improve the service quality of service calls and raise customer satisfaction, the call quality of service calls needs to be inspected. Because service calls are conducted by voice, the traditional inspection approach is to record each service call in its entirety. The recordings are then spot-checked at a certain ratio, or the corresponding recording is retrieved and inspected when a customer complaint is received. Traditional voice inspection relies on staff listening to the recordings, judging manually whether anything in the call falls short of requirements, and recording the result.

Manual inspection is very inefficient: a judgment can only be made after listening to the whole recording played back at normal speaking speed. With a limited number of inspectors, only a limited number of recordings can be inspected, and compared with the very large total number of service calls, the sampling ratio is too low. In many cases inspectors can only keep up with customer complaints, retrieving and checking the call recordings related to the content of a complaint. To improve efficiency, recordings are sometimes played back at increased speed, but sped-up playback may cause some details to be missed, which harms inspection quality.

In addition, manual inspection is always based on call recordings and is therefore an after-the-fact inspection, which contributes only to a limited extent to improving service quality. Under growing competitive pressure, there is increasing demand for real-time inspection of customer calls: it is desirable to carry out quality inspection synchronously while the service call is in progress, so that problems are found and solved in time. Manual inspection cannot meet the requirement of real-time inspection.

With the development of artificial intelligence, schemes have appeared that inspect service calls using speech recognition and natural language analysis. Speech recognition converts speech into text, and natural language analysis then analyzes the text and extracts typical words or information closely related to service quality, completing the quality inspection. Together, the two technologies can meet the requirements of both offline (after-the-fact) inspection and real-time inspection. Offline inspection is relatively easy to implement: the recorded audio signal is imported into a speech recognition device and converted into text, and natural language analysis is then performed on the text. Real-time inspection is harder to implement. In the prior art it requires dedicated telephones together with supporting equipment such as gateways and mirroring systems before the telephone can communicate with the speech recognition server in real time. The voice signal acquired by the telephone is sent in real time to the speech recognition server, where it is recognized and converted into a text signal; a natural language analysis server then analyzes the text to obtain the inspection result. Because dedicated telephones are required, real-time inspection is very costly. The number of service telephones is huge, so replacing all existing ordinary telephones with dedicated telephones would require a large investment in dedicated telephones, matching gateways, mirroring systems and related equipment. For cost reasons, the prior-art scheme therefore cannot meet the real-time inspection requirements of a large number of customer calls.

Summary of the invention

The present invention aims to propose a technology that enables real-time call quality inspection at lower cost.

According to an embodiment of the present invention, a call inspection device is proposed, comprising: a telephone, a real-time voice processing device, an offline voice processing device, a speech recognition device and a natural language processing device. The telephone conducts a call and outputs a voice stream. The real-time voice processing device is connected to the telephone; it receives the voice stream output by the telephone, splits the voice stream in real time to generate ordered real-time voice segments, and outputs the real-time voice segments in order, wherein the real-time voice processing device and the telephone transmit the voice stream over the UDP protocol. The offline voice processing device is connected to the telephone; it records the voice stream output by the telephone, performs audio format conversion on the recorded voice stream, and outputs the format-converted audio signal. The speech recognition device communicates with the real-time voice processing device and the offline voice processing device, performs pre-processing and speech recognition on the received real-time voice segments or audio signal, and outputs a text result. The natural language processing device is connected to the speech recognition device and performs natural language processing on the text result to inspect the call process.

In one embodiment, the real-time voice processing device includes a voice stream receiving device and a voice stream splitting device. The voice stream receiving device communicates with the telephone over UDP and receives the voice stream output by the telephone. The voice stream splitting device communicates with the voice stream receiving device over UDP, segments the voice stream to form ordered real-time voice segments, and outputs the real-time voice segments in order.

In one embodiment, the telephone accesses the telecommunications carrier network through the public switched telephone network (PSTN). After the telephone, the PSTN and the telecommunications carrier network have established a call and voice is being generated, the telephone and the real-time voice processing device establish communication over UDP and transmit the voice stream. When the call among the telephone, the PSTN and the telecommunications carrier network ends and is hung up, the communication between the telephone and the real-time voice processing device also ends.

In one embodiment, communication between the speech recognition device and the real-time voice processing device is established using the Real-time Transport Protocol (RTP). During the RTP execution stage, the speech recognition device and the real-time voice processing device communicate over UDP, and the real-time voice processing device transmits the real-time voice segments to the speech recognition device over UDP.

In one embodiment, the offline voice processing device includes a recording device and an audio format conversion device. The recording device records the voice stream output by the telephone. The audio format conversion device is connected to the recording device and performs audio format conversion on the voice stream recorded by the recording device; the format-converted audio signal is output to the speech recognition device.

In one embodiment, the speech recognition device includes a pre-processing device, which pre-processes the received real-time voice segments or audio signal. The pre-processing device includes an endpoint detection device, a noise reduction device and a feature extraction device. The endpoint detection device detects the start point and end point of the real-time voice segments or audio signal. The noise reduction device performs noise reduction on the real-time voice segments or audio signal. The feature extraction device extracts speech features from the real-time voice segments or audio signal.

In one embodiment, the speech recognition device includes a recognition and conversion device, which performs speech and semantic recognition on the real-time voice segments or audio signal output by the pre-processing device and outputs a text result carrying semantics. The recognition and conversion device includes a training device, a recognition device and a text output device. The training device is connected to a speech database and a language database; it trains on the speech data in the speech database and the semantic data in the language database according to imported training modeling parameters, and outputs an acoustic model and a language model. The recognition device feeds the real-time voice segments or audio signal into the acoustic model and the language model and performs speech recognition based on the extracted speech features in combination with the two models. The text output device converts the result produced by the recognition device into a text result carrying semantics and outputs it.

In one embodiment, the training modeling parameters include: a speech and semantics model, a signal processing model, a data mining model, and a statistical model.

According to an embodiment of the present invention, a call inspection method is proposed, comprising:

A call establishment step: establishing a call through a telephone and outputting a voice stream;

A real-time acquisition step: receiving the voice stream output by the telephone, splitting the voice stream in real time to generate ordered real-time voice segments, and outputting the real-time voice segments in order, wherein the telephone transmits the voice stream over the UDP protocol;

An offline acquisition step: recording the voice stream output by the telephone, performing audio format conversion on the recorded voice stream, and outputting the format-converted audio signal;

A speech recognition step: performing pre-processing and speech recognition on the received real-time voice segments or audio signal, and outputting a text result;

A natural language processing step: performing natural language processing on the text result to inspect the call process.

In one embodiment, the real-time acquisition step includes:

A voice stream receiving step: receiving, over UDP, the voice stream output by the telephone;

A voice stream splitting step: transferring the voice stream over UDP, segmenting the voice stream to form ordered real-time voice segments, and outputting the real-time voice segments in order.

In one embodiment, the telephone accesses the telecommunications carrier network through the public switched telephone network (PSTN). After the telephone, the PSTN and the telecommunications carrier network have established a call and voice is being generated, communication with the telephone is established over UDP and the voice stream is transmitted. When the call among the telephone, the PSTN and the telecommunications carrier network ends and is hung up, the communication with the telephone also ends.

In one embodiment, the speech recognition step is performed by a background computer. The background computer establishes communication using the Real-time Transport Protocol (RTP), and during the RTP execution stage the real-time voice segments are transmitted to the background computer over UDP.

In one embodiment, the offline acquisition step includes:

A recording step: recording the voice stream output by the telephone;

An audio format conversion step: performing audio format conversion on the recorded voice stream and outputting the format-converted audio signal.

In one embodiment, the speech recognition step includes a pre-processing step, which pre-processes the received real-time voice segments or audio signal. The pre-processing includes:

An endpoint detection step: detecting the start point and end point of the real-time voice segments or audio signal;

A noise reduction step: performing noise reduction on the real-time voice segments or audio signal;

A feature extraction step: extracting speech features from the real-time voice segments or audio signal.

In one embodiment, the speech recognition step includes a recognition and conversion step, which performs speech and semantic recognition on the real-time voice segments or audio signal output by the pre-processing and outputs a text result carrying semantics. The recognition and conversion step includes:

A training step: training on the speech data in a speech database and the semantic data in a language database according to imported training modeling parameters, and outputting an acoustic model and a language model;

A recognition step: feeding the real-time voice segments or audio signal into the acoustic model and the language model, and performing speech recognition based on the extracted speech features in combination with the acoustic model and the language model;

A text output step: converting the result of the recognition step into a text result carrying semantics and outputting it.

The call inspection device and call inspection method of the present invention can be implemented through low-cost modification of existing telephones, making full use of existing infrastructure to achieve call quality inspection in both offline and real-time modes. They meet the inspection requirements of the service-call field and effectively improve the service quality and user experience of service calls.

Brief description of the drawings

The above and other features, properties and advantages of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings and embodiments, in which identical reference numerals always denote identical features, and in which:

Fig. 1 shows a structural block diagram of a call inspection device according to an embodiment of the present invention.

Fig. 2 shows a flowchart of a call inspection method according to an embodiment of the present invention.

Fig. 3a and Fig. 3b show the working process of real-time call inspection according to the present invention.

Detailed description of the embodiments

Referring to Fig. 1, Fig. 1 shows a structural block diagram of a call inspection device according to an embodiment of the present invention. The call inspection device includes: a telephone 102, a real-time voice processing device 104, an offline voice processing device 106, a speech recognition device 108 and a natural language processing device 110.

The telephone 102 conducts a call and outputs a voice stream. In embodiments of the present invention, the telephone 102 is an ordinary telephone set and does not need to be replaced by a dedicated telephone.

The real-time voice processing device 104 is connected to the telephone 102. It receives the voice stream output by the telephone 102 and splits the voice stream in real time to generate ordered real-time voice segments. The real-time voice processing device 104 outputs the real-time voice segments in order, and the real-time voice processing device 104 and the telephone 102 transmit the voice stream over UDP. In the embodiment shown in Fig. 1, the real-time voice processing device 104 includes a voice stream receiving device 142 and a voice stream splitting device 144. The voice stream receiving device 142 communicates with the telephone over UDP and receives the voice stream output by the telephone 102. The voice stream splitting device 144 communicates with the voice stream receiving device 142 over UDP. The voice stream splitting device 144 segments the voice stream to form ordered real-time voice segments and outputs the real-time voice segments in order to the speech recognition device 108.
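As an illustration of the splitting described above, the following minimal Python sketch regroups incoming packet payloads into numbered segments so that a downstream recognizer can process them in order. The patent does not specify a segment size, a sequence-number scheme or any data structure; `VoiceSegment`, `split_stream` and the `segment_bytes` parameter are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class VoiceSegment:
    seq: int        # position of the segment within the call (assumed scheme)
    payload: bytes  # raw audio bytes carried by the segment


def split_stream(packets: Iterable[bytes], segment_bytes: int) -> Iterator[VoiceSegment]:
    """Regroup incoming packet payloads into fixed-size, numbered segments."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    buffer = bytearray()
    seq = 0
    for packet in packets:
        buffer.extend(packet)
        # Emit as many full segments as the buffer currently holds.
        while len(buffer) >= segment_bytes:
            yield VoiceSegment(seq, bytes(buffer[:segment_bytes]))
            del buffer[:segment_bytes]
            seq += 1
    if buffer:
        # Flush the final, shorter segment when the stream ends.
        yield VoiceSegment(seq, bytes(buffer))


# Three packets of uneven size are regrouped into 3-byte segments.
segments = list(split_stream([b"abcd", b"efgh", b"ij"], segment_bytes=3))
```

In a deployment, `packets` might be fed from a UDP socket (for example, payloads returned by `socket.recvfrom`), but that wiring is outside what the source describes and is left out here.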

The offline voice processing device 106 is connected to the telephone 102. It records the voice stream output by the telephone 102 and performs audio format conversion on the recorded voice stream. The offline voice processing device 106 outputs the format-converted audio signal. In the embodiment shown in Fig. 1, the offline voice processing device 106 includes a recording device 162 and an audio format conversion device 164. The recording device 162 records the voice stream output by the telephone 102. The audio format conversion device 164 is connected to the recording device 162. The audio format conversion device 164 performs audio format conversion on the voice stream recorded by the recording device 162, and the format-converted audio signal is output to the speech recognition device. In one embodiment, when the audio format conversion device 164 converts the audio format of the voice stream, it may split the voice stream and add a time synchronization signal to it, so that each section of the voice stream can be matched to the call time of the corresponding service call, which facilitates subsequent inspection work.
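The conversion and time-tagging described above can be sketched as follows. This is only an assumed illustration: the patent does not name the source or target audio formats, so the example takes raw mono PCM as input and WAV as output, and the function names, the 8 kHz sample rate and the 16-bit sample width are hypothetical choices.

```python
import io
import wave
from datetime import datetime, timedelta


def pcm_to_wav(pcm: bytes, sample_rate: int = 8000, sample_width: int = 2) -> bytes:
    """Wrap raw mono PCM samples in a WAV container (assumed target format)."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return out.getvalue()


def split_with_timestamps(pcm, call_start, chunk_seconds,
                          sample_rate=8000, sample_width=2):
    """Cut a recording into chunks, each tagged with its wall-clock start time."""
    chunk_bytes = int(chunk_seconds * sample_rate) * sample_width
    if chunk_bytes <= 0:
        raise ValueError("chunk_seconds is too small")
    chunks = []
    for offset in range(0, len(pcm), chunk_bytes):
        elapsed = offset / (sample_rate * sample_width)  # seconds into the call
        wav_bytes = pcm_to_wav(pcm[offset:offset + chunk_bytes], sample_rate, sample_width)
        chunks.append((call_start + timedelta(seconds=elapsed), wav_bytes))
    return chunks


# Three seconds of silence, cut into 2-second chunks.
recording = bytes(8000 * 2 * 3)
chunks = split_with_timestamps(recording, datetime(2019, 4, 9, 10, 0, 0), chunk_seconds=2)
```

Tagging each chunk with an absolute start time is one way to let an inspector map a flagged passage back to the moment in the call; the source only states that a time synchronization signal is added, not how.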

In one embodiment, the real-time voice processing device 104 and the offline voice processing device 106 may be integrated in a single physical device. The physical device may be a hardware circuit or a general-purpose chip running software. The real-time voice processing device 104 and the offline voice processing device 106 may each be implemented by separable hardware circuits, or by a general-purpose chip running different software for each. The physical device integrating the real-time voice processing device 104 and the offline voice processing device 106 is connected to the telephone 102 in a wired or wireless manner. In one embodiment, the wired connection is made through a network cable and the wireless connection is made through WiFi.

The speech recognition device 108 communicates with the real-time voice processing device 104 and the offline voice processing device 106. The speech recognition device 108 performs pre-processing and speech recognition on the real-time voice segments received from the real-time voice processing device 104 or on the audio signal received from the offline voice processing device 106, and then outputs a text result. In one embodiment, the speech recognition device 108 communicates with the real-time voice processing device 104 and the offline voice processing device 106 in a wired or wireless manner. In one embodiment, the wired connection is made through a network cable and the wireless connection is made through WiFi. In the embodiment shown in Fig. 1, the speech recognition device 108 includes a pre-processing device 181 and a recognition and conversion device 182. The pre-processing device 181 pre-processes the received real-time voice segments or audio signal. In the illustrated embodiment, the pre-processing device 181 includes an endpoint detection device 183, a noise reduction device 184 and a feature extraction device 185. The endpoint detection device 183 detects the start point and end point of the real-time voice segments or audio signal. The noise reduction device 184 performs noise reduction on the real-time voice segments or audio signal. The feature extraction device 185 extracts speech features from the real-time voice segments or audio signal. The recognition and conversion device 182 performs speech and semantic recognition on the real-time voice segments or audio signal output by the pre-processing device 181 and outputs a text result carrying semantics. In the illustrated embodiment, the recognition and conversion device 182 includes a training device 186, a recognition device 187 and a text output device 188. The training device 186 is connected to a speech database 191 and a language database 192. The training device 186 trains on the speech data in the speech database 191 and the semantic data in the language database 192 according to imported training modeling parameters. The training device 186 outputs an acoustic model 197 and a language model 198. In the illustrated embodiment, the training modeling parameters imported into the training device 186 include: a speech and semantics model 193, a signal processing model 194, a data mining model 195 and a statistical model 196. The recognition device 187 uses the acoustic model 197 and the language model 198 output by the training device 186: it feeds the real-time voice segments or audio signal into the acoustic model 197 and the language model 198, performs speech recognition based on the extracted speech features in combination with the two models, and outputs the recognition result. The text output device 188 is connected to the recognition device 187 and converts the result produced by the recognition device 187 into a text result carrying semantics, which it outputs. On the whole, the speech recognition device 108 uses existing speech recognition technology; the pre-processing and speech recognition techniques involved can be obtained by purchasing existing technology, so they are not discussed in depth here.
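The source leaves the pre-processing algorithms to existing technology. Purely as an illustration of what endpoint detection can look like, the sketch below uses a short-time energy threshold, a common textbook approach that is assumed here and is not taken from the patent. The function name, `frame_len` and `threshold` are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_endpoints(samples: List[int], frame_len: int,
                     threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the speech region, or None if silent.

    A frame counts as speech when its mean squared amplitude exceeds `threshold`.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    active_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    # End index is exclusive and clipped to the signal length.
    return active_starts[0], min(active_starts[-1] + frame_len, len(samples))


# Silence, then a loud burst, then silence again.
signal = [0] * 4 + [100] * 8 + [0] * 4
endpoints = detect_endpoints(signal, frame_len=4, threshold=50.0)
```

A production system would typically add smoothing and hangover frames so that brief pauses inside an utterance are not treated as endpoints; those refinements are omitted to keep the sketch short.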

The natural language processing device 110 is connected to the speech recognition device 108. The natural language processing device 110 performs natural language processing on the text result and parses its semantics. Once the natural language processing device 110 has obtained the semantics of the text result, the call process can be inspected on the basis of the text result and its semantics. For the text result, technical means such as searching and matching can be used to carry out quality inspection against preset key characters, keywords or key sentences related to call quality. Text-based inspection is significantly faster than inspection by listening to speech, is highly accurate, and is less prone to omissions. Thanks to the speed and accuracy of text inspection, the sampling rate in offline inspection can be raised considerably, covering more speech samples. At the same time, fast and accurate detection also meets the requirement of real-time inspection. It should also be noted that, on the whole, the natural language processing device 110 uses existing natural language processing (NLP) technology, which can be obtained by purchasing existing NLP technology, so NLP technology is not discussed in depth here.
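The search-and-match style of inspection mentioned above can be sketched in a few lines. The phrase lists and the split into "required" and "forbidden" phrases are assumptions made for this example; the source only says that preset keywords or key sentences related to call quality are matched against the text.

```python
import re
from typing import Dict, List


def inspect_transcript(text: str, required: List[str],
                       forbidden: List[str]) -> Dict[str, List[str]]:
    """Report required phrases that are absent and forbidden phrases that appear."""
    # Lower-case and collapse whitespace so matching is not layout-sensitive.
    normalized = re.sub(r"\s+", " ", text.lower())
    return {
        "missing_required": [p for p in required if p.lower() not in normalized],
        "forbidden_found": [p for p in forbidden if p.lower() in normalized],
    }


report = inspect_transcript(
    "Hello, thank you for calling.  That is not my problem.",
    required=["thank you for calling", "is there anything else"],
    forbidden=["not my problem"],
)
```

Plain substring matching is the simplest possible instance; a real deployment would more likely rely on the semantic parsing that the natural language processing device provides.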

In one embodiment, the speech recognition device 108 and the natural language processing device 110 may be deployed on a background server. The speech recognition device 108 and the natural language processing device 110 may be dedicated hardware circuits deployed on the background server, or may be implemented by software running on a general-purpose chip of the background server. The background server on which the speech recognition device 108 and the natural language processing device 110 are deployed is connected to the telephone 102 in a wired or wireless manner. In one embodiment, the wired connection is made through a network cable and the wireless connection is made through WiFi.

An important use of the present invention is real-time call inspection. Fig. 3a and Fig. 3b show the working process of real-time call inspection according to the present invention. Referring first to Fig. 3a, the telephone accesses the telecommunications carrier network FXO through the public switched telephone network (PSTN). A call is established and voice is generated between the PSTN and the FXO, while communication is established and the voice stream is transmitted between the PSTN and the telephone over UDP. After the telephone, the PSTN and the telecommunications carrier network FXO have established the call and voice is being generated, the telephone and the real-time voice processing device establish communication over UDP and transmit the voice stream. When the call ends, the call between the PSTN and the telecommunications carrier network FXO ends and is hung up, and the communication between the PSTN and the telephone is also hung up; at this point the communication between the telephone and the real-time voice processing device ends as well. Fig. 3a shows the process of establishing communication and transmitting the voice stream from the viewpoint of call establishment between the devices. Referring next to Fig. 3b, Fig. 3b shows the process of establishing communication and transmitting the voice stream from the viewpoint of communication between the telephone and the background server. Communication between the background server on which the speech recognition device is deployed (usually together with the natural language processing device) and the real-time voice processing device is established using the Real-time Transport Protocol (RTP). Establishing communication includes: invite (INVITE), trying (100 Trying), ringing (180 Ringing), wait (200 OK) and acknowledge (ACK), after which RTP communication (RTP) begins. While RTP communication is in progress between the background server and the telephone, the telephone and the real-time voice processing device establish communication over UDP and transmit the voice stream. Meanwhile, the speech recognition device and the real-time voice processing device also communicate over UDP, and the real-time voice processing device transmits the real-time voice segments to the speech recognition device over UDP. While the telephone and the real-time voice processing device maintain UDP communication and transmit the voice stream, the RTP communication between the background server and the telephone is maintained continuously until the communication between the telephone and the real-time voice processing device ends; for the communication between the telephone and the real-time voice processing device, see Fig. 3a. After the UDP communication between the telephone and the real-time voice processing device ends, the RTP communication between the background server and the telephone also ends. After the end (BYE) and wait (200 OK) exchange, the entire communication process is complete.
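The message order of Fig. 3b can be written down as a small check. The sketch below reads the labels in the text as INVITE, 100 Trying, 180 Ringing, 200 OK, ACK and BYE; that reading is an assumption about garbled labels, and these names match the standard SIP call setup sequence even though the source describes the exchange as RTP establishment. The function name and the `"RTP"` placeholder for media packets are hypothetical.

```python
from typing import List

# Order of messages as described for Fig. 3b (assumed reading of the labels).
EXPECTED_FLOW = ["INVITE", "100 Trying", "180 Ringing", "200 OK",
                 "ACK", "RTP", "BYE", "200 OK"]


def follows_call_flow(messages: List[str]) -> bool:
    """Check an observed message log against the expected order.

    Consecutive "RTP" entries are collapsed into one, because the media phase
    consists of many packets while the signalling messages occur once each.
    """
    collapsed: List[str] = []
    for message in messages:
        if message == "RTP" and collapsed and collapsed[-1] == "RTP":
            continue
        collapsed.append(message)
    return collapsed == EXPECTED_FLOW


good_log = ["INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK",
            "RTP", "RTP", "RTP", "BYE", "200 OK"]
bad_log = ["INVITE", "100 Trying", "180 Ringing", "200 OK",
           "RTP", "BYE", "200 OK"]  # ACK is missing
```

A check of this kind could be used when debugging a deployment to confirm that voice segments are only forwarded between the acknowledgement and the BYE.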

Fig. 2 shows a flowchart of a call inspection method according to an embodiment of the present invention. The present invention also proposes a call inspection method, which is the working process of the call inspection device shown in Fig. 1. Referring to Fig. 2, the call inspection method includes the following steps:

202. Call establishment step. In the call establishment step, a call is established through the telephone and a voice stream is output.

204. Real-time acquisition step. In the real-time acquisition step, the voice stream output by the telephone is received and split in real time to generate ordered real-time voice segments, and the real-time voice segments are output in order, wherein the telephone transmits the voice stream over UDP. In one embodiment, the real-time acquisition step includes:

A voice stream receiving step: receiving, over UDP, the voice stream output by the telephone.

The execution of the real-time acquisition step corresponds to the working process of the real-time voice processing device described above.

206. Offline acquisition step. In the offline acquisition step, the voice stream output by the telephone is recorded, audio format conversion is performed on the recorded voice stream, and the format-converted audio signal is output. In one embodiment, the offline acquisition step includes:

A recording step: recording the voice stream output by the telephone.

The execution of the offline acquisition step corresponds to the working process of the offline voice processing device described above.

208. Speech recognition step. In the speech recognition step, pre-processing and speech recognition are performed on the received real-time voice segments or audio signal, and a text result is output. In one embodiment, the speech recognition step includes a pre-processing step and a recognition and conversion step.

The pre-processing step pre-processes the received real-time voice segments or audio signal. The pre-processing includes:

An endpoint detection step: detecting the start point and end point of the real-time voice segments or audio signal.

A noise reduction step: performing noise reduction on the real-time voice segments or audio signal.

The recognition and conversion step performs speech and semantic recognition on the real-time voice segments or audio signal output by the pre-processing and outputs a text result carrying semantics. The recognition and conversion step includes:

A training step: training on the speech data in the speech database and the semantic data in the language database according to imported training modeling parameters, and outputting an acoustic model and a language model. In one embodiment, the training modeling parameters include: a speech and semantics model, a signal processing model, a data mining model and a statistical model.

A recognition step: feeding the real-time voice segments or audio signal into the acoustic model and the language model, and performing speech recognition based on the extracted speech features in combination with the acoustic model and the language model.

The execution of the speech recognition step corresponds to the working process of the speech recognition device described above.

210. Natural language processing step. In the natural language processing step, natural language processing is performed on the text result to inspect the call process. The execution of the natural language processing step corresponds to the working process of the natural language processing device described above.
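Steps 208 and 210 apply equally to real-time voice segments and to offline audio signals, which suggests a single processing path fed by either source. The sketch below illustrates that composition under stated assumptions: `inspect_call`, the stand-in recognizer and the stand-in text check are all hypothetical and merely mark where real speech recognition and NLP components would plug in.

```python
from typing import Callable, Iterable, List


def inspect_call(audio_units: Iterable[bytes],
                 recognize: Callable[[bytes], str],
                 check_text: Callable[[str], bool]) -> List[bool]:
    """Run speech recognition, then a text check, on each audio unit in order.

    `audio_units` may be real-time voice segments (step 204) or format-converted
    audio signals (step 206); the later steps treat both the same way.
    """
    return [check_text(recognize(unit)) for unit in audio_units]


def fake_recognize(unit: bytes) -> str:
    """Stand-in for step 208: pretend the audio bytes are already text."""
    return unit.decode("ascii")


def is_polite(text: str) -> bool:
    """Stand-in for step 210: a single keyword check."""
    return "thank you" in text.lower()


verdicts = inspect_call([b"Thank you for calling", b"What do you want"],
                        fake_recognize, is_polite)
```

Passing the recognizer and checker in as callables keeps the acquisition side independent of whichever purchased speech recognition and NLP products are used, in line with the source's remark that both rely on existing technology.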

An important use of the call inspection method is likewise real-time call inspection. Referring again to Fig. 3a and Fig. 3b, Fig. 3a shows the process of establishing communication and transmitting the voice stream from the viewpoint of call establishment between the devices: the telephone accesses the telecommunications carrier network FXO through the public switched telephone network (PSTN); after the telephone, the PSTN and the telecommunications carrier network FXO have established a call and voice is being generated, communication with the telephone is established over UDP and the voice stream is transmitted; when the call among the telephone, the PSTN and the telecommunications carrier network ends and is hung up, the communication with the telephone also ends. Fig. 3b shows the process of establishing communication and transmitting the voice stream from the viewpoint of communication between the telephone and the background server: the speech recognition step is performed by a background computer, the background computer establishes communication using the Real-time Transport Protocol (RTP), and during the RTP execution stage the real-time voice segments are transmitted to the background computer over UDP.

The present invention only requires appropriate modification of existing phones and the addition of a small amount of equipment, so that the phone transmits the voice stream to the background server via WiFi or cable. The present invention does not require large-scale or complex system transformation, and provides a low-cost intelligent solution for small-scale applications.

The above embodiments are provided to enable persons skilled in the art to make or use the present invention. Persons skilled in the art may make various modifications or changes to the above embodiments without departing from the inventive idea of the present invention. The protection scope of the present invention is therefore not limited by the above embodiments, but should be the widest scope consistent with the inventive features set out in the claims.

Claims

1. A call inspection device, characterized by comprising:

a phone, which conducts a call and outputs a voice stream;

a real-time voice processing device connected to the phone, wherein the real-time voice processing device receives the voice stream output by the phone, splits the voice stream in real time to generate sequential real-time voice segments, and outputs the real-time voice segments in order, and wherein the voice stream is transmitted between the real-time voice processing device and the phone over the UDP protocol;

an offline voice processing device connected to the phone, wherein the offline voice processing device records the voice stream output by the phone, performs audio format conversion on the recorded voice stream, and outputs the format-converted audio signal;

a speech recognition device, which communicates with the real-time voice processing device and the offline voice processing device, performs preprocessing and speech recognition on the received real-time voice segments or audio signal, and outputs text results;

a natural language processing device connected to the speech recognition device, wherein the natural language processing device performs natural language processing on the text results to inspect the call process.

2. The call inspection device as claimed in claim 1, characterized in that the real-time voice processing device comprises:

a voice stream receiving device, which communicates with the phone over the UDP protocol and receives the voice stream output by the phone;

a voice stream splitting device, which communicates with the voice stream receiving device over the UDP protocol, segments the voice stream to form sequential real-time voice segments, and outputs the real-time voice segments in order.

3. The call inspection device as claimed in claim 1, characterized in that the phone accesses a telecom operator network through the public switched telephone network (PSTN); after the phone, the PSTN and the telecom operator network establish a call and voice is generated, communication is established and the voice stream is transmitted between the phone and the real-time voice processing device over the UDP protocol; when the call among the phone, the PSTN and the telecom operator network ends and the phone hangs up, the communication between the phone and the real-time voice processing device also ends.

4. The call inspection device as claimed in claim 1, characterized in that communication between the speech recognition device and the real-time voice processing device is established with the Real-time Transport Protocol; during the execution phase of the Real-time Transport Protocol, the speech recognition device and the real-time voice processing device communicate over the UDP protocol, and the real-time voice processing device transmits the real-time voice segments to the speech recognition device over the UDP protocol.

5. The call inspection device as claimed in claim 1, characterized in that the offline voice processing device comprises:

a recording device, which records the voice stream output by the phone;

an audio format conversion device connected to the recording device, wherein the audio format conversion device performs audio format conversion on the voice stream recorded by the recording device and outputs the format-converted audio signal to the speech recognition device.

6. The call inspection device as claimed in claim 1, characterized in that the speech recognition device includes a preprocessing device, which preprocesses the received real-time voice segments or audio signal, the preprocessing device comprising:

an endpoint detection device, which detects the start endpoint and the end endpoint of the real-time voice segment or audio signal;

a noise reduction device, which performs noise reduction on the real-time voice segment or audio signal;

a feature extraction device, which extracts speech features from the real-time voice segment or audio signal.

7. The call inspection device as claimed in claim 6, characterized in that the speech recognition device includes a recognition and conversion device, which performs speech semantic recognition on the real-time voice segments or audio signal output by the preprocessing device and outputs text results having semantics, the recognition and conversion device comprising:

a training device connected to a speech database and a language database, wherein the training device uses the voice data in the speech database and the semantic data in the language database, performs training according to imported training modeling parameters, and outputs an acoustic model and a language model;

a recognition device, which imports the real-time voice segment or audio signal into the acoustic model and the language model, and performs speech recognition according to the extracted speech features in combination with the acoustic model and the language model;

a text output device, which converts the result recognized by the recognition device into text results having semantics and outputs them.

8. The call inspection device as claimed in claim 7, characterized in that the training modeling parameters include: a voice and semantics model, a signal processing model, a data mining model, and a statistical model.

9. A call inspection method, characterized by comprising:

a real-time obtaining step of receiving the voice stream output by a phone, splitting the voice stream in real time, generating sequential real-time voice segments, and outputting the real-time voice segments in order, wherein the phone transmits the voice stream over the UDP protocol;

an offline obtaining step of recording the voice stream output by the phone, performing audio format conversion on the recorded voice stream, and outputting the format-converted audio signal;

a speech recognition step of performing preprocessing and speech recognition on the received real-time voice segments or audio signal, and outputting text results;

10. The call inspection method as claimed in claim 9, characterized in that the real-time obtaining step comprises:

a voice stream splitting step of communicating the voice stream over the UDP protocol, segmenting the voice stream to form sequential real-time voice segments, and outputting the real-time voice segments in order.

11. The call inspection method as claimed in claim 9, characterized in that the phone accesses a telecom operator network through the public switched telephone network (PSTN); after the phone, the PSTN and the telecom operator network establish a call and voice is generated, communication with the phone is established and the voice stream is transmitted over the UDP protocol; when the call among the phone, the PSTN and the telecom operator network ends and the phone hangs up, the communication with the phone also ends.

12. The call inspection method as claimed in claim 9, characterized in that the speech recognition step is executed by a background computer; the background computer establishes communication through the Real-time Transport Protocol, and during the execution phase of the Real-time Transport Protocol, the real-time voice segments are transmitted to the background computer over the UDP protocol.

13. The call inspection method as claimed in claim 9, characterized in that the offline obtaining step comprises:

a recording step of recording the voice stream output by the phone;

an audio format conversion step of performing audio format conversion on the recorded voice stream and outputting the format-converted audio signal.

14. The call inspection method as claimed in claim 9, characterized in that the speech recognition step includes a preprocessing step, which preprocesses the received real-time voice segments or audio signal, the preprocessing comprising:

15. The call inspection method as claimed in claim 14, characterized in that the speech recognition step includes a recognition and conversion step, which performs speech semantic recognition on the real-time voice segments or audio signal output by the preprocessing and outputs text results having semantics, the recognition and conversion step comprising:

a training step of using the voice data in a speech database and the semantic data in a language database, performing training according to imported training modeling parameters, and outputting an acoustic model and a language model;

a recognition step of importing the real-time voice segment or audio signal into the acoustic model and the language model, and performing speech recognition according to the extracted speech features in combination with the acoustic model and the language model;

16. The call inspection method as claimed in claim 14, characterized in that the training modeling parameters include: a voice and semantics model, a signal processing model, a data mining model, and a statistical model.