WO2009138012A1 - Method, device and system for sound processing - Google Patents

Method, device and system for sound processing

Info

Publication number
WO2009138012A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voiceprint
channel
site
voice
Prior art date
Application number
PCT/CN2009/071603
Other languages
English (en)
French (fr)
Inventor
邓庆锋
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2009138012A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/50Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
    • H04M3/51Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5166Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing in combination with interactive voice response systems or voice portals, e.g. as front-ends

Definitions

  • the present invention relates to the field of communications, and in particular, to a method, device and system for sound processing.
  • In order to meet the demand for personalized communication services, call centers have been introduced in many industries, such as banking and securities; through the call center, users are connected with manual agents or automatic services to deliver voice-based services.
  • In the course of providing services through the call center, voiceprint collection can be performed.
  • In the prior art, voiceprint collection is mainly performed through a proprietary device, and the same proprietary device is then used for voiceprint recognition. During collection and recognition, only the person whose voiceprint is being collected or recognized interacts by voice with the proprietary device, while the call connection between the agent (or the automatic flow) and the user is disconnected, so the user cannot perform service operations at the same time.
  • Embodiments of the present invention provide a method, device and system for sound processing, which ensure that the call service proceeds normally while sound processing is performed.
  • An embodiment of the invention provides a sound processing method for sound processing during a call between a user and an agent. The method includes: creating a voice conference site; adding the user and the agent to the voice conference site, where the voice conference site is used to connect the user's channel and the agent's channel; and recording the user's channel in the voice conference site to obtain a channel recording file of the user.
  • An embodiment of the present invention further provides a sound processing device for sound processing during a call between a user and an agent. The device includes:
  • a site unit, configured to create a voice conference site; a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, where the voice conference site is used to connect the user's channel and the agent's channel;
  • a recording unit, configured to record the user's channel in the voice conference site created by the site unit to obtain a channel recording file of the user.
  • An embodiment of the present invention further provides a voiceprint collection and recognition system for sound processing during a call between a user and an agent. The system includes:
  • a site unit, configured to create a voice conference site;
  • a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, where the voice conference site is used to connect the user's channel and the agent's channel;
  • a recording unit, configured to record the user's channel in the voice conference site created by the site unit to obtain a channel recording file of the user.
  • With these technical solutions, a voice conference site can be created during the call between the user and the agent, and the user and the agent are added to the voice conference site.
  • In this way, while the user's channel is recorded in the voice conference site, the user and the agent can still interact through it, so that the normal call service is not interrupted.
  • This avoids the prior-art situation in which the call between the user and the agent has to be interrupted because sound processing is required, and improves the user experience and the quality of service.
  • FIG. 1 is a schematic flow chart of a method for sound processing according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of an apparatus for sound processing according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a call center system according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a voiceprint collection process according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a voiceprint recognition process according to an embodiment of the present invention.
  • The sound processing method and device provided by the embodiments of the present invention are applied to sound processing during a call between a user and an agent. The call between the user and the agent can be established over various networks, including but not limited to a fixed network, a Global System for Mobile Communications (GSM) network, a Code Division Multiple Access (CDMA) network, a third-generation mobile communication (3G) network, a personal wireless access system (PHS) and a Next Generation Network (NGN), where 3G networks include but are not limited to Wideband CDMA (WCDMA) networks, CDMA-2000 networks and Time Division-Synchronous CDMA (TDS-CDMA) networks. The agents include manual agents and automatic agents.
  • Referring to FIG. 1, a sound processing method according to an embodiment of the present invention is used for sound processing during a call between a user and an agent. The method includes the following steps:
  • 101. Create a voice conference site.
  • 102. Add the user and the agent to the voice conference site, so that the voice conference site connects the user's channel and the agent's channel.
  • 103. Record the user's channel in the voice conference site to obtain a channel recording file of the user.
  • Further, the method may also include: 104. Perform voiceprint collection on the user according to the user's channel recording file.
  • The voiceprint collection process can include: extracting the user's voiceprint information from the channel recording file, and generating a voiceprint identifier (ID) corresponding to the user's voiceprint information.
  • In this step, after the user's channel is recorded, the user's voiceprint information is extracted from the recording file. The voiceprint information may be acoustic feature information carrying the user's speech information, including but not limited to the pitch contour, formant frequency bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters, and linear prediction coefficients.
  • After the voiceprint information is extracted, a predetermined number of voiceprint information entries can be recorded as a voiceprint file. The predetermined number can be specified according to demand before sound processing, or specified at random during sound processing.
  • A voiceprint ID corresponding to the user's voiceprint information can also be generated, either in the order of sound processing (for example, the voiceprint ID of the first user who undergoes sound processing is one, that of the second user is two, and so on), or by computing it from the voiceprint information with an algorithm; the type of algorithm is not limited and may be linear or nonlinear.
  • After the user's voiceprint ID is generated, the correspondence between the user's voiceprint ID and the voiceprint file can also be recorded, as sketched below.
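  • The bullets above only name the collection operations, so the following Python sketch makes them concrete, assuming a 16-bit mono PCM recording of the user's channel. The averaged log-spectral envelope stands in for whichever acoustic features (pitch contour, formants, LPC coefficients and so on) a real implementation would extract, and the hash-based voiceprint ID, the voiceprints/ directory and the JSON index are illustrative stand-ins for the voiceprint file and its recorded correspondence, not details taken from the patent.

```python
import hashlib
import json
import wave
from pathlib import Path

import numpy as np


def extract_voiceprint(recording_path: str, n_bands: int = 32) -> np.ndarray:
    """Crude stand-in for voiceprint extraction: an averaged log-spectral
    envelope of the user's channel recording (assumes 16-bit mono PCM WAV)."""
    with wave.open(recording_path, "rb") as wav:
        pcm = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    spectrum = np.abs(np.fft.rfft(pcm.astype(np.float64))) + 1e-9
    # Average the log spectrum into a fixed number of bands.
    bands = np.array_split(np.log(spectrum), n_bands)
    return np.array([band.mean() for band in bands])


def collect_voiceprint(user_id: str, recording_path: str,
                       store_dir: str = "voiceprints",
                       index_path: str = "voiceprint_index.json") -> str:
    """Extract the voiceprint, write it to a voiceprint file, generate a
    voiceprint ID, and record the ID <-> file correspondence."""
    features = extract_voiceprint(recording_path)

    # One possible ID scheme: a hash of the feature vector (the text only
    # requires that some linear or nonlinear algorithm produce the ID).
    voiceprint_id = hashlib.sha1(features.tobytes()).hexdigest()[:12]

    store = Path(store_dir)
    store.mkdir(exist_ok=True)
    voiceprint_file = store / f"{voiceprint_id}.npy"
    np.save(voiceprint_file, features)

    # Record the correspondence between the voiceprint ID and the file
    # (the role played by the database server in the call-center embodiment).
    index_file = Path(index_path)
    index = json.loads(index_file.read_text()) if index_file.exists() else {}
    index[voiceprint_id] = {"user": user_id, "file": str(voiceprint_file)}
    index_file.write_text(json.dumps(index, indent=2))
    return voiceprint_id
```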
  • Further, the method may also include: 105. Perform voiceprint recognition on the user according to the user's channel recording file.
  • When the user needs to perform a specific operation, voiceprint recognition is performed on the user; the specific operation can be designated in advance according to the importance level of the operation.
  • Performing voiceprint recognition on the user can include:
  • first judging whether a voiceprint has already been collected before this recognition. A previously collected voiceprint includes, but is not limited to, one collected for the user before the current call or one already collected during the current call. If a voiceprint has been collected, it is further judged whether this recognition requires collecting the voiceprint again: if not, step 105 is executed directly after 103 and 104 is skipped; if collection is required again, 104 is executed after 103 and then 105. If no voiceprint was collected before this recognition, 104 is executed after 103 and then 105.
  • The specific voiceprint recognition process can include: receiving a voiceprint recognition operation request; obtaining the user's voiceprint ID from the request; searching the recorded correspondence between voiceprint IDs and voiceprint files according to the user's voiceprint ID to obtain the voiceprint information corresponding to the user in the voiceprint file (the first voiceprint information), where the correspondence may have been recorded during a previous call of the user or during the current call; extracting the user's voiceprint information (the second voiceprint information) from the channel recording file obtained in 103; and comparing the first voiceprint information with the second voiceprint information.
  • Whether the user's identity is recognized successfully is judged from the comparison result: for example, if the first and second voiceprint information are consistent, identification succeeds; if they are inconsistent, identification fails. Subsequent operations are performed according to the recognition result: if identification succeeds, the user is allowed to perform the specific operation; if it fails, the user is prohibited from performing it. A comparison sketch follows below.
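  • Continuing the sketch above (and reusing its extract_voiceprint helper), the comparison of the first and second voiceprint information could look as follows. The cosine-similarity threshold is an assumption; the embodiment only requires that consistent voiceprints be judged a successful identification.

```python
import json
from pathlib import Path

import numpy as np

# Assumes extract_voiceprint() from the collection sketch above is available,
# e.g. defined in the same module.


def recognize_voiceprint(voiceprint_id: str, new_recording: str,
                         index_path: str = "voiceprint_index.json",
                         threshold: float = 0.95) -> bool:
    """Compare the stored (first) voiceprint with the one extracted from the
    current call's channel recording (the second voiceprint)."""
    index = json.loads(Path(index_path).read_text())
    entry = index.get(voiceprint_id)
    if entry is None:
        return False  # no recorded correspondence for this voiceprint ID

    first = np.load(entry["file"])
    second = extract_voiceprint(new_recording)

    # Cosine similarity as an illustrative consistency criterion.
    similarity = float(np.dot(first, second) /
                       (np.linalg.norm(first) * np.linalg.norm(second)))
    return similarity >= threshold


# If recognition succeeds, the user may perform the specific operation;
# otherwise the operation is refused.
```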
  • Further, in order to occupy conference resources as little as possible, improve the utilization of call system resources and improve the management of conference resources, the method may also include: 106. Release the voice conference site. If the user does not need to perform a specific operation, that is, only voiceprint collection is needed in this sound processing, the voice site is released immediately after collection; if the user needs to perform a specific operation, the voice site is released after voiceprint recognition.
  • Releasing the voice site can include: removing the user and the agent from the voice site; reconnecting the voice time slots of the user and the agent; and releasing the resources of the voice site.
  • If the user's voiceprint has not been collected, voiceprint collection is performed on the user; if the user's voiceprint has already been collected, collection may be performed again or skipped.
  • In a specific implementation, the user information may include a field indicating whether the user's voiceprint has been collected: a value of one means the voiceprint has been collected, and zero means it has not.
  • With the sound processing method provided by this embodiment, a voice conference site is created during the call between the user and the agent and the user and the agent are added to it, so that while the user's channel is recorded in the voice conference site, the user and the agent can also interact through it and the normal call service is maintained. This avoids the interruption of the call between the user and the agent caused by the sound processing process in the prior art, and improves the user's service experience and the quality of service.
  • Referring to FIG. 2, the sound processing device can be set up independently, integrated in a call center system, or integrated in a voiceprint collection and recognition system, to implement sound processing during a call between a user and an agent. The device includes:
  • the site unit 201 is configured to create a voice conference site
  • the connection unit 202 is configured to add the user and the agent to the voice conference site created by the site unit 201, where the voice conference site is used to connect the user's channel and the agent's channel;
  • the recording unit 203 is configured to record the channel of the user in the voice conference site created by the site unit 201, and obtain a channel recording file of the user.
  • the device may further include:
  • the collection unit 204 is configured to perform voiceprint collection on the user according to the user's channel recording file obtained by the recording unit 203.
  • the collection unit 204 can include:
  • the extracting subunit 204-1 is configured to extract the voiceprint information of the user from the channel recording file of the user obtained by the recording unit 203;
  • the identifier subunit 204-2 is configured to generate a voiceprint ID corresponding to the user's voiceprint information extracted by the extraction subunit 204-1.
  • the device may further include:
  • the identification unit 205 is configured to perform voiceprint recognition on the user according to the channel recording file of the user obtained by the recording unit 203.
  • the identification unit 205 can include:
  • a receiving subunit 205-1 configured to receive a voiceprint recognition operation request
  • the identifier subunit 205-2 is configured to obtain a voiceprint ID of the user from the voiceprint recognition operation request received by the receiving subunit 205-1;
  • the searching subunit 205-3 is configured to search, according to the voiceprint ID obtained by the identifier subunit 205-2, the recorded correspondence between the user's voiceprint ID and the voiceprint file, to obtain the voiceprint information corresponding to the user in the voiceprint file (that is, the first voiceprint information); the correspondence may have been recorded during a previous call of the user or during the current call;
  • the extracting subunit 205-4 is configured to extract the voiceprint information (ie, the second voiceprint information) of the user from the channel recording file of the user obtained by the recording unit 203;
  • the comparison subunit 205-5 is configured to compare the first voiceprint information and the second voiceprint information.
  • the device may further include:
  • the release unit 206 is configured to release the voice site created by the site unit 201.
  • the release unit 206 can include:
  • the removing subunit 206-1 is configured to remove the user and the agent from the voice conference site created by the site unit 201;
  • the bridging subunit 206-2 is configured to reconnect the voice time slots of the user and the agent;
  • the resource subunit 206-3 is configured to release the resources of the voice conference site created by the site unit 201 (the units are gathered in the sketch below).
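  • To make the FIG. 2 layout concrete, the following Python sketch gathers the site, connection, recording and release units into one class (the collection and recognition units are covered by the earlier sketches). The switch object and its methods (create_conference, add_to_conference, record_channel, remove_from_conference, bridge, release_conference) are assumed interfaces to whatever media or switching platform hosts the conference site; they are not named by the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoundProcessingDevice:
    """Sketch of the FIG. 2 unit layout; the media-control backend that a
    real device would call into is abstracted away as `switch`."""
    switch: object                       # assumed media/switching backend
    site_id: Optional[str] = None        # state held by the site unit
    recording_file: Optional[str] = None

    # Site unit (201): create the voice conference site.
    def create_site(self) -> str:
        self.site_id = self.switch.create_conference()
        return self.site_id

    # Connection unit (202): join user and agent so the site bridges
    # the user's channel and the agent's channel.
    def join_parties(self, user_channel: str, agent_channel: str) -> None:
        self.switch.add_to_conference(self.site_id, user_channel)
        self.switch.add_to_conference(self.site_id, agent_channel)

    # Recording unit (203): record only the user's channel.
    def record_user_channel(self, user_channel: str) -> str:
        self.recording_file = self.switch.record_channel(self.site_id, user_channel)
        return self.recording_file

    # Release unit (206): remove parties, rebridge their voice time slots,
    # free the conference resources.
    def release_site(self, user_channel: str, agent_channel: str) -> None:
        self.switch.remove_from_conference(self.site_id, user_channel)
        self.switch.remove_from_conference(self.site_id, agent_channel)
        self.switch.bridge(user_channel, agent_channel)
        self.switch.release_conference(self.site_id)
        self.site_id = None
```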
  • With the sound processing device provided by this embodiment, during the call between the user and the agent, the site unit creates a voice conference site and the connection unit adds the user and the agent to it, so that while the recording unit records the user's channel in the voice conference site, the user and the agent can still interact through it and the normal call service is not interrupted. This avoids the prior-art situation in which the call between the user and the agent has to be interrupted because sound processing is required, and improves the user's service experience and the quality of service.
  • Referring to FIG. 3, which is a schematic structural diagram of a call center (CC) system according to an embodiment of the present invention, this embodiment is a specific application of the foregoing embodiments.
  • This embodiment is a specific application of the foregoing embodiment.
  • The specific application is only one way of applying the foregoing embodiments; those of ordinary skill in the art can make improvements and refinements to it without departing from the principle of the embodiments of the present invention, and such improvements and refinements shall also fall within the protection scope of the embodiments of the present invention.
  • In this embodiment, after the user dials the system access code of an automatic service provided by the service provider, the switching system connects the call to the call center (CC) system, the corresponding manual or automatic service is entered, and the corresponding service is executed on the CC platform.
  • The sound processing method of the foregoing embodiments may be implemented by the CC system, by an independently arranged sound processing device, or by a voiceprint collection and recognition system; the principle is the same in each case. In this embodiment, implementation by the voiceprint collection and recognition system is taken as an example for description.
  • the CC system includes:
  • The operation system 301 implements access calls for the entire CC system; it can analyze the called number dialed by the user (including terminal users A and B), trigger the specified value-added service, and perform call control and connection for the value-added service. When a value-added service needs to collect and recognize voiceprint information, the operation system connects the user-side voice data to the voiceprint collection and recognition system and sends the commands for collecting and recognizing the user's voiceprint to that system.
  • The voiceprint collection and recognition system 302 receives the commands of the operation system 301 and performs voiceprint collection and recognition for the user.
  • Value-added service system 303: value-added services include automatic services and manual services, so the value-added service system mainly comprises a manual service system and an automatic service system, which are responsible for implementing specific value-added service functions.
  • The voiceprint collection and recognition system may include:
  • a site unit, configured to create a voice conference site;
  • a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, where the voice conference site is used to connect the user's channel and the agent's channel;
  • the recording unit is configured to record a channel of the user in a voice conference site created by the site unit, and obtain a channel recording file of the user.
  • The voiceprint collection and recognition system may further include:
  • a voiceprint collection and recognition server 302-1, used to collect the user's voiceprint information and recognize the user's voiceprint according to the user's channel recording file obtained by the recording unit;
  • a voiceprint file server 302-2, used to store the user channel recording file obtained by the recording unit for the first time, for use in subsequent voiceprint recognition, and optionally also user channel recording files obtained on later occasions; in addition, the original user channel recording files can be loaded after the system restarts;
  • a database server 302-3, used to store the full path of the user channel recording file obtained by the recording unit and the corresponding voiceprint identifier, for use by the voiceprint collection and recognition server.
  • The value-added service system judges whether the user enters the system for the first time. If so, it invokes the interface provided by the CC system to perform the voiceprint collection operation; otherwise, when the user enters a specific service operation, voiceprint recognition is performed on the user. A specific service operation can be determined as follows: each service operation is assigned a level, according to a general standard of how important the operation is or according to the user's own ranking of importance, with more important operations given higher levels; any operation whose level exceeds a specific value is treated as a specific service operation. For example, operations with a level greater than five may be treated as specific service operations, which may include but are not limited to password modification and amount removal. A decision sketch follows below.
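  • The following Python sketch illustrates one way such a level-based decision could be written. The concrete level values and the non-specific operation names are hypothetical; only password modification and amount removal are named in the text as examples, and the threshold of five is the example value given above.

```python
# Hypothetical per-operation importance levels; the text only requires that
# more important operations get higher levels and that operations above a
# chosen threshold (e.g. 5) are treated as "specific" service operations.
OPERATION_LEVELS = {
    "balance_inquiry": 2,        # hypothetical
    "transfer": 4,               # hypothetical
    "password_modification": 6,  # named in the text as a specific operation
    "amount_removal": 7,         # named in the text as a specific operation
}

SPECIFIC_LEVEL_THRESHOLD = 5


def needs_voiceprint_recognition(operation: str) -> bool:
    """An operation whose level exceeds the threshold is a specific service
    operation and requires voiceprint recognition first."""
    return OPERATION_LEVELS.get(operation, 0) > SPECIFIC_LEVEL_THRESHOLD


def on_user_enters_service(first_entry: bool, operation: str) -> str:
    """First entry -> collect the voiceprint via the CC system interface;
    otherwise recognize the voiceprint only for specific operations."""
    if first_entry:
        return "collect_voiceprint"
    if needs_voiceprint_recognition(operation):
        return "recognize_voiceprint"
    return "proceed"
```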
  • Referring to FIG. 4, the voiceprint collection process may include:
  • 401. An access code corresponding to the specified value-added service system is configured in the CC system; when the front-end switching office determines that the user's call is a call for the CC system, it routes the call to the CC system, and the operation system in the CC system routes the call to the specified value-added service system according to the configured access code. The value-added service system then interacts with the user. Specifically:
  • 401-1. The user dials the system access code corresponding to this procedure.
  • 401-2. According to the configured routing policy, the user's call is connected to the CC system.
  • 401-3. The call enters the CC system and, after number analysis, is connected to the value-added service system.
  • 401-4. The value-added service system performs authentication according to the user's number and extracts the user information.
  • 402. After controlling the call, the value-added service system obtains the user information according to the user's number. The user information may include a field indicating whether the user's voiceprint has been collected: a value of one means collected and zero means not collected. If the user's voiceprint information has not been collected, the value-added service system requests the operation system to perform the voiceprint collection operation.
  • 403. After receiving the voiceprint collection request from the value-added service system, the operation system notifies the voiceprint collection and recognition system to perform voiceprint collection.
  • 404. The voiceprint collection and recognition system first creates a voice conference site. After the site is created successfully, the originally connected channels of the user and the agent are temporarily disconnected, and the user and the agent are both added to the voice conference site so that the service function continues. After adding the user and the agent to the voice conference site, the system instructs the voice conference site to record the conference channel; the recorded channel is the user's channel, so the voiceprint collection and recognition system can record the user's voice separately. After recording of the user channel starts, the system performs the user's voiceprint collection operation, as sequenced in the sketch below.
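  • Using the SoundProcessingDevice and collect_voiceprint sketches introduced earlier, steps 404 to 406 could be sequenced roughly as follows; stop_recording is a further assumed method on the underlying switch interface.

```python
def collect_during_call(device, user_channel: str, agent_channel: str) -> str:
    """Steps 404-406 in miniature, reusing the SoundProcessingDevice and
    collect_voiceprint sketches above: the call keeps running inside the
    conference site while only the user's channel is recorded."""
    device.create_site()                                   # 404: create the site
    device.join_parties(user_channel, agent_channel)       # 404: keep the call up
    recording = device.record_user_channel(user_channel)   # 404: record user only

    voiceprint_id = collect_voiceprint("user-A", recording)  # 405: extract + store

    device.switch.stop_recording(device.site_id)           # 406: stop recording
    device.release_site(user_channel, agent_channel)       # 406: rebridge + release
    return voiceprint_id                                   # 406: report the ID upstream
```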
  • 405. The voiceprint collection and recognition server performs the voiceprint collection operation: it extracts the user's voiceprint information from the previously recorded user channel recording file. When enough voiceprint information has been collected, the user's voiceprint information is recorded into a file stored on the voiceprint file server, a voiceprint ID for the user's voiceprint is generated according to an internal algorithm, and the correspondence between the user's voiceprint ID and the generated voiceprint file is recorded in the database server.
  • 406. After the voiceprint collection and recognition system completes the collection of the user's voiceprint information, the channel recording started earlier in the voice conference site is stopped; the user and the agent are then removed from the voice conference site at the same time and their voice channels are reconnected so that the user and the agent can continue the service operation. The voice conference site is then released, and the collected user's voiceprint ID is sent to the operation system.
  • 407. According to the result returned by the voiceprint collection and recognition system, the operation system sends the user's voiceprint ID to the value-added service system, which records the voiceprint ID and associates it with the user.
  • 408. The user's voiceprint collection is complete, and the agent and the user continue to interact to complete the service function.
  • the voiceprint recognition process may include:
  • 501. The value-added service is triggered. An access code corresponding to the specified value-added service system is configured in the CC system; when the front-end switching office determines that the call belongs to the CC system, it routes the call to the CC system, which then controls the call. According to the configured information, the operation system in the CC system routes the call to the specified value-added service system, which interacts with the user; this may be the same as steps 401-1 to 401-4 above.
  • 502. After controlling the call, the value-added service system obtains the user's information according to the user's number and then judges whether the user's voiceprint has been collected; if it has, the subsequent steps continue.
  • 503. When the user is about to perform a specific service operation, the value-added service system requests the operation system to perform a voiceprint recognition operation, carrying the user's voiceprint ID returned by the voiceprint collection and recognition system during collection.
  • 504. After receiving the voiceprint recognition operation request from the value-added service system, the operation system forwards the request to the voiceprint collection and recognition system.
  • 505. The voiceprint collection and recognition system first creates a voice conference site and adds both the user and the agent to it so that they can continue the call; it then instructs the voice conference site to record the channel, and the recorded channel is the user's channel.
  • 506. According to the requested voiceprint ID, the voiceprint collection and recognition server looks up the voiceprint file corresponding to that ID in the database server and retrieves it from the voiceprint file server; it then compares the voiceprint information in the voiceprint file with the user's voiceprint information extracted from the recorded channel recording file, and judges from the comparison result whether the user's identity is recognized successfully.
  • 507. After the voiceprint recognition operation is completed, the channel recording function of the voice conference site is stopped; the user and the agent are removed from the voice conference site, their voice time slots are reconnected, and the voice conference site is released. The voiceprint collection and recognition system then sends the recognition result to the operation system.
  • 508. The operation system forwards the recognition result of the voiceprint collection and recognition system to the value-added service system.
  • 509. After receiving the voiceprint recognition result, the value-added service system performs subsequent operations according to the result, as in the relay sketch below.
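  • As a rough sketch of how the three systems hand the recognition request and result to one another in steps 503 to 509, the functions below model each role; every method on the operation_sys, vp_system and value_added objects is an assumed interface, not an API defined by the patent.

```python
def value_added_request_recognition(operation_sys, user_number: str,
                                    voiceprint_id: str) -> None:
    """503: the value-added service system asks the operation system to
    recognize the caller, carrying the voiceprint ID saved at collection."""
    operation_sys.request_recognition({"user": user_number,
                                       "voiceprint_id": voiceprint_id})


def operation_system_relay(vp_system, value_added, request: dict,
                           recording_file: str) -> None:
    """504-508: the operation system forwards the request to the voiceprint
    collection and recognition system and relays the result back."""
    result = vp_system.recognize(request["voiceprint_id"], recording_file)
    value_added.on_recognition_result(request["user"], result)


def value_added_on_result(user_number: str, success: bool) -> str:
    """509: the value-added service system allows or refuses the specific
    operation according to the recognition result."""
    return "allow_operation" if success else "refuse_operation"
```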
  • With the voiceprint collection and recognition system provided by this embodiment, a voice conference site can be created during the call between the user and the agent, and the user and the agent are added to it. In this way, while the user's channel is recorded in the voice conference site, the user and the agent can still interact through it, so that the normal call service is not interrupted. This avoids the prior-art situation in which the call between the user and the agent has to be interrupted because sound processing is required, and improves the user experience and the quality of service.

Landscapes

  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Description

Method, device and system for sound processing
This application claims priority to Chinese Patent Application No. 200810094737.9, filed with the Chinese Patent Office on May 14, 2008 and entitled "Method, device and system for sound processing", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communications, and in particular to a method, device and system for sound processing.
Background
In order to meet the demand for personalized communication services, call centers have been introduced in many industries, such as banking and securities; through the call center, users are connected with manual agents or automatic services to deliver voice-based services.
In the course of providing services through the call center, voiceprint collection can be performed. In the prior art, voiceprint collection is mainly performed through a proprietary device, and the same proprietary device is then used for voiceprint recognition.
In the process of implementing the present invention, the inventor found that during voiceprint collection and recognition only the person whose voiceprint is being collected or recognized interacts by voice with the proprietary device, while the call connection between the manual agent (or the automatic flow) and the user is disconnected; therefore the user cannot perform service operations while the voiceprint is being collected and recognized.
Summary
Embodiments of the present invention provide a method, device and system for sound processing, which ensure that the call service proceeds normally during sound processing.
An embodiment of the present invention provides a sound processing method for sound processing during a call between a user and an agent, the method including:
creating a voice conference site;
adding the user and the agent to the voice conference site, where the voice conference site is used to connect the user's channel and the agent's channel; and
recording the user's channel in the voice conference site to obtain a channel recording file of the user.
An embodiment of the present invention further provides a sound processing device for sound processing during a call between a user and an agent, the device including:
a site unit, configured to create a voice conference site; a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, where the voice conference site is used to connect the user's channel and the agent's channel; and
a recording unit, configured to record the user's channel in the voice conference site created by the site unit to obtain a channel recording file of the user.
An embodiment of the present invention further provides a voiceprint collection and recognition system for sound processing during a call between a user and an agent, the system including:
a site unit, configured to create a voice conference site;
a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, where the voice conference site is used to connect the user's channel and the agent's channel; and
a recording unit, configured to record the user's channel in the voice conference site created by the site unit to obtain a channel recording file of the user.
With the technical solutions provided by the embodiments of the present invention, a voice conference site can be created during the call between the user and the agent, and the user and the agent are added to the voice conference site. In this way, while the user's channel is recorded in the voice conference site, the user and the agent can also interact through it, so that the normal call service is not interrupted. This avoids the prior-art situation in which the call between the user and the agent has to be interrupted because of sound processing, and improves the user's service experience and the quality of service.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a sound processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a sound processing device according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a call center system according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a voiceprint collection procedure according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a voiceprint recognition procedure according to an embodiment of the present invention.
Detailed Description of the Embodiments
The sound processing method and device provided by the embodiments of the present invention are applied to sound processing during a call between a user and an agent. The call between the user and the agent can be established over various networks, including but not limited to a fixed network, a Global System for Mobile Communications (GSM) network, a Code Division Multiple Access (CDMA) network, a third-generation mobile communication (3G) network, a personal wireless access system (PHS) and a Next Generation Network (NGN), where 3G networks include but are not limited to Wideband CDMA (WCDMA) networks, CDMA-2000 networks and Time Division-Synchronous CDMA (TDS-CDMA) networks. The agents include manual agents and automatic agents.
Referring to FIG. 1, an embodiment of the present invention provides a sound processing method for sound processing during a call between a user and an agent. The method includes the following steps:
101. Create a voice conference site.
102. Add the user and the agent to the voice conference site.
By adding the user and the agent to the voice conference site, the user's channel and the agent's channel can be connected through the voice conference site.
103. Record the user's channel in the voice conference site to obtain a channel recording file of the user.
Further, the method may also include:
104. Perform voiceprint collection on the user according to the user's channel recording file.
The voiceprint collection process may include:
extracting the user's voiceprint information from the user's channel recording file; and
generating a voiceprint identifier (ID) corresponding to the user's voiceprint information.
In this step, after the user's channel is recorded, the user's voiceprint information is extracted from the recording file. The voiceprint information may be acoustic feature information carrying the user's speech information, including but not limited to the pitch contour, formant frequency bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters and linear prediction coefficients. After the user's voiceprint information is extracted, a predetermined number of voiceprint information entries can be recorded as a voiceprint file; the predetermined number can be specified according to demand before sound processing, or specified at random during sound processing. At the same time, a voiceprint ID corresponding to the user's voiceprint information can be generated, either in the order of sound processing (for example, the voiceprint ID of the first user who undergoes sound processing is one, that of the second user is two, and so on) or by computing it from the voiceprint information with an algorithm; the type of algorithm is not limited and may be linear or nonlinear. After the user's voiceprint ID is generated, the correspondence between the user's voiceprint ID and the voiceprint file can also be recorded.
Further, the method may also include:
105. Perform voiceprint recognition on the user according to the user's channel recording file.
When the user needs to perform a specific operation, voiceprint recognition is performed on the user; the specific operation can be designated in advance according to the importance level of the operation.
Performing voiceprint recognition on the user may include:
judging whether a voiceprint has already been collected before this recognition. A previously collected voiceprint includes, but is not limited to, one collected for the user before the current call or one already collected during the current call. If a voiceprint has been collected, it is further judged whether this recognition requires collecting the voiceprint again: if not, step 105 is executed directly after 103 and 104 is skipped; if collection is required again, 104 is executed after 103 and then 105 is executed. If no voiceprint was collected before this recognition, 104 is executed after 103 and then 105 is executed.
The specific voiceprint recognition process may include:
receiving a voiceprint recognition operation request;
obtaining the user's voiceprint identifier (ID) from the voiceprint recognition operation request;
searching, according to the user's voiceprint ID, the recorded correspondence between the user's voiceprint ID and the voiceprint file to obtain the voiceprint information corresponding to the user in the voiceprint file (that is, the first voiceprint information), where the correspondence may have been recorded during a previous call of the user or during the current call;
extracting the user's voiceprint information (that is, the second voiceprint information) from the user's channel recording file obtained in 103; and
comparing the first voiceprint information with the second voiceprint information. Whether the user's identity is recognized successfully is judged from the comparison result: for example, if the first and second voiceprint information are consistent, identification succeeds; if they are inconsistent, identification fails. Subsequent operations are performed according to the recognition result: if identification succeeds, the user is allowed to perform the specific operation; if it fails, the user is prohibited from performing it.
Further, in order to occupy conference resources as little as possible, improve the utilization of call system resources and improve the management of conference resources, the method may also include:
106. Release the voice conference site.
If the user does not need to perform a specific operation, that is, only voiceprint collection is needed in this sound processing, the voice conference site is released immediately after collection; if the user needs to perform a specific operation, the voice conference site is released after voiceprint recognition. Releasing the voice conference site may include:
removing the user and the agent from the voice conference site;
reconnecting the voice time slots of the user and the agent; and
releasing the resources of the voice conference site.
If the user's voiceprint has not been collected, voiceprint collection is performed on the user; if the user's voiceprint has been collected, collection may be performed again, or may be skipped. In a specific implementation, the user information may include a field indicating whether the user's voiceprint has been collected: a value of one means collected, and zero means not collected.
In the prior art, if sound processing is required during a call between a user and an agent, the call connection between the user and the agent has to be disconnected and the sound processing is performed on the user by a proprietary device; throughout the sound processing, the user cannot perform service operations at the same time. With the sound processing method provided by this embodiment, a voice conference site is created during the call between the user and the agent and the user and the agent are added to it, so that while the user's channel is recorded in the voice conference site, the user and the agent can also interact through it and the normal call service is maintained. This avoids the interruption of the call between the user and the agent caused by sound processing in the prior art, and improves the user's service experience and the quality of service.
Referring to FIG. 2, an embodiment of the present invention provides a sound processing device. The device can be set up independently, integrated in a call center system, or integrated in a voiceprint collection and recognition system, to implement sound processing during a call between a user and an agent. The device includes:
a site unit 201, configured to create a voice conference site;
a connection unit 202, configured to add the user and the agent to the voice conference site created by the site unit 201, where the voice conference site is used to connect the user's channel and the agent's channel; and
a recording unit 203, configured to record the user's channel in the voice conference site created by the site unit 201 to obtain a channel recording file of the user.
Further, the device may also include:
a collection unit 204, configured to perform voiceprint collection on the user according to the user's channel recording file obtained by the recording unit 203.
The collection unit 204 may include:
an extraction subunit 204-1, configured to extract the user's voiceprint information from the user's channel recording file obtained by the recording unit 203; and an identifier subunit 204-2, configured to generate a voiceprint ID corresponding to the user's voiceprint information extracted by the extraction subunit 204-1.
Further, the device may also include:
a recognition unit 205, configured to perform voiceprint recognition on the user according to the user's channel recording file obtained by the recording unit 203.
The recognition unit 205 may include:
a receiving subunit 205-1, configured to receive a voiceprint recognition operation request;
an identifier subunit 205-2, configured to obtain the user's voiceprint ID from the voiceprint recognition operation request received by the receiving subunit 205-1;
a searching subunit 205-3, configured to search, according to the voiceprint ID obtained by the identifier subunit 205-2, the recorded correspondence between the user's voiceprint ID and the voiceprint file to obtain the voiceprint information corresponding to the user in the voiceprint file (that is, the first voiceprint information), where the correspondence may have been recorded during a previous call of the user or during the current call;
an extraction subunit 205-4, configured to extract the user's voiceprint information (that is, the second voiceprint information) from the user's channel recording file obtained by the recording unit 203; and
a comparison subunit 205-5, configured to compare the first voiceprint information with the second voiceprint information.
Further, the device may also include:
a release unit 206, configured to release the voice conference site created by the site unit 201.
The release unit 206 may include:
a removing subunit 206-1, configured to remove the user and the agent from the voice conference site created by the site unit 201; a bridging subunit 206-2, configured to reconnect the voice time slots of the user and the agent; and
a resource subunit 206-3, configured to release the resources of the voice conference site created by the site unit 201.
With the sound processing device provided by this embodiment, during the call between the user and the agent, the site unit creates a voice conference site and the connection unit adds the user and the agent to it, so that while the recording unit records the user's channel in the voice conference site, the user and the agent can still interact through it and the normal call service is maintained and not interrupted. This avoids the prior-art situation in which the call between the user and the agent has to be interrupted because sound processing is required, and improves the user's service experience and the quality of service.
Referring to FIG. 3, which is a schematic structural diagram of a call center (CC) system according to an embodiment of the present invention, this embodiment is a specific application of the foregoing embodiments. The specific application is only one way of applying the foregoing embodiments; those of ordinary skill in the art can make improvements and refinements to it without departing from the principle of the embodiments of the present invention, and such improvements and refinements shall also fall within the protection scope of the embodiments of the present invention.
In this embodiment, after the user dials the system access code of an automatic service provided by the service provider, the switching system connects the call to the call center (CC) system, the corresponding manual or automatic service is entered, and the corresponding service is executed on the CC platform. The sound processing method of the foregoing embodiments may be implemented by the CC system, by an independently arranged sound processing device, or by a voiceprint collection and recognition system; the principle is the same in each case. In this embodiment, implementation by the voiceprint collection and recognition system is taken as an example for description.
Referring to FIG. 3, the CC system includes:
Operation system 301: implements access calls for the entire CC system; it can analyze the called number dialed by the user (including terminal users A and B), trigger the specified value-added service, and perform call control and connection for the value-added service. When a value-added service needs to collect and recognize voiceprint information, it connects the user-side voice data to the voiceprint collection and recognition system and sends the commands for collecting and recognizing the user's voiceprint to that system.
Voiceprint collection and recognition system 302: receives the commands of the operation system 301 and performs voiceprint collection and recognition for the user.
Value-added service system 303: value-added services include automatic services and manual services, so the value-added service system mainly includes a manual service system and an automatic service system and is mainly responsible for implementing specific value-added service functions.
The voiceprint collection and recognition system may include:
a site unit, configured to create a voice conference site;
a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, where the voice conference site is used to connect the user's channel and the agent's channel; and
a recording unit, configured to record the user's channel in the voice conference site created by the site unit to obtain a channel recording file of the user.
The voiceprint collection and recognition system may further include:
a voiceprint collection and recognition server 302-1, configured to collect the user's voiceprint information and recognize the user's voiceprint according to the user's channel recording file obtained by the recording unit;
a voiceprint file server 302-2, configured to store the user channel recording file obtained by the recording unit for the first time, for use in subsequent voiceprint recognition, and optionally also user channel recording files obtained on later occasions; in addition, the original user channel recording files can be loaded after the voiceprint collection and recognition system restarts; and
a database server 302-3, configured to store the full path of the user channel recording file obtained by the recording unit and the corresponding voiceprint identifier, for use by the voiceprint collection and recognition server.
In this embodiment, the value-added service system judges whether the user enters the system for the first time. If so, it invokes the interface provided by the CC system to perform the voiceprint collection operation; otherwise, when the user enters a specific service operation, voiceprint recognition is performed on the user. A specific service operation can be determined as follows: each service operation is assigned a level, according to a general standard of how important the operation is or according to the user's own ranking of importance, with more important operations given higher levels; any operation whose level exceeds a specific value is treated as a specific service operation. For example, operations with a level greater than five are treated as specific service operations, which may include but are not limited to password modification and amount removal.
Referring to FIG. 4, the voiceprint collection procedure may include:
401. An access code corresponding to the specified value-added service system is configured in the CC system. When the front-end switching office determines that the user's call is a call for the CC system, it routes the call to the CC system; according to the access code configured in its operation system, the CC system routes the call to the specified value-added service system, which then interacts with the user. Specifically, this may include:
401-1. The user dials the system access code corresponding to this procedure.
401-2. According to the configured routing policy, the user's call is connected to the CC system.
401-3. The call enters the CC system and, after number analysis, is connected to the value-added service system.
401-4. The value-added service system performs authentication according to the user's number and extracts the user information.
402. After controlling the call, the value-added service system obtains the user information according to the user's number. The user information may include a field indicating whether the user's voiceprint has been collected: a value of one means collected and zero means not collected. If the user's voiceprint information has not been collected, the value-added service system requests the operation system to perform the voiceprint collection operation.
403. After receiving the voiceprint collection request from the value-added service system, the operation system notifies the voiceprint collection and recognition system to perform voiceprint collection.
404. The voiceprint collection and recognition system first creates a voice conference site. After the site is created successfully, the originally connected channels of the user and the agent are temporarily disconnected, and the user and the agent are both added to the voice conference site so that the service function continues. After adding the user and the agent to the voice conference site, the system instructs the voice conference site to record the conference channel; the recorded channel is the user's channel, so the voiceprint collection and recognition system can record the user's voice separately. After recording of the user channel starts, the system performs the user's voiceprint collection operation.
405. The voiceprint collection and recognition server performs the voiceprint collection operation: it extracts the user's voiceprint information from the previously recorded user channel recording file. When enough voiceprint information has been collected, the user's voiceprint information is recorded into a file stored on the voiceprint file server, a voiceprint ID for the user's voiceprint is generated according to an internal algorithm, and the correspondence between the user's voiceprint ID and the generated voiceprint file is recorded in the database server.
406. After completing the collection of the user's voiceprint information, the voiceprint collection and recognition system stops the channel recording started earlier in the voice conference site, then removes the user and the agent from the voice conference site at the same time and reconnects their voice channels so that the user and the agent can continue the service operation; it then releases the voice conference site and sends the collected user's voiceprint ID to the operation system.
407. According to the result returned by the voiceprint collection and recognition system, the operation system sends the user's voiceprint ID to the value-added service system, which records the user's voiceprint ID and associates it with the user.
408. The user's voiceprint collection is complete, and the agent and the user continue to interact to complete the service function.
Referring to FIG. 5, the voiceprint recognition procedure may include:
501. The value-added service is triggered. An access code corresponding to the specified value-added service system is configured in the CC system; when the front-end switching office determines that the call is a call for the CC system, it routes the call to the CC system, which then controls the call. According to the configured information, the operation system in the CC system routes the call to the specified value-added service system, which interacts with the user; this may be the same as steps 401-1 to 401-4 above.
502. After controlling the call, the value-added service system obtains the user's information according to the user's number and then judges whether the user's voiceprint has been collected; if the user's voiceprint information has been collected, the subsequent steps continue.
503. When the user is about to perform a specific service operation, the value-added service system requests the operation system to perform a voiceprint recognition operation, carrying the user's voiceprint ID returned by the voiceprint collection and recognition system during collection.
504. After receiving the voiceprint recognition operation request from the value-added service system, the operation system forwards the request to the voiceprint collection and recognition system.
505. The voiceprint collection and recognition system first creates a voice conference site and adds both the user and the agent to it so that they can continue the call; it then instructs the voice conference site to record the channel, and the recorded channel is the user's channel.
506. According to the requested voiceprint ID, the voiceprint collection and recognition server looks up the voiceprint file corresponding to that ID in the database server and retrieves it from the voiceprint file server; it then compares the voiceprint information in the voiceprint file with the user's voiceprint information extracted from the recorded channel recording file, and judges from the comparison result whether the user's identity is recognized successfully.
507. After completing the voiceprint recognition operation, the voiceprint collection and recognition system stops the channel recording function of the voice conference site, then removes the user and the agent from the voice conference site, reconnects the voice time slots of the user and the agent, and releases the voice conference site; it then sends the recognition result to the operation system.
508. The operation system forwards the recognition result of the voiceprint collection and recognition system to the value-added service system.
509. After receiving the voiceprint recognition result, the value-added service system performs subsequent operations according to the result.
With the voiceprint collection and recognition system provided by this embodiment, a voice conference site can be created during the call between the user and the agent, and the user and the agent are added to it. In this way, while the user's channel is recorded in the voice conference site, the user and the agent can still interact through it, so that the normal call service is not interrupted. This avoids the prior-art situation in which the call between the user and the agent has to be interrupted because sound processing is required, and improves the user's service experience and the quality of service.
From the description of the foregoing embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software together with the necessary hardware platform, or entirely by hardware. Based on this understanding, all or the part of the technical solutions of the present invention that contributes over the background art may be embodied in the form of a software product. The computer software product can be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The foregoing is only specific embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A sound processing method, used for sound processing during a call between a user and an agent, the method comprising:
creating a voice conference site;
adding the user and the agent to the voice conference site, wherein the voice conference site is used to connect a channel of the user and a channel of the agent; and
recording the channel of the user in the voice conference site to obtain a channel recording file of the user.
2. The method according to claim 1, further comprising:
performing voiceprint recognition on a current user according to a second channel recording file of the current user.
3. The method according to claim 2, wherein the performing voiceprint recognition on the current user according to the second channel recording file of the current user comprises:
receiving a voiceprint recognition operation request;
obtaining, from the voiceprint recognition operation request, a voiceprint identifier of the user corresponding to the voiceprint recognition operation request;
searching for the corresponding voiceprint information of the user according to the voiceprint identifier of the user;
extracting second voiceprint information from the second channel recording file of the current user; and
comparing the found voiceprint information with the second voiceprint information.
4. The method according to claim 1, further comprising:
releasing the voice conference site.
5. The method according to claim 4, wherein the releasing the voice conference site comprises:
removing the user and the agent from the voice conference site;
reconnecting voice time slots of the user and the agent; and
releasing resources of the voice conference site.
6. A sound processing device, used for sound processing during a call between a user and an agent, the device comprising:
a site unit, configured to create a voice conference site;
a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, wherein the voice conference site is used to connect a channel of the user and a channel of the agent; and
a recording unit, configured to record the channel of the user in the voice conference site created by the site unit to obtain a channel recording file of the user.
7. The device according to claim 6, further comprising:
a recognition unit, configured to perform voiceprint recognition on a current user according to a second channel recording file of the current user obtained by the recording unit.
8. The device according to claim 6, further comprising:
a release unit, configured to release the voice conference site created by the site unit.
9. The device according to any one of claims 6 to 8, wherein the sound processing device is set up independently, or is integrated in a call center system, or is integrated in a voiceprint collection and recognition system.
10. A voiceprint collection and recognition system, used for sound processing during a call between a user and an agent, the system comprising:
a site unit, configured to create a voice conference site;
a connection unit, configured to add the user and the agent to the voice conference site created by the site unit, wherein the voice conference site is used to connect a channel of the user and a channel of the agent; and
a recording unit, configured to record the channel of the user in the voice conference site created by the site unit to obtain a channel recording file of the user.
11. The system according to claim 10, further comprising:
a voiceprint collection and recognition server, configured to collect the user's voiceprint information and recognize the user's voiceprint according to the channel recording file of the user obtained by the recording unit;
a voiceprint file server, configured to store the user channel recording file obtained by the recording unit; and
a database server, configured to store a full path of the user channel recording file obtained by the recording unit and the corresponding voiceprint identifier.
PCT/CN2009/071603 2008-05-14 2009-04-30 Method, device and system for sound processing WO2009138012A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2008100947379A CN101287044B (zh) 2008-05-14 2008-05-14 Method, device and system for sound processing
CN200810094737.9 2008-05-14

Publications (1)

Publication Number Publication Date
WO2009138012A1 true WO2009138012A1 (zh) 2009-11-19

Family

ID=40059004

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/071603 WO2009138012A1 (zh) 2008-05-14 2009-04-30 声音处理的方法、设备及系统

Country Status (2)

Country Link
CN (1) CN101287044B (zh)
WO (1) WO2009138012A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101287044B (zh) 2008-05-14 2012-04-25 华为技术有限公司 Method, device and system for sound processing
CN101997995A (zh) 2009-08-26 2011-03-30 华为技术有限公司 User identity recognition method and device, and call center system
CN104574167B (zh) 2013-10-29 2020-02-18 腾讯科技(深圳)有限公司 Leasing processing method, related device and system
CN104901929B (zh) 2014-03-07 2018-01-12 华为技术有限公司 Recording method, call control server and recording system
CN108257605B (zh) 2018-02-01 2021-05-04 Oppo广东移动通信有限公司 Multi-channel recording method and apparatus, and electronic device
CN111681650A (zh) 2019-03-11 2020-09-18 阿里巴巴集团控股有限公司 Intelligent conference control method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1815484A (zh) 2006-03-06 2006-08-09 覃文华 Digital authentication system and authentication method thereof
CN1829267A (zh) 2005-03-04 2006-09-06 华为技术有限公司 Method for recording the call voice between an agent and a user
CN101055718A (zh) 2007-05-11 2007-10-17 华东师范大学 Voiceprint recognition method based on vector quantization
CN101287044A (zh) 2008-05-14 2008-10-15 华为技术有限公司 Method, device and system for sound processing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002237891A (ja) 2001-02-07 2002-08-23 Nippon Telegr & Teleph Corp <Ntt> Call content recording system and voice recording device
JP4085924B2 (ja) 2003-08-04 2008-05-14 ソニー株式会社 Voice processing device
CN101079934B (zh) 2007-07-02 2011-03-02 中兴通讯股份有限公司 Method and system for recording voice by using a Session Initiation Protocol soft terminal

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1829267A (zh) 2005-03-04 2006-09-06 华为技术有限公司 Method for recording the call voice between an agent and a user
CN1815484A (zh) 2006-03-06 2006-08-09 覃文华 Digital authentication system and authentication method thereof
CN101055718A (zh) 2007-05-11 2007-10-17 华东师范大学 Voiceprint recognition method based on vector quantization
CN101287044A (zh) 2008-05-14 2008-10-15 华为技术有限公司 Method, device and system for sound processing

Also Published As

Publication number Publication date
CN101287044B (zh) 2012-04-25
CN101287044A (zh) 2008-10-15

Similar Documents

Publication Publication Date Title
WO2009138012A1 (zh) Method, device and system for sound processing
US8885797B2 (en) Systems and methods for providing network-based voice authentication
US8116436B2 (en) Technique for verifying identities of users of a communications service by voiceprints
US8175650B2 (en) Providing telephone services based on a subscriber voice identification
JP3614604B2 (ja) On-line training of an automatic dialing directory
KR101054680B1 (ko) 혼성 모바일 튠 어웨이를 검출하기 위한 측정 데이터 기록 방법
US20030231746A1 (en) Teleconference speaker identification
CA2266276A1 (en) Method and apparatus for providing voice assisted call management in a telecommunications network
CA2565983A1 (en) Centralized biometric authentication
WO2015172435A1 (zh) Method and server for implementing orderly speaking in a remote conference
WO2006020329B1 (en) Method and apparatus for determining authentication capabilities
CA2564463A1 (en) Voice over ip based biometric authentication
JP2001503156A (ja) Speaker verification method
KR20170012873A (ko) Voice verification method, apparatus and system
CN100517291C (zh) On-demand session provisioning for IP flows
US20060262908A1 (en) Voice authentication for call control
WO2014140970A2 (en) Voice print tagging of interactive voice response sessions
EP2293291B1 (en) User identity identifying method, device and call center system
CN109660677A (zh) Call method, apparatus and system, storage medium and computer device
CN102547921A (zh) Communication control method, apparatus and system, and multi-mode terminal
CN103401882A (zh) VoIP gateway voice link backup method and system
WO2012155598A1 (zh) Method and apparatus for retrieving a mobile terminal and protecting its information
CN107148008A (zh) Call switching method, system, terminal and server
JP2004328759A (ja) Terminal authentication and call processing apparatus and method in a private wireless high-speed data system
WO2010135934A1 (zh) Intelligent network-based phonebook management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09745396

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09745396

Country of ref document: EP

Kind code of ref document: A1