WO2020125588A1 - Voice call recognition method, device, and storage medium - Google Patents

Voice call recognition method, device, and storage medium

Info

Publication number
WO2020125588A1
Authority
WO
WIPO (PCT)
Prior art keywords
call
voice
text
module
caller
Prior art date
Application number
PCT/CN2019/125707
Other languages
English (en)
French (fr)
Inventor
赵永良
Original Assignee
西安中兴新软件有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西安中兴新软件有限责任公司
Publication of WO2020125588A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/64 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65 Recording arrangements for recording a message from the calling party
    • H04M1/656 Recording arrangements for recording a message from the calling party for recording conversations
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72436 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. short messaging services [SMS] or e-mails
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72484 User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/74 Details of telephonic subscriber devices with voice recognition means

Definitions

  • The present application claims priority to Chinese patent application No. 201811575096.9, filed with the China Patent Office on December 21, 2018 and titled "A Voice Call Recognition Method, Device, and Storage Medium", the entire content of which is incorporated herein by reference.
  • The invention relates to the technical field of communication terminals, and in particular to a voice call recognition method, device, and storage medium.
  • Since the advent of mobile phones, calling has been one of their most frequently used and essential functions. Important information often needs to be recorded during a call, so many terminal manufacturers have implemented call recording, which saves the voice content of the call parties as an audio file for the user to view and play.
  • At present, the main way to record call content is to use the recording function of the mobile phone.
  • However, this recording method takes up a lot of storage space, and it is inconvenient to find what each call party said in each call.
  • Audio files take up a lot of storage space, are hard to search, and are not intuitive to present. Call recordings are therefore often converted into text for storage, because text saves storage space and is easier to search later.
  • Embodiments of the present invention provide a voice call recognition method, device, and storage medium to solve the problems that audio files take up a lot of storage space, are hard to search, and are not intuitive to present.
  • To solve the above technical problems, the present invention is implemented by at least one of the following technical solutions:
  • In a first aspect, a voice call recognition method is provided, which includes: when a call occurs, recognizing the voice stream of each call party in the call and converting it into corresponding text information; associating each call party with the corresponding text information according to the call attributes of each call party, and generating a call text; and displaying the call text.
  • In a second aspect, the present application provides a voice call recognition device, including a voice recognition device and a message module. The voice recognition device is used to recognize the voice stream of each call party in the call when a call occurs and convert it into corresponding text information, and to associate each call party with the corresponding text information according to the call attributes of each call party and generate a call text. The message module is used to display the call text.
  • In a third aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above method is implemented.
  • FIG. 1 is a flowchart of Embodiment 1 of the voice call recognition method of the present application;
  • FIG. 2 is a flowchart of step S01;
  • FIG. 3 is a structural block diagram of a message session page of the voice call recognition device of the present application;
  • FIG. 4 is a structural block diagram of Embodiment 4 of the voice call recognition device of the present application;
  • FIG. 5 is another structural block diagram of Embodiment 4 of the voice call recognition device of the present application;
  • FIG. 6 is another structural block diagram of Embodiment 4 of the voice call recognition device of the present application;
  • FIG. 7 is another structural block diagram of Embodiment 4 of the voice call recognition device of the present application.
  • Reference numerals: 1 - voice recognition device; 11 - call module; 111 - communication unit; 112 - codec unit; 12 - audio module; 13 - voice recognition module; 14 - processing module; 15 - contact module; 2 - message module.
  • As shown in FIG. 1, this application provides a voice call recognition method, which may include the following steps: S01, when a call occurs, recognize the voice stream of each call party in the call and convert it into corresponding text information; S02, associate each call party with the corresponding text information according to the call attributes of each call party, and generate a call text; S03, display the call text.
  • By converting speech to text, this application makes call records recordable and searchable, and provides an information entry point for big data analysis and secretarial services.
  • This application converts the call voice into text, and then presents the text to the user combined with each call attribute.
  • In this way, the call parties' message interaction and voice interaction are integrated with the contact attribute information, so the interaction content of the call parties is recorded more completely and continuously and is presented to the user in a more intuitive and convenient way.
  • The method of converting the call voice into text in S01 includes the following steps.
  • S011: Obtain at least one upstream audio stream and at least one downstream audio stream in the call voice; the upstream and downstream audio streams include time information. After the Real-time Transport Protocol (RTP) stream transmitted in the call voice is captured, two audio streams in different directions are extracted from this RTP stream according to the direction of the audio streams in it. To distinguish them, one audio stream is called the upstream audio stream and the other the downstream audio stream; the terms upstream and downstream are relative and can be defined according to the application scenario.
  • S012: Parse the upstream audio stream and the downstream audio stream and convert each into corresponding text information, specifically: a) the upstream audio stream corresponds to the voice of the call party at the local terminal, and the downstream audio stream corresponds to the voice of the other call party; b) parse the upstream audio stream to generate the terminal call text; c) parse the downstream audio stream to generate the call party text, including: performing RTP packet preprocessing, voice decoding, and other processing on each of the two audio streams to form two separate voice signals, upstream and downstream, and outputting or playing each of these two voice signals, or generating a text file to save them; d) associate the number of the terminal call party and the start time and duration of the upstream audio stream with the terminal call text; f) associate the incoming number of the call party and the start time and duration of the downstream audio stream with the call party text.
  • The call attributes include a phone number; and/or the call attributes include a phone number and a voice start time; and/or the call attributes include a phone number and a voice duration; and/or the call attributes include a phone number, a call start time, and a call duration.
  • Further, identifying the incoming number of the call party also includes matching the incoming number against the stored contact information. Specifically, if the incoming number is the same as a phone number in the contact list, the incoming number is matched to the existing contact; if the incoming number differs from the phone numbers in the contact list, the incoming number is treated as a new number.
  • If the match succeeds, the stored contact information, together with the start time and duration of the downstream audio stream, is associated with the call party text; if the match fails, step f is performed directly.
  • The method for displaying the call text in S03 includes: displaying the call text in chronological order; or displaying the call text of each call party one by one.
  • As shown in FIG. 3, the call text is displayed as follows: if a message session already exists for the call party, the call text is added to that existing message session, either as text alone or together with the audio content; if no message session exists for the call party, a new message session is created and the call text is added to it, again either as text alone or together with the audio content.
  • Further, the present application also includes: saving the call text.
  • The present application performs speech recognition separately on the upstream and downstream audio streams of the call to convert them into text, and then presents the converted text to the user in combination with the time information and the call party information.
  • When the user needs to look up call content, reading a text file is far faster than listening to an audio file, which saves the user time in obtaining call information.
  • In this way, the call parties' message interaction and voice interaction are integrated by time, and the interaction content of the call parties is recorded more completely and continuously.
  • As shown in FIG. 4, the present application provides a voice call recognition device for the voice call recognition method described above, including a voice recognition device 1 and a message module 2. The voice recognition device 1 is used to recognize the voice stream of each call party in the call when a call occurs and convert it into corresponding text information, and to associate each call party with the corresponding text information according to the call attributes of each call party and generate a call text. The message module 2 is used to display the call text.
  • Further, as shown in FIG. 5, the voice recognition device 1 includes a call module 11, an audio module 12, a voice recognition module 13, and a processing module 14 connected in sequence;
  • the call module 11 is used to identify the incoming number of the call party, display the incoming number, connect the call, carry the voice chat, and save the call content;
  • the audio module 12 is used to obtain at least one upstream audio stream and at least one downstream audio stream generated by the voice chat in the call module;
  • the voice recognition module 13 is used to parse the upstream audio stream and the downstream audio stream obtained by the audio module and convert them into corresponding text information;
  • the processing module 14 is used to associate each call party with the corresponding text information according to the call attributes of each call party, and to generate a call text.
  • The present application performs speech recognition separately on the upstream and downstream audio streams of the call to convert them into text, and then injects the converted text into the message module 2, in combination with the time information and the call party information, to present it to the user.
  • In this way, the call parties' message interaction and voice interaction are integrated by time, and the interaction content of the call parties is recorded more completely and continuously.
  • As shown in FIG. 6, the present application provides a voice call recognition device in which the voice recognition device 1 includes a call module 11, an audio module 12, a voice recognition module 13, and a processing module 14 connected in sequence, as well as a contact module 15 connected to the call module 11 and the processing module 14. The contact module 15 is used to read the incoming number in the call module 11 and to name and store that number; and/or the contact module 15 is used to let contact information be entered and stored directly on the user equipment. The processing module 14 is used to combine the call attribute information with the text to generate a call text.
  • The number of the call party and the start time and duration of the upstream audio stream are associated with the terminal call text; the incoming number of the call party and the start time and duration of the downstream audio stream are associated with the call party text.
  • The present application is provided with a contact module 15. Contact information and phone numbers can be entered directly in the contact module 15, or contact information can be added by reading the phone number in the call module 11. If the incoming number matches existing contact information, the received call text is displayed directly in the existing message session or in a newly added message session.
  • As shown in FIG. 7, the present application provides a voice call recognition device in which the voice recognition device 1 includes a call module 11, an audio module 12, a voice recognition module 13, and a processing module 14 connected in sequence; the call module 11 includes a communication unit 111 capable of sending and receiving signals, and a codec unit 112 that is electrically connected to the communication unit 111 and can parse out the incoming number.
  • The present application receives the contact's call through the communication unit 111 and parses the incoming number of the call module 11 through the codec unit 112, so that the call module can both carry calls and parse incoming numbers.
  • The present application provides a mobile terminal including the above voice call recognition device. The mobile terminal may have function buttons for starting a call in an assisted mode or in an ordinary call mode; when it is detected that the user taps the function button for the ordinary call mode, the computer program may control the mobile terminal to enter the ordinary call interface, where the ordinary call mode can be understood as the traditional voice call mode.
  • The mobile terminal of the present application can carry calls, parse incoming numbers, and identify contact information, and can perform speech recognition on the upstream and downstream audio streams of a call to convert them into text and then inject the converted text into the message module, in combination with the time information and the call party information, to present it to the user.
  • The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above method is implemented.
  • An embodiment of the present invention provides a computer program product.
  • The computer program product includes a computer program stored on a non-transitory computer-readable storage medium.
  • The computer program includes program instructions which, when executed by a computer, cause the computer to execute the method in any of the above method embodiments.
  • In the embodiments of the present invention, the call is converted into text, each call party is then associated with the corresponding text information according to the call attributes of each call party, and the call text is generated and presented to the user.
  • In this way, the call parties' message interaction and voice interaction are integrated with the call attribute information, so the interaction content of the call parties is recorded more completely and continuously and is presented to the user in a more intuitive and convenient way.
  • The methods in the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented in hardware, but in many cases the former is the better implementation.
  • In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • The term computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • A communication medium generally contains computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.
  • The technical solution of the present invention, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

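The invention discloses a voice call recognition method, device, and storage medium. The method includes: when a call occurs, recognizing the voice stream of each call party in the call and converting it into corresponding text information; associating each call party with the corresponding text information according to the call attributes of each call party, and generating a call text; and displaying the call text.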

Description

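Voice call recognition method, device, and storage medium
Cross-reference
The present application claims priority to Chinese patent application No. 201811575096.9, filed with the China Patent Office on December 21, 2018 and titled "A Voice Call Recognition Method, Device, and Storage Medium", the entire content of which is incorporated herein by reference.
Technical field
The invention relates to the technical field of communication terminals, and in particular to a voice call recognition method, device, and storage medium.
Background
Since the advent of mobile phones, calling has been one of their most frequently used and essential functions. Important information often needs to be recorded during a call, so many terminal manufacturers have implemented call recording, which saves the voice content of the call parties as an audio file for the user to view and play.
At present, the main way to record call content is to use the recording function of the mobile phone. However, this recording method takes up a lot of storage space, and it is inconvenient to find what each call party said in each call. Audio files take up a lot of storage space, are hard to search, and are not intuitive to present. Call recordings are therefore often converted into text for storage, because text saves storage space and is easier to search later.
Summary of the invention
Embodiments of the present invention provide a voice call recognition method, device, and storage medium to solve the problems that audio files take up a lot of storage space, are hard to search, and are not intuitive to present.
To solve the above technical problems, the present invention is implemented by at least one of the following technical solutions:
In a first aspect, a voice call recognition method is provided, including: when a call occurs, recognizing the voice stream of each call party in the call and converting it into corresponding text information; associating each call party with the corresponding text information according to the call attributes of each call party, and generating a call text; and displaying the call text.
In a second aspect, the present application provides a voice call recognition device, including a voice recognition device and a message module. The voice recognition device is used to recognize the voice stream of each call party in the call when a call occurs and convert it into corresponding text information, and to associate each call party with the corresponding text information according to the call attributes of each call party and generate a call text. The message module is used to display the call text.
In a third aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above method is implemented.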
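Brief description of the drawings
The drawings described here are provided for further understanding of the invention and form a part of it. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of Embodiment 1 of the voice call recognition method of the present application;
FIG. 2 is a flowchart of step S01;
FIG. 3 is a structural block diagram of a message session page of the voice call recognition device of the present application;
FIG. 4 is a structural block diagram of Embodiment 4 of the voice call recognition device of the present application;
FIG. 5 is another structural block diagram of Embodiment 4 of the voice call recognition device of the present application;
FIG. 6 is another structural block diagram of Embodiment 4 of the voice call recognition device of the present application;
FIG. 7 is another structural block diagram of Embodiment 4 of the voice call recognition device of the present application.
The components in the drawings are labeled as follows:
1 - voice recognition device; 11 - call module; 111 - communication unit; 112 - codec unit; 12 - audio module; 13 - voice recognition module; 14 - processing module; 15 - contact module; 2 - message module.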
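Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Embodiment 1:
As shown in FIG. 1, the present application provides a voice call recognition method, which may include the following steps.
S01, when a call occurs, recognize the voice stream of each call party in the call and convert it into corresponding text information;
S02, associate each call party with the corresponding text information according to the call attributes of each call party, and generate a call text;
S03, display the call text.
By converting speech to text, the present application makes call records recordable and searchable, and provides an information entry point for big data analysis and secretarial services.
The present application converts the call voice into text, and then presents the text to the user combined with each call attribute. In this way, the call parties' message interaction and voice interaction are integrated with the contact attribute information, so the interaction content of the call parties is recorded more completely and continuously and is presented to the user in a more intuitive and convenient way.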
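The three steps S01 to S03 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the names `VoiceStream`, `recognize_call`, `render_call_text`, and `fake_transcribe` are hypothetical, the phone numbers are made up, and the recognizer is a stand-in callable because the source does not name a speech engine.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class VoiceStream:
    party: str    # hypothetical party identifier, here a phone number
    audio: bytes  # audio captured for that party


def recognize_call(
    streams: Iterable[VoiceStream],
    transcribe: Callable[[bytes], str],
) -> list[tuple[str, str]]:
    """S01 and S02: transcribe each party's stream and pair the text with the party."""
    return [(s.party, transcribe(s.audio)) for s in streams]


def render_call_text(entries: list[tuple[str, str]]) -> str:
    """S03: format the associated text for display, one line per party."""
    return "\n".join(f"{party}: {text}" for party, text in entries)


def fake_transcribe(audio: bytes) -> str:
    # Stand-in recognizer (assumption): treats the bytes as UTF-8 text
    # so the sketch runs without a speech engine.
    return audio.decode("utf-8")


streams = [
    VoiceStream("13800000000", b"Hello, is this Li?"),
    VoiceStream("13900000000", b"Yes, speaking."),
]
call_text = render_call_text(recognize_call(streams, fake_transcribe))
print(call_text)
```

Passing the recognizer in as a callable keeps the association and display steps independent of whichever speech-to-text component a terminal actually uses.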
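Embodiment 2:
As shown in FIG. 2, on the basis of Embodiment 1, after the call starts, the method further includes: identifying the incoming number of the call party.
The method of converting the call voice into text in S01 includes the following steps.
S011, obtain at least one upstream audio stream and at least one downstream audio stream in the call voice; the upstream and downstream audio streams include time information. After the Real-time Transport Protocol (RTP) stream transmitted in the call voice is captured, two audio streams in different directions are extracted from this RTP stream according to the direction of the audio streams in it. To distinguish them, one audio stream is called the upstream audio stream and the other the downstream audio stream; the terms upstream and downstream are relative and can be defined according to the application scenario.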
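One possible way to separate a captured RTP stream by direction is sketched below. The source does not say how direction is determined; this sketch assumes the capture layer supplies each datagram's source IP address and treats packets sent from the local address as upstream. `RtpPacket`, `parse_rtp`, `split_by_direction`, and the addresses are hypothetical names and values. The header layout is the 12-byte fixed RTP header; header extensions and padding are not handled.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    src_ip: str     # source address reported by the capture layer (assumption)
    sequence: int
    timestamp: int  # RTP timestamp, the stream's time information
    ssrc: int
    payload: bytes


def parse_rtp(src_ip: str, datagram: bytes) -> RtpPacket:
    """Parse the 12-byte fixed RTP header and skip any CSRC entries."""
    if len(datagram) < 12:
        raise ValueError("datagram too short for an RTP header")
    first, _marker_pt, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    header_len = 12 + 4 * (first & 0x0F)  # low 4 bits hold the CSRC count
    return RtpPacket(src_ip, seq, ts, ssrc, datagram[header_len:])


def split_by_direction(packets, local_ip):
    """Packets sent from local_ip go upstream; everything else goes downstream."""
    uplink, downlink = [], []
    for p in packets:
        (uplink if p.src_ip == local_ip else downlink).append(p)
    # Order by sequence number; 16-bit wraparound is ignored in this sketch.
    uplink.sort(key=lambda p: p.sequence)
    downlink.sort(key=lambda p: p.sequence)
    return uplink, downlink


def make_datagram(seq, ts, ssrc, body):
    # Helper for the example: version 2, no padding/extension/CSRC, payload type 0.
    return struct.pack("!BBHII", 0x80, 0, seq, ts, ssrc) + body


captured = [
    parse_rtp("10.0.0.2", make_datagram(2, 320, 0x1111, b"up-2")),
    parse_rtp("10.0.0.9", make_datagram(1, 160, 0x2222, b"down-1")),
    parse_rtp("10.0.0.2", make_datagram(1, 160, 0x1111, b"up-1")),
]
uplink, downlink = split_by_direction(captured, local_ip="10.0.0.2")
```

Each resulting list can then be preprocessed and decoded on its own, which matches the idea of treating the two directions as separate voice signals.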
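S012, parse the upstream audio stream and the downstream audio stream and convert each into corresponding text information, specifically: a) the upstream audio stream corresponds to the voice of the call party at the local terminal, and the downstream audio stream corresponds to the voice of the other call party; b) parse the upstream audio stream to generate the terminal call text; c) parse the downstream audio stream to generate the call party text, including: performing RTP packet preprocessing, voice decoding, and other processing on each of the two audio streams to form two separate voice signals, upstream and downstream, and outputting or playing each of these two voice signals, or generating a text file to save them; d) associate the number of the terminal call party and the start time and duration of the upstream audio stream with the terminal call text; f) associate the incoming number of the call party and the start time and duration of the downstream audio stream with the call party text.
The call attributes include a phone number; and/or the call attributes include a phone number and a voice start time; and/or the call attributes include a phone number and a voice duration; and/or the call attributes include a phone number, a call start time, and a call duration.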
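The attribute combinations listed above (number only, number plus start time, number plus duration, or all three) can be modeled with optional fields. The sketch below is an assumption about one reasonable data layout, not the applicant's implementation; `CallAttributes`, `CallTextEntry`, `associate`, the seconds-based units, and the phone numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallAttributes:
    phone_number: str
    start_time: Optional[float] = None  # seconds from call start (assumed unit)
    duration: Optional[float] = None    # seconds (assumed unit)


@dataclass
class CallTextEntry:
    attributes: CallAttributes
    text: str

    def label(self) -> str:
        """Build a display label from whichever attributes are present."""
        parts = [self.attributes.phone_number]
        if self.attributes.start_time is not None:
            parts.append(f"at {self.attributes.start_time:.1f}s")
        if self.attributes.duration is not None:
            parts.append(f"for {self.attributes.duration:.1f}s")
        return " ".join(parts)


def associate(attributes: CallAttributes, text: str) -> CallTextEntry:
    """Steps d) and f): attach a party's attributes to that party's text."""
    return CallTextEntry(attributes, text)


# Step d): local party, upstream text, with start time and duration.
terminal_entry = associate(CallAttributes("13800000000", 0.0, 2.5), "Hello, is this Li?")
# Step f): other party, downstream text, phone number only.
party_entry = associate(CallAttributes("13900000000"), "Yes, speaking.")

print(terminal_entry.label(), "->", terminal_entry.text)
print(party_entry.label(), "->", party_entry.text)
```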
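Further, identifying the incoming number of the call party also includes matching the incoming number against the stored contact information. Specifically, if the incoming number is the same as a phone number in the contact list, the incoming number is matched to the existing contact; if the incoming number differs from the phone numbers in the contact list, the incoming number is treated as a new number.
If the match succeeds, the stored contact information, together with the start time and duration of the downstream audio stream, is associated with the call party text; if the match fails, step f is performed directly.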
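The matching rule can be sketched as an exact lookup with a fallback. The contact list is modeled here as a plain dictionary from number to name, which is an assumption; `resolve_caller`, `label_party_text`, and the sample contact are hypothetical. Number normalization (country codes, separators) is outside what the source describes and is not attempted.

```python
from typing import Dict, Tuple


def resolve_caller(incoming_number: str, contacts: Dict[str, str]) -> Tuple[str, bool]:
    """Return (display name, matched). An unmatched number is treated as a new number."""
    name = contacts.get(incoming_number)
    if name is not None:
        return name, True
    return incoming_number, False


def label_party_text(incoming_number, contacts, start_time, duration, text):
    """On a match, use the stored contact; otherwise fall back to the raw number (step f)."""
    display, _matched = resolve_caller(incoming_number, contacts)
    return f"{display} [{start_time:.1f}s, {duration:.1f}s]: {text}"


contacts = {"13900000000": "Li Wei"}  # illustrative contact list

known = label_party_text("13900000000", contacts, 2.5, 1.2, "Yes, speaking.")
unknown = label_party_text("13700000000", contacts, 0.0, 3.0, "Wrong number, sorry.")
print(known)
print(unknown)
```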
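Embodiment 3:
On the basis of Embodiment 1 or 2, the method for displaying the call text in S03 includes: displaying the call text in chronological order; or displaying the call text of each call party one by one. As shown in FIG. 3, the call text is displayed as follows: if a message session already exists for the call party, the call text is added to that existing message session, either as text alone or together with the audio content; if no message session exists for the call party, a new message session is created and the call text is added to it, again either as text alone or together with the audio content.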
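The two display orders and the append-or-create rule for message sessions can be sketched as follows. The session store is modeled as a dictionary keyed by party, which is an assumption, and `Entry`, `chronological`, `by_party`, and `add_to_session` are hypothetical names. Attaching audio alongside the text is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    party: str
    start_time: float  # seconds from call start (assumed unit)
    text: str


def chronological(entries: List[Entry]) -> List[Entry]:
    """Display option 1: all entries ordered by start time."""
    return sorted(entries, key=lambda e: e.start_time)


def by_party(entries: List[Entry]) -> Dict[str, List[Entry]]:
    """Display option 2: one party at a time, each party's entries in time order."""
    grouped = defaultdict(list)
    for e in chronological(entries):
        grouped[e.party].append(e)
    return dict(grouped)


def add_to_session(sessions: Dict[str, List[str]], party: str, call_text: str) -> bool:
    """Append to the party's session, creating it first if needed. Returns True if created."""
    created = party not in sessions
    sessions.setdefault(party, []).append(call_text)
    return created


entries = [
    Entry("13900000000", 2.5, "Yes, speaking."),
    Entry("13800000000", 0.0, "Hello, is this Li?"),
    Entry("13800000000", 4.0, "Great."),
]
ordered_texts = [e.text for e in chronological(entries)]
grouped = by_party(entries)

sessions = {"13900000000": ["(earlier text message)"]}
created_existing = add_to_session(sessions, "13900000000", "Yes, speaking.")
created_new = add_to_session(sessions, "13700000000", "Wrong number, sorry.")
```

Returning whether a session was created lets a caller decide, for example, whether to show a "new conversation" notice; that use is an illustration and not something the source specifies.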
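Further, the present application also includes: saving the call text.
The present application performs speech recognition separately on the upstream and downstream audio streams of the call to convert them into text, and then presents the converted text to the user in combination with the time information and the call party information. When the user needs to look up call content, reading a text file is far faster than listening to an audio file, which saves the user time in obtaining call information. In this way, the call parties' message interaction and voice interaction are integrated by time, and the interaction content of the call parties is recorded more completely and continuously.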
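Embodiment 4:
As shown in FIG. 4, the present application provides a voice call recognition device for the voice call recognition method described above, including a voice recognition device 1 and a message module 2. The voice recognition device 1 is used to recognize the voice stream of each call party in the call when a call occurs and convert it into corresponding text information, and to associate each call party with the corresponding text information according to the call attributes of each call party and generate a call text. The message module 2 is used to display the call text.
Further, as shown in FIG. 5, the voice recognition device 1 includes a call module 11, an audio module 12, a voice recognition module 13, and a processing module 14 connected in sequence. The call module 11 is used to identify the incoming number of the call party, display the incoming number, connect the call, carry the voice chat, and save the call content. The audio module 12 is used to obtain at least one upstream audio stream and at least one downstream audio stream generated by the voice chat in the call module. The voice recognition module 13 is used to parse the upstream audio stream and the downstream audio stream obtained by the audio module and convert them into corresponding text information. The processing module 14 is used to associate each call party with the corresponding text information according to the call attributes of each call party, and to generate a call text.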
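The sequential connection of modules 11 to 14 can be pictured as a chain of callables. This is a structural sketch only: each module is modeled as a function supplied by the caller, and `VoiceRecognitionDevice`, `handle`, the dictionary keys, and the stub lambdas are hypothetical stand-ins rather than the device's real interfaces.

```python
class VoiceRecognitionDevice:
    """Wires four modules in sequence; every module is a caller-supplied callable."""

    def __init__(self, call_module, audio_module, recognition_module, processing_module):
        self.call_module = call_module
        self.audio_module = audio_module
        self.recognition_module = recognition_module
        self.processing_module = processing_module

    def handle(self, call: dict) -> list:
        incoming_number = self.call_module(call)        # call module 11
        uplink, downlink = self.audio_module(call)      # audio module 12
        terminal_text = self.recognition_module(uplink)  # voice recognition module 13
        party_text = self.recognition_module(downlink)
        # processing module 14: associate each party with its text
        return self.processing_module(
            call["local_number"], terminal_text, incoming_number, party_text
        )


# Stub modules so the sketch runs without telephony or a speech engine.
device = VoiceRecognitionDevice(
    call_module=lambda call: call["incoming_number"],
    audio_module=lambda call: (call["uplink"], call["downlink"]),
    recognition_module=lambda audio: audio.decode("utf-8"),
    processing_module=lambda ln, lt, rn, rt: [(ln, lt), (rn, rt)],
)

call_text = device.handle({
    "local_number": "13800000000",
    "incoming_number": "13900000000",
    "uplink": b"Hello, is this Li?",
    "downlink": b"Yes, speaking.",
})
```

The returned list is what a message module would then display.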
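The present application performs speech recognition separately on the upstream and downstream audio streams of the call to convert them into text, and then injects the converted text into the message module 2, in combination with the time information and the call party information, to present it to the user. In this way, the call parties' message interaction and voice interaction are integrated by time, and the interaction content of the call parties is recorded more completely and continuously.
As shown in FIG. 6, the present application provides a voice call recognition device in which the voice recognition device 1 includes a call module 11, an audio module 12, a voice recognition module 13, and a processing module 14 connected in sequence, as well as a contact module 15 connected to the call module 11 and the processing module 14. The contact module 15 is used to read the incoming number in the call module 11 and to name and store that number; and/or the contact module 15 is used to let contact information be entered and stored directly on the user equipment. The processing module 14 is used to combine the call attribute information with the text to generate a call text. The number of the call party and the start time and duration of the upstream audio stream are associated with the terminal call text; the incoming number of the call party and the start time and duration of the downstream audio stream are associated with the call party text.
The present application is provided with a contact module 15. Contact information and phone numbers can be entered directly in the contact module 15, or contact information can be added by reading the phone number in the call module 11. If the incoming number matches existing contact information, the received call text is displayed directly in the existing message session or in a newly added message session.
As shown in FIG. 7, the present application provides a voice call recognition device in which the voice recognition device 1 includes a call module 11, an audio module 12, a voice recognition module 13, and a processing module 14 connected in sequence; the call module 11 includes a communication unit 111 capable of sending and receiving signals, and a codec unit 112 that is electrically connected to the communication unit 111 and can parse out the incoming number.
The present application receives the contact's call through the communication unit 111 and parses the incoming number of the call module 11 through the codec unit 112, so that the call module can both carry calls and parse incoming numbers.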
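The present application provides a mobile terminal including the above voice call recognition device. The mobile terminal may have function buttons for starting a call in an assisted mode or in an ordinary call mode; when it is detected that the user taps the function button for the ordinary call mode, the computer program may control the mobile terminal to enter the ordinary call interface, where the ordinary call mode can be understood as the traditional voice call mode.
The mobile terminal of the present application can carry calls, parse incoming numbers, and identify contact information, and can perform speech recognition on the upstream and downstream audio streams of a call to convert them into text and then inject the converted text into the message module, in combination with the time information and the call party information, to present it to the user.
The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above method is implemented.
An embodiment of the present invention provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions which, when executed by a computer, cause the computer to execute the method in any of the above method embodiments.
In the embodiments of the present invention, the call is converted into text, each call party is then associated with the corresponding text information according to the call attributes of each call party, and the call text is generated and presented to the user. In this way, the call parties' message interaction and voice interaction are integrated with the call attribute information, so the interaction content of the call parties is recorded more completely and continuously and is presented to the user in a more intuitive and convenient way.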
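It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented in hardware, but in many cases the former is the better implementation. It should be noted that, in a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, a communication medium generally contains computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present invention.
The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to the specific implementations described above, which are merely illustrative and not restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims, all of which fall within the protection of the present invention.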

Claims (10)

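  1. A voice call recognition method, including:
    when a call occurs, recognizing the voice stream of each call party in the call and converting it into corresponding text information;
    associating each call party with the corresponding text information according to the call attributes of each call party, and generating a call text;
    displaying the call text.
  2. The voice call recognition method according to claim 1, wherein the recognizing the voice stream of each call party in the call and converting it into corresponding text information includes:
    obtaining at least one upstream audio stream and at least one downstream audio stream in the call;
    parsing the upstream audio stream and the downstream audio stream and converting them into corresponding text information.
  3. The voice call recognition method according to claim 1 or 2, wherein the call attributes include a phone number; and/or
    the call attributes include a phone number and a voice start time; and/or
    the call attributes include a phone number and a voice duration; and/or
    the call attributes include a phone number, a call start time, and a call duration.
  4. The voice call recognition method according to claim 1 or 2, wherein the displaying the call text includes:
    displaying the call text in chronological order; or
    displaying the call text of each call party one by one.
  5. The voice call recognition method according to claim 4, wherein, if a message session exists for the call party, the call text is added to the existing message session;
    if no message session exists for the call party, a new message session is added, and the call text is displayed in the newly added message session.
  6. The voice call recognition method according to claim 1 or 2, further including, after the call starts:
    identifying the incoming number of the call party;
    matching the incoming number against the contact information stored on the local terminal;
    if the match succeeds, combining the stored contact information with the call attribute information and the text;
    if the match fails, directly combining the call attribute information with the text.
  7. The voice call recognition method according to claim 1 or 2, further including: saving the call text.
  8. A voice call recognition device, including: a voice recognition device and a message module;
    the voice recognition device is used to recognize the voice stream of each call party in the call when a call occurs and convert it into corresponding text information, and to associate each call party with the corresponding text information according to the call attributes of each call party and generate a call text;
    the message module is used to display the call text.
  9. The voice call recognition device according to claim 8, wherein the voice recognition device includes: a call module, an audio module, a voice recognition module, and a processing module connected in sequence;
    the call module is used to identify the incoming number of the call party, display the incoming number, connect the call, carry the voice chat, and save the call content;
    the audio module is used to obtain at least one upstream audio stream and at least one downstream audio stream generated by the voice chat in the call module;
    the voice recognition module is used to parse the upstream audio stream and the downstream audio stream obtained by the audio module and convert them into corresponding text information;
    the processing module is used to associate each call party with the corresponding text information according to the call attributes of each call party, and to generate a call text.
  10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.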
PCT/CN2019/125707 2018-12-21 2019-12-16 Voice call recognition method, device, and storage medium WO2020125588A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811575096.9A CN111355838A (zh) 2018-12-21 2018-12-21 Voice call recognition method, device, and storage medium
CN201811575096.9 2018-12-21

Publications (1)

Publication Number Publication Date
WO2020125588A1 (zh) 2020-06-25

Family

ID=71100413

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/125707 WO2020125588A1 (zh) 2018-12-21 2019-12-16 Voice call recognition method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN111355838A (zh)
WO (1) WO2020125588A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113905137A (zh) * 2021-11-11 2022-01-07 北京沃东天骏信息技术有限公司 Call method and device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113037914A (zh) * 2021-03-01 2021-06-25 北京百度网讯科技有限公司 Method for processing incoming calls, related device, and computer program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160330322A1 (en) * 2015-05-04 2016-11-10 Shanghai Xiaoi Robot Technology Co., Ltd. Method and Device for Providing Voice Feedback Information to User On Call
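CN108650419A (zh) * 2018-05-09 2018-10-12 深圳市知远科技有限公司 Smartphone-based telephone translation system
CN108737667A (zh) * 2018-05-03 2018-11-02 平安科技(深圳)有限公司 Voice quality inspection method, device, computer equipment, and storage medium
CN108877839A (zh) * 2018-08-02 2018-11-23 南京华苏科技有限公司 Method and system for perceptual evaluation of speech quality based on speech semantic recognition technology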

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102664984A (zh) * 2012-04-20 2012-09-12 上海合合信息科技发展有限公司 Method and system for creating voice notes
CN105100360B (zh) * 2015-08-26 2019-05-03 百度在线网络技术(北京)有限公司 Call assistance method and device for voice calls

Also Published As

Publication number Publication date
CN111355838A (zh) 2020-06-30

Similar Documents

Publication Publication Date Title
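CN103888581B (zh) Communication terminal and method for recording call information thereof
CN103327181B (zh) Voice chat method capable of improving the efficiency with which users obtain voice information
US9906642B2 (en) Identity identification method and apparatus and communication terminal
US8391445B2 (en) Caller identification using voice recognition
EP3542522B1 (en) Incoming call management method and apparatus
CN102546890B (zh) Information detection method and terminal
US10574827B1 (en) Method and apparatus of processing user data of a multi-speaker conference call
US20110228913A1 (en) Automatic extraction of information from ongoing voice communication system and methods
WO2016145973A1 (zh) Voice assistance method and device during a call
CN102447782A (zh) Telephone terminal capable of presenting call content in real time during a call
WO2020125588A1 (zh) Voice call recognition method, device, and storage medium
US20170270948A1 (en) Method and device for realizing voice message visualization service
CN109842712A (zh) Method, device, computer equipment, and storage medium for generating call records
US20110244842A1 (en) Communications system, device with dialing function and method thereof
CN103024129A (zh) Call recording method, device, and mobile terminal
CN107112030A (zh) Method and device for analyzing the condition of the called end, and program for implementing the method and device
US10313502B2 (en) Automatically delaying playback of a message
WO2018166367A1 (zh) Real-time reminder method and device in a real-time conversation, storage medium, and electronic device
EP2913822B1 (en) Speaker recognition
CN106911832B (zh) Voice recording method and device
CN108322429B (zh) Recording control method in real-time communication, real-time communication system, and communication terminal
CN208656882U (zh) Call center traffic management system
CN103581400A (zh) Method for storing a telephone number during a call
CN105933128A (zh) Audio conference minutes push method based on noise filtering and identity authentication
EP3007417A1 (en) Method and residential gateway for realizing voice message function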

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19900117; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19900117; Country of ref document: EP; Kind code of ref document: A1)