WO2022041177A1 - Communication message processing method, device, and instant messaging client - Google Patents

Communication message processing method, device, and instant messaging client

Info

Publication number
WO2022041177A1
Authority
WO
WIPO (PCT)
Prior art keywords
image data
keyword
voice message
voice
message
Prior art date
Application number
PCT/CN2020/112407
Other languages
French (fr)
Chinese (zh)
Inventor
马宇尘
Original Assignee
深圳市永兴元科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市永兴元科技股份有限公司
Publication of WO2022041177A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04: Real-time or near real-time messaging, e.g. instant messaging [IM]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/1822: Parsing for meaning understanding
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10: Multimedia information

Definitions

  • the present invention relates to the technical field of communication interaction.
  • IM: Instant Messaging
  • Various instant messaging software not only supports the instant transmission of text messages, but also enables the transmission of voice messages and video messages between users.
  • When interacting through voice messages in an IM tool, the user can activate the terminal's microphone or other voice collection device to record a voice message, which is then transmitted over the Internet to the target receiving user. After entering a play instruction, the receiving user can play the voice message and can also reply to it by voice.
  • A speech-to-text conversion function for voice messages has also been added, so that the converted text content and the recorded audio file can be sent together to the receiving user as an instant communication message.
  • Some communication tools also have a speech synthesis function that converts text into speech—Text To Speech (TTS for short).
  • TTS: Text To Speech
  • There are two main types of speech synthesis system. The concatenative (splicing) system uses a large number of recorded speech fragments and, guided by the text analysis results, splices them into synthetic speech. The parametric generation system uses the text analysis results to generate speech parameters through a model, such as the fundamental frequency, and then converts them into a waveform.
  • The existing voice message function only incorporates text conversion and does not consider further information such as the user's facial expression, emotional state or tone of voice at the time of recording. It therefore struggles to meet user needs. In particular, for young people who like to trade animated images in "doutu" (斗图, meme battles), voice messages lack fun.
  • the purpose of the present invention is to overcome the deficiencies of the prior art and provide a communication message processing method, device and instant messaging client.
  • relevant image data can be loaded intelligently in the process of user voice interaction, the convenience, intelligence and interest of message interaction can be improved, and user experience can be improved.
  • a communication message processing method comprising the steps of: acquiring a voice message collected by an audio collection device; extracting a keyword feature in the voice message; determining image data matching the keyword, and either sending it together with the aforementioned voice message or sending it after replacing the keyword in the voice message with the image data.
  • volume information of the voice message is acquired, and the size when outputting the matching image data is adjusted according to the volume.
  • semantic analysis is performed on the voice message, and when the semantic content obtained by the analysis includes two or more matching image data, a plurality of matched image data is obtained to produce a dynamic image output, or a plurality of images are formed into a composite image output.
  • the extracted sound clips are played corresponding to the image data, or the aforementioned sound clips are played after a triggering operation of the image data by the user is collected.
  • a floating window is set corresponding to the voice message, and the image data is displayed through the floating window.
  • image data is pictures, videos, animations and/or other multimedia information.
  • the text content of the voice message is acquired, and the text content and the audio file of the voice message are integrated into a multimedia message for output display.
  • the text content is displayed in a message box of the multimedia message
  • an audio file play button is set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the method of extracting the keyword features in the voice message is:
  • audio analysis is performed on the voice message to obtain the user's emotional state feature, and the emotional state feature is used as a keyword feature of the voice message.
  • the method of determining the image data matching the keyword is,
  • the historical image data sent and received by the user is searched, and image data matching the keyword is acquired.
  • the communication message is an instant communication message.
  • the present invention also provides a communication message processing device, including the following structure:
  • an audio acquisition module for acquiring the voice message input by the user
  • an information extraction module for extracting the keyword features in the voice message
  • An information processing module configured to determine the image data matching the keyword, and send it together with the aforementioned voice message, or replace the keyword in the voice message with image data and send it.
  • the present invention also provides an instant messaging client for performing instant messaging interaction, including the following structure:
  • the voice message trigger module is used to collect the user's voice trigger operation
  • an information extraction module for extracting keyword features in the voice according to the voice input by the user
  • An information processing module configured to determine the image data matching the keyword, and send it together with the aforementioned voice, or replace the keyword in the voice with image data and send it as an instant communication message.
  • By adopting the above technical solutions, the present invention has, by way of example, the following advantages and positive effects: relevant image data can be loaded intelligently during the user's voice interaction, improving the convenience, intelligence and fun of message interaction. It is especially suitable for users who like to exchange "doutu" (斗图, meme battle) images, and improves the user experience.
  • FIG. 1 is a flowchart of a communication message processing method provided by an embodiment of the present invention.
  • FIG. 2 is a module structure diagram of an instant messaging client provided by an embodiment of the present invention.
  • FIG. 3 to FIG. 7 are diagrams illustrating operation examples of instant messaging interaction provided by an embodiment of the present invention.
  • FIG. 8 to FIG. 10 are exemplary diagrams when a voice message including image data is received according to an embodiment of the present invention.
  • Reference numerals: instant messaging client 100, voice message triggering module 110, information extraction module 120, information processing module 130; user terminal 200, desktop 210, instant messaging tool icon 211, contact 220, microphone 230.
  • a communication message processing method including the following steps:
  • S100 Acquire a voice message collected by an audio collection device.
  • In this embodiment the message is an instant messaging message, sent through an instant messaging (IM) tool such as WeChat.
  • After entering WeChat, the user can trigger the voice recording button to start the audio collection device of the terminal. Once the audio pickup is active, the user's voice information can be collected.
  • the terminal may be various commonly used mobile terminals such as mobile phones, palmtop computers, and tablet computers, and various smart wearable electronic devices, such as smart glasses, smart watches, and the like.
  • a mobile phone is used as the mobile terminal, and the mobile phone has an audio collection structure, an image collection structure and a display structure.
  • the aforementioned voice message is recognized based on speech recognition technology, and the keyword features in the voice message are extracted.
  • Speech recognition technology is mainly based on the analysis of three basic properties of speech: physical properties, physiological properties and social properties.
  • the physical properties of speech mainly include four elements: pitch, length, intensity and timbre.
  • Pitch refers to how high or low a sound is, and is mainly determined by the vibration frequency of the sounding body;
  • Duration refers to how long a sound lasts, and is mainly determined by how long the sounding body vibrates;
  • Intensity refers to how strong a sound is, and is mainly determined by the vibration amplitude of the sounding body.
  • Timbre refers to the characteristic quality of a sound, and is mainly determined by the shape of the sound waveform produced by the vibrating body.
  • the physiological properties of speech mainly refer to the influence of the vocal organs on speech, including the lungs and trachea, the larynx and vocal cords, and the oral cavity, nasal cavity and pharynx.
  • the social properties of speech are mainly reflected in three aspects. First, there is no necessary connection between sound and meaning; their correspondence is established by convention among the members of a society. Second, every language or dialect has its own phonetic system. Third, speech sounds serve to distinguish meaning.
  • the basic process of speech recognition may include three steps: preprocessing of speech signals, feature extraction, and pattern matching.
  • Preprocessing usually includes speech signal sampling, anti-aliasing bandpass filtering, removal of individual pronunciation differences and noise effects caused by equipment and environment, etc., and involves the selection of speech recognition primitives and endpoint detection.
  • Feature extraction is used to extract acoustic parameters that reflect essential features in speech, such as average energy, average zero-crossing rate, formants, etc.
  • the extracted feature parameters must meet the following requirements: they effectively represent the speech features and discriminate well; the parameters of each order are largely independent of one another; and they are easy to calculate, preferably with efficient algorithms, so that speech recognition can run in real time.
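  • As an illustration of the feature extraction step, the following minimal sketch computes two of the acoustic parameters named above, average energy and average zero-crossing rate, for a single frame of samples. The function names and the frame representation (a plain list of floats) are assumptions made for this example and are not taken from the patent.

```python
from typing import Sequence


def average_energy(frame: Sequence[float]) -> float:
    """Mean of the squared sample values in one frame."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


frame = [0.5, -0.5, 0.5, -0.5]
print(average_energy(frame))      # 0.25
print(zero_crossing_rate(frame))  # 1.0
```

  • Both functions are cheap to evaluate per frame, which is in line with the requirement that feature parameters be easy to calculate.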
  • a model is established for each entry and saved as a template library.
  • the speech signal passes through the same channel to obtain speech feature parameters, generates a test template, matches with the reference template, and takes the reference template with the highest matching score as the recognition result.
  • the accuracy of recognition can be improved.
  • Pattern matching is the core of the entire speech recognition system. It calculates the similarity (such as a matching distance or likelihood) between the input features and the stored patterns according to certain rules (such as a distance measure) and expert knowledge (such as word formation rules, grammar rules and semantic rules), in order to determine the semantic information of the input speech.
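  • The template matching described above can be sketched as follows, assuming each template is a fixed-length feature vector and the matching score is the negated Euclidean distance. Both choices are simplifications made for this example, as are the names `recognize` and `template_library`; the patent does not prescribe a particular distance measure.

```python
import math
from typing import Dict, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance measure between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(
    test_template: Sequence[float],
    template_library: Dict[str, Sequence[float]],
) -> Tuple[str, float]:
    """Return the entry with the highest matching score and that score.

    The score is the negated distance, so a smaller distance scores higher.
    """
    if not template_library:
        raise ValueError("template library is empty")
    scored = (
        (entry, -euclidean_distance(test_template, reference))
        for entry, reference in template_library.items()
    )
    return max(scored, key=lambda pair: pair[1])


library = {"hello": [1.0, 0.0], "goodbye": [0.0, 1.0]}
entry, score = recognize([0.9, 0.1], library)
print(entry)  # hello
```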
  • extracting the keyword features in the voice message means obtaining the key content from the speech recognition result.
  • the keyword features may be words expressing moods, words expressing feelings, words expressing preferences, words expressing intentions, or words expressing plans, and the like.
  • the method for extracting the keyword features in the voice message may be as follows:
  • Method 1: Perform semantic analysis on the voice message, and obtain keyword features based on the semantic analysis.
  • Method 2: Perform audio analysis on the voice message to obtain intonation features, speed features and/or volume features, and obtain keyword features in the voice message based on the intonation features, speed features and/or volume features.
  • Method 3: Perform audio analysis on the voice message to obtain the user's emotional state feature, and use the emotional state feature as a keyword feature of the voice message.
  • Voices reflect people's emotions to a certain extent. For example, agitated and loud speech often means that the speaker is angry, while cheerful and soft speech often means that the speaker is happy. Accordingly, the important content that the user wants to express can be obtained by analyzing the emotional information in the user's voice.
  • the way of identifying the emotional information in the voice information is one or more of the following ways:
  • Method 1: analyze the volume changes in the voice information, and determine the emotional state feature from the volume changes.
  • Method 2: analyze the pitch changes in the voice information, and determine the emotional state feature from the pitch changes.
  • Method 3: analyze the speech rate in the voice information, and determine the emotional state feature from the speech rate.
  • Method 4: analyze the rhythm changes in the voice information, and determine the emotional state feature from the rhythm changes.
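  • The methods above can be combined in a simple rule-based estimator. The sketch below is only an illustration: the `ProsodyFeatures` fields, the threshold values and the three emotion labels are all assumptions chosen for this example, and rhythm analysis is left out for brevity. A practical system would more likely learn such a mapping from labelled recordings.

```python
from dataclasses import dataclass


@dataclass
class ProsodyFeatures:
    mean_volume_db: float    # average loudness of the recording
    mean_pitch_hz: float     # average fundamental frequency
    speech_rate_wps: float   # words per second


# Hypothetical thresholds, chosen only to make the example concrete.
LOUD_DB = 70.0
HIGH_PITCH_HZ = 220.0
FAST_WPS = 3.5


def estimate_emotion(features: ProsodyFeatures) -> str:
    """Map volume, pitch and speech rate to a coarse emotional state."""
    loud = features.mean_volume_db >= LOUD_DB
    high = features.mean_pitch_hz >= HIGH_PITCH_HZ
    fast = features.speech_rate_wps >= FAST_WPS
    if loud and fast:
        return "angry"    # agitated, loud speech
    if high and not loud:
        return "happy"    # cheerful, soft speech
    return "neutral"


print(estimate_emotion(ProsodyFeatures(75.0, 180.0, 4.0)))  # angry
print(estimate_emotion(ProsodyFeatures(55.0, 250.0, 2.5)))  # happy
```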
  • For example, the collected voice message is "This product is much cheaper than the one I bought before, I'm really happy."
  • the obtained keyword feature can be "really happy".
  • When the user does not express an emotion explicitly but the voice message carries an emotional tendency, the implied emotion may be used as the keyword feature based on situational analysis.
  • For example, the collected voice message is "This bun is much smaller than before", and the emotional tendency contained in this message is "dissatisfied and unhappy". Therefore, based on the emotional tendency, "dissatisfied and unhappy" is used as the keyword feature.
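  • For the explicit case, keyword extraction from the recognized text can be sketched as a lexicon lookup. The lexicon contents and the function name `extract_keywords` are assumptions made for this example. The implied case (the bun example) needs situational or sentiment analysis that a plain lookup does not provide, which the second call illustrates.

```python
import re
from typing import Dict, List

# Hypothetical lexicon mapping surface words to keyword features.
EMOTION_LEXICON: Dict[str, str] = {
    "happy": "happy",
    "glad": "happy",
    "angry": "angry",
    "sad": "sad",
    "unhappy": "unhappy",
}


def extract_keywords(transcript: str) -> List[str]:
    """Return keyword features found in the transcript, in order, without repeats."""
    words = re.findall(r"[a-z']+", transcript.lower())
    found: List[str] = []
    for word in words:
        keyword = EMOTION_LEXICON.get(word)
        if keyword is not None and keyword not in found:
            found.append(keyword)
    return found


print(extract_keywords(
    "This product is much cheaper than the one I bought before, I'm really happy."
))  # ['happy']
print(extract_keywords("This bun is much smaller than before"))  # []
```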
  • S300 Determine the image data matching the keyword, and send it together with the aforementioned voice message, or replace the keyword in the voice message with image data and send it.
  • the manner of determining the image data matching the keyword may be as follows:
  • the historical image data sent and received by the user is searched, and image data matching the keyword is acquired.
  • Image data of the user captured while the voice is being recorded, or image data on a preset associated path, can be collected, identified and used as the matching image data.
  • Alternatively, image data of the user captured while the voice is being recorded, or image data on a preset associated path, is collected and identified, and elements are then added to or removed from the collected image data to generate a composite image as the matching image data.
  • In this way, a composite image containing both real and virtual elements can be formed, which makes the message more fun.
  • Alternatively, image data of the user captured while the voice is being recorded, or image data on a preset associated path, is collected, and a virtual image mapped from the collected image data is used as the matching image data.
  • In this way, a virtual image that carries the user's own emotion or expression, such as a cartoon figure, is generated while protecting the user's privacy, which makes the message more fun.
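  • The first manner listed above, searching the historical image data sent and received by the user, can be sketched as a tag lookup. The `ImageRecord` structure, the idea that each stored image carries descriptive tags, and the choice to prefer the most recent match are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageRecord:
    path: str          # where the image is stored
    tags: List[str]    # descriptive labels attached to the image
    timestamp: float   # when the image was sent or received


def find_matching_image(
    keyword: str, history: List[ImageRecord]
) -> Optional[ImageRecord]:
    """Return the most recent history image tagged with the keyword, if any."""
    wanted = keyword.lower()
    candidates = [
        record for record in history
        if wanted in (tag.lower() for tag in record.tags)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda record: record.timestamp)


history = [
    ImageRecord("old_smile.gif", ["happy"], 100.0),
    ImageRecord("new_smile.gif", ["Happy", "laugh"], 200.0),
    ImageRecord("frown.png", ["unhappy"], 300.0),
]
match = find_matching_image("happy", history)
print(match.path if match else None)  # new_smile.gif
```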
  • the volume information of the voice message may also be acquired, and the size of the matching image data when output is adjusted according to the volume.
  • the correspondence between the volume and the image size can be established in advance.
  • the sound is divided into 5 levels based on the volume, from low to high: bass, mid-bass, mid-range, mid-high and treble.
  • the image sizes corresponding to the bass, mid-bass, mid-range, mid-high and treble levels increase in sequence.
  • the image size corresponding to the volume level can be obtained based on the correspondence between the volume level and the image size.
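  • The pre-established correspondence between volume level and image size can be sketched as a lookup table. The five level names come from the text; the decibel boundaries and pixel sizes below are assumptions chosen only to make the example concrete.

```python
import bisect

VOLUME_LEVELS = ["bass", "mid-bass", "mid-range", "mid-high", "treble"]

# Hypothetical upper bounds (in dB) separating the five levels.
LEVEL_BOUNDS_DB = [40.0, 50.0, 60.0, 70.0]

# Hypothetical image edge lengths in pixels, increasing with the level.
IMAGE_SIZE_PX = {
    "bass": 64,
    "mid-bass": 96,
    "mid-range": 128,
    "mid-high": 192,
    "treble": 256,
}


def volume_level(volume_db: float) -> str:
    """Classify a measured volume into one of the five levels."""
    return VOLUME_LEVELS[bisect.bisect_right(LEVEL_BOUNDS_DB, volume_db)]


def image_size_for_volume(volume_db: float) -> int:
    """Look up the output image size for a measured volume."""
    return IMAGE_SIZE_PX[volume_level(volume_db)]


print(volume_level(35.0))           # bass
print(image_size_for_volume(65.0))  # 192
```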
  • Semantic analysis may also be performed on the voice message. When the semantic content obtained by the analysis includes two or more items with matching image data, the multiple matching images are obtained and either made into a dynamic image for output or combined into a composite image for output.
  • For example, if both "Yangcheng Lake" and "hairy crab" in the semantic content have matching images, the matching images can be made into a dynamic image of "hairy crabs crawling on Yangcheng Lake", or into a composite image of "multiple hairy crabs in Yangcheng Lake".
  • the extracted sound clips are played corresponding to the image data, or the aforementioned sound clips are played after a triggering operation of the image data by the user is collected.
  • Sound information is attached to the output image data. The sound can be played automatically when the receiving user receives the message, or played when the receiving user triggers the image data, for example by clicking on the area where the image data is located.
  • the manner of sending together with the aforementioned voice message may be as follows:
  • the voice message and the image data are sent together as two separate messages.
  • the image data is inserted into the keyword position or adjacent positions and then sent together.
  • a floating window is set corresponding to the voice message, and the image data is displayed through the floating window.
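  • The three ways of sending the image data together with the voice message can be sketched as a small message builder. The `SendMode` and `Message` types, their field names and the use of a character offset for the keyword position are assumptions made for this example and do not reflect any real instant messaging protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SendMode(Enum):
    SEPARATE = "separate"  # voice and image sent as two separate messages
    INLINE = "inline"      # image inserted at or next to the keyword position
    FLOATING = "floating"  # image shown in a floating window over the voice message


@dataclass
class Message:
    kind: str
    audio_path: Optional[str] = None
    image_path: Optional[str] = None
    image_offset: Optional[int] = None  # keyword position, used for INLINE
    floating: bool = False


def build_messages(
    audio_path: str,
    image_path: str,
    mode: SendMode,
    keyword_offset: Optional[int] = None,
) -> List[Message]:
    """Package a voice message and its matching image according to the mode."""
    if mode is SendMode.SEPARATE:
        return [
            Message("voice", audio_path=audio_path),
            Message("image", image_path=image_path),
        ]
    if mode is SendMode.INLINE:
        if keyword_offset is None:
            raise ValueError("INLINE mode needs the keyword position")
        return [Message("voice+image", audio_path, image_path, keyword_offset)]
    return [Message("voice+image", audio_path, image_path, floating=True)]


print(len(build_messages("a.amr", "smile.gif", SendMode.SEPARATE)))  # 2
print(build_messages("a.amr", "smile.gif", SendMode.INLINE, 12)[0].image_offset)  # 12
```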
  • the image data may be pictures, videos, animations and/or other multimedia information.
  • the text content of the voice message may be obtained, and the text content and the audio file of the voice message may be integrated into a multimedia message for output display.
  • the text content is displayed in a message box of the multimedia message
  • an audio file play button is set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the instant messaging client 100 includes the following structure:
  • the voice message triggering module 110 is used for collecting user's voice triggering operation.
  • the information extraction module 120 is used for extracting the keyword features in the speech according to the speech input by the user.
  • the information processing module 130 is configured to determine the image data matching the keyword, and send it together with the aforementioned voice, or replace the keyword in the voice with image data and send it as an instant communication message.
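  • The division into modules 110, 120 and 130 can be sketched as three cooperating classes. Everything here is illustrative: the class and method names, the injected `transcribe` callable standing in for a speech recognizer, and the keyword-to-image index are assumptions, and the trigger module 110 is represented only by the `on_voice_recorded` callback it would invoke.

```python
from typing import Callable, Dict, List, Optional


class InformationExtractionModule:
    """Stands in for module 120: extracts keyword features from recorded voice."""

    def __init__(self, transcribe: Callable[[bytes], str], keywords: List[str]):
        self._transcribe = transcribe
        self._keywords = keywords

    def extract(self, audio: bytes) -> List[str]:
        text = self._transcribe(audio).lower()
        return [k for k in self._keywords if k in text]


class InformationProcessingModule:
    """Stands in for module 130: pairs the voice with a matching image."""

    def __init__(self, image_index: Dict[str, str]):
        self._image_index = image_index

    def process(self, audio: bytes, keywords: List[str]) -> Dict[str, object]:
        image: Optional[str] = next(
            (self._image_index[k] for k in keywords if k in self._image_index),
            None,
        )
        return {"audio": audio, "keywords": keywords, "image": image}


class InstantMessagingClient:
    """Stands in for client 100; on_voice_recorded is called by trigger module 110."""

    def __init__(
        self,
        extraction: InformationExtractionModule,
        processing: InformationProcessingModule,
    ):
        self.extraction = extraction
        self.processing = processing

    def on_voice_recorded(self, audio: bytes) -> Dict[str, object]:
        return self.processing.process(audio, self.extraction.extract(audio))


client = InstantMessagingClient(
    InformationExtractionModule(
        transcribe=lambda audio: "I'm really happy",  # stand-in recognizer
        keywords=["happy", "angry"],
    ),
    InformationProcessingModule({"happy": "smile.png"}),
)
print(client.on_voice_recorded(b"\x00\x01")["image"])  # smile.png
```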
  • When the user enters the instant messaging tool and needs to send a voice message, the audio collection device is activated to record the voice. Specifically, the user can trigger the voice recording button to activate the audio collection device of the terminal, and the user's voice information can be collected once the audio pickup is active.
  • the terminal may be various commonly used mobile terminals such as mobile phones, palmtop computers, and tablet computers, and various smart wearable electronic devices, such as smart glasses, smart watches, and the like.
  • a mobile phone is used as the mobile terminal, and the mobile phone has an audio collection structure, an image collection structure and a display structure.
  • the aforementioned voice message is recognized based on speech recognition technology, and the keyword features in the voice message are extracted.
  • extracting the keyword features in the voice message means obtaining the key content from the speech recognition result.
  • the keyword features may be words expressing moods, words expressing feelings, words expressing preferences, words expressing intentions, or words expressing plans, and the like.
  • the method of extracting the keyword features in the voice message may be as follows:
  • Method 1: Perform semantic analysis on the voice message, and obtain keyword features based on the semantic analysis.
  • Method 2: Perform audio analysis on the voice message to obtain intonation features, speed features and/or volume features, and obtain keyword features in the voice message based on the intonation features, speed features and/or volume features.
  • Method 3: Perform audio analysis on the voice message to obtain the user's emotional state feature, and use the emotional state feature as a keyword feature of the voice message.
  • Voices reflect people's emotions to a certain extent. For example, agitated and loud speech often means that the speaker is angry, while cheerful and soft speech often means that the speaker is happy. Accordingly, the important content that the user wants to express can be obtained by analyzing the emotional information in the user's voice.
  • the information processing module 130 may include a message synthesis unit, which is used for recognizing the text content of the voice, and integrating the text content and the audio file of the voice into a multimedia message.
  • the text content is displayed in a message box of the multimedia message
  • an audio file play button is set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the information extraction module 120 may include an emotion recognition unit.
  • the emotion recognition unit is used for recognizing emotion information in the voice message.
  • the emotion recognition unit includes a voice volume analysis sub-circuit, a voice pitch analysis sub-circuit, a voice speech rate analysis sub-circuit and/or a voice rhythm analysis sub-circuit.
  • the user enters the instant messaging tool “Quick Message” through the user terminal 200 carried by the user.
  • the user terminal 200 is preferably a mobile phone in this embodiment.
  • the desktop 210 of the user terminal 200 outputs a user interface to the user, on which all communication messages are displayed, and the communication messages display the contacts 220, the latest interactive messages, and a virtual microphone 230 (voice trigger control).
  • the virtual microphone 230 corresponding to the contact leo can be triggered, which directly starts the voice message collection function.
  • a voice message input box is displayed in the user interface, and the input box displays the user's voice being entered, the text content corresponding to the voice, and related operation keys.
  • The voice message input box can be displayed directly on the current user interface, or displayed after a separate voice message interface is generated for the contact leo. As shown in FIG. 6, the voice message interface displays the contact information, the voice message input box, the virtual microphone and the current recording quality information.
  • While recording a voice message, the user can perform send and pause operations by operating the virtual microphone 230.
  • pressing and sliding the microphone up is a send operation
  • pressing and sliding the microphone to the right is a pause operation.
  • the manner in which the image data is sent together with the aforementioned voice message may be as follows:
  • the voice message and the image data are sent together as two separate messages.
  • the image data is inserted into the keyword position or adjacent positions and then sent together.
  • the inserted image data can be played directly or played after the user triggers the keyword position.
  • a floating window is set corresponding to the voice message, and the image data is displayed through the floating window.
  • the keywords in the voice are replaced with image data and then sent as an instant communication message.
  • the message sent to the receiving end includes text content, audio files and image data.
  • the image data may be pictures, videos, animations and/or other multimedia information.
  • the text content of the voice message is also obtained, and the text content and the audio file of the voice message are integrated into a multimedia message for output display.
  • the text content is displayed in the message box of the multimedia message, and an audio file play button may also be set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the instant messaging client may also be set with other functional modules as required, and the specific functions can be found in the previous embodiments, which will not be repeated here.
  • Another embodiment of the present invention also provides a communication message processing device.
  • the message processing device includes the following structure:
  • an audio acquisition module for acquiring the voice message input by the user
  • an information extraction module for extracting the keyword features in the voice message
  • An information processing module configured to determine the image data matching the keyword, and send it together with the aforementioned voice message, or replace the keyword in the voice message with image data and send it.
  • the message processing device may also be provided with other functional modules as required. For details, refer to the foregoing embodiments, which will not be repeated here.

Abstract

The present invention provides a communication message processing method, a device and an instant messaging client, relating to the technical field of communication interaction. A communication message processing method, comprising the following steps: acquiring a speech message acquired by an audio acquisition device; extracting a keyword feature in the speech message; and determining image data matching the keyword, and sending same together with the speech message, or replacing the keyword in the speech message with the image data, and then sending same. By means of the present invention, relevant image data can be intelligently loaded during the speech interaction process of users, improving the convenience, intelligence and interestingness of message interaction, and improving the user experience.

Description

Communication message processing method, device and instant messaging client
Technical field
The present invention relates to the technical field of communication interaction.
Background art
Instant Messaging (IM) is the most popular means of communication in the mobile Internet era. Instant messaging software of all kinds not only supports the instant transmission of text messages, but also enables the transmission of voice messages and video messages between users.
When interacting through voice messages in an IM tool, the user can activate the terminal's microphone or other voice collection device to record a voice message, which is then transmitted over the Internet to the target receiving user. After entering a play instruction, the receiving user can play the voice message and can also reply to it by voice.
At present, to let users choose whether to listen to a voice message depending on the occasion, a speech-to-text conversion function has also been added, and the converted text content can be sent to the receiving user together with the recorded audio file as an instant communication message. Some communication tools also provide a speech synthesis function that converts text into speech, known as Text To Speech (TTS). There are two main types of speech synthesis solution: concatenative (splicing) systems and parametric generation systems. Both require text analysis. The former uses a large number of recorded speech fragments and, guided by the text analysis results, splices them into synthetic speech; the latter uses the text analysis results to generate speech parameters through a model, such as the fundamental frequency, and then converts them into a waveform.
The existing voice message function only incorporates text conversion and does not consider further information such as the user's facial expression, emotional state or tone of voice at the time of recording. It therefore struggles to meet user needs. In particular, for young people who like to trade animated images in "doutu" (斗图, meme battles), voice messages lack fun.
With the continuous development of artificial intelligence technology and people's rising expectations of interactive experience, intelligent interaction has gradually begun to replace some traditional forms of human-computer interaction and has become a research hotspot. It is now possible to analyze a user's emotions from the content of their interactions and, from that emotional state, to infer the deeper emotional needs that the user's messages are actually meant to express. How to combine the above existing technologies to provide users with a more intelligent and convenient way to communicate is a problem that urgently needs to be solved.
Technical Problem
The purpose of the present invention is to overcome the deficiencies of the prior art by providing a communication message processing method, a device, and an instant messaging client. With the present invention, relevant image data can be loaded intelligently during the user's voice interaction, improving the convenience, intelligence, and fun of message interaction and enhancing the user experience.
Technical Solution
To achieve the above objective, the present invention provides the following technical solutions:
A communication message processing method, comprising the following steps: acquiring a voice message collected by an audio collection device; extracting keyword features from the voice message; and determining image data that matches the keyword, and either sending the image data together with the voice message or replacing the keyword in the voice message with the image data before sending.
Further, image data of the user captured while the user records the voice, or image data collected from a preset associated path, is recognized and then used as the matching image data; or, elements are added to or removed from the collected image data to generate a composite image that serves as the matching image data; or, a virtual image is mapped from the collected image data and serves as the matching image data.
Further, volume information of the voice message is acquired, and the size at which the matching image data is output is adjusted according to the volume.
Further, semantic analysis is performed on the voice message; when the semantic content obtained includes two or more items with matching image data, the multiple matching images are obtained and made into an animated image for output, or are combined into a composite image for output.
Further, the method also includes the steps of:
analyzing the voice message;
extracting from the voice message the sound clip corresponding to the image data; and
playing the extracted sound clip in association with the image data, or playing the sound clip after a trigger operation by the user on the image data is detected.
Further, the image data is sent together with the voice message in one of the following ways:
sending the voice message and the image data together as two separate messages;
or, inserting the image data at the keyword position or an adjacent position and then sending them together;
or, setting a floating window corresponding to the voice message and displaying the image data in the floating window.
Further, the image data is a picture, video, animation, and/or other multimedia information.
Further, the text content of the voice message is obtained, and the text content and the audio file of the voice message are integrated into a multimedia message for output and display.
Preferably, the text content is displayed in the message box of the multimedia message, an audio playback button is provided for the message box, and triggering the button plays the audio file.
Further, the keyword features in the voice message are extracted in one of the following ways:
performing semantic analysis on the voice message and obtaining the keyword features from the semantic analysis;
or, performing audio analysis on the voice message to obtain intonation features, speech rate features, and/or volume features, and obtaining the keyword features in the voice message from those features;
or, performing audio analysis on the voice message to obtain the user's emotional state features and using the emotional state features as the keyword features of the voice message.
Further, the image data matching the keyword is determined in one or more of the following ways:
searching local resource files for image data based on the keyword to obtain image data matching the keyword;
and/or, searching network resource files for image data based on the keyword to obtain image data matching the keyword;
and/or, searching the historical image data sent and received by the user based on the keyword to obtain image data matching the keyword.
Further, the communication message is an instant messaging message.
The present invention also provides a communication message processing device, comprising the following structure:
an audio collection module, configured to acquire a voice message input by a user;
an information extraction module, configured to extract keyword features from the voice message;
an information processing module, configured to determine image data that matches the keyword, and either to send the image data together with the voice message or to replace the keyword in the voice message with the image data before sending.
The present invention also provides an instant messaging client for instant messaging interaction, comprising the following structure:
a voice message triggering module, configured to detect the user's voice trigger operation;
an information extraction module, configured to extract keyword features from the voice input by the user;
an information processing module, configured to determine image data that matches the keyword, and either to send the image data together with the voice or to replace the keyword in the voice with the image data and send the result as an instant messaging message.
Beneficial Effects
By adopting the above technical solutions, the present invention has, by way of example, the following advantages and positive effects compared with the prior art: relevant image data can be loaded intelligently during the user's voice interaction, improving the convenience, intelligence, and fun of message interaction. The invention is especially suitable for users who enjoy trading meme images ("doutu") and improves the user experience.
Description of Drawings
FIG. 1 is a flowchart of a communication message processing method provided by an embodiment of the present invention.
FIG. 2 is a module structure diagram of an instant messaging client provided by an embodiment of the present invention.
FIG. 3 to FIG. 7 are example diagrams of instant messaging interaction operations provided by an embodiment of the present invention.
FIG. 8 to FIG. 10 are example diagrams of receiving a voice message that contains image data, provided by an embodiment of the present invention.
Description of reference numerals:
Instant messaging client 100, voice message triggering module 110, information extraction module 120, information processing module 130; user terminal 200, desktop 210, instant messaging tool icon 211, contact 220, microphone 230; communication interaction interface 300.
Embodiments of the Present Invention
The communication message processing method, device, and instant messaging client provided by the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the technical features, or combinations of technical features, described in the following embodiments should not be considered in isolation; they can be combined with one another to achieve better technical effects. In the drawings of the following embodiments, the same reference numerals in different drawings denote the same features or components, which may be applied in different embodiments. Therefore, once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
It should be noted that the structures, proportions, sizes, and the like shown in the drawings of this specification are intended only to accompany the content disclosed in the specification so that those familiar with the art can understand and read it; they are not intended to limit the conditions under which the invention can be implemented. Any modification of structure, change of proportion, or adjustment of size that does not affect the effects the invention can produce or the objectives it can achieve shall fall within the scope covered by the technical content disclosed herein. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order described or discussed, including substantially simultaneously or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention pertain.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification. In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative and not as a limitation. Accordingly, other examples of the exemplary embodiments may have different values.
Embodiment
Referring to FIG. 1, a communication message processing method is disclosed, comprising the following steps:
S100: acquire a voice message collected by an audio collection device.
When the user needs to send a voice message, the audio collection device is started to record the voice. The instant messaging tool (IM tool) WeChat is taken as an example, in which case the message is an instant messaging message. After entering WeChat, the user can trigger the voice recording button to start the audio collection device of the terminal; once the microphone is activated, the user's voice can be collected.
The terminal, by way of example and not limitation, may be any commonly used mobile terminal such as a mobile phone, palmtop computer, or tablet computer, or a smart wearable electronic device such as smart glasses or a smart watch. In this embodiment, a mobile phone is used as the mobile terminal, and the mobile phone has an audio collection structure, an image collection structure, and a display structure.
S200: extract the keyword features from the voice message.
The voice message is recognized using speech recognition technology, and the keyword features in the voice message are extracted.
Speech recognition technology is mainly based on the analysis of three basic properties of speech: physical, physiological, and social. The physical properties of speech mainly comprise four elements: pitch, duration, intensity, and timbre. Pitch is how high or low a sound is, determined mainly by how fast the sounding body vibrates; duration is how long a sound lasts, determined mainly by how long the sounding body vibrates; intensity is how strong a sound is, determined mainly by the amplitude of the vibration; timbre is the characteristic quality of a sound, determined mainly by the shape of the sound wave produced by the vibrating body. The physiological properties of speech refer mainly to the influence of the vocal organs on speech, including the lungs and trachea, the larynx and vocal cords, and the oral, nasal, and pharyngeal cavities. The social properties of speech show in three respects: first, there is no necessary connection between sound and meaning, and their correspondence is established by convention among members of society; second, every language or dialect has its own phonetic system; third, speech sounds serve to distinguish meaning.
Generally speaking, the basic process of speech recognition comprises three steps: preprocessing of the speech signal, feature extraction, and pattern matching.
Preprocessing typically includes sampling the speech signal, anti-aliasing band-pass filtering, and removing the effects of individual pronunciation differences and of noise caused by equipment and the environment; it also involves selecting the speech recognition unit and endpoint detection.
Feature extraction obtains the acoustic parameters that reflect the essential characteristics of speech, such as average energy, average zero-crossing rate, and formants. The extracted feature parameters must meet the following requirements: they effectively represent the speech and discriminate well; the parameters of different orders are largely independent of one another; and they are easy to compute, preferably with an efficient algorithm, so that speech recognition can run in real time. In the training phase, the feature parameters are processed and a model is built for each entry and saved in a template library. In the recognition phase, the speech signal passes through the same channel to obtain its feature parameters and generate a test template, which is matched against the reference templates; the reference template with the highest matching score is taken as the recognition result. Prior knowledge can also be used to improve recognition accuracy.
Pattern matching is the core of the entire speech recognition system. Based on certain rules (such as a distance measure) and expert knowledge (such as word formation rules, grammar rules, and semantic rules), it computes the similarity between the input features and the stored patterns (for example, a matching distance or a likelihood) and determines the semantic information of the input speech.
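The three steps above can be illustrated with a minimal Python sketch. It is not the implementation described in this specification: the frame length, the function names (`frame_features`, `match_template`), and the use of Euclidean distance as the "distance measure" are assumptions chosen for illustration, and a real recognizer would use far richer features and models.

```python
import math

def frame_features(samples, frame_len=4):
    """Return (average energy, zero-crossing rate) for each full frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Average energy: mean of the squared sample values in the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def match_template(feature, templates):
    """Return the label of the stored template closest to the feature.

    `templates` maps a hypothetical label to a reference feature vector;
    the smallest Euclidean distance counts as the best match.
    """
    return min(templates, key=lambda label: math.dist(feature, templates[label]))
```

For example, with reference templates `{"noisy": (1.0, 1.0), "quiet": (0.0, 0.0)}`, a frame that alternates between 1 and -1 is matched to `"noisy"`, while a frame of small constant values is matched to `"quiet"`.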
Extracting the keyword features from the voice message means obtaining the key content from what speech recognition has recognized. The keyword features, by way of example and not limitation, may be words that express emotion, mood, preference, intention, or plans.
In this embodiment, the keyword features in the voice message may be extracted in the following ways:
Approach 1: perform semantic analysis on the voice message and obtain the keyword features from the semantic analysis.
Approach 2: perform audio analysis on the voice message to obtain intonation features, speech rate features, and/or volume features, and obtain the keyword features in the voice message from those features.
Intonation, speech rate, and volume vary as a person speaks. For example, when reaching key information, a speaker usually raises the volume, stresses the intonation, and slows down. From these variations, the content the user emphasizes can be identified and used as the keyword features.
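A minimal Python sketch of Approach 2 follows. It assumes that an upstream recognizer has already split the message into words and measured each word's loudness and duration; the `Word` type, the `emphasized_words` function, and both ratio thresholds are hypothetical and are not taken from this specification.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    volume: float    # mean loudness of the word, arbitrary units
    duration: float  # seconds spent on the word (longer means slower speech)

def emphasized_words(words, volume_ratio=1.3, duration_ratio=1.3):
    """Return words spoken both louder and slower than the message average."""
    if not words:
        return []
    mean_volume = sum(w.volume for w in words) / len(words)
    mean_duration = sum(w.duration for w in words) / len(words)
    return [
        w.text for w in words
        if w.volume >= volume_ratio * mean_volume
        and w.duration >= duration_ratio * mean_duration
    ]
```

Word duration is only a rough stand-in for speech rate, since longer words naturally take longer to say; a per-syllable rate would be a more careful measure.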
Approach 3: perform audio analysis on the voice message to obtain the user's emotional state features, and use the emotional state features as the keyword features of the voice message.
A voice reflects a person's emotions to some extent. Generally speaking, agitated, loud speech tends to indicate that the speaker is angry, while cheerful, soft speech tends to indicate that the speaker is happy. Accordingly, the important content the user wants to express can be inferred by analyzing the emotional information in the user's voice information.
Preferably, the emotional information in the voice information is recognized in one or more of the following ways:
Approach 1: analyze the changes in the user's volume in the voice information and derive the emotional state features from the volume changes.
Approach 2: analyze the pitch changes in the voice information and derive the emotional state features from the pitch changes.
Approach 3: analyze the speech rate in the voice information and derive the emotional state features from the speech rate.
Approach 4: analyze the rhythm changes in the voice information and derive the emotional state features from the rhythm changes.
By way of example and not limitation, suppose the collected voice message is "This product is much cheaper than the one I bought before, I'm so happy". After the voice message is recognized, the keyword feature obtained may be "so happy".
Alternatively, if the user does not express an emotion explicitly but the voice message carries an emotional tendency, the implied emotion can be used as the keyword feature based on situational analysis.
By way of example and not limitation, suppose the collected voice message is "This steamed bun is so much smaller than it used to be". The emotional tendency contained in this text is "dissatisfied and unhappy", so "dissatisfied and unhappy" is used as the keyword feature.
S300: determine the image data that matches the keyword, and either send the image data together with the voice message or replace the keyword in the voice message with the image data before sending.
Specifically, the image data matching the keyword may be determined in the following ways:
searching local resource files for image data based on the keyword to obtain image data matching the keyword;
and/or, searching network resource files for image data based on the keyword to obtain image data matching the keyword;
and/or, searching the historical image data sent and received by the user based on the keyword to obtain image data matching the keyword.
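The lookup above can be sketched in Python as a search over several sources. This is only one possible arrangement: the specification allows the sources to be used alone or in combination, whereas the sketch assumes they are tried in a fixed order and that each source is a simple keyword-to-images dictionary. The function name `find_matching_images` and the source names are hypothetical.

```python
def find_matching_images(keyword, sources):
    """Search each source in order and return the first non-empty match.

    `sources` is a list of (source name, index) pairs, where each index
    maps a keyword to a list of image references. Returns a tuple of
    (source name, images), or (None, []) when no source has a match.
    """
    for name, index in sources:
        images = index.get(keyword, [])
        if images:
            return name, images
    return None, []
```

A caller might pass `[("local", local_index), ("network", network_index), ("history", history_index)]` so that local resources are preferred and network resources are consulted only when nothing is found locally.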
In another implementation of this embodiment, image data of the user captured while the user records the voice, or image data collected from a preset associated path, may be recognized and then used as the matching image data.
Alternatively, image data of the user captured while the user records the voice, or image data collected from a preset associated path, is recognized, and elements are then added to or removed from the collected image data to generate a composite image that serves as the matching image data. In this way, a composite image containing both real and virtual elements can be formed, which makes the interaction more fun.
Alternatively, image data of the user captured while the user records the voice, or image data collected from a preset associated path, is collected, and a virtual image is mapped from the collected image data to serve as the matching image data. In this way, a virtual image that carries the user's own emotion or expression, such as a cartoon figure, is generated while the user's privacy is protected, which makes the interaction more fun.
In another implementation of this embodiment, the volume information of the voice message may also be acquired, and the size at which the matching image data is output is adjusted according to the volume.
In this approach, a correspondence between volume and image size can be established in advance. By way of example and not limitation, the sound is divided into five levels by volume, from lowest to highest: low, medium-low, medium, medium-high, and high. The image sizes corresponding to these five levels increase in that order.
Once the volume level to which the user's volume in the voice information belongs has been identified, the image size for that level can be obtained from the correspondence between volume level and image size.
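The correspondence between volume level and image size can be sketched in Python as a threshold lookup. The five level names follow the text above, but the decibel thresholds and the pixel sizes are invented values for illustration only; the specification does not give any numbers.

```python
import bisect

LEVELS = ["low", "medium-low", "medium", "medium-high", "high"]
# Hypothetical boundaries (in dB) between adjacent volume levels.
THRESHOLDS_DB = [40, 50, 60, 70]
# Hypothetical output widths in pixels; sizes grow with the volume level.
SIZES_PX = {
    "low": 64,
    "medium-low": 96,
    "medium": 128,
    "medium-high": 192,
    "high": 256,
}

def volume_level(volume_db):
    """Return the volume level that a measured volume falls into."""
    return LEVELS[bisect.bisect_right(THRESHOLDS_DB, volume_db)]

def image_size_for(volume_db):
    """Return the image size that corresponds to the measured volume."""
    return SIZES_PX[volume_level(volume_db)]
```

Under these assumed values, a message measured at 65 dB falls into the medium-high level and its image is output 192 pixels wide.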
In another implementation of this embodiment, semantic analysis may also be performed on the voice message; when the semantic content obtained includes two or more items with matching image data, the multiple matching images are obtained and made into an animated image for output, or are combined into a composite image for output.
By way of example and not limitation, if "Yangcheng Lake" and "hairy crab" in the semantic content both have matching images, the matching images can be made into an animated image, "a hairy crab crawling across the surface of Yangcheng Lake", or into a composite image, "several hairy crabs in Yangcheng Lake".
Another implementation of this embodiment further includes the following steps:
analyzing the voice message;
extracting from the voice message the sound clip corresponding to the image data; and
playing the extracted sound clip in association with the image data, or playing the sound clip after a trigger operation by the user on the image data is detected.
That is, sound information is attached to the output image data. The sound can be played automatically when the receiving user receives the message, or played after the receiving user triggers the image data, for example by tapping the area where the image data is displayed.
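Extracting the sound clip for a keyword can be sketched in Python as slicing the recorded samples by time. The sketch assumes the recognizer supplies start and end times for each recognized word; the `word_times` mapping and both function names are hypothetical.

```python
def extract_clip(samples, sample_rate, start_s, end_s):
    """Return the samples between start_s and end_s (both in seconds)."""
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    return samples[start:end]

def clip_for_keyword(samples, sample_rate, word_times, keyword):
    """Return the clip in which the keyword was spoken, or None if absent.

    `word_times` maps each recognized word to its (start, end) time in
    seconds within the recording.
    """
    if keyword not in word_times:
        return None
    start_s, end_s = word_times[keyword]
    return extract_clip(samples, sample_rate, start_s, end_s)
```

The returned clip can then be stored with the image data and played either on receipt or when the receiving user taps the image.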
In this embodiment, the image data may be sent together with the voice message in the following ways:
The voice message and the image data are sent together as two separate messages. Alternatively, the image data is inserted at the keyword position or an adjacent position and then sent together with the voice message. Alternatively, a floating window is set corresponding to the voice message, and the image data is displayed in the floating window.
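Inserting the image at the keyword position, and the replacement variant from step S300, can be sketched in Python on the transcribed text of the voice message. The `[image:...]` placeholder syntax and the `insert_image` function are assumptions made for illustration; how a real client encodes an inline image is not specified here.

```python
def insert_image(transcript, keyword, image_ref, replace=False):
    """Place an image placeholder at the keyword position in the transcript.

    With replace=False the placeholder is inserted right after the keyword;
    with replace=True it takes the keyword's place. Only the first
    occurrence of the keyword is handled.
    """
    marker = f"[image:{image_ref}]"
    index = transcript.find(keyword)
    if index < 0:
        return transcript  # keyword not found: leave the text unchanged
    end = index + len(keyword)
    if replace:
        return transcript[:index] + marker + transcript[end:]
    return transcript[:end] + marker + transcript[end:]
```

For the transcript "I am so happy today" and the keyword "happy", insertion gives "I am so happy[image:smile.png] today", and replacement gives "I am so [image:smile.png] today".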
The image data may be a picture, video, animation, and/or other multimedia information.
In another implementation of this embodiment, the text content of the voice message may further be obtained, and the text content and the audio file of the voice message are integrated into a multimedia message for output and display.
Preferably, the text content is displayed in the message box of the multimedia message, an audio playback button is provided for the message box, and triggering the button plays the audio file.
Referring to FIG. 2, the present invention also provides an instant messaging client for instant messaging interaction. The instant messaging client 100 comprises the following structure:
a voice message triggering module 110, configured to detect the user's voice trigger operation;
an information extraction module 120, configured to extract keyword features from the voice input by the user;
an information processing module 130, configured to determine image data that matches the keyword, and either to send the image data together with the voice or to replace the keyword in the voice with the image data and send the result as an instant messaging message.
When the user enters the instant messaging tool and needs to send a voice message, the audio collection device is started to record the voice. Specifically, the user can trigger the voice recording button to start the audio collection device of the terminal; once the microphone is activated, the user's voice can be collected. The terminal, by way of example and not limitation, may be any commonly used mobile terminal such as a mobile phone, palmtop computer, or tablet computer, or a smart wearable electronic device such as smart glasses or a smart watch. In this embodiment, a mobile phone is used as the mobile terminal, and the mobile phone has an audio collection structure, an image collection structure, and a display structure.
Then, the voice message is recognized using speech recognition technology, and the keyword features in the voice message are extracted.
Extracting the keyword features from the voice message means obtaining the key content from what speech recognition has recognized. The keyword features, by way of example and not limitation, may be words that express emotion, mood, preference, intention, or plans.
As an example, the keyword features in the voice message may be extracted in any of the following ways:
Method 1: perform semantic analysis on the voice message and obtain the keyword features from the semantic analysis.
Method 2: perform audio analysis on the voice message to obtain intonation, speech-rate, and/or volume features, and obtain the keyword features in the voice message from those features.
Intonation, speech rate, and volume vary as a person speaks. For example, when reaching key information, a speaker usually raises the volume, stresses the intonation, and slows down. From these variations, the content the user emphasizes can be identified and used as the keyword features.
Method 3: perform audio analysis on the voice message to obtain the user's emotional state features, and use the emotional state features as the keyword features of the voice message.
Voice reflects a person's emotions to some extent. For example, hurried, loud speech often indicates that the speaker is angry, while cheerful, gentle speech often indicates that the speaker is happy. Accordingly, the important content the user wants to express can be inferred by analyzing the emotional information in the user's voice.
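Method 2 can be illustrated with a minimal sketch. It assumes that per-word volume, pitch, and duration measurements are already available from an audio analysis step that is not shown; the `WordProsody` type, the `emphasized_words` function, the 1.2 ratio, and the two-out-of-three voting rule are all hypothetical choices made for illustration and are not specified by the application.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WordProsody:
    word: str
    volume: float    # mean loudness of the word, arbitrary units (assumed input)
    pitch: float     # mean pitch of the word in Hz (assumed input)
    duration: float  # seconds per syllable; a larger value means slower speech


def emphasized_words(words, ratio=1.2):
    """Return the words that stand out from the utterance average.

    A word is treated as emphasized when at least two of its three
    measurements (volume, pitch, duration) are at least `ratio` times
    the average over the whole utterance.
    """
    if not words:
        return []
    avg_volume = mean(w.volume for w in words)
    avg_pitch = mean(w.pitch for w in words)
    avg_duration = mean(w.duration for w in words)
    result = []
    for w in words:
        votes = sum([
            w.volume >= ratio * avg_volume,
            w.pitch >= ratio * avg_pitch,
            w.duration >= ratio * avg_duration,
        ])
        if votes >= 2:
            result.append(w.word)
    return result
```

In this sketch, a word spoken louder, higher, and more slowly than the rest of the utterance is returned as a candidate keyword feature; a real system would likely need speaker-specific calibration.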
Preferably, the information processing module 130 may include a message synthesis unit, which recognizes the text content of the voice and integrates the text content with the audio file of the voice into a multimedia message.
Further, the text content is displayed in the message box of the multimedia message, and an audio file play button is provided for the message box; triggering the play button starts playback of the audio file.
Preferably, the information extraction module 120 may include an emotion recognition unit. The emotion recognition unit recognizes the emotional information in the voice message. Preferably, the emotion recognition unit includes a voice volume analysis sub-circuit, a voice pitch analysis sub-circuit, a speech rate analysis sub-circuit, and/or a voice rhythm analysis sub-circuit.
The implementation of this embodiment is described in detail below with reference to FIGS. 3 to 7.
Referring to FIG. 3, the user enters the instant messaging tool "快信" ("Quick Message") through the user terminal 200 that the user carries. In this embodiment, the user terminal 200 is preferably a mobile phone.
Referring to FIG. 4, the desktop 210 of the user terminal 200 presents a user interface to the user. The user interface displays all communication messages, and each communication message shows the contact 220, the latest exchanged message, and a virtual microphone 230 (the voice trigger control).
As an example, referring to FIG. 4, when the user chats with the contact leo, the user can trigger the virtual microphone 230 corresponding to leo to start the voice message capture function directly.
Referring to FIG. 5, a voice message input box is displayed in the user interface. The input box shows the voice the user is currently recording, the text content corresponding to the voice, and the related operation buttons.
The voice message input box may be displayed directly in the current user interface, or it may be displayed after a separate voice message interface is generated for the contact leo. Referring to FIG. 6, the voice message interface displays the contact information, the voice message input box, the virtual microphone, and the current recording quality information.
Referring to FIG. 7, while recording a voice message, the user can send or pause by operating the virtual microphone 230. As an example of a preferred manner, pressing the microphone and sliding up is the send operation, and pressing the microphone and sliding to the right is the pause operation.
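The gesture mapping described for the virtual microphone can be sketched as follows. This is only an illustration under stated assumptions: the function name `classify_microphone_gesture`, the 40-pixel minimum distance, and the screen coordinate convention (y grows downward) are hypothetical and are not taken from the application.

```python
import math


def classify_microphone_gesture(dx, dy, min_distance=40.0):
    """Map a press-and-slide on the virtual microphone to an action.

    dx, dy: displacement in pixels from the press point to the release
    point, in screen coordinates where y grows downward (an assumption).
    Returns "send" for a mostly upward slide, "pause" for a mostly
    rightward slide, and "none" for anything else.
    """
    if math.hypot(dx, dy) < min_distance:
        return "none"   # too short to count as a slide
    if -dy > abs(dx):
        return "send"   # predominantly upward
    if dx > abs(dy):
        return "pause"  # predominantly rightward
    return "none"
```

Requiring a minimum slide distance is one way to avoid treating an ordinary press as a send or pause command.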
In this embodiment, the image data may be sent together with the voice message in any of the following ways:
Referring to FIG. 8, the voice message and the image data are sent together as two independent messages.
Alternatively, referring to FIG. 9, the image data is inserted at the keyword position or an adjacent position and then sent together with the voice message. The inserted image data may be played directly, or played after the user triggers the keyword position.
Alternatively, a floating window is provided for the voice message, and the image data is displayed in the floating window.
Alternatively, referring to FIG. 10, the keyword in the voice is replaced with the image data, and the result is sent as an instant messaging message. In this case, the message sent to the receiving end includes the text content, the audio file, and the image data.
The image data may be pictures, videos, animations, and/or other multimedia information.
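The sending modes above can be sketched as a small composition routine. The sketch works on the recognized text only and leaves out the audio file and the floating-window mode, which is a display concern. `DeliveryMode`, `compose_messages`, and the segment dictionaries are hypothetical names introduced for illustration, and falling back to two separate messages when the keyword is not found in the text is an assumption, not something the application states.

```python
from enum import Enum


class DeliveryMode(Enum):
    SEPARATE = "separate"  # voice message and image as two messages
    INLINE = "inline"      # image inserted right after the keyword
    REPLACE = "replace"    # image takes the place of the keyword


def compose_messages(text, keyword, image_ref, mode):
    """Return a list of messages; each message is a list of segments."""
    image = {"type": "image", "ref": image_ref}
    index = text.find(keyword)
    if mode is DeliveryMode.SEPARATE or index < 0:
        # Assumed fallback: without a keyword position, send separately.
        return [[{"type": "text", "value": text}], [image]]
    before = text[:index]
    after = text[index + len(keyword):]
    head = before + keyword if mode is DeliveryMode.INLINE else before
    segments = [
        {"type": "text", "value": head},
        image,
        {"type": "text", "value": after},
    ]
    # Drop empty text segments, for example when the keyword ends the text.
    segments = [s for s in segments if s["type"] == "image" or s["value"]]
    return [segments]
```

For example, with the text "I want pizza tonight" and the keyword "pizza", the inline mode yields one message whose segments are the text up to and including "pizza", the image, and the remaining text.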
In this embodiment, referring to FIGS. 8 to 10, the text content of the voice message is also obtained, and the text content and the audio file of the voice message are integrated into a multimedia message for output and display.
The text content is displayed in the message box of the multimedia message, and an audio file play button may also be provided for the message box; triggering the play button starts playback of the audio file.
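A minimal data-structure sketch of such a multimedia message follows. The class name `MultimediaMessage`, its fields, and the textual `[play]` marker standing in for the play button are hypothetical; actual playback is platform-specific and is not shown.

```python
from dataclasses import dataclass


@dataclass
class MultimediaMessage:
    text: str          # recognized text content of the voice message
    audio_file: str    # path or identifier of the recorded audio (assumed)
    playing: bool = False

    def render(self):
        """Return the message box content: a play control followed by the text."""
        return f"[play] {self.text}"

    def on_play_pressed(self):
        """Mark the message as playing and return the audio file to play."""
        self.playing = True
        return self.audio_file
```

Keeping the text and the audio reference in one object lets the receiving side show the text immediately and fetch or play the audio only when the button is triggered.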
The instant messaging client may also be provided with other functional modules as required. For their specific functions, refer to the preceding embodiments; they are not repeated here.
Another embodiment of the present invention further provides a communication message processing device.
The message processing device includes the following structure:
an audio acquisition module, configured to acquire the voice message input by the user;
an information extraction module, configured to extract the keyword features in the voice message;
an information processing module, configured to determine the image data matching the keyword and either send it together with the voice message or replace the keyword in the voice message with the image data before sending.
The message processing device may also be provided with other functional modules as required. For details, refer to the preceding embodiments; they are not repeated here.
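The division of work between the extraction and processing modules can be sketched as below. The sketch assumes the voice message already carries a transcript produced by a speech recognizer that is not shown, uses a plain substring match against a fixed vocabulary in place of real semantic analysis, and uses a dictionary in place of an image search; every class and field name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VoiceMessage:
    audio: bytes
    transcript: str  # assumed output of a speech recognizer (not shown)


class InformationExtractionModule:
    """Extracts keyword features from a voice message."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary  # candidate keywords, lowercase

    def extract(self, message):
        lowered = message.transcript.lower()
        return [word for word in self.vocabulary if word in lowered]


class InformationProcessingModule:
    """Determines matching image data and bundles it with the voice message."""

    def __init__(self, image_index):
        self.image_index = image_index  # keyword -> image reference

    def process(self, message, keywords):
        images = [self.image_index[k] for k in keywords if k in self.image_index]
        return {"voice": message, "images": images}
```

A usage example under the same assumptions: an extractor built with the vocabulary `["happy", "pizza"]` returns `["happy"]` for the transcript "I am so HAPPY today", and a processor whose index maps "happy" to "smile.png" then attaches that image to the outgoing message.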
In the above description, although all components of the aspects of the present disclosure may be interpreted as being assembled or operatively connected as one circuit, the present disclosure is not intended to limit itself to these aspects. Rather, within the intended scope of protection of the present disclosure, the components may be selectively and operatively combined in any number. Each of these components may itself be implemented in hardware, and the components may also be combined in part or selectively as a whole and implemented as a computer program having program modules that perform the functions of the hardware equivalents. The code or code segments for constructing such a program can be readily derived by those skilled in the art. Such a computer program can be stored in a computer-readable medium and executed to implement the aspects of the present disclosure. The computer-readable medium may include magnetic recording media, optical recording media, and carrier wave media.
In addition, terms such as "include," "comprise," and "have" should by default be interpreted as inclusive or open-ended rather than exclusive or closed, unless explicitly defined to the contrary. All technical, scientific, or other terms have the meanings understood by those skilled in the art, unless defined to the contrary. Common terms found in dictionaries should not be interpreted too idealistically or too impractically in the context of the related technical documents, unless the present disclosure explicitly defines them that way.
Although example aspects of the present disclosure have been described for illustrative purposes, those skilled in the art should appreciate that the above description is merely a description of preferred embodiments of the present invention and does not limit the scope of the present invention in any way. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed. Any changes or modifications made by a person of ordinary skill in the art based on the above disclosure fall within the scope of protection of the claims.

Claims (13)

  1. A communication message processing method, characterized by comprising the following steps:
    acquiring a voice message collected by an audio capture device;
    extracting keyword features in the voice message;
    determining image data matching the keyword, and sending it together with the voice message, or replacing the keyword in the voice message with the image data before sending.
  2. The method according to claim 1, characterized in that: image data of the user captured while the user records the voice, or image data on a preset associated path, is collected, and the collected image data, after being recognized, is used as the matching image data; or, a composite image generated by adding elements to or removing elements from the collected image data is used as the matching image data; or, a virtual image mapped from the collected image data is used as the matching image data.
  3. The method according to claim 1, characterized in that: volume information of the voice message is acquired, and the size at which the matching image data is output is adjusted according to the volume.
  4. The method according to claim 1, characterized in that: semantic analysis is performed on the voice message, and when the semantic content obtained by the analysis includes two or more items of matching image data, the multiple matching items of image data are acquired and made into a dynamic image for output, or the multiple images are combined into a composite image for output.
  5. The method according to claim 1, characterized by further comprising the steps of:
    analyzing the voice message,
    extracting from the voice message the sound clip corresponding to the image data;
    playing the extracted sound clip in correspondence with the image data, or playing the sound clip after a trigger operation by the user on the image data is detected.
  6. The method according to claim 1, characterized in that the manner of sending together with the voice message is:
    sending the voice message and the image data together as two independent messages;
    or, inserting the image data at the keyword position or an adjacent position and then sending them together;
    or, providing a floating window for the voice message and displaying the image data in the floating window.
  7. The method according to claim 1, characterized in that: the image data is a picture, a video, an animation, and/or other multimedia image information, and the communication message is an instant messaging message.
  8. The method according to claim 1, characterized in that: the text content of the voice message is acquired, and the text content and the audio file of the voice message are integrated into a multimedia message for output and display.
  9. The method according to claim 8, characterized in that: the text content is displayed in the message box of the multimedia message, an audio file play button is provided for the message box, and triggering the play button starts playback of the audio file.
  10. The method according to claim 1, characterized in that the manner of extracting the keyword features in the voice message is:
    performing semantic analysis on the voice message and obtaining the keyword features from the semantic analysis;
    or, performing audio analysis on the voice message to obtain intonation, speech-rate, and/or volume features, and obtaining the keyword features in the voice message from the intonation, speech-rate, and/or volume features;
    or, performing audio analysis on the voice message to obtain the user's emotional state features, and using the emotional state features as the keyword features of the voice message.
  11. The method according to claim 1, characterized in that the manner of determining the image data matching the keyword is:
    searching for image data in local resource files based on the keyword, and obtaining image data matching the keyword;
    and/or, searching for image data in network resource files based on the keyword, and obtaining image data matching the keyword;
    and/or, searching the historical image data sent and received by the user based on the keyword, and obtaining image data matching the keyword.
  12. A communication message processing device, characterized by comprising:
    an audio acquisition module, configured to acquire the voice message input by the user;
    an information extraction module, configured to extract the keyword features in the voice message;
    an information processing module, configured to determine the image data matching the keyword, and send it together with the voice message, or replace the keyword in the voice message with the image data before sending.
  13. An instant messaging client for instant messaging interaction, characterized by comprising:
    a voice message trigger module, configured to detect the user's voice trigger operation;
    an information extraction module, configured to extract, from the voice input by the user, the keyword features in the voice;
    an information processing module, configured to determine the image data matching the keyword, and send it together with the voice, or replace the keyword in the voice with the image data and send the result as an instant messaging message.
PCT/CN2020/112407 2020-08-29 2020-08-31 Communication message processing method, device, and instant messaging client WO2022041177A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010891954.1A CN112235183B (en) 2020-08-29 2020-08-29 Communication message processing method and device and instant communication client
CN202010891954.1 2020-08-29

Publications (1)

Publication Number Publication Date
WO2022041177A1 true WO2022041177A1 (en) 2022-03-03

Family

ID=74116406

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112407 WO2022041177A1 (en) 2020-08-29 2020-08-31 Communication message processing method, device, and instant messaging client

Country Status (2)

Country Link
CN (1) CN112235183B (en)
WO (1) WO2022041177A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407265B (en) * 2021-05-07 2023-04-07 上海纽盾科技股份有限公司 AR-based data acquisition method, device and system in equal insurance evaluation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120185240A1 (en) * 2011-01-17 2012-07-19 Goller Michael D System and method for generating and sending a simplified message using speech recognition
US20130210419A1 (en) * 2012-02-10 2013-08-15 Private Group Networks, Inc. System and Method for Associating Media Files with Messages
CN106531149A (en) * 2016-12-07 2017-03-22 腾讯科技(深圳)有限公司 Information processing method and device
CN106570106A (en) * 2016-11-01 2017-04-19 北京百度网讯科技有限公司 Method and device for converting voice information into expression in input process
CN106888158A (en) * 2017-02-28 2017-06-23 努比亚技术有限公司 A kind of instant communicating method and device
CN107767038A (en) * 2017-10-01 2018-03-06 上海量科电子科技有限公司 voice-based payment evaluation method, client and system
CN111368609A (en) * 2018-12-26 2020-07-03 深圳Tcl新技术有限公司 Voice interaction method based on emotion engine technology, intelligent terminal and storage medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7693535B2 (en) * 2006-12-22 2010-04-06 Sony Ericsson Mobile Communications Ab Communication systems and methods for providing a group play list for multimedia content records
US20110276327A1 (en) * 2010-05-06 2011-11-10 Sony Ericsson Mobile Communications Ab Voice-to-expressive text
KR101226560B1 (en) * 2011-03-29 2013-01-25 (주)티아이스퀘어 System and method for providing multidedia content sharing service during communication service
CN102780651A (en) * 2012-07-21 2012-11-14 上海量明科技发展有限公司 Method for inserting emotion data in instant messaging messages, client and system
CN102780649A (en) * 2012-07-21 2012-11-14 上海量明科技发展有限公司 Method, client and system for filling instant image in instant communication message
CN102981712B (en) * 2012-11-25 2016-07-13 上海量明科技发展有限公司 The control method of interaction frame and client in instant messaging interactive interface
CN103001858B (en) * 2012-12-14 2015-09-09 上海量明科技发展有限公司 The method of message, client and system is replied in instant messaging
CN105824799B (en) * 2016-03-14 2019-05-17 厦门黑镜科技有限公司 A kind of information processing method, equipment and terminal device
CN106161215A (en) * 2016-08-31 2016-11-23 维沃移动通信有限公司 A kind of method for sending information and mobile terminal
CN110085220A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Intelligent interaction device
CN109697290B (en) * 2018-12-29 2023-07-25 咪咕数字传媒有限公司 Information processing method, equipment and computer storage medium
US10628133B1 (en) * 2019-05-09 2020-04-21 Rulai, Inc. Console and method for developing a virtual agent
CN110311858B (en) * 2019-07-23 2022-06-07 上海盛付通电子支付服务有限公司 Method and equipment for sending session message
CN110417641B (en) * 2019-07-23 2022-05-17 上海盛付通电子支付服务有限公司 Method and equipment for sending session message
CN110781329A (en) * 2019-10-25 2020-02-11 深圳追一科技有限公司 Image searching method and device, terminal equipment and storage medium
CN111106995B (en) * 2019-12-26 2022-06-24 腾讯科技(深圳)有限公司 Message display method, device, terminal and computer readable storage medium
CN111145777A (en) * 2019-12-31 2020-05-12 苏州思必驰信息科技有限公司 Virtual image display method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112235183A (en) 2021-01-15
CN112235183B (en) 2021-11-12

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20950846

Country of ref document: EP

Kind code of ref document: A1