WO2022041177A1 - Communication message processing method, device and instant messaging client - Google Patents

Communication message processing method, device and instant messaging client

Info

Publication number
WO2022041177A1
WO2022041177A1 (PCT/CN2020/112407; CN2020112407W)
Authority
WO
WIPO (PCT)
Prior art keywords
image data
keyword
voice message
voice
message
Prior art date
Application number
PCT/CN2020/112407
Other languages
English (en)
Chinese (zh)
Inventor
马宇尘
Original Assignee
深圳市永兴元科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市永兴元科技股份有限公司 filed Critical 深圳市永兴元科技股份有限公司
Publication of WO2022041177A1 publication Critical patent/WO2022041177A1/fr

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04Real-time or near real-time messaging, e.g. instant messaging [IM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10Multimedia information

Definitions

  • The present invention relates to the technical field of communication interaction.
  • Various instant messaging (IM) software not only supports the instant transmission of text messages, but also enables the transmission of voice and video messages between users.
  • When interacting with voice messages through an IM tool, the user can activate the terminal's microphone or other voice collection device to record a voice message and then transmit it to the target receiving user over the Internet. After entering a play instruction, the receiving user can play the voice message and may also reply by voice.
  • A text conversion function for voice messages has also been added, so that the converted text content and the recorded audio file can be sent to the receiving user as an instant communication message.
  • Some communication tools also have a speech synthesis function, Text To Speech (TTS), which converts text into speech.
  • Two main approaches exist: concatenative synthesis splices a large library of recorded voice fragments according to the text analysis results to obtain synthetic speech, while parametric synthesis uses the text analysis results to generate voice parameters (such as the fundamental frequency) through a model and then converts them into a waveform.
  • The existing voice message function only adds text conversion and does not consider further information present when the user's voice is recorded, such as expressions, emotional state, and tone of voice. This makes it difficult to meet users' needs; in particular, for young users who enjoy exchanging animated images and memes ("Doutu"), voice messages lack fun.
  • the purpose of the present invention is to overcome the deficiencies of the prior art and provide a communication message processing method, device and instant messaging client.
  • relevant image data can be loaded intelligently in the process of user voice interaction, the convenience, intelligence and interest of message interaction can be improved, and user experience can be improved.
  • A communication message processing method comprising the steps of: acquiring a voice message collected by an audio collection device; extracting a keyword feature from the voice message; and determining image data matching the keyword, then either sending it together with the voice message or replacing the keyword in the voice message with the image data before sending.
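The claimed flow (acquire voice, extract a keyword feature, match image data, send) can be sketched in a few lines of Python. All names here (`transcribe`, `image_library`, the outgoing message dictionary) are illustrative assumptions, not part of the patent; any speech-recognition backend and image store could fill these roles.

```python
def transcribe(voice_message):
    """Placeholder for a real speech-recognition backend (assumption)."""
    return "this bun is much smaller than before"

def extract_keyword(text, known_keywords):
    """Return the first word that is a known keyword feature, else None."""
    for word in text.split():
        if word in known_keywords:
            return word
    return None

def process_voice_message(voice_message, image_library):
    """Acquire -> extract keyword -> match image -> build outgoing message."""
    text = transcribe(voice_message)
    keyword = extract_keyword(text, set(image_library))
    if keyword is None:
        return {"voice": voice_message}              # no match: voice only
    return {"voice": voice_message,                  # send voice and matched
            "image": image_library[keyword]}         # image data together

outgoing = process_voice_message(b"<raw audio>", {"smaller": "shrinking.gif"})
```

The same skeleton covers the alternative claimed behavior (replacing the keyword with the image data) by substituting the image reference into the message body instead of attaching it.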
  • Volume information of the voice message is acquired, and the display size of the matching image data is adjusted according to the volume.
  • Semantic analysis is performed on the voice message; when the analyzed semantic content includes two or more items with matching image data, the multiple matched images are made into a dynamic image for output, or combined into a composite image for output.
  • the extracted sound clips are played corresponding to the image data, or the aforementioned sound clips are played after a triggering operation of the image data by the user is collected.
  • a floating window is set corresponding to the voice message, and the image data is displayed through the floating window.
  • image data is pictures, videos, animations and/or other multimedia information.
  • the text content of the voice message is acquired, and the text content and the audio file of the voice message are integrated into a multimedia message for output display.
  • the text content is displayed in a message box of the multimedia message
  • an audio file play button is set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the method of extracting the keyword features in the voice message is:
  • audio analysis is performed on the voice message to obtain the user's emotional state feature, and the emotional state feature is used as a keyword feature of the voice message.
  • the method of determining the image data matching the keyword is,
  • the historical image data sent and received by the user is searched, and image data matching the keyword is acquired.
  • the communication message is an instant communication message.
  • the present invention also provides a communication message processing device, including the following structure:
  • an audio acquisition module for acquiring the voice message input by the user
  • an information extraction module for extracting the keyword features in the voice message
  • An information processing module configured to determine the image data matching the keyword, and send it together with the aforementioned voice message, or replace the keyword in the voice message with image data and send it.
  • the present invention also provides an instant messaging client for performing instant messaging interaction, including the following structure:
  • the voice message trigger module is used to collect the user's voice trigger operation
  • an information extraction module for extracting keyword features in the voice according to the voice input by the user
  • An information processing module configured to determine the image data matching the keyword, and send it together with the aforementioned voice, or replace the keyword in the voice with image data and send it as an instant communication message.
  • By adopting the above technical solutions, the present invention has, as an example, the following advantages and positive effects: relevant image data can be loaded intelligently during the user's voice interaction, improving the convenience, intelligence, and fun of message interaction. It is especially suitable for users who enjoy meme-image ("Doutu") exchanges, improving the user experience.
  • FIG. 1 is a flowchart of a communication message processing method provided by an embodiment of the present invention.
  • FIG. 2 is a module structure diagram of an instant messaging client provided by an embodiment of the present invention.
  • FIG. 3 to FIG. 7 are diagrams illustrating operation examples of instant messaging interaction provided by an embodiment of the present invention.
  • FIG. 8 to FIG. 10 are exemplary diagrams when a voice message including image data is received according to an embodiment of the present invention.
  • Instant messaging client 100 voice message triggering module 110 , information extraction module 120 , information processing module 130 ; user terminal 200 , desktop 210 , instant messaging tool icon 211 , contact 220 , microphone 230 ;
  • a communication message processing method including the following steps:
  • S100 Acquire a voice message collected by an audio collection device.
  • When sent through an IM tool such as WeChat, the message at this time is an instant messaging message.
  • After the user enters WeChat, the voice recording button can be triggered to start the audio collection device of the terminal; once the pickup is activated, the user's voice information can be collected.
  • the terminal may be various commonly used mobile terminals such as mobile phones, palmtop computers, and tablet computers, and various smart wearable electronic devices, such as smart glasses, smart watches, and the like.
  • a mobile phone is used as the mobile terminal, and the mobile phone has an audio collection structure, an image collection structure and a display structure.
  • The aforementioned voice message is recognized based on voice recognition technology, and the keyword features in the voice message are extracted.
  • Speech recognition technology is mainly based on the analysis of three basic properties of speech: physical properties, physiological properties and social properties.
  • The physical properties of speech comprise four elements: pitch, duration, intensity, and timbre.
  • Pitch refers to how high or low the sound is, determined mainly by the vibration frequency of the sounding body;
  • duration refers to how long the sound lasts, determined mainly by how long the vibration continues;
  • intensity refers to how strong the sound is, determined mainly by the amplitude of the vibration;
  • timbre refers to the character of the sound, determined mainly by the waveform produced by the vibration of the sounding body.
  • The physiological properties of speech mainly refer to the influence of the vocal organs, including the lungs and trachea, the larynx and vocal cords, and the resonating cavities such as the oral cavity, nasal cavity, and pharynx.
  • The social attributes of speech are mainly reflected in three aspects: first, there is no necessary connection between a sound and its meaning, their correspondence being established by the members of a society; second, each language or dialect has its own phonetic system; third, speech sounds serve to distinguish meaning.
  • the basic process of speech recognition may include three steps: preprocessing of speech signals, feature extraction, and pattern matching.
  • Preprocessing usually includes speech signal sampling, anti-aliasing bandpass filtering, removal of individual pronunciation differences and noise effects caused by equipment and environment, etc., and involves the selection of speech recognition primitives and endpoint detection.
  • Feature extraction is used to extract acoustic parameters that reflect essential features in speech, such as average energy, average zero-crossing rate, formants, etc.
  • The extracted feature parameters must meet the following requirements: they effectively represent the speech features and discriminate well between sounds; the parameters of each order are largely independent; and they are easy to compute, preferably with efficient algorithms, so that speech recognition can run in real time.
  • a model is established for each entry and saved as a template library.
  • During recognition, the speech signal passes through the same channel to obtain its feature parameters, a test template is generated and matched against the reference templates, and the reference template with the highest matching score is taken as the recognition result, which improves recognition accuracy.
  • Pattern matching is the core of the entire speech recognition system: according to certain rules (such as a distance measure) and expert knowledge (such as word formation, grammar, and semantic rules), it computes the similarity between the input features and the stored patterns (such as a matching distance or likelihood probability) to determine the semantic information of the input speech.
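The template-matching step described above can be illustrated with a minimal sketch. Matching here is plain Euclidean distance over fixed-length feature vectors, an assumption made for brevity; real recognizers use richer acoustic features (e.g. MFCCs) and alignment methods such as DTW or HMMs.

```python
import math

def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(test_features, template_library):
    """Return the vocabulary entry whose reference template is closest
    to the input features (highest matching score = smallest distance)."""
    return min(template_library,
               key=lambda entry: euclidean(test_features,
                                           template_library[entry]))

# Toy template library: one reference feature vector per entry.
templates = {"yes": [0.9, 0.1, 0.2], "no": [0.1, 0.8, 0.7]}
result = recognize([0.85, 0.15, 0.25], templates)
```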
  • Extracting the keyword feature from the voice message refers to obtaining the key content from the speech recognition result.
  • The keyword features may be words expressing emotions, preferences, intentions, or plans, and the like.
  • the method for extracting the keyword features in the voice message may be as follows:
  • Manner 1: Perform semantic analysis on the voice message, and obtain keyword features based on the semantic analysis.
  • Manner 2: Perform audio analysis on the voice message to obtain intonation, speed and/or volume features, and obtain keyword features in the voice message based on those features.
  • Manner 3: Perform audio analysis on the voice message to obtain the user's emotional state feature, and use the emotional state feature as a keyword feature of the voice message.
  • Voices can reflect people's emotions to a certain extent. For example, generally speaking, irritable and loud speech often means that the speaker is more angry, while cheerful and soft speech often means that the speaker is more happy. Accordingly, the important content that the user needs to express can be obtained by analyzing the emotional information in the user's voice information.
  • the way of identifying the emotional information in the voice information is one or more of the following ways:
  • Way 1: analyze the user's volume changes in the voice information, and infer the emotional state feature from the volume changes.
  • Way 2: analyze the pitch changes in the voice information, and infer the emotional state feature from the pitch changes.
  • Way 3: analyze the speech rate in the voice information, and infer the emotional state feature from the speech rate.
  • Way 4: analyze the rhythm changes in the voice information, and infer the emotional state feature from the rhythm changes.
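The emotional-state inference above can be sketched as a toy classifier over coarse prosodic statistics. The thresholds and labels are invented for illustration; the description only requires that some mapping from audio features to an emotional state feature exists.

```python
def emotional_state(mean_volume_db, pitch_variation_hz, words_per_minute):
    """Toy classifier: map coarse prosodic statistics to an emotion label.

    Thresholds and labels are illustrative assumptions, not values from
    the patent.
    """
    if mean_volume_db > 75 and words_per_minute > 180:
        return "angry"        # loud and fast speech
    if pitch_variation_hz > 60 and mean_volume_db < 65:
        return "happy"        # lively pitch, soft volume
    return "neutral"

state = emotional_state(mean_volume_db=80, pitch_variation_hz=40,
                        words_per_minute=200)
```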
  • For example, if the user's collected voice message is "This product is much cheaper than the one I bought before, I'm really happy", the obtained keyword feature can be "really happy".
  • When the user does not express emotions explicitly but the voice message contains an emotional tendency, the implied emotion may be used as the keyword feature based on situational analysis.
  • For example, if the user's collected voice message is "This bun is much smaller than before", the emotional tendency contained in it is "dissatisfied, unhappy"; based on this emotional tendency, "dissatisfied, unhappy" is used as the keyword feature.
  • S300 Determine the image data matching the keyword, and either send it together with the voice message, or replace the keyword in the voice message with the image data before sending.
  • the manner of determining the image data matching the keyword may be as follows:
  • the historical image data sent and received by the user is searched, and image data matching the keyword is acquired.
  • the user's own image data when recording voice or image data on a preset associated path can be collected, and the collected image data can be identified and used as matching image data.
  • Alternatively, the user's own image data when recording the voice, or image data on a preset associated path, is collected and identified, and elements are then added to or removed from the collected images to generate a composite image as the matching image data.
  • In this way, a composite image containing both real and virtual elements can be formed, which improves the interest.
  • the user's own image data when recording voice or image data on a preset associated path is collected, and a virtual image is mapped as matching image data based on the aforementioned collected image data.
  • In this way, a virtual image containing the user's emotions or expressions, such as a cartoon figure, is generated while protecting the user's privacy, which improves the fun.
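The first matching manner, searching the user's sent and received image history, might look like the following sketch. The tag-based history structure is an assumption made for illustration; a real client could derive tags from filenames, captions, or prior recognition results.

```python
def match_image(keyword, history):
    """history: list of (image_path, tags) pairs, newest first.

    Returns the most recent historical image tagged with the keyword,
    or None when no match exists.
    """
    for image_path, tags in history:
        if keyword in tags:
            return image_path
    return None

history = [("crab.gif", {"hairy crab", "food"}),
           ("lake.png", {"Yangcheng Lake", "scenery"})]
found = match_image("Yangcheng Lake", history)
```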
  • The volume information of the voice message may also be acquired, and the display size of the matching image data adjusted according to the volume.
  • the correspondence between the volume and the image size can be established in advance.
  • For example, the sound is divided into five volume levels, from low to high: low, medium-low, medium, medium-high, and high.
  • The image sizes corresponding to the low, medium-low, medium, medium-high, and high levels increase in sequence.
  • the image size corresponding to the volume level can be obtained based on the correspondence between the volume level and the image size.
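The five-level correspondence can be sketched with bucketed thresholds. The dB boundaries and pixel sizes below are illustrative assumptions, since the description only requires that display size increase with volume level.

```python
import bisect

LEVEL_BOUNDS_DB = [45, 55, 65, 75]     # boundaries between the five levels
SIZES_PX = [64, 96, 128, 192, 256]     # low ... high display sizes

def image_size_for_volume(volume_db):
    """Map a measured volume to the display size of the matched image."""
    level = bisect.bisect_right(LEVEL_BOUNDS_DB, volume_db)  # index 0..4
    return SIZES_PX[level]

size = image_size_for_volume(70)       # falls in the medium-high bucket
```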
  • Semantic analysis may also be performed on the voice message; when the analyzed semantic content includes two or more items with matching image data, the multiple matched images are made into a dynamic image for output, or combined into a composite image for output.
  • For example, if both "Yangcheng Lake" and "hairy crab" in the semantic content have matching images, the multiple matching images can be made into a dynamic image ("hairy crabs crawling in Yangcheng Lake") or a composite image ("multiple hairy crabs in Yangcheng Lake").
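Combining several matches into a dynamic or single output can be sketched as below; the output dictionary shape is an invented convention, standing in for actual animated-GIF or composite-bitmap rendering.

```python
def build_image_output(semantic_terms, image_library):
    """Collect all matched images; two or more become a dynamic image."""
    frames = [image_library[t] for t in semantic_terms if t in image_library]
    if len(frames) >= 2:
        return {"type": "dynamic", "frames": frames}   # e.g. animation frames
    if frames:
        return {"type": "single", "image": frames[0]}
    return None                                        # nothing matched

out = build_image_output(["Yangcheng Lake", "hairy crab"],
                         {"Yangcheng Lake": "lake.png",
                          "hairy crab": "crab.gif"})
```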
  • the extracted sound clips are played corresponding to the image data, or the aforementioned sound clips are played after a triggering operation of the image data by the user is collected.
  • Sound information is attached to the output image data; it can be played automatically when the receiving user receives the message, or played when the receiving user triggers the image data, for example by clicking the area where the image data is displayed.
  • the manner of sending together with the aforementioned voice message may be as follows:
  • the voice message and the image data are sent together as two separate messages.
  • the image data is inserted into the keyword position or adjacent positions and then sent together.
  • a floating window is set corresponding to the voice message, and the image data is displayed through the floating window.
  • the image data may be pictures, videos, animations and/or other multimedia information.
  • the text content of the voice message may be obtained, and the text content and the audio file of the voice message may be integrated into a multimedia message for output display.
  • the text content is displayed in a message box of the multimedia message
  • an audio file play button is set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the instant messaging client 100 includes the following structure:
  • the voice message triggering module 110 is used for collecting user's voice triggering operation.
  • the information extraction module 120 is used for extracting the keyword features in the speech according to the speech input by the user.
  • the information processing module 130 is configured to determine the image data matching the keyword, and send it together with the aforementioned voice, or replace the keyword in the voice with image data and send it as an instant communication message.
  • When the user enters the instant communication tool and needs to send a voice message, the audio collection device is activated to record the voice. Specifically, the voice recording button can be triggered to start the audio collection device of the terminal, and the user's voice information can be collected once the pickup is activated.
  • the terminal may be various commonly used mobile terminals such as mobile phones, palmtop computers, and tablet computers, and various smart wearable electronic devices, such as smart glasses, smart watches, and the like.
  • a mobile phone is used as the mobile terminal, and the mobile phone has an audio collection structure, an image collection structure and a display structure.
  • The aforementioned voice message is recognized based on voice recognition technology, and the keyword features in the voice message are extracted.
  • Extracting the keyword feature from the voice message refers to obtaining the key content from the speech recognition result.
  • The keyword features may be words expressing emotions, preferences, intentions, or plans, and the like.
  • the method of extracting the keyword features in the voice message may be as follows:
  • Manner 1: Perform semantic analysis on the voice message, and obtain keyword features based on the semantic analysis.
  • Manner 2: Perform audio analysis on the voice message to obtain intonation, speed and/or volume features, and obtain keyword features in the voice message based on those features.
  • Manner 3: Perform audio analysis on the voice message to obtain the user's emotional state feature, and use the emotional state feature as a keyword feature of the voice message.
  • Voices can reflect people's emotions to a certain extent. For example, generally speaking, irritable and loud speech often means that the speaker is more angry, while cheerful and soft speech often means that the speaker is more happy. Accordingly, the important content that the user needs to express can be obtained by analyzing the emotional information in the user's voice information.
  • the information processing module 130 may include a message synthesis unit, which is used for recognizing the text content of the voice, and integrating the text content and the audio file of the voice into a multimedia message.
  • the text content is displayed in a message box of the multimedia message
  • an audio file play button is set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the information extraction module 120 may include an emotion recognition unit.
  • the emotion recognition unit is used for recognizing emotion information in the voice message.
  • the emotion recognition unit includes a voice volume analysis sub-circuit, a voice pitch analysis sub-circuit, a voice speech rate analysis sub-circuit and/or a voice rhythm analysis sub-circuit.
  • the user enters the instant messaging tool “Quick Message” through the user terminal 200 carried by the user.
  • the user terminal 200 is preferably a mobile phone in this embodiment.
  • the desktop 210 of the user terminal 200 outputs a user interface to the user, on which all communication messages are displayed, and the communication messages display the contacts 220, the latest interactive messages, and a virtual microphone 230 (voice trigger control).
  • the virtual microphone 230 corresponding to leo can be triggered, and then the voice message collection function can be directly started.
  • a voice message input box is displayed in the user interface, and the input box displays the user's voice being entered, the text content corresponding to the voice, and related operation keys.
  • The voice message input box can be displayed directly on the current user interface, or displayed after a separate voice message interface is generated for the contact leo; as shown in FIG. 6, the voice message interface displays the contact information, the voice message input box, the virtual microphone, and the current recording quality information.
  • When recording a voice, the user can send or pause by operating the virtual microphone 230: pressing and sliding the microphone upward is the send operation, while pressing and sliding it to the right is the pause operation.
  • the manner in which the image data is sent together with the aforementioned voice message may be as follows:
  • the voice message and the image data are sent together as two separate messages.
  • the image data is inserted into the keyword position or adjacent positions and then sent together.
  • the inserted image data can be played directly or played after the user triggers the keyword position.
  • a floating window is set corresponding to the voice message, and the image data is displayed through the floating window.
  • the keywords in the voice are replaced with image data and then sent as an instant communication message.
  • the message sent to the receiving end includes text content, audio files and image data.
  • the image data may be pictures, videos, animations and/or other multimedia information.
  • the text content of the voice message is also obtained, and the text content and the audio file of the voice message are integrated into a multimedia message for output display.
  • the text content is displayed in the message box of the multimedia message, and an audio file play button may also be set corresponding to the message box, and triggering the play button can trigger the audio file to play.
  • the instant messaging client may also be set with other functional modules as required, and the specific functions can be found in the previous embodiments, which will not be repeated here.
  • Another embodiment of the present invention also provides a communication message processing device.
  • The message processing device includes the following structure:
  • an audio acquisition module for acquiring the voice message input by the user
  • an information extraction module for extracting the keyword features in the voice message
  • An information processing module configured to determine the image data matching the keyword, and send it together with the aforementioned voice message, or replace the keyword in the voice message with image data and send it.
  • the message processing device may also be provided with other functional modules as required. For details, refer to the foregoing embodiments, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention relates to a communication message processing method, a device, and an instant messaging client, in the technical field of communication interaction. The communication message processing method comprises the steps of: acquiring a voice message collected by an audio acquisition device; extracting a keyword feature from the voice message; and determining the image data matching the keyword, then sending it together with the voice message, or replacing the keyword in the voice message with the image data before sending. By means of the present invention, relevant image data can be loaded intelligently during users' voice interaction, which improves the convenience, intelligence, and interest of message exchange, and also improves the user experience.
PCT/CN2020/112407 2020-08-29 2020-08-31 Communication message processing method, device and instant messaging client WO2022041177A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010891954.1 2020-08-29
CN202010891954.1A CN112235183B (zh) 2020-08-29 2020-08-29 通信消息处理方法、设备及即时通信客户端

Publications (1)

Publication Number Publication Date
WO2022041177A1 2022-03-03

Family

ID=74116406

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112407 WO2022041177A1 (fr) 2020-08-31 Communication message processing method, device and instant messaging client

Country Status (2)

Country Link
CN (1) CN112235183B (fr)
WO (1) WO2022041177A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407265B (zh) * 2021-05-07 2023-04-07 上海纽盾科技股份有限公司 AR-based data collection method, apparatus and system in classified protection evaluation

Citations (7)

Publication number Priority date Publication date Assignee Title
US20120185240A1 (en) * 2011-01-17 2012-07-19 Goller Michael D System and method for generating and sending a simplified message using speech recognition
US20130210419A1 (en) * 2012-02-10 2013-08-15 Private Group Networks, Inc. System and Method for Associating Media Files with Messages
CN106531149A (zh) * 2016-12-07 2017-03-22 腾讯科技(深圳)有限公司 Information processing method and device
CN106570106A (zh) * 2016-11-01 2017-04-19 北京百度网讯科技有限公司 Method and device for converting voice information into emoticons during input
CN106888158A (zh) * 2017-02-28 2017-06-23 努比亚技术有限公司 Instant messaging method and device
CN107767038A (zh) * 2017-10-01 2018-03-06 上海量科电子科技有限公司 Voice-based payment evaluation method, client and system
CN111368609A (zh) * 2018-12-26 2020-07-03 深圳Tcl新技术有限公司 Voice interaction method based on emotion engine technology, intelligent terminal and storage medium

Family Cites Families (17)

Publication number Priority date Publication date Assignee Title
US7693535B2 (en) * 2006-12-22 2010-04-06 Sony Ericsson Mobile Communications Ab Communication systems and methods for providing a group play list for multimedia content records
US20110276327A1 (en) * 2010-05-06 2011-11-10 Sony Ericsson Mobile Communications Ab Voice-to-expressive text
KR101226560B1 (ko) * 2011-03-29 2013-01-25 (주)티아이스퀘어 Method and system for providing a multimedia content sharing service during a communication service
CN102780651A (zh) * 2012-07-21 2012-11-14 上海量明科技发展有限公司 Method, client and system for adding emotion data to instant messaging messages
CN102780649A (zh) * 2012-07-21 2012-11-14 上海量明科技发展有限公司 Method, client and system for adding instant images to instant messaging messages
CN102981712B (zh) * 2012-11-25 2016-07-13 上海量明科技发展有限公司 Method and client for adjusting an interaction box in an instant messaging interaction interface
CN103001858B (zh) * 2012-12-14 2015-09-09 上海量明科技发展有限公司 Method, client and system for replying to messages in instant messaging
CN105824799B (zh) * 2016-03-14 2019-05-17 厦门黑镜科技有限公司 Information processing method, device and terminal device
CN106161215A (zh) * 2016-08-31 2016-11-23 维沃移动通信有限公司 Information sending method and mobile terminal
CN110085220A (zh) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Intelligent interaction apparatus
CN109697290B (zh) * 2018-12-29 2023-07-25 咪咕数字传媒有限公司 Information processing method, device and computer storage medium
US10628133B1 (en) * 2019-05-09 2020-04-21 Rulai, Inc. Console and method for developing a virtual agent
CN110311858B (zh) * 2019-07-23 2022-06-07 上海盛付通电子支付服务有限公司 Method and device for sending session messages
CN110417641B (zh) * 2019-07-23 2022-05-17 上海盛付通电子支付服务有限公司 Method and device for sending session messages
CN110781329A (zh) * 2019-10-25 2020-02-11 深圳追一科技有限公司 Image search method and apparatus, terminal device and storage medium
CN111106995B (zh) * 2019-12-26 2022-06-24 腾讯科技(深圳)有限公司 Message display method, device, terminal and computer-readable storage medium
CN111145777A (zh) * 2019-12-31 2020-05-12 苏州思必驰信息科技有限公司 Virtual image display method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN112235183B (zh) 2021-11-12
CN112235183A (zh) 2021-01-15

Similar Documents

Publication Publication Date Title
CN110148427B (zh) Audio processing method, apparatus and system, storage medium, terminal and server
US10977299B2 (en) Systems and methods for consolidating recorded content
US11475897B2 (en) Method and apparatus for response using voice matching user category
CN111489424A (zh) Virtual character expression generation method, control method, apparatus and terminal device
CN110517689A (zh) Voice data processing method and apparatus, and storage medium
CN108242238B (zh) Audio file generation method and apparatus, and terminal device
WO2005069171A1 (fr) Document correlating device and method
CN112102850B (zh) Emotion recognition processing method and apparatus, medium and electronic device
CN111145777A (zh) Virtual image display method and apparatus, electronic device and storage medium
TW201214413A (en) Modification of speech quality in conversations over voice channels
CN110097890A (zh) Voice processing method and apparatus, and apparatus for voice processing
WO2022242706A1 (fr) Multimodal-based reactive response generation
CN109102800A (zh) Method and apparatus for determining lyric display data
WO2022041192A1 (fr) Voice message processing method and device, and instant messaging client
CN114125506B (zh) Voice auditing method and apparatus
CN110910898B (zh) Voice information processing method and apparatus
WO2022041177A1 (fr) Communication message processing method, device and instant messaging client
CN110781329A (zh) Image search method and apparatus, terminal device and storage medium
Gao Audio deepfake detection based on differences in human and machine generated speech
Khota et al. Modelling Emotional Valence and Arousal of Non-Linguistic Utterances for Sound Design Support
Waghmare et al. A Comparative Study of the Various Emotional Speech Databases
WO2023236054A1 (fr) Audio generation method and apparatus, and storage medium
CN113066513B (zh) Voice data processing method and apparatus, electronic device and storage medium
WO2023051155A1 (fr) Voice processing and training methods, and electronic device
CN110795581B (zh) Image search method and apparatus, terminal device and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20950846

Country of ref document: EP

Kind code of ref document: A1