WO2021115351A1 - Expression making method and apparatus - Google Patents

Expression making method and apparatus

Info

Publication number
WO2021115351A1
Authority
WO
WIPO (PCT)
Prior art keywords
expression
user
voice
image
target
Prior art date
Application number
PCT/CN2020/135041
Other languages
English (en)
French (fr)
Inventor
王萌
王卓
范泛
王乐临
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021115351A1
Priority to US17/836,212 (US11941323B2)

Classifications

    • G06T11/60 Editing figures and text; Combining figures or text
    • G06T11/203 Drawing of straight lines or curves
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F3/0482 Interaction with lists of selectable items, e.g. menus
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F16/68 Retrieval of audio data characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06T2200/24 Indexing scheme for image data processing or generation involving graphical user interfaces [GUIs]
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L15/08 Speech classification or search
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction; coding or decoding of speech or audio signals
    • G10L21/0208 Noise filtering
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques specially adapted for estimating an emotional state
    • G10L2015/088 Word spotting
    • G10L2015/223 Execution procedure of a spoken command
    • G10L2015/225 Feedback of the input speech

Definitions

  • This application relates to the field of terminals, and in particular to a method and device for making expressions.
  • the embodiments of the present application provide a method and device for making expressions, which can enrich the form and content of expressions, thereby improving user experience.
  • an embodiment of the present application provides an expression making method applied to an electronic device, including: displaying a first interface, the first interface including a voice input button; in response to the user's operation of triggering the voice input button, receiving the voice input by the user; recognizing the voice in a preset manner, the preset-manner recognition including at least content recognition; if the voice includes a target keyword, recommending a first image expression set to the user, where the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword; and, in response to the user's operation of selecting an image expression from the first image expression set, obtaining the target expression according to the voice, or the semantics corresponding to the voice, and the image expression selected by the user.
  • after the mobile phone receives the voice input by the user, it can perform content recognition on the voice. If the voice includes a target keyword, the mobile phone recommends the first image expression set to the user, where the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword.
  • This method of automatically recommending image expressions based on the voice input by the user can simplify user operations without requiring the user to choose from a large number of image expressions, making the user's operation more convenient. Then, in response to the user's operation of selecting an image expression from the first image expression set, the target expression can be obtained according to the voice and the image expression selected by the user.
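  • The recommendation step can be illustrated with a minimal Python sketch. The `ImageExpression` structure, the tag values, and the exact matching rule below are illustrative assumptions; the method only requires that the first expression tag of each recommended image expression has a matching relationship with a target keyword recognized in the voice (and, optionally, that the second tag matches a recognized emotional color).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageExpression:
    name: str
    first_tag: str        # keyword-related tag, e.g. "birthday"
    second_tag: str = ""  # optional emotional-color tag, e.g. "happy"

# Hypothetical local image expression library.
LIBRARY = [
    ImageExpression("cake_dance.gif", first_tag="birthday", second_tag="happy"),
    ImageExpression("party_hat.png", first_tag="birthday", second_tag="happy"),
    ImageExpression("crying_cat.gif", first_tag="miss you", second_tag="sad"),
]

def recommend(transcript: str, target_keywords: set, emotion: Optional[str] = None):
    """Return the first image expression set for a recognized voice transcript.

    An expression is recommended when its first tag matches a target keyword
    found in the transcript and, if an emotional color was recognized, its
    second tag matches that emotional color as well.
    """
    hits = {kw for kw in target_keywords if kw in transcript}
    return [
        expr for expr in LIBRARY
        if expr.first_tag in hits and (emotion is None or expr.second_tag == emotion)
    ]

# The user says "happy birthday to you": both birthday expressions are recommended.
print(recommend("happy birthday to you", {"birthday", "miss you"}, emotion="happy"))
```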
  • the target expression contains both voice information and image information, which makes the transmission of information more natural and the expression of emotions more real.
  • the mobile phone can embed the semantics corresponding to the voice into the image expression selected by the user to obtain the target expression.
  • the target expression includes both the image information and the corresponding semantics (text information) of the voice, which can more accurately convey and express the user's intention, and can improve the user experience.
  • the preset-manner recognition may further include emotion recognition; if the voice carries a target emotional color, the second expression tag of each image expression in the first image expression set has a matching relationship with the target emotional color. In this way, image expressions can be recommended to the user more accurately, and user experience can be improved.
  • before content recognition is performed on the voice, the method further includes: displaying first prompt information, the first prompt information being used to ask the user whether image expressions should be recommended based on the voice; and receiving the operation triggered by the user to recommend image expressions based on the voice.
  • before the target expression is obtained according to the voice, or the semantics corresponding to the voice, and the image expression selected by the user, the method further includes: in response to the user's operation of selecting an image expression from the first image expression set, displaying second prompt information, the second prompt information being used to ask the user whether to make a voice expression or a text expression. If the user chooses to make a voice expression, the target expression can be obtained according to the voice and the image expression selected by the user; if the user chooses to make a text expression, the target expression can be obtained according to the semantics of the voice and the image expression selected by the user.
  • displaying the first interface includes: displaying a dialogue interface with the target contact; or, displaying a content sharing interface; or, displaying a comment interface.
  • the method further includes: sending the target emoticon to the electronic device corresponding to the target contact; or uploading the target emoticon to the server corresponding to the application program that provides the content sharing interface or the comment interface.
  • obtaining the target expression according to the voice and the image expression selected by the user includes: encoding and compressing the voice, and adding a preset identifier at a preset position of the image expression selected by the user, the preset identifier being used to indicate that the target expression is a voice expression; and loading the encoded and compressed voice and the image expression with the added preset identifier into a video format to obtain the target expression.
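  • As one way to picture this packaging step (a sketch under assumptions, not the implementation required by the application), a still image expression and the compressed voice track can be multiplexed into a short MP4 clip with the ffmpeg command-line tool; the file names and codec choices are illustrative.

```python
import subprocess

def pack_voice_expression(image_path: str, voice_path: str, out_path: str) -> None:
    """Loop the still image expression (already carrying the preset 'voice' badge)
    over the encoded voice track and mux both into a video container."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-loop", "1", "-i", image_path,  # repeat the still image for the clip duration
            "-i", voice_path,                # the encoded and compressed voice
            "-c:v", "libx264", "-tune", "stillimage",
            "-c:a", "aac",
            "-shortest",                     # end the clip when the voice track ends
            out_path,
        ],
        check=True,
    )

# pack_voice_expression("smile_with_mic_badge.png", "voice.m4a", "target_expression.mp4")
```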
  • obtaining the target expression according to the semantics corresponding to the voice and the image expression selected by the user includes: converting all of the text, or the target keywords, corresponding to the voice into pixel information; and loading the pixel information into a preset area or blank area of the image expression selected by the user.
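  • A minimal sketch of this text-to-pixel step using Pillow is shown below; the font, the position of the blank area, and the colors are assumptions.

```python
from PIL import Image, ImageDraw, ImageFont

def add_text_to_expression(image_path: str, text: str, out_path: str) -> None:
    """Draw the text corresponding to the voice (or just the target keyword)
    into the lower blank strip of the selected image expression."""
    img = Image.open(image_path).convert("RGBA")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()                # a real product would load a sized TTF font
    x, y = img.width // 10, int(img.height * 0.8)  # assume the bottom strip is blank
    draw.text((x, y), text, font=font, fill=(0, 0, 0, 255))
    img.save(out_path)

# add_text_to_expression("smile.png", "Happy birthday!", "target_expression.png")
```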
  • the method further includes: displaying a preview interface, the preview interface includes the target expression, the preset position of the target expression includes a preset identifier or the semantics corresponding to the voice, and the preset identifier is used to indicate that the target expression is a voice expression.
  • the user can preview the target emoticon package on the preview interface, and can make further settings for the target emoticon package.
  • the method further includes: receiving the user's operation for triggering the target expression, and, in response to the user's operation for triggering the target expression, playing the voice carried by the target expression.
  • the method further includes: performing preset sound effect processing on the voice, the preset sound effect processing including at least one of male-voice processing, female-voice processing, cartoonization processing, dialect processing, funny-effect processing, or celebrity-voice (star) processing. In this way, the individual needs of users can be met and user experience can be improved.
  • performing the preset sound effect processing on the voice includes: performing the preset sound effect processing on the voice according to a third expression tag of the image expression selected by the user, where the third expression tag is used to indicate the type of the image expression selected by the user. If the third expression tag of the image expression selected by the user is a preset character type, the voice is processed according to the sound characteristics of that character type; if the third expression tag of the image expression selected by the user is a preset animal type, the voice is given funny or cartoonized processing. In this way, the type of image expression and the sound effect of the voice can be better matched, and the user experience can be improved.
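  • One way to sketch this tag-driven sound-effect step (a simple stand-in, not the processing specified by the application) is a pitch shift whose direction and size depend on the third expression tag, using librosa; the tag names and semitone values are assumptions.

```python
import librosa
import soundfile as sf

# Hypothetical mapping from the third expression tag to a pitch shift in semitones.
TAG_TO_SEMITONES = {
    "male_character": -4,    # deeper, male-style voice
    "female_character": 3,   # brighter, female-style voice
    "animal": 8,             # exaggerated, cartoon-like voice
}

def apply_sound_effect(voice_path: str, third_tag: str, out_path: str) -> None:
    """Pitch-shift the recorded voice according to the type tag of the selected expression."""
    y, sr = librosa.load(voice_path, sr=None)
    steps = TAG_TO_SEMITONES.get(third_tag, 0)
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)
    sf.write(out_path, shifted, sr)

# apply_sound_effect("voice.wav", "animal", "voice_cartoon.wav")
```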
  • the method further includes: receiving the user's operation of selecting a picture, and, in response to the user's operation of selecting the picture, loading the target area in the picture into a preset position of the image expression selected by the user or of the target expression.
  • the method further includes: in response to the user's operation for triggering the custom expression mode, displaying a drawing board interface; receiving the graffiti operation input by the user on the drawing board interface; generating a stick figure according to the movement trajectory of the graffiti operation; and recommending to the user the image expressions whose outline similarity to the stick figure is greater than a preset threshold. In this way, the method can better adapt to user needs and improve user experience.
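  • The outline-similarity recommendation can be sketched with OpenCV's Hu-moment shape matching; rasterizing the graffiti trajectory into a binary mask, the similarity conversion, and the threshold value are assumptions.

```python
import cv2
import numpy as np

def largest_contour(binary_mask: np.ndarray) -> np.ndarray:
    """Outline of the stick figure (or of an image expression) as its largest contour."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

def outline_similarity(sketch_mask: np.ndarray, expression_mask: np.ndarray) -> float:
    """cv2.matchShapes returns a distance (0 = identical shapes); convert it
    to a similarity score in (0, 1]."""
    distance = cv2.matchShapes(largest_contour(sketch_mask),
                               largest_contour(expression_mask),
                               cv2.CONTOURS_MATCH_I1, 0.0)
    return 1.0 / (1.0 + distance)

def recommend_by_sketch(sketch_mask, expression_masks: dict, threshold: float = 0.8):
    """Return the expressions whose outline similarity to the stick figure exceeds the threshold."""
    return [name for name, mask in expression_masks.items()
            if outline_similarity(sketch_mask, mask) > threshold]
```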
  • the method further includes: receiving an operation of the user selecting an image expression from the locally stored image expressions.
  • an embodiment of the present application provides an expression making method applied to an electronic device, including: displaying a second interface, the second interface including an image expression selection button; in response to the user's operation for triggering the image expression selection button, displaying at least one image expression; receiving the user's operation of selecting an image expression from the at least one image expression; displaying prompt information, which is used to ask the user whether to make a voice expression; and, in response to the user's operation of confirming the making of a voice expression, generating a voice according to the text on the image expression selected by the user or the text input by the user, and obtaining the voice expression according to the voice and the image expression selected by the user.
  • after the mobile phone receives the user's operation of selecting an image expression, it can generate a voice based on the text on the image expression or the text input by the user, and obtain the voice expression based on the voice and the image expression, without requiring the user to input a voice. This simplifies the user's operation steps, allows voice expressions to be generated conveniently and intelligently, enriches the form and content of expressions, and can improve user experience.
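  • For this second method, a minimal sketch of the text-to-voice step with an offline TTS engine is shown below; pyttsx3 and the file names are assumptions, and recognizing text printed on the expression image (e.g. by OCR) is omitted. The resulting voice file can then be combined with the selected image expression in the same way as in the first method.

```python
import pyttsx3

def text_to_voice(text: str, out_path: str) -> None:
    """Synthesize the text on the selected image expression (or text typed by the user)
    into a voice file to be attached to that expression."""
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)
    engine.runAndWait()

# text_to_voice("Have a nice day!", "voice.wav")
```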
  • an embodiment of the present application provides an electronic device, including: a display unit, configured to display a first interface, the first interface including a voice input button; a receiving unit, configured to receive, in response to the user's operation of triggering the voice input button, the voice input by the user; a recognition unit, configured to recognize the voice in a preset manner, the preset-manner recognition including at least content recognition; a recommendation unit, configured to recommend a first image expression set to the user if the voice includes a target keyword, where the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword; and a processing unit, configured to obtain, in response to the user's operation of selecting an image expression from the first image expression set, the target expression according to the voice, or the semantics corresponding to the voice, and the image expression selected by the user.
  • the preset-manner recognition further includes emotion recognition; if the voice carries the target emotional color, the second expression tag of each image expression in the first image expression set has a matching relationship with the target emotional color.
  • the display unit is also used to display first prompt information, the first prompt information is used to prompt the user whether it is necessary to recommend image expressions based on voice; and to receive an operation triggered by the user to recommend image expressions based on voice.
  • the display unit is further configured to, in response to the user's operation of selecting an image expression from the first image expression set, display second prompt information, which is used to prompt the user whether to make a voice expression Or text emoticons.
  • the display unit is used to display a dialogue interface with the target contact; or, display a content sharing interface; or, display a comment interface.
  • it further includes a sending unit for sending the target emoticon to the electronic device corresponding to the target contact; or uploading the target emoticon to the server corresponding to the application that provides the content sharing interface or the comment interface.
  • the processing unit is used to encode and compress the voice, and to add a preset identifier at a preset position of the image expression selected by the user, the preset identifier being used to indicate that the target expression is a voice expression; and to load the encoded and compressed voice and the image expression with the added preset identifier into a video format to obtain the target expression.
  • the processing unit is configured to convert all text or target keywords corresponding to the voice into pixel information; and load the pixel information into the preset area or blank area of the image expression selected by the user.
  • the display unit is also used to display a preview interface; the preview interface includes the target expression, a preset position of the target expression includes a preset identifier or the semantics corresponding to the voice, and the preset identifier is used to indicate that the target expression is a voice expression.
  • when the preset position of the target expression includes the preset identifier, the device further includes a playing unit, configured to receive, through the receiving unit, the user's operation for triggering the target expression, and, in response to that operation, play the voice carried by the target expression.
  • the processing unit is also used to perform preset sound effect processing on the speech.
  • the preset sound effect processing includes at least one of male-voice processing, female-voice processing, cartoonization processing, dialect processing, funny-effect processing, or celebrity-voice (star) processing.
  • the processing unit is configured to perform preset sound effect processing on the voice according to a third expression tag of the image expression selected by the user, the third expression tag being used to indicate the type of the image expression selected by the user; if the third expression tag of the selected image expression is a preset character type, the voice is processed according to the sound characteristics of that character type; if the third expression tag of the selected image expression is a preset animal type, the voice is given funny or cartoonized processing.
  • the processing unit is further configured to receive, through the receiving unit, the user's operation of selecting a picture, and, in response to that operation, load the target area in the picture into a preset position of the image expression selected by the user or of the target expression.
  • the display unit is further used to display the drawing board interface in response to the user's operation for triggering the custom expression mode; the receiving unit is also used to receive the graffiti operation input by the user on the drawing board interface; the processing unit is also used to generate a stick figure according to the movement trajectory of the graffiti operation; and the recommendation unit is also used to recommend to the user the image expressions whose outline similarity to the stick figure is greater than a preset threshold.
  • the receiving unit is further configured to receive an operation of the user selecting an image expression from the locally stored image expressions.
  • an embodiment of the present application provides an electronic device, including: a display unit, configured to display a second interface, the second interface including an image expression selection button, and further configured to display at least one image expression in response to the user's operation of triggering the image expression selection button; a receiving unit, configured to receive the user's operation of selecting an image expression from the at least one image expression; the display unit being further configured to display prompt information, which is used to ask the user whether to make a voice expression; and a processing unit, configured to, in response to the user's operation of confirming the making of a voice expression, generate a voice according to the text on the image expression selected by the user or the text input by the user, and obtain the voice expression according to the voice and the image expression selected by the user.
  • an embodiment of the present application provides a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute any of the methods provided in any of the foregoing aspects.
  • the embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute any one of the methods provided in any of the foregoing aspects.
  • an embodiment of the present application provides a chip system.
  • the chip system includes a processor and may also include a memory, configured to implement any one of the methods provided in any of the foregoing aspects.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • an embodiment of the present application also provides an expression making device, which may be a processing device, an electronic device, or a chip.
  • the device includes a processor, which is used to implement any one of the methods provided in any one of the foregoing aspects.
  • the device may also include a memory for storing program instructions and data.
  • the memory may be a memory integrated in the device or an off-chip memory provided outside the device.
  • the memory is coupled with the processor, and the processor can call and execute the program instructions stored in the memory to implement any one of the methods provided in any one of the foregoing aspects.
  • the expression making device may also include a communication interface, which is used for communicating between the expression making device and other devices.
  • FIG. 1 is a schematic diagram of a system architecture suitable for an expression making method provided by an embodiment of the application
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of the software structure of an electronic device provided by an embodiment of the application.
  • FIG. 4 is a schematic flowchart of a method suitable for expression making according to an embodiment of the application.
  • FIG. 5 is a schematic diagram of a display of a mobile phone provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a display of another mobile phone provided by an embodiment of the application.
  • FIG. 7a is a schematic diagram of a display of still another mobile phone provided by an embodiment of this application.
  • FIG. 7b is a schematic diagram of a display of still another mobile phone provided by an embodiment of this application.
  • FIG. 8a is a schematic diagram of a display of still another mobile phone provided by an embodiment of this application.
  • FIG. 8b is a schematic diagram of a process for generating a target expression provided by an embodiment of this application.
  • FIG. 9a is a schematic diagram of a display of still another mobile phone provided by an embodiment of this application.
  • FIG. 9b is a schematic diagram of another process for generating a target expression provided by an embodiment of this application.
  • FIG. 9c is a schematic diagram of yet another process for generating a target expression provided by an embodiment of this application.
  • FIG. 10 is a schematic diagram of a display of still another mobile phone according to an embodiment of the application.
  • FIG. 11a is a schematic diagram of a display of still another mobile phone according to an embodiment of this application.
  • FIG. 11b is a schematic diagram of yet another process of generating a target expression provided by an embodiment of this application.
  • FIG. 12 is a schematic flow diagram of another method suitable for expression making according to an embodiment of the application.
  • FIG. 13 is a schematic diagram of a display of still another mobile phone provided by an embodiment of this application.
  • FIG. 14 is a schematic diagram of a display of still another mobile phone provided by an embodiment of this application.
  • FIG. 15 is a schematic diagram of yet another process of generating voice expressions according to an embodiment of this application.
  • FIG. 16 is a schematic structural diagram of another electronic device provided by an embodiment of this application.
  • the expressions are mainly static or dynamic pictures, the form is relatively simple, and the content is relatively monotonous.
  • the prior art proposes an instant messaging method, device, and system.
  • the first client receives a voice expression transmission instruction triggered by a user; according to the voice expression transmission instruction, the voice expression is obtained, Voice expressions include voice information and expression image information; the voice expressions are transmitted to the second client through the server, so that the second client displays the expression image corresponding to the expression image information and plays the voice corresponding to the voice information in the instant messaging dialog box.
  • the prior art emphasizes how voice information and expression image information are transmitted in an instant messaging system; when a user triggers a voice expression transmission instruction, the voice information and the expression image need to be obtained and sent separately on the sending side. There are many manual interventions and operations in the process of generating voice expressions, which is not smart and convenient enough.
  • An embodiment of the present application provides a method for sending an expression, including: an electronic device displays a first interface, the first interface including a voice input button; in response to the user's operation of triggering the voice input button, the electronic device receives the voice input by the user and performs content recognition on the voice; if the voice includes a target keyword, a first image expression set is recommended to the user, where the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword; that is, the recommended expressions are close to the voice content.
  • the target expression is obtained according to the voice and the image expression selected by the user; or the target expression is obtained according to the semantics of the voice and the image expression selected by the user.
  • the target expression may include the image information and the voice information, or the target expression may include the image information and the text information corresponding to the voice, thereby enriching the form and content of the expression. Moreover, the target expression can be close to the voice content and meet the user's needs, and the operation is simple, which can improve the user experience.
  • the expression making method provided in the embodiments of the present application can be applied to various scenes that can send expressions.
  • for example, scenarios of sending messages in instant messaging software such as short message and chat applications, or scenarios of posting opinions, sharing content, commenting, and publishing posts (moods) or articles (for example, blogs) in social software such as blog applications and community applications.
  • a schematic diagram of an architecture suitable for an expression making method provided by an embodiment of this application includes a first electronic device 110, a server 120, and a second electronic device 130.
  • the first electronic device 110, the server 120, and the second electronic device 130 can form a voice, text, and image information interaction system.
  • the server may be a server corresponding to an instant messaging application or a social application.
  • the first electronic device or the second electronic device can be a mobile phone, a tablet computer, a desktop computer, a laptop, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, or another electronic device.
  • the embodiments of the present application do not impose special restrictions on the specific form of the electronic device.
  • user A may open the chat application on the first electronic device, input voice and select an image expression to obtain the target expression, and the first electronic device may send the target expression to the server of the chat application.
  • the server may send the target expression to the second electronic device, and the second electronic device displays the target expression, and may output the voice carried by the target expression (after user B performs a corresponding operation, for example, after clicking the target expression).
  • user A can open the blog application on the first electronic device.
  • the user can not only input text, but also input voice and select an image expression to get the target expression.
  • the first electronic device can send the blog content, including the text and the target expression, to the server of the blog application.
  • after the server of the blog application receives a request message for the above blog content sent by the second electronic device, the server may send the blog content including the target expression to the second electronic device; the second electronic device displays the blog content including the target expression and can output the voice carried by the target expression (after user B performs a corresponding operation, for example, after tapping the target expression).
  • FIG. 2 is a schematic structural diagram of an electronic device 100 provided by an embodiment of this application.
  • the electronic device 100 may be a first electronic device or a second electronic device.
  • the electronic device 100 may include a processor 410, an external memory interface 420, an internal memory 421, a universal serial bus (USB) interface 430, a charging management module 440, a power management module 441, and a battery 442 , Antenna 1, antenna 2, mobile communication module 450, wireless communication module 460, audio module 470, speaker 470A, receiver 470B, microphone 470C, earphone interface 470D, sensor module 480, buttons 490, motor 491, indicator 492, camera 493 , The display screen 494, and the subscriber identification module (SIM) card interface 495, etc.
  • the sensor module 480 may include pressure sensor 480A, gyroscope sensor 480B, air pressure sensor 480C, magnetic sensor 480D, acceleration sensor 480E, distance sensor 480F, proximity light sensor 480G, fingerprint sensor 480H, temperature sensor 480J, touch sensor 480K, environment Light sensor 480L, bone conduction sensor 480M, etc.
  • the processor 410 may include one or more processing units.
  • the processor 410 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
  • a memory may also be provided in the processor 410 for storing instructions and data.
  • the memory in the processor 410 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 410. If the processor 410 needs to use the instruction or data again, it can be directly called from the memory. Repeated access is avoided, the waiting time of the processor 410 is reduced, and the efficiency of the system is improved.
  • the processor 410 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the interface connection relationship between the modules illustrated in this embodiment is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the charging management module 440 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 440 may receive the charging input of the wired charger through the USB interface 430.
  • the charging management module 440 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 440 charges the battery 442, it can also supply power to the electronic device through the power management module 441.
  • the power management module 441 is used to connect the battery 442, the charging management module 440 and the processor 410.
  • the power management module 441 receives input from the battery 442 and/or the charging management module 440, and supplies power to the processor 410, the internal memory 421, the external memory, the display screen 494, the camera 493, and the wireless communication module 460.
  • the power management module 441 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 441 may also be provided in the processor 410.
  • the power management module 441 and the charging management module 440 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 450, the wireless communication module 460, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 450 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
  • the mobile communication module 450 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 450 may receive electromagnetic waves by the antenna 1, and perform processing such as filtering, amplifying and transmitting the received electromagnetic waves to the modem processor for demodulation.
  • the mobile communication module 450 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic wave radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 450 may be provided in the processor 410.
  • at least part of the functional modules of the mobile communication module 450 and at least part of the modules of the processor 410 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 470A, the receiver 470B, etc.), or displays an image or video through the display screen 494.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 410 and be provided in the same device as the mobile communication module 450 or other functional modules.
  • the wireless communication module 460 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and other wireless communication solutions.
  • the wireless communication module 460 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 460 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 410.
  • the wireless communication module 460 may also receive a signal to be sent from the processor 410, perform frequency modulation, amplify, and convert it into electromagnetic waves through the antenna 2 and radiate it out.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 450, and the antenna 2 is coupled with the wireless communication module 460, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include global positioning system (GPS), global navigation satellite system (GLONASS), Beidou navigation satellite system (BDS), quasi-zenith satellite system (quasi -zenith satellite system, QZSS) and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 494, an application processor, and the like.
  • the GPU is an image processing microprocessor, which connects the display screen 494 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 410 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 494 is used to display images, videos, and the like.
  • the display screen 494 includes a display panel.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 can realize a shooting function through an ISP, a camera 493, a video codec, a GPU, a display screen 494, and an application processor.
  • the ISP is used to process the data fed back from the camera 493. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 493.
  • the camera 493 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include 1 or N cameras 493, and N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects the frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • through the NPU, applications such as intelligent cognition of the electronic device 100 can be implemented, for example image recognition, face recognition, voice recognition, text understanding, and so on.
  • the external memory interface 420 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 410 through the external memory interface 420 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the Micro SD card is usually open to users, and users can freely delete and access pictures in the system album.
  • the internal memory 421 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 410 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 421. For example, in the embodiment of the present application, the processor 410 may display corresponding display content on the display screen 494 in response to the second operation or the first operation of the user on the display screen 494 by executing instructions stored in the internal memory 421.
  • the internal memory 421 may include a program storage area and a data storage area. Among them, the storage program area can store an operating system, an application program (such as a sound playback function, an image playback function, etc.) required by at least one function, and the like.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
  • the internal memory 421 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), a read-only memory (ROM), etc.
  • the electronic device 100 can implement audio functions through an audio module 470, a speaker 470A, a receiver 470B, a microphone 470C, a headphone interface 470D, and an application processor. For example, music playback, recording, etc.
  • the audio module 470 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 470 can also be used to encode and decode audio signals.
  • the audio module 470 may be disposed in the processor 410, or part of the functional modules of the audio module 470 may be disposed in the processor 410.
  • the speaker 470A also called “speaker” is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 470A, or listen to a hands-free call.
  • the receiver 470B also called “earpiece”, is used to convert audio electrical signals into sound signals.
  • when the electronic device 100 answers a call or receives a voice message, it can receive the voice by bringing the receiver 470B close to the human ear.
  • the microphone 470C, also called the "mic", is used to convert sound signals into electrical signals.
  • the electronic device 100 may be provided with at least one microphone 470C. In some embodiments, the electronic device 100 may be provided with two microphones 470C, which can implement noise reduction functions in addition to collecting sound signals. In some embodiments, the electronic device 100 may also be provided with three, four or more microphones 470C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the audio module 470 can be used to convert the analog audio input obtained by the microphone 470C into a digital audio signal, and to encode and decode the audio signal.
  • the earphone interface 470D is used to connect wired earphones.
  • the earphone interface 470D may be a USB interface 430, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
  • the pressure sensor 480A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 480A may be provided on the display screen 494.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 480A.
  • touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
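  • A toy sketch of this force-threshold dispatch (the threshold value and instruction names are assumptions):

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # normalized touch force; assumed value

def handle_sms_icon_touch(force: float) -> str:
    """Dispatch different instructions for touches on the short message icon
    depending on the detected touch force."""
    if force < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"
    return "create_new_short_message"
```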
  • the gyro sensor 480B may be used to determine the movement posture of the electronic device 100.
  • the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 480B.
  • the gyro sensor 480B can be used for image stabilization.
  • the gyroscope sensor 480B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 480B can also be used for navigation and somatosensory game scenes.
  • the display screen 494 of the electronic device 100 can be folded to form multiple screens.
  • Each screen may include a gyro sensor 480B for measuring the orientation of the corresponding screen (ie, the direction vector of the orientation).
  • the electronic device 100 can determine the angle between adjacent screens according to the measured angle change of the orientation of each screen.
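  • As an illustrative aside (the patent does not spell out the computation), the angle between two adjacent screens can be derived from the two measured orientation vectors via their dot product:

```python
import math

def angle_between_screens(n1, n2):
    """Return the angle (degrees) between two screen orientation vectors.

    n1 and n2 are 3-D direction vectors, e.g. derived from per-screen gyro data.
    """
    dot = sum(a * b for a, b in zip(n1, n2))
    norm1 = math.sqrt(sum(a * a for a in n1))
    norm2 = math.sqrt(sum(b * b for b in n2))
    cos_theta = max(-1.0, min(1.0, dot / (norm1 * norm2)))  # clamp rounding noise
    return math.degrees(math.acos(cos_theta))

# Example: two panels folded at a right angle
print(angle_between_screens((0, 0, 1), (1, 0, 0)))  # -> 90.0
```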
  • the air pressure sensor 480C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 480C to assist positioning and navigation.
  • the magnetic sensor 480D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 480D to detect the opening and closing of the flip holster.
  • when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip according to the magnetic sensor 480D, and features such as automatic unlocking upon opening of the flip cover can then be set based on the detected opening or closing state.
  • the acceleration sensor 480E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and apply to applications such as horizontal and vertical screen switching, pedometers, etc. It should be noted that in this embodiment of the present application, the display screen 494 of the electronic device 100 can be folded to form multiple screens. Each screen may include an acceleration sensor 480E for measuring the orientation of the corresponding screen (that is, the direction vector of the orientation).
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 480F to measure the distance to achieve fast focusing.
  • the proximity light sensor 480G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 480G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 480G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 480L is used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 494 according to the perceived brightness of the ambient light.
  • the ambient light sensor 480L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 480L can also cooperate with the proximity light sensor 480G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 480H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • the temperature sensor 480J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 480J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 480J exceeds a threshold value, the electronic device 100 executes to reduce the performance of the processor located near the temperature sensor 480J, so as to reduce power consumption and implement thermal protection.
  • the electronic device 100 when the temperature is lower than another threshold, the electronic device 100 heats the battery 442 to avoid abnormal shutdown of the electronic device 100 due to low temperature.
  • the electronic device 100 boosts the output voltage of the battery 442 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 480K is also called a "touch panel". The touch sensor 480K may be arranged on the display screen 494, and the touch sensor 480K and the display screen 494 form a touch screen, also called a "touchscreen".
  • the touch sensor 480K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 494.
  • the touch sensor 480K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 494.
  • the process of making, transmitting, receiving, and customizing the target expression can be completed by performing corresponding operations on the touch screen.
  • the bone conduction sensor 480M can acquire vibration signals.
  • the bone conduction sensor 480M can acquire the vibration signal of the vibrating bone mass of the human vocal part.
  • the bone conduction sensor 480M can also contact the human pulse and receive blood pressure beating signals.
  • the bone conduction sensor 480M may also be disposed in an earphone to form a bone conduction earphone.
  • the audio module 470 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 480M, and realize the voice function.
  • the application processor may analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 480M, and realize the heart rate detection function.
  • the button 490 includes a power button, a volume button, and so on.
  • the button 490 may be a mechanical button. It can also be a touch button.
  • the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
  • the motor 491 can generate vibration prompts.
  • the motor 491 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • for touch operations acting on different areas of the display screen 494, the motor 491 can also produce different vibration feedback effects. Different application scenarios (for example, time reminders, message receiving, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 492 can be an indicator light, which can be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 495 is used to connect to the SIM card.
  • the SIM card can be connected to and separated from the electronic device 100 by inserting into the SIM card interface 495 or pulling out from the SIM card interface 495.
  • the electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 495 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 495 can insert multiple cards at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 495 can also be compatible with different types of SIM cards.
  • the SIM card interface 495 may also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the electronic device 100 may also include auxiliary devices such as a mouse, a keyboard, and a drawing board, which are used for the process of making, transmitting, receiving, and customizing target expressions.
  • FIG. 3 is a software structure block diagram of an electronic device 100 according to an embodiment of the present invention.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present invention takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the Android system is divided into three layers, from top to bottom as the application layer (referred to as the application layer), the application framework layer (referred to as the framework layer), and the kernel layer (also referred to as the driver layer).
  • the application layer can include a series of application packages. As shown in Figure 3, the application layer may include multiple application packages such as chat applications and social applications. The application layer may also include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, and launcher (not shown in Figure 3).
  • the framework layer provides application programming interfaces (application programming interface, API) and programming frameworks for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the framework layer may include an image processing module, a voice processing module, an auxiliary processing module, and an expression database module.
  • the framework layer may also include a content provider, a view system, a phone manager, a resource manager, a notification manager, etc. (not shown in FIG. 3).
  • the voice processing module is used to process the user's voice information.
  • the voice processing module may include a recording module, a playback module, a voice codec module, a voice enhancement module, a voice recognition module, an emotion recognition module, a sound effect processing module, and a text-to-speech module.
  • the recording module is used to record the voice input by the user;
  • the playback module is used to play the voice;
  • the voice codec module is used to encode or decode the voice input by the user;
  • the voice enhancement module is used to remove the noisy voice input by the user.
  • the speech recognition module can convert the voice input by the user into text information through an automatic speech recognition (ASR) algorithm; the emotion recognition module can extract the emotional color of the user's speech from the voice input by the user through a speech emotion recognition (SER) algorithm; the sound effect processing module can add dialect, emotional, cartoon-style, or celebrity-style sound effects to the voice input by the user; and the text-to-speech module can convert the text information in an emoticon into audio information through a text-to-speech (TTS) algorithm.
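  • Putting these modules together, the voice path can be pictured as a small pipeline. The sketch below is only illustrative; `recognize_speech`, `recognize_emotion`, and `apply_sound_effect` are placeholders for whatever ASR/SER/effect implementations the framework actually wires in:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VoiceResult:
    text: str       # ASR output
    emotion: str    # SER output, e.g. "angry", "happy"
    audio: bytes    # possibly effect-processed audio

def process_voice(raw_audio: bytes,
                  recognize_speech: Callable[[bytes], str],
                  recognize_emotion: Callable[[bytes], str],
                  apply_sound_effect: Callable[[bytes, str], bytes],
                  effect: Optional[str] = None) -> VoiceResult:
    """Illustrative flow of the voice processing module (all callables are placeholders)."""
    text = recognize_speech(raw_audio)       # ASR: audio -> text
    emotion = recognize_emotion(raw_audio)   # SER: audio -> emotional color
    audio = apply_sound_effect(raw_audio, effect) if effect else raw_audio
    return VoiceResult(text=text, emotion=emotion, audio=audio)
```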
  • the image processing module is used to process the image information input by the user.
  • the image processing module may include an image editing module, an image encoding and decoding module, an image rendering module, an optical character recognition (OCR) module, an image generation module, and the like.
  • the image editing module can provide the user with a manual drawing function; the image encoding and decoding module is used to encode or decode the image drawn by the user; the image rendering module is used to render the image drawn by the user; the optical character recognition module can extract the text contained in an image; and the image generation module can use a deep learning method to enrich the image drawn by the user and generate a corresponding expression picture.
  • the expression database module is used to store image expression data sets.
  • the auxiliary processing module includes an expression recommendation module, a text embedding and editing module, a voice image packaging module, and so on.
  • the expression recommendation module is used to obtain, through keyword matching, corresponding expressions from the expression database module according to the voice text information and voice emotion information input by the user, so as to recommend them to the user; the text embedding and editing module can embed text (the text obtained by speech recognition) into the emoticon image, and can provide the user with a text format editing function.
  • the voice and image packaging module is used to package voice and image into a complete voice expression file, which can be stored in the format of a video file.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the following uses a scenario of sending voice expressions in a chat application to illustrate the software workflow of the electronic device 100 by way of example.
  • the kernel layer can generate a corresponding input event (such as a voice input event) according to the user's input operation, and report the event to the application framework layer.
  • the application framework layer can perform corresponding processing on the voice input by the user through the voice processing module.
  • the auxiliary processing module in the application framework layer can determine one or more image expressions matching the voice input by the user through the expression recommendation module, and display them on the display interface through the display driver.
  • when the kernel layer receives the user's selection of an expression, it can report the expression selection event to the application framework layer.
  • the auxiliary processing module in the application framework layer can package the voice and the image into a complete voice expression file through the voice-image packaging module, which can then be sent to the peer end through the communication module of the electronic device (for example, the routing module).
  • an embodiment of the present application provides a method for making an expression, taking the electronic device as a mobile phone and an application scenario as a chat scenario as an example for description, including:
  • 401 Display a first interface, where the first interface includes a voice input button, and in response to the user triggering an operation of the voice input button, the voice input by the user is received.
  • the first interface may be a dialogue interface with the target contact.
  • the first interface may be a content sharing interface, such as a mood publishing interface, a talk publishing interface, or the first interface may be a comment interface, such as a blog comment interface, a forum comment interface, or a circle of friends comment interface.
  • a mobile phone can display a dialogue interface between a user (for example, Tom) and a contact Alice.
  • when the user wants to send a voice emoticon, as shown in (a) in FIG. 5, the user can trigger the emoticon button 301. In response to the operation of the user triggering the emoticon button 301, as shown in (b) in FIG. 5, the mobile phone displays an emoticon menu 302, and the emoticon menu 302 may include a voice expression making button 303.
  • the user can trigger the voice expression creation button 303.
  • the mobile phone can display a voice expression creation window 304.
  • the voice expression creation window 304 includes a voice input button (recording button) 305.
  • the prompt message "press and hold to speak” may be displayed.
  • the voice expression making window 304 may also include prompt information 306, which is used to prompt the user how to make a voice expression.
  • the user may press and hold the voice input button 305 to speak (input voice); for example, the voice input by the user may be "beg to fight".
  • the mobile phone can call the microphone to pick up the corresponding audio signal.
  • the mobile phone detects that the user has finished speaking, for example, after the mobile phone detects that the user has released the voice input button 305, or detects that the user has not spoken within a preset time interval, the mobile phone considers that the user has finished speaking.
  • after the mobile phone detects the user's operation of pressing the voice input button 305, the mobile phone can use the automatic speech recognition algorithm to convert the voice input by the user into corresponding text, and display the text 307 corresponding to the voice on the display interface. As shown in (c) in FIG. 5, the text 307 corresponding to the voice input by the user may be "beg to fight".
  • the mobile phone can also display the frequency 308 of the voice input by the user on the display interface.
  • the mobile phone can also perform voice enhancement processing on the noisy speech input by the user, such as denoising, de-reverberation, and echo cancellation, to obtain clean speech. For the specific process, refer to the prior art; details are not repeated in this application.
  • the user when the user wants to send a voice expression, as shown in Figure 6(a), the user can trigger the voice button 309.
  • after the mobile phone detects that the user triggers the operation of the voice button 309, as shown in FIG. 6, the mobile phone can display a voice input window 310.
  • the voice input window 310 may include a voice input button (recording button) 305.
  • a prompt message "Press and Talk" may be displayed.
  • the user can press and hold the voice input button 305 to speak (input voice).
  • the mobile phone After the mobile phone detects the user's operation of pressing the voice input button 305, it can call the microphone to pick up the corresponding audio signal.
  • the mobile phone detects that the user has finished speaking, for example, after the mobile phone detects that the user has released the voice input button 305, or detects that the user has not spoken within a preset time interval, the mobile phone considers that the user has finished speaking.
  • the mobile phone After the mobile phone detects that the user has finished speaking, it can display the first prompt message.
  • the first prompt message is used to prompt the user whether image expressions need to be recommended based on the voice. If the mobile phone receives an operation triggered by the user to recommend image expressions based on the voice, the mobile phone can perform step 402.
  • the mobile phone may pop up a pop-up box 311, which includes options such as sending voice, sending text, and recommending emoticons based on voice. In response to the user selecting the option of recommending emoticons based on voice, the mobile phone can perform step 402.
  • 402: Perform preset-manner recognition on the voice, where the preset-manner recognition at least includes content recognition; if the voice includes a target keyword, a first image expression set is recommended to the user.
  • the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword.
  • the mobile phone can pre-store an image expression data set in the local image expression database.
  • the image expression data set includes multiple image expressions and expression tags corresponding to each image expression.
  • Each image expression may include multiple expression tags.
  • the first expression tag is used to identify key feature information of the image expression, and the key feature information is used to describe the subject or subject content of the image expression.
  • the first emoticon tag is the emoticon tag with the highest priority among the multiple emoticon tags.
  • the first emoticon tag may be, for example, "hit", “haha”, “mourning” and so on.
  • image expressions whose first expression tags are the same may form an expression set. For example, the first image expression set may be expression set 1 or expression set 2, and so on.
  • the mobile phone can perform content recognition on the voice input by the user through the voice recognition module to determine whether the voice contains target keywords.
  • the target keywords can be, for example, “hit”, “haha”, “mourning” and so on. For example, if the voice input by the user is "You kid is really begging for hitting", then the mobile phone can recognize the keyword “beating”. If the voice input by the user is "Catch the car at the last minute, haha", then the mobile phone can recognize the keyword "haha”. If it is recognized that the voice includes the target keyword, the mobile phone recommends the first image expression set to the user.
  • the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword. For example, if the target keyword is: "hit”, then the first emoticon set is an emoticon set whose first emoticon tag is "hit". For example, the first emoticon set may be an emoticon set 1 in Table 1.
  • the mobile phone may also send the voice input by the user to the server, and the image expression data set described above is pre-stored on the server.
  • the server can perform content recognition on the voice input by the user. If the server determines that the voice includes the target keyword, it can send the first image expression set to the mobile phone, and the mobile phone receives the first image expression set sent by the server and recommends it to the user, that is, displays the first image expression set on the display interface.
  • the mobile phone may display, on the display interface for the user to choose, the image expressions in the first expression set whose first expression tag is "hit".
  • when the mobile phone displays at least one recommended image emoticon, it can also display the frequency and the text 402 corresponding to the voice input by the user on the display interface, to prevent the user from forgetting the previously entered voice content when selecting an image emoticon, and to prompt the user to choose an image expression closer to the text of the voice, so that the voice expression created by the user is more vivid.
  • if no target keyword is recognized in the voice, an image expression may not be recommended, and the user may be prompted to input the voice again.
  • in some embodiments, the preset-manner recognition further includes emotion recognition, that is, emotion recognition can be performed on the voice input by the user. If the voice belongs to a target emotional color (emotional tone/emotional tendency), the second expression tag of each image expression in the first image expression set has a matching relationship with the target emotional color.
  • the second expression tag is used to identify the emotional color of the image expression. It can be considered that the second emoticon tag is an emoticon tag with the second priority among the plurality of emoticon tags.
  • the second emoticon tag may be, for example, "happy", “angry”, “sad” and so on.
  • each expression set (a set with the same first expression label) may include one or more expression subsets, and the second expression label of each image expression in the expression subset is the same.
  • the first expression set may be expression subset 1 or expression subset 2 of expression set 1, and so on.
  • for example, if the mobile phone determines that the voice input by the user has a high pitch and a fast speed, it can be considered that the user's emotional color is "angry", and the mobile phone can determine, from the first expression set, the expression subset whose second expression tag is "angry" and display it on the display interface for the user to choose. In this way, emoticons can be recommended to the user more accurately, and user experience can be improved.
  • the emoticon set can be directly recommended to the user based on the target keyword.
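  • A minimal sketch of this keyword-and-emotion matching, with a toy expression table and hypothetical tag values (none of these names come from the patent):

```python
# Toy expression table: each expression carries prioritized tags (values are made up).
EXPRESSIONS = [
    {"id": 1, "first_tag": "hit",  "second_tag": "angry"},
    {"id": 2, "first_tag": "hit",  "second_tag": "happy"},
    {"id": 3, "first_tag": "haha", "second_tag": "happy"},
]
TARGET_KEYWORDS = {"hit", "haha", "mourning"}

def recommend(voice_text, voice_emotion=None):
    """Recommend expressions whose first tag matches a keyword found in the voice text,
    optionally narrowed to those whose second tag matches the recognized emotion."""
    keyword = next((k for k in TARGET_KEYWORDS if k in voice_text), None)
    if keyword is None:
        return []                            # no keyword: do not recommend
    first_set = [e for e in EXPRESSIONS if e["first_tag"] == keyword]
    if voice_emotion is None:
        return first_set                     # keyword-only recommendation
    subset = [e for e in first_set if e["second_tag"] == voice_emotion]
    return subset or first_set               # fall back to the keyword-matched set

print(recommend("you kid is really begging for a hit", "angry"))  # -> [{'id': 1, ...}]
```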
  • the image expressions recommended by the mobile phone to the user may include emoji (emoticons) and stickers (emoticons), and the stickers may include static emoticons, and may also include dynamic emoticons (such as gif animations).
  • the image expression recommended by the mobile phone to the user may also be a local or downloaded static picture or dynamic picture, which is not limited in this application.
  • in some other embodiments, the mobile phone may first perform emotion recognition on the voice input by the user, and match the recognized emotional color with the second expression tags of image expressions to determine a second image expression set, where the second expression tag of each image expression in the second image expression set matches the emotional color of the voice input by the user.
  • the second emoticon tag is the emoticon tag with the highest priority among the multiple emoticon tags of the image emoticon. Then, the second image expression set can be recommended to the user.
  • further, keyword recognition may be performed on the voice input by the user to determine whether the voice includes a target keyword, so as to determine, from the second image expression set, a subset of image expressions matching the target keyword, where the first expression tag of each image expression in the subset matches the target keyword. In this case, the first expression tag is the expression tag with the second priority among the multiple expression tags of an image expression. Then, the determined subset can be recommended to the user.
  • in some embodiments, in response to the user's operation of selecting an image expression from the first image expression set, a second prompt message may be displayed, where the second prompt message is used to prompt the user whether to make a voice expression or a text expression.
  • the mobile phone may directly prompt the user whether to use the image expression to make voice expressions or text expressions.
  • the mobile phone in response to the user's operation of selecting an image expression from the first image expression set, the mobile phone may display a pop-up box 405, and the pop-up box 405 may include voice expression options and text expression options.
  • in response to an operation triggered by the user to make a voice expression, the target expression is obtained according to the voice and the image expression selected by the user; in response to an operation triggered by the user to make a text expression, the target expression is obtained according to the semantics of the voice and the image expression selected by the user.
  • the process for the mobile phone to obtain the target expression according to the voice and the image expression selected by the user may specifically be: encoding and compressing the voice input by the user, and adding a preset mark at a preset position of the image expression selected by the user, where the preset mark is used to indicate that the target expression is a voice expression. For example, a small speaker logo can be added to a blank area of the image expression to remind the user of the existence of the voice. Then, the encoded voice and the image expression with the added preset mark are loaded into a video format to obtain the target expression.
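  • As one possible (illustrative) realization of loading the marked image and the encoded voice "into a video format" (the patent does not prescribe a container or tool), a still-image video can be muxed with the audio, for example with ffmpeg:

```python
import subprocess

def package_voice_expression(image_path: str, voice_path: str, out_path: str) -> None:
    """Mux a still image (with the small-speaker mark already drawn on it)
    and an encoded voice clip into a single video file (illustrative sketch)."""
    subprocess.run([
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,    # repeat the single frame
        "-i", voice_path,                  # the user's compressed voice
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-shortest",                       # stop when the audio ends
        "-pix_fmt", "yuv420p",
        out_path,
    ], check=True)

# package_voice_expression("sticker_marked.png", "voice.m4a", "target_expression.mp4")
```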
  • the user can then choose to store or send the target emoticon. Exemplarily, as shown in the accompanying figure, the target expression 403 may be displayed on the chat interface, and the target expression 403 may include a voice icon to indicate that the voice expression carries a voice.
  • the user can trigger the target expression to play the voice. For example, the user can click (click or double-click, etc.) the voice icon of the target expression to play the voice.
  • FIG. 8b is a flow chart for generating a target expression.
  • the target expression can carry a voice input by the user. For the specific process, please refer to the relevant description above, which will not be repeated here.
  • the mobile phone obtains the target expression according to the semantics of the voice and the image expression selected by the user specifically by: converting all the text or the target keyword corresponding to the voice into pixel information, and loading the pixel information into a preset area or blank area of the image expression selected by the user. The preset area can be an edge area of the image expression, such as the lower, upper, left, or right edge. If the text needs to be loaded into a blank area, the mobile phone can first recognize the blank area of the image expression, then adapt the size of the text according to the size of the blank area, and embed the text in the blank area.
  • before the mobile phone embeds all the text or the target keyword corresponding to the voice into the image expression selected by the user, it can recognize whether the image expression selected by the user includes text. If the image expression selected by the user does not include text, or the text included in the image expression selected by the user is different from the text corresponding to the voice input by the user, the mobile phone embeds all the text or the target keyword corresponding to the voice into the image expression selected by the user to obtain the target expression.
  • the mobile phone can automatically embed the text corresponding to the voice input by the user in the image emoticon selected by the user, or the mobile phone can provide a button for embedding text so that the user can manually embed the text corresponding to the voice in the image emoticon selected by the user. Then, the mobile phone can display a preview interface, the preview interface includes the target expression, and the preset position of the target expression includes the text corresponding to the voice.
  • the mobile phone can display a preview interface of the target emoticon, and the mobile phone can also display a button 501 for embedding text on the preview interface.
  • the mobile phone can embed the text corresponding to the voice in the preset area of the target expression, for example, under the target expression.
  • the mobile phone can automatically set the color of the text according to the background color of the image expression selected by the user, so as to make the text more eye-catching or closer to the background color of the expression.
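  • A minimal sketch of the text-embedding step described above, assuming a bottom-edge placement, a hypothetical font file, and a simple brightness test for choosing the text color (none of these choices are prescribed by the patent):

```python
from PIL import Image, ImageDraw, ImageFont, ImageStat

def embed_text(image_path: str, text: str, out_path: str,
               font_path: str = "DejaVuSans.ttf") -> None:
    """Draw `text` along the bottom edge of the sticker, shrinking the font until it
    fits, and pick black or white depending on the background brightness (sketch)."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)

    size = img.height // 6
    font = ImageFont.truetype(font_path, size)
    while size > 8:
        font = ImageFont.truetype(font_path, size)
        if draw.textlength(text, font=font) <= img.width * 0.9:
            break
        size -= 2

    # Choose a text color that contrasts with the average brightness of the bottom strip.
    strip = img.crop((0, max(0, img.height - size - 10), img.width, img.height))
    brightness = sum(ImageStat.Stat(strip).mean) / 3
    color = "black" if brightness > 128 else "white"

    x = (img.width - draw.textlength(text, font=font)) // 2
    draw.text((x, img.height - size - 5), text, font=font, fill=color)
    img.save(out_path)
```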
  • the user can set the font, color and other characteristics of the text embedded in the image emoticon.
  • the user can edit the text such as font, size, bold, italic, artistic word, color, underline, etc. by long pressing the embedded text, so as to better adapt to user needs and improve user experience.
  • the user can click the text 502 on the emoticon, and as shown in Figure 9a(c), the mobile phone can pop up a bullet box 503. The bullet box 503 can include multiple preset text formats for the text 502; the user can select a text format according to his or her own preferences, and the mobile phone can modify the format of the text embedded in the voice expression according to the text format selected by the user.
  • the user can also set various fonts, text boxes or animation effects for the text, which is not limited in this application.
  • the user can also control the position and rotation angle of the text embedded in the emoticon picture; if the emoticon is a GIF animation, the text animation effects can also be edited and processed to match the user's behavior habits and preferences.
  • the mobile phone can also embed the voice and the text corresponding to the voice into the image expression selected by the user at the same time to obtain the target expression.
  • the target expression includes the voice input by the user and the text information (all text or keywords) corresponding to the voice input by the user.
  • the specific process please refer to the relevant description above, which will not be repeated here.
  • the mobile phone can play the voice carried by the target expression after receiving an operation used by the user to trigger the target expression. For example, the user can click (single-click or double-click, etc.) the preset identifier (for example, a small speaker identifier) of the target expression to play the voice.
  • the voice of the target expression can be further processed with preset sound effects.
  • the preset sound effect processing may include at least one of male-voice processing (for example, an "uncle" sound effect), female-voice processing (for example, a "loli" or "goddess" sound effect), cartoonization processing, dialectization processing, funny processing (for example, a doggo sound effect or a cat sound effect), celebrity processing, or emotional processing.
  • FIG. 9c is a flow chart of generating a target expression.
  • the target expression may carry a voice input by the user, and the voice may have a dialectized sound effect (for example, Henan dialect sound effect).
  • the user may click the voice icon on the voice expression 403, and in response to the user's operation of clicking the voice icon, the mobile phone may display a pop-up frame 504.
  • the bullet frame 504 can include a variety of preset sound effects. The user can choose a sound effect according to his or her own preferences, and the mobile phone can process the voice of the voice expression 403 based on the sound effect selected by the user, to better adapt to the user's needs and improve user experience.
  • the mobile phone may perform preset sound effect processing on the voice according to the third expression tag of the image expression selected by the user, and the third expression tag is used to indicate the type of the image expression selected by the user.
  • the voice input by the user may be processed according to the sound characteristics of the preset character type.
  • for example, if the third expression tag of the image expression selected by the user is a crosstalk actor XX, the mobile phone can modify the voice input by the user according to the voice characteristics of the crosstalk actor XX, such as timbre, pitch, speed of sound, volume, intonation, or voiceprint.
  • if the third expression tag of the image expression selected by the user is a preset animal type, the timbre of the voice may be made funny or cartoonized. For example, if the third expression tag of the image expression selected by the user is a cat, a cat sound effect may be added to the voice input by the user. In this way, by performing personalized sound effect processing on the voice input by the user, the expression of the target expression can be richer and more interesting.
  • the mobile phone can pre-store the sound effect corresponding to the third expression tag of each image expression. For example, if the third expression tag is a crosstalk actor XX, the mobile phone can store the timbre, pitch, speed of sound, volume, intonation, or voiceprint of the crosstalk actor XX; if the third expression tag is an animal such as a cat or a dog, the mobile phone can store the corresponding cat or dog sound effect.
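  • As a loose illustration of such tag-driven sound-effect processing (the library, parameter values, and the tag table below are assumptions, not part of the patent), a pitch/tempo shift could look like:

```python
import librosa
import soundfile as sf

def apply_cartoon_effect(in_wav: str, out_wav: str,
                         n_steps: float = 6.0, rate: float = 1.15) -> None:
    """Raise the pitch and slightly speed up the voice to get a cartoon or 'cat-like'
    feel (illustrative parameters; the patent only names the effect categories)."""
    y, sr = librosa.load(in_wav, sr=None)
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)  # higher pitch
    y = librosa.effects.time_stretch(y, rate=rate)              # slightly faster
    sf.write(out_wav, y, sr)

# A hypothetical tag-to-effect lookup like the pre-stored mapping described above:
EFFECT_BY_TAG = {"cat": {"n_steps": 8.0, "rate": 1.2},
                 "dog": {"n_steps": -2.0, "rate": 0.95}}
```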
  • the mobile phone after the mobile phone receives the user's operation of selecting an image expression, it can also receive the user's operation of selecting a picture, and can load the target area in the picture to the preset position of the image expression selected by the user.
  • an embedded picture button (not shown in Figure 9a (a)) can be set next to the embedded text button 501.
  • in response to the user triggering the embedded picture button, the mobile phone can call the system photo album for the user to select a picture.
  • the mobile phone can cover the target area in the picture selected by the user on the image expression selected by the user to obtain a custom expression.
  • the target area may be an area containing a human face.
  • the mobile phone can cover the face area in the picture selected by the user on the preset position of the image expression selected by the user to obtain a custom expression.
  • the mobile phone can automatically recognize the face area of the picture selected by the user, or the user can manually select the face area, which is not limited in this application.
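  • A minimal sketch of this face-region overlay, assuming an off-the-shelf face detector and a fixed preset region (both are illustrative choices, not the patent's):

```python
import cv2

def overlay_face(photo_path: str, sticker_path: str, out_path: str,
                 region=(40, 20, 120, 120)) -> None:
    """Detect a face in the user's photo and paste it onto a preset region of the
    selected sticker (assumes the sticker is large enough for the region)."""
    photo = cv2.imread(photo_path)
    sticker = cv2.imread(sticker_path)
    gray = cv2.cvtColor(photo, cv2.COLOR_BGR2GRAY)

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face found; let the user select the region manually")

    fx, fy, fw, fh = faces[0]
    x, y, w, h = region                        # preset position on the sticker
    face = cv2.resize(photo[fy:fy + fh, fx:fx + fw], (w, h))
    sticker[y:y + h, x:x + w] = face
    cv2.imwrite(out_path, sticker)
```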
  • the user can select an emoticon for personalized design based on the emoticon recommended by the mobile phone, and generate a target emoticon (voice emoticon or text emoticon) based on the customized emoticon obtained by the personalized design, which is more entertaining and can improve user experience.
  • after the mobile phone receives the user's operation of selecting an image expression, it can also receive operations such as the user doodling or adding "stickers" on the image expression (for example, adding stickers of hearts, stars, balloons, etc. to the original image expression), and generate the target expression based on the image expression after the doodling or "stickers" are added.
  • the mobile phone can save the generated target expression and generate a voice expression sending record. In this way, the mobile phone can display the voice expressions previously sent by the user, so that the user can directly select a voice expression to send, which is more convenient and faster.
  • if the user does not select an image expression from the first image expression set, the user can trigger a custom expression mode. In response to the user's operation for triggering the custom expression mode, a drawing board interface is displayed, the graffiti operation input by the user on the drawing board interface is received, and a stick figure is generated according to the movement trajectory of the graffiti operation.
  • the mobile phone may display a custom control 404 when displaying at least one recommended image expression. If the user is not satisfied with the image expression recommended by the mobile phone, the user can click the control 404 to trigger the custom expression mode.
  • the mobile phone may display a drawing board interface 1101. The user can trigger the pen control 1102 to do graffiti on the drawing board interface, and the mobile phone receives the graffiti operation input by the user on the drawing board interface, and generates a stick figure according to the movement trajectory of the graffiti operation. Then, the mobile phone can recommend to the user an image expression whose similarity with the outline of the stick figure is greater than a preset threshold.
  • that is, the voice input by the user can be combined with the image expression selected by the user. For example, the mobile phone can recommend an image expression 1103 similar to the stick figure to the user; if the user selects the image expression 1103, the mobile phone can generate the target expression according to the voice and the image expression 1103.
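  • One way to realize the "similarity with the outline of the stick figure" criterion (purely illustrative; the patent does not name a matching algorithm) is shape matching on the outer contours:

```python
import cv2

def outline_dissimilarity(sketch_path: str, sticker_path: str) -> float:
    """Compare the outer contour of the user's stick figure with a sticker's outline;
    smaller values mean more similar (illustrative matching criterion)."""
    def main_contour(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return max(contours, key=cv2.contourArea)

    return cv2.matchShapes(main_contour(sketch_path), main_contour(sticker_path),
                           cv2.CONTOURS_MATCH_I1, 0.0)

# Recommend stickers whose dissimilarity falls below a preset threshold, e.g.:
# candidates = [s for s in sticker_paths if outline_dissimilarity(sketch, s) < 0.2]
```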
  • FIG. 11b is a flow chart for generating a target expression.
  • the target expression carries the voice input by the user, and the image information of the target expression may be recommended based on the stick figure drawn by the user.
  • the mobile phone can also render the stick figure based on generative adversarial networks (GAN) to obtain a personalized expression, and generate a target expression based on the personalized expression.
  • the mobile phone can directly generate voice expressions based on the stick figure, which is not limited in this application.
  • in other embodiments, if the user believes that the image expressions recommended by the mobile phone do not meet psychological expectations or do not fit the current scene, and does not select an image expression from the at least one image expression, the user can also select an image expression from the locally stored image expressions.
  • the mobile phone can send the target expression to the electronic device corresponding to the target contact; or, it can upload the target expression to the server corresponding to the application that provides the content sharing interface or the comment interface.
  • after the mobile phone detects the operation of the user triggering sending of the target emoticon, for example, after the mobile phone detects the user's swipe or long-press operation on the voice emoticon, it can send the target emoticon, and the target emoticon can be forwarded to the corresponding contact (for example, Alice).
  • the electronic device used by Alice can receive the target emoticon and display the target emoticon on the chat interface.
  • if the second electronic device receives an operation of Alice triggering the voice identifier of the target expression, the second electronic device can call the speaker to play the voice carried by the expression, so that the user can hear the voice in the voice expression.
  • the operation of triggering the voice identification of the emoticon may include a click operation, a sliding operation, a long-press operation, and the like.
  • the click operation may include a single click operation, a double click operation, etc., which is not limited in this application.
  • the above uses chat scenes as examples to illustrate the specific method of sending emoticons. It is understandable that the above method can also be applied to social scenes such as blog posting, mood posting, replying, or commenting.
  • for example, when a user publishes a blog, he or she can insert voice expressions when publishing text, that is, the blog content includes text and voice expressions.
  • the blog content may also include pictures, videos, etc., which are not limited in the embodiment of the present application.
  • after the mobile phone receives the voice input by the user, it can perform content recognition on the voice. If the voice includes a target keyword, the mobile phone recommends the first image expression set to the user, where the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword.
  • This method of automatically recommending image expressions based on the voice input by the user can simplify user operations without requiring the user to choose from a large number of image expressions, making the user's operation more convenient. Then, in response to the user's operation of selecting an image expression from the first image expression set, the target expression can be obtained according to the voice and the image expression selected by the user.
  • voice, as another carrier of information, can deliver rich content and bring greater entertainment, thus enriching the form and content of expressions.
  • the target expression contains both voice information and image information, which makes the transmission of information more natural and the expression of emotions more real.
  • the mobile phone can embed all the text or target keywords corresponding to the voice into the image expression selected by the user to obtain the target expression.
  • the target expression contains both image information and text information corresponding to the voice, which can more accurately convey and express the user's intention, and can improve the user experience.
  • voice control does not require gesture operations, which is very suitable for scenes such as users using electronic devices while driving.
  • another embodiment of the present application provides an expression making method, which is described by taking the electronic device being a mobile phone as an example, and includes:
  • a second interface is displayed, where the second interface includes an image expression selection button, and at least one image expression is displayed in response to an operation used by the user to trigger the image expression selection button.
  • the second interface may be a dialogue interface with the target contact; or, the second interface may be a content sharing interface; or, the second interface may be a comment interface.
  • the user can select an image expression from a sticker expression provided by the mobile phone or a locally stored image expression.
  • prompt information is displayed, where the prompt information is used to prompt the user whether a voice expression needs to be made.
  • the mobile phone may display prompt information.
  • the prompt information can be a prompt box 702, which can include the prompt text "Do you need to make a voice expression" and can also include "Yes" and "No" buttons. If the user clicks the "Yes" button, the mobile phone determines that the user needs to make a voice expression, and the mobile phone executes step 1204. If the user clicks the "No" button, the mobile phone can directly send the emoji selected by the user to the peer contact.
  • the mobile phone can recognize and extract the text on the image expression through the optical character recognition technology, generate the voice according to the recognized text, and then obtain the voice expression according to the voice and the image expression selected by the user.
  • the text extracted by the mobile phone from the image expression may be "discuss", and the corresponding voice can be generated based on the text.
  • the mobile phone can receive text input by the user, for example, can receive text input by the user through a soft keyboard or copied and pasted text, generate a voice based on the text, and then obtain a voice expression based on the voice and the image expression selected by the user.
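  • A small sketch of this text-to-voice path, using generic OCR and offline TTS libraries as stand-ins (the library choice and function names are assumptions, not prescribed by the patent):

```python
from typing import Optional

import pytesseract
import pyttsx3
from PIL import Image

def text_to_voice_from_sticker(sticker_path: str, voice_out: str,
                               typed_text: Optional[str] = None) -> str:
    """Use the user's typed text if given, otherwise OCR the sticker text,
    then synthesize the text to a voice file (illustrative sketch)."""
    text = typed_text or pytesseract.image_to_string(Image.open(sticker_path)).strip()
    engine = pyttsx3.init()
    engine.save_to_file(text, voice_out)   # offline TTS to an audio file
    engine.runAndWait()
    return text

# voice_text = text_to_voice_from_sticker("sticker.png", "voice.wav")
```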
  • the mobile phone can also perform preset sound effect processing or personalized processing on the generated voice.
  • preset sound effect processing or personalized processing please refer to step 403, which will not be repeated here.
  • the mobile phone can generate voice expressions based on the voice and image expressions after preset sound effect processing or personalized processing.
  • FIG. 15 is a flow chart of generating a voice expression based on an image expression carrying text selected by the user.
  • the voice expression carries a corresponding voice generated from the text, and the voice may have a dialect sound effect (for example, a Henan dialect sound effect).
  • for a specific description, refer to step 404, which is not repeated here.
  • after the mobile phone receives the user's operation of selecting the image expression, it can generate a voice based on the text on the image expression or the text input by the user, and obtain the voice expression based on the voice and the image expression, without requiring the user to input voice. This simplifies the user's operation steps, enables voice expressions to be generated conveniently and intelligently, enriches the form and content of expressions, and can improve user experience.
  • it can be understood that, in order to implement the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the electronic device into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 16 shows a schematic diagram of a possible composition of the electronic device 16 involved in the foregoing embodiments. As shown in FIG. 16, the electronic device 16 may include a display unit 1601, a receiving unit 1602, a recognition unit 1603, a recommendation unit 1604, and a processing unit 1605.
  • the display unit 1601 is configured to display a first interface, and the first interface includes a voice input button;
  • the receiving unit 1602 is configured to receive the voice input by the user in response to the user triggering the operation of the voice input button;
  • the recognition unit 1603 is configured to perform preset-manner recognition on the voice, where the preset-manner recognition includes at least content recognition. The recommendation unit 1604 is configured to recommend the first image expression set to the user if the voice includes a target keyword, where the first expression tag of each image expression in the first image expression set has a matching relationship with the target keyword. The processing unit 1605 is configured to, in response to the user's operation of selecting an image expression from the first image expression set, obtain the target expression according to the voice or the semantics corresponding to the voice and the image expression selected by the user.
  • the electronic device may include a processing module, a storage module, and a communication module.
  • the processing module can be used to control and manage the actions of the electronic device. For example, it can be used to support the electronic device to execute the steps performed by the display unit 1601, the receiving unit 1602, the identification unit 1603, the recommendation unit 1604, and the processing unit 1605.
  • the storage module can be used to support the storage of program codes and data in the electronic device.
  • the communication module can be used to support the communication between electronic devices and other devices.
  • the processing module may be a processor or a controller. It can implement or execute various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of this application.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor, and so on.
  • the storage module may be a memory.
  • the communication module may specifically be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, and other devices that interact with other electronic devices.
  • This embodiment also provides a computer storage medium in which computer instructions are stored.
  • when the computer instructions run on an electronic device, the electronic device executes the above-mentioned related method steps to implement the expression making method in the above-mentioned embodiments.
  • This embodiment also provides a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned related steps, so as to realize the expression making method in the above-mentioned embodiment.
  • the embodiments of the present application also provide a device.
  • the device may specifically be a chip, component or module.
  • the device may include a processor and a memory connected to each other.
  • the memory is used to store computer execution instructions.
  • the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the expression making method in the foregoing method embodiments.
  • the electronic device, computer storage medium, computer program product, or chip provided in this embodiment are all used to execute the corresponding method provided above. Therefore, the beneficial effects that can be achieved can refer to the corresponding method provided above. The beneficial effects of the method will not be repeated here.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of modules or units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or they may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application are essentially or the part that contributes to the prior art, or all or part of the technical solutions can be embodied in the form of a software product, and the software product is stored in a storage medium. It includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor (processor) execute all or part of the steps of the methods of the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Psychiatry (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

一种表情制作方法和装置,涉及终端领域,能够丰富表情的形式和内容,从而提高用户体验。其方法为:显示第一界面,第一界面包含语音输入按钮;响应于用户触发语音输入按钮的操作,接收用户输入的语音(401);对语音进行预设方式识别,预设方式识别至少包括内容识别,若语音包括目标关键词,向用户推荐第一图像表情集合(402);响应于用户从第一图像表情集合中选择一个图像表情的操作,根据语音或语音对应的语义以及用户选择的图像表情得到目标表情(403);发送目标表情(404)。

Description

一种表情制作方法和装置
本申请要求于2019年12月10日提交国家知识产权局、申请号为201911261292.3、申请名称为“一种表情制作方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及终端领域,尤其涉及一种表情制作方法和装置。
背景技术
随着手机、平板、个人计算机(personal computer,PC)等终端设备的普及和社交软件的发展,表情(图像表情/表情包)的应用越来越广泛。用户可以通过手机等终端设备在聊天软件等社交软件中发送表情,表情以时下流行的明星、语录、动漫、影视截图为素材,配上一系列相匹配的文字,可以表达特定的情感,具有高识别度,高感情化和自传播(见到-喜欢-收藏-转发)的特性,可以消除文字产生的误解。
但是,目前表情以静态或者动态图片为主,形式和内容比较单调。
发明内容
本申请实施例提供一种表情制作方法和装置,能够丰富表情的形式和内容,从而提高用户体验。
第一方面,本申请实施例提供一种表情制作方法,应用于电子设备,包括:显示第一界面,第一界面包含语音输入按钮;响应于用户触发语音输入按钮的操作,接收用户输入的语音;对语音进行预设方式识别,预设方式识别至少包括内容识别,若语音包括目标关键词,向用户推荐第一图像表情集合,第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系;响应于用户从第一图像表情集合中选择一个图像表情的操作,根据语音或语音对应的语义以及用户选择的图像表情得到目标表情。
基于本申请实施例提供的方法,手机接收用户输入的语音后,可以对语音进行内容识别,若语音包括目标关键词,手机向用户推荐第一图像表情集合,第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系。这种根据用户输入的语音,自动推荐图像表情的方式可以简化用户操作,无需用户从海量图像表情中选择,使用户的操作更加便捷。而后,响应于用户从第一图像表情集合中选择一个图像表情的操作,可以根据语音与用户选择的图像表情得到目标表情。这样,目标表情中同时包含语音信息和图像信息,使信息的传递更加自然,情感的流露更加真实。或者,手机可以将语音对应的语义嵌入用户选择的图像表情以得到目标表情。这样,目标表情中同时包含图像信息和语音对应的语义(文字信息),可以更加准确地传递和表达用户的意图,能够提高用户体验。
在一种可能的实现方式中,预设方式识别还包括情感识别;若语音属于目标情感色彩,第一图像表情集合中的每个图像表情的第二表情标签与目标情感色彩具有匹配关系。这样,可以更准确地为用户推荐表情,提高用户体验。
在一种可能的实现方式中,对语音进行内容识别之前,该方法还包括:显示第一提示信息,第一提示信息用于提示用户是否需要根据语音推荐图像表情;接收用户触发的根据语音推荐图像表情的操作。
在一种可能的实现方式中,根据语音或语音对应的语义以及用户选择的图像表情得到目标表情之前,该方法还包括:响应于用户从第一图像表情集合中选择一个图像表情的操作,显示第二提示信息,第二提示信息用于提示用户是否制作语音表情或文字表情。若用户选择制作语音表情,可以根据语音以及用户选择的图像表情得到目标表情;若用户选择制作文字表情,可以根据语音对应的语义以及用户选择的图像表情得到目标表情。
在一种可能的实现方式中,显示第一界面包括:显示与目标联系人的对话界面;或者,显示内容分享界面;或者,显示评论界面。
在一种可能的实现方式中,该方法还包括:向目标联系人对应的电子设备发送目标表情;或者,向提供内容分享界面或评论界面的应用程序对应的服务器上传目标表情。
在一种可能的实现方式中,根据语音与用户选择的图像表情得到目标表情包括:对语音编码压缩,并在用户选择的图像表情的预设位置添加预设标识,预设标识用于指示目标表情为语音表情;将编码压缩后的语音和添加预设标识后的图像表情加载为视频格式以得到目标表情。
在一种可能的实现方式中,根据语音对应的语义以及用户选择的图像表情以得到目标表情包括:将语音对应的全部文字或目标关键词转换为像素信息;将像素信息载入用户选择的图像表情的预设区域或空白区域。
在一种可能的实现方式中,该方法还包括:显示预览界面,预览界面包括目标表情,目标表情的预设位置包括预设标识或语音对应的语义,预设标识用于指示目标表情为语音表情。这样,用户可以在预览界面预览目标表情包,并可以对目标表情包做进一步的设置。
在一种可能的实现方式中,若目标表情的预设位置包括预设标识,该方法还包括:接收用户用于触发目标表情的操作,响应于用户用于触发目标表情的操作,播放目标表情携带的语音。
在一种可能的实现方式中,该方法还包括:对语音进行预设音效处理,预设音效处理包括男声化处理、女声化处理、卡通化处理、方言化处理、搞怪化处理或明星化处理中的至少一种。这样,可以满足用户的个性化需求,提高用户体验。
在一种可能的实现方式中,对语音进行预设音效处理包括:根据用户选择的图像表情的第三表情标签对语音进行预设音效处理,第三表情标签用于指示用户选择的图像表情的类型;若用户选择的图像表情的第三表情标签为预设人物类型,根据预设人物类型的声音特征对语音进行处理;若用户选择的图像表情的第三表情标签为预设动物类型,对语音进行搞怪处理或卡通化处理。这样,使图像表情的类型与语音的音效能够更好的匹配,可以提高用户体验。
在一种可能的实现方式中,该方法还包括:接收用户选择图片的操作,响应于用户选择图片的操作,将图片中的目标区域加载到用户选择的图像表情或目标表情的预设位置。
在一种可能的实现方式中,若用户未从第一图像表情集合中选择一个图像表情,该方法还包括:响应于用户用于触发自定义表情模式的操作,显示画板界面;接收用户在画板界面上输入的涂鸦操作;根据涂鸦操作的运动轨迹生成简笔画;向用户推荐与简笔画的轮廓的相似度大于预设阈值的图像表情。这样,可以更好地适配用户的需求,提高用户体验。
在一种可能的实现方式中,若用户未从至少一个图像表情中选择一个图像表情,该方法还包括:接收用户从本地存储的图像表情中选择一个图像表情的操作。
第二方面,本申请实施例提供一种表情制作方法,应用于电子设备,包括:显示第二界面,第二界面包含图像表情选择按钮;响应于用户用于触发图像表情选择按钮的操作,显示至少一个图像表情;接收用户从至少一个图像表情中选中一个图像表情的操作;显示提示信息,提示信息用于提示用户是否需要制作语音表情;响应于用户确定制作语音表情的操作,根据用户选中的图像表情上的文字或用户输入的文本生成语音,根据语音与用户选中的图像表情得到语音表情。
基于本申请实施例提供的方法,手机接收用户选中的图像表情的操作后,可以根据该图像表情上的文字或用户输入的文本生成语音,并根据语音和图像表情得到语音表情,无需用户输入语音,简化了用户的操作步骤,能够便捷、智能地生成语音表情,丰富了表情的形式和内容,能够提高用户体验。
第三方面,本申请实施例提供一种电子设备,包括:显示单元,用于显示第一界面,第一界面包含语音输入按钮;接收单元,用于响应于用户触发语音输入按钮的操作,接收用户输入的语音;识别单元,用于对语音进行预设方式识别,预设方式识别至少包括内容识别,推荐单元,用于若语音包括目标关键词,向用户推荐第一图像表情集合,第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系;处理单元,用于响应于用户从第一图像表情集合中选择一个图像表情的操作,根据语音或语音对应的语义以及用户选择的图像表情得到目标表情。
在一种可能的实现方式中,预设方式识别还包括情感识别;若语音属于目标情感色彩,第一图像表情集合中的每个图像表情的第二表情标签与目标情感色彩具有匹配关系。
在一种可能的实现方式中,显示单元还用于显示第一提示信息,第一提示信息用于提示用户是否需要根据语音推荐图像表情;接收用户触发的根据语音推荐图像表情的操作。
在一种可能的实现方式中,显示单元还用于,响应于用户从第一图像表情集合中选择一个图像表情的操作,显示第二提示信息,第二提示信息用于提示用户是否制作语音表情或文字表情。
在一种可能的实现方式中,显示单元用于,显示与目标联系人的对话界面;或者,显示内容分享界面;或者,显示评论界面。
在一种可能的实现方式中,还包括发送单元,用于向目标联系人对应的电子设备发送目标表情;或者,向提供内容分享界面或评论界面的应用程序对应的服务器上传目标表情。
在一种可能的实现方式中,处理单元用于,对语音编码压缩,并在用户选择的图像表情的预设位置添加预设标识,预设标识用于指示目标表情为语音表情;将编码压缩后的语音和添加预设标识后的图像表情加载为视频格式以得到目标表情。
在一种可能的实现方式中,处理单元用于,将语音对应的全部文字或目标关键词转换为像素信息;将像素信息载入用户选择的图像表情的预设区域或空白区域。
在一种可能的实现方式中,显示单元还用于显示预览界面,预览界面包括目标表情,目标表情的预设位置包括预设标识或语音对应的语义,预设标识用于指示目标表情为语音表情。
在一种可能的实现方式中,若目标表情的预设位置包括预设标识,还包括播放单元,用于通过接收单元接收用户用于触发目标表情的操作,响应于用户用于触发目标表情的操作,播放目标表情携带的语音。
在一种可能的实现方式中,处理单元还用于对语音进行预设音效处理,预设音效处理包括男声化处理、女声化处理、卡通化处理、方言化处理、搞怪化处理或明星化处理中的至少一种。
在一种可能的实现方式中,处理单元用于,根据用户选择的图像表情的第三表情标签对语音进行预设音效处理,第三表情标签用于指示用户选择的图像表情的类型;若用户选择的图像表情的第三表情标签为预设人物类型,根据预设人物类型的声音特征对语音进行处理;若用户选择的图像表情的第三表情标签为预设动物类型,对语音进行搞怪处理或卡通化处理。
在一种可能的实现方式中,处理单元还用于,通过接收单元接收用户选择图片的操作,响应于用户选择图片的操作,将图片中的目标区域加载到用户选择的图像表情或目标表情的预设位置。
在一种可能的实现方式中,若用户未从第一图像表情集合中选择一个图像表情,显示单元还用于,响应于用户用于触发自定义表情模式的操作,显示画板界面;接收单元还用于接收用户在画板界面上输入的涂鸦操作;处理单元还用于根据涂鸦操作的运动轨迹生成简笔画;推荐单元还用于向用户推荐与简笔画的轮廓的相似度大于预设阈值的图像表情。
在一种可能的实现方式中,若用户未从至少一个图像表情中选择一个图像表情,接收单元还用于接收用户从本地存储的图像表情中选择一个图像表情的操作。
第四方面,本申请实施例提供一种电子设备,包括:显示单元,用于显示第二界面,第二界面包含图像表情选择按钮;显示单元还用于,响应于用户用于触发图像表情选择按钮的操作,显示至少一个图像表情;接收单元,用于接收用户从至少一个图像表情中选中一个图像表情的操作;显示单元还用于,显示提示信息,提示信息用于提示用户是否需要制作语音表情;处理单元,用于响应于用户确定制作语音表情的操作,根据用户选中的图像表情上的文字或用户输入的文本生成语音,根据语音与用户选中的图像表情得到语音表情。
第五方面,本申请实施例提供一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述任一方面提供的任意一种方法。
第六方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述任一方面提供的任意一种方法。
第七方面,本申请实施例提供了一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现上述任一方面提供的任意一种方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
第八方面,本申请实施例还提供了一种表情制作装置,该表情制作装置可以是处理设备、电子设备或芯片。该装置包括处理器,用于实现上述任一方面提供的任意一种方法。该装置还可以包括存储器,用于存储程序指令和数据,存储器可以是集成在该装置内的存储器,或设置在该装置外的片外存储器。该存储器与该处理器耦合,该处理器可以调用并执行该存储器中存储的程序指令,用于实现上述任一方面提供的任意一种方法。该表情制作装置还可以包括通信接口,该通信接口用于该表情制作装置与其它设备进行通信。
附图说明
图1为本申请实施例提供的一种适用于表情制作方法的系统架构示意图;
图2为本申请实施例提供的一种电子设备的结构示意图;
图3为本申请实施例提供的一种电子设备的软件结构示意图;
图4为本申请实施例提供的一种适用于表情制作方法的流程示意图;
图5为本申请实施例提供的一种手机的显示示意图;
图6为本申请实施例提供的又一种手机的显示示意图;
图7a为本申请实施例提供的再一种手机的显示示意图;
图7b为本申请实施例提供的再一种手机的显示示意图;
图8a为本申请实施例提供的再一种手机的显示示意图;
图8b为本申请实施例提供的一种生成目标表情的流程示意图;
图9a为本申请实施例提供的再一种手机的显示示意图;
图9b为本申请实施例提供的又一种生成目标表情的流程示意图;
图9c为本申请实施例提供的再一种生成目标表情的流程示意图;
图10为本申请实施例提供的再一种手机的显示示意图;
图11a为本申请实施例提供的再一种手机的显示示意图;
图11b为本申请实施例提供的再一种生成目标表情的流程示意图;
图12为本申请实施例提供的又一种适用于表情制作方法的流程示意图;
图13为本申请实施例提供的再一种手机的显示示意图;
图14为本申请实施例提供的再一种手机的显示示意图;
图15为本申请实施例提供的再一种生成语音表情的流程示意图;
图16为本申请实施例提供的又一种电子设备的结构示意图。
具体实施方式
目前,表情都是以静态或者动态图片为主,形式比较单一,内容比较单调。为了解决上述问题,现有技术中提出一种即时通讯方法、装置及系统,在即时通讯会话过程中,第一客户端接收用户触发的语音表情传输指令;根据语音表情传输指令,获取语音表情,语音表情包括语音信息及表情图像信息;通过服务器将语音表情传输至第二客户端,以使得第二客户端在即时通讯会话框显示表情图像信息对应的表情图像和播放语音信息对应的语音,以呈现语音表情。
现有技术强调语音信息和表情图像信息如何在即时通讯系统中传输,且用户触发语音表情传输指令时,需要在发送侧分别获取语音信息和表情图像并发送,生成语音表情的过程有很多人工介入和操作,不够智能和方便。
本申请实施例提供一种发送表情的方法,包括:电子设备显示第一界面,第一界面包含语音输入按钮;响应于用户触发语音输入按钮的操作,接收用户输入的语音;对语音进行内容识别,若语音包括目标关键词,向用户推荐第一图像表情集合,第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系;即推荐的表情是贴近语音内容的。响应于用户从第一图像表情集合中选择一个图像表情的操作,根据语音与用户选择的图像表情得到目标表情;或者,根据语音对应的语义和用户选择的图像表情得到目标表情,目标表情可以包括图像信息和语音信息,或者目标表情可以包括图像信息以及语音对应的文字信息,从而丰富了表情的形式和内容。并且目标表情既能够贴近语音内容又能够符合用户需求,而且操作简便,可以提高用户体验。
本申请实施例提供的表情制作方法可以应用于各种能够发送表情的场景。比如在短信息、聊天应用等即时通信软件中发送消息的场景,或者在博客应用、社区应用等社交软件中发表各种观点、分享内容、评论、发表说说(心情)或文章(例如,博客)等场景。
如图1所示,为本申请实施例提供的一种适用于表情制作方法的架构示意图,包括第一电子设备110、服务器120和第二电子设备130。第一电子设备110、服务器120和第二电子设备130可以组成一个语音文字图像信息交互系统。
其中,服务器可以是即时通讯应用或社交应用对应的服务器。第一电子设备或第二电子设备可以是手机、平板电脑、桌面型、膝上型、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)\虚拟现实(virtual reality,VR)设备等电子设备,本申请实施例对电子设备的具体形态不作特殊限制。
示例性的,在聊天场景中,用户A可以在第一电子设备打开聊天应用,输入语音并选择一个图像表情以得到目标表情,第一电子设备可以将该目标表情发送至聊天应用的服务器,服务器可以将该目标表情发送至第二电子设备,第二电子设备显示该目标表情,并可以(在用户B进行相应操作后,例如点击目标表情后)输出该目标表情携带的语音。
在其他场景,例如,发博客的场景,用户A可以在第一电子设备打开博客应用,用户不仅可以输入文字,还可以输入语音并选择一个图像表情以得到目标表情,第一电子设备可以将包括文字和目标表情的博客内容发送至博客应用的服务器。当博客应用的服务器接收到第二电子设备发送的对上述博客内容的请求消息后,服务器可以将包括该目标表情的博客内容发送至第二电子设备,第二电子设备显示包括该目标表情的博客内容,并可以(在用户B进行相应操作后,例如点击目标表情后)输出该目标表情携带的语音。
下面将结合附图对本申请实施例的实施方式进行详细描述。
请参考图2,为本申请实施例提供的一种电子设备100的结构示意图,该电子设备100可以是第一电子设备或第二电子设备。如图2所示,电子设备100可以包括处理器410,外部存储器接口420,内部存储器421,通用串行总线(universal serial bus,USB)接口430, 充电管理模块440,电源管理模块441,电池442,天线1,天线2,移动通信模块450,无线通信模块460,音频模块470,扬声器470A,受话器470B,麦克风470C,耳机接口470D,传感器模块480,按键490,马达491,指示器492,摄像头493,显示屏494,以及用户标识模块(subscriber identification module,SIM)卡接口495等。其中,传感器模块480可以包括压力传感器480A,陀螺仪传感器480B,气压传感器480C,磁传感器480D,加速度传感器480E,距离传感器480F,接近光传感器480G,指纹传感器480H,温度传感器480J,触摸传感器480K,环境光传感器480L,骨传导传感器480M等。
处理器410可以包括一个或多个处理单元,例如:处理器410可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器410中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器410中的存储器为高速缓冲存储器。该存储器可以保存处理器410刚用过或循环使用的指令或数据。如果处理器410需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减小了处理器410的等待时间,因而提高了系统的效率。
在一些实施例中,处理器410可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
可以理解的是,本实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块440用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块440可以通过USB接口430接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块440可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块440为电池442充电的同时,还可以通过电源管理模块441为电子设备供电。
电源管理模块441用于连接电池442,充电管理模块440与处理器410。电源管理模块441接收电池442和/或充电管理模块440的输入,为处理器410,内部存储器421,外部存储器,显示屏494,摄像头493,和无线通信模块460等供电。电源管理模块441还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块441也可以设置于处理器410中。在另一些实施例中,电源管 理模块441和充电管理模块440也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块450,无线通信模块460,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块450可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块450可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块450可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块450还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块450的至少部分功能模块可以被设置于处理器410中。在一些实施例中,移动通信模块450的至少部分功能模块可以与处理器410的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器470A,受话器470B等)输出声音信号,或通过显示屏494显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器410,与移动通信模块450或其他功能模块设置在同一个器件中。
无线通信模块460可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块460可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块460经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器410。无线通信模块460还可以从处理器410接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块450耦合,天线2和无线通信模块460耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based  augmentation systems,SBAS)。
电子设备100通过GPU,显示屏494,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏494和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器410可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏494用于显示图像,视频等。
显示屏494包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。
电子设备100可以通过ISP,摄像头493,视频编解码器,GPU,显示屏494以及应用处理器等实现拍摄功能。
ISP用于处理摄像头493反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头493中。
摄像头493用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头493,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口420可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口420与处理器410通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。本申请实施例中,外部存储卡(例如,Micro SD卡)可以用于存储系统相册中的全部图片,Micro SD卡通常是对用户开放的,用户可以自由删除和存取系统相册中的图片。
内部存储器421可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。 处理器410通过运行存储在内部存储器421的指令,从而执行电子设备100的各种功能应用以及数据处理。例如,在本申请实施例中,处理器410可以通过执行存储在内部存储器421中的指令,响应于用户在显示屏494的第二操作或第一操作,在显示屏494显示对应的显示内容。内部存储器421可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器421可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS),只读存储器(read-only memory,ROM)等。
电子设备100可以通过音频模块470,扬声器470A,受话器470B,麦克风470C,耳机接口470D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块470用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块470还可以用于对音频信号编码和解码。在一些实施例中,音频模块470可以设置于处理器410中,或将音频模块470的部分功能模块设置于处理器410中。扬声器470A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器470A收听音乐,或收听免提通话。受话器470B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器470B靠近人耳接听语音。麦克风470C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。电子设备100可以设置至少一个麦克风470C。在一些实施例中,电子设备100可以设置两个麦克风470C,除了采集声音信号,还可以实现降噪功能。在一些实施例中,电子设备100还可以设置三个,四个或更多麦克风470C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
在本申请实施例中,用户需要发送语音时,可以通过人嘴靠近麦克风470C发声,将声音信号输入到麦克风470C。而后,音频模块470可以用于将麦克风470C得到的模拟音频输入转换为数字音频信号,并对音频信号编码和解码。
耳机接口470D用于连接有线耳机。耳机接口470D可以是USB接口430,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器480A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器480A可以设置于显示屏494。压力传感器480A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器480A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏494,电子设备100根据压力传感器480A检测所述触摸操作强度。电子设备100也可以根据压力传感器480A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器480B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器480B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器480B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器480B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器480B还可以用于导航,体感游戏场景。本申请实施例中,电子设备100的显示屏494可折叠形成多个屏。每个屏中可以包括陀螺仪传感器480B,用于测量对应屏的朝向(即朝向的方向向量)。电子设备100根据测量得到的每个屏的朝向的角度变化,可以确定出相邻屏的夹角。
气压传感器480C用于测量气压。在一些实施例中,电子设备100通过气压传感器480C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器480D包括霍尔传感器。电子设备100可以利用磁传感器480D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器480D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器480E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。需要注意的是,在本申请实施例中,电子设备100的显示屏494可折叠形成多个屏。每个屏中可以包括加速度传感器480E,用于测量对应屏的朝向(即朝向的方向向量)。
距离传感器480F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器480F测距以实现快速对焦。
接近光传感器480G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器480G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器480G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器480L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏494亮度。环境光传感器480L也可用于拍照时自动调节白平衡。环境光传感器480L还可以与接近光传感器480G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器480H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器480J用于检测温度。在一些实施例中,电子设备100利用温度传感器480J检测的温度,执行温度处理策略。例如,当温度传感器480J上报的温度超过阈值,电子设备100执行降低位于温度传感器480J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池442加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100 对电池442的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器480K,也称“触控面板”。触摸传感器480K可以设置于显示屏494,由触摸传感器480K与显示屏494组成触摸屏,也称“触控屏”。触摸传感器480K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏494提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器480K也可以设置于电子设备100的表面,与显示屏494所处的位置不同。
本申请实施例中,目标表情的制作、传递、接收以及自定义的过程可以是通过触控屏上进行相应的操作完成的。
骨传导传感器480M可以获取振动信号。在一些实施例中,骨传导传感器480M可以获取人体声部振动骨块的振动信号。骨传导传感器480M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器480M也可以设置于耳机中,结合成骨传导耳机。音频模块470可以基于所述骨传导传感器480M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器480M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键490包括开机键,音量键等。按键490可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达491可以产生振动提示。马达491可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏494不同区域的触摸操作,马达491也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器492可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口495用于连接SIM卡。SIM卡可以通过插入SIM卡接口495,或从SIM卡接口495拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口495可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口495可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口495也可以兼容不同类型的SIM卡。SIM卡接口495也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备100中,不能和电子设备100分离。
以下实施例中的方法均可以在具有上述硬件结构的电子设备100中实现。
可以理解的是,本实施例示意的结构并不构成对电子设备100的具体限定。在另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。例如,电子设备100还可以包括鼠标,键盘、画板等辅助设备,用于进行目标表情的制作、传递、接收以及自定义的过程。
图3是本发明实施例的电子设备100的软件结构框图。电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本发明实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
分层架构可将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为三层,从上至下分别为应用程序层(简称应用层),应用程序框架层(简称框架层),以及内核层(也称为驱动层)。
其中,应用层可以包括一系列应用程序包。如图3所示,应用层可以包括聊天应用和社交应用等多个应用程序包。应用层还可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息以及桌面启动(Launcher)等应用程序(图3中未示出)。
框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。如图3所示,框架层可以包括图像处理模块、语音处理模块、辅助处理模块和表情数据库模块。可选的,框架层还可以包括内容提供器,视图系统,电话管理器,资源管理器,通知管理器等(图3中未示出)。
其中,语音处理模块用于处理用户的语音信息。语音处理模块可以包括录音模块、放音模块、语音编解码模块、语音增强模块、语音识别模块、情感识别模块、音效处理模块和文本转语音模块等。其中,录音模块用于录制用户输入的语音;放音模块用于播放语音;语音编解码模块用于对用户输入的语音进行编码或解码;语音增强模块用来对用户输入的带噪语音进行去噪、去混响和去回声处理等;语音识别模块可以通过语音识别(automatic speech recognition,ASR)算法将用户输入的语音转化为文本信息;情感识别模块可以通过情感识别(speech emotion recognition,SER)算法从用户输入的语音中提取用户说话时的情感色彩;音效处理模块对用户输入的语音增加方言化,情绪化,卡通化,明星化等音效特性;文本转语音模块可以通过文本转语音(text to speech,TTS)算法将表情中的文字信息转化为音频信息。
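为便于理解上述语音处理模块中各子模块之间的数据流向,下面给出一个示意性的 Python 代码草图。该草图并非本申请的实际实现,其中的类名、函数名及返回值均为示例性假设,实际系统中各步骤分别由相应的语音增强、语音识别(ASR)与情感识别(SER)算法实现。

```python
# 示意性草图:语音处理模块的数据流(各函数体均为占位实现,仅示意接口与流向)
from dataclasses import dataclass

@dataclass
class VoiceResult:
    text: str         # 语音识别模块(ASR)得到的文字
    emotion: str      # 情感识别模块(SER)得到的情感色彩,如 "生气"、"开心"
    clean_pcm: bytes  # 语音增强模块输出的去噪后语音数据

def enhance(pcm: bytes) -> bytes:
    """语音增强:去噪、去混响、去回声(占位实现)。"""
    return pcm

def asr(pcm: bytes) -> str:
    """语音识别:将语音转换为文字(占位实现)。"""
    return "讨打啊"

def ser(pcm: bytes) -> str:
    """情感识别:从语音中提取说话时的情感色彩(占位实现)。"""
    return "生气"

def process_voice(raw_pcm: bytes) -> VoiceResult:
    clean = enhance(raw_pcm)
    return VoiceResult(text=asr(clean), emotion=ser(clean), clean_pcm=clean)
```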
图像处理模块用于处理用户输入的图像信息。图像处理模块可以包括图像编辑模块、图像编解码模块、图像渲染模块、光学字符识别(optical character recognition,OCR)模块和图像生成模块等。其中,图像编辑模块可为用户提供手动绘图功能;图像编解码模块用于对用户绘图的图像进行编码或解码;图像渲染模块用于对用户绘图的图像进行渲染;光学字符识别模块可以将图像中的文字从图像提取出来;图像生成模块可利用深度学习方式对用户绘图的图像进行丰富,生成相应的表情图片。
表情数据库模块用于存储图像表情数据集。
辅助处理模块包含表情推荐模块、文字嵌入和编辑模块,语音图像打包模块等。其中,表情推荐模块用于根据用户输入的语音文本信息和语音情感信息,以关键词匹配的方式从表情数据库模块获取相应的表情,以便推荐给用户使用;文字嵌入和编辑模块可实现将文字(语音识别得到的文字)嵌入到表情图像中,并可以为用户提供编辑文字格式的功能。语音图像打包模块,用来将语音和图像打包成一个完整的语音表情文件,可以以视频文件的格式进行存储。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面以在聊天应用中发送语音表情的场景,示例性说明电子设备100的软件工作流程。
用户打开聊天应用后,可以输入语音,内核层可以根据用户的输入操作产生相应的输入事件(如语音输入事件),并向应用程序框架层上报该事件。应用程序框架层可以通过语音处理模块对用户输入的语音进行相应的处理。而后,应用程序框架层中的辅助处理模块可以通过表情推荐模块确定一个或多个与用户输入的语音相匹配的图像表情,并通过显示驱动显示在显示界面。内核层接收到用户选择一个表情的操作后,可以向应用程序框架层上报表情选择事件,应用程序框架层中的辅助处理模块可以通过语音图像打包模块将语音和图像打包成一个完整的语音表情文件,而后可以通过电子设备的通信模块(例如,路由模块)发送给对端。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。其中,在本申请的描述中,除非另有说明,“至少一个”是指一个或多个,“多个”是指两个或多于两个。另外,为了便于清楚描述本申请实施例的技术方案,在本申请的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。
为了便于理解,以下结合附图对本申请实施例提供的表情制作方法进行具体介绍。
如图4所示,本申请实施例提供一种表情制作方法,以电子设备为手机,应用场景为聊天场景为例进行说明,包括:
401、显示第一界面,第一界面包含语音输入按钮,响应于用户触发语音输入按钮的操作,接收用户输入的语音。
其中,第一界面可以是与目标联系人的对话界面。或者,第一界面可以是内容分享界面,例如,心情发表界面、说说发表界面,或者第一界面可以是评论界面,例如博客评论界面,论坛评论界面或者朋友圈评论界面。
以聊天应用为例,例如手机可以显示用户(例如,Tom)与联系人Alice的对话界面。在一些实施例中,当用户希望发送语音表情时,如图5中的(a)所示,用户可以触发表情按钮301,响应于用户触发表情按钮301的操作,如图5中的(b)所示,手机显示表情菜单302,表情菜单302中可以包括语音表情制作按钮303。用户可以触发语音表情制作按钮303,响应于用户用于触发语音表情制作按钮303的操作,手机可以显示语音表情制作窗口304,该语音表情制作窗口304中包括语音输入按钮(录音按钮)305,在语音输入按钮305上,可以显示提示信息“按住说话”。语音表情制作窗口304中还可以包括提示信息306,该提示信息306用于提示用户如何制作语音表情。用户可以按住语音输入按钮305讲话(输入语音),例如,用户输入的语音可以为“讨打啊”。手机检测到用户按住语音输入按钮305的操作后,手机可以调用麦克风拾取相应的音频信号。当手机检测到用户讲话完毕后,例如,手机检测到用户放开语音输入按钮305后,或者检测到用户在预设时间间隔内未说话,手机认为用户讲话完毕。
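对于"检测到用户在预设时间间隔内未说话即认为讲话完毕"这一判断,可以用基于能量阈值的静音计时做一个简单示意。以下 Python 草图仅为一种假设性实现,帧长、能量阈值与静音时长等参数均为示例,并非本申请限定的取值。

```python
# 示意性草图:根据连续静音时长判断用户是否讲话完毕(参数均为示例性假设)
import struct

FRAME_MS = 20           # 每帧时长(毫秒)
SILENCE_RMS = 300.0     # 低于该能量视为静音(示例阈值)
END_SILENCE_MS = 1500   # 连续静音超过该时长认为讲话完毕(示例的预设时间间隔)

def rms(frame: bytes) -> float:
    """计算一帧 16bit 小端 PCM 的均方根能量。"""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5

def speech_finished(frames: list) -> bool:
    """frames 为按时间顺序排列的 PCM 帧,返回是否已检测到讲话完毕。"""
    silent_ms = 0
    for frame in frames:
        if rms(frame) < SILENCE_RMS:
            silent_ms += FRAME_MS
            if silent_ms >= END_SILENCE_MS:
                return True
        else:
            silent_ms = 0
    return False
```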
手机检测到用户按住语音输入按钮305的操作后,手机可以通过自动语音识别算法,将用户输入的语音转换为相应的文字,并将语音对应的文字307显示在显示界面上。如图5中的(c)所示,用户输入的语音对应的文字307可以为“讨打啊”。可选的,手机还可以将用户输入的语音的频率308显示在显示界面上。当然,手机还可以对用户输入的带噪语音进行语音增强处理,例如进行去噪、去混响和去回声处理等以得到干净语音,具体过程可以参考现有技术,本申请不做赘述。
在一些实施例中,当用户希望发送语音表情时,如图6中的(a)所示,用户可以触发语音按钮309,当手机检测到用户触发语音按钮309的操作后,如图6中的(b)所示,手机可以显示语音输入窗口310。语音输入窗口310中可以包括语音输入按钮(录音按钮)305,在语音输入按钮305上,可以显示提示信息“按住说话”。用户可以按住语音输入按钮305讲话(输入语音)。手机检测到用户按住语音输入按钮305的操作后,可以调用麦克风拾取相应的音频信号。当手机检测到用户讲话完毕后,例如,手机检测到用户放开语音输入按钮305后,或者检测到用户在预设时间间隔内未说话,手机认为用户讲话完毕。
手机检测到用户讲话完毕后,可以显示第一提示信息,第一提示信息用于提示用户是否需要根据语音推荐图像表情,若手机接收到用户触发的根据语音推荐图像表情的操作,手机可以执行步骤402。
示例性的,如图6中的(c)所示,手机可以弹出弹框311,该弹框311中包括发送语音、发送文字和根据语音推荐表情等选项,响应于用户选中根据语音推荐表情的选项,手机可以执行步骤402。
402、对用户输入的语音进行预设方式识别,预设方式识别至少包括内容识别,若语音包括目标关键词,向用户推荐第一图像表情集合。
其中,第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系。
需要说明的是,手机可以在本地的图像表情数据库预先存储图像表情数据集,该图像表情数据集中包括多个图像表情和每个图像表情对应的表情标签,每个图像表情可以包括多个表情标签,例如第一表情标签、第二表情标签、第三表情标签等等。其中,第一表情标签用于标识图像表情的关键特征信息,该关键特征信息用于描述图像表情的主旨或主题内容。可以认为第一表情标签是多个表情标签中优先级最高的表情标签。第一表情标签例如可以是“打”、“哈哈”、“丧”等等。
对于关键特征信息相同或相近的图像表情,其第一表情标签相同。示例性的,如表1所示,第一表情标签相同的图像表情可以组成一个表情集合,第一表情集合可以是表情集合1或表情集合2等。
表1
(表1以图片形式给出,示出按第一表情标签(如“打”)划分的表情集合1、表情集合2等及其所含的图像表情。)
手机可以通过语音识别模块对用户输入的语音进行内容识别,确定该语音中是否包含目标关键词。目标关键词例如可以是“打”、“哈哈”、“丧”等等。例如,若用户输入的语音为“你小子真是讨打啊”,那么手机可以识别到关键词“打”。若用户输入的语音为“最后一分钟赶上车,哈哈”,那么手机可以识别到关键词“哈哈”。若识别到语音包括目标关键词,手机向用户推荐第一图像表情集合。第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系。例如,若目标关键词为:“打”,那么第一表情集合为第一表情标签为“打”的表情集合,例如第一表情集合可以为表1中的表情集合1。
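上述"以关键词匹配方式从图像表情数据集中确定第一图像表情集合"的过程,可以用如下 Python 草图示意。其中的表情数据、文件名与标签均为示例性假设,实际的表情数据集及匹配策略以本申请的描述为准。

```python
# 示意性草图:根据语音文字中的目标关键词推荐第一图像表情集合(数据均为示例)
EMOJI_DB = {
    # 第一表情标签 -> 对应的表情集合(此处以文件名代替真实图像)
    "打":  ["hit_1.gif", "hit_2.png", "hit_3.png"],
    "哈哈": ["laugh_1.gif", "laugh_2.png"],
    "丧":  ["sad_1.png"],
}

def recommend_by_keyword(speech_text: str):
    """在语音对应的文字中查找目标关键词,返回 (目标关键词, 第一图像表情集合)。"""
    for keyword, expressions in EMOJI_DB.items():
        if keyword in speech_text:
            return keyword, expressions
    return None, []   # 未命中任何第一表情标签,可提示用户重新输入语音

print(recommend_by_keyword("你小子真是讨打啊"))  # ('打', ['hit_1.gif', ...])
```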
或者,手机也可以将用户输入的语音发送至服务器,服务器上预先存储有上述图像表情数据集。服务器可以对用户输入的语音进行内容识别,若服务器确定语音包括目标关键词,可以向手机发送第一图像表情集合,手机接收服务器发送的第一图像表情集合并向用户推荐该第一图像表情集合,即将第一图像表情集合显示在显示界面。
示例性的,如图7a所示,若用户输入的语音为“讨打啊”,可以确定该语音中包括目标关键词“打”,手机可以将第一表情标签为“打”的第一表情集合中的图像表情显示在显示界面供用户进行选择。可选的,手机在显示推荐的至少一个图像表情时,还可以在显示界面显示用户输入的语音对应的频率和文字402,防止用户在选择图像表情时忘记之前输入的语音内容,以提示用户选择与语音的文字更加贴近的图像表情,使用户制作的语音表情更加形象生动。
另外,若确定用户输入的语音中不包括与表情数据集中任何一个图像表情的第一表情标签匹配的目标关键词,那么可以不推荐图像表情,并可以提示用户重新输入语音。
在一些实施例中,预设方式识别还包括情感识别,即还可以对用户输入的语音进行情感识别,若语音属于目标情感色彩(情感基调/情感指向),第一图像表情集合中的每个图像表情的第二表情标签与目标情感色彩具有匹配关系。第二表情标签用于标识图像表情的情感色彩。可以认为第二表情标签是多个表情标签中优先级为第二的表情标签。第二表情标签例如可以是“开心”、“愤怒”、“难过”等等。示例性的,如表2所示,每个表情集合(第一表情标签相同的集合)中可以包括一个或多个表情子集,表情子集中的每个图像表情的第二表情标签相同。第一表情集合可以是表情集合1的表情子集1或表情子集2等。
表2
(表2以图片形式给出,示出每个表情集合按第二表情标签(如“生气”)进一步划分的表情子集1、表情子集2等及其所含的图像表情。)
例如,若手机确定用户输入的语音的音调高,频率快,可以认为用户的情感色彩为:“生气”,手机可以从第一表情集合中确定出第二表情标签为“生气”这一情感色彩的表情子集,并显示在显示界面以供用户选择。这样,可以更准确地为用户推荐表情,提高用户体验。
另外,如果用户输入的语音没有明显的情感色彩,例如,语气很平缓,没有明显起伏,那么可以直接仅根据目标关键词向用户推荐表情集合。
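在内容识别的基础上再叠加情感识别,相当于在按第一表情标签选出的表情集合内部,再按第二表情标签做一次过滤。下面的 Python 草图延续上例示意这一两级匹配;表情条目与标签同样为示例性假设,并非实际数据。

```python
# 示意性草图:先按目标关键词、再按目标情感色彩筛选表情(数据均为示例)
EMOJI_DB2 = {
    "打": {                                             # 第一表情标签
        "生气": ["hit_angry_1.gif", "hit_angry_2.png"],  # 第二表情标签:生气
        "搞笑": ["hit_funny_1.gif"],                     # 第二表情标签:搞笑
    },
}

def recommend(keyword: str, emotion=None):
    subsets = EMOJI_DB2.get(keyword, {})
    if emotion and emotion in subsets:   # 语音具有明显情感色彩:返回匹配的表情子集
        return subsets[emotion]
    merged = []                          # 情感不明显:仅按关键词返回整个表情集合
    for items in subsets.values():
        merged.extend(items)
    return merged

print(recommend("打", "生气"))   # ['hit_angry_1.gif', 'hit_angry_2.png']
```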
其中,手机向用户推荐的图像表情可以包括emoji(表情符号)和sticker(表情贴图),sticker可以包括静态的表情图片,还可以包括动态的表情图片(例如gif动图)。当然,手机向用户推荐的图像表情也可以是本地的或者下载的静态图片或动态图片,本申请不做限定。
在另一些实施例中,手机也可以对用户输入的语音先进行情感识别,并将识别出的情感色彩与图像表情的第二表情标签进行匹配,以确定出第二图像表情集合,第二图像表情集合中的每个图像表情的第二表情标签与用户输入的语音的情感色彩相匹配。此时,可以认为第二表情标签是图像表情的多个表情标签中优先级最高的表情标签。而后,可以向用户推荐第二图像表情集合。
进一步的,可以再对用户输入的语音进行关键词识别,并确定该语音中是否包括目标关键词,以便从第二图像表情集合中确定出与目标关键词匹配的图像表情子集,该表情子集中的每个图像表情的第一表情标签与目标关键词相匹配。此时,可以认为第一表情标签是图像表情的多个表情标签中优先级第二的表情标签。而后,可以向用户推荐该图像表情子集。
403、响应于用户从第一图像表情集合中选择一个图像表情的操作,根据语音或语音对应的语义以及用户选择的图像表情得到目标表情。
在一些实施例中,响应于用户从第一图像表情集合中选择一个图像表情的操作,可以显示第二提示信息,第二提示信息用于提示用户是否制作语音表情或文字表情。在一些实施例中,若第一图像表情集合中仅包括一个图像表情,即手机仅向用户推荐了一个图像表情,手机可以直接提示用户是否采用该图像表情制作语音表情或文字表情。
示例性的,如图7b所示,响应于用户从第一图像表情集合中选择一个图像表情的操作,手机可以显示弹框405,弹框405中可以包括语音表情选项和文字表情选项。响应于用户触发的制作语音表情的操作,根据语音与用户选择的图像表情得到目标表情;响应于用户触发的制作文字表情的操作,根据语音对应的语义以及用户选择的图像表情以得到目标表情。
手机根据语音与用户选择的图像表情得到目标表情的过程具体可以为:对用户输入的语音进行编码压缩,并在用户选择的图像表情的预设位置添加预设标识,预设标识用于指示目标表情为语音表情,例如可以在图像表情的空白处添加一个小喇叭标志,用于提示用户语音的存在;而后,将编码压缩后的语音和添加预设标识后的图像表情加载为视频格式以得到目标表情。用户可以选择存储或发送该目标表情。示例性的,如图8a所示,手机发送目标表情后,可以将目标表情403显示在聊天界面,该目标表情403上可以包括一个语音图标,以提示该语音表情中携带语音。用户可以触发该目标表情播放语音。例如,用户可以点击(单击或双击等)该目标表情的语音图标以播放语音。如图8b所示,为一种生成目标表情的流程图,该目标表情可以携带有用户输入的语音,具体过程可以参考上文相关描述,在此不做赘述。
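"将编码压缩后的语音与添加预设标识后的图像表情加载为一个文件"在工程上可借助现成的多媒体封装工具完成,具体封装格式本申请并未限定。下面给出一个最简化的 Python 草图,用一个自定义的打包结构示意"图像 + 压缩语音 + 语音表情标识"被合成为单个目标表情文件并在接收侧取出语音的过程;该结构仅为说明用的假设,并非实际采用的视频容器格式。

```python
# 示意性草图:把语音数据与带标识的图像打包成单个目标表情文件(自定义结构,仅为示例)
import base64
import json
import zlib

def pack_voice_meme(image_bytes: bytes, voice_bytes: bytes, out_path: str):
    payload = {
        "type": "voice_meme",                              # 预设标识:指示该表情为语音表情
        "image": base64.b64encode(image_bytes).decode(),
        "voice": base64.b64encode(zlib.compress(voice_bytes)).decode(),  # 编码压缩后的语音
    }
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)

def play_voice(meme_path: str) -> bytes:
    """接收方触发目标表情(例如点击小喇叭标志)时,取出并解压其携带的语音。"""
    with open(meme_path, encoding="utf-8") as f:
        payload = json.load(f)
    return zlib.decompress(base64.b64decode(payload["voice"]))
```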
手机根据语音对应的语义以及用户选择的图像表情得到目标表情的过程具体可以为:将语音对应的全部文字或目标关键词转换为像素信息;将像素信息载入用户选择的图像表情的预设区域或空白区域。其中,预设区域可以是图像表情的下方、上方、左方、右方等边缘区域。若需要将文字载入空白区域,手机可以先识别图像表情的空白区域,再根据空白区域的大小对文字的尺寸进行适配并将文字嵌入到空白区域中。
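"将文字转换为像素信息并载入图像表情的预设区域"可以用图像库中的文字绘制接口示意。以下 Python 草图假设使用 Pillow 库,把文字绘制在图像下方的预设区域;字体、位置与颜色均为示例参数,并非本申请限定的实现方式。

```python
# 示意性草图:把语音对应的文字嵌入图像表情下方的预设区域(基于 Pillow,参数均为示例)
from PIL import Image, ImageDraw, ImageFont

def embed_text(meme_path: str, text: str, out_path: str):
    img = Image.open(meme_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()               # 实际可按空白区域大小适配字号
    w, h = img.size
    draw.rectangle([0, h - 32, w, h], fill="white")        # 预设区域:图像下方的边缘区域
    draw.text((8, h - 28), text, fill="black", font=font)  # 文字转换为像素信息并载入该区域
    img.save(out_path)

# 用法示例(文件名为假设):把"讨打啊"嵌入所选图像表情的下方
embed_text("hit_angry_1.png", "讨打啊", "hit_angry_1_text.png")
```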
可选的,手机将语音对应的全部文字或目标关键词嵌入用户选择的图像表情之前,可以识别用户选择的图像表情中是否包括文字,若用户选择的图像表情中不包括文字或者用户选择的图像表情中包括的文字与用户输入的语音所对应的文字不同,手机将语音对应的全部文字或目标关键词嵌入用户选择的图像表情以得到目标表情。
手机可以自动将用户输入的语音对应的文字嵌入用户选择的图像表情中,或者手机可以提供一个用于嵌入文字的按钮,以便用户手动将语音对应的文字嵌入用户选择的图像表情中。而后,手机可以显示预览界面,预览界面包括目标表情,目标表情的预设位置包括语音对应的文字。
示例性的,如图7a所示,用户选中表情401后,响应于用户选中表情401的操作,如图9a中的(a)所示,手机可以显示目标表情的预览界面,手机还可以在预览界面显示一个用于嵌入文字的按钮501,响应于用户触发按钮501的操作,手机可以将语音对应的文字嵌入目标表情的预设区域,例如,嵌入目标表情的下方。
可选的,手机可以根据用户选择的图像表情的背景颜色自动设置文字的颜色,以使文字更加醒目或更加贴近表情的背景颜色。或者,用户可以设置嵌入图像表情中的文字的字体、颜色等特征。例如,用户通过长按嵌入文字可以对文字进行字体、大小、粗体、斜体、艺术字、颜色、下划线等编辑处理,以更好地适配用户的需求,提高用户体验。
示例性的,如图9a中的(b)所示,用户可以点击表情上的文字502,如图9a中的(c)所示,手机可以弹出弹框503,在弹框503中,可以包括针对文字502的多种预设的文字格式,用户可以根据自己的喜好选择一种文字格式,手机可以根据用户选择的文字格式修改嵌入语音表情上的文字格式。可选的,用户还可以为文字设置各种字体、文字框或动画效果,本申请不做限定。
可选的,在文字嵌入过程中,用户同时还可以控制文字嵌入到表情包图片中的位置和旋转角度;如果表情包是gif动图,还可以对文字的动图特效(动画效果)进行编辑和处理,以匹配用户的行为习惯和喜好等。
在一些实施例中,手机还可以将语音和语音对应的文字同时嵌入用户选择的图像表情以得到目标表情,具体过程参考上文的相关描述,在此不做赘述。如图9b所示,为一种生成目标表情的流程图,该目标表情包括用户输入的语音以及用户输入的语音对应的文字信息(全部文字或关键词)。具体过程可以参考上文相关描述,在此不做赘述。
若目标表情为语音表情,即其预设位置包括预设标识(例如,小喇叭标识),手机接收用户用于触发目标表情的操作后,可以播放目标表情携带的语音。例如,用户可以点击(单击或双击等)该目标表情的预设标识以播放语音。
在一种可能的设计中,可以进一步对目标表情的语音进行预设音效处理,预设音效处理可以包括男声化处理(例如,大叔音效)、女声化处理(例如,萝莉音效、女神音效)、卡通化处理、方言化处理、搞怪化处理(例如,汪星人音效、喵星人音效)、明星化处理或情绪化处理中的至少一种。如图9c所示,为一种生成目标表情的流程图,该目标表情可以携带有用户输入的语音,该语音可以具有方言化音效(例如,河南话音效)。
示例性的,如图10所示,用户可以点击语音表情403上的语音图标,响应于用户点击语音图标的操作,手机可以显示弹框504。在弹框504中,可以包括多种预设音效,用户可以根据自己的喜好选择一种音效,手机可以基于用户选择的音效对语音表情403的语音进行相应处理,以更好地适配用户的需求,提高用户体验。
在一些实施例中,手机可以根据用户选择的图像表情的第三表情标签对语音进行预设音效处理,第三表情标签用于指示用户选择的图像表情的类型。
若用户选择的图像表情的第三表情标签为预设人物类型,可以根据预设人物类型的声音特征对用户输入的语音进行处理。例如,若第三表情标签为相声演员XX,手机可以根据该相声演员XX的音色、音高、音速、音量、语调或声纹等声音特征修改用户输入的语音。若用户选择的图像表情的第三表情标签为预设动物类型,对语音的音色进行搞怪处理或卡通化处理。例如,若用户选择的图像表情的第三表情标签为猫,可以对用户输入的语音添加喵星人音效。这样一来,通过对用户输入的语言进行个性化音效处理,可以使得目标表情的表达更丰富,更有趣味性。
可以理解的是,手机可以预先存储每个图像表情的第三表情标签对应的音效,例如,若第三表情标签为相声演员XX,手机可以存储该相声演员XX的音色、音高、音速、音量、语调或声纹等特征。若第三表情标签为猫或狗等动物,手机可以存储相应的喵星人音效或汪星人音效。
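按第三表情标签选择预设音效的逻辑,本质上是"表情类型 → 音效处理"的映射。以下 Python 草图仅示意该映射关系,其中的标签集合为示例,音效处理本身以占位函数表示,并非实际的变声或音效算法。

```python
# 示意性草图:根据第三表情标签选择预设音效处理(处理函数均为占位实现)
def star_voice(pcm: bytes, star: str) -> bytes:
    """按预设人物(如某相声演员)的音色、音高、声纹等声音特征处理语音(占位)。"""
    return pcm

def cartoon_voice(pcm: bytes) -> bytes:
    """对语音进行搞怪处理或卡通化处理,如喵星人、汪星人音效(占位)。"""
    return pcm

def apply_effect(pcm: bytes, third_tag: str) -> bytes:
    person_tags = {"相声演员XX"}    # 预设人物类型(示例)
    animal_tags = {"猫", "狗"}      # 预设动物类型(示例)
    if third_tag in person_tags:
        return star_voice(pcm, third_tag)
    if third_tag in animal_tags:
        return cartoon_voice(pcm)
    return pcm                      # 其他类型:不做预设音效处理
```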
在一种可能的设计中,手机接收到用户选择一个图像表情的操作后,还可以接收用户选择图片的操作,并可以将图片中的目标区域加载到用户选择的图像表情的预设位置。例如,如图9a中的(a)所示,可以在嵌入文字按钮501旁边,设置一个嵌入图片按钮(图9a中的(a)未示出),响应于用户选中嵌入图片按钮的操作,手机可以调用系统相册供用户选择图片,用户选中图片后,手机可以将用户选中的图片中的目标区域覆盖在用户选择的图像表情上得到自定义表情。其中,目标区域可以是包含人脸的区域,用户选中一张包含人脸的图片后,手机可以将用户选中的图片中的人脸区域覆盖在用户选择的图像表情的预设位置以得到自定义表情。手机可以自动识别用户选中的图片的人脸区域,或者,用户可以手动选择人脸区域,本申请不做限定。这样,用户可以在手机推荐的表情的基础上,选择一个表情进行个性化设计,并根据该个性化设计得到的自定义表情生成目标表情(语音表情或文字表情),更具有娱乐性,可以提高用户体验。
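"将图片中的人脸等目标区域覆盖到图像表情的预设位置"可以用图像库的裁剪与粘贴接口示意。以下 Python 草图假设使用 Pillow 库,人脸框可由人脸识别自动得到或由用户手动框选;其中的尺寸与坐标均为示例。

```python
# 示意性草图:把用户所选图片中的人脸区域覆盖到图像表情的预设位置(基于 Pillow,坐标为示例)
from PIL import Image

def overlay_face(meme_path: str, photo_path: str, face_box, out_path: str):
    """face_box 为 (left, top, right, bottom),即图片中目标区域(人脸)的位置。"""
    meme = Image.open(meme_path).convert("RGBA")
    face = Image.open(photo_path).convert("RGBA").crop(face_box)
    face = face.resize((96, 96))                       # 适配预设位置的大小(示例尺寸)
    meme.paste(face, (meme.width - 100, 4), face)      # 预设位置:右上角(示例坐标)
    meme.save(out_path)
```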
在一种可能的设计中,手机接收到用户选择一个图像表情的操作后,还可以接收用户对该图像表情的涂鸦或贴“贴纸”(例如在原图像表情上增加爱心、星星、气球等贴纸图案)等操作,根据涂鸦后或贴“贴纸”后的图像表情生成目标表情。这样,用户可以在手机推荐的表情的基础上,选择一个表情进行个性化设计,并根据该个性化设计得到的自定义表情生成目标表情(语音表情或文字表情),更具有娱乐性,可以提高用户体验。
进一步的,手机可以将生成的目标表情保存起来,生成语音表情发送记录,当用户下一次需要发送语音表情时,手机可以将用户之前发送过的语音表情显示出来,以便用户可以直接选择一个语音表情进行发送,更加方便快捷。
在一些实施例中,若用户认为手机推荐的图像表情不符合心理预期或不贴合当前场景,而未从至少一个图像表情中选择一个图像表情,用户可以触发自定义表情模式,响应于用户用于触发自定义表情模式的第三操作,显示画板界面,接收用户在画板界面上输入的涂鸦操作,根据涂鸦操作的运动轨迹生成简笔画。
示例性的,如图7a所示,手机在显示推荐的至少一个图像表情时,可以显示自定义控件404。若用户不满意手机推荐的图像表情,用户可以点击控件404以触发自定义表情模式。响应于用户点击控件404的操作(第三操作),如图11a所示,手机可以显示画板界面1101。用户可以触发画笔控件1102在画板界面进行涂鸦,手机接收用户在画板界面上输入的涂鸦操作,根据涂鸦操作的运动轨迹生成简笔画。而后,手机可以向用户推荐与简笔画的轮廓的相似度大于预设阈值的图像表情,响应于用户选中该相似度大于预设阈值的图像表情的操作,可以将用户输入的语音与用户选择的图像表情打包以得到目标表情;或者,将语音对应的全部文字或目标关键词嵌入用户选择的图像表情以得到目标表情。示例性的,如图11a所示,手机可以向用户推荐与简笔画相似的图像表情1103,若用户选择该图像表情1103,手机可以根据该图像表情1103生成目标表情。
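"向用户推荐与简笔画轮廓相似度大于预设阈值的图像表情"可借助轮廓匹配算法实现。以下 Python 草图假设使用 OpenCV 的形状匹配接口(匹配距离越小代表越相似),其中的二值化参数与阈值取值仅为示例,并非本申请限定的相似度度量。

```python
# 示意性草图:用 OpenCV 轮廓匹配为简笔画推荐相似的图像表情(阈值等参数均为示例)
import cv2

def main_contour(path: str):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)        # 取面积最大的轮廓作为主轮廓

def recommend_by_sketch(sketch_path: str, candidate_paths, max_distance=0.2):
    """返回与简笔画轮廓匹配距离不超过阈值的候选表情(距离小于阈值≈相似度大于预设阈值)。"""
    sketch = main_contour(sketch_path)
    result = []
    for path in candidate_paths:
        d = cv2.matchShapes(sketch, main_contour(path), cv2.CONTOURS_MATCH_I1, 0.0)
        if d <= max_distance:
            result.append((d, path))
    return [p for _, p in sorted(result)]
```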
如图11b所示,为一种生成目标表情的流程图,该目标表情携带有用户输入的语音,并且该目标表情的图像信息可以是根据用户画的简笔画推荐得到的。具体过程可以参考上文相关描述,在此不做赘述。
另外,手机也可以基于生成对抗网络(generative adversarial networks,GAN)对该简笔画进行渲染得到一个个性化表情,根据该个性化表情生成目标表情。或者,手机可以直接根据该简笔画生成语音表情,本申请不做限定。
在一些实施例中,若用户认为手机推荐的图像表情不符合心理预期或不贴合当前场景,而未从至少一个图像表情中选择一个图像表情,用户也可以从本地存储的图像表情中选择一个图像表情。
404、发送目标表情。
手机可以向目标联系人对应的电子设备发送目标表情;或者,可以向提供内容分享界面或评论界面的应用程序对应的服务器上传目标表情。
具体的,手机检测到用户触发发送目标表情的操作后,例如,手机检测到用户对语音表情的上划或长按操作后,可以发送目标表情,例如可以将目标表情转发给相应的联系人(例如,Alice)。Alice使用的电子设备可以接收该目标表情,并将该目标表情显示在聊天界面。若第二电子设备接收到Alice触发目标表情的语音标识的操作,第二电子设备可以调用扬声器播放表情携带的语音,这样用户可以听到语音表情中的语音。其中,触发表情的语音标识的操作可以包括点击操作、滑动操作、长按操作等。其中,点击操作可以包括单击操作、连击操作等,本申请不做限定。
另外,上述实施例中均以聊天场景举例说明发送表情的具体方法,可以理解的是,上述方法也可以应用在发表博客、发表心情、回复或评论等社交场景中。例如,用户在发表博客时,可以在发表文字时,插入语音表情,即博客内容包括文字和语音表情。当然,博客内容还可以包括图片、视频等,本申请实施例对此不做任何限制。
基于本申请实施例提供的方法,手机接收用户输入的语音后,可以对语音进行内容识别,若语音包括目标关键词,手机向用户推荐第一图像表情集合,第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系。这种根据用户输入的语音,自动推荐图像表情的方式可以简化用户操作,无需用户从海量图像表情中选择,使用户的操作更加便捷。而后,响应于用户从第一图像表情集合中选择一个图像表情的操作,可以根据语音与用户选择的图像表情得到目标表情。语音作为信息的另外一种载体,可以传递丰富的内容,带来更强的娱乐性,从而丰富了表情的形式和内容。目标表情中同时包含语音信息和图像信息,使信息的传递更加自然,情感的流露更加真实。或者,手机可以将语音对应的全部文字或目标关键词嵌入用户选择的图像表情以得到目标表情。这样,目标表情中同时包含图像信息和语音对应的文字信息,可以更加准确地传递和表达用户的意图,能够提高用户体验。而且,语音控制不需要手势操作,非常适合用户在开车时使用电子设备等场景。
在另一些实施例中,如图12所示,提供一种发送表情的方法,以电子设备为手机为例进行说明,包括:
1201、显示第二界面,第二界面包含图像表情选择按钮,响应于用户用于触发图像表情选择按钮的操作,显示至少一个图像表情。
其中,第二界面可以是与目标联系人的对话界面;或者,第二界面可以是内容分享界面;或者,第二界面可以是评论界面。
1202、接收用户从至少一个图像表情中选中一个图像表情的操作。
示例性的,用户可以从手机提供的sticker表情或本地存储的图像表情中选择一个图像表情。
1203、显示提示信息,提示信息用于提示用户是否需要制作语音表情。
示例性的,如图13中的(a)所示,响应于用户选中携带文字的图像表情701的操作,手机可以显示提示信息。例如,如图13中的(b)所示,该提示信息可以是一个提示框702,该提示框702中可以包括提示文字:“是否需要制作语音表情”,该提示框中还可以包括“是”和“否”的按钮,若用户点击“是”的按钮,手机确定用户需要制作语音表情,手机执行步骤1204。若用户点击“否”的按钮,手机可以将用户选中的表情直接发送给对端联系人。
1204、响应于用户确定制作语音表情的操作,根据用户选中的图像表情上的文字或用户输入的文字生成语音,根据语音与用户选中的图像表情得到语音表情。
手机可以通过光学字符识别技术识别和提取图像表情上的文字,根据识别得到的文字生成语音,再根据语音与用户选中的图像表情得到语音表情。示例性的,如图14所示,手机从图像表情中提取的文字可以是“讨打啊”,根据该文字可以生成相应的语音。
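步骤1204中"识别图像表情上的文字并据此生成语音"的流程可示意如下。该 Python 草图中光学字符识别(OCR)与文本转语音(TTS)均以占位函数表示,实际可分别由光学字符识别模块与文本转语音模块实现;函数名、返回值与音效名称均为示例性假设。

```python
# 示意性草图:从带文字的图像表情生成语音表情所需的语音(OCR 与 TTS 均为占位实现)
def ocr(image_bytes: bytes) -> str:
    """光学字符识别:提取图像表情上的文字(占位实现)。"""
    return "讨打啊"

def tts(text: str, effect: str = "方言化") -> bytes:
    """文本转语音,并可叠加预设音效,如方言化音效(占位实现)。"""
    return b"\x00" * 16000

def make_voice_meme(image_bytes: bytes, user_text: str = ""):
    """返回 (图像数据, 语音数据);后续可按前文所述方式打包为一个语音表情文件。"""
    text = user_text or ocr(image_bytes)   # 优先使用用户输入的文本,否则识别图上文字
    return image_bytes, tts(text)
```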
或者,手机可以接收用户输入的文本,例如可以接收用户通过软键盘输入的文字或复制粘贴的文字,根据该文字生成语音,再根据语音与用户选中的图像表情得到语音表情。
可选的,手机还可以对生成的语音进行预设音效处理或个性化处理,具体描述可以参考步骤403,在此不做赘述。而后,手机可以根据预设音效处理或个性化处理后的语音和图像表情生成语音表情。
如图15所示,为一种基于用户选择的携带文字的图像表情生成语音表情的流程图,该语音表情携带有根据文字生成的相应语音,该语音可以具有方言化音效(例如,河南话音效)。具体过程可以参考上文相关描述,在此不做赘述。
1205、发送语音表情。
具体描述参考步骤404,在此不做赘述。
基于本申请实施例提供的方法,手机接收用户选中图像表情的操作后,可以根据图像表情上的文字或用户输入的文本生成语音,并根据语音和图像表情得到语音表情,无需用户输入语音,简化了用户的操作步骤,能够便捷、智能地生成语音表情,丰富了表情的形式和内容,能够提高用户体验。
上述主要从电子设备的角度对本申请实施例提供的方案进行了介绍。可以理解的是,电子设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对电子设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,图16示出了上述实施例中涉及的电子设备16的一种可能的组成示意图,该电子设备16可以包括:显示单元1601、接收单元1602、识别单元1603、推荐单元1604和处理单元1605。在本申请实施例中,显示单元1601,用于显示第一界面,第一界面包含语音输入按钮;接收单元1602,用于响应于用户触发语音输入按钮的操作,接收用户输入的语音;识别单元1603,用于对语音进行预设方式识别,预设方式识别至少包括内容识别,推荐单元1604,用于若语音包括目标关键词,向用户推荐第一图像表情集合,第一图像表情集合中的每个图像表情的第一表情标签与目标关键词具有匹配关系;处理单元1605,用于响应于用户从第一图像表情集合中选择一个图像表情的操作,根据语音或语音对应的语义以及用户选择的图像表情得到目标表情。
在采用集成的单元的情况下,电子设备可以包括处理模块、存储模块和通信模块。其中,处理模块可以用于对电子设备的动作进行控制管理,例如,可以用于支持电子设备执行上述显示单元1601、接收单元1602、识别单元1603、推荐单元1604和处理单元1605执行的步骤。存储模块可以用于支持电子设备存储程序代码和数据等。通信模块,可以用于支持电子设备与其他设备的通信。
其中,处理模块可以是处理器或控制器。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理(digital signal processing,DSP)和微处理器的组合等等。存储模块可以是存储器。通信模块具体可以为射频电路、蓝牙芯片、Wi-Fi芯片等与其他电子设备交互的设备。
本实施例还提供一种计算机存储介质,该计算机存储介质中存储有计算机指令,当该计算机指令在电子设备上运行时,使得电子设备执行上述相关方法步骤实现上述实施例中的表情制作方法。
本实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中的表情制作方法。
另外,本申请的实施例还提供一种装置,这个装置具体可以是芯片,组件或模块,该装置可包括相连的处理器和存储器;其中,存储器用于存储计算机执行指令,当装置运行时,处理器可执行存储器存储的计算机执行指令,以使芯片执行上述各方法实施例中的表情制作方法。
其中,本实施例提供的电子设备、计算机存储介质、计算机程序产品或芯片均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
通过以上实施方式的描述,所属领域的技术人员可以了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。

Claims (20)

  1. 一种表情制作方法,其特征在于,应用于电子设备,包括:
    显示第一界面,所述第一界面包含语音输入按钮;
    响应于用户触发所述语音输入按钮的操作,接收所述用户输入的语音;
    对所述语音进行预设方式识别,所述预设方式识别至少包括内容识别,若所述语音包括目标关键词,向所述用户推荐第一图像表情集合,所述第一图像表情集合中的每个图像表情的第一表情标签与所述目标关键词具有匹配关系;
    响应于用户从所述第一图像表情集合中选择一个图像表情的操作,根据所述语音或所述语音对应的语义以及所述用户选择的图像表情得到目标表情。
  2. 根据权利要求1所述的表情制作方法,其特征在于,所述预设方式识别还包括情感识别;
    若所述语音属于目标情感色彩,所述第一图像表情集合中的每个图像表情的第二表情标签与所述目标情感色彩具有匹配关系。
  3. 根据权利要求1或2所述的表情制作方法,其特征在于,所述对所述语音进行预设方式识别之前,所述方法还包括:
    显示第一提示信息,所述第一提示信息用于提示所述用户是否需要根据所述语音推荐图像表情;
    接收所述用户触发的根据所述语音推荐图像表情的操作。
  4. 根据权利要求1-3任一项所述的表情制作方法,其特征在于,所述根据所述语音或所述语音对应的语义以及所述用户选择的图像表情得到目标表情之前,所述方法还包括:
    响应于用户从所述第一图像表情集合中选择一个图像表情的操作,显示第二提示信息,所述第二提示信息用于提示所述用户是否制作语音表情或文字表情。
  5. 根据权利要求1-4任一项所述的表情制作方法,其特征在于,所述显示第一界面包括:
    显示与目标联系人的对话界面;或者
    显示内容分享界面;或者
    显示评论界面。
  6. 根据权利要求5所述的表情制作方法,其特征在于,所述方法还包括:
    向所述目标联系人对应的电子设备发送所述目标表情;或者
    向提供所述内容分享界面或所述评论界面的应用程序对应的服务器上传所述目标表情。
  7. 根据权利要求1-6任一项所述的表情制作方法,其特征在于,所述根据所述语音与所述用户选择的图像表情得到目标表情包括:
    对所述语音编码压缩,并在所述用户选择的图像表情的预设位置添加预设标识,所述预设标识用于指示所述目标表情为语音表情;
    将编码压缩后的所述语音和添加预设标识后的所述图像表情加载为视频格式以得到所述目标表情。
  8. 根据权利要求1-7任一项所述的表情制作方法,其特征在于,所述根据所述语音对应的语义以及所述用户选择的图像表情以得到目标表情包括:
    将所述语音对应的全部文字或所述目标关键词转换为像素信息;
    将所述像素信息载入所述用户选择的图像表情的预设区域或空白区域。
  9. 根据权利要求1-8任一项所述的表情制作方法,其特征在于,所述方法还包括:
    显示预览界面,所述预览界面包括所述目标表情,所述目标表情的预设位置包括预设标识或所述语音对应的语义,所述预设标识用于指示所述目标表情为语音表情。
  10. 根据权利要求9所述的表情制作方法,其特征在于,若所述目标表情的预设位置包括预设标识,所述方法还包括:
    接收所述用户用于触发所述目标表情的操作,播放所述目标表情携带的语音。
  11. 根据权利要求1-10任一项所述的表情制作方法,其特征在于,所述方法还包括:
    对所述语音进行预设音效处理,所述预设音效处理包括男声化处理、女声化处理、卡通化处理、方言化处理、搞怪化处理或明星化处理中的至少一种。
  12. 根据权利要求11所述的表情制作方法,其特征在于,所述对所述语音进行预设音效处理包括:
    根据所述用户选择的图像表情的第三表情标签对所述语音进行预设音效处理,所述第三表情标签用于指示所述用户选择的图像表情的类型;
    若所述用户选择的图像表情的第三表情标签为预设人物类型,根据所述预设人物类型的声音特征对所述语音进行处理;
    若所述用户选择的图像表情的第三表情标签为预设动物类型,对所述语音进行搞怪处理或卡通化处理。
  13. 根据权利要求1-12任一项所述的表情制作方法,其特征在于,所述方法还包括:
    接收所述用户选择图片的操作,将所述图片中的目标区域加载到所述用户选择的图像表情或所述目标表情的预设位置。
  14. 根据权利要求1所述的表情制作方法,其特征在于,若所述用户未从所述第一图像表情集合中选择一个图像表情,所述方法还包括:
    响应于用户用于触发自定义表情模式的操作,显示画板界面;
    接收用户在所述画板界面上输入的涂鸦操作;
    根据所述涂鸦操作的运动轨迹生成简笔画;
    向所述用户推荐与所述简笔画的轮廓的相似度大于预设阈值的图像表情。
  15. 根据权利要求1所述的表情制作方法,其特征在于,若所述用户未从所述至少一个图像表情中选择一个图像表情,所述方法还包括:
    接收所述用户从本地存储的图像表情中选择一个图像表情的操作。
  16. 一种表情制作方法,其特征在于,应用于电子设备,包括:
    显示第二界面,所述第二界面包含图像表情选择按钮;
    响应于用户用于触发所述图像表情选择按钮的操作,显示至少一个图像表情;
    接收所述用户从所述至少一个图像表情中选中一个图像表情的操作;
    显示提示信息,所述提示信息用于提示用户是否需要制作语音表情;
    响应于所述用户确定制作语音表情的操作,根据所述用户选中的图像表情上的文字或所述用户输入的文本生成语音,根据所述语音与所述用户选中的图像表情得到语音表情。
  17. 一种表情制作装置,其特征在于,应用于电子设备,所述表情制作装置用于执行如权利要求1-16中任一项所述的表情制作方法。
  18. 一种芯片系统,其特征在于,所述芯片系统应用于电子设备;所述芯片系统包括一个或多个接口电路和一个或多个处理器;所述接口电路和所述处理器通过线路互联;所述接口电路用于从所述电子设备的存储器接收信号,并向所述处理器发送所述信号,所述信号包括所述存储器中存储的计算机指令;当所述处理器执行所述计算机指令时,所述电子设备执行如权利要求1-16中任一项所述的表情制作方法。
  19. 一种计算机可读存储介质,其特征在于,包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行如权利要求1-16中任一项所述的表情制作方法。
  20. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1-16中任一项所述的表情制作方法。
PCT/CN2020/135041 2019-12-10 2020-12-09 一种表情制作方法和装置 WO2021115351A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/836,212 US11941323B2 (en) 2019-12-10 2022-06-09 Meme creation method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911261292.3 2019-12-10
CN201911261292.3A CN113051427A (zh) 2019-12-10 2019-12-10 一种表情制作方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/836,212 Continuation US11941323B2 (en) 2019-12-10 2022-06-09 Meme creation method and apparatus

Publications (1)

Publication Number Publication Date
WO2021115351A1 true WO2021115351A1 (zh) 2021-06-17

Family

ID=76328853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135041 WO2021115351A1 (zh) 2019-12-10 2020-12-09 一种表情制作方法和装置

Country Status (3)

Country Link
US (1) US11941323B2 (zh)
CN (1) CN113051427A (zh)
WO (1) WO2021115351A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3955131A4 (en) * 2020-06-28 2022-10-19 Beijing Baidu Netcom Science Technology Co., Ltd. METHOD AND DEVICE FOR CREATING MEMS PACKAGING, ELECTRONIC DEVICE AND MEDIUM
CN114495988B (zh) * 2021-08-31 2023-04-18 荣耀终端有限公司 一种输入信息的情感处理方法及电子设备
CN115033318B (zh) * 2021-11-22 2023-04-14 荣耀终端有限公司 图像的文字识别方法、电子设备及存储介质
US20230229869A1 (en) * 2022-01-20 2023-07-20 Vmware, Inc. Local input method for remote desktops
CN114553810A (zh) * 2022-02-22 2022-05-27 广州博冠信息科技有限公司 表情图片合成方法及装置、电子设备
CN115348225B (zh) * 2022-06-06 2023-11-07 钉钉(中国)信息技术有限公司 表情信息处理方法、电子设备及存储介质
CN115102917A (zh) * 2022-06-28 2022-09-23 维沃移动通信有限公司 消息发送方法、消息处理方法及装置
CN115170239A (zh) * 2022-07-14 2022-10-11 艾象科技(深圳)股份有限公司 一种商品定制服务系统及商品定制服务方法
CN115460166A (zh) * 2022-09-06 2022-12-09 网易(杭州)网络有限公司 即时语音通信方法、装置、电子设备及存储介质
CN115212561B (zh) * 2022-09-19 2022-12-09 深圳市人马互动科技有限公司 基于玩家的语音游戏数据的服务处理方法及相关产品
CN117931333A (zh) * 2022-10-26 2024-04-26 华为技术有限公司 一种表盘界面显示方法及电子设备
CN117953919A (zh) * 2022-10-31 2024-04-30 腾讯科技(深圳)有限公司 数据处理方法、装置、设备、存储介质及计算机程序产品

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101072207A (zh) * 2007-06-22 2007-11-14 腾讯科技(深圳)有限公司 即时通讯工具中的交流方法及即时通讯工具
CN102541259A (zh) * 2011-12-26 2012-07-04 鸿富锦精密工业(深圳)有限公司 电子设备及其根据脸部表情提供心情服务的方法
CN106570106A (zh) * 2016-11-01 2017-04-19 北京百度网讯科技有限公司 一种输入过程中将语音信息转化为表情的方法和装置
CN106789581A (zh) * 2016-12-23 2017-05-31 广州酷狗计算机科技有限公司 即时通讯方法、装置及系统
CN107423277A (zh) * 2016-02-16 2017-12-01 中兴通讯股份有限公司 一种表情输入方法、装置及终端
CN107450746A (zh) * 2017-08-18 2017-12-08 联想(北京)有限公司 一种表情符号的插入方法、装置和电子设备

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6963839B1 (en) * 2000-11-03 2005-11-08 At&T Corp. System and method of controlling sound in a multi-media communication application
US8170872B2 (en) * 2007-12-04 2012-05-01 International Business Machines Corporation Incorporating user emotion in a chat transcript
US20130332168A1 (en) * 2012-06-08 2013-12-12 Samsung Electronics Co., Ltd. Voice activated search and control for applications
US9425974B2 (en) * 2012-08-15 2016-08-23 Imvu, Inc. System and method for increasing clarity and expressiveness in network communications
GB201401046D0 (en) * 2014-01-22 2014-03-05 Iedutainments Ltd Searching and content delivery system
US10146416B2 (en) * 2014-01-29 2018-12-04 Ingenious.Ventures, LLC Systems and methods for sensory interface
KR102337072B1 (ko) * 2014-09-12 2021-12-08 삼성전자 주식회사 이모티콘을 생성하는 방법 및 이를 지원하는 전자장치
KR20160085614A (ko) * 2015-01-08 2016-07-18 엘지전자 주식회사 이동단말기 및 그 제어방법
KR20160089152A (ko) * 2015-01-19 2016-07-27 주식회사 엔씨소프트 화행 분석을 통한 스티커 추천 방법 및 시스템
KR101634086B1 (ko) * 2015-01-19 2016-07-08 주식회사 엔씨소프트 감정 분석을 통한 스티커 추천 방법 및 시스템
KR101583181B1 (ko) * 2015-01-19 2016-01-06 주식회사 엔씨소프트 응답 스티커 추천방법 및 컴퓨터 프로그램
KR102306538B1 (ko) * 2015-01-20 2021-09-29 삼성전자주식회사 콘텐트 편집 장치 및 방법
US20160247500A1 (en) * 2015-02-22 2016-08-25 Rory Ryder Content delivery system
KR101620050B1 (ko) * 2015-03-03 2016-05-12 주식회사 카카오 인스턴트 메시지 서비스를 통한 시나리오 이모티콘 표시 방법 및 이를 위한 사용자 단말
KR102427302B1 (ko) * 2015-11-10 2022-08-01 삼성전자주식회사 통신 단말에서의 음성 통화 지원방법
US20170308289A1 (en) * 2016-04-20 2017-10-26 Google Inc. Iconographic symbol search within a graphical keyboard
CN105975563B (zh) * 2016-04-29 2019-10-11 腾讯科技(深圳)有限公司 表情推荐方法及装置
KR101780809B1 (ko) * 2016-05-09 2017-09-22 네이버 주식회사 이모티콘이 함께 제공되는 번역문 제공 방법, 사용자 단말, 서버 및 컴퓨터 프로그램
CN106372059B (zh) * 2016-08-30 2018-09-11 北京百度网讯科技有限公司 信息输入方法和装置
US11321890B2 (en) 2016-11-09 2022-05-03 Microsoft Technology Licensing, Llc User interface for generating expressive content
CN106531149B (zh) * 2016-12-07 2018-02-23 腾讯科技(深圳)有限公司 信息处理方法及装置
CN107369196B (zh) * 2017-06-30 2021-08-24 Oppo广东移动通信有限公司 表情包制作方法、装置、存储介质及电子设备
US11121991B2 (en) * 2017-07-03 2021-09-14 Mycelebs Co., Ltd. User terminal and search server providing a search service using emoticons and operating method thereof
CN108320316B (zh) * 2018-02-11 2022-03-04 秦皇岛中科鸿合信息科技有限公司 个性化表情包制作系统及方法
CN109165072A (zh) * 2018-08-28 2019-01-08 珠海格力电器股份有限公司 一种表情包生成方法及装置
US11093712B2 (en) * 2018-11-21 2021-08-17 International Business Machines Corporation User interfaces for word processors
CN109524027B (zh) * 2018-12-11 2024-05-28 平安科技(深圳)有限公司 语音处理方法、装置、计算机设备及存储介质
KR102657519B1 (ko) * 2019-02-08 2024-04-15 삼성전자주식회사 음성을 기반으로 그래픽 데이터를 제공하는 전자 장치 및 그의 동작 방법
CN110297928A (zh) * 2019-07-02 2019-10-01 百度在线网络技术(北京)有限公司 表情图片的推荐方法、装置、设备和存储介质
US11722749B2 (en) * 2019-07-31 2023-08-08 Rovi Guides, Inc. Systems and methods for providing content relevant to a quotation
CN110609723B (zh) * 2019-08-21 2021-08-24 维沃移动通信有限公司 一种显示控制方法及终端设备
US11758231B2 (en) * 2019-09-19 2023-09-12 Michael J. Laverty System and method of real-time access to rules-related content in a training and support system for sports officiating within a mobile computing environment
US11133025B2 (en) * 2019-11-07 2021-09-28 Sling Media Pvt Ltd Method and system for speech emotion recognition
US20210334068A1 (en) * 2020-04-25 2021-10-28 Qualcomm Incorporated Selectable options based on audio content

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101072207A (zh) * 2007-06-22 2007-11-14 腾讯科技(深圳)有限公司 即时通讯工具中的交流方法及即时通讯工具
CN102541259A (zh) * 2011-12-26 2012-07-04 鸿富锦精密工业(深圳)有限公司 电子设备及其根据脸部表情提供心情服务的方法
CN107423277A (zh) * 2016-02-16 2017-12-01 中兴通讯股份有限公司 一种表情输入方法、装置及终端
CN106570106A (zh) * 2016-11-01 2017-04-19 北京百度网讯科技有限公司 一种输入过程中将语音信息转化为表情的方法和装置
CN106789581A (zh) * 2016-12-23 2017-05-31 广州酷狗计算机科技有限公司 即时通讯方法、装置及系统
CN107450746A (zh) * 2017-08-18 2017-12-08 联想(北京)有限公司 一种表情符号的插入方法、装置和电子设备

Also Published As

Publication number Publication date
CN113051427A (zh) 2021-06-29
US11941323B2 (en) 2024-03-26
US20220300251A1 (en) 2022-09-22

Similar Documents

Publication Publication Date Title
WO2021115351A1 (zh) 一种表情制作方法和装置
CN110286976B (zh) 界面显示方法、装置、终端及存储介质
WO2020078299A1 (zh) 一种处理视频文件的方法及电子设备
CN110825469A (zh) 语音助手显示方法及装置
CN111866404B (zh) 一种视频编辑方法及电子设备
US20230089566A1 (en) Video generation method and related apparatus
WO2021104485A1 (zh) 一种拍摄方法及电子设备
WO2020207326A1 (zh) 一种对话消息的发送方法及电子设备
CN110910872A (zh) 语音交互方法及装置
WO2020029306A1 (zh) 一种图像拍摄方法及电子设备
CN114390139B (zh) 一种电子设备在来电时呈现视频的方法、电子设备和存储介质
CN111742539B (zh) 一种语音控制命令生成方法及终端
CN112214636A (zh) 音频文件的推荐方法、装置、电子设备以及可读存储介质
WO2021052139A1 (zh) 手势输入方法及电子设备
CN112566152B (zh) 一种卡顿预测的方法、数据处理的方法以及相关装置
CN114242037A (zh) 一种虚拟人物生成方法及其装置
CN112150499A (zh) 图像处理方法及相关装置
CN115543145A (zh) 一种文件夹管理方法及装置
CN113163394B (zh) 一种情景智能服务的信息共享方法及相关装置
CN114444000A (zh) 页面布局文件的生成方法、装置、电子设备以及可读存储介质
CN113742460A (zh) 生成虚拟角色的方法及装置
CN116861066A (zh) 应用推荐方法和电子设备
WO2020216144A1 (zh) 一种添加邮件联系人的方法和电子设备
CN115730091A (zh) 批注展示方法、装置、终端设备及可读存储介质
CN115734032A (zh) 视频剪辑方法、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20899702

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20899702

Country of ref document: EP

Kind code of ref document: A1