WO2022135323A1 - Image generation method, apparatus and electronic device - Google Patents

Image generation method, apparatus and electronic device

Publication number
WO2022135323A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
input
message
conversation
target image
Application number
PCT/CN2021/139569
Other languages
English (en)
French (fr)
Inventor
明昊
Original Assignee
维沃移动通信(杭州)有限公司
Application filed by 维沃移动通信(杭州)有限公司
Publication of WO2022135323A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0483 Interaction with page-structured environments, e.g. book metaphor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04845 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the present application belongs to the technical field of speech recognition, and in particular relates to an image generation method, apparatus and electronic device.
  • because voice chat is more convenient and quicker than text chat, it is used more and more widely in people's lives.
  • afterwards, the user needs to recall the content of the voice call, or consult the other party to that voice call again. For example, user A and user B agree on a meeting time, a meeting address and other information through voice chat; if user A then forgets the meeting address, he needs to ask user B again.
  • the voice information in the voice chat mode has the defect that it is inconvenient to query.
  • the purpose of the embodiments of the present application is to provide an image generation method, apparatus, and electronic device, which can solve the problem of inconvenient query of voice information in the voice chat mode.
  • in a first aspect, an embodiment of the present application provides an image generation method, the method comprising:
  • displaying a conversation interface, the conversation interface including a voice conversation message;
  • receiving a first input from a user;
  • in response to the first input, generating a target image, wherein the target image includes text information corresponding to the voice conversation message.
  • an image generation device including:
  • a first display module for displaying a conversation interface, the conversation interface including voice conversation messages
  • a user input module configured to receive the first input of the user
  • a response module configured to generate a target image in response to the first input, wherein the target image includes text information corresponding to the voice conversation message.
  • embodiments of the present application provide an electronic device, the electronic device including a processor, a memory, and a program or instruction stored on the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
  • an embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method according to the first aspect are implemented.
  • an embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the steps of the method according to the first aspect.
  • a computer program product is provided, the computer program product is stored in a non-volatile storage medium, and the computer program product is executed by at least one processor to implement the steps of the method according to the first aspect.
  • a communication device configured to perform the steps of the method of the first aspect.
  • in the embodiments of the present application, a conversation interface is displayed, and the conversation interface includes a voice conversation message; a first input from a user is received; in response to the first input, a target image is generated, wherein the target image includes text information corresponding to the voice conversation message.
  • FIG. 1 is a flowchart of an image generation method provided by an embodiment of the present application.
  • FIG. 2a is the first application scenario diagram of an image generation method provided by an embodiment of the present application.
  • FIG. 2b is the second application scenario diagram of an image generation method provided by an embodiment of the present application.
  • FIG. 2c is the third application scenario diagram of an image generation method provided by an embodiment of the present application.
  • FIG. 2d is a fourth application scenario diagram of an image generation method provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of another image generation method provided by an embodiment of the present application.
  • FIG. 4a is the first application scenario diagram of another image generation method provided by an embodiment of the present application.
  • FIG. 4b is the second application scenario diagram of another image generation method provided by an embodiment of the present application.
  • FIG. 5 is a structural diagram of an image generation apparatus provided by an embodiment of the present application.
  • FIG. 6 is a structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 7 is a structural diagram of another electronic device provided by an embodiment of the present application.
  • the terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second", etc. are usually of one type, and the number of objects is not limited; for example, the first object may be one or more than one.
  • "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
  • FIG. 1 is a flowchart of an image generation method provided by an embodiment of the present application. As shown in FIG. 1, the image generation method may include the following steps:
  • Step 101 Display a conversation interface, where the conversation interface includes a voice conversation message.
  • the above-mentioned conversation interface can be a conversation interface in any application program with a conversation function, and the conversation interface can display the voice conversation messages received and sent by the conversation application program, such as a voice call message or a voice message. A voice call message is a message indicating that a voice call has been established between at least two session contacts, so that the at least two session contacts can conduct a voice session through the voice call.
  • the difference between the above-mentioned voice message and the voice call message is that a voice message is sent by one session contact to another session contact, and its recording time is often relatively short, for example 15 seconds or 30 seconds, whereas a voice call message allows at least two session contacts to hold a conversation, often with unlimited duration.
  • the above-mentioned conversation interface may also include text conversation messages, image conversation messages, and the like, which are not specifically limited herein.
  • Step 102 Receive a first input from the user.
  • the above-mentioned first input may include at least one of a touch input and a pressing operation on a hardware button, and is used to trigger the conversion of the voice conversation message in the conversation interface into a text conversation message that is displayed in the target image. The specific form of the first input is not specifically limited here.
  • for example, step 103 may be performed in response to a screenshot operation; alternatively, where the voice conversation message and a preset control (for example, a voice conversion button) are displayed, the execution of step 103 is triggered by a touch input on the preset control.
  • Step 103 In response to the first input, generate a target image, where the target image includes text information corresponding to the voice conversation message.
  • the text information corresponding to the above-mentioned voice conversation message can be understood as follows: speech recognition is performed on the voice conversation message to recognize its conversation content, and the conversation content is output in the form of text, so as to obtain the text information corresponding to the voice conversation message. In this way, the conversation content of the voice conversation message can be viewed by viewing the target image.
  • the above-mentioned target image may be a static image or a dynamic image.
  • subsequent processing such as viewing, forwarding, sharing, and recording may be performed on the target image, so as to realize more processing methods for voice conversation messages.
  • for example, contact A and contact B discuss a certain issue through a voice call message, and after the discussion contact A wants to share the process and conclusion of the discussion in a group chat so that other contacts can see it. With current social chat software, contact A can only recall the content of the voice call, re-edit it into text, and then send it to the group.
  • the voice conversation message can be converted into a target image carrying text information, so as to facilitate subsequent operations such as viewing, forwarding, sharing, and recording of the target image.
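  • steps 101 to 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `VoiceMessage`, `TargetImage`, `generate_target_image` and `fake_recognizer` are hypothetical, the recognizer is a stand-in that merely decodes bytes, and the "image" is modelled as the list of text lines it would show rather than as a rendered bitmap.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceMessage:
    sender: str
    audio: bytes  # raw audio payload; the source does not specify a format


@dataclass
class TargetImage:
    # Stand-in for a rendered bitmap: the text lines the image would show.
    lines: List[str]


def generate_target_image(
    messages: List[VoiceMessage],
    recognize: Callable[[bytes], str],
) -> TargetImage:
    """Step 103: convert each voice conversation message to text and lay it out."""
    lines = [f"{m.sender}: {recognize(m.audio)}" for m in messages]
    return TargetImage(lines=lines)


def fake_recognizer(audio: bytes) -> str:
    # Hypothetical recognizer used only for this demo: decodes bytes as UTF-8.
    return audio.decode("utf-8")


# Step 101: the conversation interface holds two voice messages.
conversation = [
    VoiceMessage("A", b"Meeting at 3 pm"),
    VoiceMessage("B", b"Room 402"),
]
# Step 102 (the first input) is assumed to have been received at this point.
image = generate_target_image(conversation, fake_recognizer)
print("\n".join(image.lines))
```

  • passing the recognizer in as a parameter keeps the sketch independent of any particular speech engine, which the source does not name.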
  • the generating a target image in response to the first input includes:
  • in response to the first input acting on a first preset control, performing voice recognition processing on the voice call message to obtain a voice recognition result, wherein the voice recognition result includes the text information corresponding to each segment of voice in the voice call message and the call contact corresponding to each segment of voice;
  • generating a target image, wherein the target image includes the voice recognition result.
  • the first preset control is pre-displayed on the session interface, or the user's input operation is used to trigger the display of the first preset control on the session interface.
  • for example, when the displayed conversation interface includes a voice call message, if a user's long-press input on the voice call message is received, an interface as shown in FIG. 2a is displayed, and the interface includes a conversion option 22, which is the first preset control.
  • that the above-mentioned target image includes the voice recognition result can be understood as displaying the converted text information in association with the corresponding call contact, in the order in which the voice was spoken in the voice call.
  • the user can know the call content of the voice call by viewing the target image, and identify which contact sent the text corresponding to each piece of voice, thereby making the voice recognition result displayed in the target image clearer.
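  • the association between each recognized segment, its call contact and the voice order can be sketched as follows. The `RecognizedSegment` type, its field names and the bracketed label format are assumptions made for illustration; the source only states that each segment's text is shown with its contact in speaking order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedSegment:
    start_s: float  # offset of the segment within the call, in seconds
    contact: str    # call contact who spoke the segment
    text: str       # text recognized from the segment


def layout_call_transcript(segments: List[RecognizedSegment]) -> List[str]:
    """Order segments by when they were spoken and label each with its speaker."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return [f"[{s.contact}] {s.text}" for s in ordered]


# Segments may arrive out of order from the recognizer; sorting restores call order.
segments = [
    RecognizedSegment(12.0, "B", "Let's meet in room 402."),
    RecognizedSegment(3.5, "A", "Where is the meeting?"),
]
transcript = layout_call_transcript(segments)
print("\n".join(transcript))
```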
  • for example, contact A and contact B discuss a certain issue through a voice call in the conversation interface 20, and after the discussion contact A wants to share the process and conclusion of the discussion in a group chat so that other contacts can see it. In this case, contact A only needs to long-press the voice call message 21 displayed in the conversation interface 20 to display the editing box 22 corresponding to the voice call message 21.
  • the editing box 22 includes a conversion option 221 for converting the voice call message 21 into a target picture and a conversion option 222 for converting it into text. When the user clicks the conversion option 221, a conversation interface as shown in FIG. 2b is displayed. The conversation interface includes a target picture 23, which displays the recognition result obtained by speech recognition of the voice call message 21; the recognition result may include the text information corresponding to the voice sent by contact A and by contact B respectively. In this way, the user can perform operations such as forwarding and viewing on the target picture 23.
  • a recording operation can be performed to facilitate subsequent voice recognition processing on the content of the voice call message.
  • the above-mentioned target image may also include information about the voice conversation message.
  • for example, a voice call icon and voice call duration information are displayed in the lower right corner of the target picture 23, so that when the user sees the voice call icon and the voice call duration information, the user knows that the target picture 23 is the target image corresponding to a voice call and that the call duration of the voice call is 3 minutes 28 seconds.
  • the target image can also be associated with the voice call through the above voice call icon, so that the audio content of the voice call can be played by touching the voice call icon.
  • in this way, when the text in the target image is not clear enough, or the user has doubts about the text content displayed in the target image, the audio content of the voice call can be played for confirmation, thereby improving the reliability of the image generation method provided by the embodiments of the present application.
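  • the call information attached to the target image can be sketched as a small footer record. The function names, the dictionary keys and the `call_id` link are hypothetical; the source only says that a voice call icon and a duration are shown and that the icon is associated with the call audio.

```python
def format_call_duration(total_seconds: int) -> str:
    """Render a call duration as 'M minutes S seconds' for the image footer."""
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} minutes {seconds} seconds"


def call_footer(total_seconds: int, call_id: str) -> dict:
    # The hypothetical 'call_id' lets a tap on the icon look up the recorded audio.
    return {
        "icon": "voice_call",
        "label": format_call_duration(total_seconds),
        "call_id": call_id,
    }


# 208 seconds corresponds to the 3 minutes 28 seconds of the example above.
footer = call_footer(208, "call-001")
print(footer["label"])
```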
  • the above-mentioned voice conversation message may include not only a voice call message, but also a voice message.
  • in this case, the first input can be used to convert the voice messages in the conversation interface into corresponding text information and display that text information in the target image.
  • the generating a target image in response to the first input includes:
  • in response to the first sub-input, generating an intermediate image, wherein the intermediate image includes the at least two voice messages, and the first input includes a first sub-input and a second sub-input;
  • in response to the second sub-input acting on a second preset control, converting the at least two voice messages into text information respectively, and updating the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, to generate the target image.
  • the above-mentioned first sub-input may be a screenshot input
  • the above-mentioned intermediate image may be a screenshot of the conversation interface.
  • the above-mentioned second preset control may be the same control as the first preset control, and may be displayed based on the same manner.
  • the above-mentioned second preset control may also be a control different from the first preset control.
  • for example, when the displayed conversation interface includes a voice message, the interface shown in FIG. 2c is displayed, and the interface includes a voice conversion button 28, which is the second preset control.
  • the text information corresponding to the at least two voice messages can also be classified according to the above-mentioned conversation interface.
  • as shown in FIG. 2c, assume that contact A and contact B communicate in the conversation interface 20, and during the communication contact A sends two voice messages 24, which are interleaved with the text message 25 sent by contact B and the text message 25 sent by contact A. When the user's click operation on the voice conversion button 28 is received, the voice messages 24 are each converted into text information, and the target image 26 shown in FIG. 2d is generated and displayed.
  • in the target image 26, the display area corresponding to each voice message 24 in the conversation interface 20 displays the converted text information 27 of that voice message 24.
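  • updating the display areas of the intermediate image can be sketched as follows. The `Region` type, the `converted` flag and the lookup-table recognizer are assumptions for illustration; a real implementation would redraw pixel regions of the screenshot, which is not modelled here.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Region:
    kind: str                # "text" or "voice"
    sender: str
    content: str             # message text, or an audio reference for a voice region
    converted: bool = False  # True if the text was produced by speech recognition


def update_intermediate_image(
    regions: List[Region], recognize: Callable[[str], str]
) -> List[Region]:
    """Replace every voice region with its recognized text, keeping the order."""
    updated = []
    for region in regions:
        if region.kind == "voice":
            text = recognize(region.content)
            updated.append(replace(region, kind="text", content=text, converted=True))
        else:
            updated.append(region)
    return updated


# Hypothetical recognizer: a lookup table standing in for a speech engine.
FAKE_TRANSCRIPTS = {"clip-1": "The meeting is at 3 pm.", "clip-2": "It is in room 402."}

# Intermediate image: two voice messages interleaved with text messages.
screenshot = [
    Region("text", "B", "When do we meet?"),
    Region("voice", "A", "clip-1"),
    Region("voice", "A", "clip-2"),
    Region("text", "A", "See you there."),
]
target = update_intermediate_image(screenshot, FAKE_TRANSCRIPTS.__getitem__)
for region in target:
    print(region.sender, region.content, "(converted)" if region.converted else "")
```

  • the `converted` flag is one possible way to let a renderer draw converted text differently from messages that were typed as text.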
  • when the target image includes both the text conversation messages in the conversation interface and the text information converted from voice messages, the two can be displayed differently, for example, with different text boxes or different text colors.
  • where the conversation interface includes both a voice call message and a voice message, both may be processed by voice-to-text conversion, so that the generated target image may include the text information converted from the voice call message and the text information converted from the voice message.
  • in the embodiments of the present application, a conversation interface is displayed, and the conversation interface includes a voice conversation message; a first input from a user is received; in response to the first input, a target image is generated, wherein the target image includes text information corresponding to the voice conversation message.
  • FIG. 3 is a flowchart of another image generation method provided by an embodiment of the present application.
  • this other image generation method differs from the image generation method shown in FIG. 1 in that it applies only where there are at least two voice conversation messages in the conversation interface, and in that some of the at least two voice conversation messages can be selected, so that speech recognition conversion is performed only on the selected voice conversation messages and not on the unselected ones.
  • the another image generation method may include the following steps:
  • Step 301 Display a conversation interface, where the conversation interface includes a voice conversation message.
  • This step has the same meaning as step 101 in the method embodiment shown in FIG. 1, and the difference is only that there are at least two voice conversation messages in the conversation interface, and details are not repeated here.
  • Step 302 In the case of receiving the first sub-input, generate a preset window for frame selection of the conversation interface, and display in the preset window a selection control corresponding to each voice conversation message in the conversation interface.
  • the above-mentioned first sub-input may include a screenshot operation for capturing the conversation interface.
  • prompt information may be output to ask the user whether to convert the voice conversation messages in the conversation interface into text information. If the user selects "Yes", the step of generating a preset window for frame selection of the conversation interface and displaying in the preset window a selection control corresponding to each voice conversation message in the conversation interface is performed; if the user selects "No", a screenshot is generated directly, in the same way as a screenshot in the prior art, which is not repeated here.
  • the preset window of the frame selection session interface may be a newly generated window, including the session content in the target area of the session interface.
  • the position and size of the window can be adjusted to adjust the session content in the window.
  • the above-mentioned preset window can be a frame selection area, for example, the frame selection area 41 in the interface shown in FIG. 4a, so that the session messages displayed in the session interface can be framed through the frame selection area 41. The framed session messages include at least two voice messages 42, and may also include other text messages 43 or even picture messages, and each voice message 42 is displayed with a corresponding selection control 44.
  • Step 303 In the case of receiving the second sub-input of the selection control corresponding to the target voice conversation message, convert the target voice conversation message in the preset window into text information.
  • the above-mentioned second sub-input can be a touch input such as a click on a selection control, so that the target voice conversation message corresponding to the selection control is selected, and the second sub-input can be ended in any of the following ways:
  • Manner 1 After operation on the selection controls has stopped for a preset time length (for example, 2 seconds or 3 seconds), it is determined that the second sub-input has ended, so as to execute the step of converting the target voice conversation message into text information.
  • preset controls can also be displayed in the preset window, for example: the voice conversion button 45 shown in FIG. to determine the end of the second sub-input, thereby executing the step of converting the target voice conversation message in the preset window into text information, so that the voice message 42 selected by the second sub-input is converted into text information. In the interface shown in FIG. 4b, the voice message 42 is displayed as the corresponding text information 47.
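  • the two ways of ending the second sub-input can be sketched with a small state holder. The class `SelectionSession`, its method names and the 2-second default are hypothetical; timestamps are passed in explicitly so that the logic can be checked without a real clock or user interface.

```python
from typing import Optional, Set


class SelectionSession:
    """Tracks which voice messages are selected and when selection is over."""

    def __init__(self, idle_timeout_s: float = 2.0) -> None:
        self.idle_timeout_s = idle_timeout_s  # preset time length (Manner 1)
        self.selected: Set[int] = set()
        self.last_tap_s: Optional[float] = None
        self.confirmed = False

    def tap_selection(self, message_id: int, now_s: float) -> None:
        # A tap on a selection control selects the corresponding voice message.
        self.selected.add(message_id)
        self.last_tap_s = now_s

    def tap_convert_button(self) -> None:
        # A tap on the preset control (e.g. a voice conversion button) ends input.
        self.confirmed = True

    def is_finished(self, now_s: float) -> bool:
        if self.confirmed:
            return True
        if self.last_tap_s is None:
            return False  # nothing selected yet, so there is nothing to end
        return now_s - self.last_tap_s >= self.idle_timeout_s


session = SelectionSession(idle_timeout_s=2.0)
session.tap_selection(1, now_s=0.0)
session.tap_selection(2, now_s=1.0)
still_selecting = session.is_finished(now_s=2.5)  # only 1.5 s since the last tap
timed_out = session.is_finished(now_s=3.0)        # 2.0 s since the last tap
```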
  • the above-mentioned process of converting the target voice conversation message into text information is the same as the process of converting the voice conversation message into text information in the method embodiment shown in FIG. 1. The difference is that in this embodiment text conversion is performed only on the selected target voice conversation message, while in the method embodiment shown in FIG. 1 text conversion is performed on all the voice conversation messages in the conversation interface. The text conversion process is not repeated here.
  • Step 304 Generate a target image, where the target image includes text information corresponding to the target voice conversation message, and voice icons corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
  • the voice icon in the target image may not have the function of voice playback, and cannot view text information, that is, when the target image is displayed, it can only be known that there is a voice message here through the voice icon, but for the voice The specific content of the message is not known.
  • This step is similar to step 103 in the method embodiment shown in FIG. 1. The difference is that, in the target image generated in this embodiment, only the selected target voice conversation message is displayed as the corresponding text information, and the other voice conversation messages are displayed as voice icons; in the method embodiment shown in FIG. 1, all the voice conversation messages in the conversation interface can be displayed as corresponding text information in the target image.
  • the embodiment of the present application can also perform a selection operation on part of the voice conversation messages in the conversation interface, so that only the selected voice conversation messages are subjected to voice-to-text conversion processing and the converted text information is displayed in the target image, while voice-to-text conversion processing is not performed on the unselected voice conversation messages.
  • in this way, voice-to-text conversion of private or irrelevant voice messages, and their display in the target image, can be avoided, which simplifies the voice-to-text conversion process while also protecting the user's privacy.
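  • the selective conversion of step 303 and step 304 can be sketched as follows. The function name, the `VOICE_ICON` placeholder string and the lambda recognizer are assumptions for illustration; the point is only that unselected messages are never passed to the recognizer.

```python
from typing import Callable, Dict, List, Set

VOICE_ICON = "[voice]"  # placeholder for the voice icon drawn in the target image


def render_voice_messages(
    voice_messages: Dict[int, str],
    selected: Set[int],
    recognize: Callable[[str], str],
) -> List[str]:
    """Show selected messages as text and every other message as a voice icon."""
    rendered = []
    for message_id, audio_ref in voice_messages.items():
        if message_id in selected:
            rendered.append(recognize(audio_ref))
        else:
            # Unselected audio is never recognized, so its content stays private.
            rendered.append(VOICE_ICON)
    return rendered


messages = {1: "clip-1", 2: "clip-2", 3: "clip-3"}
rendered = render_voice_messages(messages, {1, 3}, lambda ref: f"text of {ref}")
print(rendered)
```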
  • the execution body may be an image generation apparatus, or a control module in the image generation apparatus for executing the image generation method.
  • the image generating apparatus provided by the embodiments of the present application is described below by taking the case where the image generating apparatus executes the image generation method as an example.
  • FIG. 5 is a structural diagram of an image generation apparatus provided by an embodiment of the present application.
  • the image generation apparatus 500 may include:
  • a first display module 501 configured to display a conversation interface, where the conversation interface includes voice conversation messages;
  • a user input module 502 configured to receive a first input from a user
  • the generating module 503 is configured to generate a target image in response to the first input, wherein the target image includes text information corresponding to the voice conversation message.
  • the generating module 503 includes:
  • a voice recognition unit configured to perform voice recognition processing on the voice call message in response to a first input acting on a first preset control to obtain a voice recognition result, wherein the voice recognition result includes: the voice The text information corresponding to each piece of voice in the call message, and the call contact corresponding to each piece of voice respectively;
  • the first generating unit is configured to generate a target image, wherein the target image includes the speech recognition result.
  • the generating module 503 includes:
  • a second generating unit configured to generate an intermediate image in response to a first sub-input, wherein the intermediate image includes the at least two voice messages, and the first input includes a first sub-input and a second sub-input;
  • the updating unit is configured to, in response to the second sub-input acting on the second preset control, convert the at least two voice messages into text information respectively, and update the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, to generate the target image.
  • the first input includes a screenshot operation for capturing the session interface.
  • the generating module 503 includes:
  • the third generation unit is used to generate a preset window for frame selection of the conversation interface when the first sub-input is received;
  • a display unit configured to display a selection control corresponding to each voice conversation message in the conversation interface in the preset window
  • a text conversion unit configured to convert the target voice conversation message in the preset window into text information when receiving the second sub-input of the selection control corresponding to the target voice conversation message, wherein the first input includes the first sub-input and the second sub-input;
  • a fourth generating unit configured to generate a target image, where the target image includes text information corresponding to the target voice conversation message, and voice icons corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
  • the image generating apparatus provided in the embodiment of the present application can implement each process in the method embodiment shown in FIG. 1 or FIG. 3, and can achieve the same beneficial effect, which is not repeated here in order to avoid repetition.
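  • the division of the apparatus 500 into modules 501 to 503 can be sketched as three small classes wired together. All class and method names are hypothetical, and `str.upper` is used only as a visible stand-in for speech recognition; the sketch shows the division of responsibilities, not the apparatus itself.

```python
from typing import Callable, List


class FirstDisplayModule:
    """Module 501: holds the conversation interface content being displayed."""

    def __init__(self) -> None:
        self.displayed: List[str] = []

    def display(self, voice_messages: List[str]) -> None:
        self.displayed = list(voice_messages)


class UserInputModule:
    """Module 502: records whether the first input has been received."""

    def __init__(self) -> None:
        self.first_input_received = False

    def receive_first_input(self) -> None:
        self.first_input_received = True


class GeneratingModule:
    """Module 503: turns displayed voice messages into target-image text."""

    def __init__(self, recognize: Callable[[str], str]) -> None:
        self.recognize = recognize

    def generate(self, voice_messages: List[str]) -> List[str]:
        return [self.recognize(m) for m in voice_messages]


class ImageGenerationApparatus:
    """Apparatus 500: wires modules 501 to 503 together."""

    def __init__(self, recognize: Callable[[str], str]) -> None:
        self.display_module = FirstDisplayModule()
        self.input_module = UserInputModule()
        self.generating_module = GeneratingModule(recognize)

    def on_first_input(self) -> List[str]:
        self.input_module.receive_first_input()
        return self.generating_module.generate(self.display_module.displayed)


# str.upper is a placeholder recognizer so the transformation is visible.
apparatus = ImageGenerationApparatus(str.upper)
apparatus.display_module.display(["hello", "room 402"])
result = apparatus.on_first_input()
```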
  • the image generating apparatus in this embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal.
  • the apparatus may be a mobile electronic device or a non-mobile electronic device.
  • the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), etc.
  • the non-mobile electronic device may be a personal computer (PC), a television (TV), a teller machine or a self-service machine, etc., which are not specifically limited in the embodiments of the present application.
  • the image generating apparatus in this embodiment of the present application may be an apparatus having an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
  • the image generating apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiment of FIG. 1 or FIG. 3 , and to avoid repetition, details are not described here.
  • an embodiment of the present application further provides an electronic device 600, including a processor 601, a memory 602, and a program or instruction stored in the memory 602 and executable on the processor 601. When the program or instruction is executed by the processor 601, each process of the above-mentioned image generation method embodiments can be implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 700 includes but is not limited to components such as a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, and a processor 710.
  • the electronic device 700 may also include a power source (such as a battery) that supplies power to the various components; the power source may be logically connected to the processor 710 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
  • the structure of the electronic device shown in FIG. 7 does not constitute a limitation on the electronic device.
  • the electronic device may include more or fewer components than shown, combine some components, or use a different arrangement of components, which is not repeated here.
  • the display unit 706 is configured to display a conversation interface, and the conversation interface includes a voice conversation message;
  • a user input unit 707 configured to receive a first input from a user
  • the processor 710 is configured to generate a target image in response to the first input, wherein the target image includes text information corresponding to the voice conversation message.
  • optionally, when the voice conversation message is a voice call message, the generating of a target image in response to the first input, as performed by the processor 710, includes:
  • in response to the first input acting on a first preset control, performing voice recognition processing on the voice call message to obtain a voice recognition result, where the voice recognition result includes the text information corresponding to each segment of voice in the voice call message and the call contact corresponding to each segment of voice;
  • generating a target image, where the target image includes the voice recognition result.
  • optionally, when the voice conversation message includes at least two voice messages, the generating of a target image in response to the first input, as performed by the processor 710, includes:
  • in response to a first sub-input, generating an intermediate image, where the intermediate image includes the at least two voice messages, and the first input includes the first sub-input and a second sub-input;
  • in response to the second sub-input acting on a second preset control, converting the at least two voice messages into text information respectively, and updating the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, so as to generate the target image.
  • optionally, the first input includes a screenshot operation for capturing the conversation interface.
  • optionally, when there are at least two voice conversation messages in the conversation interface, the generating of a target image in response to the first input, as performed by the processor 710, includes:
  • when a first sub-input is received through the user input unit 707, generating a preset window that frames the conversation interface, and controlling the display unit 706 to display, in the preset window, a selection control corresponding to each voice conversation message in the conversation interface;
  • when a second sub-input on the selection control corresponding to a target voice conversation message is received through the user input unit 707, converting the target voice conversation message in the preset window into text information, where the first input includes the first sub-input and the second sub-input;
  • generating a target image, where the target image includes text information corresponding to the target voice conversation message, and voice icons respectively corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
  • the electronic device 700 provided in this embodiment of the present application can perform each process in the method embodiment shown in FIG. 1 or FIG. 3 , and can achieve the same beneficial effect, which is not repeated here in order to avoid repetition.
  • the input unit 704 may include a graphics processing unit (GPU) and a microphone; the graphics processor processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the display unit 706 may include a display panel, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 707 includes a touch panel and other input devices. The touch panel is also called a touch screen.
  • the touch panel may include two parts, a touch detection device and a touch controller.
  • the memory 709 may be used to store software programs as well as various data, including, but not limited to, application programs and operating systems.
  • the processor 710 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 710.
  • An embodiment of the present application further provides a readable storage medium storing a program or an instruction; when the program or instruction is executed by a processor, each process of the image generation method embodiment shown in FIG. 1 or FIG. 2 is implemented and the same technical effect is achieved. To avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device described in the foregoing embodiments.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
  • An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run a program or an instruction to implement each process of the image generation method embodiment shown in FIG. 1 or FIG. 2 and achieve the same technical effect. To avoid repetition, details are not repeated here.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
  • the method of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephonic Communication Services (AREA)

Abstract

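The present application discloses an image generation method, an image generation apparatus, and an electronic device, and belongs to the technical field of speech recognition. The image generation method includes: displaying a conversation interface, where the conversation interface includes a voice conversation message; receiving a first input; and generating a target image in response to the first input, where the target image includes text information corresponding to the voice conversation message.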

Description

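Image Generation Method and Apparatus, and Electronic Device
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202011537104.8, filed in China on December 23, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the technical field of speech recognition, and specifically relates to an image generation method, an image generation apparatus, and an electronic device.
Background
Because voice chat is more convenient and faster than text chat, it is increasingly widely used in daily life.
However, in some application scenarios that require the content of a voice call, the user has to recall that content or ask the other party to the call again. For example, after user A and user B have agreed on a meeting time, a meeting address, and other information through voice chat, user A has to ask user B again if user A forgets the meeting address.
It can be seen that voice information in voice chat has the drawback of being inconvenient to look up.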
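Summary
The purpose of the embodiments of the present application is to provide an image generation method, an image generation apparatus, and an electronic device that can solve the problem that voice information in voice chat is inconvenient to look up.
To solve the above technical problem, the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides an image generation method, the method including:
displaying a conversation interface, where the conversation interface includes a voice conversation message;
receiving a first input from a user;
generating a target image in response to the first input, where the target image includes text information corresponding to the voice conversation message.
In a second aspect, an embodiment of the present application provides an image generation apparatus, including:
a first display module, configured to display a conversation interface, where the conversation interface includes a voice conversation message;
a user input module, configured to receive a first input from a user;
a response module, configured to generate a target image in response to the first input, where the target image includes text information corresponding to the voice conversation message.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium storing a program or an instruction, where the program or instruction, when executed by a processor, implements the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run a program or an instruction to implement the steps of the method according to the first aspect.
In a sixth aspect, a computer program product is provided, where the computer program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the steps of the method according to the first aspect.
In a seventh aspect, a communication device is provided, configured to perform the steps of the method according to the first aspect.
In the embodiments of the present application, a conversation interface is displayed, where the conversation interface includes a voice conversation message; a first input from a user is received; and a target image is generated in response to the first input, where the target image includes text information corresponding to the voice conversation message. Because the target image shows the text information corresponding to the voice conversation message, the user can conveniently display, forward, or otherwise process the target image, and can therefore conveniently view and share the content of the voice conversation message.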
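Brief Description of the Drawings
FIG. 1 is a flowchart of an image generation method provided by an embodiment of the present application;
FIG. 2a is the first application scenario diagram of an image generation method provided by an embodiment of the present application;
FIG. 2b is the second application scenario diagram of an image generation method provided by an embodiment of the present application;
FIG. 2c is the third application scenario diagram of an image generation method provided by an embodiment of the present application;
FIG. 2d is the fourth application scenario diagram of an image generation method provided by an embodiment of the present application;
FIG. 3 is a flowchart of another image generation method provided by an embodiment of the present application;
FIG. 4a is the first application scenario diagram of another image generation method provided by an embodiment of the present application;
FIG. 4b is the second application scenario diagram of another image generation method provided by an embodiment of the present application;
FIG. 5 is a structural diagram of an image generation apparatus provided by an embodiment of the present application;
FIG. 6 is a structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 7 is a structural diagram of another electronic device provided by an embodiment of the present application.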
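Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first", "second", and the like are usually of one type, and the number of objects is not limited; for example, there may be one first object or multiple first objects. In addition, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects before and after it.
The image generation method, image generation apparatus, and electronic device provided by the embodiments of the present application are described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.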
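Referring to FIG. 1, which is a flowchart of an image generation method provided by an embodiment of the present application, the image generation method may include the following steps:
Step 101: Display a conversation interface, where the conversation interface includes a voice conversation message.
In a specific implementation, the conversation interface may be a conversation interface in any application program that has a conversation function. The conversation interface can display voice conversation messages received and sent by the conversation application, for example, voice call messages or voice messages. A voice call message is a message indicating that a voice call was established between at least two conversation contacts so that they could hold a voice conversation over that call. A voice message differs from a voice call message in that it is sent by one conversation contact to another and its recording duration is usually short, for example 15 seconds or 30 seconds, whereas a voice call allows at least two conversation contacts to hold a dialogue and its duration is usually unlimited.
It should be noted that, in addition to voice conversation messages, the conversation interface may also include text conversation messages, image conversation messages, and the like, which are not specifically limited here.
Step 102: Receive a first input from a user.
In a specific implementation, the first input may include at least one of a touch input and a press operation on a hardware button. It is used to trigger conversion of the voice conversation message in the conversation interface into a text conversation message and display of that text in the target image. The specific form of the first input is not limited here.
For example, a screenshot operation for capturing the conversation interface may be performed on a conversation interface that includes a voice conversation message, and step 103 may be performed in response to that screenshot operation. Alternatively, in the case where a voice conversation message and a preset control (for example, a voice conversion button) are displayed, a touch input on the preset control may trigger the execution of step 103.
Step 103: Generate a target image in response to the first input, where the target image includes text information corresponding to the voice conversation message.
In implementation, the text information corresponding to the voice conversation message can be understood as follows: speech recognition is performed on the voice conversation message to recognize its conversation content, and the conversation content is output in text form, thereby obtaining the text information corresponding to the voice conversation message. In this way, the conversation content of the voice conversation message can be read by viewing the target image.
In addition, the target image may be a static image or a dynamic image. In implementation, subsequent processing such as viewing, forwarding, sharing, and recording can be performed on the target image, which provides more ways of handling voice conversation messages.
For example, suppose contact A and contact B discussed a problem through a voice call, and afterwards contact A wants to share the discussion process and conclusion in a group chat so that other contacts can see them. With current social chat software, contact A can only recall the content of the voice call, re-edit it into text, and then send it to the group.
With the image generation method provided by the embodiments of the present application, a voice conversation message can be converted into a target image carrying text information, which facilitates subsequent operations such as viewing, forwarding, sharing, and recording of the target image.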
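Steps 101 to 103 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: the `Message` type, the `recognize` callable (a stand-in for whatever speech recognition engine is used), and the representation of the target image as a list of text rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    """One entry in the conversation interface (hypothetical model)."""
    sender: str
    kind: str                      # "voice" or "text"
    text: Optional[str] = None     # content of a text message
    audio: Optional[bytes] = None  # recorded audio of a voice message


def generate_target_image(messages: List[Message],
                          recognize: Callable[[bytes], str]) -> List[str]:
    """Return the rows that a renderer would draw into the target image.

    Voice messages are replaced by their recognized text; other messages
    are kept as they are. Rendering to pixels is left out on purpose.
    """
    rows = []
    for message in messages:
        if message.kind == "voice":
            content = recognize(message.audio)
        else:
            content = message.text
        rows.append(f"{message.sender}: {content}")
    return rows
```

A real implementation would hand the rows to a drawing routine; in this sketch `recognize` can be any function that maps audio bytes to a string.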
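As an optional implementation, in the case where the voice conversation message is a voice call message, the generating a target image in response to the first input includes:
in response to the first input acting on a first preset control, performing voice recognition processing on the voice call message to obtain a voice recognition result, where the voice recognition result includes the text information corresponding to each segment of voice in the voice call message and the call contact corresponding to each segment of voice;
generating a target image, where the target image includes the voice recognition result.
The first preset control is displayed on the conversation interface in advance, or its display on the conversation interface is triggered by an input operation of the user. For example, when the displayed conversation interface includes a voice call message, if a long-press input on the voice call message is received from the user, the interface shown in FIG. 2a is displayed. This interface includes a conversion option 22, and the conversion option 22 is the first preset control.
In addition, the statement that the target image includes the voice recognition result can be understood as follows: the converted text information is displayed in association with the corresponding call contact, in the order of the speech in the voice call. In this way, by viewing the target image, the user can learn the content of the voice call and tell which contact uttered the text corresponding to each segment of voice, which makes the voice recognition result displayed in the target image clearer.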
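The structure of the voice recognition result described above (text per segment, paired with the contact who spoke it, in call order) can be sketched as follows. The `CallSegment` type and the `recognize` callable are hypothetical, and the sketch assumes that the contact for each segment is already known; how segments are attributed to contacts is not specified in the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CallSegment:
    """One stretch of speech in a recorded voice call (hypothetical model)."""
    start_seconds: float  # offset of the segment from the start of the call
    contact: str          # call contact who spoke this segment
    audio: bytes          # recorded audio of the segment


def transcribe_call(segments: List[CallSegment],
                    recognize: Callable[[bytes], str]) -> List[Tuple[str, str]]:
    """Return (contact, text) pairs in the order the segments were spoken."""
    ordered = sorted(segments, key=lambda segment: segment.start_seconds)
    return [(segment.contact, recognize(segment.audio)) for segment in ordered]
```

Keeping the contact next to each piece of text is what lets a renderer draw the transcript as an attributed dialogue rather than as one undifferentiated block.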
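For example, as shown in FIG. 2a, suppose contact A and contact B discussed a problem through a voice call in the conversation interface 20, and afterwards contact A wants to share the discussion process and conclusion in a group chat so that other contacts can see them. In this case, contact A only needs to long-press the voice call message 21 displayed in the conversation interface 20 to display an edit box 22 corresponding to the voice call message 21. The edit box 22 includes a conversion option 221 for converting the voice call message 21 into a target picture and a conversion option 222 for converting it into text. When the user taps the conversion option 221, the conversation interface shown in FIG. 2b is displayed. This conversation interface includes a target picture 23, which displays the recognition result obtained by performing speech recognition on the voice call message 21. The recognition result may include the text information corresponding to the voice uttered by contact A and by contact B respectively. The user can then forward or view the target picture 23.
Optionally, a recording operation may be performed during the voice call, to facilitate subsequent voice recognition processing of the content of the voice call message.
In implementation, in addition to the text information corresponding to the voice conversation message, the target image may also include information about the voice conversation message. For example, as shown in FIG. 2b, a voice call icon and voice call duration information are displayed in the lower right corner of the target picture 23. When the user sees them, the user can tell that the target picture 23 is the target image corresponding to a voice call, and that the duration of the voice call was 3 minutes and 28 seconds.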
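As a small illustration of the call information shown in the corner of the target picture, the duration label could be derived from the call length in seconds. The function names and the `[voice call]` marker, which stands in for the voice call icon, are hypothetical.

```python
def format_call_duration(total_seconds: int) -> str:
    """Format a call length in seconds as minutes:seconds."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def call_footer(total_seconds: int) -> str:
    """Build the footer text placed in the corner of the target image."""
    # "[voice call]" is a textual placeholder for the voice call icon.
    return f"[voice call] {format_call_duration(total_seconds)}"
```

For the 3 minute 28 second call in the example, `format_call_duration(208)` yields `"3:28"`.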
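Further, on the basis of the voice call icon displayed in the embodiment shown in FIG. 2b, the target image may also be associated with the voice call through the voice call icon, so that the audio content of the voice call can be played by touching the voice call icon.
In this way, in application scenarios where the speech recognition is not clear enough or the user has doubts about the text displayed in the target image, the audio content of the voice call can be played for confirmation, which improves the reliability of the image generation method provided by the embodiments of the present application.
It should be noted that, in implementation, the voice conversation message may include voice messages in addition to voice call messages. With the image generation method provided by the embodiments of the present application, the voice messages in the conversation interface can each be converted into corresponding text information through the first input and displayed in the target image.
As an optional implementation, in the case where the voice conversation message includes at least two voice messages, the generating a target image in response to the first input includes:
in response to a first sub-input, generating an intermediate image, where the intermediate image includes the at least two voice messages, and the first input includes the first sub-input and a second sub-input;
in response to the second sub-input acting on a second preset control, converting the at least two voice messages into text information respectively, and updating the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, so as to generate the target image.
In a specific implementation, the first sub-input may be a screenshot input, and the intermediate image may be a screenshot of the conversation interface.
In addition, the second preset control may be the same control as the first preset control and may be displayed in the same manner.
Of course, the second preset control may also be a control different from the first preset control. For example, when the displayed conversation interface includes voice messages, if a screenshot operation of the user is received, the interface shown in FIG. 2c is displayed. This interface includes a voice conversion button 28; touching the voice conversion button 28 triggers conversion of the voice messages in the screenshot into text information. In this case, the voice conversion button 28 is the second preset control.
In implementation, in the case where the conversation interface also includes text information or picture information in addition to the at least two voice messages, the text information corresponding to each of the at least two voice messages may be displayed in the conversation order of the conversation interface. For example, as shown in FIG. 2c, suppose contact A and contact B communicate in the conversation interface 20, and during the conversation contact A sends two voice messages 24, which are interleaved between a text message 25 sent by contact B and a text message 25 sent by contact A. When a tap operation of the user on the voice conversion button 28 is received, the voice messages 24 are each converted into text information, and the target image 26 shown in FIG. 2d is generated and displayed. In the target image 26, the display area corresponding to each voice message 24 in the conversation interface 20 shows the text information 27 converted from that voice message 24.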
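The update of the intermediate image can be pictured as replacing, region by region, each voice message with its converted text while leaving everything else in place. The sketch below models the screenshot as a list of regions rather than pixels; the `Region` type, the `converted_text` kind, and the `transcripts` mapping from an audio identifier to recognized text are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """A display area in the intermediate image (hypothetical model)."""
    top: int      # vertical position in the screenshot, in pixels
    kind: str     # "voice", "text", or "converted_text"
    content: str  # message text, or an audio identifier for voice regions


def update_voice_regions(regions: List[Region],
                         transcripts: Dict[str, str]) -> List[Region]:
    """Replace every voice region with its transcript, keeping the layout order."""
    updated = []
    for region in sorted(regions, key=lambda r: r.top):
        if region.kind == "voice":
            # Keep the position, swap the content; the distinct kind lets a
            # renderer style converted text differently from typed text.
            region = replace(region, kind="converted_text",
                             content=transcripts[region.content])
        updated.append(region)
    return updated
```

Marking converted regions with their own kind is one simple way to give them a different text box or color from messages that were typed as text in the first place.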
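In implementation, in the case where the target image includes both text conversation messages from the conversation interface and text information converted from voice messages, the two may be displayed differently, for example with different text boxes or different text colors.
It should be noted that, in implementation, if the conversation interface includes both a voice call message and a voice message, speech-to-text conversion may be performed on each of them, so that the generated target image can include the text information obtained from the voice call message and the text information obtained from the voice message.
In the embodiments of the present application, a conversation interface is displayed, where the conversation interface includes a voice conversation message; a first input from a user is received; and a target image is generated in response to the first input, where the target image includes text information corresponding to the voice conversation message. Because the target image shows the text information corresponding to the voice conversation message, the user can conveniently display, forward, or otherwise process the target image, and can therefore conveniently view and share the content of the voice conversation message.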
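Referring to FIG. 3, which is a flowchart of another image generation method provided by an embodiment of the present application. This method differs from the image generation method shown in FIG. 1 in that it applies only to the case where the conversation interface contains at least two voice conversation messages, and in that it also allows some of the at least two voice conversation messages to be selected, so that speech recognition conversion is performed only on the selected voice conversation messages and not on the unselected ones.
As shown in FIG. 3, this other image generation method may include the following steps:
Step 301: Display a conversation interface, where the conversation interface includes a voice conversation message.
This step has the same meaning as step 101 in the method embodiment shown in FIG. 1, the only difference being that the conversation interface contains at least two voice conversation messages. Details are not repeated here.
Step 302: When a first sub-input is received, generate a preset window that frames the conversation interface, and display, in the preset window, a selection control corresponding to each voice conversation message in the conversation interface.
In implementation, the first sub-input may include a screenshot operation for capturing the conversation interface. In addition, after the screenshot operation is performed, prompt information may be output to ask the user whether to convert the voice conversation messages in the conversation interface into text information. If the user selects "Yes", the step of generating the preset window that frames the conversation interface and displaying, in the preset window, the selection control corresponding to each voice conversation message is performed. If the user selects "No", a screenshot is generated directly; this screenshot is the same as a screenshot in the prior art and is not described further here.
Optionally, the preset window that frames the conversation interface may be a newly generated window that includes the conversation content in a target area of the conversation interface. Of course, the position and size of the window can be adjusted to adjust the conversation content in the window.
In addition, the preset window may be a framed selection area, for example the framed selection area 41 in the interface shown in FIG. 4a, which frames the conversation messages displayed in the conversation interface. These conversation messages include at least two voice messages 42 and may also include other text messages 43 or even picture messages, and each voice message 42 is displayed with a corresponding selection control 44.
Step 303: When a second sub-input on the selection control corresponding to a target voice conversation message is received, convert the target voice conversation message in the preset window into text information.
In implementation, the second sub-input may be a touch input such as a tap on a selection control, which selects the target voice conversation message corresponding to that selection control. The second sub-input may end in either of the following ways:
Way 1: After a preset length of time (for example, 2 seconds or 3 seconds) has elapsed since operations on the selection controls stopped, the second sub-input is determined to have ended, and the step of converting the target voice conversation message in the preset window into text information is performed.
Way 2: A preset control may also be displayed in the preset window, for example the voice conversion button 45 shown in FIG. 4a. After tapping the selection control 44, the user performs a touch operation on the voice conversion button 45 to indicate that the second sub-input has ended, and the step of converting the target voice conversation message in the preset window into text information is performed. The voice message 42 selected by the second sub-input is thereby converted into text information and, in the interface shown in FIG. 4b, is displayed as the corresponding text information 47.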
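The two ways of ending the second sub-input can be sketched as a small state holder. The class and method names are hypothetical, and timestamps are passed in explicitly so that the logic can be followed without a real clock. The source presents the two ways as alternatives; the sketch supports both at once only for compactness.

```python
from typing import Optional, Set


class SelectionSession:
    """Tracks which voice messages are selected and when selection has ended."""

    def __init__(self, idle_timeout: float = 2.0) -> None:
        self.idle_timeout = idle_timeout       # preset length of time, in seconds
        self.selected: Set[str] = set()        # identifiers of selected messages
        self.last_tap: Optional[float] = None  # time of the most recent tap
        self.confirmed = False                 # set when the convert button is pressed

    def tap(self, message_id: str, now: float) -> None:
        """Record a tap on the selection control of one voice message."""
        self.selected.add(message_id)
        self.last_tap = now

    def press_convert(self) -> None:
        """Way 2: the user touches the voice conversion button."""
        self.confirmed = True

    def is_finished(self, now: float) -> bool:
        """Return True once the second sub-input has ended."""
        if self.confirmed:
            return True
        if self.last_tap is None:
            return False
        # Way 1: no selection activity for the preset length of time.
        return now - self.last_tap >= self.idle_timeout
```

Once `is_finished` returns True, the messages in `selected` are the target voice conversation messages to convert.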
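In this embodiment, the process of converting the target voice conversation message into text information is the same as the process of converting a voice conversation message into text information in the method embodiment shown in FIG. 1. The difference is that in this embodiment only the selected target voice conversation messages are converted, whereas in the method embodiment shown in FIG. 1 all voice conversation messages in the conversation interface are converted. The conversion process is not repeated here.
Step 304: Generate a target image, where the target image includes the text information corresponding to each target voice conversation message, and voice icons respectively corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
The voice icon in the target image may have no voice playback function and may not reveal any text information. That is, when the target image is displayed, the voice icon only indicates that there is a voice message at that position; the specific content of that voice message cannot be learned.
For example, FIG. 4a contains an unselected voice message 46; as shown in FIG. 4b, in the conversation file obtained after the speech-to-text conversion, the unselected voice message 46 is displayed as a voice icon 48.
This step is similar to step 103 in the method embodiment shown in FIG. 1. The difference is that, in the target image generated in this implementation, only the selected target voice conversation messages are displayed as corresponding text information, and the other, unselected voice conversation messages are displayed as voice icons; in the method embodiment shown in FIG. 1, all voice conversation messages in the conversation interface can be displayed as corresponding text information in the target image.
On the basis of the method embodiment shown in FIG. 1, this embodiment of the present application additionally allows some of the voice conversation messages in the conversation interface to be selected, so that speech-to-text conversion is performed only on the selected voice conversation messages and the converted text information is shown in the target image, while the unselected voice conversation messages are not converted. In this way, private or irrelevant voice messages are not converted and shown in the target image, which simplifies the speech-to-text conversion and also protects the user's privacy.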
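The selective behavior of steps 303 and 304 can be sketched as follows: selected voice messages are passed to the recognizer, and unselected ones are never recognized and appear only as an icon. The `WindowMessage` type, the `VOICE_ICON` placeholder, and the `recognize` callable are hypothetical, and the target image is again modeled as a list of rows rather than pixels.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set

VOICE_ICON = "[voice]"  # textual placeholder for the voice icon


@dataclass
class WindowMessage:
    """One message framed by the preset window (hypothetical model)."""
    message_id: str
    kind: str                      # "voice" or "text"
    text: Optional[str] = None     # content of a text message
    audio: Optional[bytes] = None  # recorded audio of a voice message


def render_selected(messages: List[WindowMessage],
                    selected_ids: Set[str],
                    recognize: Callable[[bytes], str]) -> List[str]:
    """Return target-image rows; only selected voice messages become text."""
    rows = []
    for message in messages:
        if message.kind != "voice":
            rows.append(message.text)
        elif message.message_id in selected_ids:
            rows.append(recognize(message.audio))
        else:
            # Unselected audio is never sent to the recognizer.
            rows.append(VOICE_ICON)
    return rows
```

Skipping the recognizer for unselected messages is what provides both benefits named above: less conversion work, and no transcript of private messages in the shared image.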
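It should be noted that the image generation method provided by the embodiments of the present application may be executed by an image generation apparatus, or by a control module in the image generation apparatus for executing the image generation method. In the embodiments of the present application, the image generation apparatus is described by taking the case where it executes the image generation method as an example.
Referring to FIG. 5, which is a structural diagram of an image generation apparatus provided by an embodiment of the present application, the image generation apparatus 500 may include:
a first display module 501, configured to display a conversation interface, where the conversation interface includes a voice conversation message;
a user input module 502, configured to receive a first input from a user;
a generating module 503, configured to generate a target image in response to the first input, where the target image includes text information corresponding to the voice conversation message.
Optionally, in the case where the voice conversation message is a voice call message, the generating module 503 includes:
a voice recognition unit, configured to perform, in response to the first input acting on a first preset control, voice recognition processing on the voice call message to obtain a voice recognition result, where the voice recognition result includes the text information corresponding to each segment of voice in the voice call message and the call contact corresponding to each segment of voice;
a first generating unit, configured to generate a target image, where the target image includes the voice recognition result.
Optionally, in the case where the voice conversation message includes at least two voice messages, the generating module 503 includes:
a second generating unit, configured to generate an intermediate image in response to a first sub-input, where the intermediate image includes the at least two voice messages, and the first input includes the first sub-input and a second sub-input;
an updating unit, configured to, in response to the second sub-input acting on a second preset control, convert the at least two voice messages into text information respectively, and update the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, so as to generate the target image.
Optionally, the first input includes a screenshot operation for capturing the conversation interface.
Optionally, in the case where there are at least two voice conversation messages in the conversation interface, the generating module 503 includes:
a third generating unit, configured to generate, when a first sub-input is received, a preset window that frames the conversation interface;
a display unit, configured to display, in the preset window, a selection control corresponding to each voice conversation message in the conversation interface;
a text conversion unit, configured to convert, when a second sub-input on the selection control corresponding to a target voice conversation message is received, the target voice conversation message in the preset window into text information, where the first input includes the first sub-input and the second sub-input;
a fourth generating unit, configured to generate a target image, where the target image includes the text information corresponding to each target voice conversation message, and voice icons respectively corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
The image generation apparatus provided by the embodiments of the present application can implement each process in the method embodiment shown in FIG. 1 or FIG. 3 and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.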
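The image generation apparatus in the embodiments of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like, and the non-mobile electronic device may be a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of the present application.
The image generation apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The image generation apparatus provided by the embodiments of the present application can implement each process implemented by the method embodiment of FIG. 1 or FIG. 3. To avoid repetition, details are not repeated here.
Optionally, as shown in FIG. 6, an embodiment of the present application further provides an electronic device 600, including a processor 601, a memory 602, and a program or instruction stored in the memory 602 and executable on the processor 601. When the program or instruction is executed by the processor 601, each process of the above image generation method embodiments is implemented and the same technical effect is achieved. To avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.
FIG. 7 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 700 includes but is not limited to components such as a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, and a processor 710.
A person skilled in the art can understand that the electronic device 700 may also include a power source (such as a battery) that supplies power to the various components. The power source may be logically connected to the processor 710 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in FIG. 7 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine some components, or use a different arrangement of components, which is not repeated here.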
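The display unit 706 is configured to display a conversation interface, where the conversation interface includes a voice conversation message;
the user input unit 707 is configured to receive a first input from a user;
the processor 710 is configured to generate a target image in response to the first input, where the target image includes text information corresponding to the voice conversation message.
Optionally, in the case where the voice conversation message is a voice call message, the generating of a target image in response to the first input, as performed by the processor 710, includes:
in response to the first input acting on a first preset control, performing voice recognition processing on the voice call message to obtain a voice recognition result, where the voice recognition result includes the text information corresponding to each segment of voice in the voice call message and the call contact corresponding to each segment of voice;
generating a target image, where the target image includes the voice recognition result.
Optionally, in the case where the voice conversation message includes at least two voice messages, the generating of a target image in response to the first input, as performed by the processor 710, includes:
in response to a first sub-input, generating an intermediate image, where the intermediate image includes the at least two voice messages, and the first input includes the first sub-input and a second sub-input;
in response to the second sub-input acting on a second preset control, converting the at least two voice messages into text information respectively, and updating the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, so as to generate the target image.
Optionally, the first input includes a screenshot operation for capturing the conversation interface.
Optionally, in the case where there are at least two voice conversation messages in the conversation interface, the generating of a target image in response to the first input, as performed by the processor 710, includes:
when a first sub-input is received through the user input unit 707, generating a preset window that frames the conversation interface, and controlling the display unit 706 to display, in the preset window, a selection control corresponding to each voice conversation message in the conversation interface;
when a second sub-input on the selection control corresponding to a target voice conversation message is received through the user input unit 707, converting the target voice conversation message in the preset window into text information, where the first input includes the first sub-input and the second sub-input;
generating a target image, where the target image includes the text information corresponding to each target voice conversation message, and voice icons respectively corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
The electronic device 700 provided by the embodiments of the present application can perform each process in the method embodiment shown in FIG. 1 or FIG. 3 and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.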
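It should be understood that, in the embodiments of the present application, the input unit 704 may include a graphics processing unit (GPU) and a microphone. The graphics processor processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 706 may include a display panel, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 707 includes a touch panel and other input devices. The touch panel is also called a touch screen and may include two parts, a touch detection device and a touch controller. The other input devices may include but are not limited to a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described further here. The memory 709 may be used to store software programs and various data, including but not limited to application programs and operating systems. The processor 710 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 710.
An embodiment of the present application further provides a readable storage medium storing a program or an instruction. When the program or instruction is executed by a processor, each process of the image generation method embodiment shown in FIG. 1 or FIG. 2 is implemented and the same technical effect is achieved. To avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run a program or an instruction to implement each process of the image generation method embodiment shown in FIG. 1 or FIG. 2 and achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; it may also include performing functions in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
From the description of the above implementations, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described above, which are merely illustrative rather than restrictive. Inspired by the present application, a person of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.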

Claims (15)

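  1. An image generation method, comprising:
    displaying a conversation interface, wherein the conversation interface comprises a voice conversation message;
    receiving a first input from a user;
    generating a target image in response to the first input, wherein the target image comprises text information corresponding to the voice conversation message.
  2. The method according to claim 1, wherein, in the case where the voice conversation message is a voice call message, the generating a target image in response to the first input comprises:
    in response to the first input acting on a first preset control, performing voice recognition processing on the voice call message to obtain a voice recognition result, wherein the voice recognition result comprises the text information corresponding to each segment of voice in the voice call message and the call contact corresponding to each segment of voice;
    generating a target image, wherein the target image comprises the voice recognition result.
  3. The method according to claim 1, wherein, in the case where the voice conversation message comprises at least two voice messages, the generating a target image in response to the first input comprises:
    in response to a first sub-input, generating an intermediate image, wherein the intermediate image comprises the at least two voice messages, and the first input comprises the first sub-input and a second sub-input;
    in response to the second sub-input acting on a second preset control, converting the at least two voice messages into text information respectively, and updating the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, so as to generate the target image.
  4. The method according to claim 1, wherein the first input comprises a screenshot operation for capturing the conversation interface.
  5. The method according to claim 1, wherein, in the case where there are at least two voice conversation messages in the conversation interface, the generating a target image in response to the first input comprises:
    when a first sub-input is received, generating a preset window that frames the conversation interface, and displaying a selection control corresponding to each voice conversation message in the conversation interface;
    when a second sub-input on the selection control corresponding to a target voice conversation message is received, converting the target voice conversation message in the preset window into text information, wherein the first input comprises the first sub-input and the second sub-input;
    generating a target image, wherein the target image comprises text information corresponding to the target voice conversation message, and voice icons respectively corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
  6. An image generation apparatus, comprising:
    a first display module, configured to display a conversation interface, wherein the conversation interface comprises a voice conversation message;
    a user input module, configured to receive a first input from a user;
    a generating module, configured to generate a target image in response to the first input, wherein the target image comprises text information corresponding to the voice conversation message.
  7. The apparatus according to claim 6, wherein, in the case where the voice conversation message is a voice call message, the generating module comprises:
    a voice recognition unit, configured to perform, in response to the first input acting on a first preset control, voice recognition processing on the voice call message to obtain a voice recognition result, wherein the voice recognition result comprises the text information corresponding to each segment of voice in the voice call message and the call contact corresponding to each segment of voice;
    a first generating unit, configured to generate a target image, wherein the target image comprises the voice recognition result.
  8. The apparatus according to claim 6, wherein, in the case where the voice conversation message comprises at least two voice messages, the generating module comprises:
    a second generating unit, configured to generate an intermediate image in response to a first sub-input, wherein the intermediate image comprises the at least two voice messages, and the first input comprises the first sub-input and a second sub-input;
    an updating unit, configured to, in response to the second sub-input acting on a second preset control, convert the at least two voice messages into text information respectively, and update the display area corresponding to each voice message in the intermediate image to the text information corresponding to that voice message, so as to generate the target image.
  9. The apparatus according to claim 6, wherein the first input comprises a screenshot operation for capturing the conversation interface.
  10. The apparatus according to claim 6, wherein, in the case where there are at least two voice conversation messages in the conversation interface, the generating module comprises:
    a third generating unit, configured to generate, when a first sub-input is received, a preset window that frames the conversation interface;
    a display unit, configured to display, in the preset window, a selection control corresponding to each voice conversation message in the conversation interface;
    a text conversion unit, configured to convert, when a second sub-input on the selection control corresponding to a target voice conversation message is received, the target voice conversation message in the preset window into text information, wherein the first input comprises the first sub-input and the second sub-input;
    a fourth generating unit, configured to generate a target image, wherein the target image comprises the text information corresponding to each target voice conversation message, and voice icons respectively corresponding to the voice conversation messages in the preset window other than the target voice conversation message.
  11. An electronic device, comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the image generation method according to any one of claims 1-5.
  12. A readable storage medium storing a program or an instruction, wherein the program or instruction, when executed by a processor, implements the steps of the image generation method according to any one of claims 1-5.
  13. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or an instruction to implement the steps of the image generation method according to any one of claims 1-5.
  14. A computer program product, wherein the program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the steps of the image generation method according to any one of claims 1-5.
  15. A communication device, configured to perform the steps of the image generation method according to any one of claims 1-5.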
PCT/CN2021/139569 2020-12-23 2021-12-20 Image generation method and apparatus, and electronic device WO2022135323A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011537104.8 2020-12-23
CN202011537104.8A CN112711366A (zh) 2020-12-23 2020-12-23 Image generation method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2022135323A1 true WO2022135323A1 (zh) 2022-06-30

Family

ID=75543726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139569 WO2022135323A1 (zh) 2020-12-23 2021-12-20 Image generation method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN112711366A (zh)
WO (1) WO2022135323A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112711366A (zh) * 2020-12-23 2021-04-27 维沃移动通信(杭州)有限公司 图像生成方法、装置和电子设备
CN113965640B (zh) * 2021-10-11 2023-09-01 维沃移动通信有限公司 消息处理方法及装置
CN114979054B (zh) * 2022-05-13 2024-06-18 维沃移动通信有限公司 视频生成方法、装置、电子设备及可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763508A (zh) * 2008-12-24 2010-06-30 新奥特硅谷视频技术有限责任公司 一种语音信息的获取、转换和标识的方法和装置
US20100238323A1 (en) * 2009-03-23 2010-09-23 Sony Ericsson Mobile Communications Ab Voice-controlled image editing
CN103812999A (zh) * 2012-11-12 2014-05-21 展讯通信(天津)有限公司 移动终端及其通话记录处理方法和装置
CN106357932A (zh) * 2016-11-22 2017-01-25 奇酷互联网络科技(深圳)有限公司 一种通话信息记录方法和移动终端
US10074381B1 (en) * 2017-02-20 2018-09-11 Snap Inc. Augmented reality speech balloon system
CN109412932A (zh) * 2018-09-28 2019-03-01 维沃移动通信有限公司 一种截屏方法和终端
CN112711366A (zh) * 2020-12-23 2021-04-27 维沃移动通信(杭州)有限公司 图像生成方法、装置和电子设备

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8917847B2 (en) * 2012-06-12 2014-12-23 Cisco Technology, Inc. Monitoring and notification mechanism for participants in a breakout session in an online meeting
KR20140114238A (ko) * 2013-03-18 2014-09-26 삼성전자주식회사 오디오와 결합된 이미지 표시 방법
CN106033418B (zh) * 2015-03-10 2020-01-31 阿里巴巴集团控股有限公司 语音添加、播放方法及装置、图片分类、检索方法及装置
CN105100360B (zh) * 2015-08-26 2019-05-03 百度在线网络技术(北京)有限公司 用于语音通话的通话辅助方法和装置
CN105117021A (zh) * 2015-09-24 2015-12-02 深圳东方酷音信息技术有限公司 一种虚拟现实内容的生成方法和播放装置
CN105677707A (zh) * 2015-12-28 2016-06-15 努比亚技术有限公司 一种实现图片处理的方法及终端
CN106791206A (zh) * 2017-03-01 2017-05-31 北京小米移动软件有限公司 信息记录方法及装置
CN109300177B (zh) * 2017-07-24 2024-01-23 中兴通讯股份有限公司 一种图片处理方法和装置
CN109874038B (zh) * 2019-03-26 2022-07-15 维沃移动通信有限公司 一种终端的显示方法及终端
CN110062107A (zh) * 2019-03-29 2019-07-26 东莞市步步高通信软件有限公司 一种内容显示方法及终端设备
CN110995921A (zh) * 2019-11-19 2020-04-10 维沃移动通信有限公司 通话处理方法、电子设备及计算机可读存储介质

Also Published As

Publication number Publication date
CN112711366A (zh) 2021-04-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21909319

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21909319

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.12.2023)