CN117834576A - Expression interaction method, device, equipment and storage medium - Google Patents

Expression interaction method, device, equipment and storage medium

Info

Publication number
CN117834576A
Authority
CN
China
Prior art keywords
speech
expression
target
content
visual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410023197.4A
Other languages
Chinese (zh)
Inventor
李曦宇
舒斯起
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202410023197.4A
Publication of CN117834576A
Legal status: Pending

Classifications

    • H04L 51/10: User-to-user messaging in packet-switching networks characterised by the inclusion of specific contents, namely multimedia information
    • G10L 15/22: Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 25/57: Speech or voice analysis techniques specially adapted for comparison or discrimination, for processing of video signals
    • G10L 2015/225: Feedback of the input speech
    • H04N 21/233: Server-side processing of audio elementary streams
    • H04N 21/23424: Server-side processing of video elementary streams involving splicing one content stream with another, e.g. for inserting or substituting an advertisement
    • H04N 21/2343: Server-side processing of video elementary streams involving reformatting operations of video signals
    • H04N 21/439: Client-side processing of audio elementary streams
    • H04N 21/44016: Client-side processing of video elementary streams involving splicing one content stream with another, e.g. for substituting a video clip
    • H04N 21/4402: Client-side processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/8106: Monomedia components involving special audio data, e.g. different tracks for different languages
    • H04N 21/8153: Monomedia components involving graphical data comprising still images, e.g. texture, background image

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present disclosure relate to a method, apparatus, device, and storage medium for expression interaction. The method comprises: determining a target media resource for generating a speech expression; creating the speech expression based on a received editing operation, the speech expression including a visual portion and a speech portion, the editing operation including at least one of: a first operation for modifying first visual content and/or first voice content included in the target media resource, and a second operation for inputting second visual content and/or second voice content; and publishing the speech expression such that, based on a selection by a target user, the speech expression can be sent into a conversation associated with the target user. In this way, embodiments of the present disclosure can allow a user to create a speech expression by editing a media resource, thereby improving the efficiency with which speech expressions are produced.

Description

Expression interaction method, device, equipment and storage medium
Technical Field
Example embodiments of the present disclosure relate generally to the field of computers, and in particular, relate to methods, apparatuses, devices, and computer-readable storage media for expression interaction.
Background
With the development of computer technology, the internet has become an important platform for information exchange. As people exchange information over the internet, various types of emoticons have become an important medium for social expression and communication.
Conventional emoticons typically convey information through visual content alone, but this approach is relatively limited, and it is difficult to fully express the information that a user wishes to communicate.
Disclosure of Invention
In a first aspect of the present disclosure, a method of expression interaction is provided. The method comprises the following steps: determining a target media resource for generating a speech expression; creating a speech expression based on the received editing operation, the speech expression including a visual portion and a speech portion, the editing operation including at least one of: a first operation for modifying first visual content and/or first voice content included in the target media resource, a second operation for inputting second visual content and/or second voice content; and publishing the speech expression such that the speech expression can be sent into a conversation associated with the target user based on the selection of the target user.
In a second aspect of the present disclosure, a method of expression interaction is provided. The method comprises the following steps: receiving a selection of a speech expression; and sending a speech expression into the target session, the speech expression including a visual portion and a speech portion, wherein the speech expression is created based on an editing operation associated with the target media asset, the editing operation including at least one of: a first operation for modifying the first visual content and/or the first voice content included in the target media asset, a second operation for adding the second visual content and/or the second voice content.
In a third aspect of the present disclosure, an expression production method is provided. The method comprises the following steps: presenting, based on target voice content, a first set of candidate visual content determined based on the target voice content; and creating a speech expression based on a selection of first visual content from the first set of candidate visual content, the speech expression including a visual portion and a speech portion, wherein the visual portion is determined based on the first visual content and the speech portion is determined based on the target voice content.
In a fourth aspect of the present disclosure, an expression production method is provided. The method comprises the following steps: presenting, based on target text content, a second set of candidate visual content determined based on the target text content; and creating a speech expression based on a selection of second visual content from the second set of candidate visual content, the speech expression including a visual portion and a speech portion, wherein the visual portion is determined based on the second visual content and the speech portion is generated based on the target text content.
In a fifth aspect of the present disclosure, an apparatus for expression interaction is provided. The device comprises: a determining module configured to determine a target media resource for generating a speech expression; a creation module configured to create a speech expression based on the received editing operation, the speech expression including a visual portion and a speech portion, the editing operation including at least one of: a first operation for modifying first visual content and/or first voice content included in the target media resource, a second operation for inputting second visual content and/or second voice content; and a publication module configured to publish the speech expression such that the speech expression can be sent into a conversation associated with the target user based on the selection of the target user.
In a sixth aspect of the present disclosure, an electronic device is provided. The apparatus comprises at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by at least one processing unit, cause the apparatus to perform the method of any one of the first to fourth aspects.
In a seventh aspect of the present disclosure, a computer-readable storage medium is provided. The computer readable storage medium has stored thereon a computer program executable by a processor to implement the method of any one of the first to fourth aspects.
It should be understood that what is described in this section of the disclosure is not intended to limit key features or essential features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals denote like or similar elements, in which:
FIG. 1 illustrates a schematic diagram of an example environment in which embodiments in accordance with the present disclosure may be implemented;
FIG. 2 illustrates a flowchart of an example process of expression interaction, according to some embodiments of the present disclosure;
FIGS. 3A-3E illustrate example interfaces according to some embodiments of the present disclosure;
FIG. 4 illustrates a flowchart of an example process of expression interaction, according to some embodiments of the present disclosure;
FIGS. 5A-5B illustrate example interfaces according to some embodiments of the present disclosure;
FIG. 6 illustrates a flowchart of an example process of expression production, according to some embodiments of the present disclosure;
FIG. 7 illustrates a flowchart of an example process of expression production, according to some embodiments of the present disclosure;
fig. 8 illustrates a schematic block diagram of an example apparatus for expression interaction, according to some embodiments of the present disclosure; and
fig. 9 illustrates a block diagram of an electronic device capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been illustrated in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather, these embodiments are provided so that this disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that any section/subsection headings provided herein are not limiting. Various embodiments are described throughout this document, and any type of embodiment may be included under any section/subsection. Furthermore, the embodiments described in any section/subsection may be combined in any manner with any other embodiment described in the same section/subsection and/or in a different section/subsection.
In describing embodiments of the present disclosure, the term "comprising" and its variants should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". The terms "first", "second", and the like may refer to different objects or to the same object. Other explicit and implicit definitions may also be included below.
Embodiments of the present disclosure may involve user data and the acquisition and/or use of data. These aspects all comply with applicable laws and regulations. In embodiments of the present disclosure, all collection, acquisition, processing, forwarding, use, and the like of data is performed with the user's knowledge and confirmation. Accordingly, when implementing the embodiments of the present disclosure, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the types of data or information that may be involved, the scope of use, and the usage scenarios, and the user's authorization should be obtained. The particular manner of notification and/or authorization may vary depending on the actual situation and application scenario, and the scope of the present disclosure is not limited in this respect.
Where the processing of personal information is involved in this description and its embodiments, such processing is performed on a lawful basis (for example, with the consent of the personal information subject, or where necessary for the performance of a contract), and only within the prescribed or agreed scope. If a user refuses to allow the processing of personal information other than that necessary for basic functions, the user's use of those basic functions is not affected.
When exchanging information over the internet, people wish to use high-quality expressions to conveniently convey the information they intend. With traditional expression resources, it is difficult to fully express the information that people want to communicate.
The embodiment of the disclosure provides an expression interaction scheme. According to this scheme, a target media resource for generating a speech expression can be determined. Further, a speech expression may be created based on the received editing operation, wherein the speech expression includes a visual portion and a speech portion.
Such editing operations may include at least one of: a first operation for modifying the first visual content and/or the first voice content included in the target media asset, a second operation for inputting the second visual content and/or the second voice content.
Further, a speech expression may be published such that the speech expression can be sent into a conversation associated with the target user based on the selection of the target user.
In this way, embodiments of the present disclosure can allow a user to create a speech expression by editing a media resource, thereby improving the efficiency with which speech expressions are produced.
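By way of illustration only, the following minimal Python sketch models one possible data structure for such a speech expression and the two classes of editing operation described above. All names (MediaAsset, SpeechExpression, create_speech_expression) are hypothetical and do not correspond to any actual implementation of the disclosure.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class MediaAsset:
        # Target media resource selected or entered by the user.
        visual: Optional[bytes] = None   # encoded still or dynamic image data
        speech: Optional[bytes] = None   # encoded audio data

    @dataclass
    class SpeechExpression:
        # A speech expression combines a visual portion with a speech portion.
        visual: bytes
        speech: Optional[bytes]

    def create_speech_expression(asset: MediaAsset,
                                 modified_visual: Optional[bytes] = None,
                                 modified_speech: Optional[bytes] = None,
                                 input_visual: Optional[bytes] = None,
                                 input_speech: Optional[bytes] = None) -> SpeechExpression:
        # "modified_*" stands for the first operation (modifying content already in
        # the asset); "input_*" stands for the second operation (adding new content).
        visual = input_visual or modified_visual or asset.visual
        speech = input_speech or modified_speech or asset.speech
        if visual is None:
            raise ValueError("a speech expression needs at least a visual portion")
        return SpeechExpression(visual=visual, speech=speech)
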
Various example implementations of the scheme are described in further detail below in conjunction with the accompanying drawings.
Example Environment
FIG. 1 illustrates a schematic diagram of an example environment 100 in which embodiments of the present disclosure may be implemented. As shown in fig. 1, an example environment 100 may include an electronic device 110.
In this example environment 100, an electronic device 110 may be running an application 120 that supports interface interactions. The application 120 may be any suitable type of application for interface interaction, examples of which may include, but are not limited to: video applications, social applications, or other suitable applications. The user 140 may interact with the application 120 via the electronic device 110 and/or its attached device.
In the environment 100 of fig. 1, if the application 120 is in an active state, the electronic device 110 may present an interface 150 for supporting interface interactions through the application 120.
In some embodiments, the electronic device 110 communicates with the server 130 to enable provisioning of services for the application 120. The electronic device 110 may be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile handset, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, media computer, multimedia tablet, palmtop computer, portable gaming terminal, VR/AR device, personal communication system (Personal Communication System, PCS) device, personal navigation device, personal digital assistant (Personal Digital Assistant, PDA), audio/video player, digital camera/video camera, positioning device, television receiver, radio broadcast receiver, electronic book device, gaming device, or any combination of the preceding, including accessories and peripherals for these devices, or any combination thereof. In some embodiments, electronic device 110 is also capable of supporting any type of interface to the user (such as "wearable" circuitry, etc.).
The server 130 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content distribution networks, big data, and artificial intelligence platforms. Server 130 may include, for example, a computing system/server, such as a mainframe, an edge computing node, or a computing device in a cloud environment. The server 130 may provide background services for the application 120 in the electronic device 110.
A communication connection may be established between server 130 and electronic device 110. The communication connection may be established by wired means or wireless means. The communication connection may include, but is not limited to, a bluetooth connection, a mobile network connection, a universal serial bus (Universal Serial Bus, USB) connection, a wireless fidelity (Wireless Fidelity, wiFi) connection, etc., as embodiments of the disclosure are not limited in this respect. In embodiments of the present disclosure, the server 130 and the electronic device 110 may implement signaling interactions through a communication connection therebetween.
It should be understood that the structure and function of the various elements in environment 100 are described for illustrative purposes only and are not meant to suggest any limitation as to the scope of the disclosure.
Some example embodiments of the present disclosure will be described below with continued reference to the accompanying drawings.
First example procedure
FIG. 2 illustrates a flow chart of an example interaction process 200 according to some embodiments of the present disclosure. Process 200 may be implemented at electronic device 110. Process 200 is described below with reference to fig. 1.
As shown in fig. 2, at block 210, the electronic device 110 determines a target media asset for generating a speech expression.
In some embodiments, such target media assets may include any suitable type of assets including visual content and/or audio content, such as pictures, video, audio, expressions, and the like.
In some embodiments, the target media asset may comprise a user-entered media asset. For example, a user may take a video, a picture, or record a voice as a target media asset for the generation of a voice expression.
In some embodiments, the electronic device 110 may also determine a target media asset for generating the speech expression based on a user selection of the target media asset from the candidate media assets.
The process 200 will be described below with reference to fig. 3A-3E. Fig. 3A-3E illustrate example interfaces 300A-300E according to some embodiments of the present disclosure. The interfaces 300A-300E may be provided by the electronic device 110 shown in fig. 1, for example.
As shown in fig. 3A, interface 300A may be, for example, a session interface for a session. The electronic device 110 may display the expression panel 305 in the interface 300A. As an example, the expression panel 305 may display an added speech expression, such as speech expression 315.
Additionally, the expression panel 305 may also include, for example, a creation portal 310. Upon receiving a selection of the creation portal 310, the electronic device 110 can present interface 300B as shown in fig. 3B.
In interface 300B, electronic device 110 may display a set of candidate images, such as candidate image 320-1, candidate image 320-2, candidate image 320-3, and candidate image 320-4 (individually or collectively referred to as candidate images 320). In some embodiments, such candidate images may include still images and/or dynamic images.
In some embodiments, such a candidate image 320 may be associated with the current user, for example. For example, the candidate images 320 may include images collected by the current user. As another example, the candidate images 320 may also include images associated with works published by the current user, such as an image work published by the current user or a video frame of a video work published by the current user.
In some embodiments, with the user's knowledge and authorization, the candidate images 320 may also include images stored locally on the electronic device 110.
Further, the electronic device 110 may, for example, receive a user selection of the candidate image 320-3 and may accordingly determine the candidate image 320-3 as a target media asset for generating a speech expression.
Electronic device 110 may also determine the target media asset for generating the speech expression based on other suitable means. In some embodiments, the electronic device 110 may also support user selection of available expressions as target media assets, for example.
In some examples, the expressions available to the user may include image expressions, speech expressions, and the like that the user has added, or image expressions, speech expressions, and the like shared by other users.
In some embodiments, the electronic device 110 may present a set of candidate image expressions in the expression panel 305. As an example, such candidate image expressions may have only a visual portion and no speech portion.
Further, the electronic device 110 may receive a user selection of a target image expression from the set of candidate image expressions and determine the target image expression as a target media asset.
In still other embodiments, the electronic device 110 may also receive, for example, a preset operation associated with the first work and may accordingly determine the first work as the target media asset.
For example, the electronic device 110 may provide a speech expression creation portal, for example, in association with a viewing interface of a first work, and may determine the first work as a target media asset based on a user selection of the creation portal.
With continued reference to fig. 2, at block 220, the electronic device 110 creates a speech expression based on the received editing operation, the speech expression including a visual portion and a speech portion, the editing operation including at least one of: a first operation for modifying the first visual content and/or the first voice content included in the target media asset, a second operation for inputting the second visual content and/or the second voice content.
In some embodiments, the electronic device 110 may provide an editing interface for the user to create a speech expression.
In some embodiments, the electronic device 110 may modify the first visual content and/or the first voice content included with the target media asset accordingly based on the first operation of the user in the editing interface.
Taking fig. 3B as an example, in the case where the target media asset is the candidate image 320-3, the electronic device 110 may, for example, provide an image editing interface for editing the candidate image 320-3. It should be appreciated that such an image editing interface may support any suitable type of image editing operation, such as, for example, cropping images, adding text, adding stickers, applying filters, and so forth.
Illustratively, as shown in FIG. 3C, the electronic device 110 may display the edited image 325, for example, in the interface 300C. It should be appreciated that such an image editing process is not necessary. Image 325 may also be, for example, the original candidate image 320-3.
With continued reference to fig. 3C, the electronic device 110 can provide a control 330 for adding voice content. For example, in the event that the user triggers control 330, electronic device 110 can obtain voice content entered by the user.
It should be appreciated that such voice content may include any suitable audio content, regardless of whether such audio content represents corresponding textual information. For example, the voice content may include a recorded section of a user's speech, a particular sound in the environment, and so forth. As another example, the voice content may also include music content with or without lyrics, for example.
With continued reference to fig. 3D, after the voice content input is completed, the electronic device 110 may, for example, present an interface 300D. In interface 300D, electronic device 110 may provide preview portal 335 for playing the entered voice content.
Additionally, the electronic device 110 may also apply a particular speech style to the input speech content, for example. As shown in fig. 3D, the electronic device 110 may, for example, provide a set of candidate speech styles 340 and may apply a specified target speech style to the input speech content based on the user's selection. Such speech styles may correspond to different timbres, intonation, etc., for example.
Taking fig. 3D as an example, in the event that the user selects a target speech style (e.g., "style 1"), the speech content triggered by preview portal 335 may, for example, match the target speech style.
In still other embodiments, the electronic device 110 may also support, for example, a user selecting a target speech style to apply before entering speech content. Accordingly, after the voice content input is completed, the voice content triggered to be played by the preview portal 335 may be, for example, voice content to which the target voice style has been applied.
Further, the electronic device 110 can create a corresponding speech expression based on the user's selection of the completion control 345. The speech expression may include a visual portion and a speech portion.
Accordingly, the visual portion may correspond to the image 325 shown in fig. 3C, for example. The voice portion may correspond to voice content triggered to play by preview portal 335.
In some embodiments, where image 325 is a dynamic image, the speech content of the speech expression may be aligned in time with the dynamic image such that the speech content may be played, for example, from a starting frame of the dynamic image.
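As a rough illustration of this time alignment, the sketch below computes how many times a dynamic image of a given duration would loop so that it covers the speech portion when both start from the image's first frame; the helper name and fixed-duration assumption are purely illustrative.

    import math

    def plan_playback(image_duration_s: float, speech_duration_s: float) -> dict:
        # Align a dynamic image with the speech portion: both start together at the
        # image's first frame, and the image loops until the speech has finished.
        if image_duration_s <= 0:
            raise ValueError("dynamic image must have a positive duration")
        loops = max(1, math.ceil(speech_duration_s / image_duration_s))
        return {
            "start_at_frame": 0,
            "image_loops": loops,
            "total_duration_s": max(image_duration_s * loops, speech_duration_s),
        }

    # e.g. a 1.2 s dynamic image with 3.0 s of speech loops 3 times.
    print(plan_playback(1.2, 3.0))
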
In some embodiments, the electronic device 110 may also add voice content in other suitable ways. As an example, the electronic device 110 may provide a set of candidate speech content, for example, and may add the candidate speech content to the speech expression based on a user selection of the candidate speech content.
By way of example, the electronic device 110 may obtain a set of voice material, such as voice material created and shared by other users, or voice material provided by the platform. Accordingly, the user may, for example, select a particular piece of voice material and combine it with an existing image to generate a corresponding speech expression.
In some embodiments, the voice content may also be added accordingly, for example, based on text content entered by the user. For example, the electronic device 110 may receive text content entered by a user and may accordingly generate speech content matching the text content to add to the speech expression. Alternatively, the electronic device 110 may also search for speech content matching the text content from existing speech material to add to the speech expression.
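A minimal sketch of the "search existing speech material" path is given below, under the assumption that each stored material carries a transcript; the token-overlap scoring is a naive stand-in used only for illustration.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class SpeechMaterial:
        audio: bytes
        transcript: str

    def find_matching_speech(text: str,
                             library: List[SpeechMaterial]) -> Optional[SpeechMaterial]:
        # Return the library item whose transcript best overlaps the input text.
        wanted = set(text.lower().split())
        best, best_score = None, 0
        for item in library:
            score = len(wanted & set(item.transcript.lower().split()))
            if score > best_score:
                best, best_score = item, score
        return best  # None if nothing overlaps; fall back to synthesis in that case
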
The creation of a speech expression was described above taking the example of a target media asset including an image. As mentioned above, the target media asset may also include, for example, an image expression selected by the user. It should be appreciated that editing of the visual portion of the image expression and addition of audio content to create a speech expression may be supported based on similar processes described above.
In some embodiments, the target media asset may also include, for example, a voice expression. In this case, the electronic device 110 may support, for example, user editing or replacing of visual portions of the speech expression and/or editing or replacing of speech portions.
For example, the user may choose to retain the visual portion of an existing speech expression and enter a new speech portion to generate the speech expression. Alternatively, the user may choose to retain the visual portion of an existing speech expression and apply a particular speech style to the existing speech portion to generate the speech expression.
In some embodiments, the target media asset may also include audio content, such as a segment of speech, for example. By way of example, the electronic device 110 may receive a user selection of a segment of speech in a conversation, for example, to determine the speech as a target media asset. Alternatively, the electronic device 110 may also receive a piece of speech recorded by the user as the target media asset.
Accordingly, the electronic device 110 may, for example, enable a user to add corresponding visual content. Similar to that discussed above with respect to fig. 3B, electronic device 110 may, for example, provide a set of candidate images for selection by a user. In some examples, such candidate images may include, for example, images associated with a user. Alternatively, such candidate images may also include images generated or searched based on the segment of speech.
Based on the process discussed above, embodiments of the present disclosure can allow a user to edit media resources to create speech expressions, thereby improving the efficiency with which speech expressions are produced.
With continued reference to fig. 2, at block 230, the electronic device 110 publishes a speech expression so that the speech expression can be sent into a conversation associated with the target user based on the target user's selection.
In some embodiments, the electronic device 110 may publish the created speech expression accordingly based on a publication operation by the user.
In some examples, such publishing operations may include adding the speech expression to the expression panel. As shown in fig. 3E, the electronic device 110 may accordingly display the published speech expression 350 in the expression panel of interface 300E, such that the speech expression 350 can be sent into an associated conversation.
In some examples, such publishing operations may include a sharing operation for the speech expression. For example, the electronic device 110 may share the speech expression with other users or groups based on a sharing request from the user. As an example, a user may share a speech expression with other users or groups by way of a private message or a comment, or the like.
Alternatively, the electronic device 110 may publish the speech expression as a resource available to other users based on the user's sharing request. Other users may, for example, further add the speech expression to support using such speech expression in a conversation.
In yet another example, such a publishing operation may include publishing the speech expression 350 as a work (also referred to as a second work) such that a user can add the speech expression 350 via the work. For example, an add entry for adding the expression may be provided in the work's viewing interface, through which the current user or other users may add the speech expression.
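For illustration, the publication paths mentioned above could be dispatched roughly as follows; PublishMode and publish are placeholder names rather than an actual API.

    from enum import Enum, auto

    class PublishMode(Enum):
        ADD_TO_PANEL = auto()     # make the expression available in the user's panel
        SHARE = auto()            # share to another user or group
        PUBLISH_AS_WORK = auto()  # publish as a second work carrying an "add" entry

    def publish(expression, mode: PublishMode, target=None):
        if mode is PublishMode.ADD_TO_PANEL:
            return {"panel": "updated", "expression": expression}
        if mode is PublishMode.SHARE:
            return {"shared_with": target, "expression": expression}
        if mode is PublishMode.PUBLISH_AS_WORK:
            return {"work": {"content": expression, "add_entry": True}}
        raise ValueError(mode)
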
The use process regarding the speech expression will be described in detail below with reference to fig. 4.
Second example procedure
Fig. 4 illustrates a flowchart of an example process 400 of expression interaction, according to some embodiments of the present disclosure. The process 400 may be implemented at the electronic device 110. Process 400 is described below with reference to fig. 1.
At block 410, the electronic device 110 receives a selection of a speech expression.
The process 400 will be described below with reference to fig. 5A and 5B. Fig. 5A and 5B illustrate example interfaces 500A and 500B according to some embodiments of the present disclosure.
As shown in fig. 5A, interface 500A may be, for example, a session interface for a session (e.g., a session with "user B"). As shown, the electronic device 110 may display an expression panel 505 in the interface 500A, for example. The expression panel 505 may, for example, display a set of speech expressions, such as speech expression 510-1 and speech expression 510-2 (individually or collectively referred to as speech expressions 510). Such a speech expression 510 may be created, for example, based on the process described above with reference to fig. 2.
As shown in FIG. 5A, the electronic device 110 may provide corresponding preview portal 515-1 and preview portal 515-2 in association with the speech expression 510-1 and the speech expression 510-2.
In the event that a selection of preview portal 515-1 is received, for example, electronic device 110 may play the speech portion of the speech expression 510-1.
With continued reference to fig. 4, at block 420, the electronic device 110 may send a speech expression into the target session, the speech expression including a visual portion and a speech portion.
Taking fig. 5B as an example, upon receiving a selection of the speech expression 510-2 shown in fig. 5A, the electronic device 110 may send the speech expression 510-2 into the conversation.
In particular, the electronic device 110 may display the visual portion 520 of the speech expression 510-2 in a message window of the conversation. In the case where the visual portion 520 includes a dynamic image, the visual portion 520 may be automatically played one or more times, for example.
In some embodiments, the electronic device 110 may also display a play entry 525 for the speech expression 510-2 in the message window. Further, upon receiving a selection of the play entry 525, the electronic device 110 may play the speech portion of the speech expression 510-2.
In some embodiments, where the visual portion 520 includes a dynamic image, the dynamic image may be replayed, for example, if the play entry 525 is triggered, such that the dynamic image may be synchronized with the play of the voice portion, for example.
It should be appreciated that the recipient of the speech expression may similarly view the visual portion 520 of the speech expression and may play the speech portion of the speech expression based on a selection of the play entry 525.
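A simplified sketch of how a message window might handle such a play entry is shown below, restarting the dynamic image so that it remains synchronized with playback of the speech portion; the class and its stub methods are hypothetical.

    class MessageWindowExpression:
        # Displays the visual portion and plays the speech portion on demand.

        def __init__(self, visual_frames, speech_audio, is_dynamic):
            self.visual_frames = visual_frames
            self.speech_audio = speech_audio
            self.is_dynamic = is_dynamic

        def on_received(self):
            # A dynamic image may auto-play one or more times when first displayed.
            if self.is_dynamic:
                self._play_visual(times=1)

        def on_play_entry_selected(self):
            # Replay the dynamic image from its first frame so that it stays
            # synchronized with playback of the speech portion.
            if self.is_dynamic:
                self._play_visual(times=1)
            self._play_speech()

        def _play_visual(self, times):
            print(f"playing {len(self.visual_frames)} frames x{times}")

        def _play_speech(self):
            print(f"playing {len(self.speech_audio)} bytes of audio")
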
Third example procedure
Fig. 6 illustrates a flowchart of an example process 600 of expression production according to some embodiments of the present disclosure. The process 600 may be implemented at the electronic device 110. Process 600 is described below with reference to fig. 1.
As shown, at block 610, the electronic device 110 presents a first set of candidate visual content determined based on the target voice content.
In some embodiments, the target voice content may include voice content entered by a user. For example, the electronic device 110 may obtain voice content input by a user through a voice capture device.
In some embodiments, the target voice content may also include voice content that the user otherwise selects. For example, the electronic device 110 may receive a selection of a segment of speech in a conversation and treat the segment of speech as the target voice content. Alternatively, the electronic device 110 may receive a selection of voice material provided by another user or by the platform and use the voice material as the target voice content.
In some embodiments, the first set of candidate visual content provided by the electronic device 110 is generated based on text content corresponding to the target speech content. For example, electronic device 110 may utilize any suitable machine learning model to generate a set of candidate visual content, such as still images, moving images, or video, corresponding to text content.
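One way to realize this step is sketched below: the target speech is first transcribed, and the transcript then drives generation of candidate visual content. Both recognize_speech and generate_images are hypothetical stand-ins for whatever speech-recognition and image-generation models would actually be used.

    from typing import List

    def recognize_speech(audio: bytes) -> str:
        # Hypothetical ASR stub; a real system would call a speech-recognition model here.
        return "placeholder transcript"

    def generate_images(prompt: str, count: int = 4) -> List[bytes]:
        # Hypothetical image-generation stub producing `count` candidate images.
        return [f"image for '{prompt}' #{i}".encode() for i in range(count)]

    def candidate_visuals_for_speech(target_speech: bytes) -> List[bytes]:
        text = recognize_speech(target_speech)
        return generate_images(text)
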
At block 620, the electronic device 110 creates a speech expression based on the selection of the first visual content from the first set of candidate visual content, the speech expression including a visual portion and a speech portion, wherein the visual portion is determined based on the first visual content and the speech portion is determined based on the target speech content.
In some embodiments, the electronic device 110 may determine the speech portion of the speech expression based on the target speech content. For example, the target speech content itself may be used as the speech portion of the speech expression. Alternatively, the electronic device 110 may also apply a specific speech style to the target speech content to generate a speech portion of the speech expression.
In some embodiments, the electronic device 110 may determine a visual portion of the speech expression based on the selected first visual content. For example, the first visual content itself may be used as the visual portion of the speech expression. Alternatively, the electronic device 110 may also receive an editing operation of the user on the first visual content and generate a visual portion of the speech expression accordingly.
In this manner, embodiments of the present disclosure enable a user to efficiently create a matching speech expression by entering speech.
Fourth example procedure
Fig. 7 illustrates a flowchart of an example process 700 of expression production, according to some embodiments of the present disclosure. Process 700 may be implemented at electronic device 110. Process 700 is described below with reference to fig. 1.
As shown, at block 710, the electronic device 110 presents a second set of candidate visual content determined based on the target text content.
In some embodiments, the electronic device 110 may receive, for example, target text content entered or selected by a user and may accordingly provide a second set of candidate visual content determined based on the target text content.
In some embodiments, the second set of candidate visual content may be, for example, searched from a visual content library based on the target text content. Alternatively or additionally, the second set of candidate visual content may be generated based on the target text content, for example. For example, electronic device 110 may utilize any suitable machine learning model to generate a set of candidate visual content, such as still images, moving images, or video, corresponding to text content.
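The "search from a visual content library" alternative could look roughly like the sketch below, assuming each library entry carries descriptive tags; the tag-overlap ranking is purely illustrative.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class VisualContent:
        image: bytes
        tags: List[str]

    def search_visual_library(text: str, library: List[VisualContent],
                              limit: int = 4) -> List[VisualContent]:
        # Rank library entries by how many of their tags appear in the target text.
        tokens = set(text.lower().split())
        scored = [(len(tokens & {t.lower() for t in item.tags}), item) for item in library]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:limit]]
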
At block 720, the electronic device 110 creates a speech expression based on the selection of the second visual content from the second set of candidate visual content, the speech expression including a visual portion and a speech portion, wherein the visual portion is determined based on the second visual content and the speech portion is generated based on the target text content.
In some embodiments, the electronic device 110 may determine the visual portion of the speech expression based on the selected second visual content. For example, the second visual content itself may be used as the visual portion of the speech expression. Alternatively, the electronic device 110 may also receive an editing operation of the user on the second visual content and generate the visual portion of the speech expression accordingly.
In some embodiments, the electronic device 110 may generate a speech portion of the speech expression based on the target text content. For example, electronic device 110 may provide a set of candidate speech styles. Further, the electronic device 110 may receive a selection of a target speech style from a set of candidate speech styles such that a speech portion of the speech expression is generated based on the target speech style.
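A sketch of generating the speech portion from the target text in a selected style follows; synthesize_speech is a hypothetical text-to-speech interface in which timbre and intonation are folded into the chosen style.

    from dataclasses import dataclass

    @dataclass
    class SpeechStyle:
        name: str        # e.g. "style 1"
        timbre: str      # hypothetical timbre preset
        intonation: str  # hypothetical intonation preset

    def synthesize_speech(text: str, style: SpeechStyle) -> bytes:
        # Hypothetical TTS stub; a real system would invoke a synthesis model here.
        return f"[{style.name}|{style.timbre}|{style.intonation}] {text}".encode()

    def speech_portion_from_text(text: str, chosen: SpeechStyle) -> bytes:
        return synthesize_speech(text, chosen)
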
In this manner, embodiments of the present disclosure enable a user to efficiently create a matching speech expression by entering text.
Example apparatus and apparatus
Embodiments of the present disclosure also provide corresponding apparatus for implementing the above-described methods or processes. Fig. 8 illustrates a schematic block diagram of an example apparatus 800 for expression interaction, according to some embodiments of the present disclosure. The apparatus 800 may be implemented as or included in the electronic device 110. The various modules/components in apparatus 800 may be implemented in hardware, software, firmware, or any combination thereof.
As shown in fig. 8, the apparatus 800 includes a determination module 810 configured to determine a target media resource for generating a speech expression; a creation module 820 configured to create a speech expression based on the received editing operation, the speech expression including a visual portion and a speech portion, the editing operation including at least one of: a first operation for modifying first visual content and/or first voice content included in the target media resource, a second operation for inputting second visual content and/or second voice content; and a publication module 830 configured to publish the speech expression such that the speech expression can be sent into a conversation associated with the target user based on the selection of the target user.
In some embodiments, the determination module 810 is further configured to: presenting a set of candidate images, the set of candidate images including still images and/or moving images; and determining the target image as a target media asset based on the selection of the target image from the set of candidate images.
In some embodiments, the set of candidate images includes at least one of: images collected by the current user; an image associated with the work being released by the current user.
In some embodiments, the visual portion of the speech expression is determined based on the target image.
In some embodiments, the apparatus 800 further comprises a first receiving module configured to receive a second operation for inputting second voice content, wherein the speech portion of the speech expression is determined based on the second voice content.
In some embodiments, the apparatus 800 further comprises a second receiving module configured to receive a first operation for editing the target image, wherein the visual portion of the speech expression is determined based on the edited target image.
In some embodiments, the apparatus 800 further comprises a style determination module configured to: providing a set of candidate speech styles; and receiving a selection of a target speech style from the set of candidate speech styles such that the speech portion of the speech expression matches the target speech style.
In some embodiments, the target image is a target dynamic image, and the speech portion is configured to be played from a start frame of the target dynamic image.
In some embodiments, the determination module 810 is further configured to: based on the selection of the creation portal in the expression panel, a set of candidate images is presented.
In some embodiments, the determination module 810 is further configured to: presenting a set of candidate image expressions; and determining the target image expression as a target media asset based on the selection of the target image expression from the set of candidate image expressions.
In some embodiments, the apparatus 800 further comprises a second receiving module configured to receive, via an editing interface, a second operation for inputting second voice content, wherein the visual portion of the speech expression is determined based on the target image expression and the speech portion is determined based on the second voice content.
In some embodiments, the determination module 810 is further configured to: determining the first work as the target media asset based on a preset operation associated with the first work; or acquiring a picture or video shot by the user as the target media asset.
In some embodiments, the apparatus 800 further comprises a transmitting module configured to: displaying the published speech expression in an expression panel; and transmitting the speech expression into the conversation based on a selection of the speech expression.
In some embodiments, the apparatus 800 further comprises a play module configured to: displaying the visual part of the transmitted voice expression and a play entry of the voice expression in a message window of the session; and playing the speech portion of the speech expression based on the selection of the play entry.
In some embodiments, the apparatus 800 further comprises a preview module configured to: providing a preview entry in the expression panel in association with the speech expression; and playing the speech portion of the speech expression based on the selection of the preview portal.
In some embodiments, the publication module 830 is further configured to perform at least one of: publishing the speech expression as a second work, wherein at least one user is enabled to add the speech expression via the second work; adding the speech expression so that the speech expression is available to the current user; and sharing the speech expression with at least one user.
Fig. 9 illustrates a block diagram of an electronic device 900 in which one or more embodiments of the disclosure may be implemented. It should be understood that the electronic device 900 illustrated in fig. 9 is merely exemplary and should not be construed as limiting the functionality and scope of the embodiments described herein. The electronic device 900 illustrated in fig. 9 may be used to implement the electronic device 110 of fig. 1.
As shown in fig. 9, the electronic device 900 is in the form of a general-purpose electronic device. Components of electronic device 900 may include, but are not limited to, one or more processors or processing units 910, memory 920, storage 930, one or more communication units 940, one or more input devices 950, and one or more output devices 960. The processing unit 910 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 920. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capabilities of electronic device 900.
Electronic device 900 typically includes multiple computer storage media. Such media may be any available media accessible by electronic device 900, including, but not limited to, volatile and non-volatile media, and removable and non-removable media. The memory 920 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 930 may be a removable or non-removable medium and may include machine-readable media such as flash drives, magnetic disks, or any other medium that can store information and/or data and that can be accessed within electronic device 900.
The electronic device 900 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in fig. 9, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data medium interfaces. Memory 920 may include a computer program product 925 having one or more program modules configured to perform the various methods or acts of the various embodiments of the disclosure.
The communication unit 940 enables communication with other electronic devices via a communication medium. Additionally, the functionality of the components of the electronic device 900 may be implemented in a single computing cluster or in multiple computing machines capable of communicating over a communications connection. Thus, the electronic device 900 may operate in a networked environment using logical connections to one or more other servers, a network Personal Computer (PC), or another network node.
The input device 950 may be one or more input devices such as a mouse, keyboard, trackball, etc. The output device 960 may be one or more output devices such as a display, speakers, printer, etc. The electronic device 900 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., with one or more devices that enable a user to interact with the electronic device 900, or with any device (e.g., network card, modem, etc.) that enables the electronic device 900 to communicate with one or more other electronic devices, as desired, via the communication unit 940. Such communication may be performed via an input/output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, there is provided a computer-readable storage medium having computer-executable instructions stored thereon, wherein the computer-executable instructions are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, there is also provided a computer program product tangibly stored on a non-transitory computer-readable medium and comprising computer-executable instructions that are executed by a processor to implement the method described above.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, devices, and computer program products implemented according to the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of implementations of the present disclosure has been provided for illustrative purposes, is not exhaustive, and is not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various implementations described. The terminology used herein was chosen in order to best explain the principles of each implementation, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand each implementation disclosed herein.

Claims (27)

1. A method of expression interaction, comprising:
determining a target media resource for generating a speech expression;
creating a speech expression based on the received editing operation, the speech expression including a visual portion and a speech portion, the editing operation including at least one of: a first operation for modifying first visual content and/or first speech content included in the target media resource, a second operation for inputting second visual content and/or second speech content; and
the speech expression is published such that the speech expression can be sent into a conversation associated with a target user based on a selection of the target user.
2. The method of claim 1, wherein determining a target media resource for generating a speech expression comprises:
presenting a set of candidate images, the set of candidate images comprising still images and/or moving images; and
a target image is determined as the target media resource based on a selection of the target image from the set of candidate images.
3. The method of claim 2, wherein the set of candidate images comprises at least one of:
images collected by the current user;
an image associated with the work being released by the current user.
4. The method of claim 2, wherein the visual portion of the speech expression is determined based on the target image.
5. The method of claim 2, further comprising:
the second operation for inputting the second speech content is received, wherein the speech portion of the speech expression is determined based on the second speech content.
6. The method of claim 5, further comprising:
the method further includes receiving the first operation for editing the target image, wherein the visual portion of the speech expression is determined based on the edited target image.
7. The method of claim 5, further comprising:
providing a set of candidate speech styles; and
a selection is received for a target speech style of the set of candidate speech styles such that the speech portion of the speech expression matches the target speech style.
8. The method of claim 5, wherein the target image is a target moving image and the speech portion is configured to be played from a start frame of the target moving image.
9. The method of claim 2, wherein presenting a set of candidate images comprises:
the set of candidate images is presented based on a selection of a creation portal in the expression panel.
10. The method of claim 1, wherein determining a target media resource for generating a speech expression comprises:
presenting a set of candidate image expressions; and
determining a target image expression from the set of candidate image expressions as the target media resource based on a selection of the target image expression.
11. The method of claim 10, further comprising:
the second operation for inputting the second speech content is received via the editing interface, wherein the visual portion of the speech expression is determined based on the target image expression, the speech portion is determined based on the second speech content.
12. The method of claim 1, wherein determining a target media resource for generating a speech expression comprises:
determining a first work as the target media resource based on a preset operation associated with the first work; or
acquiring a picture or video captured by the user as the target media resource.
13. The method of claim 1, further comprising:
displaying the published speech expression in an expression panel; and
based on the selection of the speech expression, the speech expression is sent into the conversation.
14. The method of claim 13, further comprising:
displaying the visual portion of the sent speech expression and a play entry for the speech expression in a message window of the conversation; and
playing the speech portion of the speech expression based on the selection of the play entry.
15. The method of claim 13, further comprising:
providing a preview portal in the expression panel in association with the speech expression; and
the speech portion of the speech expression is played based on the selection of the preview portal.
16. The method of claim 1, wherein publishing the speech expression comprises at least one of:
publishing the speech expression as a second work, wherein at least one user is enabled to add the speech expression via the second work;
adding the speech expression so that the speech expression is available to a current user; and
sharing the speech expression with at least one user.
17. A method of expression interaction, comprising:
receiving a selection of a speech expression; and
transmitting the speech expression into a target session, the speech expression including a visual portion and a speech portion,
wherein the speech expression is created based on editing operations associated with a target media asset, the editing operations including at least one of: a first operation for modifying first visual content and/or first speech content included in the target media asset, a second operation for adding second visual content and/or second speech content.
18. The method of claim 17, wherein receiving a selection of a speech expression comprises:
presenting an expression panel in a session interface of the target session; and
a selection is received for the speech expression in the expression panel.
19. The method of claim 18, further comprising:
providing a preview portal in the expression panel in association with the speech expression; and
the speech portion of the speech expression is played based on the selection of the preview portal.
20. The method of claim 17, further comprising:
displaying the visual portion of the sent speech expression and a play entry for the speech expression in a message window of the target session; and
playing the speech portion of the speech expression based on the selection of the play entry.
21. An expression making method comprises the following steps:
based on target speech content, presenting a first set of candidate visual content determined based on the target speech content; and
based on a selection of a first visual content of the first set of candidate visual content, a speech expression is created, the speech expression comprising a visual portion and a speech portion, wherein the visual portion is determined based on the first visual content and the speech portion is determined based on the target speech content.
22. The method of claim 21, wherein the first set of candidate visual content is generated based on text content corresponding to the target speech content.
23. An expression making method comprises the following steps:
based on target text content, presenting a second set of candidate visual content determined based on the target text content; and
based on the selection of the second visual content of the second set of candidate visual content, a speech expression is created, the speech expression comprising a visual portion and a speech portion, wherein the visual portion is determined based on the second visual content and the speech portion is generated based on the target text content.
24. The method of claim 23, further comprising:
providing a set of candidate speech styles; and
a selection of a target speech style of the set of candidate speech styles is received such that the speech portion of the speech expression is generated based on the target speech style.
25. An apparatus for expression interaction, comprising:
a determining module configured to determine a target media resource for generating a speech expression;
a creation module configured to create a speech expression based on the received editing operation, the speech expression including a visual portion and a speech portion, the editing operation including at least one of: a first operation for modifying first visual content and/or first speech content included in the target media resource, a second operation for inputting second visual content and/or second speech content; and
a publication module configured to publish the speech expression such that the speech expression can be sent into a conversation associated with a target user based on a selection of the target user.
26. An electronic device, comprising:
at least one processing unit; and
at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, which when executed by the at least one processing unit, cause the electronic device to perform the method of any one of claims 1 to 16, 17 to 20, 21 to 22, or 23 to 24.
27. A computer readable storage medium having stored thereon a computer program executable by a processor to implement the method of any one of claims 1 to 16, 17 to 20, 21 to 22 or 23 to 24.
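As a concrete reading of the expression-making methods of claims 21 to 24, the following Python sketch pairs candidate visual content with either the original speech content or speech synthesized from text in a chosen speech style; the candidate-retrieval and text-to-speech helpers are hypothetical stand-ins, not functions disclosed in the application.

```python
# Illustrative sketch only; all names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechExpression:
    visual: str   # identifier of the chosen candidate visual content
    speech: str   # identifier of the audio forming the speech portion


def candidates_for_text(text: str) -> List[str]:
    """Stand-in: return candidate visual content matching the given text."""
    return [f"visual:{text}:{i}" for i in range(3)]


def synthesize_speech(text: str, style: str = "default") -> str:
    """Stand-in: generate the speech portion from text in the chosen speech style."""
    return f"tts:{style}:{text}"


def make_from_speech(target_speech: str, transcript: str, choice: int) -> SpeechExpression:
    """Claims 21-22: candidates are derived from the text corresponding to the
    target speech content; the chosen visual is paired with the original speech."""
    candidates = candidates_for_text(transcript)
    return SpeechExpression(visual=candidates[choice], speech=target_speech)


def make_from_text(target_text: str, choice: int, style: str) -> SpeechExpression:
    """Claims 23-24: candidates are derived from the target text content and the
    speech portion is generated from that text in the selected speech style."""
    candidates = candidates_for_text(target_text)
    return SpeechExpression(visual=candidates[choice],
                            speech=synthesize_speech(target_text, style))
```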
CN202410023197.4A 2024-01-05 2024-01-05 Expression interaction method, device, equipment and storage medium Pending CN117834576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410023197.4A CN117834576A (en) 2024-01-05 2024-01-05 Expression interaction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410023197.4A CN117834576A (en) 2024-01-05 2024-01-05 Expression interaction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117834576A true CN117834576A (en) 2024-04-05

Family

ID=90522768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410023197.4A Pending CN117834576A (en) 2024-01-05 2024-01-05 Expression interaction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117834576A (en)

Similar Documents

Publication Publication Date Title
US20220224737A1 (en) Method and system for information sharing
CN109769141B (en) Video generation method and device, electronic equipment and storage medium
JP6971292B2 (en) Methods, devices, servers, computer-readable storage media and computer programs for aligning paragraphs and images
US9055193B2 (en) System and method of a remote conference
WO2023174073A1 (en) Video generation method and apparatus, and device, storage medium and program product
CN117834576A (en) Expression interaction method, device, equipment and storage medium
CN112287173A (en) Method and apparatus for generating information
KR100791612B1 (en) Method for providing service about making contents and three-dimensional animaion
CN118860236A (en) Interaction method, device, equipment and storage medium
CN117745885A (en) Expression generating and work publishing method, device, equipment and storage medium
CN118250409A (en) Method, device, equipment and storage medium for publishing work and viewing work
CN118778844A (en) Method, device, equipment and storage medium for publishing content
CN113065061B (en) Information display method, device and server
CN118012318A (en) Method, device, equipment and storage medium for creating virtual object
CN118870145A (en) Method, apparatus, device and storage medium for generating media content
US20240338408A1 (en) Digital content management in virtual environments
CN117850946A (en) Interaction method, device, equipment and storage medium
CN118152051A (en) Method, apparatus, device and storage medium for generating works
CN118131963A (en) Method, device, equipment and storage medium for interacting with virtual object
CN118612520A (en) Method, device, equipment and storage medium for message interaction
CN117908736A (en) Interaction method, device, equipment and storage medium
CN118779044A (en) Information display method, device, equipment and storage medium
CN116866402A (en) Interaction method, device, equipment and storage medium
CN118861332A (en) Method, device, equipment and storage medium for managing media list
CN118714242A (en) Method, apparatus, device and storage medium for generating media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination