CN117556066A - Multimedia content generation method and electronic equipment - Google Patents

Multimedia content generation method and electronic equipment

Info

Publication number
CN117556066A
Authority
CN
China
Prior art keywords
target
audio content
image
content
multimedia content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311403165.9A
Other languages
Chinese (zh)
Inventor
田敬
高原
Current Assignee
Beijing Zhilohuo Technology Co ltd
Original Assignee
Beijing Zhilohuo Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhilohuo Technology Co., Ltd.
Priority to CN202311403165.9A
Publication of CN117556066A
Legal status: Pending


Abstract

The application discloses a multimedia content generation method and an electronic device. The multimedia content generation method comprises the following steps: receiving a user-input instruction to associate audio with a target image, and associating target audio content with the target image; forming multimedia content from a plurality of target images each associated with target audio content; after the multimedia content is opened, displaying the plurality of target images in a preset display order, and playing the associated target audio content while each target image is displayed.

Description

Multimedia content generation method and electronic equipment
Technical Field
The application belongs to the technical field of computers, and particularly relates to a multimedia content generation method and electronic equipment.
Background
With the development of computer technology, many applications with which users edit and generate content have appeared. At present, such application software is generally complex and inconvenient to use; it is hard for users to get started, and the user experience is poor.
Disclosure of Invention
The embodiment of the application aims to provide a multimedia content generation method and electronic equipment, which can enable a user to generate multimedia content in a more convenient mode.
In a first aspect, an embodiment of the present application provides a multimedia content generation method, including: receiving a user-input instruction to associate audio with a target image, and associating target audio content with the target image; forming multimedia content from a plurality of target images each associated with target audio content; after the multimedia content is opened, displaying the plurality of target images in a preset display order, and playing the associated target audio content while each target image is displayed.
Optionally, forming the plurality of target images associated with target audio content into the multimedia content includes: storing the plurality of target images, the plurality of target audio contents, and a first play control parameter as the multimedia content; the first play control parameter includes the association relationship between each target image and its target audio content, and the display order of the plurality of target images.
Optionally, receiving the user-input instruction to associate audio with the target image and associating the target audio content with the target image includes: receiving the user-input instruction to associate audio with the target image; displaying, in response to the instruction, a recording control corresponding to the target image; receiving a user input on the recording control; starting, in response to the user input on the recording control, an audio recording function of the terminal device; and taking the recorded audio content as the target audio content and establishing an association relationship between the target image and the target audio content.
Optionally, the method further comprises: setting a cover of the multimedia content based on a first cover image; the first cover image is an image selected by the user from the plurality of target images, or the first cover image is the target image whose associated target audio content is longest.
Optionally, the method further comprises: generating a name of the multimedia content based on characteristics of the multimedia content; wherein the characteristics of the multimedia content include any one or any combination of the following: the user's name, the formation time of the multimedia content, the shooting time of a target image in the multimedia content, the shooting location of a target image in the multimedia content, a key picture element extracted from a target image in the multimedia content, and a keyword extracted from the target audio content associated with a target image in the multimedia content.
Optionally, the method comprises: obtaining a background sound set by the user, and adding the target background sound and a second play control parameter to the multimedia content, wherein the second play control parameter includes a first volume for the target audio content and a second volume for the background sound; after the multimedia content is opened, the plurality of target images are displayed in the preset display order, the background sound is played at the second volume, and the associated target audio content is played at the first volume while the picture of each target image is displayed.
Optionally, the target image is a video with sound; the method further comprises adding a third play control parameter to the multimedia content, wherein the third play control parameter includes a first volume for the target audio content and a third volume for the original sound of the target image; after the multimedia content is opened, the plurality of target images are displayed in the preset display order, and while the picture of each target image is displayed, the original sound of the target image is played at the third volume and the associated target audio content is played at the first volume.
Optionally, the target image is a video; in the case that the duration of the target image is less than the duration of its associated target audio content, the multimedia content is configured to play the target image in a loop until the playback of its associated target audio content ends; and/or, in the case that the duration of the target image is greater than the duration of its associated target audio content, the multimedia content is configured to end the display of the target image when the playback of its associated target audio content ends.
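The loop/truncate rule above can be sketched as a small scheduling function. This is an illustrative sketch only, with assumed field names and return shape; the patent does not specify an implementation.

```python
import math

def video_schedule(video_s: float, audio_s: float) -> dict:
    """Decide how a video is shown relative to its associated target audio
    content. Durations are in seconds; the returned dict shape is an
    illustrative assumption, not part of the patent."""
    if video_s < audio_s:
        # Video is shorter than its audio: loop it until the audio finishes.
        return {"mode": "loop", "loops": math.ceil(audio_s / video_s)}
    # Video is as long or longer: stop showing it when the audio ends.
    return {"mode": "truncate", "show_seconds": audio_s}
```

For example, a 4-second clip paired with 10 seconds of audio would loop three times, while a 12-second clip would be cut off at the 10-second mark.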
Optionally, the target image is any one of a picture, a video with sound, and a silent video; the target image is an image shot by the user through the terminal device, and/or an image selected by the user from an image library of the terminal device; the target audio content is audio content recorded by the user through the terminal device, and/or audio content selected by the user from an audio content library of the terminal device, and/or audio content synthesized from a target text.
Optionally, the preset display order is any one of the following: the order in which the user selects the plurality of target images from the image library; an order of the plurality of target images set by the user after selecting them from the image library; the order in which the user associates target audio content with the plurality of target images; the order of the shooting times of the plurality of target images; an order determined based on the semantic logic of the target audio content associated with the plurality of target images.
In a second aspect, an embodiment of the present application provides a multimedia content generation method, including: receiving a user-input instruction to associate an image with target audio content, and associating a target image with the target audio content; forming multimedia content from a plurality of target audio contents each associated with a target image; after the multimedia content is opened, the plurality of target audio contents are played in a preset play order, and the target image associated with each target audio content is displayed while that target audio content is played.
Optionally, the forming the plurality of target audio contents associated with the target image into the multimedia contents includes: storing a plurality of target audio contents, a plurality of target images and a first play control parameter into multimedia contents; the first playing control parameters comprise the association relation between the target audio content and the target image and the playing sequence of the plurality of target audio contents.
In a third aspect, an embodiment of the present application provides a multimedia content generation method, including: receiving a user-input instruction to associate an image with target audio content, and associating a target image with at least one target moment of the target audio content; forming multimedia content from the target audio content associated with the target image; after the multimedia content is opened, the target audio content is played, and the associated target image is displayed when playback reaches the corresponding target moment of the target audio content.
Optionally, the forming the target audio content associated with the target image into the multimedia content includes: storing the target audio content, at least one target image and the first play control parameter as multimedia content; the first playing control parameter includes a target time associated with a target image.
Optionally, the associating the target image for at least one target time of the target audio content includes: acquiring a plurality of target images; and determining the target moment associated with each target image in the target audio content based on the recording time of the target audio content and the shooting time of the target image.
Optionally, the associating the target image for at least one target time of the target audio content includes: acquiring a plurality of target images; acquiring a first target moment appointed by a user in target audio content; the target time associated with each target image in the target audio content is determined based on the first target time of the target audio content and the times of capture of the plurality of target images.
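The time alignment described in the two options above reduces to simple timestamp arithmetic. The sketch below illustrates the first option (aligning by recording time and shooting times); the inputs and the pin-to-zero choice are assumptions for illustration, not prescribed by the patent.

```python
from datetime import datetime

def target_moments(recording_start: datetime,
                   shot_times: list[datetime]) -> list[float]:
    """Map each target image's shooting time to an offset (in seconds)
    within the target audio content. Images shot before the recording
    started are pinned to moment 0 (an illustrative assumption)."""
    return [max(0.0, (t - recording_start).total_seconds())
            for t in shot_times]
```

So a photo taken 5 seconds after the recording began is associated with the 5-second mark of the audio.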
In a fourth aspect, embodiments of the present application provide an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a seventh aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
According to the multimedia content generation method of the embodiments of the present application, target audio content can be associated with a target image according to a user-input instruction to associate audio with the target image; a plurality of target images associated with target audio content are then formed into multimedia content, and the multimedia content is configured so that, after it is opened, the plurality of target images are displayed in a preset display order and the associated target audio content is played while each target image is displayed. In this way, a user can generate multimedia content very simply and conveniently, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments are briefly described below. It should be appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. Those of ordinary skill in the art may derive other relevant drawings from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an electronic device that may be used to implement an embodiment of the present disclosure, provided by one embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of generating multimedia content in accordance with an embodiment of the present disclosure;
FIG. 3 (a) is an interface diagram of a target image associated audio according to an embodiment of the disclosure;
FIG. 3 (b) is an interface diagram of a target image associated audio according to an embodiment of the disclosure;
FIG. 4 is a flow chart of a method of generating multimedia content in accordance with an embodiment of the present disclosure;
FIG. 5 is a flow chart of a method of multimedia content generation of an embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device of an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application fall within the scope of protection of the present application.
The terms "first", "second", and the like in the description and claims are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of such objects is not limited; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" denotes at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
It should be noted that all actions of acquiring signals, information, or data in the present application are performed in compliance with the corresponding data protection laws and policies of the country where the device is located, and with the authorization of the owner of the corresponding device.
< hardware configuration >
Fig. 1 is a schematic structural diagram of an electronic device that may be used to implement embodiments of the present disclosure. The electronic device may be used to implement the multimedia content generation method of the embodiments of the present disclosure.
The electronic device 1000 may be a smart phone, a portable computer, a desktop computer, a tablet computer, a server, etc., and is not limited herein.
The electronic device 1000 may include, but is not limited to, a processor 1100, a memory 1200, a communication device 1300, an image capture device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 may be a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller (MCU), etc., for executing computer programs/instructions, which may be written for an instruction set architecture such as x86, Arm, RISC, MIPS, or SSE extensions. The memory 1200 includes, for example, ROM (read-only memory), RAM (random access memory), and nonvolatile memory such as a hard disk. The communication device 1300 can perform wired communication using an optical fiber or a cable, or perform wireless communication, including, for example, WiFi communication, Bluetooth communication, and 2G/3G/4G/5G communication. The image capture device 1400 can capture images, for example taking photographs and recording video. The display device 1500 is, for example, a liquid crystal display or a touch display. The input device 1600 may include, for example, a touch screen, a keyboard, and somatosensory input. The speaker 1700 is used to output audio signals. The microphone 1800 is used to collect audio signals. The electronic device 1000 may also include interface devices (not shown in FIG. 1), including, for example, a USB interface, a serial interface, and a parallel interface.
The memory 1200 of the electronic device 1000 is used for storing a computer program/instructions for controlling the processor 1100 to operate to implement the multimedia content generating method according to the embodiments of the present disclosure. The skilled person can design the computer program/instructions according to the disclosed aspects of the present disclosure. How the computer program/instructions control the processor to operate is well known in the art and will not be described in detail here. The electronic device 1000 may be installed with an intelligent operating system (e.g., windows, linux, android, IOS, etc. systems) and application software.
It will be appreciated by those skilled in the art that although multiple devices of the electronic device 1000 are shown in FIG. 1, the electronic device 1000 of the embodiments of the present disclosure may involve only some of them, for example only the processor 1100 and the memory 1200, or, for example, the processor 1100, the memory 1200, the image capture device 1400, the display device 1500, the speaker 1700, and the microphone 1800.
< method example one >
Referring to fig. 2, an embodiment of the present disclosure provides a multimedia content generating method, which is applicable to an electronic device, and includes steps S102 to S104.
Step S102, receiving a user-input instruction to associate audio with a target image, and associating target audio content with the target image.
Step S104, forming multimedia content from a plurality of target images each associated with target audio content. After the multimedia content is opened, the plurality of target images are displayed in a preset display order, and the associated target audio content is played while each target image is displayed.
According to the multimedia content generation method of the embodiments of the present application, target audio content can be associated with a target image according to a user-input instruction to associate audio with the target image; a plurality of target images associated with target audio content are then formed into multimedia content, and the multimedia content is configured so that, after it is opened, the plurality of target images are displayed in a preset display order and the associated target audio content is played while each target image is displayed. In this way, a user can generate multimedia content very simply and conveniently, and the user experience is improved.
Compared with traditional video editing software, which is complex to operate and requires the user to learn how to use it, the multimedia content generation method of the present application can form multimedia content merely by associating target images with target audio content. It is easy for the user to get started with, and it greatly shortens the time required to produce multimedia content. Many scenes in daily life can be conveyed clearly with audio paired with images alone, without complex multimedia presentation effects.
In some embodiments, the target image may be a picture. The picture may be a still picture or a moving picture. In some embodiments, the target image may be a video. The video may be a video with sound or a video without sound. For a video with sound, the audio carried by the video itself is referred to as its "original sound". In some embodiments, the plurality of target images in the multimedia content may be all pictures, all videos, or a mixture of pictures and videos.
In some embodiments, the target image is an image taken by the user through the terminal device. In some embodiments, the target image is an image selected by the user from an image library of the terminal device.
In some embodiments, the target audio content is audio content recorded by the user through the terminal device. In some embodiments, the target audio content is audio content selected by the user from an audio content library of the terminal device. In some embodiments, the target audio content is audio content synthesized from target text, for example audio content generated by speech synthesis technology. In some embodiments, the target audio content is audio content synthesized from target content (e.g., pictures, video, etc.). In some embodiments, the target audio content is, for example, audio content generated by AI technology.
In some embodiments, the method includes steps S106-S108 before receiving the user-input instruction to associate audio with the target image.
Step S106, receiving the selection input of the user for the images in the image library of the terminal equipment.
Step S108, responding to the selection input of the user for the images in the image library of the terminal equipment, and selecting a target image.
For example, a user opens an album of the electronic device, selects one or more pictures in the album, and the picture selected by the user is the target image.
In some embodiments, associating the target audio content with the target image may be establishing an association relationship between the target image and the target audio content. For example, the identifier of the target image and the identifier of the target audio content are recorded in association. For example, the identifier of the target image, the storage address of the target image, the identifier of the target audio content, and the storage address of the target audio content are recorded in association.
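A minimal sketch of such an association record is shown below. The field names and paths are assumptions for illustration; the patent does not prescribe a storage format.

```python
from dataclasses import dataclass

@dataclass
class ImageAudioAssociation:
    """Records a target image and its associated target audio content by
    identifier and storage address (illustrative field names)."""
    image_id: str
    image_path: str
    audio_id: str
    audio_path: str

# Hypothetical record: image img_001 is associated with recording rec_001.
record = ImageAudioAssociation(
    image_id="img_001", image_path="/album/img_001.jpg",
    audio_id="rec_001", audio_path="/recordings/rec_001.m4a",
)
```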
In some embodiments, receiving the user-input instruction to associate audio with the target image and associating the target audio content with the target image includes steps S1021-S1025.
In step S1021, an instruction for associating audio with the target image is received.
Step S1022, in response to the instruction of the user to the target image associated audio, displaying the recording control corresponding to the target image.
Step S1023, receiving user input for a recording control.
In step S1024, the audio recording function of the terminal device is turned on in response to the user input of the recording control.
Step S1025, taking the recorded audio content as target audio content, and establishing an association relationship between the target image and the target audio content.
Referring to fig. 3 (a), after the user inputs an instruction to associate audio with the target image, the electronic device displays a functional interface through which the user associates audio with the target image. The functional interface displays the target image and a recording control P1 marked with a microphone icon. The user can click the recording control P1 to turn on the microphone of the electronic device and start recording, and click the recording control P1 again to stop recording. The electronic device then takes the recorded audio content as the target audio content and establishes an association relationship between the target image and the target audio content.
In some embodiments, the recording duration is displayed during recording, so that the user can track the progress of the recording. In some embodiments, the real-time volume of the recorded audio content is also displayed during recording, so that the user can perceive the recording volume and adjust it immediately when needed.
In some embodiments, the method comprises: displaying a play control and a confirmation control; playing the currently recorded audio content when an input on the play control is received from the user; and establishing an association relationship between the currently displayed target image and the currently recorded audio content when an input on the confirmation control is received from the user.
Referring to fig. 3 (a), the functional interface also displays a play control P2 marked with a speaker icon and a confirmation control P3 marked with the word "OK". After finishing a recording with the recording control P1, the user can click the play control P2 to listen to it. If satisfied after listening, the user can click the confirmation control P3 to confirm. After the user confirms, the electronic device establishes an association relationship between the currently displayed target image and the recorded audio content.
In some embodiments, the method comprises: displaying a switching control; receiving input of a user for the switching control; and responding to the input of the user for the switching control, and switching the target image currently displayed on the audio configuration interface.
Referring to fig. 3 (b), two switching controls P4 are also displayed in the functional interface: one switches forward and the other switches backward. The user clicks a switching control P4 to display the previous or the next target image, so as to associate target audio content with it.
In some embodiments, the switching control also serves as a confirmation control: after the user switches to another target image, the association of target audio content with the previously displayed target image is completed.
In some embodiments, forming the plurality of target images associated with target audio content into multimedia content may be generating a video from the target images and the target audio content. For example, the target images and the target audio content are encoded into a video file using audio-video encoding techniques and specifications. For example, the target audio content is written into an audio track of the target video.
In some embodiments, forming the plurality of target images associated with target audio content into multimedia content may include: storing the plurality of target images, the plurality of target audio contents, and a first play control parameter as the multimedia content. The first play control parameter includes the association relationship between each target image and its target audio content, and the display order of the plurality of target images. In this way, no processing of the target images and target audio content themselves is required; the original target images and original target audio content are kept as-is in the formed multimedia content. When the multimedia content is played through a player, the player obtains the first play control parameter, displays the plurality of target images in sequence according to the display order in the first play control parameter, and, while displaying each target image, plays the target audio content associated with it according to the association relationship in the first play control parameter.
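The stored structure and the player's use of the first play control parameter might look like the following sketch. The layout is hypothetical; the patent does not define a concrete file format.

```python
# Hypothetical layout: original images and audio clips, plus a first play
# control parameter holding the display order and the image-to-audio
# association relationship.
content = {
    "images": {"img1": "a.jpg", "img2": "b.jpg"},
    "audio": {"aud1": "a.m4a", "aud2": "b.m4a"},
    "play_control": {
        "display_order": ["img2", "img1"],
        "associations": {"img1": "aud1", "img2": "aud2"},
    },
}

def playback_plan(content: dict) -> list[tuple[str, str]]:
    """Return (image_id, audio_id) pairs in the order a player would
    display them, following the first play control parameter."""
    pc = content["play_control"]
    return [(img, pc["associations"][img]) for img in pc["display_order"]]
```

Here the player would show img2 with aud2 first, then img1 with aud1.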
That is, the multimedia content generation method of the embodiments of the present disclosure can form multimedia content in a non-traditional video format. The forming process does not occupy excessive computing resources of the electronic device, and the formed multimedia content is much smaller in data size than traditional video, making it convenient to store and share in daily use and suitable for instant messaging scenarios.
In some embodiments, after the user selects the plurality of target images from the image library, the order of the plurality of target images set by the user is used as the display order.
In some embodiments, the order in which the user selects the plurality of target images from the image library may be used as the display order.
In some embodiments, the order in which the user associates target audio content with the plurality of target images may be used as the display order. This approach helps set the display order of the target images according to the user's narrative intent.
In some embodiments, the sequence of the shooting times of the plurality of target images may be used as the display sequence. This approach helps to set the presentation order of the plurality of target images in the order of the actual occurrence of the event.
In some embodiments, the order may be determined based on semantic logic of the target audio content associated with the plurality of target imagery. For example, the target audio content associated with the target image a states the registration process, the target audio content associated with the target image B states the doctor-seeing process, and the target audio content associated with the target image C states the home-returning process, and the display order may be determined to be the target image a, the target image B, and the target image C in order based on the semantic logic of the target audio content.
In some embodiments, the user may preset a default manner of the display order, for example, default to the order in which the user associates the target audio content for the plurality of target images or to the order in which the plurality of target images are captured by the user as the display order of the plurality of target images.
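Two of the display-order options above reduce to simple sorts over per-image metadata. The timestamps and field names below are illustrative assumptions.

```python
# Each entry carries a shooting time and the time at which the user
# associated audio with it (arbitrary comparable values for illustration).
images = [
    {"id": "img1", "shot_time": 20, "associated_at": 3},
    {"id": "img2", "shot_time": 10, "associated_at": 1},
    {"id": "img3", "shot_time": 30, "associated_at": 2},
]

# Display order by shooting time of the target images.
by_shot_time = [i["id"] for i in sorted(images, key=lambda i: i["shot_time"])]
# Display order by when the user associated target audio content.
by_association = [i["id"] for i in sorted(images, key=lambda i: i["associated_at"])]
```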
In some embodiments, the method further comprises step S106: a cover of the multimedia content is set based on the first cover image.
For example, the first cover image is an image selected by a user from a plurality of target images.
For example, the target image with the front display order is taken as the first cover image.
For example, the target image whose associated target audio content is longest is taken as the first cover image. The longer the target audio content, the longer its associated target image needs to be displayed; taking that image as the first cover image helps a viewer grasp the central idea of the multimedia content from the cover.
For example, the head portrait of the user is taken as the first cover image.
In the embodiments of the present disclosure, the cover may be a still picture, an animated picture, or a video.
When the first cover image is a picture, it may be directly set as the cover of the multimedia content, or the cover of the multimedia content may be generated based on the first cover image.
When the first cover image is a video, the video may be directly set as the cover of the multimedia content, one image frame may be selected from the video according to a preset algorithm to serve as the cover, or the cover may be generated based on the selected image frame.
Generating the cover of the multimedia content based on a picture or a video frame may involve cropping the picture or frame, applying a style treatment, or running a preset image generation algorithm, among other approaches.
In some embodiments, the user may enter the name of the multimedia content himself.
In some embodiments, the method further comprises step S108: generating the name of the multimedia content based on characteristics of the multimedia content. The characteristics of the multimedia content include any one or a combination of the following: the user's name, the time the multimedia content was formed, the shooting time of a target image in the multimedia content, the shooting location of a target image in the multimedia content, key picture elements extracted from a target image in the multimedia content, and keywords extracted from the target audio content associated with a target image in the multimedia content.
For example, the name of the multimedia content may be generated by a text generation algorithm from one or more of these characteristics under a given word count limit. For example, a name such as "XX's trip record" is generated based on the user's name, the shooting time of the target images, and the shooting location of the target images.
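As an illustrative sketch only (the patent does not specify the text generation algorithm, and every function name, field name, and template below is hypothetical), a name could be assembled from whichever characteristics are available:

```python
from datetime import date

def generate_name(features: dict, max_len: int = 40) -> str:
    """Assemble a short title from available content features.

    The feature keys and the "trip record" template are illustrative;
    a real system might use any text generation algorithm.
    """
    parts = []
    if "user_name" in features:
        parts.append(features["user_name"])
    if "shoot_location" in features:
        parts.append(features["shoot_location"])
    if "shoot_date" in features:
        parts.append(features["shoot_date"].strftime("%b %d"))
    name = " ".join(parts) + " trip record" if parts else "Untitled"
    return name[:max_len]  # enforce the word count limit

print(generate_name({"user_name": "XX", "shoot_location": "Beijing",
                     "shoot_date": date(2023, 10, 1)}))
# XX Beijing Oct 01 trip record
```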
In some embodiments, the method includes step S110: obtaining a background sound set by the user, and adding a target background sound and a second play control parameter to the multimedia content, wherein the second play control parameter comprises a first volume for the target audio content and a second volume for the background sound. After the multimedia content is opened, the plurality of target images are displayed according to the preset display order, the background sound is played at the second volume, and the associated target audio content is played at the first volume while each target image is displayed.
In this way, a background sound can be added to the multimedia content, and the user can set the volume of the background sound and the volume of the target audio content as needed. For example, with the second volume of the background sound set to 30 units and the first volume of the target audio content set to 100 units, when the multimedia content is played, the player displays the plurality of target images according to the preset display order and plays the background sound continuously at 30 units; while each target image is displayed, its associated target audio content is played at 100 units, so that a viewer hears the target audio content at the larger volume and the background sound at the smaller volume.
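A minimal sketch of what the second play control parameter and its use by a player might look like; the field names, the `bgm.mp3` file name, and the unit-to-gain conversion are assumptions, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class SecondPlayControl:
    """Second play control parameter (illustrative field names).

    Volumes are in the arbitrary 'units' of the example above: the
    background sound plays continuously at bg_volume while each
    image's associated target audio content plays at audio_volume.
    """
    audio_volume: int = 100   # first volume: target audio content
    bg_volume: int = 30       # second volume: background sound
    background_track: str = "bgm.mp3"  # hypothetical file name

def mixer_gains(pc: SecondPlayControl):
    """Convert the unit volumes into normalized gains a mixer could apply."""
    return pc.audio_volume / 100, pc.bg_volume / 100

print(mixer_gains(SecondPlayControl()))  # (1.0, 0.3)
```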
In some embodiments, the target image is a video with sound. The method further comprises step S112: adding a third play control parameter to the multimedia content, wherein the third play control parameter comprises the first volume for the target audio content and a third volume for the original sound of the target image. After the multimedia content is opened, the plurality of target images are displayed according to the preset display order; while each target image is displayed, its original sound is played at the third volume and its associated target audio content is played at the first volume.
In this way, when the target image is a video with sound, the user can set the volume of the target image's original sound and the volume of the target audio content as needed. For example, with the third volume of the target image set to 20 units and the first volume of the target audio content set to 100 units, when the multimedia content is played, the player displays the plurality of target images according to the preset display order; when a target image that is a video with sound is displayed, its original sound is played at 20 units and its associated target audio content is played at 100 units, so that a viewer hears the target audio content at the larger volume and the original sound at the smaller volume.
In one example, if the user does not wish the original sound of the target image to be heard, the third volume of the target image may be set to zero.
In some embodiments, the target image is a video. The method further comprises step S114: in the event that the duration of the target image is less than the duration of its associated target audio content, configuring the multimedia content to play the target image in a loop until playback of its associated target audio content ends.
For example, a fourth play control parameter is added to the multimedia content, indicating that, when the target image is displayed, if its duration is less than that of its associated target audio content, the target image is played in a loop, and display of the next target image begins only after playback of the associated target audio content ends.
In some embodiments, the target image is a video. The method further comprises step S116: in the event that the duration of the target image is less than the duration of its associated target audio content, continuing to display the last frame of the target image after it has been shown, until playback of the associated target audio content ends.
For example, a fifth play control parameter is added to the multimedia content, indicating that, when the target image is displayed, if its duration is less than that of its associated target audio content, the last frame continues to be displayed after it is reached, and display of the next target image begins only after playback of the associated target audio content ends.
In some embodiments, the target image is a video. The method further comprises step S118: in the event that the duration of the target image is greater than the duration of its associated target audio content, configuring the multimedia content to end display of the target image when playback of the target audio content ends.
For example, a sixth play control parameter is added to the multimedia content, indicating that, when the target image is displayed, if its duration is greater than that of its associated target audio content, display of the target image is terminated when playback of the target audio content ends, and display of the next target image begins.
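The three duration-reconciliation behaviors above (loop the video, hold the last frame, or end the video early) can be sketched as one policy function; the policy names, return format, and sample durations are hypothetical:

```python
def display_plan(video_s: float, audio_s: float, policy: str):
    """How long the target image is shown, per the three policies above.

    'loop'  -> replay the video until the narration ends (4th parameter)
    'hold'  -> freeze on the last frame after the video ends (5th parameter)
    'trunc' -> cut the video when the narration ends (6th parameter)
    Returns (display_seconds, description). Illustrative only.
    """
    if policy == "loop" and video_s < audio_s:
        return audio_s, f"loop video ~{audio_s / video_s:.1f} times"
    if policy == "hold" and video_s < audio_s:
        return audio_s, f"hold last frame for {audio_s - video_s:.0f}s"
    if policy == "trunc" and video_s > audio_s:
        return audio_s, "end video early with narration"
    return max(video_s, audio_s), "play once"

print(display_plan(10, 25, "loop"))   # (25, 'loop video ~2.5 times')
print(display_plan(10, 25, "hold"))   # (25, 'hold last frame for 15s')
print(display_plan(30, 25, "trunc"))  # (25, 'end video early with narration')
```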
In some embodiments, after the multimedia content is formed, an address link for it may also be created so that the user can share and forward the multimedia content, further improving the user experience.
The multimedia content generation method of the embodiments of the present disclosure is applicable to a variety of scenarios, enabling a user to form multimedia content simply and quickly to meet their needs.
For example, when a user wants to quickly combine multiple pieces of imagery to convey an idea to others, the multimedia content generation method of the embodiments of the present disclosure lets the user form multimedia content meeting that need simply and quickly, merely by associating each image with a target audio one by one. The user need not manually edit the multiple images into one complete video, manually strip out their original sound, manually drag a progress bar to record a new audio track for the complete video, or perform other complex operations. That is, the user can form multimedia content for communicating with others in a very short time. For viewers, the multimedia content plays automatically and continuously once opened, requiring no extra operations, and it does not occupy much space on the device.
For example, a child applying the multimedia content generation method of the embodiments of the present disclosure may proceed through the following steps:
Step P01: the child starts the multimedia content generation function on the electronic device, and the electronic device automatically opens the photo album.
Step P02: the child selects N pictures in the album as target images; once the selection is complete, the electronic device displays a functional interface for associating audio with the target images.
Step P03: the 1st target image selected by the child is displayed at the top of the functional interface, a recording control is displayed below it, and a switching control for moving to the next target image is displayed at the lower right.
Step P04: the child taps the recording control to open the microphone of the electronic device and record audio for the 1st target image, and taps the recording control again to end the recording.
Step P05: after the recording for the 1st target image is finished, the electronic device takes the just-recorded audio as the target audio content of the 1st target image and establishes the association between the 1st target image and its target audio content.
Step P06: the child taps the switching control at the lower right; the 2nd target image is displayed at the top of the functional interface, the recording control below it, and the switching control for the next target image at the lower right.
Step P07: the child taps the recording control to open the microphone and record audio for the 2nd target image, and taps the recording control again to end the recording.
Step P08: after the recording for the 2nd target image is finished, the electronic device takes the just-recorded audio as the target audio content of the 2nd target image and establishes the association between the 2nd target image and its target audio content.
Step P09: the child taps the switching control at the lower right; the 3rd target image is displayed at the top of the functional interface, the recording control below it, and the switching control for the next target image at the lower right.
In the same manner, target audio content can be associated with the 3rd and subsequent target images.
After the child has associated target audio content with the Nth target image, the electronic device automatically forms and displays the multimedia content, using the 1st target image selected by the child as its cover.
After the child has associated target audio content with the Nth target image, the electronic device prompts the child to input a name for the multimedia content and displays the recording control again, so the child can submit the name by voice using the recording control.
In this way, a child can use the multimedia content generation method to publish story works, life records, and the like and share them with friends, adding and conveying more of what they want to express than a simple picture share would. The multimedia content generation method of the embodiments of the present disclosure is thus very friendly to children who cannot yet read or type; they can easily make their own multimedia content.
< method example two >
Based on the technical concept of method embodiment one, the present application provides a multimedia content generation method capable of associating target images with target audio content and forming multimedia content; see method embodiment two. Steps of method embodiment one that are applicable to method embodiment two may be incorporated into it, and for its technical effects reference may likewise be made to those of method embodiment one.
Referring to fig. 4, an embodiment of the present disclosure provides a multimedia content generating method, which is applicable to an electronic device, and includes steps S202 to S204.
Step S202: receiving a user-input instruction for associating imagery with target audio content, and associating a target image with the target audio content.
Step S204: forming the plurality of target audio contents associated with target images into multimedia content. After the multimedia content is opened, the plurality of target audio contents are played according to a preset play order, and the target image associated with each target audio content is displayed while that audio content is played.
According to the multimedia content generation method of the embodiments of the present disclosure, target images are associated with target audio content according to a user-input instruction for associating imagery with audio content, and the plurality of target audio contents associated with target images are then formed into multimedia content. In this way, the user can generate multimedia content very simply and conveniently, improving the user experience.
Compared with traditional video editing software, which is complex to operate and must be learned, the multimedia content generation method here forms multimedia content merely by associating target images with target audio content; it is easy to pick up and greatly shortens the time needed to produce multimedia content. Many scenarios in daily life can be expressed clearly with just audio matched to imagery, without complex multimedia presentation effects.
In some embodiments, before receiving the user-input instruction for associating imagery with target audio content, the method includes steps S206 to S208.
Step S206: receiving the user's selection input for audio in an audio library of the terminal device.
Step S208: in response to the user's selection input for audio in the audio library of the terminal device, selecting the target audio content.
In some embodiments, associating a target image with the target audio content may mean establishing an association relationship between the target audio content and the target image. For example, the identification of the target image and the identification of the target audio content are recorded in association. For example, the identification of the target image, the storage address of the target image, the identification of the target audio content, and the storage address of the target audio content are recorded in association.
In some embodiments, forming the plurality of target audio contents associated with target images into multimedia content may mean generating a video from the target images and the target audio content. For example, the target images and target audio content are encoded into a video file using audio-video coding techniques and specifications. For example, the target audio content is written into an audio track of the target image.
In some embodiments, forming the plurality of target audio contents associated with target images into multimedia content includes: storing the plurality of target audio contents, the plurality of target images, and a first play control parameter as the multimedia content. The first play control parameter comprises the association relationships between the target audio contents and the target images, and the play order of the plurality of target audio contents. In this way, no processing of the target images and target audio content themselves is required; the original target images and original target audio content are carried in the formed multimedia content as they are. When the multimedia content is played through a player, the player obtains the first play control parameter, plays the plurality of target audio contents in sequence according to the play order in that parameter, and, while playing each target audio content, displays the target image associated with it according to the association relationships in that parameter.
That is, the multimedia content generation method of the embodiments of the present disclosure can form multimedia content in a non-traditional-video format. The forming process does not occupy much of the electronic device's computing power, and the formed multimedia content is much smaller in data size than traditional video, making it convenient to store and share daily and suitable for instant messaging scenarios.
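A minimal sketch of such non-video-format multimedia content: references to the media files plus a first play control parameter, here serialized as JSON. All field names and paths are hypothetical, not from the disclosure:

```python
import json

# Multimedia content stored as references plus a first play control
# parameter, instead of an encoded video file.
content = {
    "audio": [
        {"id": "a1", "src": "/audio/a1.m4a"},
        {"id": "a2", "src": "/audio/a2.m4a"},
    ],
    "images": [
        {"id": "img1", "src": "/img/1.jpg"},
        {"id": "img2", "src": "/img/2.jpg"},
    ],
    "first_play_control": {
        "play_order": ["a1", "a2"],                    # order of target audio contents
        "associations": {"a1": "img1", "a2": "img2"},  # audio id -> image id
    },
}

def playback_sequence(doc: dict):
    """Yield (audio_id, image_id) pairs in the order a player would use them."""
    pc = doc["first_play_control"]
    return [(aid, pc["associations"][aid]) for aid in pc["play_order"]]

print(playback_sequence(content))  # [('a1', 'img1'), ('a2', 'img2')]
print(len(json.dumps(content)))    # a few hundred bytes, versus megabytes for video
```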
In some embodiments, the preset play order is any one of the following:
(1) The order in which the user selected the plurality of target audio contents from the audio library.
(2) An order of the plurality of target audio contents set by the user after selecting them from the audio library.
(3) The order in which the user associated target images with the plurality of target audio contents.
(4) The order of the recording times of the plurality of target audio contents.
(5) The order of the shooting times of the plurality of target images.
(6) An order determined based on the semantic logic of the plurality of target audio contents.
In some embodiments, before receiving the user-input instruction for associating imagery with target audio content, the method includes steps S206 to S208.
Step S206: receiving the user's dividing operation on first audio content.
Step S208: in response to the user's dividing operation on the first audio content, dividing the first audio content into a plurality of target audio contents.
In this way, the first audio content can be divided into a plurality of target audio contents, and target images can then be associated with the respective target audio contents one by one.
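A minimal sketch of the division step, assuming the user's dividing operation yields cut points in seconds; the actual audio slicing would be done by a media library and is out of scope here:

```python
def split_audio(duration_s: float, cut_points: list[float]):
    """Divide first audio content into target segments at user cut points.

    Returns (start, end) pairs for each resulting target audio content.
    """
    bounds = [0.0] + sorted(cut_points) + [duration_s]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

# A 60 s recording cut at 20 s and 45 s yields three target audio contents.
print(split_audio(60.0, [45.0, 20.0]))
# [(0.0, 20.0), (20.0, 45.0), (45.0, 60.0)]
```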
Where the target image is a video with sound, a volume-related play control parameter may be added to the multimedia content to meet user needs; for details see method embodiment one, not repeated here. Where the durations of a target audio content and its associated target image differ, reference may likewise be made to the related content of method embodiment one.
In some embodiments, after the multimedia content is formed, an address link for it may also be created so that the user can share and forward the multimedia content, further improving the user experience.
The multimedia content generation method of the embodiments of the present disclosure is applicable to a variety of scenarios, enabling a user to form multimedia content simply and quickly to meet their needs.
For example, when a user wants to quickly combine multiple pieces of imagery to convey an idea to others, the multimedia content generation method of the embodiments of the present disclosure lets the user form multimedia content meeting that need simply and quickly, merely by associating each target audio content with a target image; the user can form multimedia content for communicating with others in a very short time. For viewers, the multimedia content plays automatically and continuously once opened, requiring no extra operations, and it does not occupy much space on the device.
< method example three >
Based on the technical concept of method embodiment one, the present application provides a multimedia content generation method capable of associating target images with target audio content and forming multimedia content; see method embodiment three. Steps of method embodiments one and two that are applicable to method embodiment three may be incorporated into it, and for its technical effects reference may likewise be made to those of method embodiments one and two.
Referring to fig. 5, an embodiment of the present disclosure provides a multimedia content generating method, which is applicable to an electronic device, and includes steps S302 to S304.
Step S302: receiving a user-input instruction for associating imagery with target audio content, and associating a target image with at least one target time of the target audio content.
Step S304: forming the target audio content associated with target images into multimedia content. After the multimedia content is opened, the target audio content is played, and when playback reaches a target time of the target audio content, the associated target image is displayed.
According to the multimedia content generation method of the embodiments of the present disclosure, a target image can be associated with at least one target time of the target audio content according to a user-input instruction for associating imagery with the target audio content; the target audio content associated with target images is then formed into multimedia content, which is configured to play the target audio content after being opened and to display the associated target image when playback reaches each target time. In this way, the user can generate multimedia content very simply and conveniently, improving the user experience.
Compared with traditional video editing software, which is complex to operate and must be learned, the multimedia content generation method here forms multimedia content merely by associating target images with target audio content; it is easy to pick up and greatly shortens the time needed to produce multimedia content. Many scenarios in daily life can be expressed clearly with just audio matched to imagery, without complex multimedia presentation effects.
In some embodiments, associating a target image with at least one target time of the target audio content may mean establishing an association relationship between each target time of the target audio content and a target image. For example, a target time of the target audio content and the identification of its corresponding target image are recorded in association. For example, the identification and storage address of the target audio content are recorded, and each target time of the target audio content is recorded in association with the identification and storage address of its corresponding target image.
In some embodiments, forming the target audio content associated with target images into multimedia content may mean generating a video from the target images and the target audio content. For example, the target images and target audio content are encoded into a video file using audio-video coding techniques and specifications. For example, the target audio content is written into an audio track of the target image.
In some embodiments, forming the target audio content associated with target images into multimedia content includes: storing the target audio content, the at least one target image, and a first play control parameter as the multimedia content. The first play control parameter comprises the target times associated with the target images. In this way, no processing of the target images and target audio content themselves is required; the original target images and original target audio content are carried in the formed multimedia content as they are. When the multimedia content is played through a player, the player obtains the first play control parameter, plays the target audio content, and, when playback reaches a target time, displays the target image associated with that target time according to the association relationships in the first play control parameter.
That is, the multimedia content generation method of the embodiments of the present disclosure can form multimedia content in a non-traditional-video format. The forming process does not occupy much of the electronic device's computing power, and the formed multimedia content is much smaller in data size than traditional video, making it convenient to store and share daily and suitable for instant messaging scenarios.
In some embodiments, associating a target image with at least one target time of the target audio content includes steps S3021 to S3022.
Step S3021: acquiring a plurality of target images.
Step S3022: determining the target time associated with each target image in the target audio content based on the recording time of the target audio content and the shooting times of the target images.
For example, at a large live conference, the presenter's courseware is shown on a conference screen; a user records the presenter's entire live talk (thereby obtaining the target audio content) and photographs the conference screen whenever something of interest appears (thereby obtaining target images). The target time corresponding to each target image in the target audio content can then be determined by matching the recording time of the target audio content against the shooting time of each target image.
In this way, target images can be associated automatically with at least one target time of the target audio content, saving user operations and improving the user experience.
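A minimal sketch of the timestamp matching in steps S3021 to S3022, assuming the recording device's clock and the camera's clock agree; the names and times are illustrative:

```python
from datetime import datetime

def target_times(recording_start: datetime, shot_times: list[datetime]):
    """Map each image's shooting time to an offset (in seconds) into the
    target audio content, as in the live-conference example above.
    """
    return [(t - recording_start).total_seconds() for t in sorted(shot_times)]

start = datetime(2023, 10, 1, 9, 0, 0)      # recording begins at 09:00
shots = [datetime(2023, 10, 1, 9, 12, 30),  # photo of an interesting slide
         datetime(2023, 10, 1, 9, 40, 0)]   # photo of another slide
print(target_times(start, shots))  # [750.0, 2400.0]
```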
In some embodiments, associating a target image with at least one target time of the target audio content includes steps S3023 to S3025.
Step S3023: acquiring a plurality of target images.
Step S3024: acquiring a first target time designated by the user in the target audio content.
Step S3025: determining the target time associated with each target image in the target audio content based on the first target time of the target audio content and the shooting times of the plurality of target images.
For example, the order of the target times is determined according to the order of the shooting times of the plurality of target images. The designated first target time is associated with the first-shot target image. Then, for each remaining target image, the interval between its shooting time and that of the first-shot image is added to the first target time to obtain its target time. For example, suppose there are three target images, in shooting order target image A, target image B, and target image C, where B was shot 5 minutes after A and C was shot 15 minutes after A. If the first target time designated by the user is 15 minutes into the target audio content, then the target time associated with target image A is the 15-minute mark of the target audio content, that of target image B is the 20-minute mark, and that of target image C is the 30-minute mark.
In this way, target images can be associated automatically with at least one target time of the target audio content, saving user operations and improving the user experience.
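The offset arithmetic in steps S3023 to S3025 can be sketched directly, reproducing the worked example above (images A, B, C shot 0, 5, and 15 minutes apart, with a first target time 15 minutes into the audio); the function name and minute-based units are illustrative:

```python
def infer_target_times(first_target_min: float, shoot_minutes: list[float]):
    """For each target image, add its shooting offset from the first-shot
    image to the user-designated first target time (all in minutes).
    """
    first_shot = min(shoot_minutes)
    return [first_target_min + (t - first_shot) for t in shoot_minutes]

# Shooting times in minutes (A=0, B=5, C=15); first target time = 15 min.
print(infer_target_times(15, [0, 5, 15]))  # [15, 20, 30]
```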
Where the target image is a video with sound, a volume-related play control parameter may be added to the multimedia content to meet user needs; for details see method embodiment one, not repeated here. Where the time slot belonging to a target time of the target audio content (i.e., the period between that target time and the next target time) differs from the duration of the associated target image, a processing manner similar to the related content of method embodiment one may be adopted.
In some embodiments, after the multimedia content is formed, an address link for it may also be created so that the user can share and forward the multimedia content, further improving the user experience.
The multimedia content generation method of the embodiments of the present disclosure is applicable to a variety of scenarios, enabling a user to form multimedia content simply and quickly to meet their needs.
< device example >
Referring to fig. 6, an embodiment of the present application provides an electronic device, which includes a processor M01 and a memory M02, where the memory M02 stores a program or an instruction executable on the processor M01, and the program or the instruction implements the multimedia content generating method as disclosed in any one of the foregoing embodiments when executed by the processor M01.
An embodiment of the present application provides a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the multimedia content generation method disclosed in any of the foregoing embodiments.
The embodiment of the application provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running programs or instructions to realize the method for generating the multimedia content as disclosed in any one of the previous embodiments.
Embodiments of the present application provide a computer program product stored in a storage medium for execution by at least one processor to implement a multimedia content generation method as disclosed in any of the previous embodiments.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network interface card or network interface in each computing/processing device receives the computer readable program instructions from the network and forwards them for storage in a computer readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, the electronic circuitry executing the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement of the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (17)

1. A method of generating multimedia content, comprising:
receiving an instruction, input by a user, for associating audio with a target image, and associating target audio content with the target image;
forming a plurality of target images associated with the target audio content into multimedia content;
after the multimedia content is opened, displaying the plurality of target images in a preset display order, and playing the associated target audio content when each target image is displayed.
2. The method of claim 1, wherein forming the plurality of target images associated with the target audio content into multimedia content comprises:
storing the plurality of target images, the plurality of target audio contents, and a first play control parameter as the multimedia content;
wherein the first play control parameter comprises the association relationship between each target image and its target audio content and the display order of the plurality of target images.
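The storage scheme of claim 2 can be illustrated with a minimal sketch. All names here (`PlayControl`, `MultimediaContent`, `playback_sequence`) are hypothetical, introduced only to make the association-plus-order structure concrete; the patent does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Iterator, Tuple

@dataclass
class PlayControl:
    # Association relationship: image id -> id of its target audio content.
    association: Dict[str, str]
    # Preset display order of the target images.
    display_order: List[str]

@dataclass
class MultimediaContent:
    images: Dict[str, bytes]   # image id -> image data
    audio: Dict[str, bytes]    # audio id -> audio data
    control: PlayControl       # the "first play control parameter"

    def playback_sequence(self) -> Iterator[Tuple[str, str]]:
        # Yield (image id, associated audio id) in the preset display order.
        for img_id in self.control.display_order:
            yield img_id, self.control.association[img_id]

content = MultimediaContent(
    images={"img1": b"...", "img2": b"..."},
    audio={"aud1": b"...", "aud2": b"..."},
    control=PlayControl(association={"img1": "aud1", "img2": "aud2"},
                        display_order=["img2", "img1"]),
)
print(list(content.playback_sequence()))  # [('img2', 'aud2'), ('img1', 'aud1')]
```

A player opening such content only needs to walk `playback_sequence`, showing each image while its associated audio plays.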
3. The method of claim 1, wherein receiving the instruction, input by the user, for associating audio with the target image, and associating target audio content with the target image, comprises:
receiving the instruction, input by the user, for associating audio with the target image;
in response to the instruction, displaying a recording control corresponding to the target image;
receiving the user's input on the recording control;
in response to the user's input on the recording control, starting an audio recording function of the terminal device;
and taking the recorded audio content as the target audio content, and establishing the association relationship between the target image and the target audio content.
4. The method according to claim 1, wherein the method further comprises:
setting a cover of the multimedia content based on a first cover image;
wherein the first cover image is an image selected by the user from the plurality of target images, or the first cover image is the target image whose associated target audio content is the longest.
5. The method according to claim 1, wherein the method further comprises:
generating a name of the multimedia content based on the characteristics of the multimedia content;
wherein the characteristics of the multimedia content include any one or any combination of the following:
the method comprises the steps of a user name, a forming time of the multimedia content, a shooting time of a target image in the multimedia content, a shooting place of the target image in the multimedia content, a key picture element extracted from the target image in the multimedia content and a key word extracted from target audio content associated with the target image in the multimedia content.
6. The method of claim 1, wherein the method further comprises:
obtaining a background sound set by the user, and adding the target background sound and a second play control parameter to the multimedia content, wherein the second play control parameter comprises a first volume for the target audio content and a second volume for the background sound;
after the multimedia content is opened, displaying the plurality of target images in the preset display order, playing the background sound at the second volume, and playing the associated target audio content at the first volume while the picture of a target image is displayed.
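The two-volume playback of claim 6 amounts to scaling each stream by its own volume and summing. A minimal sketch over raw sample lists follows; the function name `mix_tracks` and the decision to loop the background sound to cover the narration's length are illustrative assumptions, not specified by the patent.

```python
def mix_tracks(voice, background, first_volume, second_volume):
    """Mix a narration track and a background track at separate volumes.

    The background samples are looped to cover the narration's length,
    then each pair of samples is scaled by its volume and summed.
    """
    n = len(voice)
    # Repeat the background until it is at least as long as the narration.
    looped_bg = (background * (n // len(background) + 1))[:n]
    return [v * first_volume + b * second_volume
            for v, b in zip(voice, looped_bg)]

samples = mix_tracks(voice=[1.0, 1.0, 1.0, 1.0],
                     background=[0.5, 0.5],
                     first_volume=0.8,
                     second_volume=0.2)
print(samples)  # each sample ≈ 1.0*0.8 + 0.5*0.2 = 0.9
```

Setting `second_volume` well below `first_volume`, as in the example, keeps the narration audible over the background sound.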
7. The method of claim 1, wherein the target image is a video with sound;
the method further comprises adding a third play control parameter to the multimedia content, wherein the third play control parameter comprises a first volume for the target audio content and a third volume for the original sound of the target image;
after the multimedia content is opened, displaying the plurality of target images in the preset display order, and playing the original sound of a target image at the third volume and its associated target audio content at the first volume while the picture of that target image is displayed.
8. The method of claim 1, wherein the target image is a video;
in the case that the duration of the target image is shorter than the duration of its associated target audio content, the multimedia content is configured to play the target image in a loop until the playing of its associated target audio content ends; and/or,
in the case that the duration of the target image is longer than the duration of its associated target audio content, the multimedia content is configured to end the display of the target image when the playing of its associated target audio content ends.
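The duration rules of claim 8 reduce to a single comparison between the video's length and the narration's length. A sketch under assumed names (`playback_plan` and the tuple return format are not from the patent):

```python
import math

def playback_plan(video_seconds, audio_seconds):
    """Reconcile a video's duration with its associated narration's duration.

    Shorter video: loop it until the narration finishes.
    Equal or longer video: cut its display when the narration finishes.
    """
    if video_seconds < audio_seconds:
        # Number of full or partial passes needed to cover the narration.
        loops = math.ceil(audio_seconds / video_seconds)
        return ("loop", loops)
    return ("cut_at", audio_seconds)

print(playback_plan(4.0, 10.0))   # ('loop', 3): a 4 s clip loops 3x to cover 10 s
print(playback_plan(12.0, 10.0))  # ('cut_at', 10.0): display ends with the audio
```

The "and/or" in the claim means a player may implement either branch alone; the sketch shows both for completeness.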
9. The method of any one of claims 1-8, wherein the target image is any one of a picture, a video with sound, or a silent video;
the target image is an image shot by the user through the terminal device, and/or an image selected by the user from an image library of the terminal device;
the target audio content is audio content recorded by the user through the terminal device, and/or audio content selected by the user from an audio content library of the terminal device, and/or audio content synthesized from a target text.
10. The method of any one of claims 1-8, wherein the preset display order is any one of:
the order in which the user selects the plurality of target images from the image library;
an order set for the plurality of target images after the user selects them from the image library;
the order in which the user associates target audio content with the plurality of target images;
the order of the shooting times of the plurality of target images;
an order determined based on the semantic logic of the target audio content associated with the plurality of target images.
11. A method of generating multimedia content, comprising:
receiving an instruction, input by a user, for associating an image with target audio content, and associating a target image with the target audio content;
forming a plurality of target audio contents associated with target images into multimedia content;
after the multimedia content is opened, playing the plurality of target audio contents in a preset play order, and displaying the associated target image while each target audio content is played.
12. The method of claim 11, wherein forming the plurality of target audio contents associated with the target images into multimedia content comprises:
storing the plurality of target audio contents, the plurality of target images, and a first play control parameter as the multimedia content;
wherein the first play control parameter comprises the association relationship between each target audio content and its target image and the play order of the plurality of target audio contents.
13. A method of generating multimedia content, comprising:
receiving an instruction, input by a user, for associating an image with target audio content, and associating a target image with at least one target moment of the target audio content;
forming the target audio content associated with the target image into multimedia content;
after the multimedia content is opened, playing the target audio content, and displaying the associated target image when playback reaches the target moment of the target audio content.
14. The method of claim 13, wherein forming the target audio content associated with the target image into multimedia content comprises:
storing the target audio content, the at least one target image, and a first play control parameter as the multimedia content;
wherein the first play control parameter comprises the target moment associated with each target image.
15. The method of claim 13, wherein associating the target image with the at least one target moment of the target audio content comprises:
acquiring a plurality of target images;
and determining the target moment associated with each target image in the target audio content based on the recording time of the target audio content and the shooting time of each target image.
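The timestamp alignment of claim 15 can be sketched as subtracting the recording start time from each image's shooting time. The function and variable names are illustrative, and clamping photos taken before the recording began to moment 0 is an assumption the patent does not state.

```python
from datetime import datetime

def target_moments(recording_start, shooting_times):
    """Map each image's shooting time to an offset (seconds) into the audio.

    An image shot 30 s after recording began appears at the 30 s mark;
    images shot before the recording started are clamped to moment 0.
    """
    return [max(0.0, (t - recording_start).total_seconds())
            for t in shooting_times]

start = datetime(2023, 10, 26, 9, 0, 0)
shots = [datetime(2023, 10, 26, 9, 0, 30),
         datetime(2023, 10, 26, 9, 2, 0),
         datetime(2023, 10, 26, 8, 59, 50)]  # taken before recording began
print(target_moments(start, shots))  # [30.0, 120.0, 0.0]
```

During playback, the player would then show each image when the audio position crosses its computed moment.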
16. The method of claim 13, wherein associating the target image with the at least one target moment of the target audio content comprises:
acquiring a plurality of target images;
acquiring a first target moment specified by the user in the target audio content;
and determining the target moment associated with each target image in the target audio content based on the first target moment of the target audio content and the shooting times of the plurality of target images.
17. An electronic device comprising a processor and a memory, the memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the multimedia content generation method of any one of claims 1-16.
CN202311403165.9A 2023-10-26 2023-10-26 Multimedia content generation method and electronic equipment Pending CN117556066A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311403165.9A CN117556066A (en) 2023-10-26 2023-10-26 Multimedia content generation method and electronic equipment


Publications (1)

Publication Number Publication Date
CN117556066A true CN117556066A (en) 2024-02-13

Family

ID=89819450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311403165.9A Pending CN117556066A (en) 2023-10-26 2023-10-26 Multimedia content generation method and electronic equipment

Country Status (1)

Country Link
CN (1) CN117556066A (en)

Similar Documents

Publication Publication Date Title
US11386931B2 (en) Methods and systems for altering video clip objects
WO2021196903A1 (en) Video processing method and device, readable medium and electronic device
US20120185772A1 (en) System and method for video generation
US20180048831A1 (en) Generation of combined videos
CN110390927B (en) Audio processing method and device, electronic equipment and computer readable storage medium
CN110267113B (en) Video file processing method, system, medium, and electronic device
CN109547841B (en) Short video data processing method and device and electronic equipment
JP2020515124A (en) Method and apparatus for processing multimedia resources
CN108063722A (en) Video data generating method, computer readable storage medium and electronic equipment
CN104869467A (en) Information output method and system for media playing, and apparatuses
US10674183B2 (en) System and method for perspective switching during video access
CN112261416A (en) Cloud-based video processing method and device, storage medium and electronic equipment
US20240061560A1 (en) Audio sharing method and apparatus, device and medium
CN108847214A (en) Method of speech processing, client, device, terminal, server and storage medium
CN111629253A (en) Video processing method and device, computer readable storage medium and electronic equipment
CN109474843A (en) The method of speech control terminal, client, server
CN112866796A (en) Video generation method and device, electronic equipment and storage medium
JP2023523067A (en) Video processing method, apparatus, equipment and medium
CN110032355B (en) Voice playing method and device, terminal equipment and computer storage medium
WO2019227429A1 (en) Method, device, apparatus, terminal, server for generating multimedia content
CN112492329B (en) Live broadcast method and device
CN111800668A (en) Bullet screen processing method, device, equipment and storage medium
CN110781349A (en) Method, equipment, client device and electronic equipment for generating short video
CN105808231B (en) System and method for recording and playing script
CN111901695A (en) Video content interception method, device and equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination