Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used to describe … … in embodiments of the present invention, these … … should not be limited to these terms. These terms are used only to distinguish … …. For example, the first … … can also be referred to as the second … … and similarly the second … … can also be referred to as the first … … without departing from the scope of embodiments of the present invention.
The word "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in the article or device in which the element is included.
Alternative embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Example 1
Fig. 1 is a flowchart of an implementation of a video file synthesis method according to an embodiment of the present invention, which is applied to a client. The video file synthesis method comprises the following steps:
S100, obtaining all current voice comments of the commented picture, wherein the commented picture comprises one or more pictures.
In this step, the voice comments are recorded through a voice comment component of the client. When the dwell time on a page browsed at the client reaches a preset threshold, the voice comment component is displayed around the published content area of that page. In this embodiment, while the user browses published content at the client, the voice comment component is displayed to the user once the dwell time on the page reaches the preset threshold, and it is displayed below the published content area so that the user interface remains concise and clear. The user records through the displayed voice comment component; the voice comment is generated when the user releases the component or when the maximum recording duration of the voice comment component is reached, and the commented picture and the voice comment are stored on a server or in the cloud.
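By way of illustration only, the display condition and the recording end condition described above may be sketched as follows. The class `BrowsingPage`, the function `recording_should_stop`, and all parameter names are hypothetical and do not appear in the embodiment; the threshold and maximum duration values are left to the implementer.

```python
from dataclasses import dataclass


@dataclass
class BrowsingPage:
    """Hypothetical model of a page whose dwell time is being tracked."""
    dwell_threshold_s: float          # preset threshold (assumed to be in seconds)
    opened_at_s: float                # time at which the user opened the page
    component_visible: bool = False   # whether the voice comment component is shown

    def tick(self, now_s: float) -> bool:
        """Show the voice comment component once the dwell time reaches the threshold."""
        dwell_s = now_s - self.opened_at_s
        if not self.component_visible and dwell_s >= self.dwell_threshold_s:
            self.component_visible = True
        return self.component_visible


def recording_should_stop(elapsed_s: float, released: bool, max_duration_s: float) -> bool:
    """A recording ends when the user releases the component or the maximum duration is reached."""
    return released or elapsed_s >= max_duration_s
```

In this sketch the component stays visible once shown; whether it is later hidden again is not specified by the embodiment.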
In this embodiment, referring to Fig. 2, obtaining all current voice comments of the commented picture includes:
S101, providing a comment synthesis control. The position of the comment synthesis control is not limited. Preferably, the comment synthesis control is disposed around the voice comment component.
S102, in response to an operation on the comment synthesis control, acquiring, by the client, all current voice comments of the commented picture from the server and displaying them through a user selection interface. The commented picture may be a single picture published by the user, or a set of pictures published by the user, such as a six-grid picture or a nine-grid picture.
S110, selecting a plurality of target voice comments from all the current voice comments.
In this step, selecting a plurality of target voice comments from all the current voice comments includes: selecting, from all the current voice comments, a plurality of target voice comments whose comment object is the same picture in the commented picture; or
selecting, from all the current voice comments, a plurality of target voice comments whose comment objects are different pictures in the commented picture. That is, the plurality of target voice comments may be comments made on the same picture or comments made on different pictures.
The manner of selecting the plurality of target voice comments from all the current voice comments is not limited. In this embodiment, the client automatically identifies and selects the plurality of target voice comments. Specifically, referring to Fig. 3, selecting a plurality of target voice comments from all the current voice comments includes:
S111, retrieving the timestamps of all the current voice comments;
S112, selecting, according to the timestamps, the voice comments with the same comment time;
S113, identifying, among the selected voice comments, a plurality of target voice comments whose contents are related.
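The automatic selection in steps S111 to S113 may be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: "the same comment time" is interpreted here as falling into the same fixed time window, and "related content" is approximated by shared keywords. The `VoiceComment` type, the `keywords` field, `window_s`, and `min_shared` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceComment:
    comment_id: str
    picture_id: str
    timestamp_s: int        # time the comment was made (assumed epoch seconds)
    keywords: frozenset     # hypothetical keywords extracted from the audio


def select_target_comments(comments, window_s=3600, min_shared=1):
    """Return comments from the busiest time window that share keywords with another one."""
    # S111: read the timestamp of every current voice comment.
    # S112: group comments whose timestamps fall into the same window.
    buckets = defaultdict(list)
    for comment in comments:
        buckets[comment.timestamp_s // window_s].append(comment)
    if not buckets:
        return []
    candidates = max(buckets.values(), key=len)

    # S113: keep candidates whose content is related to at least one other candidate.
    return [
        c for c in candidates
        if any(
            other is not c and len(c.keywords & other.keywords) >= min_shared
            for other in candidates
        )
    ]
```

Choosing the largest window is only one possible policy; an implementation could equally return every window that contains at least two related comments.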
Preferably, the plurality of target voice comments may be selected directly by the user. Specifically, referring to Fig. 4, selecting a plurality of target voice comments from all the current voice comments includes:
S114, providing a user interface for selecting voice comments, wherein the user interface comprises a plurality of selection controls. Specifically, one selection control is provided for each voice comment.
S115, in response to an operation on the selection controls in the user interface, acquiring a plurality of target voice comments. Specifically, the user can play each voice comment and, by touching the selection controls, manually select a plurality of voice comments whose contents are related, thereby obtaining the plurality of target voice comments.
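The manual selection in steps S114 and S115 can be modelled as one toggle per voice comment. The sketch below is illustrative only; the class `SelectionInterface` and its methods are hypothetical names, and the rendering and playback of each comment are omitted.

```python
class SelectionInterface:
    """Hypothetical model of a user interface with one selection control per voice comment."""

    def __init__(self, comment_ids):
        # S114: one selection control (initially unselected) for each voice comment.
        self._selected = {comment_id: False for comment_id in comment_ids}

    def toggle(self, comment_id):
        """Simulate the user touching the selection control of one voice comment."""
        if comment_id not in self._selected:
            raise KeyError(f"unknown voice comment: {comment_id}")
        self._selected[comment_id] = not self._selected[comment_id]

    def target_comments(self):
        # S115: the selected comments, in display order, become the target voice comments.
        return [cid for cid, chosen in self._selected.items() if chosen]
```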
S120, sending the target voice comments to a server, so that the server identifies the pictures corresponding to the target voice comments and synthesizes the pictures into a video file.
In step S120, after the plurality of target voice comments are acquired, they are sent to the server for video synthesis. Specifically, referring to Fig. 5, the step in which the server identifies the pictures corresponding to the target voice comments and synthesizes the pictures into a video file includes:
S121, receiving the target voice comments;
S122, identifying the picture to which the content of each target voice comment is directed. Specifically, before the server identifies the pictures corresponding to the target voice comments, the server cuts the published picture according to a preset rule to generate a plurality of sub-pictures, marks each sub-picture, and stores the sub-pictures in a database. After the server receives the target voice comments, the server matches them against the database to obtain the sub-pictures corresponding to the target voice comments.
S123, synthesizing the pictures into a video file. Specifically, the server may synthesize the plurality of pictures into a continuous animation by means of an algorithm, and during synthesis each target voice comment is merged into the animation so that it corresponds to its picture. The video file may be played in either of two forms:
First, when the target voice comments correspond to the same picture, the displayed image may remain unchanged during playback of the video file while the voice comments are played one after another.
Second, when the target voice comments correspond to different pictures, the different pictures are played in succession as a plurality of video frames during playback of the video file, and each voice comment is played at the same time as the background audio of its corresponding video frame.
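The two playing forms above can be expressed as a single timeline in which consecutive comments on the same picture extend one segment and a comment on a different picture starts a new segment. The sketch below only computes that timeline; it does not encode video. `Segment`, `build_timeline`, and the assumption that each comment has a known duration in seconds are illustrative and not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    picture_id: str       # picture shown during this segment
    start_s: float        # segment start on the video timeline
    end_s: float          # segment end on the video timeline
    comment_ids: tuple    # voice comments played while the picture is shown


def build_timeline(targets):
    """targets: iterable of (comment_id, picture_id, duration_s) in playback order."""
    timeline = []
    cursor = 0.0
    for comment_id, picture_id, duration_s in targets:
        end = cursor + duration_s
        if timeline and timeline[-1].picture_id == picture_id:
            # Same picture: the image stays unchanged and the next comment keeps playing.
            last = timeline[-1]
            timeline[-1] = Segment(picture_id, last.start_s, end,
                                   last.comment_ids + (comment_id,))
        else:
            # Different picture: a new frame appears together with its comment audio.
            timeline.append(Segment(picture_id, cursor, end, (comment_id,)))
        cursor = end
    return timeline
```

A video encoder could then render each segment as a still image held for `end_s - start_s` seconds with the listed comments as its audio track.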
In another embodiment, the server may convert the target voice comments into text forms before synthesizing the video file. The plurality of target voice comments in the synthesized video file are played in the form of subtitles without audio.
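As an illustration of the subtitle variant, the sketch below lays out already transcribed comments as sequential subtitle cues. The embodiment does not name a subtitle format; the SubRip (SRT) layout is used here purely as an assumption, speech-to-text conversion is not shown, and `format_srt_time` and `build_srt` are hypothetical helper names.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time offset as HH:MM:SS,mmm, the timestamp style used by SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues) -> str:
    """cues: list of (transcript, duration_s); each cue starts where the previous one ends."""
    blocks = []
    cursor = 0.0
    for index, (text, duration_s) in enumerate(cues, start=1):
        start, end = cursor, cursor + duration_s
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
        cursor = end
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```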
S130, receiving the video file sent by the server.
Specifically, the client receives the video file synthesized by the server and stores the video file locally.
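Taken together, steps S100 to S130 form a short client-side sequence, which may be sketched as follows. The sketch is an assumption-laden outline rather than the embodiment's implementation: `InMemoryServer` is a stand-in for a real server reached over a network, the method names `list_voice_comments` and `synthesize` are hypothetical, and the returned bytes are a placeholder for an actual video file.

```python
from typing import Callable, Dict, List


class InMemoryServer:
    """Hypothetical stand-in for the server; a real client would call it over a network."""

    def __init__(self, comments: List[Dict[str, str]]):
        self._comments = comments

    def list_voice_comments(self, picture_ids: List[str]) -> List[Dict[str, str]]:
        return [c for c in self._comments if c["picture_id"] in picture_ids]

    def synthesize(self, comment_ids: List[str]) -> bytes:
        # Placeholder payload; picture matching and video synthesis are not modelled here.
        return ("video:" + ",".join(comment_ids)).encode("utf-8")


def synthesize_video_file(
    server: InMemoryServer,
    picture_ids: List[str],
    choose: Callable[[List[Dict[str, str]]], List[Dict[str, str]]],
    store: Callable[[bytes], None],
) -> bytes:
    comments = server.list_voice_comments(picture_ids)       # S100: obtain all current comments
    targets = choose(comments)                                # S110: automatic or manual selection
    if len(targets) < 2:
        raise ValueError("a plurality of target voice comments means at least two")
    video = server.synthesize([c["id"] for c in targets])     # S120: send targets for synthesis
    store(video)                                              # S130: receive and store locally
    return video
```

Passing the selection strategy as the `choose` callable lets the same sequence serve both the automatic and the manual selection variants.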
Further, the video file synthesis method comprises the following step:
S140, publishing the video file. Specifically, the video file may be published manually or automatically. Referring to Fig. 6, manually publishing the video file includes:
S141, providing a video publishing control. The video publishing control is arranged on the user selection interface.
S142, in response to an operation on the video publishing control, publishing the video file. The user publishes the video file by touching the video publishing control, and the video file may be displayed in a voice comment area.
According to the video file synthesis method provided by the embodiment of the invention, a plurality of voice comments are selected and the pictures to which they are directed are synthesized into a new continuous animation, which makes interaction richer and more varied and, in turn, increases user stickiness.
Example 2
Referring to fig. 7, an embodiment of the invention provides a video file composition system 700, where the system 700 includes: an obtaining module 710, a selecting module 720, a sending module 730 and a receiving module 740.
The obtaining module 710 is configured to obtain all current voice comments of a commented picture, where the commented picture includes one or more pictures.
Specifically, the voice comments are recorded through a voice comment component of the client. When the dwell time on a page browsed at the client reaches a preset threshold, the voice comment component is displayed around the published content area of that page. In this embodiment, while the user browses published content at the client, the voice comment component is displayed to the user once the dwell time on the page reaches the preset threshold, and it is displayed below the published content area so that the user interface remains concise and clear. The user records through the displayed voice comment component, and the voice comment is generated when the user releases the component or when the maximum recording duration of the voice comment component is reached.
In this embodiment, the obtaining module 710 may provide a comment synthesis control. The position of the comment synthesis control is not limited. Preferably, the comment synthesis control is disposed around the voice comment component. In response to an operation on the comment synthesis control, the obtaining module 710 may obtain all current voice comments of the commented picture from the server and display them through a user selection interface. The commented picture may be a single picture published by the user, or a set of pictures published by the user, such as a six-grid picture or a nine-grid picture.
The selecting module 720 is configured to select a plurality of target voice comments from all the current voice comments. Specifically, the selecting module 720 may select, from all the current voice comments, a plurality of target voice comments whose comment object is the same picture in the commented picture, or a plurality of target voice comments whose comment objects are different pictures in the commented picture. That is, the plurality of target voice comments may be comments made on the same picture or comments made on different pictures.
The manner in which the selecting module 720 selects the plurality of target voice comments from all the current voice comments is not limited. In this embodiment, the selecting module 720 may automatically identify and select the target voice comments. Further, the selecting module 720 includes:
a calling submodule 721, configured to retrieve the timestamps of all the current voice comments;
a selecting submodule 722, configured to select, according to the timestamps, the voice comments with the same comment time;
an identifying submodule 723, configured to identify, among the selected voice comments, a plurality of target voice comments whose contents are related.
In another embodiment, the selecting module 720 may provide a user interface for selecting voice comments, the user interface including a plurality of selection controls. Specifically, one selection control is provided for each voice comment. The selecting module 720 may acquire a plurality of target voice comments in response to an operation on the selection controls in the user interface. Specifically, the user can play each voice comment and, by touching the selection controls, manually select a plurality of voice comments whose contents are related, thereby obtaining the plurality of target voice comments.
The sending module 730 is configured to send the target voice comments to the server, so that the server identifies the picture corresponding to the target voice comment, and synthesizes the plurality of pictures into a video file.
After the selecting module 720 obtains a plurality of target voice comments, the sending module 730 sends the plurality of target voice comments to a server for video synthesis. Specifically, referring to fig. 8, the server includes:
a receiving module 800, configured to receive the plurality of target voice comments;
an identifying module 810, configured to identify the picture to which the content of each target voice comment is directed. Specifically, before the identifying module 810 identifies the pictures corresponding to the target voice comments, the server cuts the published picture according to a preset rule to generate a plurality of sub-pictures, marks each sub-picture, and stores the sub-pictures in a database. After the receiving module 800 receives the target voice comments, the identifying module 810 matches them against the database to obtain the sub-pictures corresponding to the target voice comments.
a synthesis module 820, configured to synthesize the plurality of pictures into a video file. Specifically, the synthesis module 820 may synthesize the plurality of pictures into a continuous animation by means of an algorithm, and during synthesis each target voice comment is merged into the animation so that it corresponds to its picture. The video file may be played in either of two forms:
First, when the target voice comments correspond to the same picture, the displayed image may remain unchanged during playback of the video file while the voice comments are played one after another.
Second, when the target voice comments correspond to different pictures, the different pictures are played in succession as a plurality of video frames during playback of the video file, and each voice comment is played at the same time as the background audio of its corresponding video frame.
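The cutting and matching performed on the server side before and during identification may be sketched as follows. The preset cutting rule and the matching criterion are not specified in the embodiment; the sketch assumes a uniform grid (for example 3 x 3 for a nine-grid picture) and assumes that each sub-picture carries content tags that can be compared with keywords taken from a voice comment. `SubPicture`, `cut_into_grid`, and `match_sub_picture` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SubPicture:
    label: str          # mark assigned to the sub-picture when it is stored
    box: tuple          # (left, top, right, bottom) in pixels
    tags: frozenset     # hypothetical content tags used for matching


def cut_into_grid(width: int, height: int, rows: int, cols: int) -> Dict[str, Tuple[int, int, int, int]]:
    """Split a width x height picture into rows x cols boxes that cover it completely."""
    boxes = {}
    for r in range(rows):
        for c in range(cols):
            boxes[f"r{r}c{c}"] = (
                c * width // cols,
                r * height // rows,
                (c + 1) * width // cols,
                (r + 1) * height // rows,
            )
    return boxes


def match_sub_picture(database: List[SubPicture], comment_keywords: frozenset) -> Optional[SubPicture]:
    """Return the stored sub-picture sharing the most tags with the comment, or None."""
    best, best_score = None, 0
    for sub_picture in database:
        score = len(sub_picture.tags & comment_keywords)
        if score > best_score:
            best, best_score = sub_picture, score
    return best
```

Integer division on the cell boundaries keeps the boxes contiguous even when the picture size is not an exact multiple of the grid.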
In another embodiment, before the video file is synthesized, the target voice comments may be converted into text by a converting module 830 for use by the synthesis module 820. The plurality of target voice comments in the synthesized video file are then presented as subtitles without audio.
The receiving module 740 is configured to receive the video file sent by the server. Specifically, the receiving module 740 receives the video file sent by the server and stores the video file locally.
Further, the video file composition system includes a publishing module 750 configured to publish the video file. Specifically, the publishing module 750 may publish the video file manually or automatically. The publishing module 750 may provide a video publishing control, which is arranged on the user selection interface, and may publish the video file in response to an operation on the video publishing control. The user publishes the video file by touching the video publishing control, and the video file may be displayed in a voice comment area.
The video file composition system 700 provided by the embodiment of the invention selects a plurality of voice comments and synthesizes the pictures to which they are directed into a new continuous animation, which makes interaction richer and more varied and, in turn, increases user stickiness.
Example 3
The disclosed embodiments provide a non-volatile computer storage medium having stored thereon computer-executable instructions that can perform the video file composition method of any of the above method embodiments.
Example 4
This embodiment provides an electronic device for video file composition. The electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions causing the at least one processor to perform:
obtaining all current voice comments of a commented picture, wherein the commented picture comprises one or more pictures;
selecting a plurality of target voice comments from all the current voice comments;
sending the target voice comments to a server so that the server identifies the pictures corresponding to the target voice comments and synthesizes the pictures into a video file;
and receiving the video file sent by the server.
Example 5
Referring now to FIG. 9, shown is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in Fig. 9, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage means 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device are also stored. The processing means 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication means 909 may allow the electronic device to perform wireless or wired communication with other devices to exchange data. While fig. 9 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing apparatus 901.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".