CN116527994A - Video generation method and device and electronic equipment - Google Patents

Video generation method and device and electronic equipment

Info

Publication number
CN116527994A
CN116527994A (application CN202310423244.XA)
Authority
CN
China
Prior art keywords
target
text
video
picture
picture data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310423244.XA
Other languages
Chinese (zh)
Inventor
王柏淋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
58 Chang Life Beijing Information Technology Co ltd
Original Assignee
58 Chang Life Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 58 Chang Life Beijing Information Technology Co ltd filed Critical 58 Chang Life Beijing Information Technology Co ltd
Priority to CN202310423244.XA priority Critical patent/CN116527994A/en
Publication of CN116527994A publication Critical patent/CN116527994A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/50Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
    • H04M3/51Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/523Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing with call distribution or queueing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8126Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
    • H04N21/8133Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8146Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
    • H04N21/8153Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics comprising still images, e.g. texture, background image
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Graphics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Document Processing Apparatus (AREA)

Abstract

An embodiment of the invention provides a video generation method and device and an electronic device. The method comprises the following steps: obtaining a preset document template and target text data, where the target text data comprises text data associated with a target service; generating a target document based on the target text data and the preset document template; screening target picture data from picture data associated with the target service, where the target picture data comprises picture data matched with the target document; and generating a target video based on the target document and the target picture data. The invention achieves automatic video synthesis, reduces the demands placed on personnel, and improves the efficiency of video synthesis; at the same time, because synthesis is guided by the document, cluttered video content is avoided.

Description

Video generation method and device and electronic equipment
Technical Field
The present invention relates to the field of video synthesis technologies, and in particular, to a method and an apparatus for generating a video, and an electronic device.
Background
Video editing uses software to perform nonlinear editing on a video source: added materials such as pictures, background music, special effects, and scenes are remixed with the video, the video source is cut and recombined, and new videos with different expressiveness are generated through secondary encoding. Video editing is a common technical means of synthesizing video and is widely applied in various scenarios.
In some scenarios there is no video source, so video must be generated directly from materials such as pictures, background music, and special effects. Generating video from non-video material in this way is a common means of meeting user needs in these scenarios.
However, the above video synthesis approaches all require manual participation, place high demands on operators, and synthesize video inefficiently.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a video generation method and apparatus, and an electronic device, which overcome or at least partially solve the foregoing problems.
In a first aspect, an embodiment of the present invention provides a method for generating a video, where the method includes:
obtaining a preset document template and target text data, where the target text data comprises: text data associated with a target service;
generating a target document based on the target text data and the preset document template;
screening target picture data from picture data associated with the target service, where the target picture data comprises picture data matched with the target document;
and generating a target video based on the target document and the target picture data.
Optionally, before the screening of the target picture data from the picture data associated with the target service, the method further includes:
acquiring a picture set associated with the target service;
and performing de-duplication processing on the pictures in the picture set to obtain picture data associated with the target service.
Optionally, the generating the target video based on the target document and the target picture data includes:
performing sharpness processing on the target picture data to obtain processed target picture data;
and generating a target video based on the target document and the processed target picture data.
Optionally, the preset document template includes: a to-be-filled document segment in each video frame of a target video template, where each to-be-filled document segment has a text content attribute;
the generating a target document based on the target text data and the preset document template includes:
for each to-be-filled document segment, screening, from the target text data, filling text that conforms to the text content attribute of that segment;
and filling the filling text into the corresponding to-be-filled document segments to generate the target document, where each to-be-filled document segment filled with filling text is a text segment of the target document;
the generating a target video based on the target document and the target picture data includes:
and filling the target picture data into the target video template according to the video frame to which the text fragment matched with the target picture data belongs, and generating the target video.
Optionally, the screening the target picture data from the picture data associated with the target service includes:
for each text segment, matching each picture in the picture data with the text segment respectively, and determining the pictures successfully matched with the text segment;
and screening the pictures successfully matched with the text segments from the picture data associated with the target service to obtain the target picture data.
Optionally, each video frame to which the text segment belongs has a size attribute;
the matching each picture in the picture data with the text segment respectively to determine the pictures successfully matched with the text segment comprises:
matching the text content in each picture of the picture data with the text content of the text segment, and determining intermediate pictures whose text content is successfully matched;
matching the size of each intermediate picture with the size attribute of the video frame to which the text segment belongs, and determining target pictures whose size is successfully matched;
and determining the target pictures as the pictures successfully matched with the text segment.
Optionally, before the screening of the target picture data from the picture data associated with the target service, the method further includes:
when it is determined that the data amount of the picture data associated with the target service is less than a data amount threshold, initiating a call to a target user associated with the target service based on a call service.
In a second aspect, an embodiment of the present invention further provides a device for generating a video, where the device includes:
the device comprises an acquisition module, a storage module and a storage module, wherein the acquisition module is used for acquiring a preset text template and target text data, and the target text data comprises: text data associated with the target service;
the document module is used for generating a target document based on the target text data and the preset document template;
The screening module is used for screening and obtaining target picture data from picture data associated with the target service, wherein the target picture data comprises picture data matched with the target document;
and a video synthesis module, configured to generate a target video based on the target document and the target picture data.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps in the video generating method as described above when executing the computer program.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the video generation method as described above.
In the embodiment of the invention, a target document associated with a target service is generated using a preset document template and text data associated with the target service. Target picture data matched with the target document is then screened from the picture data associated with the target service. Finally, a target video associated with the target service can be generated using the target picture data and the target document. This achieves automatic video synthesis, reduces the demands placed on personnel, and improves the efficiency of video synthesis; at the same time, because synthesis is guided by the document, cluttered video content is avoided.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of steps of a method for generating video according to an embodiment of the present invention;
fig. 2 is a flow chart of an actual application of outbound service provided in an embodiment of the present invention;
FIG. 3 is one of partial flowcharts of a video generation method according to an embodiment of the present invention;
FIG. 4 is a second partial flowchart of a video generation method according to an embodiment of the present invention;
FIG. 5 is a third partial flowchart of a method for generating video according to an embodiment of the present invention;
fig. 6 is a block diagram of a video generating apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for generating a video. The method may include:
step 101: and acquiring a preset text template and target text data.
In this step, the preset document template is a pre-configured document template. A document template can be understood as a piece of text with multiple slots; filling other text data into the slots produces a piece of text with coherent semantics. For example, a document template may be: "We are XX; our main business includes: XX; our advantage is XX." where XX denotes a slot. Filling the slots with "an internet company", "software development and network marketing", and "that we finish tasks in half the time of peer companies" yields a semantically coherent passage: "We are an internet company; our main business includes: software development and network marketing; our advantage is that we finish tasks in half the time of peer companies."
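The slot-filling behaviour described above can be sketched as follows. This is an illustrative sketch only: the function name, the "XX" slot marker, and the example texts are assumptions for demonstration, not part of the claimed method.

```python
def fill_template(template: str, values: list[str], slot: str = "XX") -> str:
    """Fill each occurrence of the slot marker with the next value in order."""
    parts = template.split(slot)
    if len(parts) - 1 != len(values):
        raise ValueError("number of values must equal number of slots")
    # Interleave the fixed text fragments with the filled-in values.
    out = [parts[0]]
    for value, rest in zip(values, parts[1:]):
        out.append(value)
        out.append(rest)
    return "".join(out)

template = "We are XX; our main business includes: XX; our advantage is XX."
document = fill_template(template, [
    "an internet company",
    "software development and network marketing",
    "that we finish tasks in half the time of peer companies",
])
```

Filling the three slots in order yields the semantically coherent passage given in the example above.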
Preferably, there are multiple document templates, and when the preset document template is acquired, one document template can be selected automatically from the multiple document templates. Of course, the document template may also be selected based on user input.
The target text data comprises text data associated with the target service, where the target service may be any one or more services. Preferably, the target service includes all services related to a target user, who may be any user on the platform. For example, the target service may include all services provided by merchant A on an e-commerce platform. The text data associated with the target service can then include posts, articles, and topics published by merchant A on the platform, as well as reviews published by customers after shopping with merchant A. The specific content of the text data is not limited, as long as the text data is associated with the target service in some dimension.
Step 102: and generating a target text based on the target text data and a preset text template.
In this step, corresponding data is selected from the target text data, and the selected data is filled into the document template to generate the target document. The data selected from the target text data, or the rule for selecting it, depends on the document template: for each slot in the document template, corresponding data is selected from the target text data. For example, if a certain slot of the document template should be filled with a company name, company names can be screened out when selecting data from the target text data.
It will be appreciated that, since the target text data is text data associated with the target service, a target document generated based on the target text data is also associated with the target service. The target document is a piece of text that has coherent semantics and is associated with the target service; a piece of text here is not limited to a single paragraph and can comprise multiple paragraphs.
Step 103: and screening the image data associated with the target service to obtain target image data.
Step 104: and generating a target video based on the target text and the target picture data.
It should be noted that the target picture data comprises picture data matched with the target document. It will be appreciated that the picture data associated with the target service includes a large number of pictures; although all of them are associated with the target service, their content is unorganized, and if they were used directly to generate a video, the video content would also be cluttered. Here, video synthesis is guided by the target document, which avoids cluttered video content: when screening the data used to generate the video, target picture data matched with the target document is screened out, and the target video is then generated from it.
It will be appreciated that, to enrich the video content, the target document is added to the target video, so that the content of the target picture data and the content of the target document can be simultaneously displayed to the user in the process of playing the video. Preferably, a video template may be utilized to generate the target video.
In the embodiment of the invention, a target document associated with the target service is generated using the preset document template and text data associated with the target service. Target picture data matched with the target document is then screened from the picture data associated with the target service. Finally, a target video associated with the target service can be generated using the target picture data and the target document, which achieves automatic video synthesis, reduces the demands placed on personnel, and improves the efficiency of video synthesis; at the same time, because synthesis is guided by the document, cluttered video content is avoided.
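Steps 101 to 104 can be summarized as a minimal end-to-end sketch. Everything here is a toy stand-in: the "XX" slot marker, the representation of pictures by their recognized text, and the final (document, pictures) pair standing in for a synthesized video are all assumptions for illustration.

```python
def generate_target_video(doc_template, text_data, picture_texts):
    """Toy pipeline for steps 101-104.

    doc_template:  document template text with 'XX' slots (step 101)
    text_data:     slot values selected from the target text data
    picture_texts: {picture_id: text recognized in that picture}
    """
    # Steps 101-102: fill the template to obtain the target document.
    document = doc_template
    for value in text_data:
        document = document.replace("XX", value, 1)
    # Step 103: screen pictures whose text matches the target document.
    target_pictures = [pid for pid, text in picture_texts.items()
                       if text and text in document]
    # Step 104: the (document, pictures) pair stands in for the target video.
    return document, target_pictures
```

A real implementation would render the matched pictures and document segments into video frames; this sketch only shows the document-guided screening that orders the content.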
Optionally, before screening the target picture data from the picture data associated with the target service, the method further includes:
a set of pictures associated with a target service is obtained.
And performing de-duplication processing on the pictures in the picture set to obtain picture data associated with the target service.
It should be noted that the picture set consists of a large number of pictures associated with the target service, which may be collected in advance from different data sources through different channels. As a result, the same picture may appear in the picture set more than once, while only one copy of each picture is needed when generating the video. The pictures in the picture set therefore need to be de-duplicated, which can be done using the hash values of the pictures or the similarity between pictures; the method is not limited here.
In the embodiment of the invention, the pictures in the picture set are de-duplicated, which prevents identical pictures from appearing in the target picture data and degrading the display effect of the target video.
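The hash-based variant of the de-duplication step can be sketched as follows; only exact-duplicate removal is shown, and the similarity-based variant (e.g. perceptual hashing) would require a different comparison.

```python
import hashlib

def deduplicate(pictures: list[bytes]) -> list[bytes]:
    """Remove exact duplicates from a picture set by hashing raw bytes.

    Keeps the first occurrence of each distinct picture, since only one
    copy is needed when generating the video.
    """
    seen = set()
    unique = []
    for pic in pictures:
        digest = hashlib.sha256(pic).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(pic)
    return unique
```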
Optionally, generating the target video based on the target document and the target picture data includes:
performing sharpness processing on the target picture data to obtain processed target picture data;
and generating a target video based on the target document and the processed target picture data.
It should be noted that the sharpness of the pictures in the target picture data may differ, and some pictures may have poor sharpness; both conditions affect the presentation of the target video. Therefore, before the target video is generated, the target picture data can undergo sharpness processing: the sharpness of each picture can be adjusted to a target sharpness value to improve picture sharpness, and the target video is then generated from the improved pictures. Alternatively, the sharpness of all pictures in the target picture data can be unified, and the target video is then generated from pictures with identical sharpness.
In the embodiment of the invention, sharpness processing is applied to the pictures used to synthesize the target video, which prevents the video frames of the target video from having poor or widely varying sharpness that would degrade its display effect.
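The description does not name a sharpness measure. A common assumption is the variance-of-Laplacian focus metric; the sketch below shows that measurement plus a simple threshold filter, which stands in for (rather than reproduces) the adjustment step described above.

```python
def laplacian_variance(gray):
    """Sharpness estimate: variance of a 4-neighbour Laplacian over the
    interior pixels of a grayscale image given as nested lists of floats."""
    h, w = len(gray), len(gray[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            vals.append(gray[y-1][x] + gray[y+1][x] + gray[y][x-1]
                        + gray[y][x+1] - 4 * gray[y][x])
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def filter_by_sharpness(pictures, threshold):
    """Keep only pictures whose estimated sharpness reaches the threshold."""
    return [p for p in pictures if laplacian_variance(p) >= threshold]
```

A flat image scores zero while a high-contrast one scores high, so a threshold screens out blurry pictures before synthesis.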
Optionally, the preset document template comprises: a to-be-filled document segment in each video frame of a target video template, where each to-be-filled document segment has a text content attribute;
generating a target document based on the target text data and a preset document template, comprising:
for each to-be-filled document segment, screening, from the target text data, filling text that conforms to the text content attribute of that segment;
and filling the filling text into the corresponding to-be-filled document segments to generate the target document, where each to-be-filled document segment filled with filling text is a text segment of the target document;
generating a target video based on the target document and the target picture data, comprising:
and filling the target picture data into a target video template according to the video frame to which the text fragment matched with the target picture data belongs, and generating a target video.
It should be noted that the target video template is any video template in the video template library; it may be a video template actively selected by the user, or one selected automatically based on the user's usage records, which is not limited here. Each video frame of the target video template carries a to-be-filled document segment, and a complete document can be generated by filling these segments. Here, the to-be-filled document segments across the video frames are regarded as the preset document template; that is, the preset document template is an integral part of the target video template, so acquiring the preset document template can also be regarded as acquiring the target video template.
The text content attributes may include: company name, service area, service name, service feature, fixed document text, and so on, and the user can set the to-be-filled document segments and their text content attributes when creating the target video template. For example, a merchant providing home services on an e-commerce platform may need to generate a promotional video using the method provided by the present application, advertising its company name, service area, and service names. In this case, a fixed document "We are ________, serving the ________ area, providing ________ services for businesses, companies, and individuals" can be entered when creating the target video template. The text content attribute "company name" is entered for the first to-be-filled position (to-be-filled document segment) in the fixed document, "service area" for the second, and "service name" for the third. Finally, a target video template is generated based on the fixed document and the text content attributes entered by the user. Based on this target video template, promotional videos meeting the merchant's needs can be generated.
Since each video frame carries a different to-be-filled document segment, and each segment filled with filling text is a text segment of the target document, a complete document, i.e., the target document, is produced on the target video template after the data content is filled into each video frame. Finally, for each text segment of the target document, the part of the target picture data matched with that text segment is filled into the video frame to which the segment belongs, generating the target video.
In the embodiment of the invention, since the preset document template is an integral part of the target video template, the video can be generated using only the video template, which simplifies the video generation process.
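The attribute-matched filling described above can be sketched as follows. The attribute names, the per-frame template structure, and the "first matching candidate" selection rule are all hypothetical stand-ins for whatever a concrete implementation would use.

```python
# Hypothetical per-frame template: one to-be-filled segment per video frame,
# each tagged with a text content attribute.
TEMPLATE = [
    {"frame": 1, "text": "We are {}", "attribute": "company_name"},
    {"frame": 2, "text": "serving the {} area", "attribute": "service_area"},
    {"frame": 3, "text": "providing {} services", "attribute": "service_name"},
]

def build_target_document(template, text_data):
    """For each to-be-filled segment, pick the first candidate whose
    attribute matches the segment's text content attribute and fill it in."""
    segments = []
    for slot in template:
        fill = next(item["value"] for item in text_data
                    if item["attribute"] == slot["attribute"])
        segments.append({"frame": slot["frame"],
                         "text": slot["text"].format(fill)})
    return segments
```

Each returned segment keeps its frame number, so matched pictures can later be placed into the same frame as their caption.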
Optionally, filtering the target picture data from the picture data associated with the target service includes:
for each text segment, respectively matching each picture in the picture data with the text segment, and determining a picture successfully matched with the text segment;
and screening the picture successfully matched with the text fragment from the picture data associated with the target service to obtain target picture data.
It should be noted that the target document is text data, and a text segment may be understood as a segment obtained by dividing the target document. For example, each sentence of the target document may be taken as one text segment, but this is not limiting. In the process of matching a picture with a text segment, if the picture contains text identical to the text segment, the picture is considered to match the text segment successfully; otherwise, the match is considered to have failed. For example, suppose text segment A includes the words "curtain cleaning". If the text "curtain cleaning" appears in picture B, then text segment A matches picture B successfully.
It can be understood that the successfully matched pictures of each text segment can be obtained through screening, and these pictures then serve as the target picture data. Here, the correspondence between each text segment and its successfully matched pictures may be recorded at the same time, so that once a certain text segment is determined, the pictures corresponding to it can be found through the correspondence. Preferably, after the successful matches for a text segment are determined, only a fixed number of pictures may be retained and the excess deleted. For example, if the fixed number is N and the number M of pictures successfully matched by a certain text segment is greater than N, only N pictures are retained. Here N is a preconfigured value; for example, N may be 1, but is not limited thereto.
In the embodiment of the invention, the target document is split into text segments, and the text segments are used to screen out successfully matched target picture data, so that target picture data guided by the text content of the target document can be obtained.
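The matching rule above (a picture matches a text segment when the picture contains text identical to the segment) can be sketched as follows. The picture records, the pre-extracted `text` field, and the `keep` limit (the fixed number N) are assumptions for illustration.

```python
# Sketch: match pictures to a text segment by exact text containment, and
# retain at most `keep` matches (the fixed number N), deleting the excess.
def match_pictures(segment: str, pictures: list[dict], keep: int = 1) -> list[dict]:
    """Return at most `keep` pictures whose extracted text contains the segment."""
    matched = [p for p in pictures if segment in p["text"]]
    return matched[:keep]

pictures = [
    {"id": 1, "text": "curtain cleaning special offer"},
    {"id": 2, "text": "carpet cleaning"},
    {"id": 3, "text": "professional curtain cleaning"},
]
hits = match_pictures("curtain cleaning", pictures, keep=1)
```

With `keep=1` only the first successful match is retained, mirroring the case N = 1 described above.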
Optionally, each video frame to which the text segment belongs has a size attribute;
respectively matching each picture in the picture data with the text segment to determine the picture successfully matched with the text segment, wherein the method comprises the following steps:
Respectively matching the text content in each picture of the picture data with the text content of the text fragment, and determining an intermediate picture with successfully matched text content;
respectively matching the size of each intermediate picture with the size attribute of the video frame to which the text fragment belongs, and determining a target picture with successfully matched size;
and determining the target picture as a picture successfully matched with the text fragment.
It should be noted that the number of pictures successfully matched by each text segment is typically more than one, i.e., each text content may match a plurality of intermediate pictures. However, these intermediate pictures may have been collected from different data sources over a plurality of different channels, so their sizes may differ. When a subset of them is selected to generate the target video, the size attribute of the video frame to which the text segment belongs can be preconfigured based on the design requirements of the target video, and pictures consistent with that size attribute are then selected. For example, a design requirement of the target video may be that pictures in the video frame to which text segment A belongs have a first size; in that case, the size attribute of that video frame is configured as the first size. Suppose that, after the intermediate pictures successfully matched with text segment A are determined, there are intermediate pictures of a first size, a second size, and a third size; the intermediate picture of the first size is then selected as the picture finally matched with text segment A. It is noted that the size attribute may be any size-related attribute, for example the resolution or aspect ratio of the picture, but is not limited thereto.
In the embodiment of the invention, the size attribute is configured for the video frame to which the text fragment belongs, so that the size of each picture in the target video can be freely controlled, and further more requirements of users can be met.
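The second matching stage above, filtering intermediate pictures by the frame's size attribute, can be sketched as follows. Representing the size attribute as a `(width, height)` pair is one assumption; as noted above, resolution or aspect ratio would work equally well.

```python
# Sketch: keep only intermediate pictures whose size equals the size attribute
# configured on the video frame to which the text segment belongs.
def match_by_size(intermediates: list[dict], frame_size: tuple[int, int]) -> list[dict]:
    """Return intermediate pictures whose size matches the frame's size attribute."""
    return [p for p in intermediates if p["size"] == frame_size]

intermediates = [
    {"id": 1, "size": (1920, 1080)},  # first size
    {"id": 2, "size": (1280, 720)},   # second size
    {"id": 3, "size": (1920, 1080)},  # first size again
]
targets = match_by_size(intermediates, (1920, 1080))
```

Only the pictures of the configured first size survive; pictures of other sizes are discarded for this frame.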
Optionally, before screening the target picture data from the picture data associated with the target service, the method further includes:
in the event that it is determined that the data amount of the picture data associated with the target service is less than the data amount threshold, a call is initiated to a target user associated with the target service based on the call service.
It should be noted that the target picture data used to generate the target video is derived from the picture data associated with the target service, so this picture data directly determines the video quality of the target video, and its data amount generally needs to be sufficiently large. A data amount threshold is therefore set. If the data amount of the picture data associated with the target service is greater than or equal to the threshold, the data amount may be considered large enough not to reduce the video quality of the target video; if it is smaller than the threshold, the data amount may be considered insufficient, which may reduce the video quality of the target video.
The target user is notified, in the form of an automatic call, to supplement the picture data associated with the target service, so as to ensure the quality of the target video. The call services here include voice call and short message services in mobile communication services, as well as voice call and short message services on network platforms, and the like. Of course, other communication means, such as mail, may also be used to notify the target user. Where the target service includes all the services provided by merchant A on an e-commerce platform, the target user is merchant A.
Fig. 2 is a flowchart of a practical application of the outbound service provided in an embodiment of the present invention. When the platform makes a target video for a merchant, a corresponding task is added to a Redis queue whenever material is insufficient, video production is completed, or an outbound call needs to be retried. Thus, each task requiring the outbound service is recorded in the Redis queue. "Insufficient material" refers to the case where the data amount of the picture data associated with the target service is smaller than the data amount threshold; the outbound service in this embodiment is not limited to that case. The practical application flow of the outbound service comprises the following steps:
Step 201: the outbound task is executed once every target duration, where the target duration may be a predetermined short duration, for example 20 ms, but is not limited thereto.
Step 202: each time an outbound task is executed, 20 tasks are fetched from the Redis queue. The embodiment of the invention takes 20 tasks as an example, but is not limited to 20 tasks.
Step 203: outbound calls are respectively carried out for the 20 tasks which are taken out. Here, the outbound service teg may be invoked for outbound, but is not limited thereto.
Step 204: judge whether the outbound call succeeded; if so, end; if not, execute step 205.
Step 205: retry the outbound call, i.e., add a corresponding outbound task to the Redis queue and wait for the next outbound round.
In the embodiment of the invention, the outbound service automatically notifies the user to supplement material when video material is insufficient, avoiding low video quality caused by a lack of material.
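The outbound flow of Fig. 2 (steps 201 to 205) can be sketched as follows. An in-memory `deque` stands in for the Redis queue, and the `place_call` stub stands in for the third-party outbound service; both are assumptions for illustration only.

```python
from collections import deque

BATCH_SIZE = 20  # step 202: number of tasks fetched per execution

def run_outbound_round(queue: deque, place_call) -> int:
    """Execute one outbound round; re-queue failed calls (step 205).

    Returns the number of successful calls (step 204).
    """
    batch = [queue.popleft() for _ in range(min(BATCH_SIZE, len(queue)))]
    succeeded = 0
    for task in batch:
        if place_call(task):       # step 203: attempt the outbound call
            succeeded += 1         # step 204: success, task is done
        else:
            queue.append(task)     # step 205: failure, retry next round
    return succeeded

# One round with a stub that fails only for merchant-b.
queue = deque(["merchant-a", "merchant-b", "merchant-c"])
ok = run_outbound_round(queue, place_call=lambda task: task != "merchant-b")
```

A real deployment would run this on a timer (step 201) against a Redis list (e.g. push/pop operations) rather than a local `deque`.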
It can be appreciated that the target video is generated with each text segment in the target document corresponding to its own target picture data. Using the correspondence between a text segment and its target picture data, the text segment and the corresponding target picture data are filled into the same position of the video template, so that when a certain text segment is played, the target picture data corresponding to that text segment is displayed. Preferably, during or after the generation of the target video, data such as subtitles and audio may further be added to the target video, and the data format of the target video may be converted to the MP4 (MPEG-4) format.
Fig. 3 to fig. 5 are flowcharts of a practical application of the video generation method provided by the present invention. The embodiment is illustrated by taking as an example an Internet platform that provides a video composition service for merchants. Fig. 3 shows the process of generating a document, fig. 4 shows the process of image-text matching using the generated document, and fig. 5 shows the process of generating a video based on the image-text matching result.
Specifically, the process of generating the document includes:
step 301: initiate document generation.
Step 302: obtain the merchant's company name.
Step 303: retrieve high-quality posts related to the merchant.
Step 304: obtain feature information of the services provided by the merchant.
Step 305: obtain document templates.
Step 306: intelligently select a document template.
Step 307: fill in the information.
Step 308: generate the document.
It should be noted that a plurality of different document templates may be preconfigured for selection. When selecting a document template, the template most frequently used by the merchant may be preferred based on the merchant's usage record; if no document template has been used, one may be selected randomly. It should also be noted that, when configuring a document template, each attribute of the template is configured, namely the company name attribute, service area attribute, service name attribute, service feature attribute, fixed document, and the like. Operators can configure rich templates in the template background for different primary and secondary categories, so that the candidate document templates are richer and the generated documents are richer and smoother. In the process of configuring a document template, welcome words can be added and the word count can be limited to a certain range.
Preferably, the above-mentioned flow can be implemented based on an intelligent semantic analysis technology. When selecting a document template based on this technology, de-duplication may be performed first. Specifically, based on the intelligent semantic analysis technology, a passage introducing the company is automatically formed from the attributes in the selected document template, and the merchant's main business information is extracted from the merchant's post information, making the document content more relevant to the merchant's actual situation. When attributes configured in the document template cannot be extracted from the merchant's posts, a fallback policy may be applied.
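The template-selection rule described above (prefer the merchant's most frequently used template, otherwise pick randomly) can be sketched as follows. The function name and the string-keyed usage record are assumptions for illustration.

```python
import random
from collections import Counter

def select_template(templates: list[str], usage_record: list[str]) -> str:
    """Prefer the most frequently used template; fall back to a random pick."""
    used = [t for t in usage_record if t in templates]
    if used:
        # Most frequently used template in the merchant's usage record wins.
        return Counter(used).most_common(1)[0][0]
    # No usage history: select one of the candidate templates at random.
    return random.choice(templates)

chosen = select_template(["tpl-a", "tpl-b"], ["tpl-a", "tpl-b", "tpl-a"])
```

Here "tpl-a" is chosen because it appears most often in the usage record; an empty record would trigger the random fallback.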
The process of performing image-text matching by using the generated text comprises the following steps:
step 401: and initiating graph-text matching.
Step 402: and acquiring a picture set. The set of pictures contains a large number of pictures associated with the merchant.
Step 403: de-duplicate the pictures. The pictures in the picture set are de-duplicated using their hash values or picture links.
Step 404: obtain pictures of reasonable size based on the video template. Different video templates are set here, each with different requirements for pictures. For example, for a first video template, a one-to-one correspondence between document fragments (text segments after document segmentation) and pictures may be implemented based on a head-image matching policy. For a second video template, poster-style pictures dominated by large text may be deleted first, and pictures of reasonable size then selected, implementing a many-to-one relation between document fragments and pictures.
Step 405: and carrying out definition processing on the picture with reasonable size.
Step 406: and matching the pictures subjected to the definition processing based on the text.
It should be noted that the head image of the merchant's store service album may be obtained; if the merchant has specially uploaded a head image, the merchant's upload is preferred. In the image-text matching process, the head image is preferentially matched with the merchant's company-name document. Corresponding pictures are selected by an algorithm according to the attributes in the document: for example, if the document describes the merchant's business information (e.g., curtain cleaning, carpet cleaning, property cleaning), the algorithm matches the relevant pictures. If a sentence in the document contains the company name, a picture containing the company name is analyzed as the picture successfully matched with that sentence. If a sentence in the document contains a service feature, a picture containing that feature is analyzed as the picture successfully matched with that sentence.
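The de-duplication in step 403 (by hash value or picture link) can be sketched as follows. Hashing raw bytes with MD5 is an assumption; any stable content hash would serve, and the picture record layout is illustrative.

```python
import hashlib

def dedupe(pictures: list[dict]) -> list[dict]:
    """Drop pictures repeating an already-seen content hash OR picture link."""
    seen_hashes, seen_links, unique = set(), set(), []
    for p in pictures:
        h = hashlib.md5(p["bytes"]).hexdigest()
        if h in seen_hashes or p["link"] in seen_links:
            continue  # duplicate by content or by link: skip it
        seen_hashes.add(h)
        seen_links.add(p["link"])
        unique.append(p)
    return unique

pictures = [
    {"bytes": b"img-1", "link": "http://a/1.jpg"},
    {"bytes": b"img-1", "link": "http://mirror/1.jpg"},  # same content, new link
    {"bytes": b"img-2", "link": "http://a/1.jpg"},       # new content, same link
    {"bytes": b"img-3", "link": "http://a/3.jpg"},
]
unique = dedupe(pictures)
```

Both the re-hosted copy and the link collision are removed, leaving two unique pictures.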
A process for generating video based on a graph-text matching result, comprising:
step 501: video composition is initiated.
Step 502: video material is placed in a video composition queue.
Step 503: video template type selection.
Step 504: the video is composed using a service provided by a third party.
Step 505: subtitles and audio are added to the synthesized video.
Step 506: video format conversion.
It should be noted that different types of video templates may be made in advance for selection, and a video composition service provided by a third party is used here to compose the video. The composed video is likewise uploaded to the external network for users to access as multimedia, and access links to the composed video are provided.
It can be appreciated that the embodiment of the invention further comprises an artificial intelligence outbound service flow. Specifically, merchants with insufficient material are queried at a fixed time (9:00 a.m.), and the information of these merchants is placed into an outbound queue to wait for an outbound call; after a merchant's video production is completed, the merchant's information is likewise automatically placed into the outbound queue. A timed task polls the outbound queue to remind merchants to supplement material. The outbound voice may be recorded in advance with manual dubbing, and information is collected according to the user's key-press feedback. A timed task also polls, during business hours, the merchants whose video production is complete and calls them to confirm the video production.
In the embodiment of the invention, the artificial intelligence outbound function reduces the communication cost and workload of customer service, allows merchants to confirm the video service more quickly, and shortens the video delivery cycle. Meanwhile, the most tedious and time-consuming part is handed to a machine for automatic execution, and the requirements are met imperceptibly through an algorithm model.
Having described the video generation method provided by the embodiment of the present invention, the video generation device provided by the embodiment of the present invention will be described with reference to the accompanying drawings.
As shown in fig. 6, an embodiment of the present invention provides a video generating apparatus, including:
the obtaining module 61 is configured to obtain a preset document template and target text data, where the target text data includes: text data associated with the target service;
a document module 62 for generating a target document based on the target text data and a preset document template;
a screening module 63, configured to screen and obtain target picture data from picture data associated with a target service, where the target picture data includes picture data matched with a target document;
the video synthesis module 64 is configured to generate a target video based on the target document and the target picture data.
Optionally, the apparatus further comprises:
the first picture data module is used for acquiring a picture set associated with the target service;
and the second picture data module is used for carrying out de-duplication processing on the pictures in the picture set to obtain picture data associated with the target service.
Optionally, the video composition module 64 includes:
The definition unit is used for performing definition processing on the target picture data to obtain processed target picture data;
and the video synthesis unit is used for generating a target video based on the target document and the processed target picture data.
Optionally, the preset document template includes: to-be-filled document fragments located in each video frame of a target video template; each to-be-filled document fragment has a text content attribute;
a document module 62 comprising:
the first document unit is used for screening and obtaining filling text which accords with the text content attribute of the document fragment to be filled from the target text data aiming at each document fragment to be filled;
the second document unit is used for respectively filling the filling text into the corresponding document segments to be filled to generate a target document; wherein each text segment to be filled with filled text is a text segment of the target text;
the video synthesis module 64 is specifically configured to fill the target picture data into a target video template according to a video frame to which the text segment matched with the target picture data belongs, so as to generate a target video.
Optionally, the screening module 63 includes:
the matching unit is used for respectively matching each picture in the picture data with the text fragment aiming at each text fragment and determining a picture successfully matched with the text fragment;
And the screening unit is used for screening the picture successfully matched with the text fragment from the picture data associated with the target service to obtain target picture data.
Optionally, each video frame to which the text segment belongs has a size attribute;
the matching unit is specifically used for:
respectively matching the text content in each picture of the picture data with the text content of the text fragment, and determining an intermediate picture with successfully matched text content;
respectively matching the size of each intermediate picture with the size attribute of the video frame to which the text fragment belongs, and determining a target picture with successfully matched size;
and determining the target picture as a picture successfully matched with the text fragment.
Optionally, the apparatus further comprises:
and the outbound module is used for initiating a call to a target user associated with the target service based on the call service under the condition that the data volume of the picture data associated with the target service is determined to be smaller than a data volume threshold value.
In the embodiment of the invention, the target document associated with the target service is generated by utilizing the preset document template and the text data associated with the target service. And then the target picture data matched with the target document is screened out from the picture data associated with the target service. Finally, the target video associated with the target service can be generated by utilizing the target picture data and the target file, so that the automatic synthesis of the video is realized, the requirement on personnel is reduced, and the efficiency of video synthesis is improved; meanwhile, the video synthesis mode is guided by the text, so that the condition that video contents are disordered can be avoided.
The video generating device provided in the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 1 to 5, and achieve the same technical effects, so that repetition is avoided, and no further description is given here.
On the other hand, the embodiment of the invention also provides an electronic device, which comprises a memory, a processor, a bus and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps in the video generation method when executing the program.
For example, fig. 7 shows a schematic physical structure of an electronic device.
As shown in fig. 7, the electronic device may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may call logic instructions in memory 730 to perform the following method:
obtaining a preset document template and target text data, wherein the target text data comprises: text data associated with the target service;
generating a target document based on the target text data and the preset document template;
screening target picture data from picture data associated with the target service, wherein the target picture data comprises picture data matched with the target document;
and generating a target video based on the target document and the target picture data.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In still another aspect, an embodiment of the present invention further provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor is implemented to perform the method for generating video provided in the foregoing embodiments, for example, including:
obtaining a preset document template and target text data, wherein the target text data comprises: text data associated with the target service;
generating a target document based on the target text data and the preset document template;
screening target picture data from picture data associated with the target service, wherein the target picture data comprises picture data matched with the target document;
and generating a target video based on the target document and the target picture data.
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of generating video, the method comprising:
obtaining a preset document template and target text data, wherein the target text data comprises: text data associated with the target service;
generating a target document based on the target text data and the preset document template;
screening and obtaining target picture data from picture data associated with the target service, wherein the target picture data comprises picture data matched with the target document;
and generating a target video based on the target document and the target picture data.
2. The method of claim 1, wherein prior to said screening of the target picture data from the picture data associated with the target service, the method further comprises:
acquiring a picture set associated with the target service;
and performing de-duplication processing on the pictures in the picture set to obtain picture data associated with the target service.
3. The method of claim 1, wherein the generating a target video based on the target document and the target picture data comprises:
Performing definition processing on the target picture data to obtain processed target picture data;
and generating a target video based on the target document and the processed target picture data.
4. The method of claim 1, wherein the preset document template comprises: to-be-filled document fragments located in each video frame of a target video template; each to-be-filled document fragment has a text content attribute;
the generating a target document based on the target text data and the preset document template includes:
screening out filling text conforming to text content attributes of the text segments to be filled from the target text data aiming at each text segment to be filled;
filling the filling text into the corresponding text segments to be filled respectively to generate the target text; wherein each text segment to be filled with the filled text is a text segment of the target text;
the generating a target video based on the target document and the target picture data includes:
and filling the target picture data into the target video template according to the video frame to which the text fragment matched with the target picture data belongs, and generating the target video.
5. The method of claim 4, wherein the filtering the target picture data from the picture data associated with the target service comprises:
for each text segment, respectively matching each picture in the picture data with the text segment, and determining a picture successfully matched with the text segment;
and screening the picture successfully matched with the text fragment from the picture data associated with the target service to obtain the target picture data.
6. The method of claim 5, wherein each video frame to which the text segment belongs has a size attribute;
the step of respectively matching each picture in the picture data with the text segment to determine the picture successfully matched with the text segment comprises the following steps:
respectively matching the text content in each picture of the picture data with the text content of the text fragment, and determining an intermediate picture with successfully matched text content;
respectively matching the size of each intermediate picture with the size attribute of the video frame to which the text fragment belongs, and determining a target picture with successfully matched size;
And determining the target picture as a picture successfully matched with the text fragment.
7. The method of claim 1, wherein prior to said screening of the target picture data from the picture data associated with the target service, the method further comprises:
in the event that it is determined that the amount of data of the picture data associated with the target service is less than the amount of data threshold, a call is initiated to a target user associated with the target service based on the call service.
8. A video generation apparatus, the apparatus comprising:
the device comprises an acquisition module, wherein the acquisition module is used for acquiring a preset document template and target text data, and the target text data comprises: text data associated with the target service;
the document module is used for generating a target document based on the target text data and the preset document template;
the screening module is used for screening and obtaining target picture data from picture data associated with the target service, wherein the target picture data comprises picture data matched with the target document;
and the video synthesis module is used for generating a target video based on the target document and the target picture data.
9. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, characterized in that the computer program when executed by the processor implements the steps of the method of generating a video according to any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the video generation method according to any one of claims 1 to 7.
CN202310423244.XA 2023-04-19 2023-04-19 Video generation method and device and electronic equipment Pending CN116527994A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310423244.XA CN116527994A (en) 2023-04-19 2023-04-19 Video generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116527994A true CN116527994A (en) 2023-08-01

Family

ID=87398663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310423244.XA Pending CN116527994A (en) 2023-04-19 2023-04-19 Video generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116527994A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117082293A (en) * 2023-10-16 2023-11-17 成都华栖云科技有限公司 Automatic video generation method and device based on text creative
CN117082293B (en) * 2023-10-16 2023-12-19 成都华栖云科技有限公司 Automatic video generation method and device based on text creative

Similar Documents

Publication Publication Date Title
CN104244024B (en) Video cover generation method and device and terminal
US9020824B1 (en) Using natural language processing to generate dynamic content
CN106096064A (en) For the method and apparatus generating the page
CN109040779B (en) Caption content generation method, device, computer equipment and storage medium
US10268690B2 (en) Identifying correlated content associated with an individual
US10223357B2 (en) Video data filtering
CN112004137A (en) Intelligent video creation method and device
CN116527994A (en) Video generation method and device and electronic equipment
CN112330532A (en) Image analysis processing method and equipment
JP6730757B2 (en) Server and program, video distribution system
JP2019220098A (en) Moving image editing server and program
CN113905254B (en) Video synthesis method, device, system and readable storage medium
EP3944242A1 (en) A system and method to customizing video
CN111918146B (en) Video synthesis method and system
US11645803B2 (en) Animation effect reproduction
CN104991950A (en) Picture generating method, display method and corresponding devices
CN110544281B (en) Picture batch compression method, medium, mobile terminal and device
JP6812586B1 (en) Video editing equipment, video editing methods, and programs
US11227094B2 (en) System, method, recording medium for dynamically changing search result delivery format
CN112218146A (en) Video content distribution method and device, server and medium
CN115328363B (en) File processing method, electronic equipment and related products
CN113992866B (en) Video production method and device
US11164595B2 (en) Displayed analytics for multiparty communications
KR101138738B1 (en) Method of producing personalized video
CN113343146A (en) H5-based material editing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination