CN112632326A - Video production method and device based on video script semantic recognition

Video production method and device based on video script semantic recognition

Info

Publication number
CN112632326A
CN112632326A (application CN202011543648.5A)
Authority
CN
China
Prior art keywords
video
library
script
segments
video production
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011543648.5A
Other languages
Chinese (zh)
Other versions
CN112632326B (en)
Inventor
王鹤
林洪祥
何俊华
王雅萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fengping Technology Co ltd
Original Assignee
Beijing Fengping Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Fengping Technology Co ltd filed Critical Beijing Fengping Technology Co ltd
Priority to CN202011543648.5A priority Critical patent/CN112632326B/en
Publication of CN112632326A publication Critical patent/CN112632326A/en
Application granted granted Critical
Publication of CN112632326B publication Critical patent/CN112632326B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G06F16/74 Browsing; Visualisation therefor

Abstract

The method comprises: obtaining a video production script given by a user, wherein the script carries a plurality of video production statements, each statement comprises a material type identification tag and a material attribute tag, and the material attribute tag comprises a material description and/or a material display form; determining, according to the material type identification tag, a material library from which material is selected; determining the material and/or its operation mode in that library according to the material attribute tag; and generating a video from the materials in the order of the video production statements in the script. With the method or the apparatus, massive video material can be analyzed intelligently, and short videos can be generated automatically by semantically understanding the video script.

Description

Video production method and device based on video script semantic recognition
Technical Field
The application belongs to the technical field of video processing, and particularly relates to a video production method and device based on video script semantic recognition.
Background
The traditional manual short-video production process mainly comprises the following steps: 1. content planning and writing of a video script; 2. shooting material, or editing existing content into usable material; 3. editing the material into a complete video with a video editing tool, following the script. Each step of this process consumes considerable human resources and time.
A template-based automatic video production method also exists: a group of fixed video materials is edited in advance for a specific scene, and designated text supplied by the user is then substituted into the subtitles, on-screen text and voice-over. Videos produced this way are essentially fixed in form; apart from the text portions, the picture content barely changes. This is equivalent to a video script locked into a fixed pattern, with material selectable only within a narrow range. Although the method saves some labor and time, its application scenarios are limited, and it cannot serve customized video scripts. For example, a template built around traffic accidents can only produce traffic-accident videos; to produce videos with other content, either the system developer must add new templates, increasing development work, or the video editor must replace the template material manually, increasing editing work.
Disclosure of Invention
In order to solve at least one of the above technical problems, the present application provides a video production method and apparatus based on video script semantic recognition.
In a first aspect of the present application, a video production method based on video script semantic recognition includes: obtaining a video production script given by a user, wherein the script carries a plurality of video production statements, each statement comprises a material type identification tag and a material attribute tag, and the material attribute tag comprises a material description and/or a material display form; determining, according to the material type identification tag, a material library from which material is selected; determining the material and/or its display form in that library according to the material attribute tag; and intelligently integrating and modifying the materials according to the video production script to generate a video.
Preferably, determining the material according to the material attribute tag includes: parsing the material selection rule corresponding to the material attribute tag; and retrieving and extracting the material from the corresponding material library according to the material selection rule, wherein the material library is constructed by dividing and tagging materials according to the division logic that corresponds to the material selection rule.
Preferably, constructing the material library includes: dividing a video containing material into segments; obtaining the video description of each divided material segment from the material processor; and storing the video-described material segments into different material libraries according to a predetermined dividing logic, where the predetermined dividing logic includes, but is not limited to, the duration, the resolution or the video description of the material segments.
Preferably, storing the video-described material segments into the material library includes: calculating the perceptual hash value of the material segment to be warehoused; determining whether the material library whose video description matches that of the segment already contains a material segment with a similar perceptual hash value; if such a segment exists and its MD5 value differs from that of the segment to be warehoused, storing the segment into the material library corresponding to the video description; otherwise, submitting the segment for manual processing, where a human operator decides whether it is stored into that material library. Two perceptual hash values are considered similar when the Hamming distance between them is smaller than a preset value.
Preferably, determining the operation mode of the material according to the material attribute tag includes: checking whether the material attribute tag matches a prefabricated material processing command, and if so, processing the material according to the prefabricated material processing command.
In a second aspect of the present application, a video production apparatus based on video script semantic recognition includes: a video production script acquisition module, configured to obtain a video production script given by a user, wherein the script carries a plurality of video production statements, each statement comprises a material type identification tag and a material attribute tag, and the material attribute tag comprises a material description and/or a material display form; and a video production module, configured to determine, according to the material type identification tag, a material library from which material is selected, to determine the material and/or its display form in that library according to the material attribute tag, and to generate a video from the materials in the order of the video production statements in the script.
Preferably, the video production module includes: the selection rule analysis unit is used for analyzing the material selection rule corresponding to the material attribute label; and the material retrieval unit is used for retrieving and extracting the materials in the corresponding material library according to the material selection rule, wherein the material library is constructed in a mode of dividing and marking the materials according to the division logic corresponding to the material selection rule.
Preferably, the construction of the material library on which the material retrieval unit depends includes: the material segment dividing module is used for carrying out segment division on the video containing the material; the video description acquisition module is used for acquiring the video description of the material processor on each divided material segment; and the warehousing module is used for storing the material segments subjected to the video description into different material libraries according to preset division logic, wherein the preset division logic comprises but is not limited to the duration of the material segments, the resolution of the material segments or the video description of the material segments.
Preferably, the warehousing module includes: a hash value calculating unit, configured to calculate the perceptual hash value of the material segment to be warehoused; and a warehousing judgment unit, configured to determine whether the material library whose video description matches that of the segment to be warehoused already contains a material segment with a similar perceptual hash value; if such a segment exists and its MD5 value differs from that of the segment to be warehoused, the segment is stored into the material library corresponding to the video description; otherwise, the segment is submitted for manual processing, and a human operator decides whether it is stored into that material library. Two perceptual hash values are considered similar when the Hamming distance between them is smaller than a preset value.
Preferably, the video production module further comprises: and the command identification unit is used for verifying whether the material attribute tag is consistent with a prefabricated material processing command or not, and if so, processing the material according to the prefabricated material processing command.
In a third aspect of the present application, a computer system includes a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the video production method based on semantic recognition of video scripts.
In a fourth aspect of the present application, a readable storage medium stores a computer program, and the computer program is used for implementing the video production method based on video script semantic recognition when being executed by a processor.
By the method or the apparatus, massive video material can be analyzed intelligently, and short videos can be generated automatically by semantically understanding the video script.
Drawings
FIG. 1 is a flowchart of an embodiment of a video production method based on video script semantic recognition according to the present application.
Fig. 2 is a flow chart of the construction of the material library according to the embodiment shown in fig. 1 of the present application.
Fig. 3 is a flow chart of the construction of the audio material library according to the embodiment of fig. 1 of the present application.
Fig. 4 is a schematic structural diagram of a computer device suitable for implementing the terminal or the server according to the embodiment of the present application.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the accompanying drawings in the embodiments of the present application. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The described embodiments are some, but not all embodiments of the present application. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application, and should not be construed as limiting the present application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application. Embodiments of the present application will be described in detail below with reference to the drawings.
According to a first aspect of the present application, there is provided a video production method based on video script semantic recognition, including:
Step S100: obtain a video production script given by a user, wherein the script carries a plurality of video production statements, each statement comprises a material type identification tag and a material attribute tag, and the material attribute tag comprises a material description and/or a material display form.
Step S200: determine, according to the material type identification tag, a material library from which material is selected; determine the material and/or its display form in that library according to the material attribute tag; and intelligently integrate and modify the related materials according to the video production script to generate a video. In this embodiment, intelligent integration mainly means stacking the materials produced by the individual video production statements of the script, planning the display time of each material and the transition effects between materials, for example fusing video material and audio material according to their display forms.
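The patent does not disclose implementation code, but the integration step can be pictured with a short Python sketch built on the open-source moviepy package; the clip paths, music file, transition handling and output settings below are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of the "intelligent integration" step, assuming moviepy;
# all file names here are hypothetical.
from moviepy.editor import (AudioFileClip, VideoFileClip,
                            concatenate_videoclips)

# Materials already selected for each video production statement,
# listed in script order.
clip_paths = ["scene1.mp4", "dialog1.mp4", "scene2.mp4"]
clips = [VideoFileClip(p) for p in clip_paths]

# Stack the materials in statement order; "compose" pads clips of
# differing sizes onto a common canvas.
video = concatenate_videoclips(clips, method="compose")

# Fuse the background music chosen by the "Music:" line with the video
# (looping/cutting policy simplified to a plain trim).
music = AudioFileClip("leisure_quiet.mp3")
music = music.subclip(0, min(music.duration, video.duration))
video = video.set_audio(music)

video.write_videofile("output.mp4", fps=25)
```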
Fig. 1 shows a specific case flow of video production by the video production method based on semantic recognition of video scripts according to the present application. Referring to fig. 1, the video script obtained in step S100 of the present application is as follows:
Begin: [video resolution]
Scene: scene description [special effect 1] [special effect 2]
Music: music description
Character A - [action]: [left | up | down | right | none] [enter | exit | none] [subtitle | screen] spoken content
Scene: scene description 2
Voice-over: [subtitle | screen] [mute] spoken content
Character B - [action]: [left | up | middle | right | none] [enter | exit | none] [subtitle | screen] spoken content
Scene: xxxx
Voice-over: xxxxx
End:
As described above, in step S100 the material type identification tag is essentially the part of a statement before the ":", and the material attribute tag is the part after it. The material type identification tag determines the material library used for material selection: for example, "Scene" indicates selection from the video clip library, and "Music" indicates selection from the music material library. The material attribute tag contains, on the one hand, the rule for selecting material and, on the other hand, the rule for operating on the material. For example, after "Scene:" the retrieval module performs material retrieval according to the description, while words in square brackets such as [command] are video processing commands; "|" separates alternative commands, and each command applies some special treatment to the video material and is pre-registered in the script parser. The script structure is described in detail below.
1. Start/end identification line. This line begins with "Begin:" or "End:" and is placed at the head and tail of the script, respectively. Commands describing the video, such as [resolution], can be appended to it.
2. Music description line. This line begins with "Music:" and is used to select music matching the description from the music material library.
3. Scene description line. This line begins with "Scene:"; the program automatically takes materials matching the scene description from the video material library to construct the video. Prefabricated video processing commands can be appended under the scene description line, such as [blur video], [blur portrait], [black-and-white mode], [vertical shake] or [red filter].
4. Character dialog description line. This line begins with "character name - action:"; the program takes the corresponding animation content from the custom character library according to the character and the action and adds it to the scene picture. Appending [left | up | down | right | none] [enter | exit | none] makes the character animation enter or exit from the specified side of the video, or hides the animated image. [subtitle | screen | mute] indicates whether the dialog text is voiced, displayed as a subtitle, or presented on the screen together with the character.
5. Voice-over description line. This line begins with "Voice-over:" and carries non-character dialog content; [subtitle | screen | mute] indicates whether the text is voiced, displayed as a subtitle, or presented on the screen.
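To make this grammar concrete, the following is a minimal sketch of how one such script line could be split into its material type identification tag, prefabricated commands and free-text description. The regular expressions and field names are our own illustrative choices; the patent does not publish parser code.

```python
import re

# A line looks like: "type tag: [cmd1] [cmd2] free-text description".
# The part before ":" identifies the material library; bracketed words
# are prefabricated processing commands; the rest is the description.
LINE_RE = re.compile(r"^(?P<type>[^:]+):\s*(?P<rest>.*)$")
CMD_RE = re.compile(r"\[([^\]]+)\]")

def parse_statement(line: str) -> dict:
    m = LINE_RE.match(line.strip())
    if not m:
        raise ValueError(f"not a video production statement: {line!r}")
    rest = m.group("rest")
    commands = CMD_RE.findall(rest)             # e.g. ["dim light"]
    description = CMD_RE.sub("", rest).strip()  # e.g. "night family living room"
    return {"type": m.group("type").strip(),
            "commands": commands,
            "description": description}

print(parse_statement("Scene: night family living room [dim light]"))
# {'type': 'Scene', 'commands': ['dim light'],
#  'description': 'night family living room'}
```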
In some optional embodiments, determining the material according to the material attribute tag comprises:
Step S201: parsing the material selection rule corresponding to the material attribute tag.
Step S202: retrieving and extracting the material from the corresponding material library according to the material selection rule, where the material library is constructed by dividing and tagging materials according to the division logic that corresponds to the material selection rule.
In some optional embodiments, determining the operation mode of the material according to the material attribute tag includes:
Step S203: checking whether the material attribute tag matches a prefabricated material processing command, and if so, processing the material according to the prefabricated material processing command.
Steps S201 to S203 give the parsing logic of the script. As shown in fig. 1, it mainly comprises the following. According to the scene description, the corresponding video is queried in the material library with the BM25 relevance algorithm, and video material of matching duration is picked from the query list, the duration being that of the speech generated from the voiced character dialog. According to the music description, audio material matching the keyword description is retrieved from the audio material library and looped until the next music switch. When a character keyword is met, the action keyword is recognized, the corresponding image is selected from the character material library, the picture is cut in from one side of the screen as required, and the dialog text is converted to speech with the character's voice library and inserted into the video synchronously. A dialog can be presented as voice only, as text only, as both, or as neither; text content can be displayed as subtitles or on the picture.
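As an illustration of the BM25 retrieval step, this sketch uses the open-source rank_bm25 package over the material description tags; the corpus entries and the whitespace tokenization are assumptions made for the example.

```python
from rank_bm25 import BM25Okapi

# Material description tags of the clips in one material library
# (hypothetical entries; real tags come from the warehousing step).
materials = [
    {"path": "clip_001.mp4", "tags": "night family living room sofa"},
    {"path": "clip_002.mp4", "tags": "sunrise at sea morning beach"},
    {"path": "clip_003.mp4", "tags": "city street traffic night"},
]
corpus = [m["tags"].split() for m in materials]
bm25 = BM25Okapi(corpus)

def query_material(scene_description: str, top_n: int = 3):
    """Rank library clips by BM25 relevance to the scene description."""
    scores = bm25.get_scores(scene_description.split())
    ranked = sorted(zip(scores, materials), key=lambda x: -x[0])
    return [m for s, m in ranked[:top_n] if s > 0]

print(query_material("night family living room"))  # clip_001 ranks first
```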
A single run of the same script produces a batch of videos, from which the author can pick favorites.
The above-described script is exemplified as follows.
1. Begin: [16:9]
2. Music: leisure and quiet
3. Scene: night family living room [dim light]
4. Daughter - speaking: [left] [enter] [screen] Dad, tomorrow is the weekend, where shall we go to play? [left] [exit]
5. Scene: sunrise at sea
6. Father - speaking: [right] [enter] [screen] Tomorrow let's go to the seaside: watch the sea and the sunrise in the morning, then swim in the sea, and rest on the beach when we get tired
7. Scene: night family living room [dim light]
8. Daughter - cheering: [left] [screen] That's great!
9. End:
The script parsing result is as follows:
1. Identify the start identification line and set the video resolution to 16:9.
2. Identify the music description line and, with the bm25 algorithm, screen music whose material description tags match "leisure and quiet" from the music material library as background music.
3. Identify the scene description line. The material duration is the speech duration generated from the dialog text between this scene and the next, plus the pause time, with a minimum of 1 second. With the bm25 algorithm, screen materials whose description tags match "night family living room" and whose resolution is 16:9 from the video material library, and randomly select a video material whose duration meets the requirement.
Identify the [dim light] processing command and add a dim-light filter to the video.
4. Identify the character description line "daughter - speaking", select the "daughter - speaking" animation material from the character material library, insert it into the scene video material from the left side, display the dialog content as speech bubbles to the right of the daughter, convert the dialog text to voice with TTS technology, and synchronize it into the video material.
5. Identify a new scene, select it with the logic of step 3, and switch scenes.
6. Identify the "dad" image and display it with the same logic as step 4.
7. Identify the same scene, reuse the old scene, and cut the video after recalculating the video duration.
8. Identify the daughter's dialog and display it in the same way as step 4.
9. Finish the script and generate a number of complete videos.
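Steps 3 and 4 above hinge on knowing the speech duration of the dialog text before a clip of matching length can be selected. Below is a minimal sketch of that sub-step, assuming the gTTS and mutagen packages; the patent only names "TTS technology", not a specific engine, so the library choice and pause length are illustrative.

```python
from gtts import gTTS
from mutagen.mp3 import MP3

def speech_duration(dialog_text: str, lang: str = "zh-CN",
                    out_path: str = "dialog.mp3") -> float:
    """Convert dialog text to speech and return its length in seconds."""
    gTTS(text=dialog_text, lang=lang).save(out_path)
    return MP3(out_path).info.length

# Scene duration = speech duration + pause time, minimum 1 second.
PAUSE = 0.5  # hypothetical pause length
needed = max(1.0, speech_duration("爸爸，明天周末我们去哪里玩") + PAUSE)
print(f"select a clip of at least {needed:.1f}s from the query list")
```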
In some alternative embodiments, constructing the material library comprises:
Step S2021: dividing the video containing the material into segments.
Step S2022: obtaining the video description of each divided material segment from the material processor.
Step S2023: storing the video-described material segments into different material libraries according to a predetermined dividing logic, where the predetermined dividing logic includes, but is not limited to, the duration of the material segments, the resolution of the material segments, or the video description of the material segments.
Fig. 2 shows a specific example of building the material library. Referring to fig. 2, the material library must be built to match the material retrieval described above; accordingly, its construction in this embodiment mainly includes the following steps.
1. Mass video raw material acquisition
Existing video resources can be used, or massive video resources can be crawled from the internet with a crawler; the related content is put into the video raw material library. The video description includes, but is not limited to, description tags, the video title and the video profile. In addition, video content is warehoused in a unified MP4 format.
2. Video clip processing
Extract the key-frame image list of the raw video material and compute a 64-bit hash value for each frame with a perceptual hash algorithm. If the Hamming distance between two hash values is smaller than a threshold (dynamically adjustable according to the judged effect; initial value 8), the two frames are considered similar, otherwise dissimilar. The span from one key frame to the first dissimilar key frame is called a material to be processed. A video segment is at most 20 s long; a longer span is cut at 20 s and the next segment starts from the cut. The segments enter the to-be-processed video clip library.
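A condensed sketch of this segmentation rule follows, assuming the opencv-python, Pillow and imagehash packages. Hashing every decoded frame rather than a separately extracted key-frame list, and the file handling, are simplifications for illustration.

```python
import cv2
import imagehash
from PIL import Image

SIM_THRESHOLD = 8     # initial Hamming-distance threshold from the text
MAX_SEG_SECONDS = 20  # segments are cut at 20 s

def segment_boundaries(video_path: str):
    """Yield (start_s, end_s) spans of visually similar frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    start, prev_hash, idx = 0.0, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = idx / fps
        # 64-bit perceptual hash of the frame (8x8 pHash).
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        h = imagehash.phash(Image.fromarray(rgb))
        # "-" on imagehash objects gives the Hamming distance.
        dissimilar = prev_hash is not None and (h - prev_hash) >= SIM_THRESHOLD
        if dissimilar or (t - start) >= MAX_SEG_SECONDS:
            yield (start, t)  # cut at a dissimilar frame or at 20 s
            start = t
        prev_hash = h
        idx += 1
    cap.release()
    if idx:
        yield (start, idx / fps)
```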
This embodiment further comprises extracting the audio in the video clip and converting the sound into text with existing speech recognition technology. If the audio can be converted into text, it is further analyzed whether the tone changes; if the tone does not change, the speech content is regarded as voice-over, which is a description of the video content, and the converted text replaces the default description as the video clip description.
Keywords are extracted from the video clip description and grouped together with the clip duration and resolution, for example: (marine sunrise, 2 s, 4:3), (marine sunrise, 3 s, 16:9), (morning sunrise, 2 s, 9:16). A video clip may belong to several groups.
In some alternative embodiments, storing the video-described material segments into a material library includes:
calculating a perceptual hash value of the material segment to be put in storage;
determining whether the material library whose video description matches that of the segment to be warehoused already contains a material segment with a similar perceptual hash value; if such a segment exists and its MD5 value differs from that of the segment to be warehoused, storing the segment to be warehoused into the material library corresponding to the video description; otherwise, submitting the segment for manual processing, where a human operator decides whether it is stored into that material library;
the similar perceptual hash values mean that the Hamming distance between the two perceptual hash values is smaller than a preset value.
This embodiment mainly aims at fast warehousing management of videos and comprises two aspects.
1. Video similarity comparison
The method extracts several frames per second (at least 4) from the video, and each frame generates a 64-bit hash value using a perceptual hash algorithm. If the Hamming distance between two hash values is below a threshold (dynamically adjustable according to the judged effect; initial value 8), the two pictures are considered similar. If the Hamming distances between the per-second frame hashes of two video segments all stay below the threshold, the two segments are considered similar over that time span.
2. Material confirmation and backtracking
After hash values are generated for a video segment, they are first compared against the blacklist library; if they match, the segment is deleted. If not, they are compared with the hashes of same-tag materials in the whitelist library: if the hashes match but the file MD5 values differ, the segment is of the same type as a whitelisted video and enters the material library. If they do not match, the segment is of a type unknown to the whitelist; if many such segments accumulate, they may be added to the whitelist. A concrete procedure: judge whether the number of video segments with consistent hashes exceeds 10; if so, submit the similar segments to an auditor. If the auditor confirms that a segment matches its tag description, it is added to the material library and its hash is added to the whitelist library to speed up recognition of subsequent materials with the same tag. Hashes manually confirmed as wrong enter the blacklist library, so the corresponding segments are excluded directly when materials with the same tag are identified. In addition, a hash can be manually placed in a universal blacklist library, so that segments similar to, for example, a black screen are excluded from all tag content groups.
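The warehousing decision described above can be condensed into a sketch like the following; the data structures for the blacklist and whitelist stores and the helper names are assumptions made for illustration.

```python
import hashlib

def md5_of(path: str) -> str:
    """MD5 of the file contents, used to tell identical files apart."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def hamming(h1: int, h2: int) -> int:
    """Hamming distance between two 64-bit perceptual hashes."""
    return bin(h1 ^ h2).count("1")

SIM_THRESHOLD = 8

def warehouse(seg_path: str, seg_hash: int, tag: str,
              blacklist: set, whitelist: dict,
              library: list, manual_queue: list) -> str:
    """Decide whether a segment enters the material library."""
    # 1. A blacklisted hash means a known-bad segment: discard it.
    if any(hamming(seg_hash, b) < SIM_THRESHOLD for b in blacklist):
        return "deleted"
    # 2. Similar to a whitelisted same-tag segment but a different file
    #    (different MD5): same material type, enter the library.
    for known_hash, known_md5 in whitelist.get(tag, []):
        if hamming(seg_hash, known_hash) < SIM_THRESHOLD:
            if md5_of(seg_path) != known_md5:
                library.append((seg_path, tag))
                return "stored"
    # 3. Otherwise, hand the segment over to a human auditor.
    manual_queue.append((seg_path, tag))
    return "manual review"
```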
When material enters the material library, its metadata (material duration, video resolution, description keywords and the like) is collected, its subtitles and sound are erased, and the cleaned material is stored into the material library.
Fig. 3 shows the process of constructing the background music material library. As shown in fig. 3, a crawler crawls a large number of audio files with their corresponding descriptions, or the files are imported from existing resources, and they enter the audio raw material library. The open-source machine learning program Spleeter is used to separate voice from music in each audio file: if they can be separated, the music is saved as an independent file and stored as material; if not, the file in the raw material library is used directly as material. The description keywords and duration information are taken into the audio material library.
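Since the text names the open-source Spleeter program, the voice/music separation step could look like this sketch; the two-stem model choice and the file paths are assumptions.

```python
from spleeter.separator import Separator

# "2stems" splits an audio file into vocals and accompaniment;
# the accompaniment is kept as the background-music material.
separator = Separator("spleeter:2stems")
separator.separate_to_file("crawled_audio.mp3", "separated/")
# Produces separated/crawled_audio/vocals.wav and .../accompaniment.wav;
# accompaniment.wav is stored as an independent music material.
```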
In an alternative embodiment, a custom character image material library can be constructed. In this embodiment a designer designs general short animations or static images of characters consistent with the brand image; they are stored in the character image material library and distinguished by character-action dimensions. For each character, a TTS voice library matching the character is selected for speech synthesis.
By the method above, massive video material can be analyzed intelligently, and short videos can be generated automatically by semantically understanding the video script.
Corresponding to the above method, the second aspect of the present application provides a video production apparatus based on video script semantic recognition, which mainly includes: a video production script acquisition module, configured to obtain a video production script given by a user, wherein the script carries a plurality of video production statements, each statement comprises a material type identification tag and a material attribute tag, and the material attribute tag comprises a material description and/or a material display form; and a video production module, configured to determine, according to the material type identification tag, a material library from which material is selected, to determine the material and/or its display form in that library according to the material attribute tag, and to intelligently integrate and modify the related materials according to the video production script to generate a video.
In some optional embodiments, the video production module comprises: the selection rule analysis unit is used for analyzing the material selection rule corresponding to the material attribute label; and the material retrieval unit is used for retrieving and extracting the materials in the corresponding material library according to the material selection rule, wherein the material library is constructed in a mode of dividing and marking the materials according to the division logic corresponding to the material selection rule.
In some alternative embodiments, the construction of the material library on which the material retrieval unit depends includes: the material segment dividing module is used for carrying out segment division on the video containing the material; the video description acquisition module is used for acquiring the video description of the material processor on each divided material segment; and the warehousing module is used for storing the material segments subjected to the video description into different material libraries according to preset division logic, wherein the preset division logic comprises but is not limited to the duration of the material segments, the resolution of the material segments or the video description of the material segments.
In some optional embodiments, the warehousing module comprises: a hash value calculating unit, configured to calculate the perceptual hash value of the material segment to be warehoused; and a warehousing judgment unit, configured to determine whether the material library whose video description matches that of the segment to be warehoused already contains a material segment with a similar perceptual hash value; if such a segment exists and its MD5 value differs from that of the segment to be warehoused, the segment is stored into the material library corresponding to the video description; otherwise, the segment is submitted for manual processing, and a human operator decides whether it is stored into that material library. Two perceptual hash values are considered similar when the Hamming distance between them is smaller than a preset value.
In some optional embodiments, the video production module further comprises: and the command identification unit is used for verifying whether the material attribute tag is consistent with a prefabricated material processing command or not, and if so, processing the material according to the prefabricated material processing command.
According to a third aspect of the present application, a computer system comprises a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program for implementing the video production method based on video script semantic recognition as above.
According to a fourth aspect of the present application, a readable storage medium stores a computer program, and the computer program is used for implementing the video production method based on video script semantic recognition when being executed by a processor.
Referring now to FIG. 4, shown is a schematic diagram of a computer device 800 suitable for use in implementing embodiments of the present application. The computer device shown in fig. 4 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer apparatus 800 includes a central processing unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the apparatus 800. The CPU 801, ROM 802 and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse and the like; an output section 807 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 801. It should be noted that the computer storage media of the present application can be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules or units described in the embodiments of the present application may be implemented by software or hardware. The modules or units described may also be provided in a processor, the names of which in some cases do not constitute a limitation of the module or unit itself.
The computer-readable storage medium provided by the fourth aspect of the present application may be included in the apparatus described in the above embodiment; or may be present separately and not assembled into the device. The computer readable storage medium carries one or more programs which, when executed by the apparatus, process data in the manner described above.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A video production method based on video script semantic recognition is characterized by comprising the following steps:
obtaining a video production script given by a user, wherein the video production script carries a plurality of video production statements, each statement comprises a material type identification tag and a material attribute tag, and the material attribute tag comprises a material description and/or a material display form;
and determining, according to the material type identification tag, a material library from which material is selected, determining the material and/or its display form in the material library according to the material attribute tag, and intelligently integrating and modifying the materials according to the video production script to generate a video.
2. The video production method based on video script semantic recognition as recited in claim 1, wherein determining material based on the material attribute tags comprises:
analyzing a material selection rule corresponding to the material attribute label;
and searching and extracting the materials in the corresponding material library according to the material selection rule, wherein the material library is constructed in a mode of dividing and marking the materials according to the division logic corresponding to the material selection rule.
3. The video production method based on video script semantic recognition as recited in claim 2, wherein constructing the material library comprises:
segmenting a video containing materials;
acquiring video description of each divided material segment by a material processor;
and storing the material segments subjected to the video description into different material libraries according to a preset dividing logic, wherein the preset dividing logic comprises but is not limited to the duration of the material segments, the resolution of the material segments or the video description of the material segments.
4. The video production method based on video script semantic recognition as recited in claim 3, wherein storing the video-described material segments into the material library comprises:
calculating a perceptual hash value of the material segment to be put in storage;
determining whether the material library whose video description matches that of the segment to be warehoused already contains a material segment with a similar perceptual hash value; if such a segment exists and its MD5 value differs from that of the segment to be warehoused, storing the segment to be warehoused into the material library corresponding to the video description; otherwise, submitting the segment for manual processing, where a human operator decides whether it is stored into that material library;
the similar perceptual hash values mean that the Hamming distance between the two perceptual hash values is smaller than a preset value.
5. The video production method based on video script semantic recognition as recited in claim 1, wherein determining the material operation mode according to the material attribute tags comprises:
checking whether the material attribute tag is consistent with a prefabricated material processing command, and if so, processing the material according to the prefabricated material processing command.
6. A video production device based on video script semantic recognition is characterized by comprising:
the video production script acquisition module is used for acquiring a video production script given by a user, wherein the video production script carries a plurality of video production sentences, each video production sentence comprises a material type identification tag and a material attribute tag, and the material attribute tag comprises a material description and/or a material display form;
and the video production module is used for determining a material library for selecting the materials according to the material type identification tag, determining the materials and/or determining the display form of the materials in the material library according to the material attribute tag, and intelligently integrating and modifying the related materials according to the video production script to generate a video.
7. The video production device based on video script semantic recognition of claim 6, wherein the video production module comprises:
the selection rule analysis unit is used for analyzing the material selection rule corresponding to the material attribute label;
and the material retrieval unit is used for retrieving and extracting the materials in the corresponding material library according to the material selection rule, wherein the material library is constructed in a mode of dividing and marking the materials according to the division logic corresponding to the material selection rule.
8. The video production device based on video script semantic recognition as recited in claim 7, wherein the construction of the material library on which the material retrieval unit depends comprises:
the material segment dividing module is used for carrying out segment division on the video containing the material;
the video description acquisition module is used for acquiring the video description of the material processor on each divided material segment;
and the warehousing module is used for storing the material segments subjected to the video description into different material libraries according to preset division logic, wherein the preset division logic comprises but is not limited to the duration of the material segments, the resolution of the material segments or the video description of the material segments.
9. The video production device based on video script semantic recognition as recited in claim 8, wherein the warehousing module comprises:
the hash value calculating unit is used for calculating the perceptual hash value of the material segment to be warehoused;
a warehousing judgment unit, configured to determine whether the material library whose video description matches that of the segment to be warehoused already contains a material segment with a similar perceptual hash value; if such a segment exists and its MD5 value differs from that of the segment to be warehoused, the segment is stored into the material library corresponding to the video description; otherwise, the segment is submitted for manual processing, and a human operator decides whether it is stored into that material library;
the similar perceptual hash values mean that the Hamming distance between the two perceptual hash values is smaller than a preset value.
10. The video production device based on video script semantic recognition of claim 6, wherein the video production module further comprises:
and the command identification unit is used for verifying whether the material attribute tag is consistent with a prefabricated material processing command or not, and if so, processing the material according to the prefabricated material processing command.
CN202011543648.5A 2020-12-24 2020-12-24 Video production method and device based on video script semantic recognition Active CN112632326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011543648.5A CN112632326B (en) 2020-12-24 2020-12-24 Video production method and device based on video script semantic recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011543648.5A CN112632326B (en) 2020-12-24 2020-12-24 Video production method and device based on video script semantic recognition

Publications (2)

Publication Number Publication Date
CN112632326A (en) 2021-04-09
CN112632326B (en) 2022-02-18

Family

ID=75322143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011543648.5A Active CN112632326B (en) 2020-12-24 2020-12-24 Video production method and device based on video script semantic recognition

Country Status (1)

Country Link
CN (1) CN112632326B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641859A (en) * 2021-10-18 2021-11-12 阿里巴巴达摩院(杭州)科技有限公司 Script generation method, system, computer storage medium and computer program product
CN113923475A (en) * 2021-09-30 2022-01-11 宿迁硅基智能科技有限公司 Video synthesis method and video synthesizer
CN115052201A (en) * 2022-05-17 2022-09-13 阿里巴巴(中国)有限公司 Video editing method and electronic equipment
WO2022253349A1 (en) * 2021-06-04 2022-12-08 北京字跳网络技术有限公司 Video editing method and apparatus, and device and storage medium
CN117395475A (en) * 2023-11-07 2024-01-12 广州渠道无忧网络技术服务有限公司 High-availability low-repetition store video production method and system
CN117395475B (en) * 2023-11-07 2024-04-19 广州渠道无忧网络技术服务有限公司 High-availability low-repetition store video production method and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014009873A2 (en) * 2012-07-09 2014-01-16 Nds Limited A method and system for automatically generating interstitial material related to video content
CN103839562A (en) * 2014-03-17 2014-06-04 杨雅 Video creation system
US20150078733A1 (en) * 2008-05-28 2015-03-19 Mirriad Limited Apparatus and method for identifying insertion zones in video material and for inserting additional material into the insertion zones
CN106294666A (en) * 2016-08-04 2017-01-04 上海汽笛生网络科技有限公司 A kind of method realizing text visualization Dynamic Display
CN107770626A (en) * 2017-11-06 2018-03-06 腾讯科技(深圳)有限公司 Processing method, image synthesizing method, device and the storage medium of video material
WO2018214772A1 (en) * 2017-05-22 2018-11-29 腾讯科技(深圳)有限公司 Media data processing method and apparatus, and storage medium
CN109408666A (en) * 2018-10-12 2019-03-01 北京亿幕信息技术有限公司 A kind of cloud cuts material database system and implementation method
CN109492127A (en) * 2018-11-12 2019-03-19 网易传媒科技(北京)有限公司 Data processing method, device, medium and calculating equipment
CN111182367A (en) * 2019-12-30 2020-05-19 苏宁云计算有限公司 Video generation method and device and computer system
CN111935537A (en) * 2020-06-30 2020-11-13 百度在线网络技术(北京)有限公司 Music video generation method and device, electronic equipment and storage medium
CN112004163A (en) * 2020-08-31 2020-11-27 北京市商汤科技开发有限公司 Video generation method and device, electronic equipment and storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150078733A1 (en) * 2008-05-28 2015-03-19 Mirriad Limited Apparatus and method for identifying insertion zones in video material and for inserting additional material into the insertion zones
WO2014009873A2 (en) * 2012-07-09 2014-01-16 Nds Limited A method and system for automatically generating interstitial material related to video content
CN103839562A (en) * 2014-03-17 2014-06-04 杨雅 Video creation system
CN106294666A (en) * 2016-08-04 2017-01-04 上海汽笛生网络科技有限公司 A kind of method realizing text visualization Dynamic Display
WO2018214772A1 (en) * 2017-05-22 2018-11-29 腾讯科技(深圳)有限公司 Media data processing method and apparatus, and storage medium
CN107770626A (en) * 2017-11-06 2018-03-06 腾讯科技(深圳)有限公司 Processing method, image synthesizing method, device and the storage medium of video material
CN109408666A (en) * 2018-10-12 2019-03-01 北京亿幕信息技术有限公司 A kind of cloud cuts material database system and implementation method
CN109492127A (en) * 2018-11-12 2019-03-19 网易传媒科技(北京)有限公司 Data processing method, device, medium and calculating equipment
CN111182367A (en) * 2019-12-30 2020-05-19 苏宁云计算有限公司 Video generation method and device and computer system
CN111935537A (en) * 2020-06-30 2020-11-13 百度在线网络技术(北京)有限公司 Music video generation method and device, electronic equipment and storage medium
CN112004163A (en) * 2020-08-31 2020-11-27 北京市商汤科技开发有限公司 Video generation method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
UCHIDA S: "Text Localization and Recognition in Image and Video", 《SPRINGER LONDON》 *
李春雪 (LI CHUNXUE): "浅谈多媒体素材库的建设" [A brief discussion on building a multimedia material library], 《黑龙江科技信息》 (Heilongjiang Science and Technology Information) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022253349A1 (en) * 2021-06-04 2022-12-08 北京字跳网络技术有限公司 Video editing method and apparatus, and device and storage medium
CN113923475A (en) * 2021-09-30 2022-01-11 宿迁硅基智能科技有限公司 Video synthesis method and video synthesizer
CN113641859A (en) * 2021-10-18 2021-11-12 阿里巴巴达摩院(杭州)科技有限公司 Script generation method, system, computer storage medium and computer program product
CN113641859B (en) * 2021-10-18 2022-04-19 阿里巴巴达摩院(杭州)科技有限公司 Script generation method, system, computer storage medium and computer program product
CN115052201A (en) * 2022-05-17 2022-09-13 阿里巴巴(中国)有限公司 Video editing method and electronic equipment
CN117395475A (en) * 2023-11-07 2024-01-12 广州渠道无忧网络技术服务有限公司 High-availability low-repetition store video production method and system
CN117395475B (en) * 2023-11-07 2024-04-19 广州渠道无忧网络技术服务有限公司 High-availability low-repetition store video production method and system

Also Published As

Publication number Publication date
CN112632326B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN112632326B (en) Video production method and device based on video script semantic recognition
US6172675B1 (en) Indirect manipulation of data using temporally related data, with particular application to manipulation of audio or audiovisual data
US9191639B2 (en) Method and apparatus for generating video descriptions
JP4600828B2 (en) Document association apparatus and document association method
US7680853B2 (en) Clickable snippets in audio/video search results
US20080177536A1 (en) A/v content editing
US20030191645A1 (en) Statistical pronunciation model for text to speech
KR20070121810A (en) Synthesis of composite news stories
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
CN114297439B (en) Short video tag determining method, system, device and storage medium
CN112929746B (en) Video generation method and device, storage medium and electronic equipment
CN111797272A (en) Video content segmentation and search
US10595098B2 (en) Derivative media content systems and methods
WO2023029984A1 (en) Video generation method and apparatus, terminal, server, and storage medium
CN114254158B (en) Video generation method and device, and neural network training method and device
JPH08227426A (en) Data retrieval device
CN114996506A (en) Corpus generation method and device, electronic equipment and computer-readable storage medium
US10499121B2 (en) Derivative media content systems and methods
KR102643902B1 (en) Apparatus for managing minutes and method thereof
CN115580758A (en) Video content generation method and device, electronic equipment and storage medium
CN113923479A (en) Audio and video editing method and device
JP2006338550A (en) Device and method for creating meta data
CN114697762B (en) Processing method, processing device, terminal equipment and medium
CN107544978A (en) A kind of content based video retrieval system method
CN116010654A (en) Method, device, equipment and storage medium for adding audio annotation to video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant