CN116708929A - Interactive video script generation method and device, electronic equipment and storage medium


Info

Publication number
CN116708929A
Authority: CN (China)
Prior art keywords: image, determining, image elements, video, information
Legal status (assumption, not a legal conclusion): Pending
Application number: CN202310755562.6A
Language: Chinese (zh)
Inventors: 刘晓丹, 杨子斌
Current Assignee (listed assignees may be inaccurate): Beijing IQIYI Science and Technology Co Ltd
Original Assignee: Beijing IQIYI Science and Technology Co Ltd
Application filed by Beijing IQIYI Science and Technology Co Ltd
Priority: CN202310755562.6A
Publication: CN116708929A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an interactive video script generation method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a scenario structure diagram corresponding to an interactive video, where the interactive video includes a plurality of video clips and the scenario structure diagram describes the playing logic for playing the video clips; performing image recognition processing on the scenario structure diagram to obtain a plurality of first image elements and second image elements that represent the association relations among the first image elements; determining corresponding first element information based on each first image element, and corresponding second element information based on each second image element; and generating, based on the first element information and the second element information, a script file corresponding to the interactive video, where the script file records the playing logic. In this way, the production efficiency of script files is improved.

Description

Interactive video script generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for generating an interactive video script, an electronic device, and a storage medium.
Background
An interactive video is a video form that provides scenario interaction for users: during playback, users can select branching scenarios according to their own preferences. Different branch choices influence how the plot develops and determine its direction, immersing users in the story and increasing their sense of participation. In recent years, interactive video technology has developed rapidly, and various platforms have released many interactive video works. Producing an interactive video work relies on a script file.
However, the script files of current interactive videos are mostly produced with manual participation, which is very inefficient.
Disclosure of Invention
In order to solve the technical problems, the application provides an interactive video script generation method, an interactive video script generation device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present application provides a method for generating an interactive video script, including:
acquiring a scenario structure diagram corresponding to an interactive video, wherein the interactive video comprises a plurality of video clips, and the scenario structure diagram is used for describing playing logic for playing the video clips;
performing image recognition processing on the scenario structure diagram to obtain a plurality of first image elements and second image elements for representing association relations among the plurality of first image elements;
determining corresponding first element information based on each of the first image elements, and determining corresponding second element information based on each of the second image elements;
and generating a script file corresponding to the interactive video based on the first element information and the second element information, wherein the script file is used for recording the playing logic.
In one possible implementation manner, the determining, based on each of the first image elements, corresponding first element information includes:
determining a first element area corresponding to the first image element from the scenario structure diagram;
identifying a first element shape of the first image element based on the first element region, and determining a corresponding element type based on the first element shape;
when the element type is a first type, the first image element is used for representing a corresponding interaction node, and text recognition processing is carried out on a first sub-region in the first element region to obtain a node title of the corresponding interaction node;
and determining the node title as corresponding first element information.
In one possible implementation manner, the determining, based on each of the first image elements, corresponding first element information includes:
when the element type is the second type, the first image element is used for representing the corresponding video segment, text recognition processing is performed on the second subarea in the first element area to obtain a segment identifier of the corresponding video segment, and text recognition processing is performed on the third subarea in the first element area to obtain a segment title of the corresponding video segment;
and determining the segment identifier and the segment title as corresponding first element information.
In one possible implementation manner, the determining the corresponding second element information based on each of the second image elements includes:
for each second image element, determining an associated image element corresponding to the second image element and an association relation with each associated image element in a plurality of first image elements;
and determining corresponding second element information based on a plurality of the associated image elements and the association relation with each associated image element.
In one possible implementation manner, the determining the corresponding second element information based on the plurality of associated image elements and the association relation with each associated image element includes:
determining a target image element in a plurality of associated image elements based on the association relationship between the second image element and each of the associated image elements;
determining an element type of the target image element;
and under the condition that the element type is the first type, the target image element is used for representing the corresponding interaction node, and connection information formed by a plurality of associated image elements and the association relation of each associated image element is determined to be corresponding second element information.
In one possible embodiment, the method further comprises:
the target image element is used for representing a corresponding video segment under the condition that the element type is a second type, and a second element region corresponding to the second image element is determined from the scenario structure diagram;
performing text recognition processing on the second element region to obtain playing duration information of the corresponding video clip;
and determining the playing duration information and the connection information as corresponding second element information.
In one possible implementation manner, the generating, based on the first element information and the second element information, a script file corresponding to the interactive video includes:
determining data format information;
and formatting the first element information and the second element information based on the data format information to obtain script files corresponding to the interactive video.
In a second aspect, an embodiment of the present application provides an interactive video script generating apparatus, including:
an acquisition module, configured to acquire a scenario structure diagram corresponding to an interactive video, where the interactive video includes a plurality of video clips, and the scenario structure diagram is used for describing playing logic for playing the plurality of video clips;
the processing module is used for carrying out image recognition processing on the scenario structure diagram to obtain a plurality of first image elements and a second image element used for representing the association relation among the plurality of first image elements;
a determining module, configured to determine corresponding first element information based on each of the first image elements, and determine corresponding second element information based on each of the second image elements;
the generation module is used for generating a script file corresponding to the interactive video based on the first element information and the second element information, wherein the script file is used for recording the playing logic.
In a possible implementation manner, the determining module is specifically configured to:
determining a first element area corresponding to the first image element from the scenario structure diagram;
identifying a first element shape of the first image element based on the first element region, and determining a corresponding element type based on the first element shape;
when the element type is a first type, the first image element is used for representing a corresponding interaction node, and text recognition processing is carried out on a first sub-region in the first element region to obtain a node title of the corresponding interaction node;
and determining the node title as corresponding first element information.
In one possible embodiment, the determining module is further configured to:
when the element type is the second type, the first image element is used for representing the corresponding video segment, text recognition processing is performed on the second subarea in the first element area to obtain a segment identifier of the corresponding video segment, and text recognition processing is performed on the third subarea in the first element area to obtain a segment title of the corresponding video segment;
and determining the segment identifier and the segment title as corresponding first element information.
In one possible embodiment, the determining module is further configured to:
for each second image element, determining an associated image element corresponding to the second image element and an association relation with each associated image element in a plurality of first image elements;
and determining corresponding second element information based on a plurality of the associated image elements and the association relation with each associated image element.
In one possible embodiment, the determining module is further configured to:
determining a target image element in a plurality of associated image elements based on the association relationship between the second image element and each of the associated image elements;
determining an element type of the target image element;
and under the condition that the element type is the first type, the target image element is used for representing the corresponding interaction node, and connection information formed by a plurality of associated image elements and the association relation of each associated image element is determined to be corresponding second element information.
In one possible embodiment, the determining module is further configured to:
the target image element is used for representing a corresponding video segment under the condition that the element type is a second type, and a second element region corresponding to the second image element is determined from the scenario structure diagram;
performing text recognition processing on the second element region to obtain playing duration information of the corresponding video clip;
and determining the playing duration information and the connection information as corresponding second element information.
In a possible implementation manner, the generating module is specifically configured to:
determining data format information;
and formatting the first element information and the second element information based on the data format information to obtain script files corresponding to the interactive video.
In a third aspect, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of the first aspects when executing a program stored on a memory.
In a fourth aspect, a computer-readable storage medium is provided, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of the first aspects.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the interactive video script generation method of any of the above.
The embodiment of the application has the beneficial effects that:
the embodiment of the application provides an interactive video script generation method, an interactive video script generation device, electronic equipment and a storage medium. Therefore, the script file corresponding to the interactive video can be automatically generated according to the plot structure drawing by the user, manual participation is not needed, and the production efficiency of the script file is improved.
Of course, a product or method implementing the application need not achieve all of the advantages described above at the same time.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of an interactive video script generation method according to an embodiment of the present application;
fig. 2 is an example of a scenario structure diagram provided by an embodiment of the present application;
FIG. 3 is an example of a playback interface when an interactive video is played to an interaction node according to an interaction script, provided by an embodiment of the present application;
FIG. 4 is a flowchart of another method for generating an interactive video script according to an embodiment of the present application;
FIG. 5 is a flowchart of another method for generating an interactive video script according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an interactive video script generating device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following description is made in specific embodiments with reference to the accompanying drawings, where the embodiments do not limit the embodiments of the present application.
Referring to fig. 1, a flowchart of an embodiment of a method for generating an interactive video script is provided in an embodiment of the present application. As shown in fig. 1, the process may include the steps of:
s101, acquiring a scenario structure diagram corresponding to an interactive video, wherein the interactive video comprises a plurality of video clips, and the scenario structure diagram is used for describing playing logic for playing the video clips.
The interactive video is a video which can provide scenario interaction for users, and in the process of playing the interactive video, the users can select branching scenarios according to own preference so as to enter different branching scenarios. The interactive video is composed of a plurality of video clips, wherein some video clips contain interactive nodes for interaction with users, and some video clips do not contain interactive nodes, namely common video clips.
The scenario structure diagram is used for describing playing logic for playing a plurality of video clips.
Fig. 2 shows an example of a scenario structure diagram. In it, the rectangular box labelled "story beginning" represents video clip id_1, the rectangular box labelled "correct" represents video clip id_2, the rectangular box labelled "error" represents video clip id_3, and the oval box labelled "judge correct or error" represents an interaction node in video clip id_1. The arrowed lines represent the playing logic relationships (i.e., the direction in which the storyline develops) between the video clips and the interaction nodes, and the time above a line is the moment at which the corresponding video clip or interaction node appears; for example, the "36 seconds" shown in fig. 2 is the moment at which the "judge correct or error" node appears in video clip id_1.
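The Fig. 2 example can be written down as a small data model. This is a hedged illustration only: the class names `VideoClip` and `InteractionNode` and the field layout are assumptions, not structures defined in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class VideoClip:
    clip_id: str   # e.g. "id_1"
    title: str     # e.g. "story beginning"

@dataclass
class InteractionNode:
    title: str          # e.g. "judge correct or error"
    start_time_s: int   # moment the node appears in its clip, e.g. 36
    # option label -> clip id to jump to (assumed representation)
    options: dict = field(default_factory=dict)

# The three rectangles and one ellipse of Fig. 2:
clips = {
    "id_1": VideoClip("id_1", "story beginning"),
    "id_2": VideoClip("id_2", "correct"),
    "id_3": VideoClip("id_3", "error"),
}
judge_node = InteractionNode("judge correct or error", 36,
                             {"correct": "id_2", "error": "id_3"})
```

The arrowed lines of the diagram show up here only implicitly, as the `options` mapping from the node to its target clips.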
S102, performing image recognition processing on the scenario structure diagram to obtain a plurality of first image elements and second image elements used for representing association relations among the plurality of first image elements.
S103, corresponding first element information is determined based on each first image element, and corresponding second element information is determined based on each second image element.
S104, generating a script file corresponding to the interactive video based on the first element information and the second element information, wherein the script file is used for recording the playing logic.
S102 to S104 are collectively described below:
the first image element refers to an element used for representing a video segment and an interaction node in a scenario structure diagram, for example, in the scenario structure diagram shown in fig. 2, a rectangular frame is used for representing the video segment, and an oval frame is used for representing the interaction node, where the rectangular frame and the oval frame are the first image element.
The second image element represents the association relations among the plurality of first image elements, i.e., the playing logic relationships between the video clips and the interaction nodes. For example, in the scenario structure diagram shown in fig. 2, the arrowed line below "36 seconds" indicates that the interaction node "judge correct or error" is played after video clip id_1 finishes playing, and the arrowed lines leading out of that interaction node indicate that video clip id_2 or video clip id_3 is played next.
The first element information is the information describing a first image element in the scenario structure diagram. For example, in the scenario structure diagram shown in fig. 2, for the first rectangular box on the left, "story beginning" and "id_1" are its corresponding first element information; for the oval box, "judge correct or error" is its corresponding first element information.
The second element information is the information describing a second image element in the scenario structure diagram. For example, in fig. 2, for the connection line between "story beginning" and "judge correct or error", the "36 seconds" above the line is its corresponding second element information. The other connection lines in the figure carry no second element information.
In the application, a training set of a certain scale can be prepared in advance, containing sample scenario structure diagrams and the sample interaction script files corresponding to them, where each sample interaction script file is the correct script for its sample scenario structure diagram. An image recognition model (such as an AI (Artificial Intelligence) model) is then trained, with the sample scenario structure diagrams as training data and the sample interaction script files as their labels, until the model converges, yielding a trained recognition model.
The size of the training set may be determined by the complexity of the interaction scripts the model needs to support in application: the higher the required complexity, the larger the training set. In the application, the complexity of an interaction script can be measured by the number of video clips and interaction nodes it contains: the more video clips and interaction nodes, the higher the complexity.
To ensure the recognition effect of the recognition model, a test set can also be prepared and used to evaluate the training result, ensuring that the model has the following capabilities and outputs an interaction script that meets the requirements (taking the scenario structure diagram shown in fig. 2 as an example):
A. All rectangular boxes and their related numbers (e.g., "id_1", "id_2", "id_3") and titles (e.g., "story beginning", "correct", "error") can be identified to generate the play interval (i.e., video clip) structure in the interactive script. Each rectangular box generates one play interval.
B. All oval boxes and their related titles (e.g., "judge correct or error") can be identified to generate the interaction node structure in the interactive script. Each oval box generates one interaction node.
C. Connection lines whose starting point is a play interval can be identified; the interaction node at the end point of such a line is associated with that play interval, which serves as the play interval corresponding to the interaction node. The time on the line is identified as the start time of the interaction node. This information is generated into the interaction script.
D. Connection lines whose starting point is an interaction node can be identified; the play intervals at the end points of all lines starting from the same interaction node are counted, and each such play interval corresponds to one option of that interaction node. The title of the interaction node serves as the displayed title of the interaction, and the name of each play interval is filled into the interaction script as the name of an interaction option.
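Rules A–D above can be sketched in a few lines of code. This is a hedged illustration under assumed representations: the dict layout of the recognized boxes and lines and the function name `build_script` are invented here, not taken from the patent.

```python
# Hypothetical output of the recognition step for Fig. 2:
boxes = [
    {"shape": "rect",    "id": "id_1", "title": "story beginning"},
    {"shape": "rect",    "id": "id_2", "title": "correct"},
    {"shape": "rect",    "id": "id_3", "title": "error"},
    {"shape": "ellipse", "title": "judge correct or error"},
]
lines = [
    {"src": "id_1", "dst": "judge correct or error", "time_s": 36},  # rule C
    {"src": "judge correct or error", "dst": "id_2"},                # rule D
    {"src": "judge correct or error", "dst": "id_3"},                # rule D
]

def build_script(boxes, lines):
    # Rule A: every rectangle becomes a play interval.
    intervals = {b["id"]: {"title": b["title"]}
                 for b in boxes if b["shape"] == "rect"}
    # Rule B: every ellipse becomes an interaction node.
    nodes = {b["title"]: {"start_s": None, "in_clip": None, "options": []}
             for b in boxes if b["shape"] == "ellipse"}
    for ln in lines:
        if ln["src"] in intervals and ln["dst"] in nodes:
            # Rule C: line from interval to node -> the node belongs to that
            # interval, and the time label is the node's start time.
            nodes[ln["dst"]]["in_clip"] = ln["src"]
            nodes[ln["dst"]]["start_s"] = ln.get("time_s")
        elif ln["src"] in nodes and ln["dst"] in intervals:
            # Rule D: each line from a node to an interval is one option,
            # labelled with the target interval's title.
            nodes[ln["src"]]["options"].append(
                {"label": intervals[ln["dst"]]["title"], "goto": ln["dst"]})
    return {"intervals": intervals, "nodes": nodes}

script = build_script(boxes, lines)
```

With this input, `script["nodes"]["judge correct or error"]` ends up attached to `id_1` at 36 seconds with two options, matching the Fig. 2 description.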
Therefore, when the interactive video is played to an interaction node according to the interaction script, playback can jump to the corresponding play interval according to the option the user clicks. As shown in fig. 3, the video content for the current moment is displayed on the interface according to the interaction script, with the related content "judge correct or error" displayed below it. The user may then select one of the two options "correct" and "error": selecting "correct" jumps to the play interval corresponding to "correct" and continues playing, while selecting "error" jumps to the play interval corresponding to "error" and continues playing.
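The jump behaviour just described reduces to a lookup from the clicked option to a play interval. A hedged sketch, with the helper `next_clip` and the node layout invented for illustration:

```python
def next_clip(interaction_node, choice_label):
    # Return the play interval to jump to for the option the user clicked.
    for opt in interaction_node["options"]:
        if opt["label"] == choice_label:
            return opt["goto"]
    raise ValueError(f"unknown option: {choice_label}")

demo_node = {"title": "judge correct or error",
             "options": [{"label": "correct", "goto": "id_2"},
                         {"label": "error",   "goto": "id_3"}]}
```

For the Fig. 3 scenario, clicking "correct" resolves to `id_2` and clicking "error" to `id_3`.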
Based on this, in the embodiment of the present application, the scenario structure diagram may be input into the recognition model, which performs the image recognition processing on it to obtain a plurality of first image elements and second image elements characterizing the association relations among the first image elements; corresponding first element information is then determined based on each first image element, and corresponding second element information based on each second image element; finally, a script file recording the playing logic of the plurality of video clips in the interactive video is generated based on the first element information and the second element information.
In application, during training the recognition model can learn the data format of the sample interaction script files, obtaining data format information such as the JSON (JavaScript Object Notation) format or XML (Extensible Markup Language).
Based on this, the specific implementation of generating the script file corresponding to the interactive video based on the first element information and the second element information may include: determining data format information, and formatting the first element information and the second element information based on the data format information to obtain a script file corresponding to the interactive video. Thus, a script file conforming to the data format of the sample interaction script file in the training data can be generated.
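The formatting step can be sketched as below, assuming a JSON target format; `format_script` and the payload keys are invented for illustration, and the XML branch mentioned in the text is deliberately left out.

```python
import json

def format_script(first_element_info, second_element_info, data_format="json"):
    # Serialize the merged element information in the learned data format.
    payload = {"elements": first_element_info,
               "connections": second_element_info}
    if data_format == "json":
        return json.dumps(payload, ensure_ascii=False, indent=2)
    raise NotImplementedError(f"format not sketched: {data_format}")

script_text = format_script(
    [{"type": "clip", "id": "id_1", "title": "story beginning"}],
    [{"from": "id_1", "to": "judge correct or error", "time_s": 36}],
)
parsed = json.loads(script_text)
```

The resulting text round-trips through a JSON parser, which is what makes it usable as a machine-readable script file.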
In addition, in another embodiment, all video clips corresponding to the scenario structure diagram can be obtained, and after the script file is generated, a video playing file corresponding to the interactive video is generated based on all video clips and the script file. Therefore, the user can conveniently play the video clips in the video playing file through the script file in the video playing file.
In the embodiment of the application, a scenario structure diagram corresponding to an interactive video is first obtained, where the interactive video includes a plurality of video clips and the scenario structure diagram describes the playing logic for playing the video clips. Image recognition processing is then performed on the scenario structure diagram to obtain a plurality of first image elements and second image elements representing the association relations among the first image elements; corresponding first element information is determined based on each first image element, and corresponding second element information based on each second image element. Finally, a script file corresponding to the interactive video is generated based on the first element information and the second element information, the script file recording the playing logic. Thus, the script file corresponding to an interactive video can be generated automatically from the scenario structure diagram drawn by the user, without manual participation, improving the production efficiency of script files.
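The overall flow S101–S104 can be sketched end to end. The recognition model is stubbed here with fixed Fig. 2 output, since the patent leaves the model internals open; every name in this sketch is an assumption.

```python
import json

def recognize(diagram_image):
    # Stub standing in for the trained recognition model of S102/S103:
    # returns first and second element information for the Fig. 2 diagram.
    first = [
        {"type": "clip", "id": "id_1", "title": "story beginning"},
        {"type": "clip", "id": "id_2", "title": "correct"},
        {"type": "clip", "id": "id_3", "title": "error"},
        {"type": "node", "title": "judge correct or error"},
    ]
    second = [
        {"from": "id_1", "to": "judge correct or error", "time_s": 36},
        {"from": "judge correct or error", "to": "id_2"},
        {"from": "judge correct or error", "to": "id_3"},
    ]
    return first, second

def generate_script_file(diagram_image):
    first, second = recognize(diagram_image)               # S102-S103
    clips = [e for e in first if e["type"] == "clip"]
    nodes = [e for e in first if e["type"] == "node"]
    return json.dumps({"play_intervals": clips,           # S104
                       "interaction_nodes": nodes,
                       "connections": second})

script_json = generate_script_file("scenario_diagram.png")
result = json.loads(script_json)
```

Swapping the stub for a real model output is the only change needed to turn this sketch into the pipeline the embodiment describes.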
Referring to fig. 4, a flowchart of an embodiment of another method for generating an interactive video script is provided in an embodiment of the present application. The flow shown in fig. 4 describes how to determine the corresponding first element information based on each of the first image elements on the basis of the flow shown in fig. 1 described above. As shown in fig. 4, the process may include the steps of:
S401, determining a first element area corresponding to the first image element from the scenario structure diagram.
S402, performing identification processing on a first element shape of the first image element based on the first element area, and determining a corresponding element type based on the first element shape.
S403, under the condition that the element type is the first type, the first image element is used for representing the corresponding interaction node, and text recognition processing is carried out on the first sub-region in the first element region to obtain the node title of the corresponding interaction node.
And S404, determining the node title as corresponding first element information.
S401 to S404 are described together below:
The first element area refers to the block area in the scenario structure diagram that contains a first image element together with the characters or marks around it.
The first sub-region refers to a region in the scenario structure diagram for marking a node title corresponding to the interactive node, such as an inner region of an oval frame in fig. 2.
In the embodiment of the application, the region image corresponding to the first element area can be recognized to obtain the first element shape of the first image element, and the element type corresponding to the first image element is distinguished according to that shape. For example, if the first element shape is a rectangular box, the corresponding element type is a video clip; if the first element shape is an oval box, the corresponding element type is an interaction node.
When the element type is the first type, the first image element is considered to represent a corresponding interaction node. In this case, text recognition processing is performed on the first sub-region in the first element area to obtain the node title of the corresponding interaction node (for example, the node title of the interaction node in fig. 2 is "judge right and wrong"), and the node title is determined as the corresponding first element information.
S405, in the case that the element type is the second type, the first image element is used for representing the corresponding video segment, text recognition processing is performed on the second sub-region in the first element region to obtain a segment identifier of the corresponding video segment, and text recognition processing is performed on the third sub-region in the first element region to obtain a segment title of the corresponding video segment.
S406, determining the fragment identification and the fragment title as corresponding first element information.
S405 to S406 are described together below:
The second sub-region refers to the region in the scenario structure diagram used to mark the segment identifier of the corresponding video segment, such as the region on the upper side of the rectangular box in fig. 2.
The third sub-region refers to the region in the scenario structure diagram used to mark the segment title of the corresponding video segment, such as the region inside the rectangular box in fig. 2.
In the embodiment of the application, when the element type is the second type, the first image element is considered to represent a corresponding video segment. In this case, text recognition processing is performed on the second sub-region in the first element area to obtain the segment identifier of the corresponding video segment, text recognition processing is performed on the third sub-region to obtain the segment title of the corresponding video segment, and the segment identifier and the segment title are determined as the corresponding first element information.
Through the flow shown in fig. 4, the element shape of a first image element can be recognized based on the first element area corresponding to that element, which improves the accuracy of image recognition compared with recognizing directly on the whole scenario structure diagram. Further, the element type of each first image element is distinguished according to its element shape, and the corresponding first element information is extracted automatically according to that element type. The whole process requires no manual participation, which improves processing efficiency.
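As a concrete illustration, the shape-to-type mapping and the per-type extraction described above could be sketched as follows. The shape labels and the sub-region names ("inner", "upper") are assumptions made for this sketch, not names given in this application.

```python
def first_element_info(shape: str, texts: dict) -> dict:
    """Map a recognized element shape to an element type and extract the
    corresponding first element information.

    `shape` is the result of prior shape recognition on the first element
    area; `texts` maps sub-region names to their OCR results. Both the
    shape labels and the sub-region names are illustrative assumptions.
    """
    if shape == "oval":
        # First type: an oval box represents an interaction node whose
        # node title sits inside the box (the first sub-region).
        return {"type": "interaction_node", "node_title": texts["inner"]}
    if shape == "rectangle":
        # Second type: a rectangular box represents a video segment; the
        # segment identifier sits above the box (second sub-region) and
        # the segment title inside it (third sub-region).
        return {
            "type": "video_segment",
            "segment_id": texts["upper"],
            "segment_title": texts["inner"],
        }
    raise ValueError(f"unrecognized element shape: {shape}")
```

Classifying on the cropped element area first, then running OCR only on the relevant sub-regions, matches the accuracy argument above: each recognizer sees a small, well-delimited input instead of the whole diagram.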
Referring to fig. 5, a flowchart of an embodiment of another method for generating an interactive video script is provided in an embodiment of the present application. The flow shown in fig. 5 describes how to determine the corresponding second element information based on each of the second image elements on the basis of the flow shown in fig. 1 described above. As shown in fig. 5, the process may include the steps of:
S501, determining, among the plurality of first image elements, the associated image elements corresponding to each second image element and the association relation with each associated image element;
S502, determining corresponding second element information based on the plurality of associated image elements and the association relation with each associated image element.
S501 to S502 are described together below:
An associated image element refers to a first image element that is associated with the corresponding second image element.
The association relation refers to the connection relation that the corresponding second image element depicts in the diagram, that is, which associated image element the line starts from and which it ends at.
In the example shown in fig. 2, for the first arrowed line on the left side, its associated image elements include the rectangular box "story start" and the oval box "judge right and wrong", and the association relation is: the rectangular box "story start" is the start of the line, and the oval box "judge right and wrong" is the end of the line.
In an embodiment, the determining the specific implementation of the corresponding second element information based on the association image elements and the association relation with each association image element may include the following steps:
Step A1, determining a target image element among the plurality of associated image elements based on the association relation between the second image element and each associated image element;
Step A2, determining the element type of the target image element;
Step A3, when the element type is the first type, the target image element represents a corresponding interaction node, and determining the connection information formed by the plurality of associated image elements and the association relations with each associated image element as the corresponding second element information;
Step A4, when the element type is the second type, the target image element represents a corresponding video segment, and determining a second element area corresponding to the second image element from the scenario structure diagram;
Step A5, performing text recognition processing on the second element area to obtain the playing duration information of the corresponding video segment;
Step A6, determining the playing duration information and the connection information as the corresponding second element information.
The target image element refers to the associated image element that is the logical starting point of the second image element. In the example shown in fig. 2, for the first arrowed line on the left side, the rectangular box "story start" is the start of the line and the oval box "judge right and wrong" is its end; the rectangular box "story start" is therefore the logical starting point of the line, i.e., the target image element.
The second element region refers to a block region containing a second image element and characters or marks around the second image element in the scenario structure diagram.
In application, when the logical starting point of a second image element is a video segment, the second element area of that second image element contains information describing the playing duration of the video segment. In the example shown in fig. 2, for the first arrowed line on the left side, the "36 seconds" above it indicates the playing duration of the video segment "story start". When the logical starting point of a second image element is an interaction node, no playing duration information is marked in the second element area, because an interaction node ends based on the user's interaction.
Based on this, in the embodiment of the application, when the element type is the first type, that is, when the target image element represents a corresponding interaction node, the connection information formed by the plurality of associated image elements and the association relations with each associated image element is directly determined as the corresponding second element information. When the element type is the second type, that is, when the target image element represents a corresponding video segment, a second element area corresponding to the second image element is determined from the scenario structure diagram, text recognition processing is performed on that area to obtain the playing duration information of the corresponding video segment, and the playing duration information and the connection information are together determined as the corresponding second element information.
Through the flow shown in fig. 5, the corresponding second element information is determined automatically according to the associated image elements corresponding to each second image element and the association relation with each associated image element. The whole process requires no manual participation, which improves processing efficiency.
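Steps A1 to A6 could be sketched for one arrowed line as follows. The data shapes (an edge as a (start, end) pair, dictionaries for element types and durations) are assumptions made for this sketch, not structures defined in this application.

```python
def second_element_info(edge, element_types, durations):
    """Derive the second element information for one arrowed line.

    `edge` is a (start_id, end_id) pair obtained from the association
    relation; `element_types` maps element ids to "video_segment" or
    "interaction_node"; `durations` maps edges to the OCR'd play-duration
    text found in the second element area. All names are illustrative
    assumptions.
    """
    start, end = edge
    # Connection information formed by the associated image elements.
    info = {"connection": {"start": start, "end": end}}
    # The logical starting point of the line is the target image element
    # (steps A1-A2). Only lines originating from a video segment carry a
    # play duration; an interaction node ends on user interaction, so its
    # outgoing lines have none (steps A3-A6).
    if element_types[start] == "video_segment":
        info["play_duration"] = durations[edge]
    return info
```

For the fig. 2 example, the left-hand line from "story start" to "judge right and wrong" would yield connection information plus a "36 seconds" duration, while lines leaving the interaction node would yield connection information only.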
Based on the same technical concept, the embodiment of the application also provides an interactive video script generating device, as shown in fig. 6, which comprises:
The acquiring module 601 is configured to acquire a scenario structure diagram corresponding to an interactive video, where the interactive video includes a plurality of video clips, and the scenario structure diagram is used to describe playing logic for playing a plurality of the video clips;
the processing module 602 is configured to perform image recognition processing on the scenario structure chart to obtain a plurality of first image elements, and a second image element for representing an association relationship between the plurality of first image elements;
a determining module 603, configured to determine corresponding first element information based on each of the first image elements, and determine corresponding second element information based on each of the second image elements;
and a generating module 604, configured to generate a script file corresponding to the interactive video based on the first element information and the second element information, where the script file is used to record the playing logic.
In a possible implementation manner, the determining module is specifically configured to:
determining a first element area corresponding to the first image element from the scenario structure diagram;
identifying a first element shape of the first image element based on the first element region, and determining a corresponding element type based on the first element shape;
When the element type is a first type, the first image element is used for representing a corresponding interaction node, and text recognition processing is carried out on a first sub-region in the first element region to obtain a node title of the corresponding interaction node;
and determining the node title as corresponding first element information.
In one possible embodiment, the determining module is further configured to:
when the element type is the second type, the first image element is used for representing the corresponding video segment, text recognition processing is performed on the second subarea in the first element area to obtain a segment identifier of the corresponding video segment, and text recognition processing is performed on the third subarea in the first element area to obtain a segment title of the corresponding video segment;
and determining the fragment identification and the fragment title as corresponding first element information.
In one possible embodiment, the determining module is further configured to:
for each second image element, determining an associated image element corresponding to the second image element and an association relation with each associated image element in a plurality of first image elements;
And determining corresponding second element information based on a plurality of the associated image elements and the association relation with each associated image element.
In one possible embodiment, the determining module is further configured to:
determining a target image element in a plurality of associated image elements based on the association relationship between the second image element and each of the associated image elements;
determining an element type of the target image element;
and under the condition that the element type is the first type, the target image element is used for representing the corresponding interaction node, and connection information formed by a plurality of associated image elements and the association relation of each associated image element is determined to be corresponding second element information.
In one possible embodiment, the determining module is further configured to:
the target image element is used for representing a corresponding video segment under the condition that the element type is a second type, and a second element region corresponding to the second image element is determined from the scenario structure diagram;
performing text recognition processing on the second element region to obtain playing duration information of the corresponding video clip;
and determining the playing time length information and the connection information as corresponding second element information.
In a possible implementation manner, the generating module is specifically configured to:
determining data format information;
and formatting the first element information and the second element information based on the data format information to obtain the script file corresponding to the interactive video.
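The generating module's formatting step could be sketched as below. The JSON layout, the field names, and the function signature are assumptions made for this sketch; the application itself leaves the data format to the determined data format information.

```python
import json

def generate_script_file(first_infos, second_infos, data_format="json"):
    """Format the first and second element information into a script file
    that records the playing logic (illustrative sketch only; the real
    output format depends on the data format information)."""
    script = {"elements": first_infos, "links": second_infos}
    if data_format == "json":
        # ensure_ascii=False keeps non-ASCII titles (e.g. Chinese) readable.
        return json.dumps(script, ensure_ascii=False, indent=2)
    raise NotImplementedError(f"unsupported data format: {data_format}")
```

A player or the video-playing-file packager can then load this script file and dispatch on the element and link records to drive playback.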
In the embodiment of the application, a scenario structure diagram corresponding to an interactive video is first obtained, where the interactive video includes a plurality of video clips and the scenario structure diagram describes the playing logic for playing those video clips. Image recognition processing is then performed on the scenario structure diagram to obtain a plurality of first image elements and second image elements that represent the association relationships among the first image elements. Corresponding first element information is determined based on each first image element, and corresponding second element information is determined based on each second image element. Finally, a script file corresponding to the interactive video is generated based on the first element information and the second element information, where the script file records the playing logic. In this way, the script file corresponding to the interactive video can be generated automatically from the scenario structure diagram drawn by the user, without manual participation, which improves the production efficiency of script files.
Based on the same technical concept, the embodiment of the present application further provides an electronic device, as shown in fig. 7, including a processor 111, a communication interface 112, a memory 113 and a communication bus 114, where the processor 111, the communication interface 112 and the memory 113 communicate with each other through the communication bus 114.
a memory 113 for storing a computer program;
the processor 111 is configured to execute a program stored in the memory 113, and implement the following steps:
acquiring a scenario structure diagram corresponding to an interactive video, wherein the interactive video comprises a plurality of video clips, and the scenario structure diagram is used for describing playing logic for playing the video clips;
performing image recognition processing on the scenario structure diagram to obtain a plurality of first image elements and second image elements for representing association relations among the plurality of first image elements;
determining corresponding first element information based on each of the first image elements, and determining corresponding second element information based on each of the second image elements;
and generating a script file corresponding to the interactive video based on the first element information and the second element information, wherein the script file is used for recording the playing logic.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present application, a computer readable storage medium is provided, in which a computer program is stored, the computer program implementing the steps of any of the above-mentioned interactive video script generation methods when executed by a processor.
In yet another embodiment of the present application, a computer program product containing instructions that, when run on a computer, cause the computer to perform the interactive video script generation method of any of the above embodiments is also provided.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., Solid State Disk (SSD)), or the like.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for generating an interactive video script, the method comprising:
acquiring a scenario structure diagram corresponding to an interactive video, wherein the interactive video comprises a plurality of video clips, and the scenario structure diagram is used for describing playing logic for playing the video clips;
performing image recognition processing on the scenario structure diagram to obtain a plurality of first image elements and second image elements for representing association relations among the plurality of first image elements;
determining corresponding first element information based on each of the first image elements, and determining corresponding second element information based on each of the second image elements;
and generating a script file corresponding to the interactive video based on the first element information and the second element information, wherein the script file is used for recording the playing logic.
2. The method of claim 1, wherein said determining corresponding first element information based on each of said first image elements comprises:
determining a first element area corresponding to the first image element from the scenario structure diagram;
identifying a first element shape of the first image element based on the first element region, and determining a corresponding element type based on the first element shape;
When the element type is a first type, the first image element is used for representing a corresponding interaction node, and text recognition processing is carried out on a first sub-region in the first element region to obtain a node title of the corresponding interaction node;
and determining the node title as corresponding first element information.
3. The method of claim 2, wherein said determining corresponding first element information based on each of said first image elements comprises:
when the element type is the second type, the first image element is used for representing the corresponding video segment, text recognition processing is performed on the second subarea in the first element area to obtain a segment identifier of the corresponding video segment, and text recognition processing is performed on the third subarea in the first element area to obtain a segment title of the corresponding video segment;
and determining the fragment identification and the fragment title as corresponding first element information.
4. The method of claim 1, wherein said determining corresponding second element information based on each of said second image elements comprises:
for each second image element, determining an associated image element corresponding to the second image element and an association relation with each associated image element in a plurality of first image elements;
And determining corresponding second element information based on a plurality of the associated image elements and the association relation with each associated image element.
5. The method of claim 4, wherein the determining the corresponding second element information based on the plurality of associated image elements and the association relationship with each of the associated image elements comprises:
determining a target image element in a plurality of associated image elements based on the association relationship between the second image element and each of the associated image elements;
determining an element type of the target image element;
and under the condition that the element type is the first type, the target image element is used for representing the corresponding interaction node, and connection information formed by a plurality of associated image elements and the association relation of each associated image element is determined to be corresponding second element information.
6. The method of claim 5, wherein the method further comprises:
the target image element is used for representing a corresponding video segment under the condition that the element type is a second type, and a second element region corresponding to the second image element is determined from the scenario structure diagram;
Performing text recognition processing on the second element region to obtain playing duration information of the corresponding video clip;
and determining the playing time length information and the connection information as corresponding second element information.
7. The method of claim 1, wherein generating the script file corresponding to the interactive video based on the first element information and the second element information comprises:
determining data format information;
and formatting the first element information and the second element information based on the data format information to obtain script files corresponding to the interactive video.
8. An interactive video script generating apparatus, the apparatus comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a scenario structure diagram corresponding to an interactive video, the interactive video comprises a plurality of video clips, and the scenario structure diagram is used for describing playing logic for playing a plurality of video clips;
the processing module is used for carrying out image recognition processing on the scenario structure diagram to obtain a plurality of first image elements and a second image element used for representing the association relation among the plurality of first image elements;
A determining module, configured to determine corresponding first element information based on each of the first image elements, and determine corresponding second element information based on each of the second image elements;
the generation module is used for generating a script file corresponding to the interactive video based on the first element information and the second element information, wherein the script file is used for recording the playing logic.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-7 when executing a program stored on a memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-7.
CN202310755562.6A 2023-06-25 2023-06-25 Interactive video script generation method and device, electronic equipment and storage medium Pending CN116708929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310755562.6A CN116708929A (en) 2023-06-25 2023-06-25 Interactive video script generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310755562.6A CN116708929A (en) 2023-06-25 2023-06-25 Interactive video script generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116708929A true CN116708929A (en) 2023-09-05

Family

ID=87833809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310755562.6A Pending CN116708929A (en) 2023-06-25 2023-06-25 Interactive video script generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116708929A (en)

Similar Documents

Publication Publication Date Title
CN109871326B (en) Script recording method and device
CN108810642B (en) Bullet screen display method and device and electronic equipment
CN110244941B (en) Task development method and device, electronic equipment and computer readable storage medium
CN109743589B (en) Article generation method and device
CN110297633B (en) Transcoding method, device, equipment and storage medium
CN111782184B (en) Apparatus, method, apparatus and medium for performing a customized artificial intelligence production line
CN108256870A (en) Description information and update, data processing method and device are generated based on topological structure
WO2021196674A1 (en) System code testing method and apparatus, and computer device and storage medium
CN112882933A (en) Script recording method, device, equipment and storage medium
CN112860581B (en) Execution method, device, equipment and storage medium of test case
CN112199261A (en) Application program performance analysis method and device and electronic equipment
CN110286893B (en) Service generation method, device, equipment, system and storage medium
CN116708929A (en) Interactive video script generation method and device, electronic equipment and storage medium
CN110347379B (en) Processing method, device and storage medium for combined crowdsourcing questions
CN117499287A (en) Web testing method, device, storage medium and proxy server
CN111078529B (en) Client writing module testing method and device and electronic equipment
CN112052157A (en) Test message construction method, device and system
CN117193738A (en) Application building method, device, equipment and storage medium
CN109474822B (en) Android television multi-language automatic testing method and device
CN116828255A (en) Story line description file generation method and device, electronic equipment and storage medium
CN113064590B (en) Processing method and device for interactive components in interactive video
CN113934870B (en) Training method, device and server of multimedia recommendation model
CN116708930A (en) Interactive video script generation method and device, electronic equipment and storage medium
CN110221958A (en) Application testing method, calculates equipment and computer readable storage medium at device
CN112733516B (en) Method, device, equipment and storage medium for processing quick message

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination