CN115408565A - Text processing method, video processing method, device and electronic equipment - Google Patents

Text processing method, video processing method, device and electronic equipment

Info

Publication number
CN115408565A
CN115408565A
Authority
CN
China
Prior art keywords
target
text
sub
video
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211158473.5A
Other languages
Chinese (zh)
Inventor
张骏杰
宋忠良
舒新胜
况聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211158473.5A priority Critical patent/CN115408565A/en
Publication of CN115408565A publication Critical patent/CN115408565A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 — Information retrieval of video data
    • G06F16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 — Retrieval using metadata automatically derived from the content
    • G06F16/7844 — Retrieval using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/74 — Browsing; Visualisation therefor
    • G06F16/748 — Hypervideo
    • G06F16/7837 — Retrieval using objects detected or recognised in the video content
    • G06F16/784 — Retrieval where the detected or recognised objects are people
    • G06F40/00 — Handling natural language data
    • G06F40/20 — Natural language analysis
    • G06F40/279 — Recognition of textual entities
    • G06F40/289 — Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The embodiment of the invention provides a text processing method, a video processing method, an apparatus and an electronic device, and relates to the technical field of video. The method comprises the following steps: dividing a video to be processed into a plurality of segments to be selected according to its video content, and determining a first tag of each segment to be selected; dividing a target text for describing the video to be processed into a plurality of sub-texts according to its text content, and determining a second tag of each sub-text; performing tag matching between the first tag of each segment to be selected and the second tag of each target sub-text to obtain each combination to be selected; and, for each target combination, adding a target playing link of the segment to be selected in the combination to the target sub-text in the combination in the target text. Compared with the prior art, the scheme provided by the embodiment of the invention locates the video content corresponding to text content in a synopsis, thereby improving video positioning efficiency.

Description

Text processing method, video processing method, device and electronic equipment
Technical Field
The present invention relates to the field of video technologies, and in particular, to a text processing method, a video processing method, an apparatus, and an electronic device.
Background
As television series and movies of various titles are released, video clients provide a function for viewing a synopsis of a drama, for example a per-episode synopsis of a television series, so that the user can roughly understand the plot.
However, when a user is interested in the video content corresponding to a piece of text content in the episode synopsis, the user has to manually drag the progress bar of the video corresponding to that synopsis to a likely playing position in order to watch the video content of interest, which makes locating the video content inefficient.
Disclosure of Invention
Embodiments of the present invention provide a text processing method, a video processing method, an apparatus, and an electronic device, so as to locate the video content corresponding to text content in a synopsis and thereby improve video positioning efficiency. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a text processing method, including:
dividing a video to be processed into a plurality of segments to be selected according to the video content of the video to be processed, and determining a first tag of each segment to be selected; wherein the first tag comprises: a person tag and an event tag;
dividing a target text for describing the video to be processed into a plurality of sub-texts according to the text content of the target text, and determining a second tag of each sub-text; wherein the second tag comprises: a person tag and an event tag;
performing tag matching between the first tag of each segment to be selected and the second tag of each target sub-text in the plurality of sub-texts to obtain each combination to be selected; each combination to be selected comprises at least one segment to be selected and one target sub-text, and the person tag and the event tag of the segment to be selected included in each combination to be selected are respectively matched with the person tag and the event tag of the target sub-text included in that combination;
for each target combination in the combinations to be selected, adding, to the target sub-text in the target combination in the target text, a target playing link regarding the segment to be selected in the target combination; the target playing link is used for indicating that the video to be processed is played from the starting playing time, in the video to be processed, of the segment to be selected in the target combination.
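By way of a non-limiting illustration, the four steps above can be sketched as follows; all type and function names (Segment, SubText, add_play_links) and the `?t=` link format are illustrative assumptions, not terminology from the embodiment:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """A segment to be selected from the video to be processed."""
    start_time: int        # starting playing time in the full video (seconds)
    persons: set           # person tag: person keywords of the segment
    events: set            # event tag: event keywords of the segment

@dataclass
class SubText:
    """A sub-text of the target text."""
    text: str
    persons: set           # person tag of the second tag
    events: set            # event tag of the second tag
    play_link: str = ""    # target playing link, filled in after matching

def add_play_links(video_url: str, segments: list, sub_texts: list) -> list:
    """Tag matching: a segment and a sub-text form a combination to be
    selected when both their person tags and their event tags overlap;
    the sub-text then receives a link to the segment's start time."""
    for st in sub_texts:
        for seg in segments:
            if (st.persons & seg.persons) and (st.events & seg.events):
                st.play_link = f"{video_url}?t={seg.start_time}"
                break  # first matching segment is taken as the target here
    return sub_texts
```

A click on the resulting link would then start playback of the complete video at the starting playing time of the matched segment.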
Optionally, in a specific implementation manner, the dividing the target text into a plurality of sub-texts according to text content of the target text for describing the video to be processed, and determining a second tag of each sub-text includes:
dividing a target text for describing the video to be processed based on a preset separator to obtain each sentence in the target text;
performing word segmentation processing on each sentence to obtain at least one keyword in the sentence;
and sequentially performing keyword category identification on each keyword in order of the keywords' positions in the target text from front to back; each time a first keyword related to a character and a second keyword related to an event are identified, dividing the sentences to which the first keyword and the second keyword belong into one sub-text, and determining the first keyword and the second keyword as the character tag and the event tag of that sub-text respectively, so as to obtain the second tag of the sub-text.
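A minimal non-limiting sketch of this splitting step, assuming whitespace word segmentation and externally supplied person/event keyword lexicons (the embodiment specifies neither the segmenter nor the lexicons):

```python
import re

def split_into_subtexts(target_text, person_lexicon, event_lexicon):
    """Split the target text on preset separators, then scan keywords from
    front to back; each time one person keyword and one event keyword have
    both been identified, the sentences they belong to become one sub-text
    whose second tag is (person keyword, event keyword)."""
    sentences = [s for s in re.split(r"[。！？.!?]", target_text) if s.strip()]
    subtexts, pending = [], []
    person_kw = event_kw = None
    for sentence in sentences:
        pending.append(sentence.strip())
        for word in sentence.split():      # crude word segmentation
            if word in person_lexicon:
                person_kw = word           # first keyword (character-related)
            elif word in event_lexicon:
                event_kw = word            # second keyword (event-related)
        if person_kw and event_kw:
            subtexts.append({"text": ". ".join(pending),
                             "person_tag": person_kw,
                             "event_tag": event_kw})
            pending, person_kw, event_kw = [], None, None
    return subtexts
```

In practice a proper word segmenter and named-entity recogniser would replace the lexicon lookups, but the control flow of the step stays the same.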
Optionally, in a specific implementation manner, the performing label matching on the first label of each to-be-selected segment and the second label of each target sub-text in the plurality of sub-texts to obtain each to-be-selected combination includes:
for each target sub-text, calculating a matching degree group between the target sub-text and each segment to be selected; wherein each matching degree group comprises: a first matching degree between the character label of the target sub-text and the character label of a segment to be selected, and a second matching degree between the event label of the target sub-text and the event label of that segment to be selected;
and aiming at each target sub-text, calculating the label matching degree of the target sub-text and each segment to be selected based on each matching degree group, and determining the target sub-text and the segment to be selected, of which the label matching degree with the target sub-text reaches a preset matching degree, as a combination to be selected to obtain each combination to be selected.
Optionally, in a specific implementation manner, the calculating, for each target sub-text, a matching degree group between the target sub-text and each to-be-selected segment includes:
for each target sub-text, the following steps are performed:
determining a first number of keywords included in the character tag of the target sub-text and a second number of keywords included in the event tag of the target sub-text;
determining a third number of matched keywords in the character tag of the target sub-text and the character tag of the segment to be selected and a fourth number of matched keywords in the event tag of the target sub-text and the event tag of the segment to be selected for each segment to be selected;
calculating the ratio of the third quantity to a target sum value for each segment to be selected, and taking the ratio as a first matching degree of the character label of the target sub-text and the character label of the segment to be selected; calculating the ratio of the fourth quantity to the target sum value as a second matching degree of the event label of the target sub-text and the event label of the to-be-selected segment; wherein the target sum value is a sum of the first number and the second number.
Optionally, in a specific implementation manner, the adding, for each target combination in the combinations to be selected, a target playing link regarding the segment to be selected in the target combination to the target sub-text in the target combination in the target text includes:
for each target combination, the following operations are performed:
if the target combination comprises one segment to be selected, adding a target playing link regarding that segment to the target sub-text in the target combination in the target text;
if the target combination comprises a plurality of segments to be selected, selecting a target segment from the plurality of segments to be selected based on a preset selection manner, and adding a specified playing link regarding the target segment to the target sub-text in the target combination in the target text; the specified playing link is used for indicating that the video to be processed is played from the starting playing time of the target segment in the video to be processed.
Optionally, in a specific implementation manner, the selecting, based on a preset selection manner, a target segment from the plurality of segments to be selected included in the target combination includes:
acquiring a selection instruction, and determining the segment to be selected indicated by the selection instruction as the target segment; or,
and determining the to-be-selected segment with the maximum label matching degree with the target sub-text included in the target combination as the target segment from the plurality of to-be-selected segments included in the target combination.
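Both preset selection manners reduce to a few lines. In this non-limiting sketch, `candidates` is a hypothetical list of (segment id, label matching degree) pairs for one target combination, and `selected_id` stands in for a selection instruction:

```python
def select_target_segment(candidates, selected_id=None):
    """If a selection instruction names a segment, that segment is the
    target segment; otherwise pick the candidate with the maximum label
    matching degree with the target sub-text."""
    if selected_id is not None:
        return selected_id
    return max(candidates, key=lambda pair: pair[1])[0]
```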
In a second aspect of the embodiments of the present invention, there is also provided a video processing method, including:
determining a target text; each target sub text in the target text is associated with a target playing link, and the target playing link associated with each target sub text is added based on the text processing method provided by the first aspect;
when the selection operation of a target playing link associated with a specific sub-text in each target sub-text is detected, jumping to a playing interface of a target video described by the target text, and starting playing the target video from a moment indicated by the target playing link associated with the specific sub-text.
In a third aspect of the present invention, there is also provided a text processing apparatus, including:
the first label determining module is used for dividing the video to be processed into a plurality of fragments to be selected according to the video content of the video to be processed and determining a first label of each fragment to be selected; wherein the first tag comprises: person tags and event tags;
the second label determining module is used for dividing the target text into a plurality of sub texts according to the text content of the target text for describing the video to be processed and determining a second label of each sub text; wherein the second tag comprises: a person tag and an event tag;
the to-be-selected combination determining module is used for performing label matching on the first label of each segment to be selected and the second label of each target sub-text in the plurality of sub-texts to obtain each combination to be selected; each combination to be selected comprises at least one segment to be selected and one target sub-text, and the character tag and the event tag of the segment to be selected included in each combination to be selected are respectively matched with the character tag and the event tag of the target sub-text included in that combination;
a link adding module, configured to add, for each target combination in the combinations to be selected, a target playing link regarding the segment to be selected in the target combination to the target sub-text in the target combination in the target text; the target playing link is used for indicating that the video to be processed is played from the starting playing time, in the video to be processed, of the segment to be selected in the target combination.
Optionally, in a specific implementation manner, the second tag determining module is specifically configured to:
dividing a target text for describing the video to be processed based on a preset separator to obtain each sentence in the target text;
performing word segmentation processing on each sentence to obtain at least one keyword in the sentence;
and sequentially carrying out keyword category identification on each keyword according to the sequence of the position of each keyword in the target text from front to back, dividing sentences to which the first keywords and the second keywords belong into one sub-text when each first keyword related to a character and each second keyword related to an event are identified, and respectively determining the first keywords and the second keywords as character tags and event tags of the sub-texts to obtain second tags of the sub-texts.
Optionally, in a specific implementation manner, the to-be-selected combination determining module includes:
the calculation sub-module is used for calculating, for each target sub-text, a matching degree group between the target sub-text and each segment to be selected; wherein each matching degree group comprises: a first matching degree between the character label of the target sub-text and the character label of a segment to be selected, and a second matching degree between the event label of the target sub-text and the event label of that segment to be selected;
and the to-be-selected combination determining submodule is used for calculating the matching degree of the target sub-text and each to-be-selected fragment based on each matching degree group aiming at each target sub-text, and determining the target sub-text and the to-be-selected fragment of which the matching degree with the label of the target sub-text reaches the preset matching degree as the to-be-selected combination to obtain each to-be-selected combination.
Optionally, in a specific implementation manner, the calculating sub-module is specifically configured to:
for each target sub-text, the following steps are performed:
determining a first number of keywords included in the character tag of the target sub-text and a second number of keywords included in the event tag of the target sub-text;
determining a third number of matched keywords in the character tag of the target sub-text and the character tag of the segment to be selected and a fourth number of matched keywords in the event tag of the target sub-text and the event tag of the segment to be selected for each segment to be selected;
calculating the ratio of the third quantity to a target sum value for each segment to be selected, and taking the ratio as a first matching degree of the character label of the target sub-text and the character label of the segment to be selected; calculating the ratio of the fourth quantity to the target sum value as a second matching degree of the event label of the target sub-text and the event label of the to-be-selected segment; wherein the target sum value is a sum of the first number and the second number.
Optionally, in a specific implementation manner, the to-be-selected combination determining module includes a selecting submodule, and the to-be-selected combination determining module is specifically configured to:
for each target combination, the following operations are performed:
if the target combination comprises one segment to be selected, triggering the link adding module;
if the target combination comprises a plurality of segments to be selected, triggering the selection submodule;
the selection submodule is used for selecting a target segment from the plurality of segments to be selected included in the target combination based on a preset selection manner, and adding a specified playing link regarding the target segment to the target sub-text in the target combination in the target text; the specified playing link is used for indicating that the video to be processed is played from the starting playing time of the target segment in the video to be processed.
Optionally, in a specific implementation manner, the selecting submodule is specifically configured to:
acquiring a selection instruction, and determining a segment to be selected indicated by the selection instruction as a target segment; or determining the to-be-selected segment with the maximum label matching degree with the target sub-text included in the target combination as the target segment from the to-be-selected segments included in the target combination.
In a fourth aspect of the present invention, there is also provided a video processing apparatus, comprising:
the text determination module is used for determining a target text; each target sub text in the target text is associated with a target playing link, and the target playing link associated with each target sub text is added based on the text processing method provided by the first aspect;
and the video playing module is used for jumping to a playing interface of the target video described by the target text and playing the target video from the moment indicated by the target playing link associated with the specified sub-text when the selection operation of the target playing link associated with the specified sub-text in each target sub-text is detected.
In yet another aspect of the present invention, there is further provided a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the text processing methods provided in the first aspect.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the video processing method provided in the second aspect above.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the text processing methods provided in the first aspect above.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the video processing method provided in the second aspect above.
By applying the scheme provided by the embodiment of the invention, the video to be processed is divided into a plurality of segments to be selected according to its video content, and a first tag comprising a character tag and an event tag is determined for each segment to be selected based on that video content. Then, the target text for describing the video to be processed is divided into a plurality of sub-texts according to its text content, and a second tag comprising a character tag and an event tag is determined for each sub-text based on that text content. In this way, tag matching can be performed between the first tag of each segment to be selected and the second tag of each target sub-text in the plurality of sub-texts, so as to obtain each combination to be selected. Each combination to be selected comprises at least one segment to be selected and one target sub-text; the character tag of each segment to be selected in a combination matches the character tag of the target sub-text, and at the same time the event tag of that segment matches the event tag of the target sub-text. Therefore, for each target combination among the combinations to be selected, a target playing link may be added to the target sub-text of that combination in the target text, the link indicating that the video to be processed is to be played from the starting playing time, in the video to be processed, of the segment to be selected in that combination.
On this basis, by applying the scheme provided by the embodiment of the present invention, a text describing a video can be divided into a plurality of sub-texts according to its text content, and each sub-text can be given a playing link pointing at the starting playing time, within the complete video, of the video picture corresponding to that sub-text. Therefore, when the user is interested in the video content corresponding to a sub-text, the playing link of that sub-text directly locates the corresponding video content within the complete video and plays the complete video from the starting playing time of that content, which improves video positioning efficiency and, in turn, the video viewing rate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flowchart of a text processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an effect provided by an embodiment of the present invention;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
As television series and movies of various titles are released, video clients provide a function for viewing a synopsis of a drama, for example a per-episode synopsis of a television series, so that the user can roughly understand the plot. However, when a user is interested in the video content corresponding to a piece of text content in the episode synopsis, the user has to manually drag the progress bar of the corresponding video to a likely playing position in order to watch the video content of interest, which makes locating the video content inefficient.
In order to solve the above technical problem, an embodiment of the present invention provides a text processing method.
The method can be applied to various application scenarios that require locating and playing video; for example, when a per-episode synopsis of a television series contains a text part that interests the user, the user can directly locate and play the video content corresponding to that text part. Moreover, the method can be executed by various electronic devices such as notebook computers, tablet computers, and desktop computers, hereinafter simply referred to as the electronic device. On this basis, the embodiment of the present invention does not limit the application scenario or the execution subject of the method.
The text processing method provided by the embodiment of the invention can comprise the following steps:
dividing a video to be processed into a plurality of segments to be selected according to the video content of the video to be processed, and determining a first tag of each segment to be selected; wherein the first tag comprises: a person tag and an event tag;
dividing a target text for describing the video to be processed into a plurality of sub-texts according to the text content of the target text, and determining a second tag of each sub-text; wherein the second tag comprises: a person tag and an event tag;
performing tag matching between the first tag of each segment to be selected and the second tag of each target sub-text in the plurality of sub-texts to obtain each combination to be selected; each combination to be selected comprises at least one segment to be selected and one target sub-text, and the person tag and the event tag of the segment to be selected included in each combination to be selected are respectively matched with the person tag and the event tag of the target sub-text included in that combination;
for each target combination in the combinations to be selected, adding, to the target sub-text in the target combination in the target text, a target playing link regarding the segment to be selected in the target combination; the target playing link is used for indicating that the video to be processed is played from the starting playing time, in the video to be processed, of the segment to be selected in the target combination.
As can be seen from the above, according to the scheme provided by the embodiment of the present invention, the video to be processed is divided into a plurality of segments to be selected according to its video content, and a first tag comprising a person tag and an event tag is determined for each segment to be selected based on that video content. Then, the target text for describing the video to be processed is divided into a plurality of sub-texts according to its text content, and a second tag comprising a character tag and an event tag is determined for each sub-text based on that text content. In this way, tag matching can be performed between the first tag of each segment to be selected and the second tag of each target sub-text in the plurality of sub-texts, so as to obtain each combination to be selected. Each combination to be selected comprises at least one segment to be selected and one target sub-text; the character tag of each segment to be selected in a combination matches the character tag of the target sub-text, and at the same time the event tag of that segment matches the event tag of the target sub-text. Therefore, for each target combination among the combinations to be selected, a target playing link may be added to the target sub-text of that combination in the target text, the link indicating that the video to be processed is to be played from the starting playing time, in the video to be processed, of the segment to be selected in that combination.
On this basis, by applying the scheme provided by the embodiment of the present invention, a text describing a video can be divided into a plurality of sub-texts according to its text content, and each sub-text can be given a playing link pointing at the starting playing time, within the complete video, of the video picture corresponding to that sub-text. Therefore, when the user is interested in the video content corresponding to a sub-text, the playing link of that sub-text directly locates the corresponding video content within the complete video and plays the complete video from the starting playing time of that content, which improves video positioning efficiency and, in turn, the video viewing rate.
The following describes a text processing method according to an embodiment of the present invention with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a text processing method according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps S101 to S104.
S101: dividing the video to be processed into a plurality of fragments to be selected according to the video content of the video to be processed, and determining a first label of each fragment to be selected;
wherein the first tag comprises: a person tag and an event tag.
In order to facilitate the user to watch the video content corresponding to the text content of interest, first, a video to be processed, for example, a single episode video of a television show, a feature film video of a movie, or the like, may be obtained.
Moreover, the story content presented by each video to be processed is driven forward by individual story lines, and the most important information for each story line is person information and event information. Therefore, for the video content of a video to be processed, the person tag and the event tag are indispensable among the tags corresponding to the video segments of the respective story lines.
Therefore, after the video to be processed is obtained, the video to be processed can be divided into a plurality of segments to be selected according to the video content of the video to be processed, and therefore, for each segment to be selected, the first tag comprising the character tag and the event tag of the segment to be selected is determined based on the video content.
Further, in order to more accurately divide the to-be-selected segment, the first tag may further include a time tag, a location tag, and the like. The embodiment of the present invention is not particularly limited.
The person information of the video to be processed can be obtained through a face recognition algorithm. Specifically, face recognition may be performed on each video frame of the video to be processed through the face recognition algorithm, so as to obtain the person information of the persons present in each video frame of the video to be processed. Alternatively, the person information may be obtained through plot analysis or through manual identification.
Optionally, the dividing manner of the segments to be selected may be:
the method comprises the steps of determining character information of characters existing in all video frames of a video to be processed based on the video to be processed, and accordingly determining a video frame where the character exists as a first type video frame based on the character information for each character existing in the video to be processed.
For example, if a person a, a person b, and a person c exist in a certain video, and the video frames including the person a include the 10 th video frame, the 15 th to 50 th video frames, and the 85 th to 100 th video frames, the video frames are regarded as the first type video frames of the person a.
Therefore, the to-be-selected segment containing the character can be determined based on the first type video frames in which the character exists and the time difference between the first type video frames.
Therefore, the video to be processed is divided based on the character information, and each segment to be selected in the video to be processed can be efficiently and accurately determined.
Wherein the time difference between the video frames of the first type can be determined by the number of video frames existing between the video frames of the first type. And when the time difference between the two first-class video frames is greater than a preset threshold value, the two first-class video frames are not in the same segment to be selected.
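The frame-gap rule described above can be sketched as follows. This is a hypothetical illustration: the function name, the frame-index representation, and the gap threshold of 20 frames are assumptions, not part of the embodiment.

```python
# Hypothetical sketch: group the first-type video frames of one character into
# candidate segments, starting a new segment wherever the gap between
# consecutive frames exceeds a preset threshold.
def split_candidate_segments(frame_indices, max_gap):
    """Split a sorted list of frame indices into (start, end) segments."""
    segments = []
    current = [frame_indices[0]]
    for idx in frame_indices[1:]:
        if idx - current[-1] > max_gap:          # gap too large: close the segment
            segments.append((current[0], current[-1]))
            current = [idx]
        else:
            current.append(idx)
    segments.append((current[0], current[-1]))
    return segments

# Character a appears in frame 10, frames 15-50 and frames 85-100
# (the example from the text above)
frames = [10] + list(range(15, 51)) + list(range(85, 101))
print(split_candidate_segments(frames, max_gap=20))  # → [(10, 50), (85, 100)]
```

With a threshold of 20 frames, frame 10 and frames 15–50 merge into one candidate segment, while the 35-frame gap before frame 85 opens a second one.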
Optionally, in order to more accurately divide the segment to be selected, after the segment to be selected is divided for a single person, each segment to be selected may be further processed. For example, the to-be-selected segment of the to-be-processed video may be comprehensively separated based on the to-be-selected segment divided by each person, or the divided to-be-selected segment may be further adjusted in combination with a background, music, and the like corresponding to the to-be-selected segment, for example, a plurality of to-be-selected segments under the same background music may be used as the same to-be-selected segment.
Therefore, the first label of each segment to be selected can be determined according to the video content corresponding to each segment to be selected.
Optionally, for the video speech of each segment to be selected, a speech text corresponding to the segment may be recognized from the video speech based on a speech recognition technology. Keyword recognition is then performed on the recognized speech text to obtain the keywords in it, and keyword category recognition is performed on those keywords in order of their positions in the speech text, from front to back. When a first keyword related to a person and a second keyword related to an event are recognized, the first keyword and the second keyword are determined as the person tag and the event tag of the segment to be selected, respectively, to obtain the first tag of the segment to be selected.
For example, keyword recognition is performed on a voice text, and when a keyword p1 is recognized, the keyword is a first keyword related to a person; when the keyword p2 is identified, the keyword is a first keyword related to a person; when keyword p3 is identified, this keyword is the second keyword for the event. Therefore, sentences 1 and 2 to which the keywords p1, p2 and p3 belong may be divided into the sub-text 1, and the keywords p1, p2 and p3 may be determined as the first tags of the sub-text 1.
Continuing the recognition, when keyword p4 is recognized, it is a first keyword related to a person; when keyword p5 is recognized, it is a second keyword related to an event. Therefore, sentence 3, to which keyword p4 and keyword p5 belong, can be divided into sub-text 2, and keyword p4 and keyword p5 can be determined as the first tag of sub-text 2.
Optionally, for the line subtitles of each segment to be selected, a subtitle text corresponding to the segment may be recognized from the line subtitles based on a text processing technology. Keyword recognition is then performed on the recognized subtitle text to obtain the keywords in it, and keyword category recognition is performed on those keywords in order of their positions in the subtitle text, from front to back. When a first keyword related to a person and a second keyword related to an event are recognized, the first keyword and the second keyword are determined as the person tag and the event tag of the segment to be selected, respectively, to obtain the first tag of the segment to be selected.
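The keyword-to-tag step described above (applied alike to speech text and subtitle text) might be sketched as follows. The keyword categories here are hard-coded stand-ins; a real system would use whatever keyword-category recognition the embodiment relies on, so treat the lists and names as assumptions.

```python
# Illustrative sketch: scan keywords in order and collect person-related and
# event-related keywords as the segment's person tag and event tag.
PERSON_KEYWORDS = {"p1", "p2", "p4"}   # assumed category lookup
EVENT_KEYWORDS = {"p3", "p5"}          # assumed category lookup

def first_tag_for_segment(keywords):
    """Build the first tag (person tag + event tag) from ordered keywords."""
    person_tag, event_tag = [], []
    for kw in keywords:
        if kw in PERSON_KEYWORDS:
            person_tag.append(kw)
        elif kw in EVENT_KEYWORDS:
            event_tag.append(kw)
    return {"person": person_tag, "event": event_tag}

print(first_tag_for_segment(["p1", "p2", "p3"]))
# → {'person': ['p1', 'p2'], 'event': ['p3']}
```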
S102: dividing a target text into a plurality of sub-texts according to text content of the target text for describing a video to be processed, and determining a second label of each sub-text;
wherein the second tag comprises: a person tag and an event tag.
After a plurality of to-be-selected segments of the to-be-processed video and the first label of each to-be-selected segment are determined, a target text for describing the to-be-processed video can be obtained.
Optionally, the target text may be generated by the electronic device, so that the process of generating the target text by the electronic device is a process of acquiring the target text by the electronic device.
Optionally, the target text may be written by a user, and the user may input the written target text to the electronic device, so that the electronic device may obtain the target text input by the user.
Furthermore, after the target text is obtained, the electronic device may further determine each sub-text included in the target text.
In addition, the story content described by each target text is driven forward by individual story lines, and the most important information for each story line is person information and event information. Therefore, for the text content of a target text, the person tag and the event tag are indispensable among the tags corresponding to the sub-texts of the respective story lines.
Therefore, after the target text is obtained, the target text can be divided into a plurality of sub-texts according to the text content of the target text. In this way, a second label of each sub-text, including a person label and an event label, may be determined for the text content of the sub-text.
Further, in order to more accurately divide the sub-text, the second label may further include a time label, a location label, and the like. The embodiment of the present invention is not particularly limited.
For example, the text content of sub-text a is: "after the banquet is finished, the chess and phoenix know that the micro-van meets each other". The second tag of this sub-text a then includes: "after the banquet is finished" as a time tag; "play peace" and "phoenix know little" as person tags; "meet mutually" as an event tag; and "carriage house" as a place tag.
S103: performing label matching on the first label of each segment to be selected and the second label of each target sub-text in the plurality of sub-texts to obtain each combination to be selected;
each combination to be selected comprises at least one fragment to be selected and one target sub-text, and the character tag and the event tag of the fragment to be selected, which are included in each combination to be selected, are respectively matched with the character tag and the event tag of the target sub-text included in the combination to be selected.
The target text includes a plurality of sub-texts, but the audience may not be interested in the text content of all of them. Therefore, after the plurality of sub-texts are obtained, in order to meet the requirements of the audience, the sub-texts may be screened to determine the target sub-texts, which reduces the workload of text processing on the target text and, further, improves text processing efficiency.
The target sub-texts may be all sub-texts in the target text, or may be partial sub-texts in the target text. The embodiment of the present invention is not particularly limited.
Therefore, after the first label of each to-be-selected fragment and the second label of each sub-text are determined, label matching can be performed on each first label and the second label of each target sub-text in the plurality of sub-texts, and each to-be-selected combination is obtained.
Because a plurality of video clips reflecting the same or similar scenes may appear in each to-be-selected clip, that is, one target sub-text may correspond to a plurality of to-be-selected clips reflecting the text content of the sub-text, each to-be-selected combination obtained includes at least one to-be-selected clip and one target sub-text.
The character tag and the event tag of the segment to be selected included in each combination to be selected are respectively matched with the character tag and the event tag of the target sub-text included in the combination to be selected.
For clarity, the manner in which the electronic device executes step S103 is described by way of example in the following, and details are not repeated here; the matching manner is, however, not limited to the one described below.
S104: adding target playing links of the fragments to be selected in the target combination to the target sub-texts in the target combination aiming at each target combination in each combination to be selected;
the target playing link is used for indicating that the video to be processed is played from the starting playing time of the segment to be selected in the target combination in the video to be processed.
After each to-be-selected combination is obtained based on tag matching, the to-be-selected segment in each to-be-selected combination can be used as the to-be-selected segment corresponding to the target sub-text in the to-be-selected combination.
In order to improve the accuracy and efficiency of video positioning, each target combination can be determined in each determined combination to be selected according to the requirement of the audience.
Optionally, if text contents that are not interested by the audience still exist in the text contents of the target sub-texts in each determined combination to be selected, after each combination to be selected is obtained, in order to meet the requirement of the audience, each combination to be selected may be screened, and each target combination is determined, so that the workload of text processing on the target text by the user is reduced, and further, the accuracy and efficiency of video positioning are improved.
Optionally, if the duration of the video of the to-be-selected segment in the to-be-selected combination is too short, the viewing experience of the audience is affected, and therefore, in order to improve the viewing experience of the audience, the preset duration may be preset. Therefore, if the video time length of the segment to be selected is not less than the preset time length, the combination to be selected to which the segment to be selected belongs can be determined as the target combination, so that the combination to be selected is screened again, the workload of text processing on the target text by a user is further reduced, and the accuracy and the efficiency of video positioning are further improved.
For example, the preset duration is 3 seconds, and the combination to be selected Q includes a segment to be selected A and a target sub-text q, where the video duration of segment A is 5 seconds. Since 3 < 5, the video duration of segment A is not less than the preset duration, and thus the combination to be selected Q can be determined as a target combination.
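The duration filter just described can be sketched in a few lines. The dictionary structure for a combination and the 3-second threshold (taken from the example above) are assumptions for illustration only.

```python
# Minimal sketch of the duration screening: keep only combinations whose
# candidate segment is at least the preset duration long.
MIN_DURATION_SECONDS = 3   # preset duration from the example above

def target_combinations(candidate_combinations):
    """Filter candidate combinations by segment duration."""
    return [c for c in candidate_combinations
            if c["segment_duration"] >= MIN_DURATION_SECONDS]

combos = [{"segment": "A", "sub_text": "q", "segment_duration": 5},
          {"segment": "B", "sub_text": "r", "segment_duration": 2}]
print(target_combinations(combos))  # keeps only the 5-second combination
```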
In this way, for each target combination in the respective combinations to be selected, the electronic device may add, to the target sub-text in the target combination, a target play link, which is used for instructing to start playing the video to be processed from the start play time of the segment to be selected in the target combination in the video to be processed, for the segment to be selected in the target combination in the target text.
The target playing link may be presented in the form of a preset graphic icon, or in the form of a website link, and the like, which is not specifically limited in the embodiment of the present invention.
Optionally, for each target combination, a position associated with a target sub-text in the target combination is determined in the target text, for example, a position before a first word of the target sub-text, a position after a last word of the target sub-text, and the like, and certainly, other positions associated with the target sub-text may also be used, which is not limited in the embodiment of the present invention. In this way, the target playback link for the clip to be selected in the target combination can be added to the above-mentioned position.
In order to make clear to the user which target sub-texts in the target text have play links added, each target sub-text to which a play link has been added may be re-rendered. In this way, the user can directly locate, in the video to be processed, the segment to be selected in the target combination to which a target sub-text belongs, through a confirmation operation such as clicking the target sub-text or clicking the preset graphic mark in the target text, so that the video to be processed is played from the start play time of that segment in the video to be processed, thereby improving video locating efficiency.
Exemplarily, as shown in fig. 2, a schematic diagram of an effect provided by the embodiment of the present invention is provided. Wherein 200 in fig. 2 is a preset graphic mark 200 of a play link of the 3 rd episode in the tv show shown in fig. 2; fig. 2 shows 203 a target sub-text 203 in the diversity introduction of the above set 3, and 201 a preset graphic mark 201 of a target playing link corresponding to the target sub-text 203; in fig. 2, 204 is the target sub-text 204 in the diversity introduction of the above-mentioned set 3, and 202 is the preset icon 202 of the target playing link corresponding to the target sub-text 204.
As can be seen from the above, according to the scheme provided by the embodiment of the present invention, the video to be processed is divided into a plurality of segments to be selected according to its video content, and a first tag, including a person tag and an event tag, is determined for each segment to be selected based on that video content. Then, according to the text content of the target text for describing the video to be processed, the target text is divided into a plurality of sub-texts, and a second tag, including a person tag and an event tag, is determined for each sub-text based on the text content. Tag matching can then be performed between the first tag of each segment to be selected and the second tag of each target sub-text among the plurality of sub-texts, to obtain the combinations to be selected. Each combination to be selected includes at least one segment to be selected and one target sub-text, and the person tag and event tag of each segment to be selected in a combination match the person tag and event tag, respectively, of the target sub-text in that combination. Thus, for each target combination among the combinations to be selected, a target play link may be added, in the target text, to the target sub-text of that combination, the link instructing that the video to be processed be played from the start play time, within the video to be processed, of the segment to be selected in that combination.
Based on this, by applying the scheme provided by the embodiment of the present invention, a text describing a video can be divided into a plurality of sub-texts according to its text content, and a play link pointing to the start play time, in the complete video, of the video picture corresponding to each sub-text can be added to that sub-text. Therefore, when the user is interested in the video content corresponding to a sub-text, the play link corresponding to that sub-text can directly locate the video content of interest in the complete video corresponding to the text content, and the complete video is played from the start play time of that content, which improves video locating efficiency and, further, the video viewing rate.
Optionally, in a specific implementation manner, in the step S102, dividing the target text into a plurality of sub-texts according to the text content of the target text for describing the video to be processed, and determining the second tag of each sub-text, the following steps 1021 to 1023 may be included:
step 1021: dividing a target text for describing a video to be processed based on a preset separator to obtain each sentence in the target text;
step 1022: performing word segmentation processing on each sentence to obtain at least one keyword in the sentence;
step 1023: and sequentially carrying out keyword category identification on each keyword according to the sequence of the position of each keyword in the target text from front to back, dividing sentences to which the first keywords and the second keywords belong into a sub-text when each first keyword related to a character and each second keyword related to an event are identified, and respectively determining the first keywords and the second keywords as character tags and event tags of the sub-text to obtain second tags of the sub-texts.
In this specific implementation manner, after the target text for describing the video to be processed is obtained, the target text may be divided based on a predetermined delimiter, so as to obtain each sentence in the target text.
The predetermined separator may include at least one of ",", ".", ";", "?", "!", and the other symbols used to segment characters in the various written languages, which is not limited in the embodiment of the present invention. For example, only "." may be used as the separator, or ".", ";", "?" and "!" may all be used as separators, and so on.
First, the separators included in the target text are detected. The characters from the first character of the target text up to the character before the first separator are determined as the first sentence of the target text; the characters from the character after the last separator up to the last character are determined as the last sentence of the target text; and, among the remaining characters, the characters located between two adjacent separators are each determined as one sentence of the target text.
For example, the target text is "one month black and high at night for a killer, and the Chinese language's male and female brakes fall from day to day, which disturbs the stable day of the inn. Family hewn, young, sedentary G, father is a generation of swordsman, always covering her under shadow. From the small victory and the good-winning she, persevered to choose a way of going out from home and running through rivers and lakes, but at the first station, she is detained in the inn, and from then on, a hard and outmost miscellaneous life begins.". Then, taking "." as the separator of the target text, three sentences can be obtained, respectively: the first sentence: "killing person night of a month with dark wind height, the male and female in the legend fall from day to day, and the stable days of the inn are disturbed."; the second sentence: "family heuchy, young girl who is a generation of swordsman always covers her in the shade."; the third sentence: "from the small victory and the good-winning she, persevered to choose a way to go out of home and go through rivers and lakes alone, but at the first station, she is detained in the inn, and from this point on, a hard and outmost miscellaneous life is started.".
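Step 1021 — splitting the target text into sentences on the preset separators — can be sketched as below. The particular delimiter set (covering both full-width and ASCII sentence enders) is one possible choice; the embodiment leaves the separators open.

```python
import re

# Sketch of step 1021: split the target text into sentences wherever a
# preset separator occurs, dropping empty fragments.
DELIMITERS = r"[。；？！.;?!]"   # assumed separator set

def split_sentences(text):
    """Return the non-empty sentences of `text`, stripped of whitespace."""
    return [s.strip() for s in re.split(DELIMITERS, text) if s.strip()]

print(split_sentences("Sentence one. Sentence two! Sentence three?"))
# → ['Sentence one', 'Sentence two', 'Sentence three']
```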
Thus, after each sentence included in the target text is determined, word segmentation processing can be performed on each sentence to obtain at least one keyword in the sentence.
Since the development of a story is determined by its persons and events, the person tag and the event tag are indispensable among the tags corresponding to a text that carries a relatively complete story line.
In this way, the keywords are sequentially subjected to keyword category identification according to the sequence of the positions of the keywords in the target text from front to back, and when a first keyword related to a person and a second keyword related to an event are identified, a sentence to which the first keyword and the second keyword belong can be divided into a sub-text, and the first keyword and the second keyword are respectively determined as a person tag and an event tag of the sub-text, so as to obtain a second tag of the sub-text.
For example, the target text P is divided into sentence 1, sentence 2, and sentence 3. The arrangement order of sentence 1, sentence 2 and sentence 3 in the target text is: sentence 1, sentence 2, and sentence 3. The sentence 1 includes a keyword p1 and a keyword p2, the sentence 2 includes a keyword p3, and the sentence 3 includes a keyword p4 and a keyword p5. Further, the keywords P1, P2, P3, P4, and P5 are arranged in order from the top to the bottom of the position of each keyword in the target text P, and the obtained keyword arrangement order is: keyword p1, keyword p2, keyword p3, keyword p4, and keyword p5.
Thus, upon identifying the keyword p1, the keyword is a first keyword regarding a person; when the keyword p2 is identified, the keyword is a first keyword related to a person; when keyword p3 is identified, the keyword is a second keyword related to the event. Therefore, sentences 1 and 2 to which the keywords p1, p2 and p3 belong may be divided into the sub text 1, and the keywords p1, p2 and p3 may be determined as the second tags of the sub text 1.
Continuing the recognition, when keyword p4 is recognized, it is a first keyword related to a person; when keyword p5 is recognized, it is a second keyword related to an event. Therefore, sentence 3, to which keyword p4 and keyword p5 belong, can be divided into sub-text 2, and keyword p4 and keyword p5 can be determined as the second tag of sub-text 2.
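Step 1023 — closing a sub-text as soon as at least one person keyword and one event keyword have been seen — can be sketched as follows, using the p1…p5 example above. The keyword categories are hard-coded assumptions standing in for a real keyword-category classifier.

```python
# Hedged sketch of step 1023: walk sentences in order, accumulate keywords,
# and close a sub-text once both a person keyword and an event keyword exist.
PERSON = {"p1", "p2", "p4"}   # assumed person-related keywords
EVENT = {"p3", "p5"}          # assumed event-related keywords

def divide_sub_texts(sentences_with_keywords):
    """sentences_with_keywords: list of (sentence_id, [keywords]) in text order."""
    sub_texts, cur_sents, persons, events = [], [], [], []
    for sent_id, keywords in sentences_with_keywords:
        cur_sents.append(sent_id)
        for kw in keywords:
            if kw in PERSON:
                persons.append(kw)
            elif kw in EVENT:
                events.append(kw)
        if persons and events:            # both categories seen: close the sub-text
            sub_texts.append({"sentences": cur_sents,
                              "person_tag": persons, "event_tag": events})
            cur_sents, persons, events = [], [], []
    return sub_texts

result = divide_sub_texts([(1, ["p1", "p2"]), (2, ["p3"]), (3, ["p4", "p5"])])
print(result)
# sub-text 1: sentences 1-2 tagged p1, p2, p3; sub-text 2: sentence 3 tagged p4, p5
```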
Since a plurality of video clips reflecting the same or similar scenes may appear among the segments to be selected, that is, one sub-text may correspond to a plurality of segments to be selected that reflect its text content, each obtained combination to be selected includes at least one segment to be selected and one sub-text.
Based on this, optionally, in a specific implementation manner, in the step S103, performing label matching on the first label of each to-be-selected fragment and the second label of each target sub-text in the multiple sub-texts to obtain each to-be-selected combination, may include the following steps 1031 to 1032:
step 1031: aiming at each target sub-text, calculating a matching degree group of the target sub-text and each segment to be selected;
wherein each matching degree group comprises: the character label and the event label of the target sub-text are respectively matched with the character label and the event label of a segment to be selected by a first matching degree and a second matching degree.
In this specific implementation manner, after the first tag of each to-be-selected segment and the second tag of each target sub-text in the plurality of sub-texts are determined, a matching degree group related to each target sub-text and each to-be-selected segment may be calculated for each target sub-text.
Since the first tag and the second tag both include a character tag and an event tag, after the first tag of each segment to be selected and the second tag of each target sub-text are determined, for each target sub-text, a first matching degree between the character tag of the target sub-text and the character tag of each segment to be selected can be calculated, and a second matching degree between the event tag of the target sub-text and the event tag in each segment to be selected can be calculated. Thus, for each sub-text, a matching degree group about the target sub-text and each segment to be selected can be obtained.
Optionally, in a specific implementation manner, the step 1031 may include the following steps:
for each target sub-text, the following steps 11-13 are performed:
step 11: determining a first number of keywords included in a character tag of the target sub-text and a second number of keywords included in an event tag of the sub-text;
step 12: determining a third number of matched keywords in the character tag of the target sub-text and the character tag of the segment to be selected and a fourth number of matched keywords in the event tag of the target sub-text and the event tag of the segment to be selected for each segment to be selected;
step 13: calculating the ratio of the third quantity to the target sum value for each segment to be selected, and taking the ratio as the first matching degree of the character label of the target sub-text and the character label of the segment to be selected; calculating the ratio of the fourth quantity to the target sum value as a second matching degree of the event label of the target sub-text and the event label of the to-be-selected segment; wherein the target sum value is a sum of the first number and the second number.
In this specific implementation manner, after the first tag of each segment to be selected and the second tag of each target sub-text in the multiple sub-texts are determined, for each target sub-text, a first number of keywords included in a character tag of the target sub-text and a second number of keywords included in an event tag of the target sub-text may be determined. Then, for each segment to be selected, a third number of matching keywords in the character tag of the target sub-text and the character tag of the segment to be selected, and a fourth number of matching keywords in the event tag of the target sub-text and the event tag of the segment to be selected may be determined.
In this way, the sum of the first number and the second number may be determined as the target sum.
Then, for each segment to be selected, calculating a ratio of the third number to the target sum as a first matching degree of the character tag of the target sub-text and the character tag of the segment to be selected, and calculating a ratio of the fourth number to the target sum as a second matching degree of the event tag of the target sub-text and the event tag of the segment to be selected.
Since one target sub-text may correspond to a plurality of segments to be selected that reflect the text content of the target sub-text, each matching degree group includes: the character label and the event label of the target sub-text are respectively matched with the character label and the event label of a segment to be selected by a first matching degree and a second matching degree.
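Steps 11 to 13 can be sketched as below. The tag dictionaries are an assumed representation; the ratios follow the definition above, with both matching degrees normalized by the same target sum.

```python
# Sketch of steps 11-13: first/second matching degree as the ratio of matched
# person / event keywords to the total keyword count of the sub-text's tags.
def matching_degree_group(sub_text_tag, segment_tag):
    first_num = len(sub_text_tag["person"])                # step 11
    second_num = len(sub_text_tag["event"])
    third_num = len(set(sub_text_tag["person"]) & set(segment_tag["person"]))  # step 12
    fourth_num = len(set(sub_text_tag["event"]) & set(segment_tag["event"]))
    target_sum = first_num + second_num                    # step 13
    return third_num / target_sum, fourth_num / target_sum

sub = {"person": ["p1", "p2"], "event": ["p3"]}
seg = {"person": ["p1"], "event": ["p3"]}
print(matching_degree_group(sub, seg))
# one of two person keywords and the one event keyword match, so both
# degrees are 1/3 of the three sub-text keywords
```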
Step 1032: and aiming at each target sub-text, calculating the label matching degree of the target sub-text and each segment to be selected based on each matching degree group, and determining the target sub-text and the segment to be selected, the label matching degree of which with the target sub-text reaches a preset matching degree, as a combination to be selected to obtain each combination to be selected.
In this specific implementation manner, after the matching degree groups of the target sub-text and each to-be-selected segment are obtained, the label matching degree of the target sub-text and each to-be-selected segment can be calculated for each target sub-text based on each matching degree group obtained through calculation.
Since one target sub-text may correspond to a plurality of to-be-selected segments reflecting the text content of the target sub-text, in order to improve the working efficiency of the electronic device, the matching degree may be preset, so that for each target sub-text, after the tag matching degree of the target sub-text and each to-be-selected segment is obtained, it may be determined whether each tag matching degree of the target sub-text satisfies the preset matching degree.
If the to-be-selected segment meeting the preset matching degree exists in each label matching degree of the target sub-text, the probability of representing that the target sub-text is matched with the to-be-selected segment is high, and therefore the target sub-text and the to-be-selected segment meeting the preset matching degree with the label matching degree of the target sub-text can be determined to be the to-be-selected combination, and each to-be-selected combination is obtained.
If the to-be-selected segment meeting the preset matching degree does not exist in the matching degrees of the labels of the target sub-text, the probability of representing the matching of the target sub-text and the to-be-selected segment is low, and therefore the target sub-text and the to-be-selected segment cannot be determined as the to-be-selected combination.
Therefore, in a plurality of fragments to be selected which may correspond to a target sub-text and reflect the text content of the target sub-text, the combination of the target sub-text and the fragments to be selected, which has high possibility of matching the target sub-text with each fragment to be selected, is determined as the combination to be selected.
Optionally, for each target sub-text, the following formula may be used to calculate the label matching degree between the target sub-text and each segment to be selected; wherein the formula is:

M_ij(s1_i, s2_j) = Σ_{k=1..n} w_k * m_ik(s1_i, s2_j)

wherein s1_i represents the ith target sub-text, s2_j represents the jth segment to be selected, M_ij(s1_i, s2_j) represents the label matching degree of the ith target sub-text and the jth segment to be selected, n represents the number of tag items included in the first label and the second label, w_k represents the weight of the kth item tag in the first label of the ith target sub-text and the second label of the jth segment to be selected, and m_ik(s1_i, s2_j) represents the matching degree of the kth item tag in the first label of the ith target sub-text and the kth item tag in the second label of the jth segment to be selected;
and,

m_ik(s1_i, s2_j) = (number of matched keywords between the kth item tag in the first label of the ith target sub-text and the kth item tag in the second label of the jth segment to be selected) / (total number of keywords included in the first label of the ith target sub-text)
it should be noted that the value range of k is determined by the number n of label items included in the first label and the second label, that is, when the first label and the second label include a person label and an event label, n =2, the value range of k is 1 and 2; when the first tag and the second tag include a person tag, an event tag, a place tag, and a time tag, n =4,k has a value ranging from 1, 2, 3, and 4. The embodiment of the present invention is not particularly limited.
To facilitate understanding of the above embodiment, the following description takes the person tag as the first item tag included in the first label and the second label, and the event tag as the second item tag included in the first label and the second label.
That is, when k = 1, the first item tag included in the first label and the second label is the person tag; when k = 2, the second item tag included in the first label and the second label is the event tag.
At this time, when k = 1, the first matching degree m_i1(s1_i, s2_j) of the person tag in the first label of the ith target sub-text and the person tag in the second label of the jth segment to be selected is:

m_i1(s1_i, s2_j) = N3 / (N1 + N2)

wherein N1 is the number of keywords included in the person tag of the ith target sub-text, N2 is the number of keywords included in the event tag of the ith target sub-text, and N3 is the number of matched keywords between the person tag of the ith target sub-text and the person tag of the jth segment to be selected.
When k = 2, the second matching degree m_i2(s1_i, s2_j) of the event tag in the first label of the ith target sub-text and the event tag in the second label of the jth segment to be selected is:

m_i2(s1_i, s2_j) = N4 / (N1 + N2)

wherein N4 is the number of matched keywords between the event tag of the ith target sub-text and the event tag of the jth segment to be selected.
In this way, for each target sub-text, the following formula can be used to calculate the label matching degree of the target sub-text and each segment to be selected; wherein the formula is:

M_ij(s1_i, s2_j) = w_1 * m_i1(s1_i, s2_j) + w_2 * m_i2(s1_i, s2_j)

wherein M_ij(s1_i, s2_j) represents the label matching degree of the ith target sub-text and the jth segment to be selected; w_1 represents the weight of the person tags in the first label of the ith target sub-text and the second label of the jth segment to be selected; and w_2 represents the weight of the event tags in the first label of the ith target sub-text and the second label of the jth segment to be selected.
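The weighted sum above can be computed as a minimal sketch; the function name and the list representation of the weights and per-tag matching degrees are assumptions for illustration.

```python
def tag_matching_degree(per_tag_degrees, weights):
    """Compute M_ij = sum over k of w_k * m_ik, where per_tag_degrees
    holds m_i1..m_in and weights holds w_1..w_n for one (sub-text,
    segment) pair."""
    assert len(per_tag_degrees) == len(weights)
    return sum(w * m for w, m in zip(weights, per_tag_degrees))
```

For the two-tag case in the formula above, `tag_matching_degree([m_i1, m_i2], [w_1, w_2])` reproduces w_1*m_i1 + w_2*m_i2.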
In order to improve the use experience of the user, a target playing link about the to-be-selected segment in each target combination can be added to the target sub-text in that target combination. In this way, a user can directly locate, through the target playing link, the to-be-selected segment of the target combination in the to-be-processed video, so that the to-be-processed video is played from the initial playing time, in the to-be-processed video, of the to-be-selected segment matched with each target sub-text, which improves the video positioning efficiency and further improves the video watching rate.
Therefore, optionally, in a specific implementation manner, the step S104: for each target combination in each to-be-selected combination, adding a target playing link for a to-be-selected segment in the target combination to a target sub-text in the target combination in a target text, which may include the following steps:
for each target combination, the following step 21 is performed:
step 21: and if the target combination comprises a segment to be selected, adding a target playing link related to the segment to be selected in the target combination to the target sub-text in the target combination in the target text.
In this specific implementation manner, for each target combination, if the target combination includes one to-be-selected segment, it is characterized that one to-be-selected segment that is matched with the target sub-text exists in the target combination, and thus, a target play link related to the to-be-selected segment in the target combination may be added to the target sub-text in the target combination in the target text.
In addition, in each to-be-selected segment, a plurality of video segments reflecting the same or similar scenes may appear, that is, one target sub-text may correspond to a plurality of to-be-selected segments reflecting the text content of the target sub-text. Therefore, the target segment can be selected from the segments to be selected in the preset selection mode, so that the matching accuracy of the target sub-text and the segments to be selected can be improved, the video positioning efficiency is improved, and the video watching rate is further improved.
Based on this, for each target combination, the following step 22 is performed:
step 22: if the target combination comprises a plurality of to-be-selected segments, selecting a target segment from the plurality of to-be-selected segments included in the target combination based on a preset selection mode, and adding a specified playing link about the target segment to the target sub-text in the target combination in the target text; the specified playing link is used for indicating that the video to be processed is played from the initial playing time of the target segment in the video to be processed.
In this specific implementation manner, for each target combination, if the target combination includes a plurality of to-be-selected segments, it is characterized that a plurality of to-be-selected segments matched with the target sub-text exist in the target combination. Therefore, based on a preset selection mode, a target segment is selected from the plurality of to-be-selected segments included in the target combination, and a specified playing link about the target segment is added to the target sub-text in the target combination in the target text, so that the user can start playing the to-be-processed video from the initial playing time of the target segment in the to-be-processed video, as indicated by the specified playing link.
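Attaching a playing link that starts playback at a segment's initial playing time can be sketched as follows. The URL scheme, the `t=` start-time parameter, and the function and parameter names are invented for illustration; the patent does not specify a link format.

```python
def add_play_link(sub_text, video_id, start_seconds):
    """Append to the sub-text a play link that indicates playing the
    to-be-processed video from the segment's initial playing time.
    The URL format here is a made-up example."""
    link = f"https://example.com/play?v={video_id}&t={start_seconds}"
    return f"{sub_text} [{link}]"
```

A client that receives such a link would parse the start time and seek the player to it before playback begins.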
Optionally, in a specific implementation manner, in the step 22, based on a preset selection manner, selecting a target segment from a plurality of segments to be selected included in the target combination may include the following step 31:
step 31: and acquiring a selection instruction, and determining the to-be-selected segment indicated by the selection instruction as a target segment.
In this specific implementation manner, for each target combination, if the target combination includes a plurality of segments to be selected, the segments to be selected indicated by the selection instruction may be determined as the target segments based on the obtained selection instruction.
For example, the target combination Q includes a target sub-text a, a to-be-selected segment A, a to-be-selected segment B, and a to-be-selected segment C. Based on the acquired selection instruction, the to-be-selected segment A indicated by the selection instruction is determined as the target segment.
Optionally, a selection instruction is issued to the electronic device based on the click operation of the user, so that the electronic device obtains the selection instruction.
Optionally, in a specific implementation manner, in the step 22, based on a preset selection manner, selecting a target segment from a plurality of segments to be selected included in the target combination may include the following step 32:
step 32: and determining the to-be-selected segment with the maximum label matching degree with the target sub-text included in the target combination as the target segment from the plurality of to-be-selected segments included in the target combination.
In this specific implementation manner, for each target combination, if the target combination includes multiple to-be-selected segments, the to-be-selected segment with the largest tag matching degree with the target sub-text included in the target combination, among the multiple to-be-selected segments included in the target combination, may be determined as the target segment.
For example, the target combination Q includes a target sub-text a, a to-be-selected segment A, a to-be-selected segment B, and a to-be-selected segment C. The label matching degree of the target sub-text a and the to-be-selected segment A is 90, that of the target sub-text a and the to-be-selected segment B is 88, and that of the target sub-text a and the to-be-selected segment C is 80. Since 90 > 88 > 80, the to-be-selected segment A can be determined as the target segment.
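Step 32 amounts to an arg-max over the label matching degrees. A sketch, assuming the candidates and a degree lookup are given as plain Python values:

```python
def select_target_segment(candidates, degree_of):
    """Pick the to-be-selected segment with the largest label matching
    degree to the target sub-text (step 32). degree_of maps a segment
    to its label matching degree."""
    return max(candidates, key=degree_of)
```

With the example above, segments A, B, C scored 90, 88, 80 yield segment A.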
Corresponding to the text processing method provided by the embodiment of the invention, the embodiment of the invention also provides a video processing method.
Fig. 3 is a flowchart illustrating a video processing method according to an embodiment of the present invention, and as shown in fig. 3, the method may include the following steps S301 to S302.
S301: determining a target text;
each target sub text in the target text is associated with a target playing link, and the target playing link associated with each target sub text is added based on the text processing method provided by the embodiment of the invention.
As TV series and movies of various titles are released, video clients provide a function of viewing a plot introduction so that users can roughly understand the content of a drama. In addition, in order to facilitate the user in viewing the video content corresponding to the text content of interest, a target playing link may be added to each target sub-text of the target text to which the text content belongs, by using the text processing method provided in the embodiment of the present invention.
In this way, when playing a video, first, the target text to which the text content of interest belongs can be determined.
S302: when the selection operation of the target playing link associated with the specified sub-text in each target sub-text is detected, jumping to a playing interface of the target video described by the target text, and starting playing the target video from the moment indicated by the target playing link associated with the specified sub-text.
After determining the target text, the electronic device may detect whether there is a selection operation on the target playing link associated with a specified sub-text in each target sub-text.
Therefore, when the electronic device detects the selection operation on the target playing link associated with the specified sub-text in each target sub-text, the electronic device can jump to the playing interface of the target video described by the target text and start playing the target video from the moment indicated by the target playing link associated with the specified sub-text, thereby completing the positioning of the video content corresponding to the specified sub-text, improving the video positioning efficiency, and improving the video watching rate.
Based on this, by applying the scheme provided by the embodiment of the present invention, when the user is interested in the video content corresponding to the sub-text in the text, the video content corresponding to the sub-text which is interested in the user can be directly located in the complete video corresponding to the text content through the playing link corresponding to the sub-text, and the complete video is played from the initial playing time of the video content in the complete video, so that the video locating efficiency is improved, and further, the viewing rate of the video is improved.
Corresponding to the text processing method provided in the embodiment of the present invention, an embodiment of the present invention further provides a text processing apparatus.
Fig. 4 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present invention, and as shown in fig. 4, the apparatus includes the following modules:
the first tag determining module 410 is configured to divide a video to be processed into a plurality of segments to be selected according to video content of the video to be processed, and determine a first tag of each segment to be selected; wherein the first tag comprises: a person tag and an event tag;
a second tag determining module 420, configured to divide the target text into multiple sub-texts according to text content of the target text for describing the video to be processed, and determine a second tag of each sub-text; wherein the second tag comprises: person tags and event tags;
a to-be-selected combination determining module 430, configured to perform label matching on the first label of each to-be-selected segment and the second label of each target sub-text in the multiple sub-texts, to obtain each to-be-selected combination; each combination to be selected comprises at least one fragment to be selected and one target sub-text, and the character tag and the event tag of the fragment to be selected, which are included in each combination to be selected, are respectively matched with the character tag and the event tag of the target sub-text which are included in the combination to be selected;
a link adding module 440, configured to add, in the target text, a target playing link for the to-be-selected segment in the target combination for the target sub-text in the target combination, for each target combination in the to-be-selected combinations; the target playing link is used for indicating that the video to be processed is played from the initial playing time of the segment to be selected in the target combination in the video to be processed.
As can be seen from the above, according to the scheme provided by the embodiment of the present invention, firstly, according to the video content of the video to be processed, the video to be processed is divided into a plurality of segments to be selected, and based on the video content, the first label including the person tag and the event tag of each segment to be selected is determined. Then, according to the text content of the target text for describing the video to be processed, the target text is divided into a plurality of sub-texts, and a second label including a person tag and an event tag of each sub-text is determined based on the text content. Therefore, label matching can be carried out on the first label of each segment to be selected and the second label of each target sub-text in the plurality of sub-texts, so as to obtain each to-be-selected combination. Each to-be-selected combination comprises at least one to-be-selected segment and one target sub-text, and in each to-be-selected combination, the person tag and the event tag of the to-be-selected segment are respectively matched with the person tag and the event tag of the target sub-text. Therefore, for each target combination in the to-be-selected combinations, a target playing link can be added to the target sub-text in the target combination in the target text, the target playing link indicating that the to-be-processed video is played from the initial playing time, in the to-be-processed video, of the to-be-selected segment in the target combination.
Based on this, by applying the scheme provided by the embodiment of the present invention, the text can be divided into a plurality of sub-texts according to the text content of the text for describing the video, and a play link of the video picture corresponding to the sub-text at the start play time in the complete video is added to each sub-text. Therefore, when the user is interested in the video content corresponding to the sub-text in the text, the video content corresponding to the sub-text which is interested by the user can be directly positioned in the complete video corresponding to the text content through the playing link corresponding to the sub-text, and the complete video is played from the initial playing time of the video content in the complete video, so that the video positioning efficiency is improved, and further, the video watching rate is improved.
Optionally, in a specific implementation manner, the second tag determining module 420 is specifically configured to:
dividing a target text for describing the video to be processed based on a preset separator to obtain each sentence in the target text;
performing word segmentation processing on each sentence to obtain at least one keyword in the sentence;
and sequentially carrying out keyword category identification on each keyword according to the sequence of the position of each keyword in the target text from front to back, dividing sentences to which the first keywords and the second keywords belong into one sub-text when each first keyword related to a character and each second keyword related to an event are identified, and respectively determining the first keywords and the second keywords as character tags and event tags of the sub-texts to obtain second tags of the sub-texts.
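The keyword-driven division into sub-texts performed by the second tag determining module can be sketched as follows. The keyword classifiers `is_person` and `is_event` and the whitespace word segmentation are simplifying assumptions; the patent does not specify how keyword categories are identified or how word segmentation is implemented.

```python
def split_into_sub_texts(sentences, is_person, is_event):
    """Scan keywords sentence by sentence in order; once at least one
    person keyword and one event keyword have been seen, close the
    current run of sentences as one sub-text and record the keywords
    as its person tag and event tag (the second label)."""
    sub_texts, current, persons, events = [], [], set(), set()
    for sentence in sentences:
        current.append(sentence)
        for word in sentence.split():  # naive stand-in for word segmentation
            if is_person(word):
                persons.add(word)
            elif is_event(word):
                events.add(word)
        if persons and events:  # both keyword types identified
            sub_texts.append({"text": " ".join(current),
                              "person_tag": sorted(persons),
                              "event_tag": sorted(events)})
            current, persons, events = [], set(), set()
    return sub_texts
```

Sentences that never accumulate both keyword types are left unassigned in this sketch; the patent leaves their handling open.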
Optionally, in a specific implementation manner, the to-be-selected combination determining module 430 includes:
the calculation sub-module is used for calculating a matching degree group of each target sub-text and each segment to be selected aiming at each target sub-text; wherein each matching degree group comprises: the character label and the event label of the target sub-text are respectively matched with the character label and the event label of a segment to be selected by a first matching degree and a second matching degree;
and the to-be-selected combination determining submodule is used for calculating the matching degree of the target sub-text and each to-be-selected fragment based on each matching degree group aiming at each target sub-text, and determining the target sub-text and the to-be-selected fragment of which the matching degree with the label of the target sub-text reaches the preset matching degree as the to-be-selected combination to obtain each to-be-selected combination.
Optionally, in a specific implementation manner, the calculating submodule is specifically configured to:
for each target sub-text, the following steps are performed:
determining a first number of keywords included in the character tag of the target sub-text and a second number of keywords included in the event tag of the target sub-text;
determining a third number of matched keywords in the character tag of the target sub-text and the character tag of the segment to be selected and a fourth number of matched keywords in the event tag of the target sub-text and the event tag of the segment to be selected for each segment to be selected;
calculating the ratio of the third quantity to a target sum value for each segment to be selected, and taking the ratio as a first matching degree of the character label of the target sub-text and the character label of the segment to be selected; calculating the ratio of the fourth quantity to the target sum value as a second matching degree of the event label of the target sub-text and the event label of the segment to be selected; wherein the target sum value is a sum of the first number and the second number.
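The ratios described in the calculation sub-module can be sketched as follows; set intersection is used here as an assumed notion of "matched keywords", and the tag representations are illustrative.

```python
def matching_degree_group(text_persons, text_events, seg_persons, seg_events):
    """First and second matching degrees as ratios over the target sum
    value (first number + second number of keywords in the target
    sub-text's tags), per the steps above."""
    total = len(text_persons) + len(text_events)        # target sum value
    first = len(set(text_persons) & set(seg_persons)) / total   # third number / sum
    second = len(set(text_events) & set(seg_events)) / total    # fourth number / sum
    return first, second
```

The pair returned for each (target sub-text, segment) is one matching degree group; the weighted combination of the two values gives the label matching degree.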
Optionally, in a specific implementation manner, the to-be-selected combination determining module 430 includes a selecting submodule, and the to-be-selected combination determining module 430 is specifically configured to:
for each target combination, the following operations are performed:
if the target combination comprises a segment to be selected, triggering the link adding module 440;
if the target combination comprises a plurality of fragments to be selected, triggering the selection submodule;
the selection submodule is used for selecting a target fragment from a plurality of fragments to be selected included in the target combination based on a preset selection mode, and adding a specified playing link related to the target fragment to a target sub-text in the target combination in the target text; the appointed playing link is used for indicating that the video to be processed is played from the starting playing time of the target segment in the video to be processed.
Optionally, in a specific implementation manner, the selecting submodule is specifically configured to:
acquiring a selection instruction, and determining a segment to be selected indicated by the selection instruction as a target segment; or determining the to-be-selected segment with the maximum label matching degree with the target sub-text included in the target combination as the target segment from the to-be-selected segments included in the target combination.
Corresponding to the video processing method provided by the embodiment of the invention, the embodiment of the invention also provides a video processing device.
Fig. 5 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus includes the following modules:
a text determination module 510 for determining a target text; each target sub text in the target text is associated with a target playing link, and the target playing link associated with each target sub text is added based on the text processing method provided by the first aspect;
the video playing module 520 is configured to jump to a playing interface of the target video described in the target text and start playing the target video from a time indicated by the target playing link associated with the specified sub-text when a selection operation for the target playing link associated with the specified sub-text in each target sub-text is detected.
Based on this, by applying the scheme provided by the embodiment of the present invention, when the user is interested in the video content corresponding to the sub-text in the text, the video content corresponding to the sub-text which is interested by the user can be directly located in the complete video corresponding to the text content through the playing link corresponding to the sub-text, and the complete video is played from the initial playing time of the video content in the complete video, so that the video locating efficiency is improved, and further, the viewing rate of the video is improved.
Corresponding to the text processing method provided by the above embodiment of the present invention, an embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete mutual communication through the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to implement the steps of any of the text processing methods provided in the embodiments of the present invention when executing the program stored in the memory 603.
Corresponding to the video processing method provided by the above embodiment of the present invention, an embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702 and the memory 703 complete mutual communication through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the steps of any of the video processing methods provided in the embodiments of the present invention when executing the program stored in the memory 703.
The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the text processing methods in the above embodiments.
In yet another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the video processing methods in the above embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of any of the text processing methods of the above embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of any of the video processing methods of the above embodiments.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to be performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.
It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described with relative simplicity as they are substantially similar to method embodiments, where relevant only as described in portions of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A method of text processing, the method comprising:
dividing a video to be processed into a plurality of fragments to be selected according to the video content of the video to be processed, and determining a first label of each fragment to be selected; wherein the first tag comprises: person tags and event tags;
dividing the target text into a plurality of sub-texts according to the text content of the target text for describing the video to be processed, and determining a second label of each sub-text; wherein the second tag comprises: person tags and event tags;
performing label matching on the first label of each segment to be selected and the second label of each target sub-text in the plurality of sub-texts to obtain each combination to be selected; each combination to be selected comprises at least one fragment to be selected and a target sub-text, and the character tag and the event tag of the fragment to be selected, which are included in each combination to be selected, are respectively matched with the character tag and the event tag of the target sub-text included in the combination to be selected;
adding a target playing link of a segment to be selected in the target combination to the target sub-text in the target combination aiming at each target combination in the combinations to be selected; the target playing link is used for indicating that the video to be processed is played from the initial playing time of the segment to be selected in the target combination in the video to be processed.
2. The method according to claim 1, wherein the dividing a target text for describing the video to be processed into a plurality of sub-texts according to the text content of the target text, and determining a second tag of each sub-text, comprises:
dividing the target text for describing the video to be processed at preset separators to obtain the sentences of the target text;
performing word segmentation on each sentence to obtain at least one keyword of the sentence;
performing keyword category identification on the keywords in order of their positions in the target text, from front to back; each time a first keyword relating to a person and a second keyword relating to an event have both been identified, grouping the sentences to which the first keyword and the second keyword belong into one sub-text, and determining the first keyword and the second keyword as the person tag and the event tag, respectively, of that sub-text, thereby obtaining its second tag.
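Claim 2's division procedure can be sketched as follows. The `PERSON_WORDS`/`EVENT_WORDS` lexicons and whitespace word segmentation are stand-in assumptions for real keyword category identification, which the claim leaves unspecified:

```python
import re

# Assumed keyword-category lexicons; in practice these would come from a
# named-entity / event recognizer rather than fixed dictionaries.
PERSON_WORDS = {"alice", "bob"}
EVENT_WORDS = {"wedding", "duel"}

def split_sub_texts(target_text, separators=r"[.!?\u3002\uff01\uff1f]"):
    """Split at preset separators, segment each sentence into words, and
    scan keywords front to back; once a person keyword AND an event keyword
    have both been seen, the sentences they belong to form one sub-text,
    and the two keywords become its person tag and event tag."""
    sentences = [s.strip() for s in re.split(separators, target_text) if s.strip()]
    sub_texts, current, person, event = [], [], None, None
    for sentence in sentences:
        current.append(sentence)
        for word in sentence.lower().split():   # crude word segmentation
            if word in PERSON_WORDS and person is None:
                person = word
            elif word in EVENT_WORDS and event is None:
                event = word
        if person and event:
            sub_texts.append({"text": " ".join(current),
                              "person_tag": person, "event_tag": event})
            current, person, event = [], None, None
    return sub_texts
```

Sentences accumulate until both keyword categories have appeared, so a sub-text may span several sentences, as the front-to-back scan in the claim implies.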
3. The method according to claim 1, wherein the performing tag matching between the first tag of each segment to be selected and the second tag of each target sub-text among the plurality of sub-texts, to obtain combinations to be selected, comprises:
for each target sub-text, calculating a matching degree group between the target sub-text and each segment to be selected; wherein each matching degree group comprises a first matching degree and a second matching degree by which the person tag and the event tag of the target sub-text match, respectively, the person tag and the event tag of one segment to be selected;
for each target sub-text, calculating, based on the matching degree groups, a tag matching degree between the target sub-text and each segment to be selected, and determining the target sub-text together with each segment to be selected whose tag matching degree with the target sub-text reaches a preset matching degree as one combination to be selected, thereby obtaining the combinations to be selected.
4. The method according to claim 3, wherein the calculating, for each target sub-text, a matching degree group between the target sub-text and each segment to be selected comprises:
for each target sub-text, performing the following steps:
determining a first number of keywords included in the person tag of the target sub-text and a second number of keywords included in the event tag of the target sub-text;
determining, for each segment to be selected, a third number of keywords matched between the person tag of the target sub-text and the person tag of the segment to be selected, and a fourth number of keywords matched between the event tag of the target sub-text and the event tag of the segment to be selected;
calculating, for each segment to be selected, the ratio of the third number to a target sum as the first matching degree between the person tag of the target sub-text and the person tag of the segment to be selected, and the ratio of the fourth number to the target sum as the second matching degree between the event tag of the target sub-text and the event tag of the segment to be selected; wherein the target sum is the sum of the first number and the second number.
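The two ratios of claim 4 share a single denominator, the target sum. A direct transcription, assuming each tag is represented as a set of keywords:

```python
def matching_degree_group(sub_text, segment):
    """Claim 4's matching degree group: both ratios are taken over the
    total keyword count of the sub-text's person tag and event tag."""
    first_n = len(sub_text["persons"])                        # first number
    second_n = len(sub_text["events"])                        # second number
    third_n = len(sub_text["persons"] & segment["persons"])   # matched person keywords
    fourth_n = len(sub_text["events"] & segment["events"])    # matched event keywords
    total = first_n + second_n                                # target sum
    # (first matching degree, second matching degree)
    return third_n / total, fourth_n / total
```

Because both degrees are normalized by the same target sum, their sum is a natural candidate for the overall tag matching degree used in claim 3, though the claims do not fix that aggregation.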
5. The method according to claim 3, wherein the adding, for each target combination among the combinations to be selected, a target playing link to the target sub-text in the target combination comprises:
for each target combination, performing the following operations:
if the target combination comprises one segment to be selected, adding, to the target sub-text of the target combination within the target text, a target playing link for that segment to be selected;
if the target combination comprises a plurality of segments to be selected, selecting a target segment from the plurality of segments to be selected included in the target combination in a preset selection manner, and adding, to the target sub-text of the target combination within the target text, a designated playing link for the target segment; wherein the designated playing link indicates that the video to be processed is to be played from the starting playing time of the target segment within the video to be processed.
6. The method according to claim 5, wherein the selecting a target segment from the plurality of segments to be selected included in the target combination in a preset selection manner comprises:
acquiring a selection instruction, and determining the segment to be selected indicated by the selection instruction as the target segment; or,
determining, from the plurality of segments to be selected included in the target combination, the segment to be selected having the largest tag matching degree with the target sub-text included in the target combination as the target segment.
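The second branch of claim 6 is an argmax over the combination's segments. A minimal sketch, assuming each segment arrives already paired with its precomputed tag matching degree:

```python
def select_target_segment(scored_segments):
    """Claim 6's automatic branch: among one combination's
    (segment, tag_matching_degree) pairs, return the segment whose
    degree with the target sub-text is largest."""
    segment, _ = max(scored_segments, key=lambda pair: pair[1])
    return segment
```

The first branch (a user-issued selection instruction) would simply bypass this function and return the instructed segment directly.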
7. A video processing method, the method comprising:
determining a target text; wherein each target sub-text in the target text is associated with a target playing link, the target playing link associated with each target sub-text having been added by the text processing method of any one of claims 1 to 6;
upon detecting a selection operation on the target playing link associated with a specified sub-text among the target sub-texts, jumping to a playing interface of the target video described by the target text, and playing the target video from the moment indicated by the target playing link associated with the specified sub-text.
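One way claim 7's time-indexed jump could be realized is a playing link that carries the segment's starting playing time as a query parameter. The `t=` seconds convention below is an assumption, not something the claims specify:

```python
from urllib.parse import urlencode

def make_play_link(video_url, start_seconds):
    """Build a target playing link that tells the playing interface to
    start the video at the segment's starting playing time."""
    return f"{video_url}?{urlencode({'t': int(start_seconds)})}"

def add_link_to_sub_text(sub_text, link):
    """Associate the link with the sub-text, here as an HTML anchor
    (an assumed presentation; the claims only require an association)."""
    return f'<a href="{link}">{sub_text}</a>'
```

On selection, the player would parse `t` from the link and seek to that offset before starting playback.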
8. A text processing apparatus, characterized in that the apparatus comprises:
a first tag determining module, configured to divide a video to be processed into a plurality of segments to be selected according to the video content of the video to be processed, and to determine a first tag of each segment to be selected; wherein the first tag comprises a person tag and an event tag;
a second tag determining module, configured to divide a target text for describing the video to be processed into a plurality of sub-texts according to the text content of the target text, and to determine a second tag of each sub-text; wherein the second tag comprises a person tag and an event tag;
a combination determining module, configured to perform tag matching between the first tag of each segment to be selected and the second tag of each target sub-text among the plurality of sub-texts, to obtain combinations to be selected; wherein each combination to be selected comprises at least one segment to be selected and one target sub-text, and the person tag and the event tag of each segment to be selected included in a combination match, respectively, the person tag and the event tag of the target sub-text included in that combination;
a link adding module, configured to add, for each target combination among the combinations to be selected, to the target sub-text of the target combination within the target text, a target playing link for the segment to be selected in the target combination; wherein the target playing link indicates that the video to be processed is to be played from the starting playing time, within the video to be processed, of the segment to be selected in the target combination.
9. A video processing apparatus, characterized in that the apparatus comprises:
a text determining module, configured to determine a target text; wherein each target sub-text in the target text is associated with a target playing link, the target playing link associated with each target sub-text having been added by the text processing method of any one of claims 1 to 6;
a video processing module, configured to, upon detecting a selection operation on the target playing link associated with a specified sub-text among the target sub-texts, jump to a playing interface of the target video described by the target text and play the target video from the moment indicated by the target playing link associated with the specified sub-text.
10. An electronic device, characterized in that it comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 6 when executing the program stored in the memory.
11. An electronic device, characterized in that it comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of claim 7 when executing the program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored therein, which, when executed by a processor, implements the method steps of any one of claims 1 to 6.
13. A computer-readable storage medium, characterized in that a computer program is stored therein, which, when executed by a processor, implements the method steps of claim 7.
CN202211158473.5A 2022-09-22 2022-09-22 Text processing method, video processing method, device and electronic equipment Pending CN115408565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211158473.5A CN115408565A (en) 2022-09-22 2022-09-22 Text processing method, video processing method, device and electronic equipment


Publications (1)

Publication Number Publication Date
CN115408565A true CN115408565A (en) 2022-11-29

Family

ID=84165127




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination