CN112804580A - Video dotting method and device

Info

Publication number: CN112804580A
Authority: CN (China)
Prior art keywords: text, paragraph, sentence, determining, topic
Legal status: Granted
Application number: CN202011622535.4A
Other languages: Chinese (zh)
Other versions: CN112804580B (en)
Inventors: 董嘉文, 林轩, 徐文强, 陈龑豪, 张可尊, 吕毅, 王国强, 王宁, 张赣, 彭婷, 李警卫, 彭业飞, 肖永龙
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority application: CN202011622535.4A
Granted publication: CN112804580B
Legal status: Active


Classifications

    • H04N 21/4334: selective content distribution; client-side content storage; recording operations
    • G06F 40/211: handling natural language data; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/253: handling natural language data; grammatical analysis; style critique
    • G06F 40/30: handling natural language data; semantic analysis
    • G06V 20/41: scenes and scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • H04N 21/439: client-side processing of audio elementary streams
    • H04N 21/4884: data services, e.g. news ticker, for displaying subtitles


Abstract

This specification discloses a video dotting method and device. The method comprises: extracting audio from a target video to be dotted and converting the audio into corresponding text; dividing the text into a plurality of text paragraphs; determining a paragraph title for each text paragraph; and determining dotting positions of the target video based on the divided text paragraphs so as to divide the target video into a plurality of video segments, and adding to each video segment the paragraph title of its corresponding text paragraph, thereby dotting the target video.

Description

Video dotting method and device
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a method and an apparatus for dotting a video.
Background
In the related art, a video can be dotted so that, when watching it, a user can quickly locate the content he or she wants to watch directly from the dotting result. At present, dotting is generally performed manually by operators, which is very inefficient.
Disclosure of Invention
In view of the above, the present specification provides a method and an apparatus for video dotting.
Specifically, this specification is implemented through the following technical solutions:
a method of video dotting, comprising:
extracting audio from a target video to be dotted, and converting the audio into a corresponding text;
dividing the text into a plurality of text paragraphs;
determining paragraph titles of the text paragraphs;
determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize dotting of the target video.
An apparatus for video dotting, comprising:
the text acquisition unit extracts audio from a target video to be dotted and converts the audio into a corresponding text;
the text dividing unit is used for dividing the text into a plurality of text paragraphs;
a title determination unit that determines a paragraph title of each text paragraph;
and the video dotting unit is used for determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize the dotting of the target video.
An apparatus for video dotting, comprising:
a processor;
a memory for storing machine executable instructions;
wherein, by reading and executing machine-executable instructions stored in the memory that correspond to video dotting logic, the processor is caused to:
extracting audio from a target video to be dotted, and converting the audio into a corresponding text;
dividing the text into a plurality of text paragraphs;
determining paragraph titles of the text paragraphs;
determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize dotting of the target video.
One embodiment of the present specification makes it possible to extract audio from a target video, convert the audio into text, divide the text into a plurality of text paragraphs, determine a paragraph title for each text paragraph, determine dotting positions based on the text paragraphs, divide the target video into a plurality of video segments according to those positions, and add to each video segment the paragraph title of its corresponding text paragraph, thereby dotting the target video automatically.
With this method, the target video can be dotted automatically: the whole process requires no manual operation, so dotting efficiency can be greatly improved. In addition, because the method divides the text into paragraphs based on semantics and then performs dotting based on those paragraphs, dotting accuracy can also be improved.
Drawings
FIG. 1 is a flow chart diagram illustrating a method of video dotting in an exemplary embodiment of the present description;
FIG. 2 is a flowchart illustrating a method for splitting a text passage according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for merging paragraphs of text in accordance with an exemplary embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a method for determining a section heading according to an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another video dotting method shown in an exemplary embodiment of the present description;
FIG. 6 is a schematic diagram of a page displaying a dotting result to a user, according to an exemplary embodiment of the present specification;
FIG. 7 is a hardware block diagram of a server in which a video dotting apparatus according to an exemplary embodiment of the present disclosure is located;
fig. 8 is a block diagram of a video dotting apparatus according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numerals in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms, which are only used to distinguish one type of information from another. For example, without departing from the scope of the present specification, first information may also be referred to as second information, and similarly, second information may be referred to as first information. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
With the development of technology, various resources such as movies, television dramas, news, broadcasting, games, education, and the like can be displayed and shared in the form of videos, and the videos become an indispensable part of the life of people. At present, in order to improve the experience of a user watching a video, the video can be dotted, that is, a playing time period of some key contents in the video is marked and corresponding content introduction is added, so that the user can quickly locate the content that the user wants to watch directly based on a dotted result when watching the video.
However, existing dotting methods generally rely on manual operation: an operator has to watch the video, drag the time axis back and forth many times to find the playing moments to mark, and then add the corresponding content introductions. This not only wastes time and is inefficient; key video content may also be missed while dragging the time axis, and the accuracy of the dotting positions is not high enough to meet actual requirements.
This specification provides a video dotting method that can determine video clips with different topics and automatically determine their titles based on the text paragraphs obtained by converting the video. The whole process requires no manual operation, so dotting efficiency can be greatly improved; dotting accuracy can also be improved, and omission of key video content avoided.
Referring to fig. 1, fig. 1 is a flowchart illustrating a video dotting method according to an exemplary embodiment of the present disclosure.
The method can be applied to an electronic device with a memory and a processor, such as a server or a server cluster. The method comprises the following steps:
step 102, extracting audio from a target video to be dotted, and converting the audio into a corresponding text;
step 104, dividing the text into a plurality of text paragraphs;
step 106, determining paragraph titles of each text paragraph;
step 108, determining the dotting positions of the target video based on the divided text paragraphs so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments, thereby dotting the target video.
The above-described process is explained below.
In this embodiment, a target video to be dotted may be obtained first. The target video may be any of various types of videos, such as a movie, a TV drama, a broadcast video, a news video, or a game video; this specification places no special limit on video content or duration.
Audio may be extracted from the target video; for specific methods, refer to the related art, which is not described in detail here. The audio may then be converted into corresponding text, for example by converting the speech through ASR (Automatic Speech Recognition).
Moreover, since the text is converted from speech, each word in the text can carry a timestamp corresponding to a playing moment of the target video. For example, if a sentence is "how are you", the timestamp of "how" may be 0 seconds (meaning the moment the target video plays to 0 seconds; the same below), that of "are" 0.1 seconds, and that of "you" 0.2 seconds. This example is merely illustrative for ease of understanding; in practical applications the timestamps may be more fine-grained, such as a timestamp for each initial and final (the consonant and vowel components of a syllable), e.g. separate timestamps for the "n" and "i" of "ni" ("you"). On this basis, timestamps of different granularities can be obtained, such as the timestamp of a word, of a sentence, or of a paragraph; for example, the timestamp of a sentence may be the timestamp of the first initial or final of its first word, which is not further exemplified here. The target video may then be dotted based on these timestamps.
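For illustration only (this sketch is not part of the patent disclosure), the following Python snippet shows one way word-level ASR timestamps might be rolled up into sentence- and paragraph-level timestamps. The transcript structure and field names are assumptions.

    # A minimal sketch, assuming the ASR step returns word-level timestamps.
    transcript = [
        {"word": "how", "start": 0.0},
        {"word": "are", "start": 0.1},
        {"word": "you", "start": 0.2},
    ]

    def sentence_timestamp(sentence_words):
        # Mirrors the rule above: a sentence's timestamp is that of its
        # first word (or of the first word's first initial/final).
        return sentence_words[0]["start"]

    def paragraph_span(paragraph_words):
        # Playing time period of a paragraph: first word's timestamp
        # through last word's timestamp.
        return paragraph_words[0]["start"], paragraph_words[-1]["start"]

    print(sentence_timestamp(transcript))   # 0.0
    print(paragraph_span(transcript))       # (0.0, 0.2)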
In this embodiment, the text may be segmented into sentences using NLP (Natural Language Processing) techniques, and the sentences may then be grouped into several text paragraphs based on semantics, so that sentences belonging to the same topic fall into the same text paragraph. For example, if a news video consists of 5 parts (host introduction, news item 1, news item 2, news item 3, and host summary), the text obtained from the video may be divided into 5 text paragraphs corresponding to those parts. Of course, this example is only schematic; when text paragraphs are actually divided, it is not known in advance which topics the target video contains.
In this embodiment, after the text paragraphs are obtained by division, the paragraph title of each text paragraph may also be determined. For example, keywords of each text paragraph may be extracted and the paragraph title determined based on them. The specific methods are described in detail in the embodiments below.
After the text paragraphs and the corresponding paragraph titles are determined, the playing time period corresponding to each text paragraph can be obtained. For example, a timestamp of the first word and a timestamp of the last word in each text paragraph may be obtained, and a playing time period corresponding to the text paragraph may be determined based on the timestamps. And then, the dotting position of the target video can be determined according to the playing time period of each text paragraph, so that the target video is dotted according to the dotting position, and a corresponding paragraph title is added to each video segment marked by the dotting position, thereby realizing the automatic dotting of the target video.
It should be noted that although the step 108 describes "dividing the target video into several video segments", the division is logical, and the video segments are not really split from the target video but only marked with playing moments according to which the video segments are divided. Of course, in other embodiments, the target video may be actually split into individual video segments according to the dotting position for other services, for example, the user may download the video segments.
As can be seen from the above description, in one embodiment of this specification, audio may be extracted from a target video and converted into text, the text divided into several text paragraphs, a paragraph title determined for each, dotting positions determined based on the text paragraphs, the target video divided into several video segments according to those positions, and the corresponding paragraph title added to each segment, thereby dotting the target video automatically.
With this method, the target video can be dotted automatically, the whole process requires no manual operation, key video content is not omitted as it can be in manual dotting, and dotting efficiency can be greatly improved. In addition, because the target video is converted into text and video segments with different topics are marked based on the differing semantics of the paragraphs in that text, the dotting process can attend to the audio content of the video rather than to the video frames, so more accurate dotting results can be obtained for videos whose frames change little.
The video dotting method provided by this specification is described below in four parts: text paragraph splitting, text paragraph optimization, paragraph title determination, and paragraph category determination.
First, text paragraph splitting
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for splitting a text paragraph according to an exemplary embodiment of the present disclosure. The method comprises the following steps:
step 202, acquiring a text obtained by converting a target video;
step 204, determining a plurality of topic sentences from the text;
step 206, taking each topic sentence as a starting sentence of a text paragraph, and dividing the text into a plurality of text paragraphs.
In this embodiment, several topic sentences may be extracted from the text converted from the target video. A topic sentence is a sentence that introduces a topic. For example, "let us chat about today's news" is a topic sentence: after it is spoken, the content that follows will mainly concern "news", so the topic it introduces is "news".
Each extracted topic sentence can be taken as the starting sentence of a text paragraph, so that the text is divided into several text paragraphs. Of course, the topic sentence may instead sit at another position in the paragraph, such as the second sentence, the third sentence, or a middle sentence; this specification does not limit this.
The topic sentences can be determined from the text by the following method:
in one example, after the text is divided into sentences, each sentence may be input into a topic sentence identification model, and the topic sentence identification model may output a probability that each sentence is a topic sentence or not, or of course, whether each sentence is a topic sentence or not, and determine whether the sentence is a topic sentence based on an output result of the topic sentence identification model. The topic sentence identification model can be a binary classification model, the training samples can be a large number of sentences collected in advance, and the labels of the sentences can be 'topic sentences' or 'non-topic sentences'. The sentences for training can be collected according to the types of the target videos, for example, if the target videos to be doted are film and television works such as TV dramas and movies, a large number of lines can be mainly collected; if the target video to be doted is a news video or a broadcast video, sentences related to the news and the broadcast can be mainly collected.
In another example, the target video may also carry comments, which may be real-time comments tied to the playing moment, such as a bullet screen (danmu). Topic sentences in the text may be determined based on these comments.
For example, when a target video enters a new topic, the number of user comments may surge, and this can be used to judge whether a sentence is a topic sentence: for each sentence in the text, determine the playing time period corresponding to the sentence, obtain the number of comments within that period, and judge whether that number reaches a count threshold. If so, the sentence is determined to be a topic sentence; if not, it is not. The count threshold may be set according to the application scenario.
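A sketch of this comment-count heuristic is shown below; the comment record layout and the threshold value are illustrative assumptions, not part of the disclosure.

    def is_topic_sentence_by_comments(sentence_span, comments, count_threshold=50):
        # sentence_span: (start_sec, end_sec) playing period of the sentence;
        # comments: iterable of (timestamp_sec, text) real-time comments.
        start, end = sentence_span
        n = sum(1 for t, _ in comments if start <= t <= end)
        return n >= count_threshold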
In another example, each sentence in the text may be matched with a topic sentence template, and the sentence matching the topic sentence template may be used as the topic sentence.
The topic sentence template can take various forms. For example, a topic sentence template may include topic words; if a sentence also includes those topic words, the sentence may be determined to match the template. A topic word may be, for example, "news" in the foregoing example, or another word capable of representing a topic; this embodiment places no special limit on it. Of course, this example is merely illustrative; in other examples the topic sentence template may further constrain the sentence pattern (e.g. question or statement), the number of words in the sentence, the grammatical structure of the sentence (e.g. subject-predicate structure, verb-object structure, verb-complement structure), the number of topic words the sentence contains, and so on.
A topic sentence template provided in this specification is described below as an example. The template may include a first known template word and a first unknown topic word. The first known template word is preset and fixed, and may be a word that can introduce a topic once it forms a specified grammatical relation with the first unknown topic word. The first unknown topic word is not preset or fixed, as long as it can form that specified grammatical relation with the first known template word. For example, topic sentence templates may take the following forms:
topic sentence template 1: (chat) [ xxx ];
topic sentence template 2: (today) [ xxx ] (how);
topic sentence template 3: (how to look at) [ xxx ] and [ xxx ].
In these topic sentence templates, the words in parentheses () are the first known template words, and the placeholders in square brackets [ ] are the first unknown topic words. The first known template words are preset, such as "chat", "today", and "how to look at" in the above examples. The first unknown topic word is not preset; it is written as "xxx" above to indicate that the topic word is unknown and can be a topic word of any type, such as "news", "clothing", or "science and technology". The first unknown topic word may be a noun, verb, or phrase, such as "running", "flying a plane", "Double Eleven shopping festival", or "occupying a seat"; this specification does not limit this.
A first specified grammatical relation exists between the first known template word and the first unknown topic word, such as a subject-predicate relation, a verb-object relation, or a verb-complement relation. In topic sentence template 1, for example, the first known template word and the first unknown topic word have a verb-object relation. Although the grammatical relation between the two is described here, when matching a sentence against a topic sentence template, the sentence's grammatical relations may be analyzed first and then matched, or matching may be done directly according to word positions in the sentence without grammatical analysis; this specification places no special limit on this.
One topic sentence template may include a plurality of first known template words (such as the topic sentence template 2) or may include a plurality of first unknown topic words (such as the topic sentence template 3). And the topic sentence template can also be generalized to the following form:
topic sentence template 4: (let us chat/talk to everyone) (xxx);
topic sentence template 5: (xxx) (xxx)/(how like/recently how like/how);
topic sentence template 6: (how/how) to treat (xxx) and (xxx);
topic sentence template 7: there are (what/what) (better/better) (xxx).
Whether a sentence matches a topic sentence template can be determined as follows. The sentence may be matched against each topic sentence template one by one: judge whether the first known template word of the template appears in the sentence; if so, further judge whether the sentence contains a word standing in the first specified grammatical relation to that template word (i.e. a first unknown topic word); if so, the sentence is determined to match the template.
For example, suppose a sentence is "let us chat about the weather next". The sentence may be matched against each topic sentence template. When matching against topic sentence template 1 above, the first known template word "chat" is found in the sentence, the topic word "weather" follows it, and the grammatical relation between "chat" and "weather" is the same as the first specified grammatical relation of template 1, so the sentence is determined to match template 1.
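A simplified sketch of this matching follows, taking the position-based shortcut the text allows (matching word positions rather than doing a full grammatical analysis). The pattern is an illustrative English stand-in for the actual template, not part of the disclosure.

    import re

    TOPIC_TEMPLATES = {
        # (chat) [xxx], approximated positionally
        "topic sentence template 1": re.compile(r"chat about (?:the )?(?P<topic>\w+)"),
    }

    def match_topic_sentence(sentence):
        for name, pattern in TOPIC_TEMPLATES.items():
            m = pattern.search(sentence)
            if m:
                return name, m.group("topic")  # matched template and topic word
        return None

    print(match_topic_sentence("let us chat about the weather next"))
    # -> ('topic sentence template 1', 'weather')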
Of course, the above examples are merely illustrative. In other examples, another method may be used to extract topic sentences from the text, or the above methods may be combined. For example, the text may first be input into the topic sentence identification model to identify a batch of candidate topic sentences. Since these may not be accurate enough (some non-topic sentences may be identified as topic sentences), the candidates may then be matched against the topic sentence templates and the unmatched ones filtered out. Then, combining the comments of the target video, judge whether the number of comments in the playing time period of each remaining candidate exceeds the count threshold; if so, confirm it as a topic sentence. If not, it can be filtered out, or several candidate sentences having a context relationship with it can be obtained and the same comment-count test applied to them; candidates that pass are taken as topic sentences.
Second, text paragraph optimization
In this embodiment, after the text paragraphs are obtained by division, the text paragraphs obtained by division may be optimized, so that each text paragraph is more accurate.
In one example, semantically close text paragraphs may be merged. Referring to fig. 3, the merging method may include the following steps:
step 302, determining neighboring paragraphs of a text paragraph;
step 304, judging whether the text paragraph and the neighbor paragraph accord with a merging condition;
step 306, if yes, merging the text paragraph and the neighbor paragraph.
The neighbor paragraphs may be the preceding N paragraphs and/or the following N paragraphs that have a context relationship with the text paragraph, where N is a preset integer greater than or equal to 1. Whether a text paragraph and its neighbor paragraphs meet the merge condition can be judged as follows:
for example, whether the merging condition is satisfied may be determined based on the similarity. The text paragraphs and their neighbors may be converted into vectors, and then the similarity between the vectors is calculated using the cosine theorem. If the similarity is greater than the similarity threshold, determining that the text paragraph and the neighboring paragraphs thereof meet the merging condition; and if the similarity is smaller than the similarity threshold, determining that the text paragraph and the adjacent paragraph thereof do not accord with the merging condition. Of course, other methods may be adopted to calculate the similarity, and this description is not given here by way of example.
As another example, whether the merge condition is met may be determined based on the playing duration corresponding to the paragraphs. Some divided text paragraphs may correspond to a short playing duration in the target video, which may mean that while the target video was covering one topic, another topic was suddenly inserted, splitting an originally complete topic into several text paragraphs; such text paragraphs may then be merged. Meanwhile, to prevent the merged text paragraph from corresponding to an overly long playing duration, a duration threshold can be set so that the merged duration stays within a reasonable range.
Specifically, a first playing duration of the video clip corresponding to the text paragraph in the target video may be obtained, a second playing duration of the video clip corresponding to the neighbor paragraph obtained, and the sum of the two calculated. If the sum is smaller than the duration threshold, the text paragraph and its neighbor paragraph are determined to meet the merge condition; otherwise, they do not. For how neighbor paragraphs are determined, refer to the foregoing embodiments; details are not repeated here.
It should be noted that, although the video clip corresponding to a text paragraph is mentioned here, the clip is determined from the text paragraph's timestamps and is a logical video clip; the target video is not actually split in this step.
In other examples, other methods may also be used to judge whether the merge condition is met, or the two methods above may be combined: the merge condition is met only when the similarity is greater than the similarity threshold and the summed duration is smaller than the duration threshold.
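A sketch of this combined test follows, reusing the cosine_similarity helper from the sketch above. The paragraph objects, their vector/duration attributes, and both thresholds are hypothetical.

    def meets_merge_condition(para, neighbor,
                              sim_threshold=0.8, duration_threshold=600.0):
        similar = cosine_similarity(para.vector, neighbor.vector) > sim_threshold
        short_enough = (para.duration + neighbor.duration) < duration_threshold
        return similar and short_enough  # merge only when both hold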
In this example, similar text paragraphs are merged, so that similar subject contents can be finally divided into the same video segment, and accuracy of subsequent dotting on the target video is improved.
In another example, the validity of each divided text paragraph may be analyzed according to validity rules, and invalid paragraphs filtered out.
For example, different valid keywords and invalid keywords may be set based on the content and type of the target video: valid keywords are words related to that content and type, and invalid keywords are unrelated words, such as words about small talk, advertisements, or interaction. For instance, when the target video is a live video whose content relates to financial services, the valid keywords may be financial terms such as "fund", "stock", "interest", "profit", "loan"; the invalid keywords may be words from the host chatting with users, such as "today", "weather", "mood", or interaction-related words such as "welcome", "hello", "red envelope". Each text paragraph can then be checked for valid and invalid keywords: if the number of valid keywords reaches a count threshold, the paragraph is determined to be valid; if the number of invalid keywords reaches the count threshold, it is determined to be invalid.
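A sketch of this keyword rule is shown below; the keyword sets and count threshold are placeholders, which in practice would be curated per video type as described above.

    VALID_KEYWORDS = {"fund", "stock", "interest", "profit", "loan"}
    INVALID_KEYWORDS = {"weather", "mood", "welcome", "hello", "red envelope"}

    def paragraph_validity(words, count_threshold=3):
        valid_hits = sum(1 for w in words if w in VALID_KEYWORDS)
        invalid_hits = sum(1 for w in words if w in INVALID_KEYWORDS)
        if valid_hits >= count_threshold:
            return "valid"
        if invalid_hits >= count_threshold:
            return "invalid"
        return "undecided"  # fall back to other signals (model, comments)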
As another example, a text passage may also be input to a validity model, which may output the text passage as valid or invalid. Of course, a probability of validity or invalidity may be output, based on which it is determined whether a text passage is valid. The training sample of the validity model may be a large number of sentences collected in advance, and the labels of the sentences may be "valid" or "invalid". The sentences for training can be collected according to the type of the target video to be dotted, for example, if the target video to be dotted is a movie and television work such as a television play, a movie and the like, a large number of lines can be mainly collected; if the target video to be doted is a news video or a broadcast video, sentences related to the news and the broadcast can be mainly collected.
For another example, whether a paragraph is valid may also be determined from a comment of the target video. For example, comments in a playing time period corresponding to each text paragraph can be obtained, and if the number of the comments is greater than a number threshold, the text paragraph is determined to be valid; otherwise, the text passage is determined to be invalid. For another example, comments in the playing time period corresponding to each text paragraph can be obtained, whether the number and the proportion of preset keywords appearing in the comments reach corresponding threshold values is judged, and if yes, the text paragraph is determined to be valid; otherwise, the text passage is determined to be invalid. The preset keywords may be the same as or different from the valid keywords/invalid keywords.
The above examples judge whether a whole text paragraph is valid; similar methods can also be used in this specification to judge whether each sentence in a text paragraph is valid, so as to filter out invalid sentences.
For example, since this embodiment divides text paragraphs by taking each topic sentence as the starting sentence of a paragraph, the last sentence of each paragraph is determined passively by the starting sentence of the next paragraph, and so is not necessarily accurate in practice. The ending part of each text paragraph (which may contain the last M sentences of the paragraph, with M preset) can therefore be checked for invalid sentences; any found are filtered out, yielding a more accurate ending sentence and, ultimately, more accurate video segments divided from the text paragraphs.
Third, paragraph title determination
In this embodiment, a paragraph title may also be determined for each text paragraph. The paragraph title can represent the main content described by the text paragraph; after the target video is dotted and several video segments are marked, the corresponding paragraph title can be added to each video segment as its title.
In one example, the title of a text paragraph may be determined according to the text content contained in the entire text paragraph, for example, each text paragraph may be input into a title model separately to obtain the paragraph title of the text paragraph. The training sample of the title model may be a large number of sentences collected in advance, and the labels of the sentences may be "title" or "not title". The sentences for training may also be collected according to the type of the target video to be doted, and reference may be made to the foregoing embodiment specifically, and details are not repeated here.
In another example, paragraph titles may also be determined based on the topic sentences of a text paragraph.
For example, topic words may be extracted from topic sentences, where a topic word is a word that can express one topic, and the topic word may be directly used as a paragraph title.
For another example, the paragraph titles may be determined according to the title templates.
Referring to fig. 4, the method for determining a paragraph title may include the steps of:
step 402, extracting topic words from sentences matching the topic sentence template;
step 404, obtaining a topic sentence template matched with the sentence;
step 406, obtaining a title template corresponding to the topic sentence template, wherein the title template comprises a second known template word and a second unknown topic word, and the second known template word and the second unknown topic word have a second specified grammatical relation;
step 408, determining the second unknown topic word in the title template as the topic word extracted from the sentence, and obtaining a paragraph title of a text paragraph corresponding to the sentence.
The above steps are explained in detail below:
as described in the foregoing embodiment, the topic sentence template may include the first known template word and the first unknown topic word. Similarly, a second known template word and a second unknown topic word may be included in the title template. The second known template word may be a summary of the first known template word, for example, the first known template word may be "chat", "let us chat", "speak next", and then the second known template word may be "chat". And a second known template word and a second unknown topic word in the title template have a second specified grammatical relation, and the second specified grammatical relation can also be a main-meaning relation, a moving-guest relation, a moving-complement relation and the like. Moreover, there is a correspondence between the title template and the sentence template. One title template may correspond to only one topic sentence template, or may correspond to a plurality of topic sentence templates. The first specified grammatical relationship in the title template may be the same as or different from the second specified grammatical relationship in the corresponding topic sentence template.
For example, for the topic sentence template 1 in the foregoing embodiment, there may be a corresponding title template a, as follows:
topic sentence template 1: (chat) [ xxx ];
title template a: (chat) [ xxx ].
Wherein, the chat in the title template is a second known template word, and the xxx is a second unknown topic word.
As another example, for the topic sentence template 2 in the foregoing embodiment, there may be a corresponding title template b and title template c, as follows:
topic sentence template 2: (today) [ xxx ] (how);
title template b: (how to) xxx;
title template c: (introduction) [ xxx ].
Of course, the above title template b and title template c may be summarized in the following form:
(how to see/introduce) [ xxx ].
In this example, one topic sentence template may correspond to a plurality of title templates. The desired title template may be selected according to the actual situation.
In this example, for each topic sentence, the topic sentence template matching it is determined, topic words are extracted from the topic sentence based on that template, the title template corresponding to the topic sentence template is found, and the extracted topic words are substituted for the second unknown topic word in the title template, yielding the paragraph title of the text paragraph containing the topic sentence.
Taking topic sentence template 1 and title template a above as an example, suppose a topic sentence is "let us chat about funds next". The sentence contains the first known template word "chat" of topic sentence template 1, it contains the topic word "funds", and the grammatical relation between the topic word and the template word is the same as the first specified grammatical relation between the first known template word and the first unknown topic word in template 1, so the topic sentence matches topic sentence template 1. The topic word "funds" is extracted from the sentence, the corresponding title template a is found, "funds" is substituted for the second unknown topic word in template a, and the paragraph title "chat about funds" is obtained.
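A sketch of this template correspondence follows, pairing with the match_topic_sentence sketch earlier: the topic word extracted via the matched topic sentence template fills the slot of the corresponding title template. The strings are illustrative English stand-ins.

    TITLE_TEMPLATES = {
        "topic sentence template 1": "chat about {topic}",  # i.e. title template a
    }

    def make_paragraph_title(matched_template_name, topic_word):
        return TITLE_TEMPLATES[matched_template_name].format(topic=topic_word)

    print(make_paragraph_title("topic sentence template 1", "funds"))
    # -> 'chat about funds'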
In another example, paragraph titles may also be determined based on the commentary of the target video.
For example, comments in a video time period corresponding to each text passage may be obtained, words with a high frequency of occurrence or preset keywords related to the subject and type of the target video may be extracted from the comments, and the passage title may be determined based on the words.
Of course, in other examples, other methods may be used to determine the title of each text paragraph, or the paragraph titles may be determined in combination with the above methods. For example, on the one hand, some primary titles may be obtained based on the title generation model, on the other hand, the topic sentence of each text paragraph may be matched with the title template to obtain other primary titles, and then the final paragraph title may be determined from these primary titles in combination with the comment of the target video.
Fourth, paragraph category determination
In this embodiment, in addition to the paragraph title, a paragraph category may be determined for each text paragraph. The paragraph category represents the type of the paragraph's content and can be a more concise, more general description than the paragraph title, for example a description unrelated to the subject of the target video, such as one related to the program flow or the chapter structure. Taking a live video introducing a financial product as the target video, Table 1 below shows example paragraph titles and paragraph categories for the live video.
TABLE 1 (rendered as images in the original publication; it lists example paragraph titles alongside their corresponding paragraph categories)
In one example, each text paragraph may be input into the category identification model, resulting in a category to which the text paragraph belongs. The training samples of the category identification model can be sentences of various types, the sample labels are standard categories, and the standard categories can be collected, induced and summarized in advance according to different service scenes.
In another example, each text paragraph may be matched with a standard category, and if the text paragraph is matched with the standard category, the standard category is determined as the paragraph category of the corresponding text paragraph. The determination method of the standard category here may be the same as the above example, and a standard category library may also be constructed, which is continuously enriched.
For example, a corresponding lexicon may be established for each standard category. For the standard category "product reading", for instance, the lexicon may include words such as "introduction", "fund", "profit"; if a text paragraph contains a sentence such as "let us introduce the xx fund next", it can be determined that words from the lexicon appear in the sentence, so the paragraph matches the standard category "product reading". Of course, this example is merely illustrative, and other methods may also be adopted; this embodiment does not enumerate them here.
In another example, comment information of the target video may also be obtained, and paragraph categories of the text paragraphs may be determined based on the comment information. For example, comments in a playing time period corresponding to each text paragraph may be obtained, whether preset keywords related to the target video theme exist in the comments is determined, and if yes, the paragraph category is determined based on the word with the higher frequency of occurrence.
As another example, the sentence patterns of the comments corresponding to each text paragraph may be analyzed; if question sentences appear frequently, the paragraph may be a user question-and-answer segment, and its paragraph category can be determined as "user question and answer".
Of course, the above examples are merely exemplary, and in other examples, other methods may be adopted to determine the paragraph category of the text paragraph, or the paragraph category of the text paragraph may be determined by combining the methods of the above examples.
After the text paragraphs have been obtained by division and the paragraph title and paragraph category of each have been determined, the target video can be dotted accordingly.
For example, the playing time period of each text paragraph in the target video may be obtained. This period may be determined by the timestamp of the first word and the timestamp of the last word in the text paragraph. The playing time periods may be marked in the target video, and the corresponding paragraph title and paragraph category then added to each marked video segment, thereby dotting the target video.
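A sketch of assembling the dotting entries from the finished paragraphs is shown below; the paragraph structure (words/title/category attributes) is hypothetical.

    def build_dotting_entries(paragraphs):
        entries = []
        for p in paragraphs:
            entries.append({
                "start": p.words[0]["start"],   # timestamp of the first word
                "end": p.words[-1]["start"],    # timestamp of the last word
                "title": p.title,
                "category": p.category,
            })
        return entries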
After the target video is doted, a dotting file can be generated, and a dotting position and a paragraph title and a paragraph category corresponding to the dotting position can be recorded in the dotting file, which refer to related technologies and are not described herein.
As can be seen from the above description, in one embodiment of this specification, after audio is extracted from the target video and converted into text, the text may be divided into several text paragraphs and the paragraph title and paragraph category of each determined, so that after the target video is dotted into several video segments, the corresponding title and category are added to each segment. A user can thus see at a glance the rough content of each video segment, improving the usage experience.
The following describes a video dotting method described in this specification, taking a target video as a live video as an example. Referring to fig. 5, fig. 5 is a schematic diagram illustrating another video dotting method according to an exemplary embodiment of the present disclosure. The method comprises the following steps:
step 502, extracting audio from a target live video to be dotted, and converting the audio into a corresponding text;
step 504, inputting the text into a topic sentence identification model to obtain a plurality of topic sentences;
step 506, filtering the topic sentences which do not match with the topic sentence templates;
step 508, taking each topic sentence as a starting sentence of a text paragraph, and dividing the text into a plurality of text paragraphs;
step 510, determining paragraph titles of each text paragraph;
and step 512, merging the text paragraphs with similarity greater than the similarity threshold with the adjacent paragraphs.
The specific methods in steps 502-510 refer to the foregoing embodiments, and are not described herein again.
In this embodiment, after merging a text paragraph with a similarity greater than a similarity threshold with its neighboring paragraphs, it is further necessary to determine the paragraph titles of the paragraphs obtained by merging.
For example, suppose text paragraph a has paragraph title 1 and its neighbor paragraph b has paragraph title 2. After the text paragraph and the neighbor paragraph are merged, the title of the merged paragraph needs to be determined.
In one example, the title of the merged paragraph may be determined from the original paragraph titles of the text paragraph and its neighboring paragraphs.
For example, the title of a paragraph with more text content in a text paragraph and a neighboring paragraph may be determined as the title of the combined paragraph. If text paragraph a has more text content than text paragraph b, paragraph heading 1 is selected as the heading for the merged paragraph.
As another example, the paragraph title with the higher accuracy may be selected. For instance, suppose paragraph title 1 and paragraph title 2 were generated by the title model, which predicted a score of 0.9 for title 1 and 0.7 for title 2; paragraph title 1 is then selected as the title of the merged paragraph.
As another example, the title may also be determined based on the similarity between the title and the content. The paragraph heading 1, the paragraph heading 2 and the combined text paragraph can be converted into vectors, then the similarity between the paragraph heading 1 and the combined text paragraph is calculated, the similarity between the paragraph heading 2 and the combined text paragraph is calculated, and the paragraph heading with high similarity is selected as the heading of the combined paragraph.
In another example, instead of selecting the original paragraph title, a paragraph title may be re-determined. For example, the combined text paragraphs may be entered into a heading model to obtain new paragraph headings. For another example, the topic sentence of the text passage obtained after merging may be identified, and a new title may be determined based on the topic sentence and the title template.
step 514, filtering invalid text paragraphs based on the validity rules;
In this embodiment, some words and phrases may be collected for the subject and the type of the target live video, so as to determine the validity of each text paragraph based on the words and phrases. For example, if the target live broadcast video is a video for introducing financial products, on one hand, some valid words and phrases in the financial industry can be collected, and if a certain number of valid words and phrases exist in a text passage, it can be determined that the text passage is valid. On the other hand, due to the particularity of the live video, the host may perform some interaction with the user (such as red packet sending, coupon sending, chat, and the like), and then may also collect invalid words and phrases related to the interaction, and if a certain number of these invalid words and phrases exist in the text passage, it may be determined that the text passage is invalid.
Step 516, filtering invalid sentences in the ending part of the text paragraphs based on the comments of the target video;
step 518, determining paragraph categories of each text paragraph;
step 520, determining a dotting position based on each text paragraph, and adding a corresponding paragraph title and paragraph category to each video segment marked by the dotting position;
step 522, displaying the marked video clips to the user in descending order of priority.
The specific methods in steps 516-520 refer to the foregoing embodiments, and are not described herein again.
In this embodiment, the priority of each marked video segment may also be calculated, and the video segments are presented to the user in the order from high to low in priority, so that the video segment with higher importance may be presented in front, and the user experience may be improved. Wherein the priority of each video segment can be calculated by the following method:
in one example, for each video segment, the paragraph contents and the paragraph titles of the corresponding text paragraphs may be converted into corresponding vectors, and then the similarity between the vectors is calculated based on the cosine theorem, so as to obtain a correlation factor, wherein the higher the value of the correlation factor is, the higher the priority is.
In another example, a ratio factor (the proportion of invalid sentences in the text paragraph corresponding to each video segment) may also be calculated; the larger the ratio factor, the lower the priority. Each sentence in the text paragraph can be matched against the invalid words and phrases, and a sentence that matches is determined to be an invalid sentence.
In another example, the playing time period of each video segment in the target video may be obtained and a time decay factor calculated; the higher its value, the higher the priority. Generally, the beginning and ending parts of a live video are used for warming up the audience and for summarizing, so they involve little core content; the core content usually appears in the middle of the video. That is, along the playing-time dimension the time decay factors of the video segments follow a Gaussian distribution, and the factor for a video segment can be read off the Gaussian curve according to its playing time period.
Of course, besides the above examples, other methods may be adopted to determine the priority of each video segment, or the above methods may be combined: for example, the ratio factor, correlation factor, and time decay factor may be summed or weighted-summed to obtain a combined priority factor, and the priority of each video segment determined from it.
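A sketch of one such combination follows; the weights, Gaussian parameters, and combination rule are all assumptions.

    import math

    def time_decay_factor(segment_midpoint, video_duration, sigma_ratio=0.25):
        # Gaussian over playing time: segments near the middle of the video
        # score highest, matching the distribution described above.
        mu = video_duration / 2.0
        sigma = video_duration * sigma_ratio
        return math.exp(-((segment_midpoint - mu) ** 2) / (2 * sigma ** 2))

    def priority(correlation, invalid_ratio, segment_midpoint, video_duration,
                 weights=(0.5, 0.2, 0.3)):
        w_corr, w_ratio, w_time = weights
        return (w_corr * correlation
                + w_ratio * (1.0 - invalid_ratio)  # larger ratio -> lower priority
                + w_time * time_decay_factor(segment_midpoint, video_duration))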
Fig. 6 is a schematic diagram of a page that presents the dotting result to the user.
As shown in fig. 6, each video segment generated by dotting may correspond to a card displayed in a floating manner at the bottom of the page, and the user can browse the cards by sliding. Each card may contain a paragraph title and a paragraph category, and after the user taps a card, playback automatically jumps to the video segment corresponding to that card. In addition, a card may include a business control related to the business type, such as the "go to buy" control in fig. 6; after tapping the control, the user jumps to a page for purchasing the financial product.
As can be seen from the above description, in the embodiments of this specification, the audio in a live video may be extracted and converted into text, the text may be divided into text paragraphs, and the live video may then be divided into segments based on those paragraphs. Titles can be determined for the resulting video segments, the segments can be displayed to the user by priority, and the user can tap the displayed content to jump directly to the part he or she wants to watch, which improves the user experience.
Corresponding to the foregoing embodiments of the video dotting method, the present specification also provides embodiments of a video dotting apparatus.
Embodiments of the video dotting apparatus can be applied to a server. The apparatus embodiments may be implemented by software, by hardware, or by a combination of the two. Taking software implementation as an example, the apparatus in the logical sense is formed by the processor of the server in which it is located reading the corresponding computer program instructions from non-volatile storage into memory and running them. In terms of hardware, fig. 7 is a structural diagram of the server in which the video dotting apparatus of this specification is located; besides the processor, memory, network interface, and non-volatile storage shown in fig. 7, the server may further include other hardware according to its actual functions, which is not described again.
Fig. 8 is a block diagram of a video dotting apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 8, the video dotting apparatus can be applied to the server shown in fig. 7 and includes: a text acquisition unit 810, a text dividing unit 820, a title determination unit 830, a category determining unit 840, a video dotting unit 850, a validity judging unit 860, a filtering unit 870, and a priority determining unit 880.
The text acquisition unit 810 extracts audio from the target video to be dotted and converts the audio into corresponding text;
the text dividing unit 820 divides the text into a number of text paragraphs;
the title determination unit 830 determines a paragraph title for each text paragraph;
and the video dotting unit 850 determines the dotting positions of the target video based on the divided text paragraphs so as to divide the target video into a number of video segments, and adds the paragraph title of the corresponding text paragraph to each video segment, thereby dotting the target video.
Optionally, the text dividing unit 820:
determining a plurality of topic sentences from the text;
and dividing the text into a plurality of text paragraphs by taking each topic sentence as a starting sentence of the text paragraphs.
Optionally, when the text dividing unit 820 determines a plurality of topic sentences from the text:
and inputting the text into a topic sentence identification model to obtain a plurality of topic sentences contained in the text.
Optionally, after the text is input into the topic sentence identification model to obtain a plurality of topic sentences included in the text, the text dividing unit 820 further:
matching each topic sentence with a preset topic sentence template;
and filtering out the topic sentences that do not match any topic sentence template.
Optionally, when the text dividing unit 820 determines a plurality of topic sentences from the text:
for each sentence in the text, matching the sentence with a preset topic sentence template;
and if the sentence is matched with any topic sentence template, determining that the sentence is a topic sentence.
Optionally, each topic sentence template includes a first known template word and a first unknown topic word, the first known template word having a first specified grammatical relationship with the first unknown topic word;
the process of judging whether a sentence matches a topic sentence template includes the following steps (a simplified sketch follows this list):
determining whether the first known template word is present in the sentence;
if so, judging whether the sentence contains a topic word having the first specified grammatical relationship with the first known template word;
and if so, determining that the sentence matches the topic sentence template.
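The sketch below illustrates this matching flow. A production system would verify the grammatical relationship with a dependency parser; here a regex approximates it, and the template itself ("introduce" followed by its object) is an invented example.

```python
# Minimal sketch of topic-sentence template matching.
import re

TEMPLATES = [
    # Known template word "introduce"; the unknown topic word is whatever
    # follows it as its object (approximated by a regex, not a parser).
    {"known": "introduce", "pattern": re.compile(r"introduce\s+(\w[\w\s]*)")},
]

def match_topic_sentence(sentence: str):
    """Return the extracted topic word if the sentence matches a template."""
    for tpl in TEMPLATES:
        if tpl["known"] not in sentence:
            continue
        m = tpl["pattern"].search(sentence)
        if m:
            return m.group(1).strip()
    return None

print(match_topic_sentence("next we introduce the gold fund"))
# -> "the gold fund"
```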
Optionally, the text dividing unit 820 further:
after determining that the sentence matches the topic sentence template, extracting the topic word from the sentence;
acquiring a title template corresponding to the topic sentence template matched with the sentence, wherein the title template includes a second known template word and a second unknown topic word, the second known template word having a second specified grammatical relationship with the second unknown topic word;
and setting the second unknown topic word in the title template to the topic word extracted from the sentence, so as to obtain the paragraph title of the text paragraph corresponding to the sentence (see the sketch after this list).
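Continuing the previous sketch, title generation can be approximated by filling the extracted topic word into the slot of the title template; the template string is an invented example.

```python
# Sketch of paragraph-title generation from a title template.
TITLE_TEMPLATES = {
    # topic sentence template id -> title template with a topic-word slot
    "introduce": "Introduction to {topic}",
}

def build_paragraph_title(template_id: str, topic_word: str) -> str:
    return TITLE_TEMPLATES[template_id].format(topic=topic_word)

print(build_paragraph_title("introduce", "the gold fund"))
# -> "Introduction to the gold fund"
```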
Optionally, the text dividing unit 820 further:
obtaining comments of the target video, wherein each comment corresponds to a certain playing time of the target video;
when the text dividing unit determines a plurality of topic sentences from the text:
for each sentence in the text, performing the following (a sketch follows this list):
acquiring the playing time period corresponding to the sentence;
counting the number of comments within the playing time period based on the playing times corresponding to the comments;
judging whether the number of comments is greater than a number threshold;
and if so, determining the sentence to be a topic sentence.
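A sketch of this comment-density test follows; the data shapes and the threshold of 20 comments are assumptions for illustration.

```python
# Sketch of comment-density topic detection: a sentence whose playing
# window attracts more comments than a threshold is treated as a topic
# sentence (e.g., the host just announced something of interest).
def is_topic_sentence(sentence_window: tuple[float, float],
                      comment_times: list[float],
                      threshold: int = 20) -> bool:
    start, end = sentence_window
    count = sum(1 for t in comment_times if start <= t <= end)
    return count > threshold
```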
Optionally, the text dividing unit 820 further:
determining neighbor paragraphs of the text paragraphs;
judging whether the text paragraph and the neighbor paragraph accord with a merging condition or not;
and if so, merging the text paragraph and the neighbor paragraph to split the target video based on the merged paragraph.
Optionally, when determining whether the text paragraph and the neighbor paragraph meet the merging condition, the text dividing unit 820:
calculating the similarity between the text paragraph and the neighbor paragraph;
and if the similarity is greater than a similarity threshold value, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
Optionally, when determining whether the text paragraph and the neighbor paragraph meet the merging condition, the text dividing unit 820:
acquiring the first playing time length, in the target video, of the video segment corresponding to the text paragraph;
acquiring the second playing time length, in the target video, of the video segment corresponding to the neighbor paragraph;
calculating the sum of the first playing time length and the second playing time length;
and if the sum is smaller than a duration threshold, determining that the text paragraph and the neighbor paragraph meet the merging condition (a combined sketch of both merging conditions follows).
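The two optional merging conditions can be sketched together as below; Jaccard word overlap stands in for the real similarity computation, and both thresholds are illustrative assumptions.

```python
# Sketch of paragraph merging: merge adjacent paragraphs that are similar
# enough, or whose combined playing duration is below a threshold.
def jaccard_similarity(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def should_merge(para_a: str, para_b: str,
                 dur_a_s: float, dur_b_s: float,
                 sim_threshold: float = 0.8,
                 dur_threshold_s: float = 60.0) -> bool:
    if jaccard_similarity(para_a, para_b) > sim_threshold:
        return True
    return (dur_a_s + dur_b_s) < dur_threshold_s
```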
Optionally, the apparatus further includes:
the validity judging unit 860 judges whether each text paragraph obtained by division is valid based on a validity rule;
the filtering unit 870 filters out invalid text paragraphs.
Optionally, the validity judging unit 860:
determining whether the text passage is valid based on valid keywords and/or invalid keywords.
Optionally, the validity judging unit 860:
and inputting the text paragraph into a validity model to obtain a prediction of whether the text paragraph is valid.
Optionally, the title determining unit 830:
the text paragraphs are input into a title model, and the titles of the text paragraphs are determined based on the prediction results of the title model.
Optionally, the title determining unit 830 further:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph titles of the text paragraphs based on the comments.
Optionally, the apparatus further includes:
the category determining unit 840 determines a paragraph category of each text paragraph, and determines the paragraph category as a segment category of the corresponding video segment.
Optionally, the category determining unit 840:
and respectively inputting each text paragraph into a category identification model to obtain the paragraph category of the text paragraph.
Optionally, the category determining unit 840:
matching each text paragraph with a standard paragraph category;
and determining the matched standard paragraph category as the paragraph category of the corresponding text paragraph.
Optionally, after determining the paragraph category of each text paragraph, the category determining unit 840 further:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph categories of the text paragraphs based on content of the comments.
Optionally, when the video dotting unit 850 determines the dotting position of the target video based on the text paragraphs obtained after the division:
acquiring a playing time period corresponding to each text paragraph;
and determining the dotting position of the target video based on the playing time period corresponding to each text paragraph.
Optionally, the playing time period corresponding to a text paragraph is determined based on the timestamp of the first word and the timestamp of the last word in the text paragraph, as in the sketch below.
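This timestamp-based derivation can be sketched as follows; the per-word dictionary shape is an assumed speech-recognition output format, and dotting at each paragraph's start time is one natural choice.

```python
# Sketch of deriving a paragraph's playing time period, and the dotting
# positions, from per-word speech-recognition timestamps.
def paragraph_time_period(words: list[dict]) -> tuple[float, float]:
    """words: [{'text': ..., 'start': seconds, 'end': seconds}, ...]"""
    return words[0]["start"], words[-1]["end"]

def dotting_positions(paragraphs: list[list[dict]]) -> list[float]:
    return [paragraph_time_period(p)[0] for p in paragraphs]
```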
Optionally, the apparatus further includes:
the priority determining unit 880 determines the priority of each divided video segment, so as to display the video segments to the user in descending order of priority.
Optionally, when determining the priority of each divided video segment, the priority determining unit 880:
for the text paragraph corresponding to each video segment, calculating the correlation between the paragraph content and the corresponding paragraph title;
and determining the priority of each video segment based on the correlation, the correlation being positively correlated with the priority.
Optionally, when determining the priority of each divided video segment, the priority determining unit 880:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
a priority of each text passage is determined based on the comments.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the foregoing embodiment of the video dotting method, this specification further provides a video dotting apparatus, including: a processor and a memory for storing machine executable instructions. Wherein the processor and the memory are typically interconnected by means of an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to video dotting logic, the processor is caused to:
extracting audio from a target video to be dotted, and converting the audio into a corresponding text;
dividing the text into a plurality of text paragraphs;
determining paragraph titles of the text paragraphs;
determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize dotting of the target video.
Optionally, when dividing the text into text paragraphs, the processor is caused to:
determining a plurality of topic sentences from the text;
and dividing the text into a plurality of text paragraphs by taking each topic sentence as a starting sentence of the text paragraphs.
Optionally, when a number of topic sentences are determined from the text, the processor is caused to:
and inputting the text into a topic sentence identification model to obtain a plurality of topic sentences contained in the text.
Optionally, after the text is input into the topic sentence identification model to obtain a plurality of topic sentences contained in the text, the processor is further caused to:
matching each topic sentence with a preset topic sentence template;
and filtering out the topic sentences that do not match any topic sentence template.
Optionally, when a number of topic sentences are determined from the text, the processor is caused to:
for each sentence in the text, matching the sentence with a preset topic sentence template;
and if the sentence is matched with any topic sentence template, determining that the sentence is a topic sentence.
Optionally, each topic sentence template includes a first known template word and a first unknown topic word, the first known template word having a first specified grammatical relationship with the first unknown topic word;
in the process of judging whether a sentence matches a topic sentence template, the processor is caused to:
determining whether the first known template word is present in the sentence;
if so, judging whether the sentence contains a topic word having the first specified grammatical relationship with the first known template word;
and if so, determining that the sentence matches the topic sentence template.
Optionally, the processor is further caused to:
after determining that the sentence matches the topic sentence template, extracting the topic words from the sentence;
acquiring a title template corresponding to the topic sentence template matched with the sentence, wherein the title template includes a second known template word and a second unknown topic word, the second known template word having a second specified grammatical relationship with the second unknown topic word;
and setting the second unknown topic word in the title template to the topic word extracted from the sentence, so as to obtain the paragraph title of the text paragraph corresponding to the sentence.
Optionally, the processor is further caused to:
obtaining comments of the target video, wherein each comment corresponds to a certain playing time of the target video;
upon determining a number of topic sentences from the text, the processor is caused to:
for each sentence in the text, performing the following:
acquiring a playing time period corresponding to the sentence;
counting the number of the comments in the playing time period based on the playing time corresponding to the comments;
judging whether the number of the comments is larger than a number threshold value;
and if so, determining the sentence as the topic sentence.
Optionally, before splitting the target video into video segments based on the divided text paragraphs, the processor is further caused to:
determining neighbor paragraphs of the text paragraphs;
judging whether the text paragraph and the neighbor paragraph accord with a merging condition or not;
and if so, merging the text paragraph and the neighbor paragraph to split the target video based on the merged paragraph.
Optionally, when determining whether the text passage and the neighbor passage meet the merging condition, the processor is caused to:
calculating the similarity between the text paragraph and the neighbor paragraph;
and if the similarity is greater than a similarity threshold value, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
Optionally, when determining whether the text passage and the neighbor passage meet the merging condition, the processor is caused to:
acquiring a first playing time length of a corresponding video clip of the text paragraph in the target video;
acquiring a second playing time length of the corresponding video clip of the neighbor paragraph in the target video;
calculating the sum of the first playing time length and the second playing time length;
and if the sum is smaller than a duration threshold, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
Optionally, before determining the dotting position of the target video based on the divided text paragraphs to divide the target video into video segments, the processor is further caused to:
for each text paragraph obtained by division, judging whether the text paragraph is valid based on a validity rule;
and filtering out invalid text paragraphs.
Optionally, when determining whether the text passage is valid based on a validity rule, the processor is caused to:
determining whether the text passage is valid based on valid keywords and/or invalid keywords.
Optionally, when determining whether the text passage is valid based on a validity rule, the processor is caused to:
and inputting the text paragraph into a validity model to obtain a prediction of whether the text paragraph is valid.
Optionally, in determining paragraph titles of the text paragraphs, the processor is caused to:
the text paragraphs are input into a title model, and the titles of the text paragraphs are determined based on the prediction results of the title model.
Optionally, after determining the paragraph titles of the text paragraphs, the processor is further caused to:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph titles of the text paragraphs based on the comments.
Optionally, after determining the paragraph titles of the text paragraphs, the processor is further caused to:
determining the paragraph category of each text paragraph, and determining the paragraph category as the segment category of the corresponding video segment.
Optionally, in determining a paragraph category for each text paragraph, the processor is caused to:
and respectively inputting each text paragraph into a category identification model to obtain the paragraph category of the text paragraph.
Optionally, in determining a paragraph category for each text paragraph, the processor is caused to:
matching each text paragraph with a standard paragraph category;
and determining the matched standard paragraph category as the paragraph category of the corresponding text paragraph.
Optionally, after determining the paragraph category for each text paragraph, the processor is further caused to:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph categories of the text paragraphs based on content of the comments.
Optionally, when determining the dotting position of the target video based on the text paragraphs obtained after the division, the processor is caused to:
acquiring a playing time period corresponding to each text paragraph;
and determining the dotting position of the target video based on the playing time period corresponding to each text paragraph.
Optionally, the playing time period corresponding to the text paragraph is determined based on a timestamp corresponding to a first word and a timestamp corresponding to a last word in the text paragraph.
Optionally, the processor is further caused to:
and determining the priority of each divided video segment, so as to display the video segments to the user in descending order of priority.
Optionally, when determining the priority of each divided video segment, the processor is caused to:
for the text paragraph corresponding to each video segment, calculating the correlation between the paragraph content and the corresponding paragraph title;
and determining the priority of each video segment based on the correlation, the correlation being positively correlated with the priority.
Optionally, when determining the priority of each divided video segment, the processor is caused to:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
a priority of each text passage is determined based on the comments.
In correspondence with the foregoing embodiments of the video dotting method, the present specification further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of:
extracting audio from a target video to be dotted, and converting the audio into a corresponding text;
dividing the text into a plurality of text paragraphs;
determining paragraph titles of the text paragraphs;
determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize dotting of the target video.
Optionally, the dividing the text into a plurality of text paragraphs includes:
determining a plurality of topic sentences from the text;
and dividing the text into a plurality of text paragraphs by taking each topic sentence as a starting sentence of the text paragraphs.
Optionally, the determining a plurality of topic sentences from the text includes:
and inputting the text into a topic sentence identification model to obtain a plurality of topic sentences contained in the text.
Optionally, after the step of inputting the text into the topic sentence identification model to obtain a plurality of topic sentences included in the text, the method further includes:
matching each topic sentence with a preset topic sentence template;
and filtering out the topic sentences that do not match any topic sentence template.
Optionally, the determining a plurality of topic sentences from the text includes:
for each sentence in the text, matching the sentence with a preset topic sentence template;
and if the sentence is matched with any topic sentence template, determining that the sentence is a topic sentence.
Optionally, each topic sentence template includes a first known template word and a first unknown topic word, the first known template word having a first specified grammatical relationship with the first unknown topic word;
the process of judging whether the sentence matches a topic sentence template includes:
determining whether the first known template word is present in the sentence;
if so, judging whether the sentence contains a topic word having the first specified grammatical relationship with the first known template word;
and if so, determining that the sentence matches the topic sentence template.
Optionally, the method further includes:
after determining that the sentence matches the topic sentence template, extracting the topic words from the sentence;
acquiring a title template corresponding to the topic sentence template matched with the sentence, wherein the title template includes a second known template word and a second unknown topic word, the second known template word having a second specified grammatical relationship with the second unknown topic word;
and setting the second unknown topic word in the title template to the topic word extracted from the sentence, so as to obtain the paragraph title of the text paragraph corresponding to the sentence.
Optionally, the method further includes:
obtaining comments of the target video, wherein each comment corresponds to a certain playing time of the target video;
the determining a plurality of topic sentences from the text comprises:
for each sentence in the text, performing the following:
acquiring a playing time period corresponding to the sentence;
counting the number of the comments in the playing time period based on the playing time corresponding to the comments;
judging whether the number of the comments is larger than a number threshold value;
and if so, determining the sentence as the topic sentence.
Optionally, before splitting the target video into a plurality of video segments based on the text paragraphs obtained after the splitting, the method further includes:
determining neighbor paragraphs of the text paragraphs;
judging whether the text paragraph and the neighbor paragraph accord with a merging condition or not;
and if so, merging the text paragraph and the neighbor paragraph to split the target video based on the merged paragraph.
Optionally, the determining whether the text passage and the neighbor passage meet the merging condition includes:
calculating the similarity between the text paragraph and the neighbor paragraph;
and if the similarity is greater than a similarity threshold value, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
Optionally, the determining whether the text passage and the neighbor passage meet the merging condition includes:
acquiring a first playing time length of a corresponding video clip of the text paragraph in the target video;
acquiring a second playing time length of the corresponding video clip of the neighbor paragraph in the target video;
calculating the sum of the first playing time length and the second playing time length;
and if the sum is smaller than a duration threshold, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
Optionally, before determining a dotting position of the target video based on the text paragraphs obtained after the dividing, so as to divide the target video into a plurality of video segments, the method further includes:
for each text paragraph obtained by division, judging whether the text paragraph is valid based on a validity rule;
and filtering out invalid text paragraphs.
Optionally, the determining whether the text passage is valid based on the validity rule includes:
determining whether the text passage is valid based on valid keywords and/or invalid keywords.
Optionally, the determining whether the text passage is valid based on the validity rule includes:
and inputting the text paragraph into a validity model to obtain a prediction of whether the text paragraph is valid.
Optionally, the determining the paragraph title of each text paragraph includes:
the text paragraphs are input into a title model, and the titles of the text paragraphs are determined based on the prediction results of the title model.
Optionally, after determining the paragraph title of each text paragraph, the method further includes:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph titles of the text paragraphs based on the comments.
Optionally, after determining the paragraph title of each text paragraph, the method further includes:
determining the paragraph category of each text paragraph, and determining the paragraph category as the segment category of the corresponding video segment.
Optionally, the determining the paragraph category of each text paragraph includes:
and respectively inputting each text paragraph into a category identification model to obtain the paragraph category of the text paragraph.
Optionally, the determining the paragraph category of each text paragraph includes:
matching each text paragraph with a standard paragraph category;
and determining the matched standard paragraph category as the paragraph category of the corresponding text paragraph.
Optionally, after determining the paragraph category of each text paragraph, the method further includes:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph categories of the text paragraphs based on content of the comments.
Optionally, the determining the dotting position of the target video based on the text paragraphs obtained after the dividing includes:
acquiring a playing time period corresponding to each text paragraph;
and determining the dotting position of the target video based on the playing time period corresponding to each text paragraph.
Optionally, the playing time period corresponding to the text paragraph is determined based on a timestamp corresponding to a first word and a timestamp corresponding to a last word in the text paragraph.
Optionally, the method further includes:
and determining the priority of each divided video segment, so as to display the video segments to the user in descending order of priority.
Optionally, the determining the priority of each divided video segment includes:
for the text paragraph corresponding to each video segment, calculating the correlation between the paragraph content and the corresponding paragraph title;
and determining the priority of each video segment based on the correlation, the correlation being positively correlated with the priority.
Optionally, the determining the priority of each divided video segment includes:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
a priority of each text passage is determined based on the comments.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (51)

1. A method of video dotting, comprising:
extracting audio from a target video to be dotted, and converting the audio into a corresponding text;
dividing the text into a plurality of text paragraphs;
determining paragraph titles of the text paragraphs;
determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize dotting of the target video.
2. The method of claim 1, the dividing the text into text paragraphs, comprising:
determining a plurality of topic sentences from the text;
and dividing the text into a plurality of text paragraphs by taking each topic sentence as a starting sentence of the text paragraphs.
3. The method of claim 2, said determining a number of topic sentences from said text comprising:
and inputting the text into a topic sentence identification model to obtain a plurality of topic sentences contained in the text.
4. The method of claim 3, wherein the inputting the text into a topic sentence identification model further comprises, after obtaining a plurality of topic sentences contained in the text:
matching each topic sentence with a preset topic sentence template;
and filtering out the topic sentences that do not match any topic sentence template.
5. The method of claim 2, said determining a number of topic sentences from said text comprising:
for each sentence in the text, matching the sentence with a preset topic sentence template;
and if the sentence is matched with any topic sentence template, determining that the sentence is a topic sentence.
6. The method of claim 4 or 5, wherein each topic sentence template comprises a first known template word and a first unknown topic word, the first known template word having a first specified grammatical relationship with the first unknown topic word;
the process of judging whether the sentence matches a topic sentence template comprises:
determining whether the first known template word is present in the sentence;
if so, judging whether the sentence contains a topic word having the first specified grammatical relationship with the first known template word;
and if so, determining that the sentence matches the topic sentence template.
7. The method of claim 6, further comprising:
after determining that the sentence matches the topic sentence template, extracting the topic words from the sentence;
acquiring a title template corresponding to the topic sentence template matched with the sentence, wherein the title template comprises a second known template word and a second unknown topic word, the second known template word having a second specified grammatical relationship with the second unknown topic word;
and setting the second unknown topic word in the title template to the topic word extracted from the sentence, so as to obtain the paragraph title of the text paragraph corresponding to the sentence.
8. The method of claim 2, further comprising:
obtaining comments of the target video, wherein each comment corresponds to a certain playing time of the target video;
the determining a plurality of topic sentences from the text comprises:
for each sentence in the text, performing the following:
acquiring a playing time period corresponding to the sentence;
counting the number of the comments in the playing time period based on the playing time corresponding to the comments;
judging whether the number of the comments is larger than a number threshold value;
and if so, determining the sentence as the topic sentence.
9. The method of claim 1, before splitting the target video into video segments based on the divided text paragraphs, further comprising:
determining neighbor paragraphs of the text paragraphs;
judging whether the text paragraph and the neighbor paragraph accord with a merging condition or not;
and if so, merging the text paragraph and the neighbor paragraph to split the target video based on the merged paragraph.
10. The method of claim 9, wherein the determining whether the text passage and the neighbor passage meet the merging condition comprises:
calculating the similarity between the text paragraph and the neighbor paragraph;
and if the similarity is greater than a similarity threshold value, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
11. The method of claim 9, wherein the determining whether the text passage and the neighbor passage meet the merging condition comprises:
acquiring a first playing time length of a corresponding video clip of the text paragraph in the target video;
acquiring a second playing time length of the corresponding video clip of the neighbor paragraph in the target video;
calculating the sum of the first playing time length and the second playing time length;
and if the sum is smaller than a duration threshold, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
12. The method of claim 1, wherein before determining the dotting position of the target video based on the divided text paragraphs to divide the target video into video segments, the method further comprises:
for each text paragraph obtained by division, judging whether the text paragraph is valid based on a validity rule;
and filtering out invalid text paragraphs.
13. The method of claim 12, wherein determining whether the text passage is valid based on validity rules comprises:
determining whether the text passage is valid based on valid keywords and/or invalid keywords.
14. The method of claim 12, wherein determining whether the text passage is valid based on validity rules comprises:
and inputting the text paragraph into a validity model to obtain a prediction of whether the text paragraph is valid.
15. The method of claim 1, wherein determining a paragraph heading for each text paragraph comprises:
the text paragraphs are input into a title model, and the titles of the text paragraphs are determined based on the prediction results of the title model.
16. The method of claim 1, after determining the paragraph heading of each text paragraph, further comprising:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph titles of the text paragraphs based on the comments.
17. The method of claim 1, after determining the paragraph heading of each text paragraph, further comprising:
determining the paragraph category of each text paragraph, and determining the paragraph category as the segment category of the corresponding video segment.
18. The method of claim 17, wherein determining a paragraph category for each text paragraph comprises:
and respectively inputting each text paragraph into a category identification model to obtain the paragraph category of the text paragraph.
19. The method of claim 17, wherein determining a paragraph category for each text paragraph comprises:
matching each text paragraph with a standard paragraph category;
and determining the matched standard paragraph category as the paragraph category of the corresponding text paragraph.
20. The method of any of claims 17-19, after determining the paragraph category for each text paragraph, further comprising:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph categories of the text paragraphs based on content of the comments.
21. The method of claim 1, wherein determining the dotting position of the target video based on the segmented text paragraphs comprises:
acquiring a playing time period corresponding to each text paragraph;
and determining the dotting position of the target video based on the playing time period corresponding to each text paragraph.
22. The method of claim 21, wherein the playback time period corresponding to the text passage is determined based on a timestamp corresponding to a first word and a timestamp corresponding to a last word in the text passage.
23. The method of claim 1, further comprising:
and determining the priority of each divided video segment, so as to display the video segments to the user in descending order of priority.
24. The method of claim 23, wherein the determining the priority of each divided video segment comprises:
calculating, for the text paragraph corresponding to each video segment, the correlation between the paragraph content and the corresponding paragraph title;
and determining a priority of each video segment based on the correlation, the correlation being positively correlated with the priority.
25. The method of claim 24, wherein the determining the priority of each divided video segment comprises:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
a priority of each text passage is determined based on the comments.
26. An apparatus for video dotting, comprising:
the text acquisition unit extracts audio from a target video to be dotted and converts the audio into a corresponding text;
the text dividing unit is used for dividing the text into a plurality of text paragraphs;
a title determination unit that determines a paragraph title of each text paragraph;
and the video dotting unit is used for determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize the dotting of the target video.
27. The apparatus of claim 26, wherein the text dividing unit:
determining a plurality of topic sentences from the text;
and dividing the text into a plurality of text paragraphs by taking each topic sentence as a starting sentence of the text paragraphs.
28. The apparatus according to claim 27, said text dividing unit, when determining a number of topic sentences from said text:
and inputting the text into a topic sentence identification model to obtain a plurality of topic sentences contained in the text.
29. The apparatus according to claim 28, wherein the text dividing unit, after inputting the text into the topic sentence identification model to obtain a plurality of topic sentences contained in the text, further:
matching each topic sentence with a preset topic sentence template;
and filtering out the topic sentences that do not match any topic sentence template.
30. The apparatus according to claim 27, said text dividing unit, when determining a number of topic sentences from said text:
for each sentence in the text, matching the sentence with a preset topic sentence template;
and if the sentence is matched with any topic sentence template, determining that the sentence is a topic sentence.
31. The apparatus according to claim 29 or 30, wherein each topic sentence template comprises a first known template word and a first unknown topic word, the first known template word having a first specified grammatical relationship with the first unknown topic word;
the process of judging whether the sentence matches a topic sentence template comprises:
determining whether the first known template word is present in the sentence;
if so, judging whether the sentence contains a topic word having the first specified grammatical relationship with the first known template word;
and if so, determining that the sentence matches the topic sentence template.
32. The apparatus of claim 31, the text partitioning unit further to:
after determining that the sentence matches the topic sentence template, extracting the topic words from the sentence;
acquiring a title template corresponding to the topic sentence template matched with the sentence, wherein the title template comprises a second known template word and a second unknown topic word, the second known template word having a second specified grammatical relationship with the second unknown topic word;
and setting the second unknown topic word in the title template to the topic word extracted from the sentence, so as to obtain the paragraph title of the text paragraph corresponding to the sentence.
33. The apparatus of claim 27, the text partitioning unit further:
obtaining comments of the target video, wherein each comment corresponds to a certain playing time of the target video;
when the text dividing unit determines a plurality of topic sentences from the text:
for each sentence in the text, performing the following:
acquiring a playing time period corresponding to the sentence;
counting the number of the comments in the playing time period based on the playing time corresponding to the comments;
judging whether the number of the comments is larger than a number threshold value;
and if so, determining the sentence as the topic sentence.
34. The apparatus of claim 26, wherein the text paragraph dividing unit further:
determining neighbor paragraphs of the text paragraphs;
judging whether the text paragraph and the neighbor paragraph accord with a merging condition or not;
and if so, merging the text paragraph and the neighbor paragraph to split the target video based on the merged paragraph.
35. The apparatus of claim 34, wherein the text passage dividing unit, when determining whether the text passage and the neighboring passage meet the merging condition:
calculating the similarity between the text paragraph and the neighbor paragraph;
and if the similarity is greater than a similarity threshold value, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
36. The apparatus of claim 34, wherein the text passage dividing unit, when determining whether the text passage and the neighboring passage meet a merge condition:
acquiring a first playing time length of a corresponding video clip of the text paragraph in the target video;
acquiring a second playing time length of the corresponding video clip of the neighbor paragraph in the target video;
calculating the sum of the first playing time length and the second playing time length;
and if the sum is smaller than a duration threshold, determining that the text paragraph and the neighbor paragraph accord with a merging condition.
37. The apparatus of claim 26, further comprising:
the validity judging unit is used for judging whether each text paragraph obtained by division is valid or not based on a validity rule;
and the filtering unit is used for filtering invalid text paragraphs.
38. The apparatus according to claim 37, wherein said validity judging unit:
determining whether the text passage is valid based on valid keywords and/or invalid keywords.
39. The apparatus according to claim 37, wherein said validity judging unit:
and inputting the text paragraph into a validity model to obtain a prediction of whether the text paragraph is valid.
40. The apparatus of claim 26, wherein the title determination unit:
the text paragraphs are input into a title model, and the titles of the text paragraphs are determined based on the prediction results of the title model.
41. The apparatus of claim 26, wherein the title determination unit further:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph titles of the text paragraphs based on the comments.
42. The apparatus of claim 26, further comprising:
and the category determining unit is used for determining the paragraph category of each text paragraph and determining the paragraph category as the segment category of the corresponding video segment.
43. The apparatus of claim 42, the category determination unit:
and respectively inputting each text paragraph into a category identification model to obtain the paragraph category of the text paragraph.
44. The apparatus of claim 42, the category determination unit:
matching each text paragraph with a standard paragraph category;
and determining the matched standard paragraph category as the paragraph category of the corresponding text paragraph.
45. The apparatus according to any of claims 42-44, wherein the category determining unit, after determining the paragraph category for each text paragraph, further:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
optimizing paragraph categories of the text paragraphs based on content of the comments.
46. The apparatus of claim 26, wherein the video dotting unit, when determining the dotting position of the target video based on the divided text paragraphs:
acquiring a playing time period corresponding to each text paragraph;
and determining the dotting position of the target video based on the playing time period corresponding to each text paragraph.
47. The apparatus of claim 46, wherein the playback time period corresponding to the text passage is determined based on a timestamp corresponding to a first word and a timestamp corresponding to a last word in the text passage.
48. The apparatus of claim 26, further comprising:
and the priority determining unit determines the priority of each divided video segment, so as to display the video segments to the user in descending order of priority.
49. The apparatus according to claim 48, wherein said priority determining unit, when determining the priority of each of the divided video segments:
calculating, for the text paragraph corresponding to each video segment, the correlation between the paragraph content and the corresponding paragraph title;
and determining a priority of each video segment based on the correlation, the correlation being positively correlated with the priority.
50. The apparatus according to claim 49, wherein said priority determining unit, when determining the priority of each of the divided video segments:
acquiring a playing time period corresponding to each text paragraph;
obtaining comments in the playing time period;
a priority of each text passage is determined based on the comments.
51. An apparatus for video dotting, comprising:
a processor;
a memory for storing machine executable instructions;
wherein, by reading and executing machine-executable instructions stored by the memory that correspond to video dotting logic, the processor is caused to:
extracting audio from a target video to be dotted, and converting the audio into a corresponding text;
dividing the text into a plurality of text paragraphs;
determining paragraph titles of the text paragraphs;
determining the dotting position of the target video based on the text paragraphs obtained after division so as to divide the target video into a plurality of video segments, and adding paragraph titles corresponding to the text paragraphs to the video segments so as to realize dotting of the target video.
CN202011622535.4A 2020-12-31 2020-12-31 Video dotting method and device Active CN112804580B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622535.4A CN112804580B (en) 2020-12-31 2020-12-31 Video dotting method and device

Publications (2)

Publication Number Publication Date
CN112804580A true CN112804580A (en) 2021-05-14
CN112804580B CN112804580B (en) 2023-01-20

Family

ID=75807494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622535.4A Active CN112804580B (en) 2020-12-31 2020-12-31 Video dotting method and device

Country Status (1)

Country Link
CN (1) CN112804580B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant