CN115080792A - Video association method and device, electronic equipment and storage medium


Info

Publication number: CN115080792A
Authority: CN (China)
Prior art keywords: video, target, speech, duration, data
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202210546415.3A
Other languages: Chinese (zh)
Inventor: 赵瑞书 (Zhao Ruishu)
Current and Original Assignee: Beijing IQIYI Science and Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Beijing IQIYI Science and Technology Co Ltd
Priority to CN202210546415.3A
Publication of CN115080792A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Abstract

Embodiments of the present application provide a video association method and apparatus, an electronic device, and a storage medium. The method includes: determining a target video album corresponding to a target short video according to at least one target person in the target short video; acquiring a line information set corresponding to the target short video, where the line information set includes N pieces of line data and each piece of line data corresponds to a first duration; retrieving, in a line database of the target video album, at least one first episode corresponding to each piece of line data and a second duration corresponding to each first episode; and selecting, according to the first durations and second durations corresponding to the N pieces of line data, at least one first episode from the multiple first episodes corresponding to the line information set, determining the selected first episode as a target long video, and establishing an association between the target short video and the target long video. The method and apparatus associate the target long video with the target short video and improve the efficiency of finding the target long video.

Description

Video association method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of video technologies, and in particular, to a video association method and apparatus, an electronic device, and a storage medium.
Background
When a user watching a short video sees content of interest, the user often wants to watch the long video from which the short video was taken. However, short videos come from a wide range of sources, and their source long videos are generally not marked. To find the long video associated with a short video, the user has to work out a person's name from the short video and then search for that person's other videos, which requires many operation steps and makes the search inefficient.
Disclosure of Invention
Embodiments of the present application provide a video association method and apparatus, an electronic device, and a storage medium, to solve the problem of establishing an association between a short video and its corresponding long video.
In a first aspect of embodiments of the present application, a video association method is provided, including:
determining a target video album corresponding to a target short video according to at least one target person in the target short video;
acquiring a line information set corresponding to the target short video, where the line information set includes N pieces of line data corresponding to the target short video, each piece of line data corresponds to a first duration, and N is a positive integer;
retrieving, in a line database corresponding to the target video album, at least one first episode corresponding to each piece of line data and a second duration corresponding to each first episode;
and selecting, according to the first durations and second durations corresponding to the N pieces of line data, at least one first episode from the multiple first episodes corresponding to the line information set, determining the selected first episode as a target long video, and establishing an association between the target short video and the target long video.
In a second aspect of the embodiments of the present application, there is also provided a video association apparatus, including:
the determining module is used for determining a target video album corresponding to the target short video according to at least one target person in the target short video;
a first obtaining module, configured to obtain a line information set corresponding to the target short video, where the line information set includes N pieces of line data corresponding to the target short video, each piece of line data corresponds to a first duration, and N is a positive integer;
a retrieving module, configured to retrieve, in a line database corresponding to the target video album, at least one first episode corresponding to each piece of line data and a second duration corresponding to each first episode;
and a first association establishing module, configured to select, according to the first durations and second durations corresponding to the N pieces of line data, at least one first episode from the multiple first episodes corresponding to the line information set, determine the selected first episode as a target long video, and establish an association between the target short video and the target long video.
In a third aspect of the embodiments of the present application, there is also provided an electronic device, including:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video association method described above.
In a fourth aspect of embodiments of the present application, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the video association method described above.
In a fifth aspect of embodiments of the present application, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the video association method described above.
The embodiment of the application at least comprises the following technical effects:
according to the technical scheme, a target video album corresponding to the target short video is determined through a target person in the target short video, then a first episode corresponding to each line data in N line data in the target short video is retrieved in a line database corresponding to the target video album, at least one first episode is selected from a plurality of first episodes corresponding to the target short video as a target long video corresponding to the target short video according to a first duration corresponding to the N line data in the target short video and a second duration corresponding to the first episode, and the association between the target long video and the target short video is established, so that the efficiency of inquiring the target long video is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flowchart of a video association method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a video association apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present application, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Embodiments of the present application provide a video association method that can be applied to a server. The server may be a backend server of a video playback application.
As shown in fig. 1, the method may include:
Step 101, determining a target video album corresponding to a target short video according to at least one target person in the target short video.
The video resources provided by playback applications include videos clipped from TV series, movies, or variety shows whose length is below a preset duration; such videos are referred to in this application as short videos, for example a segment of one episode of a TV series, a segment of a movie or variety show, or a video spliced together from different segments.
Specifically, a short video generally contains one or more persons. For a target short video (a given short video), at least one person appearing in it can be determined as a target person, and the target video album corresponding to the target short video can be determined from the target person. The target video album may be the TV series, movie, or variety show corresponding to the target short video, and includes at least one episode. For a TV series, the target video album is the name of the series, and the number of episodes it includes is the series' episode count; for a movie, the target video album is the movie's name and includes a single episode, the movie itself; for a variety show, the target video album is the show's name, and the number of episodes it includes is the number of installments of the show.
In this way, the target person in the target short video determines the target video album to which the target long video corresponding to the target short video belongs.
Step 102, acquiring a line information set corresponding to the target short video, where the line information set includes N pieces of line data corresponding to the target short video and each piece of line data corresponds to a first duration, N being a positive integer.
By processing the target short video, for example by performing text recognition on its video frames, the line data (the dialogue lines shown on screen) appearing in the target short video and the time span each piece of line data lasts in the target short video, i.e., the first duration, can be obtained. The line information set corresponding to the target short video is then formed from the N pieces of line data obtained.
Step 103, retrieving, in the line database corresponding to the target video album, at least one first episode corresponding to each piece of line data and a second duration corresponding to each first episode.
The line database corresponding to the target video album stores the line data of all episodes in the target video album. Searching this database for each piece of line data in the line information set of the target short video yields the episode(s) in which that line appears, i.e., at least one first episode, together with the time span the line occupies in each corresponding first episode, i.e., the second duration.
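As an illustrative aside (not part of the patent text), the line database can be pictured as an inverted index from line text to episode spans. The following Python sketch uses invented names (LineDatabase, add, lookup) and toy data purely to make the retrieval step concrete:

    # Minimal sketch of a line database for one video album: every line of
    # every episode is indexed by its text, so a lookup returns the episodes
    # (first episodes) and in-episode time spans (second durations) in which
    # the line occurs. All names and data are illustrative assumptions.
    from collections import defaultdict

    class LineDatabase:
        def __init__(self):
            # line text -> list of (episode_id, start_sec, end_sec)
            self._index = defaultdict(list)

        def add(self, episode_id, text, start_sec, end_sec):
            self._index[text].append((episode_id, start_sec, end_sec))

        def lookup(self, text):
            return self._index.get(text, [])

    db = LineDatabase()
    db.add("ep03", "Where have you been all these years?", 125.0, 128.5)
    db.add("ep07", "Where have you been all these years?", 1010.0, 1013.5)
    print(db.lookup("Where have you been all these years?"))
    # -> both ep03 and ep07 qualify as "first episodes" for this line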
Step 104, selecting, according to the first durations and second durations corresponding to the N pieces of line data, at least one first episode from the multiple first episodes corresponding to the line information set, determining the selected first episode as the target long video, and establishing an association between the target short video and the target long video.
Each of the N pieces of line data in the target short video corresponds to at least one first episode in the target video album, and the first episodes corresponding to the line information set are the union of the first episodes of the N pieces of line data, so the set generally contains multiple first episodes. At least one first episode must therefore be selected from them to determine the target long video, after which the association between the target short video and the target long video is established. Specifically, a video association database may be set up on the server to store the identification information of target short videos and of their associated target long videos. After the target long video is determined, the identifiers of the target long video and the target short video are obtained and stored together in the video association database.
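A minimal sketch of such a video association database, assuming nothing beyond what is described above (a mapping from short-video identifiers to associated long-video identifiers; an in-memory dict stands in for the server-side store, and all identifiers are made up):

    # Sketch only: the association database just has to map a short-video ID
    # to the associated long-video ID(s).
    association_db = {}

    def associate(short_video_id, long_video_id):
        association_db.setdefault(short_video_id, set()).add(long_video_id)

    def lookup_long_videos(short_video_id):
        return association_db.get(short_video_id, set())

    associate("short_42", "albumA_ep05")
    print(lookup_long_videos("short_42"))  # {'albumA_ep05'}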
Specifically, at least one first episode may be selected from the multiple first episodes corresponding to the line information set as the target long video, according to the first durations and second durations corresponding to the N pieces of line data.
In the implementation of this application, the target video album corresponding to the target short video is determined from a target person in the target short video; the first episodes corresponding to each of the N pieces of line data in the target short video are then retrieved in the line database corresponding to the target video album; according to the first durations of the N pieces of line data in the target short video and the second durations in the first episodes, at least one first episode is selected from the multiple first episodes corresponding to the target short video as the target long video, and the association between the target long video and the target short video is established, improving the efficiency of querying the target long video.
In an optional embodiment of the present application, the determining, in step 101, a target video album corresponding to a target short video according to at least one target person in the target short video includes:
extracting a first video frame from the target short video according to a first frame extraction frequency;
extracting persons from the first video frames to obtain the target person;
and searching a video album database for a target video album associated with the target person, where the video album database stores multiple video albums and each video album is associated with at least one person.
Specifically, first video frames may be extracted from the target short video at a first frame extraction frequency, whose value may be determined by the length of the target short video: the longer the target short video, the lower the first frame extraction frequency; the shorter it is, the higher the frequency. This ensures that the number of first video frames extracted at the first frame extraction frequency satisfies a preset condition, and when it does, enough persons can be extracted from the first video frames.
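As a hedged illustration of this inverse relation (the target frame count and frequency bounds below are assumptions, not values from the patent):

    # Sketch: pick a sampling frequency so that roughly target_frames first
    # video frames are extracted regardless of the short video's length.
    def first_frame_frequency(video_len_sec, target_frames=100,
                              min_hz=0.2, max_hz=5.0):
        """Return frames-per-second to sample so ~target_frames are taken."""
        hz = target_frames / max(video_len_sec, 1.0)
        return min(max(hz, min_hz), max_hz)

    print(first_frame_frequency(30))    # short clip -> high frequency (~3.3 Hz)
    print(first_frame_frequency(600))   # long clip  -> low frequency (0.2 Hz)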
Person extraction is performed on the first video frames to obtain at least one person appearing in them. Specifically, a face detection model can be used to obtain at least one face image in each first video frame, and a face recognition model is then used to recognize each face image (the face recognition model extracts features from the face image to be recognized and searches a person database, which stores a large number of persons and the feature set of each, for the person matching those features), thereby determining the person corresponding to each face image. The target person may be a person whose number of appearances, among the persons extracted from the target short video, is greater than a preset count threshold.
A video album database is arranged in the backend server; it stores multiple video albums, and each video album is associated with at least one person. The target video album associated with the target person can then be searched for in the video album database.
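A rough Python sketch of this person-based album lookup; detect_and_recognize() is a hypothetical stand-in for the face detection and recognition models, and the threshold, names, and album table are invented for illustration:

    # Sketch: recognize a person in each sampled frame, keep persons whose
    # appearance count exceeds a preset threshold, then look the album up
    # in a person -> albums table.
    from collections import Counter

    def detect_and_recognize(frame):
        """Hypothetical: returns the list of person names found in a frame."""
        return frame["persons"]  # frames are faked as dicts in this sketch

    def target_persons(frames, min_count=3):
        counts = Counter(p for f in frames for p in detect_and_recognize(f))
        return [p for p, c in counts.items() if c > min_count]

    album_db = {"Actor A": ["Drama X"], "Actor B": ["Drama X", "Show Y"]}

    frames = [{"persons": ["Actor A"]}] * 5 + [{"persons": ["Actor B"]}] * 2
    persons = target_persons(frames)          # ['Actor A']
    albums = {a for p in persons for a in album_db.get(p, [])}
    print(albums)                             # {'Drama X'}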
In this implementation, the target person in the video frames of the target short video is determined, so the target video album associated with the target person can be found in the video album database, which improves the efficiency of finding the video album.
In an optional embodiment of the present application, the obtaining of the line information set corresponding to the target short video in step 102 includes:
extracting a second video frame from the target short video according to a second frame extraction frequency;
performing text recognition on each second video frame to obtain the line content in each second video frame;
determining the N pieces of line data corresponding to the target short video according to the line content corresponding to each second video frame;
and determining the line information set according to the N pieces of line data;
where, in the process of determining the N pieces of line data corresponding to the target short video, the first duration corresponding to each piece of line data is determined based on the frame extraction time corresponding to each second video frame.
Specifically, second video frames may be extracted from the target short video at a second frame extraction frequency, and text recognition is performed on each second video frame to obtain all the line content in the target short video. After the line content is determined, each piece of line data must be tracked to determine the time span it lasts in the target short video, i.e., the first duration. This first duration can be determined from the frame extraction times of the second video frames: for a given piece of line data, the frame extraction time of the second video frame in which it first appears is its start time; and when a later second video frame contains no line data or contains different line data, the frame extraction time of the last second video frame before that change is determined as the line's end time.
Once different line data appears in a second video frame after the start time, the frame extraction time of the second video frame preceding the one in which the different line data appears can be determined as the end time of the current line data.
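The duration tracking can be sketched as follows, assuming OCR output has already been reduced to (frame time, line text) pairs; all data here is illustrative:

    # Sketch: scan sampled frames in time order; a line's first duration
    # runs from the frame in which its text first appears to the last frame
    # before the text changes or disappears.
    def build_line_data(frame_texts):
        """frame_texts: list of (frame_time_sec, line_text or None)."""
        lines, current, start, last = [], None, None, None
        for t, text in frame_texts:
            if text != current:
                if current is not None:
                    lines.append({"text": current, "start": start, "end": last})
                current, start = text, t
            last = t
        if current is not None:
            lines.append({"text": current, "start": start, "end": last})
        return lines

    samples = [(0.0, None), (0.5, "Hello"), (1.0, "Hello"),
               (1.5, "Goodbye"), (2.0, None)]
    print(build_line_data(samples))
    # [{'text': 'Hello', 'start': 0.5, 'end': 1.0},
    #  {'text': 'Goodbye', 'start': 1.5, 'end': 1.5}]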
In this implementation, the line information set corresponding to the target short video and the first duration corresponding to each piece of line data can be determined from the line content in the video frames of the target short video.
In an optional embodiment of the present application, the performing of text recognition on each second video frame in step 102 to obtain the line content in each second video frame includes:
performing text recognition on each second video frame to obtain the text in each second video frame and its position in the frame;
determining a target position in the second video frames according to the positions of the text, where the target position is the position at which line content is displayed;
and extracting the line content from the text in each second video frame according to the target position.
Specifically, when the line content in each second video frame is acquired, text recognition returns all the text appearing in the frame, so the line content has to be separated from the rest of the recognized text. In a video, lines generally have the following characteristics: their height is fixed (they sit at a fixed distance from the lower edge of the image); they are aligned left, center, or right in the image; and their length changes dynamically (determined by the number of words and letters in the line). Because the text at the position where line content is displayed is updated more frequently than text at other positions, the target position for displaying line content in the second video frames can be determined by analyzing the position of the text in each frame. Once the target position is determined, text at any other position in a second video frame can be treated as non-line text and filtered out, and the text at the target position is taken as the line content corresponding to that frame.
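One way this could look in code (a sketch only; it assumes OCR results have been bucketed by the vertical band in which text was found, and picks the band whose text changes most often as the target position):

    # Sketch: for each vertical band where OCR found text, count how often
    # the text content changes across frames; the band that changes most is
    # taken as the line position, and text elsewhere is filtered out.
    from collections import defaultdict

    def target_band(frames):
        """frames: list of {band_y: text} dicts, one per sampled frame."""
        changes = defaultdict(int)
        prev = {}
        for f in frames:
            for y, text in f.items():
                if prev.get(y) != text:
                    changes[y] += 1
            prev = f
        return max(changes, key=changes.get)

    frames = [{660: "Hello", 40: "Channel 5"},
              {660: "How are you", 40: "Channel 5"},
              {660: "Fine", 40: "Channel 5"}]
    y = target_band(frames)          # 660: subtitles change, the logo does not
    lines = [f[y] for f in frames if y in f]
    print(y, lines)                  # 660 ['Hello', 'How are you', 'Fine']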
In this implementation, determining the target position at which line content is displayed in the target short video allows the line content to be separated from the other text in the second video frames, so the line content of each second video frame is recognized accurately and other text is not mistaken for lines.
In an optional embodiment of the present application, after the association between the target short video and the target long video is established in step 104, the method further includes:
acquiring the second duration, in the target long video, corresponding to each piece of line data in the target short video;
and determining, according to the second durations, at least one target period associated with the target short video in the target long video, and establishing an association between the target short video and the target period, where a period spliced from at least two contiguous second durations among the multiple second durations is a target period.
Specifically, after the association between the target short video and the target long video is established, a user watching the target short video can jump directly to the corresponding target long video by clicking a jump button on the playback interface of the target short video. However, if the target long video is longer than a preset duration, the user still cannot tell where the target short video sits within it. To improve the user experience, an association can therefore also be established between the target short video and a target period within the target long video.
The target period(s) specifically associated with the target short video within the target long video are determined from the second duration of each piece of line data in the target long video. If the target short video is a single complete segment of the target long video, there is one target period; if it was spliced from multiple segments of the target long video, there are multiple target periods.
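A small sketch of splicing contiguous second durations into target periods (the gap tolerance is an assumption; the patent only requires the durations to be contiguous):

    # Sketch: sort the per-line second durations (the lines' spans inside
    # the target long video) and splice any spans that are contiguous, up
    # to a small tolerance, into one target period.
    def target_periods(second_durations, gap_tol=2.0):
        """second_durations: list of (start_sec, end_sec) in the long video."""
        spans = sorted(second_durations)
        periods = [list(spans[0])]
        for start, end in spans[1:]:
            if start - periods[-1][1] <= gap_tol:   # contiguous -> splice
                periods[-1][1] = max(periods[-1][1], end)
            else:                                   # gap -> new target period
                periods.append([start, end])
        return [tuple(p) for p in periods]

    print(target_periods([(100, 104), (105, 110), (400, 406)]))
    # [(100, 110), (400, 406)] -> the short video comes from two segments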
It should be noted that, with the association between the target short video and the target long video established, a user watching the target short video can send a jump instruction to the server by clicking a jump button arranged on the playback interface of the target short video. When the server detects the jump instruction triggered by the user, it searches for the target long video associated with the target short video based on that association, for example in the video association database (which stores the identifiers of associated target short videos and target long videos), and switches the playback content of the current playback interface from the target short video to the target long video. Specifically, the link of the jump button is set to the link of the target long video, so that after clicking the jump button the user jumps to the corresponding target long video.
Further, so that a user watching the target short video can jump to the target period of the corresponding target long video by clicking the jump button on the playback interface, the time information of the target period (its start time and end time within the target long video) may be added to the video association database, establishing the association between the target short video and the target period in the target long video. When the server detects that the user has clicked the jump button, it looks up the time information of the target period of the target long video associated with the target short video in the video association database, and sets the link of the jump button to the target period of the target long video according to that time information, so the user jumps straight to the target period.
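As a hedged sketch, resolving a jump instruction to a time-addressed link might look like this; the URL scheme and stored data are invented for illustration:

    # Sketch: the server looks up the associated long video's target period
    # and returns a link carrying the start time so playback lands on the
    # matching segment.
    period_db = {("short_42", "albumA_ep05"): [(100.0, 110.0)]}

    def jump_link(short_video_id, long_video_id):
        periods = period_db.get((short_video_id, long_video_id), [])
        start = periods[0][0] if periods else 0.0
        return f"app://play?video={long_video_id}&t={start:.0f}"

    print(jump_link("short_42", "albumA_ep05"))
    # app://play?video=albumA_ep05&t=100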
Through this implementation, the target short video can be associated with target periods in the target long video, precisely locating the target short video within the target long video. A user can thus see exactly which period of the target long video the currently watched target short video comes from and find it easily, improving the user experience.
In an optional embodiment of the present application, the selecting in step 104, according to the first durations and second durations corresponding to the N pieces of line data, at least one first episode from the multiple first episodes corresponding to the line information set, and determining the selected first episode as the target long video includes:
judging, according to the first duration and second duration corresponding to first line data and the first duration and second duration corresponding to second line data adjacent to the first line data, whether the first line data and the second line data satisfy a merging condition;
when the first line data and the second line data satisfy the merging condition, merging them into first merged line data, and determining a second episode corresponding to the first merged line data, a third duration corresponding to the second episode, and a fourth duration corresponding to the target short video;
when the first merged line data and adjacent third line data satisfy the merging condition, merging them into second merged line data, and continuing the merging step on the basis of the second merged line data until the merging condition is no longer satisfied or the last line data in the target short video has been merged, so as to determine the target long video;
and when the first merged line data and the third line data do not satisfy the merging condition, determining the second episode as the target long video corresponding to the target short video, and continuing the merging step on the basis of the third line data until the merging condition is no longer satisfied or the last line data in the target short video has been merged, so as to determine the target long video.
Specifically, to determine the target long video corresponding to the target short video, the embodiment of the present application merges adjacent first line data and second line data in the target short video that satisfy the merging condition into first merged line data, determines the second episode corresponding to the first merged line data, and then merges the first merged line data with the adjacent third line data.
It should be noted that the first line data and the second line data are two adjacent pieces of line data; they may be the two pieces at the beginning of the target short video or two pieces at any other position, which this application does not specifically limit. They may be line data taken directly from the target short video, or merged line data obtained by merging line data of the target short video. The third line data adjacent to the first merged line data may be line data located either before or after the first merged line data.
If the first merged line data and the third line data satisfy the merging condition, they are merged into second merged line data, and the merging step continues on that basis until the merging condition fails or the last line data in the target short video has been merged. The episode corresponding to the merged line data obtained before the condition failed (a third episode) is then determined as the target long video corresponding to the target short video, or, if merging ran through the last line data, the episode corresponding to that final merged line data (a fourth episode) is determined as the target long video.
If the first merged line data and the third line data do not satisfy the merging condition, the second episode corresponding to the first merged line data is determined as the target long video corresponding to the target short video, and the merging step continues on the basis of the third line data until the merging condition fails or the last line data in the target short video has been merged; the episode corresponding to the merged line data obtained before the condition failed (a fifth episode), or the one obtained after merging the last line data (a sixth episode), is determined as the target long video corresponding to the target short video.
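The merging procedure can be sketched as a greedy left-to-right pass; can_merge() below is a placeholder for the interval test of the next embodiment (sketched after it), and the toy demo tags each line with the episode it came from only to make the output checkable:

    # Sketch: walk the short video's line data in order, merging each next
    # line into the running merged block while the condition holds; when it
    # fails, the episode of the block so far is emitted as a target long
    # video and merging restarts from the line that broke the condition.
    def find_target_long_videos(line_data, can_merge, episode_of):
        targets, block = [], [line_data[0]]
        for line in line_data[1:]:
            if can_merge(block, line):
                block.append(line)
            else:
                targets.append(episode_of(block))
                block = [line]
        targets.append(episode_of(block))   # last block after the final line
        return targets

    # Toy demo with stand-in condition and episode lookup.
    demo = [{"ep": "ep1"}, {"ep": "ep1"}, {"ep": "ep2"}]
    same_ep = lambda block, line: line["ep"] == block[-1]["ep"]
    first_ep = lambda block: block[0]["ep"]
    print(find_target_long_videos(demo, same_ep, first_ep))  # ['ep1', 'ep2']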
Through this merging of the line data in the target short video, the number of first episodes corresponding to the target short video is gradually reduced, narrowing the search range for the target long video; when no further merging is possible, the target long video is determined from the episodes corresponding to the merged line data. This improves the efficiency of finding the target long video and, in turn, lets a user jump directly from the target short video to the corresponding target long video by clicking the jump button on the playback interface of the target short video.
In an optional embodiment of the present application, the first line data precedes the second line data, and the judging in step 104, according to the first duration and second duration corresponding to the first line data and the first duration and second duration corresponding to the adjacent second line data, whether the first line data and the second line data satisfy the merging condition includes:
acquiring, according to the first durations corresponding to the first line data and the second line data, a first interval length between the first end time of the first line data and the first start time of the second line data;
acquiring, for each first target episode, a second interval length between the second end time corresponding to the first line data and the second start time corresponding to the second line data, where a first target episode is an episode containing both the first line data and the second line data;
and determining that the first line data and the second line data satisfy the merging condition when, among the at least one second interval length, there is a target second interval length whose difference from the first interval length is smaller than a preset time threshold.
Specifically, whether the first line data and the second line data satisfy the merging condition can be judged from the first durations of the two pieces of line data in the target short video and their second durations in the first episodes. The gap between the first line data and the second line data in the target short video is the first interval length, i.e., the time between the first end time of the first line data and the first start time of the second line data; the gap between them in a first target episode is the second interval length, i.e., the time between the second end time of the first line data and the second start time of the second line data. A first target episode is an episode containing both pieces of line data, i.e., it lies in the intersection of the first episodes corresponding to the first line data and the first episodes corresponding to the second line data. There is at least one first target episode, each with its own second interval length, so there is also at least one second interval length.
When there is, among the at least one second interval length, a target second interval length whose difference from the first interval length is smaller than the preset time threshold, the first line data and the second line data are determined to satisfy the merging condition. That is, when the gap between the two lines in a first target episode is close to their gap in the target short video (the difference is below the preset time threshold), the two lines can be merged.
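A sketch of this merging test under the data layout assumed below (each line records its span in the short video and its spans in candidate episodes; the threshold value is illustrative):

    # Sketch: compare the gap between two adjacent lines inside the short
    # video (first interval length) with their gap inside each shared
    # candidate episode (second interval length); if any episode's gap is
    # close enough, the lines merge.
    def satisfies_merge_condition(line_a, line_b, time_threshold=1.0):
        """Each line: {'start','end': short-video times,
                       'spans': {episode_id: (start, end) in that episode}}."""
        first_interval = line_b["start"] - line_a["end"]
        shared = line_a["spans"].keys() & line_b["spans"].keys()  # first target episodes
        for ep in shared:
            second_interval = line_b["spans"][ep][0] - line_a["spans"][ep][1]
            if abs(second_interval - first_interval) < time_threshold:
                return True, ep
        return False, None

    a = {"start": 5.0, "end": 8.0, "spans": {"ep3": (105.0, 108.0)}}
    b = {"start": 9.0, "end": 12.0, "spans": {"ep3": (109.2, 112.0)}}
    print(satisfies_merge_condition(a, b))  # (True, 'ep3'): gaps 1.0 vs 1.2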
In this implementation, comparing the gap between the first line data and the second line data in each first target episode with their gap in the target short video quickly determines whether the two can be merged, which reduces the number of first episodes corresponding to the target short video and gradually narrows the search range for the target long video.
Specifically, the determining of the second episode corresponding to the first merged line data, the third duration corresponding to the second episode, and the fourth duration corresponding to the target short video includes:
determining the first target episode corresponding to the target second interval length as the second episode corresponding to the first merged line data;
determining the period from the start time of the first line data in the second episode to the end time of the second line data in the second episode as the third duration;
and determining the period from the start time of the first line data in the target short video to the end time of the second line data in the target short video as the fourth duration.
In an optional embodiment of the present application, the retrieving, in step 103, of at least one first episode corresponding to each piece of line data in the line database corresponding to the target video album includes:
searching, for each piece of line data, the line database for target line data whose matching degree with the line data satisfies a preset condition;
and determining the episode in which the target line data is located as a first episode corresponding to the line data.
Text recognition may occasionally misrecognize characters, for example reading one character as a visually similar one. To keep such errors from affecting the final determination of the first episodes, a preset matching-degree threshold can be set when searching the line database: as long as the matching degree between a piece of line data and an entry in the line database satisfies the preset condition, i.e., is greater than the preset threshold, the two are considered a successful match, and the episode in which the matched target line data is located is a first episode corresponding to the line data.
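For illustration, a tolerant match could be sketched with Python's standard-library difflib (the threshold is an assumption; the patent only requires a preset matching-degree threshold):

    # Sketch: OCR occasionally misreads characters, so a line matches a
    # database entry when its similarity exceeds a matching-degree
    # threshold rather than requiring exact equality.
    from difflib import SequenceMatcher

    def match_line(query, database_lines, threshold=0.85):
        best, best_ratio = None, 0.0
        for candidate in database_lines:
            ratio = SequenceMatcher(None, query, candidate).ratio()
            if ratio > threshold and ratio > best_ratio:
                best, best_ratio = candidate, ratio
        return best

    db_lines = ["See you tomorrow at the station"]
    print(match_line("See you tomorow at the station", db_lines))
    # matches despite the OCR typo in "tomorow"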
This implementation prevents misrecognition during line content recognition from affecting the determination of the first episodes.
In an optional embodiment of the present application, after the association between the target short video and the target long video is established in step 104, the method further includes:
when a jump instruction triggered by a user watching the target short video is detected, searching for the target long video associated with the target short video based on the association between them, and switching the playback content of the current playback interface from the target short video to the target long video.
Specifically, after the association between the target short video and the target long video is established, a user watching the target short video can send a jump instruction to the server by clicking a jump button arranged on the playback interface of the target short video. When the server detects the jump instruction triggered by the user, it searches for the target long video associated with the target short video based on that association, for example in the video association database (which stores the identifiers of associated target short videos and target long videos), and switches the playback content of the current playback interface from the target short video to the target long video. Specifically, the link of the jump button is set to the link of the target long video, so that after clicking the jump button the user jumps to the corresponding target long video, saving the time spent searching for it.
Through this implementation, a user watching the target short video can click the jump button to send a jump instruction to the server and jump directly from the target short video to the corresponding target long video, improving the user experience. At the same time, short-video playback can drive long-video playback, effectively increasing the occasions and scenarios in which long videos are played.
In the overall implementation of the video association method provided by the embodiments of this application, the target video album corresponding to a target short video is determined from a target person in the target short video; the first episodes corresponding to each of the N pieces of line data in the target short video are retrieved in the line database corresponding to the target video album; at least one first episode is selected from the multiple first episodes corresponding to the target short video as the target long video, according to the first durations of the N pieces of line data in the target short video and the second durations in the first episodes; and the association between the target long video and the target short video is established. This improves the efficiency of querying the target long video, lets the user jump directly from the target short video to the corresponding target long video by clicking the jump button on the playback interface of the target short video, and enables short-video playback to drive long-video playback, effectively increasing the occasions and scenarios in which long videos are played.
An embodiment of the present application further provides a video association apparatus, as shown in fig. 2, the apparatus includes:
a determining module 201, configured to determine, according to at least one target person in a target short video, a target video album corresponding to the target short video;
a first obtaining module 202, configured to obtain a line information set corresponding to the target short video, where the line information set includes N pieces of line data corresponding to the target short video and each piece of line data corresponds to a first duration, N being a positive integer;
a retrieving module 203, configured to retrieve, in the line database corresponding to the target video album, at least one first episode corresponding to each piece of line data and a second duration corresponding to each first episode;
and a first association establishing module 204, configured to select, according to the first durations and second durations corresponding to the N pieces of line data, at least one first episode from the multiple first episodes corresponding to the line information set, determine the selected first episode as the target long video, and establish the association between the target short video and the target long video.
Optionally, the determining module 201 includes:
the first extraction submodule is used for extracting a first video frame from the target short video according to a first frame extraction frequency;
the first obtaining submodule, configured to extract persons from the first video frames to obtain the target person;
and the searching submodule, configured to search a video album database for the target video album associated with the target person, where the video album database stores multiple video albums and each video album is associated with at least one person.
Optionally, the first obtaining module 202 includes:
the second extraction submodule is used for extracting a second video frame from the target short video according to a second frame extraction frequency;
the second obtaining submodule, configured to perform text recognition on each second video frame and obtain the line content in each second video frame;
the first determining submodule, configured to determine the N pieces of line data corresponding to the target short video according to the line content corresponding to each second video frame;
and the second determining submodule, configured to determine the line information set according to the N pieces of line data;
where, in the process of determining the N pieces of line data corresponding to the target short video, the first duration corresponding to each piece of line data is determined based on the frame extraction time corresponding to each second video frame.
Optionally, the second obtaining sub-module includes:
the first obtaining unit, configured to perform text recognition on each second video frame to obtain the text in each second video frame and its position in the frame;
a first determining unit, configured to determine a target position in the second video frames according to the positions of the text, where the target position is the position at which line content is displayed;
and the extracting unit, configured to extract the line content from the text in each second video frame according to the target position.
Optionally, the apparatus further comprises:
the second obtaining module, configured to obtain the second duration, in the target long video, corresponding to each piece of line data in the target short video;
and the second association establishing module, configured to determine, according to the second durations, at least one target period associated with the target short video in the target long video, and establish the association between the target short video and the target period, where a period spliced from at least two contiguous second durations among the multiple second durations is a target period.
Optionally, the first association establishing module 204 includes:
the judging sub-module, configured to judge, according to the first duration and second duration corresponding to the first line data and the first duration and second duration corresponding to the second line data adjacent to the first line data, whether the first line data and the second line data satisfy the merging condition, where the first line data precedes the second line data;
a third determining sub-module, configured to, when the first line data and the second line data satisfy the merging condition, merge them into first merged line data, and determine the second episode corresponding to the first merged line data, the third duration corresponding to the second episode, and the fourth duration corresponding to the target short video;
a fourth determining sub-module, configured to, when the first merged line data and the adjacent third line data satisfy the merging condition, merge them into second merged line data, and continue the merging step on the basis of the second merged line data until the merging condition is no longer satisfied or the last line data in the target short video has been merged, so as to determine the target long video;
and a fifth determining sub-module, configured to, when the first merged line data and the third line data do not satisfy the merging condition, determine the second episode as the target long video corresponding to the target short video, and continue the merging step on the basis of the third line data until the merging condition is no longer satisfied or the last line data in the target short video has been merged, so as to determine the target long video.
Optionally, the first line data precedes the second line data, and the judging sub-module includes:
a second obtaining unit, configured to obtain, according to the first durations corresponding to the first line data and the second line data, the first interval length between the first end time of the first line data and the first start time of the second line data;
a third obtaining unit, configured to obtain, for each first target episode, the second interval length between the second end time corresponding to the first line data and the second start time corresponding to the second line data, where a first target episode is an episode containing both the first line data and the second line data;
and a second determining unit, configured to determine that the first line data and the second line data satisfy the merging condition when, among the at least one second interval length, there is a target second interval length whose difference from the first interval length is smaller than the preset time threshold.
Optionally, the third determining sub-module includes:
a third determining unit, configured to determine the first target episode corresponding to the target second interval length as the second episode corresponding to the first merged line data;
a fourth determining unit, configured to determine the period from the start time of the first line data in the second episode to the end time of the second line data in the second episode as the third duration;
and a fifth determining unit, configured to determine the period from the start time of the first line data in the target short video to the end time of the second line data in the target short video as the fourth duration.
Optionally, the retrieving module 203 includes:
the searching sub-module, configured to search, for each piece of line data, the line database for target line data whose matching degree with the line data satisfies the preset condition;
and the sixth determining submodule, configured to determine the episode in which the target line data is located as a first episode corresponding to the line data.
Optionally, the video association apparatus further includes:
and the control module, configured to, when a jump instruction triggered by a user watching the target short video is detected, search for the target long video associated with the target short video based on the association between them, and switch the playback content of the current playback interface from the target short video to the target long video.
In the video association apparatus provided by the embodiments of this application, the target video album corresponding to a target short video is determined from a target person in the target short video; the first episodes corresponding to each of the N pieces of line data in the target short video are retrieved in the line database corresponding to the target video album; at least one first episode is selected from the multiple first episodes corresponding to the target short video as the target long video, according to the first durations of the N pieces of line data in the target short video and the second durations in the first episodes; and the association between the target long video and the target short video is established, improving the efficiency of querying the target long video.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
An embodiment of the present application further provides an electronic device, including:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video association method described above.
The memory mentioned in the above electronic device may include a random access memory (RAM), and may also include a non-volatile memory, for example, at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor mentioned in the above electronic device may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the video association method described above.
Embodiments of the present application further provide a computer program product comprising a computer program which, when executed by a processor, implements the video association method described above.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (14)

1. A method for video association, the method comprising:
determining a target video album corresponding to a target short video according to at least one target person in the target short video;
acquiring a speech-line information set corresponding to the target short video, wherein the speech-line information set comprises N pieces of speech-line data corresponding to the target short video, each piece of speech-line data corresponds to a first duration, and N is a positive integer;
in a speech database corresponding to the target video album, retrieving at least one first episode corresponding to each piece of speech data and a second duration corresponding to each first episode;
and selecting, according to the first duration and the second duration corresponding to each of the N pieces of speech-line data, at least one first episode from the plurality of first episodes corresponding to the speech-line information set, determining the selected first episode as a target long video, and establishing an association between the target short video and the target long video.
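Read procedurally, the four steps of claim 1 amount to the pipeline sketched below; this is an illustration only, and every helper is a hypothetical stub returning canned data so that the sketch runs:

    # Purely illustrative Python skeleton of the four claimed steps.
    def determine_target_album(short_video):
        return "album-42"                     # step 1: person -> album (stub)

    def get_speech_info_set(short_video):
        # step 2: N pieces of speech-line data, each with a first duration
        return [("line A", 1.0, 3.0), ("line B", 5.0, 7.5)]

    def retrieve_first_episodes(album, speech):
        # step 3: first episodes containing the speech, with second durations
        return [("S01E03", 100.0, 102.0)]

    def associate(short_video):
        album = determine_target_album(short_video)
        candidates = {}
        for speech in get_speech_info_set(short_video):
            for episode, start_s, end_s in retrieve_first_episodes(album, speech):
                candidates.setdefault(episode, []).append((speech[0], start_s, end_s))
        # step 4: duration-based selection (elided) picks the target long video
        target_long = sorted(candidates)      # placeholder selection
        return {"short": short_video, "album": album, "long": target_long}

    print(associate("short-001"))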
2. The video association method of claim 1, wherein the determining a target video album corresponding to the target short video according to at least one target person in the target short video comprises:
extracting a first video frame from the target short video according to a first frame extraction frequency;
extracting characters from the first video frame to obtain the target character;
and searching a target video album associated with the target character in a video album database, wherein a plurality of video albums are stored in the video album database, and each video album is associated with at least one character.
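For illustration, the following sketch samples first video frames with OpenCV and looks the recognized person up in an assumed album table; recognize_person() is a stub standing in for any face recognition model and is not part of this disclosure:

    # Hypothetical sketch of claim 2: frame sampling plus person-based lookup.
    import cv2

    ALBUM_DB = {"person-xyz": ["album-42"]}   # assumed person -> albums table

    def recognize_person(frame):
        return "person-xyz"                   # stub; a real model goes here

    def extract_first_frames(path, every_n_seconds=2.0):
        cap = cv2.VideoCapture(path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
        step = max(1, int(fps * every_n_seconds))  # first frame extraction frequency
        frames, index = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                frames.append(frame)          # a "first video frame"
            index += 1
        cap.release()
        return frames

    def find_target_album(path):
        for frame in extract_first_frames(path):
            person = recognize_person(frame)
            if person in ALBUM_DB:            # album associated with the person
                return ALBUM_DB[person][0]
        return None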
3. The video association method according to claim 1, wherein the obtaining of the speech information set corresponding to the target short video includes:
extracting a second video frame from the target short video according to a second frame extraction frequency;
performing character recognition on each second video frame respectively to obtain the speech-line content in each second video frame;
determining the N pieces of speech-line data corresponding to the target short video according to the speech-line content corresponding to each second video frame;
determining the speech-line information set according to the N pieces of speech-line data;
and in the process of determining the N pieces of speech-line data corresponding to the target short video, determining a first duration corresponding to each piece of speech-line data respectively based on the frame extraction time corresponding to each second video frame.
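The grouping of recognized content into speech-line data with first durations can be illustrated as follows (OCR output is replaced by a canned list so the sketch runs; all names are assumptions):

    # Consecutive second video frames showing the same recognized text collapse
    # into one piece of speech-line data; frame extraction times give its
    # first duration.
    frames = [  # (frame extraction time in seconds, recognized content)
        (0.0, "hello"), (0.5, "hello"), (1.0, "hello"),
        (1.5, "goodbye"), (2.0, "goodbye"),
    ]

    def build_speech_info_set(frames):
        speech_set = []
        for t, text in frames:
            if speech_set and speech_set[-1]["text"] == text:
                speech_set[-1]["end"] = t          # extend the first duration
            else:
                speech_set.append({"text": text, "start": t, "end": t})
        return speech_set   # N pieces of speech-line data with first durations

    print(build_speech_info_set(frames))
    # -> "hello" spans 0.0-1.0 s, "goodbye" spans 1.5-2.0 s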
4. The video association method according to claim 3, wherein said performing character recognition on each of the second video frames to obtain the speech-line content in each of the second video frames respectively comprises:
respectively carrying out character recognition on each second video frame to obtain characters in each second video frame and the positions of the characters in the second video frames;
determining a target position in each second video frame according to the position of the characters in the second video frame, wherein the target position is a position for displaying the content of the lines;
and extracting the speech-line content from the characters in each second video frame respectively, according to the target position.
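As an illustration of this target-position logic, the following minimal Python sketch (the OCR output layout, band width, and all names are assumptions for illustration) votes for the vertical band in which recognized text recurs across frames and keeps only the text displayed there:

    # Hypothetical sketch of claim 4: the recurring vertical band is taken as
    # the target position where speech lines are displayed.
    from collections import Counter

    # assumed OCR output per frame: list of (text, y_center) pairs
    frames = [
        [("EP 3", 40), ("hello", 650)],
        [("hello", 652)],
        [("9:41", 30), ("goodbye", 648)],
    ]

    def target_band(frames, band_px=20):
        votes = Counter()
        for boxes in frames:
            for _, y in boxes:
                votes[y // band_px] += 1          # bucket y positions
        return votes.most_common(1)[0][0]         # most frequent band

    def extract_lines(frames, band_px=20):
        band = target_band(frames, band_px)
        return [text for boxes in frames for text, y in boxes
                if y // band_px == band]

    print(extract_lines(frames))   # -> ['hello', 'hello', 'goodbye']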
5. The video association method of claim 1, wherein after establishing the association of the target short video with the target long video, the method further comprises:
acquiring, for each piece of speech data in the target short video, a second duration corresponding to the speech data in the target long video;
and determining, according to the second durations, at least one target time interval associated with the target short video in the target long video, and establishing an association between the target short video and the target time interval, wherein a time interval formed by splicing at least two consecutive second durations among the plurality of second durations is a target time interval.
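A minimal sketch of this splicing step, under the assumption that "consecutive" means the gap between adjacent second durations stays below some tolerance (the tolerance value and all names here are illustrative, not disclosed):

    # Splice at least two continuous second durations into a target interval.
    def splice_intervals(second_durations, tolerance=1.0):
        spans = sorted(second_durations)
        target_intervals, current, count = [], None, 0
        for start, end in spans:
            if current and start - current[1] <= tolerance:
                current = (current[0], max(current[1], end))
                count += 1
            else:
                if current and count >= 2:   # at least two continuous durations
                    target_intervals.append(current)
                current, count = (start, end), 1
        if current and count >= 2:
            target_intervals.append(current)
        return target_intervals

    print(splice_intervals([(100.0, 102.0), (102.4, 105.0), (300.0, 301.0)]))
    # -> [(100.0, 105.0)]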
6. The video association method according to claim 1, wherein the selecting, according to the first duration and the second duration corresponding to each of the N pieces of speech-line data, at least one first episode from among the plurality of first episodes corresponding to the speech-line information set, and determining the selected first episode as a target long video includes:
judging, according to a first duration and a second duration corresponding to first speech data and a first duration and a second duration corresponding to second speech data adjacent to the first speech data, whether the first speech data and the second speech data meet a merging condition;
under the condition that the first speech data and the second speech data meet the merging condition, merging the first speech data and the second speech data to obtain first merged speech data, and determining a second episode corresponding to the first merged speech data, a third duration corresponding to the second episode, and a fourth duration corresponding to the target short video;
under the condition that the first merged speech data and adjacent third speech data meet the merging condition, merging the first merged speech data and the third speech data to obtain second merged speech data, and continuing to perform the merging step on the basis of the second merged speech data until the merging condition is no longer met or the last piece of speech data in the target short video has been merged, so as to determine the target long video;
and under the condition that the first merged speech data and the third speech data do not meet the merging condition, determining the second episode as a target long video corresponding to the target short video, and continuing to perform the merging step on the basis of the third speech data until the merging condition is no longer met or the last piece of speech data in the target short video has been merged, so as to determine the target long video.
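The iteration described in this claim can be illustrated with the following minimal Python sketch; can_merge() is a placeholder for the claim-7 condition, and all data layouts are assumptions:

    # Hypothetical sketch of the iterative merging: accumulate a run of
    # consecutive speech data that fits one episode; when the condition fails,
    # emit that episode as a target long video and restart from the next item.
    def can_merge(run, nxt):
        # placeholder for the claim-7 gap comparison; here: same episode id
        return run["episode"] == nxt["episode"]

    def determine_long_videos(speech_items):
        long_videos, run = [], None
        for item in speech_items:
            if run is None:
                run = dict(item)
            elif can_merge(run, item):        # extend the merged speech data
                run["text"] += " " + item["text"]
            else:                             # condition fails: emit episode
                long_videos.append(run["episode"])
                run = dict(item)              # continue from this speech data
        if run:                               # last speech data merged
            long_videos.append(run["episode"])
        return long_videos

    items = [
        {"episode": "S01E03", "text": "a"}, {"episode": "S01E03", "text": "b"},
        {"episode": "S01E07", "text": "c"},
    ]
    print(determine_long_videos(items))   # -> ['S01E03', 'S01E07']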
7. The method of claim 6, wherein the determining whether the first speech data and the second speech data satisfy a merge condition according to a first duration and a second duration corresponding to the first speech data and a first duration and a second duration corresponding to the second speech data adjacent to the first speech data, comprises:
acquiring a first duration of the interval between a first end time of the first speech data and a first start time of the second speech data according to the first durations corresponding to the first speech data and the second speech data respectively;
for each first target episode, acquiring a second duration of the interval between a second end time corresponding to the first speech data and a second start time corresponding to the second speech data, wherein the first target episode is a first episode that includes both the first speech data and the second speech data;
and under the condition that a target second duration whose difference from the first duration is smaller than a preset time threshold exists among the at least one second duration, determining that the first speech data and the second speech data meet the merging condition.
8. The video association method of claim 7, wherein the determining a second episode corresponding to the first merged speech data, a third duration corresponding to the second episode, and a fourth duration corresponding to the target short video comprises:
determining the first episode corresponding to the target second duration as the second episode corresponding to the first merged speech data;
determining the time period from the start time of the first speech data in the second episode to the end time of the second speech data in the second episode as the third duration;
and determining the time period from the start time of the first speech data in the target short video to the end time of the second speech data in the target short video as the fourth duration.
9. The video association method according to claim 1, wherein the retrieving, in a speech database corresponding to the target video album, at least one first episode corresponding to each piece of speech data respectively includes:
for each piece of speech data, searching the speech database for target speech data whose matching degree with the speech data meets a preset condition;
and determining the episode in which the target speech data is located as the first episode corresponding to the speech data.
10. The video association method of claim 1, wherein after establishing the association of the target short video with the target long video, the method further comprises:
when a jump instruction triggered by a user watching the target short video is detected, searching for the target long video associated with the target short video based on the association between the target short video and the target long video, and controlling the playing content of the current playing interface to be switched from the target short video to the target long video.
11. A video association apparatus, the apparatus comprising:
the determining module is used for determining a target video album corresponding to the target short video according to at least one target person in the target short video;
a first obtaining module, configured to obtain a speech information set corresponding to the target short video, where the speech information set includes N pieces of speech data corresponding to the target short video, and each piece of speech data corresponds to a first duration, where N is a positive integer;
a retrieval module, configured to retrieve, in a speech database corresponding to the target video album, at least one first episode corresponding to each piece of speech data and a second duration corresponding to each first episode;
and a first association establishing module, configured to select, according to the first duration and the second duration corresponding to each of the N pieces of speech data, at least one first episode from the plurality of first episodes corresponding to the speech information set, determine the selected first episode as a target long video, and establish an association between the target short video and the target long video.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video association method of any one of claims 1 to 10.
13. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the video association method of any one of claims 1 to 10.
14. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the video association method of any one of claims 1 to 10.
CN202210546415.3A 2022-05-19 2022-05-19 Video association method and device, electronic equipment and storage medium Pending CN115080792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210546415.3A CN115080792A (en) 2022-05-19 2022-05-19 Video association method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210546415.3A CN115080792A (en) 2022-05-19 2022-05-19 Video association method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115080792A true CN115080792A (en) 2022-09-20

Family

ID=83249836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210546415.3A Pending CN115080792A (en) 2022-05-19 2022-05-19 Video association method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115080792A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117240983A (en) * 2023-11-16 2023-12-15 湖南快乐阳光互动娱乐传媒有限公司 Method and device for automatically generating sound drama
CN117240983B (en) * 2023-11-16 2024-01-26 湖南快乐阳光互动娱乐传媒有限公司 Method and device for automatically generating sound drama

Similar Documents

Publication Publication Date Title
CN108769731B (en) Method and device for detecting target video clip in video and electronic equipment
WO2019109643A1 (en) Video recommendation method and apparatus, and computer device and storage medium
JP2021525031A (en) Video processing for embedded information card locating and content extraction
US8214368B2 (en) Device, method, and computer-readable recording medium for notifying content scene appearance
US8108257B2 (en) Delayed advertisement insertion in videos
US9837125B2 (en) Generation of correlated keyword and image data
CN109688475B (en) Video playing skipping method and system and computer readable storage medium
CN110913241B (en) Video retrieval method and device, electronic equipment and storage medium
CN111711855A (en) Video generation method and device
CN110502661A (en) A kind of video searching method, system and storage medium
CN111314732A (en) Method for determining video label, server and storage medium
US10795932B2 (en) Method and apparatus for generating title and keyframe of video
CN110287375B (en) Method and device for determining video tag and server
CN110674345A (en) Video searching method and device and server
WO2019128724A1 (en) Method and device for data processing
CN115080792A (en) Video association method and device, electronic equipment and storage medium
JP5257356B2 (en) Content division position determination device, content viewing control device, and program
US20170040040A1 (en) Video information processing system
CN108882024B (en) Video playing method and device and electronic equipment
KR20080112975A (en) Method, system and recording medium storing a computer program for building moving picture search database and method for searching moving picture using the same
Hahm et al. Event-based sport video segmentation using multimodal analysis
US7590333B2 (en) Image extraction from video content
CN112818984A (en) Title generation method and device, electronic equipment and storage medium
US10178415B2 (en) Chapter detection in multimedia streams via alignment of multiple airings
KR20200076126A (en) Providing method of search service of the video based on person and application prividing the service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination