CN114257864B - Seek method and device of player in HLS format video source scene - Google Patents


Info

Publication number
CN114257864B
Authority
CN
China
Legal status: Active
Application number
CN202210169802.XA
Other languages
Chinese (zh)
Other versions
CN114257864A (en)
Inventor
李本龙
白剑
黄海亮
梁瑛玮
张海林
鲁和平
李长杰
陈焕然
李乐
王浩
洪行健
冷冬
丁一
Current Assignee
Yifang Information Technology Co ltd
Original Assignee
Yifang Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Yifang Information Technology Co ltd filed Critical Yifang Information Technology Co ltd
Priority to CN202210169802.XA
Publication of CN114257864A
Application granted
Publication of CN114257864B
Status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4343Extraction or processing of packetized elementary streams [PES]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention provides a seek method and device for a player in an HLS format video source scenario. The method comprises: determining the correct pts of the first frame image in the nth ts segment; acquiring the pts data of the first frame image after demultiplexing in the nth ts segment; acquiring the pts data of the nth frame image after demultiplexing in the nth ts segment; determining the offset of the pts time of the nth frame image after demultiplexing relative to the pts time of the first frame image after demultiplexing in the ts segment; determining the corrected pts time of the nth frame image after demultiplexing in the nth ts segment; and sequentially playing the plurality of ts segments. The invention accurately corrects the pts time of each frame image in the spliced segments of an HLS format video so that the video plays normally, remains compatible with non-standard videos, avoids re-transcoding by the encoding server, and provides a good seek experience.

Description

Seek method and device of player in HLS format video source scene
Technical Field
The invention relates to the technical field of networks, and in particular to a seek method and device of a player in an HLS format video source scenario.
Background
In video-on-demand systems, the HLS (HTTP Live Streaming) protocol is widely used for the transmission and playing of video streams because of its excellent cross-platform and cross-terminal characteristics and its DRM support. In educational scenarios, seek interaction is a core need of students: quickly dragging through a video to get a rough idea of its content, or jumping to a particular time point to re-watch important content. In some scenarios, an HLS video is spliced from different videos, or from clipped segments of the same video.
Taking fig. 1 of the specification as an example, the video in the table is clipped from three videos, movie1, movie2 and movie3, and each video contributes only some of its segments. Before correction, when the video is played in normal order and reaches the ts segment of movie2_1, the time shown on the progress bar is inaccurate: the correct display would be 30 seconds, but 0 seconds is actually displayed. When seeking to the 31st second, there is no matching video frame in the ts of movie2_1 because the pts do not correspond, and playback fails.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a seek method and device of a player in an HLS format video source scenario, which remedy the prior-art defects that the playing time of an HLS format video source is displayed inaccurately and playback easily fails.
The technical scheme of the invention is realized as follows: a seek method of a player in an HLS format video source scene comprises the following steps:
determining the correct pts of the first frame image in the nth ts segment, denoted ts_start_pts, where ts_start_pts is the total duration of the first n-1 ts segments;
acquiring the pts data of the first frame image after demultiplexing in the nth ts segment, denoted ts_first_packet_pts;
acquiring the pts data of the nth frame image after demultiplexing in the nth ts segment, denoted ts_n_packet_pts, where ts_n_packet_pts comes from the pts data in the nth packet after demultiplexing the current ts segment;
determining the offset offset_pts_value of the pts time of the nth frame image after demultiplexing in the nth ts segment relative to the pts time of the first frame image after demultiplexing in the ts segment, where:
offset_pts_value=ts_n_packet_pts-ts_first_packet_pts;
determining the corrected pts time ok_n_packet_pts of the nth frame image after demultiplexing in the nth ts segment, where:
ok_n_packet_pts=ts_start_pts+offset_pts_value;
sequentially playing the plurality of ts segments.
Further, before the step of determining the correct pts of the first frame image in the nth ts segment, denoted ts_start_pts, where ts_start_pts is the total duration of the first n-1 ts segments, the method further includes:
receiving a cross-segment playing operation of the user, and determining the total number of ts segments.
Further, the step of receiving the cross-segment playing operation of the user includes:
receiving a first touch instruction of the user in an operation area;
acquiring content segments played by the user during sequential video playback, wherein the played content segments include the ts segments selected by dragging;
regenerating ts segments according to the played content segments and a second touch instruction of the user.
Further, the step of regenerating the ts segments according to the played content segments and the second touch instruction of the user includes:
acquiring the feature information of the played content segment;
searching the current and adjacent ts segments of the played content segment, and determining the reference pts time at which the feature information changes;
pushing, centered on the reference pts time, images of frames at continuous intervals around the reference pts time at the corresponding position of the playing progress bar;
regenerating the ts segments according to the second touch instruction of the user on the images of the continuous-interval frames.
Further, the step of acquiring the feature information of the played content segment includes:
acquiring picture information and sound information of the played content segment, wherein the picture information includes scene information and character information, and the sound information includes voice information and background sound information;
determining the feature information F = S·a + C·b + V·c + B·d according to the scene information, the character information, the voice information and the background sound information, wherein S is the offset of the scene information, C is the offset of the character information, V is the offset of the voice information, B is the offset of the background sound information, and a, b, c and d are the weight values corresponding to the respective offsets;
the step of searching the current and adjacent ts segments of the played content segment and determining the reference pts time at which the feature information changes includes:
when the feature information exceeds a threshold, determining the ts segment in which the threshold is exceeded and the reference pts time at which it is exceeded;
acquiring the pts of the frame with the maximum scene information offset near the reference pts time as the reference pts time.
The invention also provides a seek device of a player in an HLS format video source scene, comprising:
a first determining module, configured to determine the correct pts of the first frame image in the nth ts segment, denoted ts_start_pts, where ts_start_pts is the sum of the durations of the first n-1 ts segments;
a first obtaining module, configured to obtain the pts data of the first frame image after demultiplexing in the nth ts segment, denoted ts_first_packet_pts;
a second obtaining module, configured to obtain the pts data of the nth frame image after demultiplexing in the nth ts segment, denoted ts_n_packet_pts, where ts_n_packet_pts comes from the pts data in the nth packet after demultiplexing the current ts segment;
a second determining module, configured to determine the offset offset_pts_value of the pts time of the nth frame image after demultiplexing in the nth ts segment relative to the pts time of the first frame image after demultiplexing in the ts segment, where:
offset_pts_value=ts_n_packet_pts-ts_first_packet_pts;
a third determining module, configured to determine the corrected pts time ok_n_packet_pts of the nth frame image after demultiplexing in the nth ts segment, where:
ok_n_packet_pts=ts_start_pts+offset_pts_value;
a processing module, configured to sequentially play the plurality of ts segments.
Further, the device further comprises a receiving module, configured to receive a cross-segment playing operation of a user, and determine the total number of ts segments.
Further, the receiving module includes:
the receiving submodule is used for receiving a first touch instruction of a user in the control area;
the acquisition submodule is used for acquiring the content segments played by the user during sequential video playback, wherein the played content segments include the ts segments selected by dragging;
and the generation sub-module is used for regenerating the ts segment according to the played content segment and a second touch instruction of the user.
Further, the generating submodule includes:
an acquiring unit, configured to acquire the feature information of the played content segment;
a determining unit, configured to search the current and adjacent ts segments of the played content segment and determine the reference pts time at which the feature information changes;
a pushing subunit, configured to push, centered on the reference pts time, images of frames at continuous intervals around the reference pts time at the corresponding position of the playing progress bar;
a generating subunit, configured to regenerate the ts segments according to the second touch instruction of the user on the images of the continuous-interval frames.
Further, the acquiring unit includes:
an acquiring subunit, configured to acquire the picture information and sound information of the played content segment, wherein the picture information includes scene information and character information, and the sound information includes voice information and background sound information;
a first determining subunit, configured to determine the feature information F = S·a + C·b + V·c + B·d according to the scene information, the character information, the voice information and the background sound information, wherein S is the offset of the scene information, C is the offset of the character information, V is the offset of the voice information, B is the offset of the background sound information, and a, b, c and d are the weight values corresponding to the respective offsets;
the determining unit includes:
a second determining subunit, configured to, when the feature information exceeds the threshold, determine the ts segment in which the threshold is exceeded and the reference pts time at which it is exceeded;
an acquisition subunit, configured to acquire the pts of the frame with the maximum scene information offset near the reference pts time as the reference pts time.
The method re-determines the correct pts of the first frame image in the nth ts segment, acquires the pts data of the first frame image after demultiplexing in the nth ts segment, acquires the pts data of the nth frame image after demultiplexing, determines from these the offset of the pts time of the nth frame image relative to the pts time of the first frame image after demultiplexing in the ts segment, and finally determines the corrected pts time of the nth frame image in the nth ts segment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a ts comparison table before and after correction according to a first embodiment of the present invention;
fig. 2 is a flowchart of a seek method of a player in a video source scene based on HLS format according to a first embodiment of the present invention;
fig. 3 is a flowchart of a seek method of a player in a video source scene based on HLS format according to a second embodiment of the present invention;
fig. 4 is a detailed flowchart of S21;
fig. 5 is a detailed flowchart of S213;
fig. 6 is a detailed flowchart of S2131 and S2132;
fig. 7 is a block diagram of a seek apparatus of a player in a video source scene based on HLS format according to a third embodiment of the present invention;
fig. 8 is a detailed structural block diagram of the receiving module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment:
referring to fig. 1 and fig. 2, an embodiment of the present invention discloses a seek method for a player in a video source scene based on HLS format, including:
S11, determining the correct pts of the first frame image in the nth ts segment, denoted ts_start_pts, where ts_start_pts is the sum of the durations of the first n-1 ts segments.
Generally, an HLS format video is clipped from a plurality of videos (e.g., three videos movie1, movie2 and movie3), and each video contributes only some of its ts segments (e.g., in fig. 1, movie1 contributes three ts segments, movie2 two, and movie3 four). Each ts segment has a corresponding duration, and ts_start_pts is taken as the sum of the durations of the first n-1 ts segments.
S12, acquiring the pts data of the first frame image after demultiplexing in the nth ts segment, denoted ts_first_packet_pts.
In this step, the original pts of the first demultiplexed frame image is obtained in the nth ts segment.
S13, acquiring the pts data of the nth frame image after demultiplexing in the nth ts segment, denoted ts_n_packet_pts, where ts_n_packet_pts is the pts data in the nth packet after demultiplexing the current ts segment.
In this step, the original pts of the nth demultiplexed frame image is obtained in the nth ts segment.
S14, determining the offset offset_pts_value of the pts time of the nth frame image after demultiplexing in the nth ts segment relative to the pts time of the first frame image after demultiplexing in the ts segment, where:
offset_pts_value=ts_n_packet_pts-ts_first_packet_pts.
The time difference of the pts of the nth frame image relative to the pts of the first frame image can be calculated from ts_first_packet_pts and ts_n_packet_pts, and is recorded as the offset.
S15, determining the corrected pts time ok_n_packet_pts of the nth frame image after demultiplexing in the nth ts segment, where:
ok_n_packet_pts=ts_start_pts+offset_pts_value.
ok_n_packet_pts is the actual pts time of the nth demultiplexed frame image.
S16, sequentially playing the plurality of ts segments.
As can be seen from fig. 1, the corrected video can be played in the normal sequence: when the ts segment of movie2_1 is played, the progress bar time is displayed correctly, and when seeking to the 31st second, the corrected video plays normally and the playing time is also displayed correctly.
In the embodiment of the invention, the correct pts of the first frame image in the nth ts segment is re-determined, the pts data of the first frame image after demultiplexing and the pts data of the nth frame image after demultiplexing are acquired in the nth ts segment, the offset of the pts time of the nth frame image relative to the pts time of the first frame image after demultiplexing in the ts segment is then determined from the two, and finally the corrected pts time of the nth frame image in the nth ts segment is determined. The pts time of each frame image in the spliced segments of an HLS format video is thereby accurately corrected so that the video can play normally; non-standard videos can be accommodated without re-transcoding by the encoding server, which on the one hand saves server cost and on the other hand provides a good seek experience, meeting the core seek-interaction need of educational scenarios.
Second embodiment:
referring to fig. 3 to fig. 6, an embodiment of the present invention discloses another seek method for a player in a video source scene based on HLS format, including:
and S21, receiving the cross-segment playing operation of the user and determining the total number of ts segments.
The step is usually used under the condition that important content needs to be played in an education or editing scene, a user determines the approximately corresponding position of the important content on the progress bar through the operation of pre-dragging the progress bar, then drags the progress bar at intervals for multiple times within one sequence time to realize cross-section playing operation so as to manually approximately determine the position of the important content, and then determines the total number of ts segments through the mode of combining intelligent locking and manual selection.
It should be noted that the cross-segment playing operation in this step may be a multiple-time cross-segment dragging operation.
As shown in fig. 4, as a preferred solution but not limited thereto, the step S21 further includes steps S211 to S213:
S211, receiving a first touch instruction of the user in the control area.
In this step, the first touch instruction may simply be a gesture touch operation in a specific touch area such as the playing screen, for example a "Z"-shaped sliding operation. A dragging operation of the progress bar before the "Z"-shaped sliding operation may also be included; in general, the "Z"-shaped slide can serve as a shortcut for starting the determination of the total number of ts segments.
S212, acquiring the content segments played by the user during sequential video playback, wherein the played content segments include the ts segments selected by dragging.
After the first touch instruction is received, the multiple spaced forward dragging operations of the user during sequential video playback are intelligently recognized, the forward dragging positions are recorded, and the number of played content segments and the ts segments in which they are located are determined in order. In general, the played content segments may also include the ts segments lying between two dragged ts segments.
S213, regenerating the ts segments according to the played content segments and the second touch instruction of the user.
The implementation of regenerating ts segments provided in this step is not limited to automatic generation or intelligent generation completed with manual assistance. It should be noted that the regenerated ts segments may be the same as or different from the original ts segments of the video; accordingly, the number of regenerated ts segments may be smaller than the number of ts segments originally cut from the video, or equal to the number of played content segments.
Referring to fig. 5, as a preferred but non-limiting scheme for step S213, this embodiment preferably generates the ts segments by combining intelligent and manual means to improve their accuracy, and step S213 further includes steps S2131 to S2134:
S2131, acquiring the feature information of the played content segment.
S2132, searching the current and adjacent ts segments of the played content segment, and determining the reference pts time at which the feature information changes.
In general, the played content segment dragged to by the user should contain the reference pts, but the reference pts may also be earlier or later than the played content segment. This step therefore searches the current and adjacent ts segments of the played content segment, so as to maximize both the operational convenience for the user and the reliability of obtaining the reference pts, and to avoid resetting the played content segment because the user dragged backwards.
S2133, centering on the reference pts time, pushing images of frames at continuous intervals around the reference pts time at the corresponding position of the playing progress bar.
The continuous interval in this step may be 5s or 10s, and an image screenshot or thumbnail corresponding to each time appears on the playing progress bar.
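A minimal sketch of generating the preview timestamps for this step (illustrative only; the function name, the window count and the 5s default interval are assumptions — a real player would render a thumbnail at each returned time):

```python
def preview_times(reference_pts, interval=5.0, count=3):
    """Generate preview-frame timestamps centered on the reference pts.

    Returns `count` times before and `count` times after the reference pts,
    spaced `interval` seconds apart, clamped at 0 for early references.
    """
    times = [reference_pts + i * interval for i in range(-count, count + 1)]
    return [max(0.0, t) for t in times]

# Five candidate preview frames around a reference pts of 31 s:
print(preview_times(31.0, interval=5.0, count=2))
```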
S2134, regenerating the ts segments according to the second touch instruction of the user on the images of the continuous-interval frames.
In this step, the second touch instruction may be a unique confirmation operation on the image of a continuous-interval frame, a deletion operation on the corresponding played content segment, or a gesture touch operation in a specific touch area such as the playing screen, for example an "O"-shaped sliding operation, so as to enable direct loop playback after step S27.
As shown in fig. 6, as a supplement to but not a limitation of step S2131, step S2131 further includes S2131a and S2131b:
S2131a, acquiring the picture information and sound information of the played content segment, wherein the picture information includes scene information and character information, and the sound information includes voice information and background sound information.
By way of example and not limitation, the scene information includes background color and pattern information of the displayed document, such as a white background, a black background, shading or a background pattern, and may also include scene information of the persons, such as indoor or outdoor; the character information may include the face or clothing information of any interlocutor appearing in the picture, not limited to the speaker. The same principle applies to the voice information and the background sound information, which is not repeated here.
S2131b, determining the feature information F = S·a + C·b + V·c + B·d according to the scene information, the character information, the voice information and the background sound information, wherein S is the offset of the scene information, C is the offset of the character information, V is the offset of the voice information, B is the offset of the background sound information, and a, b, c and d are the weight values corresponding to the respective offsets.
By way of example and not limitation, when a piece of feature information does not change, its offset is determined to be 0; the larger the variation of the feature information, the larger the value of the corresponding offset. Whether the feature information has changed is finally judged comprehensively according to the weights corresponding to the scene information, character information, voice information and background sound information.
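A minimal sketch of this weighted score (illustrative only; the offset values are assumed to be normalized outputs of upstream scene/character/audio analysis, which the patent does not specify):

```python
def feature_score(offsets, weights):
    """Weighted feature information F = S*a + C*b + V*c + B*d.

    offsets: (S, C, V, B) - offsets of scene, character, voice and
             background-sound information (0 means "no change")
    weights: (a, b, c, d) - weight value of each offset
    """
    return sum(o * w for o, w in zip(offsets, weights))

def exceeds_threshold(offsets, weights, threshold):
    # A reference pts candidate is flagged when F exceeds the threshold.
    return feature_score(offsets, weights) > threshold

# Example: a large scene change (S=0.9) dominates the score.
print(feature_score((0.9, 0.1, 0.0, 0.2), (0.4, 0.3, 0.2, 0.1)))
```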
Referring to fig. 6, corresponding to steps S2131a and S2131b, step S2132 further includes steps S2132a and S2132b:
S2132a, when the feature information exceeds the threshold, determining the ts segment in which the threshold is exceeded and the reference pts time at which it is exceeded.
When the change in the weighted combination of the scene information, character information, voice information and background sound information is most significant, the current pts is considered close or equal to the reference pts and is taken as the reference pts time.
S2132b, acquiring the pts of the frame with the maximum scene information offset near the reference pts time as the reference pts time.
In this embodiment, the pts of the frame with the maximum scene information offset is preferably taken as the reference pts time.
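Step S2132b can be sketched as a simple search over the frames near the candidate reference pts (a hypothetical illustration; the per-frame scene offsets and the 5-second window are assumptions):

```python
def refine_reference_pts(frame_scene_offsets, candidate_pts, window=5.0):
    """Pick the pts of the frame with the maximum scene-information offset
    within `window` seconds of the candidate reference pts.

    frame_scene_offsets: list of (pts, scene_offset) pairs
    """
    nearby = [(pts, off) for pts, off in frame_scene_offsets
              if abs(pts - candidate_pts) <= window]
    if not nearby:
        return candidate_pts  # no frame nearby; keep the candidate
    # the frame with the maximum scene offset wins
    return max(nearby, key=lambda pair: pair[1])[0]

frames = [(28.0, 0.1), (30.0, 0.9), (31.0, 0.3), (40.0, 0.95)]
print(refine_reference_pts(frames, 31.0))
```

Note that the frame at 40.0 s has the globally largest offset but lies outside the window, so the 30.0 s frame is chosen.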
S22, determining the correct pts of the first frame image in the nth ts segment, denoted ts_start_pts, where ts_start_pts is the sum of the durations of the first n-1 ts segments.
S23, acquiring the pts data of the first frame image after demultiplexing in the nth ts segment, denoted ts_first_packet_pts.
S24, acquiring the pts data of the nth frame image after demultiplexing in the nth ts segment, denoted ts_n_packet_pts, where ts_n_packet_pts is the pts data in the nth packet after demultiplexing the current ts segment.
S25, determining the offset offset_pts_value of the pts time of the nth frame image after demultiplexing in the nth ts segment relative to the pts time of the first frame image after demultiplexing in the ts segment, where:
offset_pts_value=ts_n_packet_pts-ts_first_packet_pts.
S26, determining the corrected pts time ok_n_packet_pts of the nth frame image after demultiplexing in the nth ts segment, where:
ok_n_packet_pts=ts_start_pts+offset_pts_value.
S27, sequentially playing the plurality of ts segments.
Steps S22 to S27 are the same as the corresponding steps of the first embodiment, and are not described again.
The embodiment of the invention determines the total number of ts segments by receiving the cross-segment playing operation of the user, so that when important content needs to be replayed in an education or editing scenario, the total number of ts segments can be determined by combining intelligent locking with manual selection, improving the reliability and flexibility of the method. Meanwhile, the number of played content segments and the ts segments in which they are located are determined intelligently, and ts segments can be generated according to actual needs, which facilitates subsequent processing. Finally, the reference pts time is determined by combining multiple kinds of feature information, and picture-based selection is provided, maximizing the usability and reliability of the method and providing a good seek experience.
The third embodiment:
referring to fig. 7 and fig. 8, the present invention further provides a seek apparatus 100 of a player in an HLS format video source scenario, including a receiving module 110, a first determining module 120, a first obtaining module 130, a second obtaining module 140, a second determining module 150, a third determining module 160, and a processing module 1070, where:
the receiving module 110 is connected to the first determining module 120, the first obtaining module 130, and the second obtaining module 140, and is configured to receive a cross-segment playing operation of a user and determine the total number of ts segments.
The first determining module 120 is connected to the third determining module 160, and is configured to determine the correct pts of the first frame image in the nth ts segment, denoted as ts_start_pts, where ts_start_pts is the sum of the durations of the first n-1 ts segments.
The first obtaining module 130 is connected to the second determining module 150, and is configured to obtain the pts data of the first frame image after demultiplexing in the nth ts segment, denoted as ts_first_packet_pts.
The second obtaining module 140 is connected to the second determining module 150, and is configured to record the pts data of the demultiplexed nth frame image as ts_n_packet_pts, where ts_n_packet_pts is the pts data in the nth packet after the current ts segment is demultiplexed.
The second determining module 150 is connected to the third determining module 160, and is configured to determine the offset, offset_pts_value, of the pts time of the nth frame image after demultiplexing in the nth ts segment relative to the pts time of the first frame image after demultiplexing in the ts segment, where:
offset_pts_value=ts_n_packet_pts-ts_first_packet_pts
the third determining module 160 is connected to the processing module 1070, and is configured to determine the pts time ok_n_packet_pts of the nth frame image after demultiplexing in the nth ts segment, where:
ok_n_packet_pts=ts_start_pts+offset_pts_value。
and the processing module 1070 is configured to sequentially play the ts segments.
Referring to fig. 8, in the present embodiment, the receiving module 110 includes a receiving submodule 111, an obtaining submodule 112, and a generating submodule 113, where:
the receiving submodule 111 is connected with the obtaining submodule 112, and is used for receiving a first touch instruction of the user in the operation area;
the obtaining submodule 112 is connected to the generating submodule 113, and is configured to obtain a content segment that has been played by a user in a video sequential playing process, where the played content segment includes a dragged ts segment;
and the generating sub-module 113 is configured to regenerate the ts segment according to the played content segment and the second touch instruction of the user.
As a preferred but not limited solution to the generating sub-module 113, the generating sub-module 113 further includes an obtaining unit 1131, a determining unit 1132, a pushing unit 1133, and a generating unit 1134, where:
an obtaining unit 1131, connected to the determining unit 1132, configured to obtain feature information of the played content segment;
a determining unit 1132, connected to the pushing unit 1133, configured to search the current ts segment and the adjacent ts segments of the played content segment, and determine the reference pts time at which the feature information changes;
the pushing unit 1133 is connected to the generating unit 1134, and is configured to push, at the corresponding position of the playing progress bar, images of time frames at continuous intervals around the reference pts time, centered on the reference pts time;
the generating unit 1134 is configured to regenerate a pts section according to a second touch instruction of the user for the image of each of the consecutive time frames.
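The pushing unit 1133 pushes preview images of time frames at continuous intervals centered on the reference pts time. As an illustration only, the PTS values of those preview frames might be generated as below; the interval, frame count, function name, and example values are assumptions, not specified by the patent.

```python
def preview_frame_pts(reference_pts, interval, count=5):
    """PTS of preview-frame images pushed on the progress bar, at
    continuous intervals centered on the reference pts time."""
    half = count // 2
    return [reference_pts + (i - half) * interval for i in range(count)]

# Five preview frames, 300 PTS units apart, centered on pts 9000
print(preview_frame_pts(9000, 300))  # -> [8400, 8700, 9000, 9300, 9600]
```

The user's second touch instruction then selects one of these frames, from which the generating unit 1134 regenerates the section.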
As a supplement to the obtaining unit 1131, but not limited thereto, the obtaining unit 1131 further includes an obtaining subunit 1131a and a first determining subunit 1131b:
the obtaining subunit 1131a, connected to the first determining subunit 1131b, is configured to obtain the picture information and sound information of the played content segment, where the picture information includes scene information and character information, and the sound information includes voice information and background sound information;
the first determining subunit 1131b is configured to determine the feature information F = S·a + C·b + V·c + B·d according to the scene information, the character information, the voice information, and the background sound information, where S is the offset of the scene information, C is the offset of the character information, V is the offset of the voice information, B is the offset of the background sound information, and a, b, c, and d are the weight values corresponding to the respective offsets.
Corresponding to the obtaining sub-unit 1131a and the first determining sub-unit 1131b, the determining unit 1132 further includes a second determining sub-unit 1132a and an acquiring sub-unit 1132b:
the second determining subunit 1132a is connected to the first determining subunit 1131b and the acquiring subunit 1132b, and is configured to determine, when the feature information exceeds the threshold, a ts segment corresponding to the threshold exceeding and a reference pts time when the threshold exceeding is performed.
And the collecting subunit 1132b is configured to collect, as a reference pts time, a pts of a frame where the maximum scene information offset is located near the reference pts time.
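Taken together, the two subunits might behave like the following sketch: when F first exceeds the threshold, the reference pts is refined to the nearby frame with the largest scene-information offset. The frame-tuple layout, search window, and names are assumptions for illustration.

```python
def find_reference_pts(frames, threshold, window=3):
    """frames: (pts, feature_information, scene_offset) tuples in play
    order. Returns the refined reference pts time, or None if the
    feature information never exceeds the threshold."""
    for i, (pts, feature, _) in enumerate(frames):
        if feature > threshold:
            # Refine: among nearby frames, pick the one whose scene
            # information offset is largest; its pts is the reference pts.
            lo, hi = max(0, i - window), min(len(frames), i + window + 1)
            return max(frames[lo:hi], key=lambda f: f[2])[0]
    return None

frames = [(0, 0.1, 0.2), (3000, 0.9, 0.5), (6000, 0.95, 0.9), (9000, 0.2, 0.1)]
print(find_reference_pts(frames, 0.8))  # -> 6000
```

Here the threshold is first exceeded at pts 3000, but the neighboring frame at pts 6000 has the larger scene offset, so its pts becomes the reference pts time.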
The modules and units of this embodiment correspond to the steps of the first and second embodiments, and are not described herein again.
In the embodiment of the invention, the correct pts of the first frame image in the nth ts segment is re-determined, the pts data of the first frame image after demultiplexing is obtained in the nth ts segment, and the pts data of the nth frame image after demultiplexing is obtained in the nth ts segment; the offset of the pts time of the nth frame image after demultiplexing relative to the pts time of the first frame image after demultiplexing in the ts segment is then determined from these two pts data; and finally the pts time of the nth frame image after demultiplexing in the nth ts segment is determined. In this way, a video spliced from a plurality of HLS-format segments can be played normally: the pts time of each frame image is accurately corrected, non-standard videos are supported without transcoding by an encoding server, which saves server cost, and on the other hand a good seek experience is provided, meeting the interaction requirements of educational scenes.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the above division of each functional module is only used for illustration, and in practical applications, the above function distribution may be performed by different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the embodiments provided in the present application, it should be understood that the disclosed seek method and seek apparatus may be implemented in other manners. For example, the above-described embodiment of the seek apparatus is merely illustrative, and the division of the modules or units is only one logical function division, and there may be other division ways in actual implementation, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may substantially or partially contribute to the prior art, or all or part of the technical solution may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (2)

1. A seek method of a player in a video source scene based on an HLS format is characterized by comprising the following steps:
determining the correct pts of the first frame image in the nth ts segment, and recording the correct pts as ts_start_pts, wherein ts_start_pts is the sum of the durations of the first n-1 ts segments;
acquiring pts data of the first frame image after demultiplexing in the nth ts segment, and recording the pts data as ts_first_packet_pts;
acquiring pts data of the nth frame image after demultiplexing in the nth ts segment, and recording the pts data as ts_n_packet_pts, wherein ts_n_packet_pts is the pts data in the nth packet after the current ts segment is demultiplexed;
determining the offset, offset_pts_value, of the pts time of the nth frame image after demultiplexing in the nth ts segment relative to the pts time of the first frame image after demultiplexing in the ts segment, wherein:
offset_pts_value=ts_n_packet_pts-ts_first_packet_pts;
determining the pts time ok_n_packet_pts of the nth frame image after demultiplexing in the nth ts segment, wherein:
ok_n_packet_pts=ts_start_pts+offset_pts_value;
sequentially playing a plurality of ts segments;
before the step of determining the correct pts of the first frame image in the nth ts segment, recording the correct pts as ts_start_pts, wherein ts_start_pts is the sum of the durations of the first n-1 ts segments, the method further includes:
receiving a cross-segment playing operation of a user, and determining the total number of ts segments;
the step of receiving the cross-segment playing operation of the user comprises:
receiving a first touch instruction of a user in an operation area, wherein the first touch instruction comprises a dragging operation;
acquiring content segments played by a user in a video sequential playing process, recording a forward dragging position, sequentially determining the number of the played content segments and a ts segment where the played content segments are located, wherein the played content segments comprise dragged ts segments, and the played content segments further comprise continuous ts segments between the two dragged ts segments;
regenerating a ts segment according to the played content segment and a second touch instruction of the user, wherein the regenerated ts segment may be the same as or different from the original ts segment of the video;
the step of regenerating the ts segment according to the played content segment and the second touch instruction of the user includes:
acquiring feature information of the played content segment;
searching the current ts segment and the adjacent ts segments of the played content segment, and determining the reference pts time at which the feature information changes;
pushing, at the corresponding position of the playing progress bar, images of time frames at continuous intervals around the reference pts time, centered on the reference pts time;
regenerating pts sections according to second touch instructions of the user to the images of the continuous interval time frames;
the step of obtaining the feature information of the played content segment includes:
acquiring picture information and sound information of a played content clip, wherein the picture information comprises scene information and character information, and the sound information comprises voice information and background sound information;
determining feature information F = S·a + C·b + V·c + B·d according to the scene information, the character information, the voice information, and the background sound information, wherein S is the offset of the scene information, C is the offset of the character information, V is the offset of the voice information, B is the offset of the background sound information, and a, b, c, and d are the weight values corresponding to the respective offsets;
the step of searching the current ts segment and the adjacent ts segment of the played content segment and determining the reference pts time of the characteristic information change comprises the following steps:
when the feature information exceeds a threshold value, determining the ts segment in which the threshold value is exceeded and the reference pts time at which it is exceeded;
and acquiring, as the reference pts time, the pts of the frame near the reference pts time at which the scene information offset is largest.
2. A seek device of a player based on HLS format video source scene, comprising:
the first determining module is used for determining the correct pts of the first frame image in the nth ts segment, recorded as ts_start_pts, wherein ts_start_pts is the sum of the durations of the first n-1 ts segments;
the first acquisition module is used for acquiring pts data of the first frame image after demultiplexing in the nth ts segment, recorded as ts_first_packet_pts;
the second obtaining module is used for obtaining pts data of the demultiplexed nth frame image in the nth ts segment, recorded as ts_n_packet_pts, wherein ts_n_packet_pts is the pts data in the nth packet after the current ts segment is demultiplexed;
a second determining module, configured to determine, in the nth ts segment, the offset, offset_pts_value, of the pts time of the nth frame image after demultiplexing relative to the pts time of the first frame image after demultiplexing in the ts segment, wherein:
offset_pts_value=ts_n_packet_pts-ts_first_packet_pts;
a third determining module, configured to determine the pts time ok_n_packet_pts of the nth frame image after demultiplexing in the nth ts segment, wherein:
ok_n_packet_pts=ts_start_pts+offset_pts_value;
the processing module is used for sequentially playing the ts fragments;
the receiving module is used for receiving the cross-segment playing operation of a user and determining the total number of ts segments;
the receiving module comprises:
the receiving submodule is used for receiving a first touch instruction of a user in an operation area, and the first touch instruction comprises a dragging operation;
the acquisition submodule is used for acquiring the played content segments of a user in the video sequential playing process, recording the forward dragging position, sequentially determining the number of the played content segments and the ts segments where the played content segments are located, wherein the played content segments comprise dragged ts segments, and the played content segments further comprise continuous ts segments between the two dragged ts segments;
the generation submodule is used for regenerating a ts segment according to the played content segment and a second touch instruction of the user, wherein the regenerated ts segment may be the same as or different from the original ts segment of the video;
generating a sub-module comprising:
the acquiring unit is used for acquiring the characteristic information of the played content segment;
the determining unit is configured to search the current ts segment and the adjacent ts segments of the played content segment, and determine the reference pts time at which the feature information changes;
the pushing unit is used for pushing, at the corresponding position of the playing progress bar, images of time frames at continuous intervals around the reference pts time, centered on the reference pts time;
the generating unit is used for regenerating pts sections according to a second touch instruction of the user for the image of each continuous interval time frame;
an acquisition unit comprising:
the acquisition subunit is used for acquiring picture information and sound information of the played content segment, wherein the picture information comprises scene information and character information, and the sound information comprises voice information and background sound information;
a first determining subunit, configured to determine feature information F = S·a + C·b + V·c + B·d according to the scene information, the character information, the voice information, and the background sound information, wherein S is the offset of the scene information, C is the offset of the character information, V is the offset of the voice information, B is the offset of the background sound information, and a, b, c, and d are the weight values corresponding to the respective offsets;
a determination unit comprising:
the second determining subunit is used for determining, when the feature information exceeds a threshold value, the ts segment in which the threshold value is exceeded and the reference pts time at which it is exceeded;
and the acquisition subunit is used for acquiring, as the reference pts time, the pts of the frame near the reference pts time at which the scene information offset is largest.
CN202210169802.XA 2022-02-24 2022-02-24 Seek method and device of player in HLS format video source scene Active CN114257864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210169802.XA CN114257864B (en) 2022-02-24 2022-02-24 Seek method and device of player in HLS format video source scene


Publications (2)

Publication Number Publication Date
CN114257864A CN114257864A (en) 2022-03-29
CN114257864B true CN114257864B (en) 2023-02-03

Family

ID=80797037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210169802.XA Active CN114257864B (en) 2022-02-24 2022-02-24 Seek method and device of player in HLS format video source scene

Country Status (1)

Country Link
CN (1) CN114257864B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7068719B2 (en) * 2001-06-01 2006-06-27 General Instrument Corporation Splicing of digital video transport streams
US8875199B2 (en) * 2006-11-13 2014-10-28 Cisco Technology, Inc. Indicating picture usefulness for playback optimization
CN103491430B (en) * 2012-06-12 2016-12-14 联想(北京)有限公司 Streaming medium data processing method and electronic equipment
CN104469487B (en) * 2014-12-31 2019-02-12 优酷网络技术(北京)有限公司 A kind of detection method and device of scene switching point
CN110337009A (en) * 2019-07-01 2019-10-15 百度在线网络技术(北京)有限公司 Control method, device, equipment and the storage medium of video playing
CN112291634B (en) * 2019-07-25 2022-11-29 腾讯科技(深圳)有限公司 Video processing method and device
CN112414400B (en) * 2019-08-21 2022-07-22 浙江商汤科技开发有限公司 Information processing method and device, electronic equipment and storage medium
CN113613065B (en) * 2021-08-02 2022-09-09 北京百度网讯科技有限公司 Video editing method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 402, No. 66, North Street, University Town Center, Panyu District, Guangzhou City, Guangdong Province, 510006

Applicant after: Yifang Information Technology Co.,Ltd.

Address before: 510006 room 402, No. 66 (innovation building), North Central Street, University City, Panyu District, Guangzhou, Guangdong Province

Applicant before: GUANGZHOU EASEFUN INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant