Disclosure of Invention
The embodiments of the present disclosure provide at least a video processing method, a video playing method, a video processing apparatus, a video playing apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a video processing method, including:
receiving a video acquisition request for a target video sent by a user side, where the video acquisition request carries a target display ratio of the user side;
in a case where the target display ratio is detected to be a first display ratio, obtaining preset cropping information of the target video corresponding to the target display ratio, and obtaining subtitle information determined based on color value changes in a subtitle display area of the target video, where a first display ratio is a display ratio at which the subtitle display area cannot be completely displayed; and
processing the target video based on the subtitle information and the cropping information, and sending the processed target video to the user side.
In a possible implementation, the method further includes determining the subtitle display area in the target video as follows:
sampling the target video to determine a plurality of sampled video frames of the target video;
identifying text display areas in the plurality of sampled video frames; and
determining the subtitle display area in the target video based on the text display areas in the plurality of sampled video frames.
In another possible implementation, the method further includes determining the subtitle display area in the target video as follows:
inputting the target video into a pre-trained first neural network, which outputs the subtitle display area of the target video.
In a possible implementation, the cropping information includes cropping coordinates corresponding to each video frame in the target video;
for any first display ratio, the method further includes determining the cropping information of the target video at that first display ratio as follows:
inputting the target video and the first display ratio into a pre-trained second neural network, which outputs the cropping information of the target video at that first display ratio.
In a possible implementation, the subtitle information of the target video is determined as follows:
determining change video frames in the target video, namely video frames in which the text displayed in the subtitle display area differs from that of the adjacent video frame; and
identifying the subtitle information displayed in the subtitle display area of each change video frame.
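The change-frame step above can be sketched in Python as follows. This is a minimal illustration that assumes each frame's subtitle display area has already been reduced to a hashable fingerprint (for example, a hash of its binarized pixels); the names `find_change_frames` and `subtitles_at_change_frames`, and the `recognize` callback, are hypothetical. The point is that the comparatively expensive recognition runs only on change frames rather than on every frame.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def find_change_frames(fingerprints: Sequence[Hashable]) -> List[int]:
    """Indices of frames whose subtitle-area fingerprint differs from the previous frame.

    Frame 0 has no predecessor, so it is always treated as a change frame.
    """
    return [
        i
        for i in range(len(fingerprints))
        if i == 0 or fingerprints[i] != fingerprints[i - 1]
    ]


def subtitles_at_change_frames(
    fingerprints: Sequence[Hashable],
    recognize: Callable[[int], str],  # hypothetical: recognizes frame i's subtitle area
) -> List[Tuple[int, str]]:
    """Run the recognizer only on the change frames."""
    return [(i, recognize(i)) for i in find_change_frames(fingerprints)]


# Five frames; the subtitle changes at frames 2 and 4.
print(find_change_frames(["a", "a", "b", "b", "c"]))  # [0, 2, 4]
```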
In a possible implementation, obtaining the subtitle information determined based on color value changes in the subtitle display area of the target video includes:
acquiring contiguous pixels having the same color value in the subtitle display area of the target video;
determining contiguous pixels to be screened based on the color difference between the contiguous pixels and other pixels, and on whether the color at the same pixel positions of the contiguous pixels changes within a preset time; and
aggregating the contiguous pixels to be screened, matching the aggregation result against characters stored in a character library, and determining the subtitle information of the target video based on the matching result.
In a possible implementation, the step of obtaining the subtitle information determined based on color value changes in the subtitle display area of the target video is performed by a third neural network;
the third neural network is trained through the following steps:
acquiring a sample video frame with subtitle annotation information;
inputting the sample video frame into the third neural network to be trained to obtain predicted subtitle information corresponding to the sample video frame; and
training the third neural network to be trained based on the predicted subtitle information and the subtitle annotation information.
In a possible implementation, processing the target video based on the subtitle information and the cropping information includes:
capturing, from the target video, subtitle images corresponding to the matched contiguous pixels; and
cropping the target video according to the cropping information, and overlaying the subtitle images onto the cropped target video.
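The capture, crop, and overlay sequence above can be illustrated on a toy grayscale frame stored as nested lists. This is a sketch under stated assumptions: the hypothetical helpers `crop`, `capture_subtitle`, and `overlay_bottom_center` are not from the disclosure, and placing the subtitle image horizontally centered at the bottom of the cropped picture is an assumed placement rule; the disclosure does not specify where the subtitle image is overlaid.

```python
from typing import List, Optional, Set, Tuple

Frame = List[List[int]]            # grayscale pixel grid, row-major
Patch = List[List[Optional[int]]]  # None marks a transparent pixel


def crop(frame: Frame, x: int, y: int, w: int, h: int) -> Frame:
    """Keep the w x h window whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def capture_subtitle(frame: Frame, pixels: Set[Tuple[int, int]]) -> Patch:
    """Cut the matched subtitle pixels (row, col) out of the uncropped frame.

    Pixels inside the bounding box that are not subtitle pixels stay transparent.
    """
    top = min(r for r, _ in pixels)
    left = min(c for _, c in pixels)
    h = max(r for r, _ in pixels) - top + 1
    w = max(c for _, c in pixels) - left + 1
    patch: Patch = [[None] * w for _ in range(h)]
    for r, c in pixels:
        patch[r - top][c - left] = frame[r][c]
    return patch


def overlay_bottom_center(frame: Frame, patch: Patch, margin: int = 0) -> Frame:
    """Paste the patch, horizontally centered, `margin` rows above the bottom edge."""
    out = [row[:] for row in frame]
    oy = len(out) - len(patch) - margin
    ox = (len(out[0]) - len(patch[0])) // 2
    for dy, prow in enumerate(patch):
        for dx, value in enumerate(prow):
            y, x = oy + dy, ox + dx
            if value is not None and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value
    return out


# A 6x6 frame whose subtitle (value 9) sits on the last row.
frame = [[0] * 6 for _ in range(6)]
subtitle = {(5, c) for c in range(1, 5)}
for r, c in subtitle:
    frame[r][c] = 9
cropped = crop(frame, 0, 0, 6, 4)  # this crop drops the subtitle row
result = overlay_bottom_center(cropped, capture_subtitle(frame, subtitle))
print(result[-1])  # [0, 9, 9, 9, 9, 0]
```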
In another possible implementation, processing the target video based on the subtitle information and the cropping information includes:
after the target video is cropped based on the cropping information, if the cropped target video contains only part of the subtitle area, blurring the text in that partial subtitle area and displaying the subtitle information superimposed on the blurred target video.
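The two checks implied above, whether the crop keeps only part of the subtitle area and how that part is blurred, can be sketched as follows. This is a hedged illustration: the box convention, the helper names `visible_part`, `is_partial`, and `box_blur`, and the choice of a simple mean (box) filter are assumptions, since the disclosure does not name a blur method; rendering the subtitle text on top is left to whatever text renderer is used.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def visible_part(subtitle: Box, window: Box) -> Optional[Box]:
    """Intersection of the subtitle box with the crop window, or None if disjoint."""
    left, top = max(subtitle[0], window[0]), max(subtitle[1], window[1])
    right, bottom = min(subtitle[2], window[2]), min(subtitle[3], window[3])
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)


def is_partial(subtitle: Box, window: Box) -> bool:
    """True when the crop keeps some, but not all, of the subtitle area."""
    part = visible_part(subtitle, window)
    return part is not None and part != subtitle


def box_blur(frame: List[List[float]], box: Box, radius: int = 1) -> List[List[float]]:
    """Mean-filter the pixels inside `box`; pixels outside it are left unchanged."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for r in range(max(0, box[1]), min(height, box[3])):
        for c in range(max(0, box[0]), min(width, box[2])):
            neighbors = [
                frame[rr][cc]  # read from the input so the filter is not applied in place
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            out[r][c] = sum(neighbors) / len(neighbors)
    return out


print(is_partial((0, 8, 10, 12), (0, 0, 10, 10)))  # True: rows 8-9 survive the crop
print(box_blur([[0, 0, 0], [0, 9, 0], [0, 0, 0]], (0, 0, 3, 3))[1][1])  # 1.0
```

The box returned by `visible_part` is in original-frame coordinates; subtracting the crop window's left and top offsets converts it to the coordinates of the cropped frame before blurring.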
In a second aspect, an embodiment of the present disclosure provides a video playing method, including:
in response to a playing operation on a target video, sending a video acquisition request, where the video acquisition request carries a target display ratio of the user side; and
receiving and playing the processed target video, where the processed target video is determined according to the cropping information corresponding to the target display ratio and the subtitle information determined based on color value changes in the subtitle display area of the target video.
In a third aspect, an embodiment of the present disclosure further provides a video processing apparatus, including:
a receiving module, configured to receive a video acquisition request for a target video sent by a user side, where the video acquisition request carries a target display ratio of the user side;
an acquisition module, configured to, in a case where the target display ratio is detected to be a first display ratio, acquire preset cropping information of the target video corresponding to the target display ratio and acquire subtitle information determined based on color value changes in a subtitle display area of the target video, where a first display ratio is a display ratio at which the subtitle display area cannot be completely displayed; and
a processing module, configured to process the target video based on the subtitle information and the cropping information and send the processed target video to the user side.
In a possible implementation, the processing module is further configured to determine the subtitle display area in the target video as follows:
sampling the target video to determine a plurality of sampled video frames of the target video;
identifying text display areas in the plurality of sampled video frames; and
determining the subtitle display area in the target video based on the text display areas in the plurality of sampled video frames.
In another possible implementation, the processing module is further configured to determine the subtitle display area in the target video as follows:
inputting the target video into a pre-trained first neural network, which outputs the subtitle display area of the target video.
In a possible implementation, the cropping information includes cropping coordinates corresponding to each video frame in the target video;
for any first display ratio, the processing module is further configured to determine the cropping information of the target video at that first display ratio as follows:
inputting the target video and the first display ratio into a pre-trained second neural network, which outputs the cropping information of the target video at that first display ratio.
In a possible implementation, the acquisition module is further configured to determine the subtitle information of the target video as follows:
determining change video frames in the target video, namely video frames in which the text displayed in the subtitle display area differs from that of the adjacent video frame; and
identifying the subtitle information displayed in the subtitle display area of each change video frame.
In a possible implementation, when obtaining the subtitle information determined based on color value changes in the subtitle display area of the target video, the acquisition module is configured to:
acquire contiguous pixels having the same color value in the subtitle display area of the target video;
determine contiguous pixels to be screened based on the color difference between the contiguous pixels and other pixels, and on whether the color at the same pixel positions of the contiguous pixels changes within a preset time; and
aggregate the contiguous pixels to be screened, match the aggregation result against characters stored in a character library, and determine the subtitle information of the target video based on the matching result.
In a possible implementation, the step of obtaining the subtitle information determined based on color value changes in the subtitle display area of the target video is performed by a third neural network;
the acquisition module is further configured to train the third neural network through the following steps:
acquiring a sample video frame with subtitle annotation information;
inputting the sample video frame into the third neural network to be trained to obtain predicted subtitle information corresponding to the sample video frame; and
training the third neural network to be trained based on the predicted subtitle information and the subtitle annotation information.
In a possible implementation, when processing the target video based on the subtitle information and the cropping information, the processing module is configured to:
capture, from the target video, subtitle images corresponding to the matched contiguous pixels; and
crop the target video according to the cropping information, and overlay the subtitle images onto the cropped target video.
In another possible implementation, when processing the target video based on the subtitle information and the cropping information, the processing module is configured to:
after the target video is cropped based on the cropping information, if the cropped target video contains only part of the subtitle area, blur the text in that partial subtitle area and display the subtitle information superimposed on the blurred target video.
In a fourth aspect, an embodiment of the present disclosure provides a video playing apparatus, including:
a sending module, configured to send a video acquisition request in response to a playing operation on a target video, where the video acquisition request carries the target display ratio of the user side; and
a playing module, configured to receive and play the processed target video, where the processed target video is determined according to the cropping information corresponding to the target display ratio and the subtitle information determined based on color value changes in the subtitle display area of the target video.
In a fifth aspect, an embodiment of the present disclosure further provides a computer device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device runs, the processor communicates with the memory via the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the first aspect or the second aspect, or of any possible implementation thereof.
In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the first aspect or the second aspect, or of any possible implementation thereof.
With the video processing and playing methods and apparatuses, the computer device, and the storage medium provided by the embodiments of the present disclosure, the cropping information and subtitle information of each video at each first display ratio can be determined in advance. After a video acquisition request is received from a user side, if the target display ratio of the user side is detected to be a first display ratio, the target video is processed based on the predetermined subtitle information and the cropping information corresponding to the target display ratio, and the processed target video is sent to the user side. In this way, the display ratios of different user sides are accommodated while the subtitle information is still displayed completely, which improves the users' viewing experience.
In addition, because the subtitle information of the target video is determined based on color value changes in the subtitle display area of the target video, the determined subtitle information is more accurate.
To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of configurations. Therefore, the following detailed description of the embodiments provided in the figures is not intended to limit the scope of the claimed disclosure but merely represents selected embodiments of the disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone. In addition, the term "at least one" herein means any one of a plurality of items or any combination of at least two of them; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Research shows that, during video playback, the requested video is generally cropped to meet the display requirements of different users. Cropping is usually chosen so as not to cut off the key content of the video, and the subtitles are often cut off as a result. Subtitles are then missing during playback, which degrades the user's viewing experience.
Based on the above research, the present disclosure provides video processing and playing methods and apparatuses, a computer device, and a storage medium. For each video, the cropping information and subtitle information of the video at each first display ratio may be determined in advance. After a video acquisition request sent by a user side is received, if the target display ratio of the user side is detected to be a first display ratio, the target video is processed based on the predetermined subtitle information and the cropping information corresponding to the target display ratio, and the processed target video is then sent to the user side.
To facilitate understanding of the embodiments, a video processing method disclosed in the embodiments of the present disclosure is first described in detail. The method is generally executed by a computer device with certain computing capability, typically a server.
Referring to FIG. 1, which is a flowchart of a video processing method provided by an embodiment of the present disclosure, the method includes steps S101 to S103:
S101: receiving a video acquisition request for a target video sent by a user side, where the video acquisition request carries a target display ratio of the user side.
S102: in a case where the target display ratio is detected to be a first display ratio, obtaining preset cropping information of the target video corresponding to the target display ratio, and obtaining subtitle information determined based on color value changes in a subtitle display area of the target video, where a first display ratio is a display ratio at which the subtitle display area cannot be completely displayed.
S103: processing the target video based on the subtitle information and the cropping information, and sending the processed target video to the user side.
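Because the cropping information and the set of first display ratios are prepared in advance, the server-side flow of S101 to S103 reduces to a lookup. The following is a minimal sketch, not the disclosure's implementation: the `PreparedVideo` and `Plan` structures, the string keys such as "21:9", and the name `handle_request` are hypothetical, the box values are example data, and the actual cropping and subtitle rendering of S103 are left to later steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # crop window (left, top, right, bottom)


@dataclass
class PreparedVideo:
    """Results computed in advance for one video (hypothetical structure)."""
    crop_windows: Dict[str, Box]  # display ratio, e.g. "21:9" -> crop window
    first_ratios: Set[str]        # ratios at which the subtitle area is cut off
    subtitles: List[str]          # subtitle information from the color-value analysis


@dataclass
class Plan:
    crop_window: Box
    subtitles: Optional[List[str]]  # None: crop only, no subtitle handling needed


def handle_request(store: Dict[str, PreparedVideo], video_id: str, target_ratio: str) -> Plan:
    """S101-S103: look up the precomputed data for the requested display ratio."""
    video = store[video_id]
    window = video.crop_windows[target_ratio]
    if target_ratio in video.first_ratios:  # S102: subtitles would otherwise be cut off
        return Plan(window, video.subtitles)
    return Plan(window, None)                # subtitles are already fully visible
```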
Each step and the corresponding implementation method in the embodiments of the present disclosure will be described in detail below.
For S101, the target display ratio of the user side may be the ratio at which the target video is to be displayed on the terminal device corresponding to the user side. This ratio may be the aspect ratio for full-screen display; for example, if the screen aspect ratio of the terminal device is 21:9, the display ratio at which the target video is displayed in full screen on that device is also 21:9. Alternatively, the ratio may be a viewing ratio that the user selects for the target video; for example, a display ratio input box may be provided before the user starts playback of the target video, and the user may enter a desired ratio within a preset range, such as 16:9 to 22:9 (for example, 18.5:9 or 21:9).
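A user-entered viewing ratio such as "18.5:9" has to be parsed and checked against the preset range before it is placed in the video acquisition request. The sketch below assumes the "W:H" text format and the 16:9 to 22:9 range from the example above; the function names are hypothetical. Exact fractions are used so that a boundary value such as "22:9" compares equal to the limit without floating-point error.

```python
from fractions import Fraction


def parse_ratio(text: str) -> Fraction:
    """Parse "W:H" (decimals allowed, e.g. "18.5:9") into an exact width/height ratio."""
    width, height = (Fraction(part.strip()) for part in text.split(":"))
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid display ratio: {text!r}")
    return width / height


def validate_viewing_ratio(text: str, low: str = "16:9", high: str = "22:9") -> Fraction:
    """Accept a user-entered ratio only if it lies within the preset range."""
    ratio = parse_ratio(text)
    if not parse_ratio(low) <= ratio <= parse_ratio(high):
        raise ValueError(f"{text!r} is outside {low} to {high}")
    return ratio


print(validate_viewing_ratio("18.5:9"))  # 37/18
```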
In a specific implementation, the video acquisition request for the target video may be generated after the user triggers a play button corresponding to the target video; for example, after the user triggers the play button of a target video in an application on the user side, a video acquisition request for that target video is generated. Alternatively, after playback of the target video has started, the user enters the desired viewing ratio in a display ratio input box at a preset position, and a video acquisition request for the target video is then generated.
S102: in a case where the target display ratio is detected to be a first display ratio, obtaining preset cropping information of the target video corresponding to the target display ratio, and obtaining subtitle information determined based on color value changes in a subtitle display area of the target video, where a first display ratio is a display ratio at which the subtitle display area cannot be completely displayed.
A video producer usually uploads a video to the server at a certain display ratio, which generally depends on the device used to shoot the video, whereas the user sides that play the video may have various target display ratios; common terminal display ratios include 16:9, 18:9, 19:9, and 21:9. Therefore, after receiving a video, the server adaptively adjusts the video for the different display ratios. The adjustment generally crops part of the video content so that the key information of the video can still be displayed at each display ratio.
FIG. 2 illustrates the adjustment of the display ratio of video content. In FIG. 2, the initial display ratio of the video is 16:9 (solid line) and the target display ratio is 21:9; the adjustment crops a complete 21:9 picture (dashed line) out of the 16:9 picture.
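The geometry of FIG. 2 can be made concrete with a worked example. The figure gives only ratios, so the 1920x1080 source resolution below is an assumption, as is centering the window; in practice the window position is chosen by the cropping-information step described later. A 21:9 window that keeps the full 1920-pixel width has height floor(1920 x 9 / 21) = 822 rows, leaving 258 rows to trim, 129 from the top and 129 from the bottom.

```python
import math
from fractions import Fraction
from typing import Tuple


def centered_crop(width: int, height: int, target: Fraction) -> Tuple[int, int, int, int]:
    """Largest window of aspect ratio `target` centered in a width x height frame.

    Returns (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    if Fraction(width, height) < target:
        # Source is taller than the target: keep the full width, trim top and bottom.
        new_h = math.floor(width / target)
        top = (height - new_h) // 2
        return (0, top, width, top + new_h)
    # Source is at least as wide as the target: keep the full height, trim the sides.
    new_w = math.floor(height * target)
    left = (width - new_w) // 2
    return (left, 0, left + new_w, height)


print(centered_crop(1920, 1080, Fraction(21, 9)))  # (0, 129, 1920, 951)
```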
In the process of adjusting the video, the cropping information of the video at each second display ratio needs to be determined, where the second display ratios may include the display ratios of all known devices (that is, the screen aspect ratios of known devices), and the subtitle display area of the video also needs to be determined. When it is detected that the subtitle display area cannot be completely displayed at a second display ratio, the video is processed based on the subtitle information and the cropping information corresponding to that second display ratio.
Here, the second display ratios include the first display ratios. For example, the second display ratios may include six display ratios A, B, C, D, E, and F. If the subtitle display area of a target video cannot be completely displayed under the cropping information corresponding to A, B, C, and D, then A, B, C, and D are first display ratios. For E and F, the subtitle display area can be completely displayed; therefore, when a video acquisition request carrying E or F is received from a user side, the target video is processed only according to the cropping information corresponding to the target display ratio carried in the request.
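The classification above amounts to a containment test: a second display ratio is a first display ratio exactly when its crop window does not fully contain the subtitle display area. A minimal sketch follows; the box convention, the ratio labels, and the function names are illustrative assumptions, and the example boxes are made-up values.

```python
from typing import Dict, Set, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def first_display_ratios(crop_windows: Dict[str, Box], subtitle_area: Box) -> Set[str]:
    """Second display ratios whose crop window does not fully contain the subtitle area."""
    return {ratio for ratio, window in crop_windows.items()
            if not contains(window, subtitle_area)}


windows = {"21:9": (0, 129, 1920, 951), "16:9": (0, 0, 1920, 1080)}
print(first_display_ratios(windows, (400, 960, 1520, 1040)))  # {'21:9'}
```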
The subtitle display area in the target video may be determined in either of the following two ways.
In one possible implementation, as shown in FIG. 3, the subtitle display area in the target video may be determined through the following steps:
S301: sampling the target video to determine a plurality of sampled video frames of the target video.
Here, the target video may be sampled at a preset initial sampling frequency, for example, 5 frames per second.
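Sampling at a fixed rate reduces to choosing frame indices. The sketch below assumes the video's native frame rate is known and simply rounds each sampling instant to the nearest frame; `sample_indices` is a hypothetical name. The optional time window also covers the higher-frequency sampling of a chosen minute that is described later in this section.

```python
from typing import List, Optional


def sample_indices(num_frames: int, video_fps: float, sample_fps: float,
                   start_s: float = 0.0, end_s: Optional[float] = None) -> List[int]:
    """Frame indices taken every 1/sample_fps seconds within [start_s, end_s)."""
    duration = num_frames / video_fps
    end_s = duration if end_s is None else min(end_s, duration)
    indices: List[int] = []
    k = 0
    while (t := start_s + k / sample_fps) < end_s:
        index = round(t * video_fps)  # nearest frame to the sampling instant
        if index < num_frames and (not indices or index != indices[-1]):
            indices.append(index)
        k += 1
    return indices


# A 2-second clip at 30 fps sampled at the initial rate of 5 frames per second.
print(sample_indices(60, 30, 5))  # [0, 6, 12, 18, 24, 30, 36, 42, 48, 54]
```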
S302: identifying text display areas in the plurality of sampled video frames.
Specifically, the text display areas in which text is displayed in the sampled video frames may be identified through optical character recognition (OCR) or other recognition technologies. Identifying a text display area in a sampled video frame may mean identifying the area coordinate information of that text display area.
For example, FIG. 4a is a schematic diagram of text display areas identified in a video frame; in FIG. 4a, the area coordinate information of each text display area, such as the pixel coordinates of its four vertices, is determined.
The text display areas include the subtitle display area, which displays subtitles, and non-subtitle text display areas in the video frame.
Specifically, during playback of the target video, in addition to the text displayed in the subtitle display area, text displayed with special effects, such as exclamations, often appears at certain positions, for example around a face or an object. Accordingly, the identified text display areas may include the subtitle display area and/or text display areas within the video picture.
S303: determining the subtitle display area in the target video based on the text display areas in the plurality of sampled video frames.
Here, the display position of subtitles in the target video is relatively fixed: subtitles are usually located stably in the lower-middle part of the video and displayed horizontally centered. Therefore, the subtitle display area among the text display areas may be determined according to these display characteristics.
Specifically, the text display areas in the plurality of sampled video frames may be superimposed to find overlapping regions. The more times a region is overlapped, the more often text appears in it, which matches the characteristics of the subtitle display area and does not match those of special-effect text such as exclamations. Therefore, a region whose overlap count meets a preset condition, for example the region with the largest overlap count, may be determined as the superimposition position area where subtitles are displayed, and the subtitle display area among the text display areas may then be determined according to the relative position between each text display area and the superimposition position area.
For example, FIG. 4b is a schematic diagram of determining the subtitle display area in the target video. In FIG. 4b, four text display areas are superimposed: the width of the longest text display area is shown by a solid line, the widths of the three shorter text display areas are shown by dotted lines for distinction, and the shaded area in the middle is the superimposition position area. After the plurality of sampled video frames are superimposed, the text display area containing the superimposition position area may be determined as the subtitle display area; alternatively, a text display area on the same line as the superimposition position area may be determined as the subtitle display area; or a text display area whose distance from the superimposition position area is less than a preset distance may be determined as the subtitle display area.
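The superimposition procedure of S303 can be sketched with a coarse counting grid. This is an illustration under stated assumptions: text display areas arrive as axis-aligned boxes (for example, derived from the four OCR vertices), overlap is counted per grid cell of size `cell`, the superimposition position area is the bounding box of the cells with the largest count, and a text display area is kept when it intersects that area (a simplification of the selection rules listed above). All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def overlap_grid(frames: Sequence[Sequence[Box]], width: int, height: int,
                 cell: int) -> List[List[int]]:
    """Count, per cell, how many text display areas cover it across all sampled frames."""
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = [[0] * cols for _ in range(rows)]
    for boxes in frames:
        for left, top, right, bottom in boxes:
            for gy in range(top // cell, min(rows, math.ceil(bottom / cell))):
                for gx in range(left // cell, min(cols, math.ceil(right / cell))):
                    grid[gy][gx] += 1
    return grid


def superimposition_area(grid: List[List[int]], cell: int) -> Box:
    """Bounding box (in pixels) of the cells with the largest overlap count."""
    peak = max(max(row) for row in grid)
    cells = [(gy, gx) for gy, row in enumerate(grid) for gx, v in enumerate(row) if v == peak]
    return (min(gx for _, gx in cells) * cell, min(gy for gy, _ in cells) * cell,
            (max(gx for _, gx in cells) + 1) * cell, (max(gy for gy, _ in cells) + 1) * cell)


def intersects(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def pick_subtitle_areas(frames: Sequence[Sequence[Box]], width: int, height: int,
                        cell: int = 8) -> Tuple[Box, List[List[Box]]]:
    """Keep, per frame, only the text display areas that touch the superimposition area."""
    area = superimposition_area(overlap_grid(frames, width, height, cell), cell)
    return area, [[box for box in boxes if intersects(box, area)] for boxes in frames]


frames = [
    [(20, 80, 80, 90), (10, 10, 30, 20)],  # subtitle + special-effect text
    [(30, 80, 70, 90)],                     # subtitle only
    [(25, 80, 75, 90), (60, 30, 90, 40)],  # subtitle + special-effect text
]
area, picked = pick_subtitle_areas(frames, 100, 100, cell=10)
print(area)    # (30, 80, 70, 90)
print(picked)  # [[(20, 80, 80, 90)], [(30, 80, 70, 90)], [(25, 80, 75, 90)]]
```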
In addition, video content within a preset time period of the target video may be sampled at a higher frequency to obtain key frames of the target video; for example, sampling may be performed at 10 frames per second, higher than the initial sampling frequency of 5 frames per second, during the first minute or any intermediate minute of the video. By superimposing the text display areas of the key frames, the distribution of the display areas where subtitles are located is obtained, and the predicted display range of subtitles in the target video can be determined from this distribution.
Specifically, when the predicted display range of subtitles is determined from this distribution, the display area of the video picture may be pre-divided into a plurality of candidate regions. When the display areas where the subtitles are located all fall within one candidate region, that region may be taken as the predicted display range. For example, the video picture may be divided into an upper half and a lower half; when all detected subtitle display areas are located in the lower half, the lower half may be determined as the predicted display range of the subtitles. The picture may be divided into two or more candidate regions according to actual needs, which is not limited in the embodiments of the present disclosure.
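The candidate-region rule above can be sketched by splitting the picture into equal horizontal bands and returning the band that contains every detected subtitle box. Equal horizontal bands and the name `predicted_display_range` are assumptions for illustration; the disclosure leaves the number and shape of candidate regions open.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def predicted_display_range(subtitle_boxes: List[Box], frame_height: int,
                            bands: int = 2) -> Optional[Tuple[float, float]]:
    """Return the horizontal band (top, bottom) that contains every subtitle box, if any."""
    if not subtitle_boxes:
        return None
    band_height = frame_height / bands
    for i in range(bands):
        top, bottom = i * band_height, (i + 1) * band_height
        if all(top <= box[1] and box[3] <= bottom for box in subtitle_boxes):
            return (top, bottom)
    return None  # the boxes straddle bands, so no single candidate region fits


print(predicted_display_range([(400, 960, 1520, 1040), (500, 950, 1400, 1030)], 1080))
# (540.0, 1080.0)
```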
Further, the text in the key frames within the preset time period may happen to be only special-effect text, such as exclamations, rather than the subtitles of the video, in which case the predicted display range would be identified incorrectly. To guard against this, self-verification may be performed during the subsequent identification of the subtitle display area.
For example, the frequency with which text appears within the predicted display range may be counted; if no text appears within the predicted display range for 1 minute, the prediction is determined to be incorrect, and the predicted display range is re-determined according to the above steps within a preset duration before or after the preset time period. Alternatively, it may be checked whether the identified subtitle display areas overlap the superimposition position area; if none of them overlaps it for a continuous period of time, the prediction is determined to be incorrect, and the predicted display range is re-determined in the same way. For example, for a 90-minute video with the 45th minute as the preset time period and a preset duration of 1 minute, the preset duration before the preset time period is the 44th minute and the preset duration after it is the 46th minute.
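The first self-verification rule above checks for a text-free gap of 1 minute or more inside the predicted display range. A minimal sketch, assuming the detector reports the timestamps (in seconds) at which text was found in that range; the function names are hypothetical. The second helper reproduces the 44th/46th-minute arithmetic of the example.

```python
from typing import Iterable, Tuple


def prediction_holds(text_times_s: Iterable[float], start_s: float, end_s: float,
                     max_gap_s: float = 60.0) -> bool:
    """False if text is absent from the predicted range for max_gap_s seconds or longer."""
    times = sorted(t for t in text_times_s if start_s <= t <= end_s)
    checkpoints = [start_s, *times, end_s]
    return all(later - earlier < max_gap_s
               for earlier, later in zip(checkpoints, checkpoints[1:]))


def adjacent_periods(period_minute: int, duration_min: int = 1) -> Tuple[int, int]:
    """Minutes before and after the preset time period in which to re-predict."""
    return period_minute - duration_min, period_minute + duration_min


print(prediction_holds([5, 30, 70, 100], 0, 120))  # True: the longest gap is 40 s
print(prediction_holds([5, 90], 0, 120))           # False: 85 s without text
print(adjacent_periods(45))                        # (44, 46)
```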
In another possible implementation, when the subtitle display area in the target video is determined, the target video may be input into a pre-trained first neural network, which outputs the subtitle display area of the target video.
Here, to train the first neural network, sample videos with subtitles and their corresponding label information may be input into the first neural network to be trained; a loss value for the current training iteration is then calculated from the output of the first neural network and the label information of the sample videos, and when the loss value is smaller than a preset loss value, the training of the first neural network may be determined to be complete.
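The stopping rule above (train until the loss falls below a preset loss value) can be illustrated without a deep-learning framework. In this sketch a two-parameter linear model fitted by gradient descent on mean squared error stands in for the first neural network; the model, the data, the learning rate, and the name `train_until_threshold` are illustrative assumptions, and only the loop structure mirrors the text.

```python
from typing import List, Tuple


def train_until_threshold(samples: List[Tuple[float, float]], lr: float = 0.05,
                          loss_threshold: float = 1e-4,
                          max_steps: int = 10_000) -> Tuple[float, float, float, int]:
    """Fit y = w * x + b; stop as soon as the MSE loss is below loss_threshold."""
    w, b = 0.0, 0.0
    n = len(samples)
    for step in range(1, max_steps + 1):
        errors = [(w * x + b) - y for x, y in samples]
        loss = sum(e * e for e in errors) / n
        if loss < loss_threshold:  # preset loss value reached: training is complete
            return w, b, loss, step
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, samples)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    raise RuntimeError("loss never fell below the preset value")


# Labels generated by y = 2x + 1 stand in for annotated samples.
data = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w, b, loss, steps = train_until_threshold(data)
```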
In a possible implementation, the cropping information includes cropping coordinates corresponding to each video frame in the target video, and for any first display ratio, the cropping information of the target video at that first display ratio may be determined as follows:
the target video and the first display ratio are input into a pre-trained second neural network, which outputs the cropping information of the target video at that first display ratio.
Here, the second neural network may be of the same type as the first neural network and may be trained in a similar manner; therefore, its training process is not repeated here.
Specifically, when determining the cropping information of the target video, the second neural network may identify key information, for example face images, in the sampled video frames of the target video, and may select, from a plurality of candidate cropping information items determined for a given first display ratio, the one that retains the most key information as the target cropping information, so as to avoid a poor viewing experience caused by cropping out key information. For example, if the initial display ratio of the video is 16:9 and the first display ratio is 21:9, a plurality of candidate cropping coordinates may be randomly generated for that first display ratio by cropping the video picture horizontally (for example, cropping from the top and/or bottom of the video picture in FIG. 2), and the cropping information that retains the most key information is determined as the cropping information of the target video.
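The selection criterion above, keeping the candidate crop that retains the most key information, can be sketched independently of the network that detects that information. Assumptions: key information is given as face bounding boxes, a box counts as retained only if it lies entirely inside the window, and candidate windows are enumerated at a fixed vertical stride instead of being generated randomly (swapping in random offsets changes nothing else). `best_vertical_crop` is a hypothetical name; the 822-row window height corresponds to a 21:9 crop of an assumed 1920x1080 frame.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def best_vertical_crop(width: int, height: int, crop_height: int,
                       key_boxes: List[Box], stride: int = 10) -> Tuple[Box, int]:
    """Among full-width windows of crop_height, keep the one retaining the most key boxes."""
    if crop_height > height:
        raise ValueError("crop window is taller than the frame")
    tops = list(range(0, height - crop_height + 1, stride))
    if tops[-1] != height - crop_height:
        tops.append(height - crop_height)  # also try the bottom-aligned window
    best_window, best_score = (0, 0, width, crop_height), -1
    for top in tops:
        window = (0, top, width, top + crop_height)
        score = sum(contains(window, box) for box in key_boxes)
        if score > best_score:
            best_window, best_score = window, score
    return best_window, best_score


faces = [(800, 50, 1000, 250), (1200, 80, 1350, 230), (100, 900, 300, 1050)]
print(best_vertical_crop(1920, 1080, 822, faces))  # ((0, 0, 1920, 822), 2)
```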
In a possible implementation, after the subtitle display area is determined, the subtitle information determined based on the color value changes in the subtitle display area of the target video may be obtained through the following steps, as shown in fig. 5:
S501: acquiring continuous pixel points with the same color value in the subtitle display area of the target video.
S502: determining the continuous pixel points to be screened based on the color difference values between the continuous pixel points and other pixel points, and on whether the color values at the positions of the continuous pixel points change within a preset time.
Here, the color difference value refers to the difference between the color values of pixel points, each color corresponding to one color value. For example, the hexadecimal color value #ffffff commonly used in web pages denotes white, and the RGB triple (0, 255, 0), whose red, green, and blue channel intensities are 0, 255, and 0 respectively, denotes green. Because color values have multiple mutually convertible representations, they must be converted to the same representation before the color difference value is calculated.
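Converting the two representations mentioned above to a common form before taking a difference can be sketched as follows. The Euclidean distance in RGB space is only one possible choice of color difference and is an assumption here; the function names are hypothetical.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]


def hex_to_rgb(value: str) -> RGB:
    """Convert a web-style hex color such as '#ffffff' to an (R, G, B) triple."""
    digits = value.strip().lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {value!r}")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def color_difference(a: RGB, b: RGB) -> float:
    """Euclidean distance in RGB space (one possible color difference metric)."""
    return math.dist(a, b)


white = hex_to_rgb("#ffffff")
green = (0, 255, 0)
print(white, color_difference(white, green))
```

Perceptual metrics such as distances in CIELAB space would be a drop-in alternative for `color_difference`; the point is only that both operands share one representation.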
For any character in a subtitle, the pixel points belonging to that character generally show no color difference from one another; for example, a white subtitle stays white throughout, whereas the background color in the subtitle display area varies.
Therefore, if the color difference value between certain continuous pixel points and the other pixel points exceeds a preset color difference value, and the color values at the positions of those continuous pixel points do not change within the preset time, those continuous pixel points are taken as continuous pixel points to be screened.
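Steps S501 and S502 can be sketched on small pixel grids as below. The sketch makes several simplifying assumptions that are not from the disclosure: only horizontal runs are considered, the "other pixel points" are the pixels immediately left and right of a run, the preset time is represented by a list of consecutive frames of the subtitle display area, and the color difference is a plain sum of absolute channel differences. Function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Frame = List[List[RGB]]     # rows of pixels covering the subtitle display area
Run = Tuple[int, int, int]  # (row, start_col, end_col_exclusive)


def color_diff(a: RGB, b: RGB) -> int:
    """Sum of absolute per-channel differences (an assumed color difference metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def same_color_runs(frame: Frame, min_len: int = 2) -> List[Run]:
    """S501: horizontal runs of continuous pixels that share one color value."""
    runs: List[Run] = []
    for r, row in enumerate(frame):
        start = 0
        for c in range(1, len(row) + 1):
            if c == len(row) or row[c] != row[start]:
                if c - start >= min_len:
                    runs.append((r, start, c))
                start = c
    return runs


def runs_to_screen(frames: List[Frame], min_diff: int, min_len: int = 2) -> List[Run]:
    """S502: keep runs that contrast with their neighbours and stay unchanged over the frames."""
    first = frames[0]
    kept: List[Run] = []
    for r, s, e in same_color_runs(first, min_len):
        row = first[r]
        color = row[s]
        neighbours = [row[c] for c in (s - 1, e) if 0 <= c < len(row)]
        contrasts = bool(neighbours) and all(color_diff(color, n) > min_diff for n in neighbours)
        stable = all(f[r][c] == color for f in frames[1:] for c in range(s, e))
        if contrasts and stable:
            kept.append((r, s, e))
    return kept


W, K, G = (255, 255, 255), (0, 0, 0), (200, 200, 200)
frame_a = [[K, W, W, W, (10, 20, 30)], [K, G, G, K, (9, 9, 9)]]
frame_b = [[(40, 40, 40), W, W, W, (90, 90, 90)], [K, (50, 50, 50), (50, 50, 50), K, (9, 9, 9)]]
print(runs_to_screen([frame_a, frame_b], min_diff=100))
```

The white run in the first row survives because it contrasts with its neighbours and stays white while the background changes; the gray run in the second row contrasts too but changes color between frames, so it is discarded.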
S503: aggregating the continuous pixel points to be screened, matching the aggregation result against characters stored in a character library, and determining the subtitle information of the target video based on the matching result.
In a possible implementation, the color values of the pixel points belonging to the background of the subtitle display area may also remain unchanged within the preset time, for example when the background is a uniformly colored gray plate. In that case, the subtitle information cannot be determined from the color difference values and the temporal changes at each pixel position alone.
Therefore, after the continuous pixel points to be screened are determined, they may be aggregated and the aggregation result matched against the characters in the character library, and the continuous pixel points that match characters in the library are taken as the subtitle information of the target video. Since a background cannot match any character in the library, this effectively prevents the background of the subtitle display area from being mistaken for subtitle information.
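The aggregation and matching of S503 can be sketched as follows, assuming the pixels kept in S502 are given as a set of (row, col) positions. The sketch groups them into 4-connected components, renders each component inside its bounding box, and compares it with same-sized binary templates; it assumes each character forms one connected component and does not rescale templates, both simplifications. The tiny two-character library and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Pixel = Tuple[int, int]   # (row, col) of a pixel kept by S502
Bitmap = Tuple[str, ...]  # rows of '#' (stroke) and '.' (empty)


def aggregate(mask: Set[Pixel]) -> List[Set[Pixel]]:
    """Group kept pixels into 4-connected components."""
    seen: Set[Pixel] = set()
    groups: List[Set[Pixel]] = []
    for start in mask:
        if start in seen:
            continue
        seen.add(start)
        group: Set[Pixel] = set()
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            group.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in mask and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        groups.append(group)
    return groups


def to_bitmap(group: Set[Pixel]) -> Bitmap:
    """Render a component inside its bounding box."""
    r0 = min(r for r, _ in group)
    c0 = min(c for _, c in group)
    h = max(r for r, _ in group) - r0 + 1
    w = max(c for _, c in group) - c0 + 1
    return tuple(
        "".join("#" if (r0 + i, c0 + j) in group else "." for j in range(w)) for i in range(h)
    )


def match(bitmap: Bitmap, library: Dict[str, Bitmap], min_score: float = 0.9) -> Optional[str]:
    """Return the library character whose same-sized template agrees on the most cells."""
    best, best_score = None, min_score
    for char, tmpl in library.items():
        if len(tmpl) != len(bitmap) or len(tmpl[0]) != len(bitmap[0]):
            continue  # this sketch does not rescale templates
        cells = [(a, b) for ra, rb in zip(bitmap, tmpl) for a, b in zip(ra, rb)]
        score = sum(a == b for a, b in cells) / len(cells)
        if score >= best_score:
            best, best_score = char, score
    return best


def recognise(mask: Set[Pixel], library: Dict[str, Bitmap]) -> str:
    """Match components left to right; components matching nothing (e.g. a plain plate) are dropped."""
    text = []
    for group in sorted(aggregate(mask), key=lambda g: min(c for _, c in g)):
        char = match(to_bitmap(group), library)
        if char is not None:
            text.append(char)
    return "".join(text)


LIBRARY = {"I": ("#", "#", "#"), "L": ("#.", "#.", "##")}
mask = {(0, 0), (1, 0), (2, 0)}                           # an "I"
mask |= {(0, 3), (1, 3), (2, 3), (2, 4)}                  # an "L"
mask |= {(r, c) for r in range(3) for c in range(7, 10)}  # a solid block, like a background plate
print(recognise(mask, LIBRARY))
```

The solid block passes the S502-style filters in this toy input but matches no character template, so it is not reported as subtitle text, which mirrors the reasoning in the paragraph above.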
In a possible implementation, the step of obtaining the subtitle information determined based on the color value changes in the subtitle display area of the target video may be performed by a third neural network, which, as shown in fig. 6, may be trained through the following steps:
S601: acquiring sample video frames with subtitle annotation information.
S602: inputting the sample video frames into the third neural network to be trained to obtain predicted subtitle information corresponding to the sample video frames.
S603: training the third neural network based on the predicted subtitle information and the subtitle annotation information.
Here, the third neural network may be trained in a manner similar to the first and second neural networks: a loss value for the current training iteration is calculated from the predicted subtitle information and the subtitle annotation information, and training of the third neural network may be determined to be complete once the loss value is smaller than a preset loss value.
In another possible implementation, when the subtitles displayed in the subtitle display area are determined, change video frames may be identified in the target video, that is, video frames whose characters in the subtitle display area differ from those of the adjacent video frame; the subtitle information displayed in the subtitle display area of each change video frame is then recognized.
Here, the adjacent video frame may be the previous video frame. A change video frame, that is, a video frame whose subtitle has changed, may be determined from changes in the text recognized by a technique such as OCR: specifically, the subtitle in each video frame obtained by sampling the target video is recognized, and a video frame whose subtitle text differs from that of the previous frame is determined to be a change video frame. Alternatively, a video frame in which the display position and/or size of the subtitle display area changes may be used directly as a change video frame.
For example, suppose sampling the target video yields 100 video frames, and the same subtitle (or subtitle display area) is displayed throughout frames 2 to 35, throughout frames 36 to 70, and throughout frames 71 to 100; then frames 2, 36, and 71 may be determined to be change video frames.
The subtitle display areas in frames 2, 36, and 71 can then be recognized to determine the subtitle information they display, for example by a recognition technique such as OCR. Here, the subtitle information includes the text content of the subtitle.
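Detecting change video frames from per-frame recognition results can be sketched as below, reproducing the 100-frame example. The sketch assumes the OCR step has already produced one text string per sampled frame (or `None` when no subtitle is found), and it assumes frame 1 carries no subtitle, which the example leaves unstated. The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def change_frames(subtitle_texts: Sequence[Optional[str]]) -> List[int]:
    """Return 1-based indices of sampled frames whose subtitle text differs from the previous frame.

    subtitle_texts[i] is the text recognised (e.g. by OCR) in frame i + 1, or None/"" if none.
    """
    changes: List[int] = []
    previous: Optional[str] = None
    for index, text in enumerate(subtitle_texts, start=1):
        if text and text != previous:
            changes.append(index)
        previous = text
    return changes


# The example from the text, assuming frame 1 has no subtitle.
texts = [None] + ["subtitle A"] * 34 + ["subtitle B"] * 35 + ["subtitle C"] * 30
print(len(texts), change_frames(texts))
```

Only frames 2, 36, and 71 then need full subtitle recognition. The position/size variant mentioned in the text would compare the subtitle display area's bounding box instead of (or in addition to) the text.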
S103: processing the target video based on the subtitle information and the cropping information, and sending the processed target video to the user side.
Here, the processed target video may be sent to the user side as streaming data, and the processing of the target video is not performed by the terminal device of the user side.
In a possible implementation, when the target video is processed based on the subtitle information and the cropping information, a subtitle image corresponding to the matched continuous pixel points may be extracted from the target video; the target video is then cropped according to the cropping information, and the subtitle image is overlaid onto the cropped target video.
For example, the subtitle image may be overlaid at a preset position of the cropped target video, such as horizontally centered three pixels above the bottom edge of the cropped picture.
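Extracting the subtitle image, cropping, and overlaying can be sketched on a grayscale frame stored as nested lists. Assumptions not taken from the disclosure: the subtitle image is cut from the original frame before cropping, "three pixels above the bottom" is read as three rows of gap between the subtitle's bottom edge and the frame's bottom edge, and the crop window values are hypothetical. Function names are hypothetical too.

```python
from typing import List

Frame = List[List[int]]  # a grayscale frame as rows of pixel values


def crop(frame: Frame, top: int, left: int, height: int, width: int) -> Frame:
    """Cut a rectangular window out of a frame (used for the video crop and the subtitle image)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def overlay_bottom_center(frame: Frame, subtitle: Frame, margin: int = 3) -> Frame:
    """Paste the subtitle horizontally centred, its bottom edge `margin` pixels above the frame bottom."""
    out = [row[:] for row in frame]  # leave the input frame untouched
    h, w = len(subtitle), len(subtitle[0])
    top = len(out) - margin - h
    left = (len(out[0]) - w) // 2
    if top < 0 or left < 0:
        raise ValueError("subtitle image does not fit in the cropped frame")
    for i in range(h):
        out[top + i][left:left + w] = subtitle[i]
    return out


original = [[0] * 12 for _ in range(10)]
for c in range(4, 8):
    original[9][c] = 255                       # subtitle pixels on the last row
subtitle_img = crop(original, 9, 4, 1, 4)      # extract the subtitle image before cropping
cropped = crop(original, 0, 2, 8, 8)           # hypothetical cropping information: rows 0-7, cols 2-9
result = overlay_bottom_center(cropped, subtitle_img)
print(result[4])
```

In the toy input the crop removes the last row, so the subtitle would be lost without the overlay; after pasting, it sits on row 4 of the 8-row cropped frame with rows 5 to 7 left free below it.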
In addition, when the target video is processed based on the subtitle information and the cropping information, a new subtitle may be generated from the recognized text content of the subtitle displayed in the subtitle display area and overlaid onto the cropped target video at a preset display position, thereby generating the processed target video. Alternatively, once the subtitle display area is identified, the entire subtitle display area may be cut out directly, that is, the subtitle and the subtitle background are extracted together, producing a small video that carries both; this small video is overlaid onto the cropped target video at a preset display position. The overlay may be performed in the manner described above, which is not repeated here.
In another possible implementation, if the target video cropped based on the cropping information still contains part of the subtitle area, the text in that partial subtitle area may be blurred, and the subtitle information is then overlaid onto the blurred target video.
Specifically, the blurring may use any of several image blurring methods, such as Gaussian, mean, median, or bilateral filtering, and overlaying the subtitle information onto the blurred target video includes overlaying the subtitle image onto the cropped target video. The overlay may be performed in the manner described above, which is not repeated here.
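Of the blurring methods listed, mean (box) filtering is the simplest to show. The sketch below blurs only a rectangular region, standing in for the partial subtitle area left after cropping; the region coordinates and the function name are hypothetical, and a production system would more likely call an image library's filtering routines.

```python
from typing import List

Frame = List[List[int]]  # a grayscale frame as rows of pixel values


def mean_blur_region(frame: Frame, top: int, left: int, height: int, width: int, radius: int = 1) -> Frame:
    """Mean (box) blur applied only inside the given rectangle, e.g. a partial subtitle area."""
    rows, cols = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for r in range(max(0, top), min(top + height, rows)):
        for c in range(max(0, left), min(left + width, cols)):
            # Average over the (2*radius+1)^2 neighbourhood, clipped at the frame border;
            # values are read from the unmodified input so blurred pixels do not feed back.
            window = [
                frame[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) // len(window)
    return out


frame = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(mean_blur_region(frame, 1, 0, 2, 3))
```

Rows outside the rectangle (here row 0) are copied unchanged, so only the residual subtitle text is smeared before the complete subtitle is overlaid.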
With the video processing method provided by the embodiments of the present disclosure, the cropping information and subtitle information of each video at each first display proportion can be determined in advance. After a video acquisition request sent by a user side is received, if the target display proportion of the user side is detected to be a first display proportion, the target video can be processed based on the predetermined subtitle information and the cropping information corresponding to the target display proportion, and the processed target video is then sent to the user side. In this way, the subtitle information is displayed completely while the display proportions of different user sides are accommodated, improving the user's viewing experience.
Fig. 7 is a flowchart of a video playing method provided by an embodiment of the present disclosure. The method includes steps S701 and S702, where:
S701: in response to a playing operation on the target video, sending a video acquisition request, where the video acquisition request carries the target display proportion of the user side.
S702: receiving and playing the processed target video, where the processed target video is determined according to the cropping information corresponding to the target display proportion and the subtitle information determined based on the color value changes in the subtitle display area of the target video.
The video playing method provided by the embodiments of the present disclosure is generally executed by a computer device with a certain computing capability, such as an intelligent terminal device with a display function, for example a smartphone, a tablet computer, or a smart wearable device.
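On the user side, S701 amounts to attaching the screen's display proportion to the request. The sketch below is illustrative only: the JSON field names `video_id` and `target_display_ratio` and both function names are hypothetical, as is the choice of deriving the proportion by reducing the screen resolution with a greatest common divisor.

```python
import json
from math import gcd


def display_ratio(width: int, height: int) -> str:
    """Reduce a screen resolution to a display proportion string, e.g. 1920x1080 -> '16:9'."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


def build_video_request(video_id: str, screen_width: int, screen_height: int) -> str:
    """Serialise a video acquisition request carrying the user side's target display proportion."""
    return json.dumps({
        "video_id": video_id,
        "target_display_ratio": display_ratio(screen_width, screen_height),
    })


print(build_video_request("demo-video", 1920, 1080))
```

Exact reduction can yield unfamiliar values (a 2560x1080 panel sold as "21:9" reduces to 64:27), so a real client might snap to the nearest nominal proportion before sending.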
For the processing procedure of the target video, reference may be made to related contents in the above video processing method, and details are not repeated here.
It will be understood by those of skill in the art that in the above method of the present embodiment, the order of writing the steps does not imply a strict order of execution and does not impose any limitations on the implementation, as the order of execution of the steps should be determined by their function and possibly inherent logic.
Based on the same inventive concept, a video processing apparatus corresponding to the video processing method is also provided in the embodiments of the present disclosure, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the video processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Fig. 8 is a schematic architecture diagram of a video processing apparatus according to an embodiment of the present disclosure. The apparatus includes a receiving module 801, an obtaining module 802, and a processing module 803, where:
a receiving module 801, configured to receive a video acquisition request for a target video sent by a user side; the video acquisition request carries a target display proportion of the user side;
an obtaining module 802, configured to, when it is detected that the target display proportion is a first display proportion, obtain predetermined cropping information of the target video corresponding to the target display proportion, and obtain subtitle information determined based on the color value changes in the subtitle display area of the target video, where the first display proportion is a display proportion at which the subtitle display area cannot be completely displayed;
the processing module 803 is configured to process the target video based on the subtitle information and the cropping information, and send the processed target video to the user side.
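Detecting whether a target display proportion is a first display proportion can be approximated by checking whether the subtitle display area fits inside the crop window. The sketch below assumes the largest window of the target proportion centred in the frame; in the disclosure the actual crop position comes from the cropping information, so this is only a simplified screening check. The subtitle box and function names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of the subtitle display area in source pixels


def centred_crop(src_w: int, src_h: int, ratio_w: int, ratio_h: int) -> Tuple[Fraction, Fraction, Fraction, Fraction]:
    """Largest crop window with the target proportion, centred in the source frame."""
    target = Fraction(ratio_w, ratio_h)
    if target >= Fraction(src_w, src_h):
        crop_w, crop_h = Fraction(src_w), src_w / target  # wider target: cut top and bottom
    else:
        crop_w, crop_h = src_h * target, Fraction(src_h)  # narrower target: cut left and right
    left, top = (src_w - crop_w) / 2, (src_h - crop_h) / 2
    return left, top, left + crop_w, top + crop_h


def is_first_display_ratio(src_w: int, src_h: int, ratio_w: int, ratio_h: int, subtitle: Box) -> bool:
    """True if the subtitle display area would not be completely displayed after cropping."""
    x0, y0, x1, y1 = centred_crop(src_w, src_h, ratio_w, ratio_h)
    sx0, sy0, sx1, sy1 = subtitle
    return not (x0 <= sx0 and y0 <= sy0 and sx1 <= x1 and sy1 <= y1)


subtitle_area = (360, 950, 1560, 1030)
for ratio in [(16, 9), (21, 9), (9, 16)]:
    print(ratio, is_first_display_ratio(1920, 1080, *ratio, subtitle_area))
```

`Fraction` keeps the window edges exact, so a subtitle that touches the boundary is not misjudged by floating-point rounding.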
In a possible implementation manner, the processing module 803 is further configured to determine a subtitle display area in the target video according to the following method:
sampling the target video to determine a plurality of sampled video frames of the target video;
identifying text presentation regions in the plurality of sampled video frames;
determining the subtitle display area in the target video based on the text presentation regions in the plurality of sampled video frames.
In a possible implementation manner, the processing module 803 is further configured to determine a subtitle display area in the target video according to the following method:
inputting the target video into a pre-trained first neural network, which outputs the subtitle display area of the target video.
In a possible implementation manner, the cropping information includes cropping coordinates corresponding to each video frame in the target video;
for any first display proportion, the processing module 803 is further configured to determine the cropping information of the target video at that first display proportion according to the following method:
inputting the target video and the first display proportion into a pre-trained second neural network, which outputs the cropping information of the target video at that first display proportion.
In a possible implementation manner, the obtaining module 802 is further configured to determine the subtitle information of the target video according to the following method:
determining, in the target video, change video frames whose characters displayed in the subtitle display area differ from those of the adjacent video frames;
recognizing the subtitle information displayed in the subtitle display area of the change video frames.
In a possible implementation manner, the obtaining module 802, when obtaining the subtitle information determined based on the color value change condition of the subtitle display area of the target video, is configured to:
acquiring continuous pixel points with the same color value in the subtitle display area of the target video;
determining continuous pixel points to be screened based on the color difference values between the continuous pixel points and other pixel points, and on whether the color values at the positions of the continuous pixel points change within a preset time;
aggregating the continuous pixel points to be screened, matching the aggregation result against characters stored in a character library, and determining the subtitle information of the target video based on the matching result.
In a possible implementation manner, the step of obtaining the subtitle information determined based on the color value change condition of the subtitle display area of the target video is executed by a third neural network;
the obtaining module 802 is further configured to train the third neural network through the following steps:
acquiring sample video frames with subtitle annotation information;
inputting the sample video frames into the third neural network to be trained to obtain predicted subtitle information corresponding to the sample video frames;
training the third neural network based on the predicted subtitle information and the subtitle annotation information.
In a possible implementation manner, the processing module 803, when processing the target video based on the subtitle information and the cropping information, is configured to:
extracting, from the target video, a subtitle image corresponding to the matched continuous pixel points;
cropping the target video according to the cropping information, and overlaying the subtitle image onto the cropped target video.
In a possible implementation manner, the processing module 803, when processing the target video based on the subtitle information and the cropping information, is configured to:
if the target video cropped based on the cropping information still contains part of the subtitle area, blurring the text in that partial subtitle area, and overlaying the subtitle information onto the blurred target video.
With the video processing apparatus provided by the embodiments of the present disclosure, the cropping information and subtitle information of each video at each first display proportion can be determined in advance. After a video acquisition request sent by a user side is received, if the target display proportion of the user side is detected to be a first display proportion, the target video can be processed based on the predetermined subtitle information and the cropping information corresponding to the target display proportion, and the processed target video is then sent to the user side. In this way, the subtitle information is displayed completely while the display proportions of different user sides are accommodated, improving the user's viewing experience.
Fig. 9 is a schematic architecture diagram of a video playing apparatus according to an embodiment of the present disclosure. The apparatus includes a sending module 901 and a playing module 902, where:
a sending module 901, configured to send a video acquisition request in response to a playing operation on a target video, where the video acquisition request carries the target display proportion of the user side;
a playing module 902, configured to receive and play the processed target video, where the processed target video is determined according to the cropping information corresponding to the target display proportion and the subtitle information determined based on the color value changes in the subtitle display area of the target video.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Based on the same technical concept, the embodiments of the present disclosure further provide a computer device. Fig. 10 is a schematic structural diagram of a computer device 1000 provided by an embodiment of the present disclosure, which includes a processor 1001, a memory 1002, and a bus 1003. The memory 1002 stores execution instructions and includes an internal memory 10021 and an external memory 10022. The internal memory 10021 temporarily stores operation data in the processor 1001 and data exchanged with the external memory 10022, such as a hard disk; the processor 1001 exchanges data with the external memory 10022 through the internal memory 10021. When the computer device 1000 runs, the processor 1001 communicates with the memory 1002 through the bus 1003, so that the processor 1001 executes the following instructions:
receiving a video acquisition request for a target video sent by a user side, where the video acquisition request carries the target display proportion of the user side;
when the target display proportion is detected to be a first display proportion, obtaining predetermined cropping information of the target video corresponding to the target display proportion, and obtaining subtitle information determined based on the color value changes in the subtitle display area of the target video, where the first display proportion is a display proportion at which the subtitle display area cannot be completely displayed;
processing the target video based on the subtitle information and the cropping information, and sending the processed target video to the user side.
In a possible implementation manner, the instructions of the processor 1001 further include determining a subtitle display area in the target video according to the following method:
sampling the target video to determine a plurality of sampled video frames of the target video;
identifying text presentation regions in the plurality of sampled video frames;
determining the subtitle display area in the target video based on the text presentation regions in the plurality of sampled video frames.
In a possible implementation manner, the instructions of the processor 1001 further include determining a subtitle display area in the target video according to the following method:
inputting the target video into a pre-trained first neural network, which outputs the subtitle display area of the target video.
In a possible implementation manner, in the instructions of the processor 1001, the cropping information includes cropping coordinates corresponding to each video frame in the target video;
for any first display proportion, the cropping information of the target video at that first display proportion is determined according to the following method:
inputting the target video and the first display proportion into a pre-trained second neural network, which outputs the cropping information of the target video at that first display proportion.
In one possible implementation, in the instructions of the processor 1001, the subtitle information of the target video is determined according to the following method:
determining, in the target video, change video frames whose characters displayed in the subtitle display area differ from those of the adjacent video frames;
recognizing the subtitle information displayed in the subtitle display area of the change video frames.
In a possible implementation, in the instructions of the processor 1001, obtaining the subtitle information determined based on the color value changes in the subtitle display area of the target video includes:
acquiring continuous pixel points with the same color value in the subtitle display area of the target video;
determining continuous pixel points to be screened based on the color difference values between the continuous pixel points and other pixel points, and on whether the color values at the positions of the continuous pixel points change within a preset time;
aggregating the continuous pixel points to be screened, matching the aggregation result against characters stored in a character library, and determining the subtitle information of the target video based on the matching result.
In a possible implementation manner, in the instructions of the processor 1001, the step of obtaining the subtitle information determined based on the color value change condition of the subtitle display area of the target video is performed by a third neural network;
the third neural network is obtained by training according to the following steps:
acquiring sample video frames with subtitle annotation information;
inputting the sample video frames into the third neural network to be trained to obtain predicted subtitle information corresponding to the sample video frames;
training the third neural network based on the predicted subtitle information and the subtitle annotation information.
In a possible implementation, in the instructions of the processor 1001, processing the target video based on the subtitle information and the cropping information includes:
extracting, from the target video, a subtitle image corresponding to the matched continuous pixel points;
cropping the target video according to the cropping information, and overlaying the subtitle image onto the cropped target video.
In a possible implementation, in the instructions of the processor 1001, processing the target video based on the subtitle information and the cropping information includes:
if the target video cropped based on the cropping information still contains part of the subtitle area, blurring the text in that partial subtitle area, and overlaying the subtitle information onto the blurred target video.
Alternatively, the processor 1001 is caused to execute the following instructions:
in response to a playing operation on a target video, sending a video acquisition request, where the video acquisition request carries the target display proportion of the user side;
receiving and playing the processed target video, where the processed target video is determined according to the cropping information corresponding to the target display proportion and the subtitle information determined based on the color value changes in the subtitle display area of the target video.
The embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the video processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the video processing method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implementing, and for example, a plurality of units or components may be combined, or some features may be omitted, or not implemented. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some communication interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope of the disclosure, they can still modify the technical solutions described in the foregoing embodiments, make changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, and substitutions do not depart from the spirit and scope of the embodiments disclosed herein and shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.