CN113225614A - Video playing method, device, server and storage medium - Google Patents


Info

Publication number
CN113225614A
CN113225614A
Authority
CN
China
Prior art keywords
video
target
output
text data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110427518.3A
Other languages
Chinese (zh)
Inventor
朱星龙
张恩勇
Current Assignee
Shenzhen Jiuzhou Electric Appliance Co Ltd
Original Assignee
Shenzhen Jiuzhou Electric Appliance Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Jiuzhou Electric Appliance Co Ltd filed Critical Shenzhen Jiuzhou Electric Appliance Co Ltd
Priority to CN202110427518.3A priority Critical patent/CN113225614A/en
Publication of CN113225614A publication Critical patent/CN113225614A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N21/4884 Data services, e.g. news ticker, for displaying subtitles
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a video playing method for a server, which comprises the following steps: receiving a result video sent by a sending end, wherein the result video is obtained by adding text data to target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user; extracting the text data and the target video from the result video; converting the target video into an output video, and obtaining output subtitles based on the text data; adding the output subtitles to the output video to obtain a result video; and sending the result video to a receiving end so that the receiving end plays the result video with the output subtitles. The invention also discloses a video playing device, a server and a computer-readable storage medium. The user at the receiving end can acquire the information through the text data, so the user experience is better.

Description

Video playing method, device, server and storage medium
Technical Field
The present invention relates to the field of multimedia file processing, and in particular, to a video playing method, apparatus, server, and computer-readable storage medium.
Background
Video conferences and video calls provide users distributed in different places with an all-round perception and control environment comprising various media such as audio, video, pictures and text, and have become an indispensable technology of the modern information society.
In the existing video playing method, a plurality of users respectively record real-time audio and video data by using corresponding sending ends, and send the recorded real-time audio and video data to other users, so as to realize information exchange among different users.
However, the user experience of the existing video playing method is poor.
Disclosure of Invention
The invention mainly aims to provide a video playing method, a video playing device, a server and a computer readable storage medium, and aims to solve the technical problem that the user experience is poor when the existing video playing method is adopted in the prior art.
In order to achieve the above object, the present invention provides a video playing method for a server, the method comprising the following steps:
receiving a result video sent by a sending end, wherein the result video is obtained by adding text data into target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user;
extracting the text data and the target video from the result video;
converting the target video into an output video, and obtaining an output subtitle based on the text data;
adding the output subtitles to the output video to obtain a result video;
and sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles.
Optionally, the result video further includes a target timestamp of the text data in the target video; the step of obtaining an output subtitle based on the text data includes:
and obtaining the output caption based on the text data and the target timestamp.
Optionally, the result video includes a plurality of result videos corresponding to a plurality of target users, one result video corresponds to one target video, one target video corresponds to one text data, and one text data corresponds to one target timestamp; the step of converting the target video into an output video includes:
performing video merging on the plurality of target videos to obtain the output video;
the step of obtaining the output subtitle based on the text data and the target timestamp includes:
obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps.
Optionally, the step of performing video merging on the multiple target videos to obtain the output video includes:
merging the video frames of the target videos to obtain a merged video frame with a first preset resolution;
obtaining the output video with the first preset resolution based on the merged video frame.
Optionally, before the step of obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps, the method further includes:
acquiring position information of a video frame of each target video in the plurality of target videos in the merged video frame;
the step of obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps includes:
obtaining an output subtitle based on the location information, the plurality of text data, and the plurality of target timestamps.
Optionally, before the step of sending the result video to a receiving end to enable the receiving end to play the result video and the output subtitles, the method further includes:
acquiring a second preset resolution of the receiving end;
performing resolution conversion on the result video to obtain a converted video with the second preset resolution, wherein the converted video comprises the output subtitles;
the step of sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles comprises the following steps:
and sending the converted video to a receiving end so that the receiving end plays the converted video and the output subtitles.
Optionally, the step of adding the output subtitles to the output video to obtain a result video includes:
inserting the output subtitles into the output video in the form of supplemental enhancement information or vertical blanking interval information to obtain the result video.
In addition, to achieve the above object, the present invention further provides a video playing apparatus for a server, the apparatus including:
the receiving module is used for receiving a result video sent by a sending end, wherein the result video is obtained by adding text data into target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user;
an extraction module, configured to extract the text data and the target video from the result video;
the conversion module is used for converting the target video into an output video and obtaining an output subtitle based on the text data;
the adding module is used for adding the output subtitles to the output video to obtain a result video;
and the sending module is used for sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles.
In addition, to achieve the above object, the present invention further provides a server, including: the system comprises a memory, a processor and a video playing program stored on the memory and running on the processor, wherein the video playing program realizes the steps of the video playing method according to any item when being executed by the processor.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, having a video playing program stored thereon, where the video playing program, when executed by a processor, implements the steps of the video playing method as described in any one of the above.
The technical scheme of the invention provides a video playing method for a server, which comprises the following steps: receiving a result video sent by a sending end, wherein the result video is obtained by adding text data to target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user; extracting the text data and the target video from the result video; converting the target video into an output video, and obtaining output subtitles based on the text data; adding the output subtitles to the output video to obtain a result video; and sending the result video to a receiving end so that the receiving end plays the result video with the output subtitles.
In the existing video playing method, when the receiving end plays recorded real-time audio and video, the sound of the audio data may be unclear, so that the user at the receiving end cannot hear the target user, cannot acquire the information, and the user experience is poor. With the video playing method of the invention, the voice information of the target user is converted into text data, output subtitles corresponding to the text data are obtained, and the output subtitles are played when the result video is played, so that the user at the receiving end can obtain the information through the output subtitles and the user experience is better.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the structures shown in these drawings without creative effort.
FIG. 1 is a schematic diagram of a server architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a video playing method according to a first embodiment of the present invention;
FIG. 3 is a block diagram of a video playing apparatus according to a first embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a server structure of a hardware operating environment according to an embodiment of the present invention.
Typically, the server comprises: at least one processor 301, a memory 302, and a video playback program stored on the memory and executable on the processor, the video playback program being configured to implement the steps of the video playback method as described above.
The processor 301 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 301 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 301 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. The processor 301 may further include an AI (Artificial Intelligence) processor for processing operations related to the video playback method, so that the video playback method model can be trained and learned autonomously, thereby improving efficiency and accuracy.
Memory 302 may include one or more computer-readable storage media, which may be non-transitory. Memory 302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 302 is used to store at least one instruction for execution by processor 301 to implement a video playback method provided by method embodiments herein.
In some embodiments, the server may further include: a communication interface 303 and at least one peripheral device. The processor 301, the memory 302 and the communication interface 303 may be connected by a bus or signal lines. Various peripheral devices may be connected to communication interface 303 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 304, a display screen 305, and a power source 306.
The communication interface 303 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 301 and the memory 302. In some embodiments, processor 301, memory 302, and communication interface 303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 301, the memory 302 and the communication interface 303 may be implemented on a single chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 304 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 304 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 304 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 304 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 304 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 304 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 305 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 305 is a touch display screen, the display screen 305 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 301 as a control signal for processing. At this point, the display screen 305 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 305, arranged as the front panel of the electronic device; in other embodiments, there may be at least two display screens 305, respectively disposed on different surfaces of the electronic device or in a folded design; in still other embodiments, the display screen 305 may be a flexible display screen disposed on a curved surface or a folded surface of the electronic device. The display screen 305 may even be arranged as a non-rectangular irregular figure, i.e. a shaped screen. The display screen 305 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The power supply 306 is used to power various components in the electronic device. The power source 306 may be alternating current, direct current, disposable or rechargeable. When the power source 306 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the architecture shown in fig. 1 does not constitute a limitation on the server, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium on which a video playing program is stored; when the video playing program is executed by a processor, the steps of the video playing method described above are implemented, so a detailed description is omitted here, and the beneficial effects of the same method are likewise not repeated. For technical details not disclosed in the embodiments of the computer-readable storage medium, reference is made to the description of the method embodiments of the present application. The program instructions may be deployed to be executed on one server, on multiple servers at one site, or distributed across multiple sites interconnected by a communication network.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Based on the hardware structure, the embodiment of the video playing method is provided.
Referring to fig. 2, fig. 2 is a schematic flowchart of a video playing method according to a first embodiment of the present invention; the method is used at a server and includes the following steps:
step S11: receiving a result video sent by a sending end, wherein the result video is obtained by adding text data into target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user.
It should be noted that the main execution body of the method is a server, the server installs a video playing program, and when the server executes the video playing program, the video playing method of the invention is realized.
The video playing method is mainly used for instant video communication scenes such as video calls and video conferences. Such instant video communication scenes usually have no subtitle function, and in some specific scenes the voice of a user may be unclear (for example, when a plurality of users speak at the same time in a multi-user video conference, the overlapping audio content may make it difficult for the user at the receiving end to hear clearly).
It can be understood that the target users are all users participating in the video call (or video conference), the transmitting end is the transmitting end corresponding to all users participating in the video call (or video conference), and the receiving end is the receiving end corresponding to all users participating in the video call (or video conference); the structures of the sending end and the receiving end are described with reference to the structure of the server, and the structures are similar and are not described again here.
In the invention, the target video comprises the video picture of the target user and the audio recorded at the same time, that is, the target video comprises target audio. The recorded target audio is continuous, and not all of the information it contains is valid; the valid audio is the voice information contained in the target audio, and when the target audio is converted, only the valid audio (i.e., the voice information) is converted to obtain the text data.
Wherein the text data is inserted into the target video in the form of supplemental enhancement information or vertical blanking interval information to obtain the result video.
In the H264/H265 video compression standards, SEI (supplemental enhancement information) uses the normative structure of the coded stream to insert supplementary information into specific data areas; because the information is carried inside the video itself, video supplementary information can be delivered quickly and efficiently. Similarly, vertical blanking interval information is inserted into specific data areas by using the standard structure of the video coding, and since this information is also contained in the video, it can likewise carry video supplementary information quickly and efficiently.
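As a minimal illustration (not part of the patent), carrying caption text inside an H.264 stream can be sketched as an SEI NAL unit with a user_data_unregistered payload (NAL type 6, payloadType 5). The 16-byte UUID below is a placeholder, and a real encoder would additionally insert emulation-prevention bytes and a start code:

```python
# Sketch: wrap caption text as an H.264 SEI user_data_unregistered payload.
# PLACEHOLDER_UUID is a hypothetical identifier, not one defined by the patent.
PLACEHOLDER_UUID = bytes(16)

def build_sei_nal(text: str) -> bytes:
    payload = PLACEHOLDER_UUID + text.encode("utf-8")
    sei = bytearray([0x06, 0x05])   # NAL type 6 (SEI), payloadType 5
    size = len(payload)
    while size >= 255:              # payload size is coded in steps of 255
        sei.append(255)
        size -= 255
    sei.append(size)
    sei.extend(payload)
    sei.append(0x80)                # rbsp_trailing_bits
    return bytes(sei)
```

A decoder-side extractor would locate NAL units of type 6, check the UUID, and recover the UTF-8 text, which corresponds to the "extracting the text data from the result video" step.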
Step S12: and extracting the text data and the target video from the result video.
Step S13: and converting the target video into an output video, and obtaining an output subtitle based on the text data.
Wherein the result video further comprises a target timestamp of the text data in the target video; the step of obtaining an output subtitle based on the text data includes: obtaining the output subtitle based on the text data and the target timestamp. That is, in this embodiment, the text data is added to the output video in the form of subtitles. The output subtitle carries the target timestamp, and during playback each subtitle is displayed when the time corresponding to its target timestamp arrives.
It can be understood that the target timestamp is the playing time, within the target video, of the voice information corresponding to the text data. For example, if the voice information at 1 minute 10 seconds of the target video is "reporting from the Beijing conference room", the target timestamp of the corresponding text data is 1 minute 10 seconds.
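As an illustrative sketch (not from the patent), pairing each piece of text data with its target timestamp maps naturally onto a standard subtitle format such as SRT; the 3-second display duration here is an assumed value:

```python
# Sketch: turn (target timestamp in seconds, text data) pairs into
# SRT-style subtitle entries, so each caption appears at its timestamp.
def to_srt(entries, duration=3):
    """entries: list of (seconds_offset, text) -> SRT-formatted string."""
    def fmt(t):
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        return f"{h:02d}:{m:02d}:{s:02d},000"
    blocks = []
    for i, (start, text) in enumerate(sorted(entries), 1):
        blocks.append(f"{i}\n{fmt(start)} --> {fmt(start + duration)}\n{text}\n")
    return "\n".join(blocks)
```

For instance, `to_srt([(70, "reporting from the Beijing conference room")])` yields one entry starting at 00:01:10.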
Typically, in a video call or video conference scenario, there are a plurality of target users, namely: the result video comprises a plurality of result videos corresponding to a plurality of target users respectively, one result video corresponds to one target video, one target video corresponds to one text data, and one text data corresponds to one target timestamp. The step of converting the target video into an output video includes: performing video merging on the plurality of target videos to obtain the output video; correspondingly, the step of obtaining the output subtitle based on the text data and the target timestamp includes: obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps.
Wherein the step of video merging the plurality of target videos to obtain the output video comprises: merging the video frames of the target videos to obtain a merged video frame with a first preset resolution; obtaining the output video with the first preset resolution based on the merged video frame.
It should be noted that, in general, the resolutions of the target videos sent by the sending ends may differ (for example, 1K, 2K or 4K), and they need to be merged into one output video whose resolution is the first preset resolution; preferably, the first preset resolution in this application is 8K. Here, 1K resolution is 1920 × 1080, 2K resolution is 2560 × 1440, 4K resolution is 3840 × 2160, and 8K resolution is 7680 × 4320. The output video is the video composed of the merged video frames.
In a specific application, the merged video frame has a plurality of different display areas, and the different display areas are used for displaying pictures of different target videos. For example, if the target video is 4 target videos, the merged video frame has 4 different display areas, and one display area is used for displaying the picture of one target video.
It can be understood that, when the number of target users is not more than 4, the merged video frame of the output video may be displayed on a single page, where one page of merged video frame is a picture containing the target videos of the multiple target users. When the number of target users exceeds 4, the merged video frames may be displayed on multiple pages (each page displays the merged video frame corresponding to 4 target videos, each page of merged video frame has the first preset resolution, and the multiple pages together cover the target videos of all the target users), and the users can perform a page-turning operation to switch between display pages; alternatively, the video frames of all the target users can be combined into a single-page merged video frame, where the whole page has the first preset resolution. The present invention does not limit the specific display mode.
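The four-user merging step above can be sketched as follows (an illustrative example, not the patent's implementation): four decoded frames of possibly different resolutions are tiled into one 2 × 2 merged frame at the first preset resolution. The nearest-neighbour resize is a simplification of real video scaling, and the 8K default follows the text:

```python
import numpy as np

TARGET_W, TARGET_H = 7680, 4320   # first preset resolution (8K)

def resize(frame, w, h):
    # Nearest-neighbour resize of an HxWx3 frame to h x w (simplified).
    ys = np.arange(h) * frame.shape[0] // h
    xs = np.arange(w) * frame.shape[1] // w
    return frame[ys][:, xs]

def merge_frames(frames, target_w=TARGET_W, target_h=TARGET_H):
    """Tile 4 HxWx3 frames into one target_h x target_w x 3 merged frame."""
    cell_w, cell_h = target_w // 2, target_h // 2
    out = np.zeros((target_h, target_w, 3), dtype=np.uint8)
    for i, f in enumerate(frames):
        r, c = divmod(i, 2)       # quadrant: row-major 2x2 grid
        out[r*cell_h:(r+1)*cell_h, c*cell_w:(c+1)*cell_w] = resize(f, cell_w, cell_h)
    return out
```

Each quadrant's offset within the merged frame is exactly the "position information" used later when placing subtitles.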
Wherein, before the step of obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps, the method further comprises: acquiring position information of a video frame of each target video in the plurality of target videos in the merged video frame; accordingly, the step of obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps includes: obtaining an output subtitle based on the location information, the plurality of text data, and the plurality of target timestamps.
The plurality of text data come from the target videos corresponding to the plurality of target users, and need to be integrated into one output subtitle based on the position information (one position information corresponding to one target video) and the plurality of target timestamps. In the merged video frame of the output video, the pictures corresponding to the target videos (i.e., the video frames of the target videos) occupy different display areas, and the positions of these display areas within the merged video frame constitute the position information.
For example, suppose there are two pieces of text data: text data a of target video A at 0 min 6 s, and text data b of target video B at 1 min 3 s; the display area of target video A is the left area and that of target video B is the right area, which constitutes the position information. An output subtitle is then obtained based on the position information, the plurality of text data and the plurality of target timestamps, with the following content: text data a at 0 min 6 s, displayed in the left area, and text data b at 1 min 3 s, displayed in the right area.
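The integration step in the example above can be sketched as combining (timestamp, text, display area) triples into one time-ordered subtitle track. The entry format below is an assumption for illustration, not the patent's data structure.

```python
# Sketch: merge per-video text data, target timestamps and display-area
# position information into a single output subtitle track.
def build_output_subtitle(items):
    """items: iterable of (timestamp_seconds, text, position) tuples.

    Returns subtitle entries sorted by timestamp, each tagged with the
    display area of its source target video.
    """
    return sorted(
        ({"time": t, "text": text, "area": pos} for t, text, pos in items),
        key=lambda entry: entry["time"],
    )

# The A/B example above: text data a at 0:06 (left), text data b at 1:03 (right).
subtitle = build_output_subtitle([(63, "text data b", "right"),
                                  (6, "text data a", "left")])
```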
Step S14: adding the output subtitles to the output video to obtain a resultant video.
Specifically, the step of adding the output subtitles to the output video to obtain a result video includes: inserting the output subtitles into the output video in a manner of supplemental enhancement information or vertical blanking interval information to obtain the resulting video.
Step S15: and sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles.
In this embodiment, the result video is a video obtained by adding the output subtitle to the output video, so that the output subtitle is played along with the output video. When the receiving end plays the result video, each entry of the output subtitle is displayed automatically when the playback time reaches its corresponding target timestamp.
For example, suppose there are 4 target users, each corresponding to one target video; all 4 target videos acquired from the sending ends are 4K videos, and the content of the 4 corresponding pieces of voice information is as follows:
4K video signal | Information group | Time stamp | Caption content
1 | A | 00:00:04 | Hello, I am the Beijing conference room
2 | B | 00:00:02 | Hello, I am the Shanghai conference room
3 | C | 00:00:05 | Hello, I am the Guangzhou conference room
4 | D | 00:00:06 | Hello, I am the Shenzhen conference room
Here 1, 2, 3 and 4 are the four target videos of the four target users, and A, B, C and D are the names of the text data converted from the corresponding voice information. The sending ends insert the data of A and B into 4K video signals 1 and 2 as supplemental enhancement information, and insert the data of C and D into 4K video signals 3 and 4 as vertical blanking interval information, forming the new 4K video signals 1A, 2B, 3C and 4D, i.e. the plurality of result videos. In another embodiment, the server may instead obtain the corresponding text data directly through speech recognition, without requiring the sending end to extract the text data.
The server parses the text data in the four 4K videos 1A, 2B, 3C and 4D by identifying the supplemental enhancement information and the vertical blanking interval information, and obtains the output subtitle based on the target timestamps (00:00:04, 00:00:02, 00:00:05 and 00:00:06), the text data ("Hello, I am the Beijing conference room", "Hello, I am the Shanghai conference room", "Hello, I am the Guangzhou conference room" and "Hello, I am the Shenzhen conference room") and the position information of the text data (videos 1, 2, 3 and 4 are located at the upper left, upper right, lower left and lower right, respectively). The content of the output subtitle is as follows:
4K video signal | Position | Time stamp | Caption content
1A | Upper left | 00:00:04 | Hello, I am the Beijing conference room
2B | Upper right | 00:00:02 | Hello, I am the Shanghai conference room
3C | Lower left | 00:00:05 | Hello, I am the Guangzhou conference room
4D | Lower right | 00:00:06 | Hello, I am the Shenzhen conference room
At this time, the output video is an 8K video obtained by merging the four 4K videos; it contains the 4 video pictures, whose display areas are the upper left, upper right, lower left and lower right, respectively.
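The 4K-into-8K tiling above can be made concrete with pixel offsets, assuming the common UHD frame sizes 3840x2160 (4K) and 7680x4320 (8K); the function name is illustrative.

```python
# Sketch: top-left pixel offsets for tiling four 4K UHD frames into one
# 8K UHD merged frame, in the upper-left / upper-right / lower-left /
# lower-right layout described above.
TILE_W, TILE_H = 3840, 2160  # 4K UHD frame size

def quadrant_offsets():
    """Return (x, y) top-left offsets of the four 4K tiles in an 8K canvas."""
    return {
        "upper_left": (0, 0),
        "upper_right": (TILE_W, 0),
        "lower_left": (0, TILE_H),
        "lower_right": (TILE_W, TILE_H),
    }
```

Together the four tiles exactly cover the 7680x4320 merged frame.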
In addition, the output subtitle is inserted into the merged video as supplemental enhancement information, in the following manner:
[Figure BDA0003028688190000111: SEI insertion format, figure omitted]
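The patent does not give the byte layout of the insertion. As a hedged sketch, subtitle text could be carried in an H.264 "user data unregistered" SEI message (SEI NAL unit type 6, payload type 5, prefixed by a 16-byte UUID); the builder below omits emulation-prevention bytes, so it is not a fully compliant bitstream writer.

```python
# Sketch: wrap subtitle bytes in a minimal H.264 SEI NAL unit
# (user_data_unregistered). Emulation-prevention (0x03 insertion) is
# deliberately omitted for brevity.
def make_sei_nal(subtitle_utf8: bytes, uuid16: bytes) -> bytes:
    assert len(uuid16) == 16          # user_data_unregistered requires a UUID
    payload = uuid16 + subtitle_utf8
    out = bytearray([0x06, 0x05])     # NAL header (type 6 = SEI), payload type 5
    size = len(payload)
    while size >= 255:                # payload size is ff-coded
        out.append(0xFF)
        size -= 255
    out.append(size)
    out += payload
    out.append(0x80)                  # rbsp_trailing_bits
    return bytes(out)
```

A receiver (or the server here) would scan for SEI NAL units, match the UUID, and recover the subtitle text after the 16-byte prefix.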
The playing of the result video and the output subtitle is displayed as follows:
At playback time 00:00:02: [Figure BDA0003028688190000112, figure omitted]
At playback time 00:00:04: [Figure BDA0003028688190000113, figure omitted]
At playback time 00:00:05: [Figure BDA0003028688190000114, figure omitted]
At playback time 00:00:06: [Figure BDA0003028688190000115, figure omitted]
Further, before the step of sending the result video to a receiving end to enable the receiving end to play the result video and the output subtitles, the method further includes: acquiring a second preset resolution of the receiving end; performing resolution conversion on the result video to obtain a converted video with the second preset resolution, wherein the converted video comprises the output subtitles; the step of sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles comprises the following steps: and sending the converted video to a receiving end so that the receiving end plays the converted video and the output subtitles.
It should be noted that the receiving end may not be able to play the result video at the first preset resolution directly; the server therefore converts it into the converted video, whose resolution is the second preset resolution corresponding to the receiving end (generally the display resolution of the receiving end), so that the receiving end can play the converted video and the output subtitle.
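The resolution-conversion step can be sketched as a nearest-neighbour resample of each frame to the receiving end's second preset resolution. This is a toy illustration on a frame represented as rows of pixel values; a real implementation would use a proper video scaler.

```python
# Sketch: nearest-neighbour resize of one frame (list of pixel rows) from
# the first preset resolution to the receiving end's second preset resolution.
def resize_frame(frame, out_w, out_h):
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```

Applied per frame, this converts, e.g., an 8K result video down to a 4K or 1080p converted video for the receiving end.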
The technical scheme of the invention provides a video playing method which is used for a server and comprises the following steps: receiving a result video sent by a sending end, wherein the result video is obtained by adding text data into target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user; extracting the text data and the target video from the result video; converting the target video into an output video, and obtaining an output subtitle based on the text data; adding the output subtitles to the output video to obtain a resultant video; and sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles.
In the existing video playing method, when the receiving end plays recorded real-time audio and video, the audio may be unclear, so the user at the receiving end cannot hear the target user and therefore cannot obtain the information, resulting in poor user experience. With the video playing method above, the voice information of the target user is converted into text data, an output subtitle corresponding to the text data is obtained, and the output subtitle is played together with the result video, so the user at the receiving end can obtain the information through the output subtitle, giving a good user experience.
Referring to fig. 3, fig. 3 is a block diagram of a first embodiment of a video playing apparatus according to the present invention; the apparatus is used for a server, and the apparatus includes:
the receiving module 10 is configured to receive a result video sent by a sending end, where the result video is obtained by adding text data to target video data, the text data is obtained by converting voice information in a target video, and the target video is obtained by recording a target user;
an extracting module 20, configured to extract the text data and the target video from the result video;
a conversion module 30, configured to convert the target video into an output video, and obtain an output subtitle based on the text data;
an adding module 40, configured to add the output subtitles to the output video to obtain a result video;
a sending module 50, configured to send the result video to a receiving end, so that the receiving end plays the result video and the output subtitles.
The above description is only an alternative embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A video playing method, for a server, the method comprising the steps of:
receiving a result video sent by a sending end, wherein the result video is obtained by adding text data into target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user;
extracting the text data and the target video from the result video;
converting the target video into an output video, and obtaining an output subtitle based on the text data;
adding the output subtitles to the output video to obtain a resultant video;
and sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles.
2. The method of claim 1, wherein the result video further includes a target timestamp of the text data in the target video; the step of obtaining an output subtitle based on the text data includes:
and obtaining the output caption based on the text data and the target timestamp.
3. The method of claim 2, wherein the result video comprises a plurality of result videos corresponding to a plurality of target users, respectively, one result video corresponding to one target video, one target video corresponding to one text data, one text data corresponding to one target timestamp; the step of converting the target video into an output video includes:
performing video merging on the plurality of target videos to obtain the output video;
the step of obtaining the output subtitle based on the text data and the target timestamp includes:
obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps.
4. The method of claim 3, wherein said step of video merging said plurality of target videos to obtain said output video comprises:
merging the video frames of the target videos to obtain a merged video frame with a first preset resolution;
obtaining the output video with the first preset resolution based on the merged video frame.
5. The method of claim 4, wherein prior to the step of obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps, the method further comprises:
acquiring position information of a video frame of each target video in the plurality of target videos in the merged video frame;
the step of obtaining the output subtitle based on the plurality of text data and the plurality of target timestamps includes:
obtaining an output subtitle based on the location information, the plurality of text data, and the plurality of target timestamps.
6. The method of claim 5, wherein prior to the step of transmitting the resulting video to a receiving end for the receiving end to play the resulting video and the output subtitles, the method further comprises:
acquiring a second preset resolution of the receiving end;
performing resolution conversion on the result video to obtain a converted video with the second preset resolution, wherein the converted video comprises the output subtitles;
the step of sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles comprises the following steps:
and sending the converted video to a receiving end so that the receiving end plays the converted video and the output subtitles.
7. The method of claim 6, wherein the step of adding the output subtitles to the output video to obtain a resultant video comprises:
inserting the output subtitles into the output video in a manner of supplemental enhancement information or vertical blanking interval information to obtain the resulting video.
8. A video playback apparatus, for a server, the apparatus comprising:
the receiving module is used for receiving a result video sent by a sending end, wherein the result video is obtained by adding text data into target video data, the text data is obtained by converting voice information in the target video, and the target video is obtained by recording a target user;
an extraction module, configured to extract the text data and the target video from the result video;
the conversion module is used for converting the target video into an output video and obtaining an output subtitle based on the text data;
the adding module is used for adding the output subtitles to the output video to obtain a result video;
and the sending module is used for sending the result video to a receiving end so that the receiving end plays the result video and the output subtitles.
9. A server, characterized in that the server comprises: memory, processor and a video playback program stored on the memory and running on the processor, the video playback program when executed by the processor implementing the steps of the video playback method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, having a video playback program stored thereon, which when executed by a processor implements the steps of the video playback method according to any one of claims 1 to 7.
CN202110427518.3A 2021-04-20 2021-04-20 Video playing method, device, server and storage medium Pending CN113225614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110427518.3A CN113225614A (en) 2021-04-20 2021-04-20 Video playing method, device, server and storage medium


Publications (1)

Publication Number Publication Date
CN113225614A true CN113225614A (en) 2021-08-06

Family

ID=77088096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110427518.3A Pending CN113225614A (en) 2021-04-20 2021-04-20 Video playing method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN113225614A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100039498A1 (en) * 2007-05-17 2010-02-18 Huawei Technologies Co., Ltd. Caption display method, video communication system and device
CN105700848A (en) * 2014-12-12 2016-06-22 三星电子株式会社 Device and method for controlling sound output
CN108134918A (en) * 2018-01-30 2018-06-08 苏州科达科技股份有限公司 Method for processing video frequency, device and multipoint video processing unit, conference facility
CN110740283A (en) * 2019-10-29 2020-01-31 杭州当虹科技股份有限公司 method for converting voice into character based on video communication
CN111107299A (en) * 2019-12-05 2020-05-05 视联动力信息技术股份有限公司 Method and device for synthesizing multi-channel video
CN112399133A (en) * 2016-09-30 2021-02-23 阿里巴巴集团控股有限公司 Conference sharing method and device
CN112532931A (en) * 2020-11-20 2021-03-19 北京搜狗科技发展有限公司 Video processing method and device and electronic equipment
CN112584078A (en) * 2019-09-27 2021-03-30 深圳市万普拉斯科技有限公司 Video call method, video call device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination