CN116055797A - Video processing method and device - Google Patents


Info

Publication number
CN116055797A
Authority
CN
China
Prior art keywords
video
video frame
target
target video
playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111267020.1A
Other languages
Chinese (zh)
Inventor
许兴旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202111267020.1A priority Critical patent/CN116055797A/en
Publication of CN116055797A publication Critical patent/CN116055797A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application provides a video processing method and device. The video processing method comprises: first acquiring a video to be processed and performing text recognition on a target video frame in the acquired video; then, taking the recognized text as video enhancement content, performing text enhancement processing on the target video frame to obtain play data of the target video frame; and then playing the play data of the target video frame. In this way, the text in the video to be processed can be enhanced and displayed clearly, improving the display effect, so that remote personnel watching the video over a network can clearly see the text in the video, improving the user experience.

Description

Video processing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video processing method. The present application is also directed to a video processing apparatus, a computing device, and a computer-readable storage medium.
Background
With the rapid development of computer and network technology, video has become an increasingly popular transmission medium, and many aspects of people's work and life involve video. As video becomes more widespread, network-based communication in forms such as live streaming and video conferencing is increasing, and in many scenarios a presentation needs to be given by means of PPT (PowerPoint) slides, blackboard writing, or the like.
In scenarios where a presentation is given by means of PPT slides or blackboard writing, on-site personnel can clearly see the text in the slides or on the blackboard and have a good experience. For remote personnel watching over a network, however, the text in the slides or blackboard writing in the video captured by the acquisition end may be unclear, or the video quality may deteriorate, due to changes in lighting, movement of the display device, or intermediate processes such as capture, processing, and network transmission at the acquisition end. That is, the clarity of the slide or blackboard text in the video watched by remote personnel over the network may be poor and the display effect unsatisfactory, so that remote personnel cannot clearly see the text in the slides or blackboard writing in the video, which degrades the user experience.
Disclosure of Invention
In view of this, embodiments of the present application provide a video processing method. The application also relates to a video processing apparatus, a computing device, and a computer-readable storage medium, so as to solve the technical problem in the prior art of poor clarity of text in video.
According to a first aspect of an embodiment of the present application, there is provided a video processing method, including:
acquiring a video to be processed, and performing text recognition on a target video frame in the video to be processed;
taking the recognized text as video enhancement content, and performing text enhancement processing on the target video frame to obtain play data of the target video frame;
and playing the play data of the target video frame.
According to a second aspect of embodiments of the present application, there is provided a video processing apparatus, including:
an acquisition module configured to acquire a video to be processed and perform text recognition on a target video frame in the video to be processed;
an enhancement module configured to take the recognized text as video enhancement content and perform text enhancement processing on the target video frame to obtain play data of the target video frame;
and a playing module configured to play the play data of the target video frame.
According to a third aspect of embodiments of the present application, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of any one of the video processing methods.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of any one of the video processing methods.
According to the video processing method provided by the present application, a video to be processed can first be acquired, and text recognition performed on a target video frame in the acquired video; the recognized text is then taken as video enhancement content, text enhancement processing is performed on the target video frame to obtain its play data, and the play data of the target video frame can then be played. In this case, text recognition can be performed on the target video frame in the acquired video, and the recognized text, taken as video enhancement content, is processed separately from the specific video frame content. The text in the target video frame can thus be enhanced and displayed clearly, improving the display effect, so that remote personnel watching the video over a network can clearly see the text in the video, improving the user experience.
Drawings
FIG. 1 is a flow chart of a video processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of another video processing method according to an embodiment of the present application;
FIG. 3 is a flow chart of yet another video processing method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
FIG. 5 is a block diagram of a computing device according to one embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application is, however, susceptible of embodiment in many other ways than those herein described and similar generalizations can be made by those skilled in the art without departing from the spirit of the application and the application is therefore not limited to the specific embodiments disclosed below.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. As used in this application in one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a "first" may also be referred to as a "second", and similarly, a "second" may also be referred to as a "first", without departing from the scope of one or more embodiments of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, terms related to one or more embodiments of the present application will be explained.
OCR (Optical Character Recognition): a technique that scans text data and then analyzes and processes the resulting image file to obtain the text and layout information.
In the present application, a video processing method is provided, and the present application relates to a video processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Fig. 1 shows a flowchart of a video processing method according to an embodiment of the present application, which specifically includes the following steps:
step 102: and acquiring the video to be processed, and performing character recognition on the target video frame in the video to be processed.
Specifically, the video to be processed may refer to a video that needs to be subjected to text recognition to perform subsequent text enhancement processing, such as a live video, a video in a video conference, an online course video, and the like; the target video frame may refer to a video frame waiting for text recognition in the video to be processed, such as a current video frame, or a video frame determined based on the recognition frequency to be text recognized. The text enhancement processing may refer to processing operations that enhance text included in a video frame, so that the text included in the video frame is displayed more clearly.
In practical application, the word recognition may refer to a technology for recognizing words included in the video, and in specific implementation, word recognition may be performed on the video to be processed through OCR technology, where the word recognition performed on the video to be processed actually is performed on a video frame in the video to be processed.
It should be noted that, the video to be processed may be a video waiting to be transmitted to the playing end, or may also be a video waiting to be played by the playing end, that is, the executing device for obtaining the video to be processed may be a server, or may be a playing end that receives a video transmitted by the server. That is, in the present application, text recognition is performed on a video, so that text enhancement processing operation on the video may be performed at a server side or a play side.
When the server or the client acquires the video to be processed, the server or the client can perform character recognition on the target video frame in the acquired video to be processed, so that the subsequent character enhancement processing is performed on the target video frame in the video to be processed based on recognized characters.
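The acquire → recognize → enhance → play flow described above can be outlined in code. The sketch below is purely illustrative: `Frame`, `PlayData`, and the `ocr_engine` callable are hypothetical stand-ins (a real system would plug in an OCR library such as Tesseract here), not names taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Frame:
    index: int
    image: bytes          # raw pixel data (placeholder)

@dataclass
class PlayData:
    frame: Frame
    enhancement_text: List[str]   # recognized text used as video enhancement content

def process_video(frames, ocr_engine: Callable[[Frame], List[str]]):
    """Recognize text in each target frame and attach it as enhancement content."""
    play_data = []
    for frame in frames:
        text = ocr_engine(frame)               # e.g. an OCR call such as Tesseract
        play_data.append(PlayData(frame=frame, enhancement_text=text))
    return play_data

# Usage with a dummy OCR engine (a real system would call an OCR library here):
frames = [Frame(index=i, image=b"") for i in range(3)]
result = process_video(frames, ocr_engine=lambda f: [f"slide text {f.index}"])
```

Whether `process_video` runs on the server or on the playing end is, as the text above notes, an implementation choice.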
In an optional implementation of this embodiment, the acquired video to be processed may be video content generated in real time, or video content generated in the past. That is, acquiring the video to be processed may be implemented as follows:
receiving a video picture captured by an acquisition end, and taking the video picture as the video to be processed; or
acquiring a historical video, and taking the historical video as the video to be processed.
It should be noted that the acquisition end may refer to a device that captures video pictures in real time, such as the device at the main conference site in a conference system or the streamer end in a live broadcast. The acquisition end can capture the current video picture in real time and transmit it to the server; the video picture captured by the acquisition end is the video to be processed, and in this case the video content of the video to be processed is captured in real time.
In addition, a historical video may be acquired as the video to be processed. The historical video may be a pre-recorded video, in which case the video content of the video to be processed is predetermined and will not change. For example, some online courses that were recorded long ago but are of very high quality and are still watched by a large number of users may be taken as the video to be processed.
In practical applications, if the device that acquires the video to be processed is the server, the server can directly receive the video picture captured by the acquisition end and take it as the video to be processed; or the server can directly acquire, from a video database, a historical video that needs text enhancement processing and take it as the video to be processed. If the device that acquires the video to be processed is a client, the client can obtain the video picture captured by the acquisition end from the server and take it as the video to be processed; or the client can acquire, from the server's video database, a historical video that needs text enhancement processing and take it as the video to be processed.
In the present application, the video picture captured in real time by the acquisition end can be taken as the video to be processed, and a pre-recorded historical video can also be taken as the video to be processed, so that text enhancement processing can be performed both on video generated in real time and on historical video generated long ago. The type of the video to be processed is therefore flexible and can adapt to different application scenarios, giving high adaptability.
In an optional implementation of this embodiment, an area for text recognition may be preset; that is, performing text recognition on the target video frame in the video to be processed may be implemented as follows:
determining a preset area in the target video frame;
and performing text recognition on the preset area in the target video frame.
Specifically, the preset area may be an area preset for text recognition, such as the area of the target video frame where content like PPT slides or blackboard writing is most likely to be located.
Because the positions of the PPT slides, blackboard writing, and the like in the video are relatively fixed, the region where they are likely to appear can be determined in advance and set as the preset area. When text recognition is subsequently performed on the target video frame, recognition can be restricted to the video content of the preset area, so that recognition is performed only on the region where the PPT slides or blackboard writing are located and the text they contain is recognized. This avoids mistakenly recognizing and enhancing other text in the target video frame (such as subtitles), ensuring the accuracy of the subsequent text enhancement processing.
In addition, the preset area may be every region of the target video frame that contains text content, so that text recognition is performed on every text-containing region of the target video frame and text enhancement processing is subsequently performed on all recognized text. This avoids missing some text in the target video frame, ensures the coverage of text recognition, and improves the user experience.
In practical applications, different preset areas can be set for text recognition in different application scenarios. For example, in a live broadcast or video conference scenario, the positions of the PPT slides or blackboard writing in the video are relatively fixed, so that range (the area where they are located in the video) can be directly selected as the preset area for text recognition. In a historical video scenario, the positions of the PPT slides or blackboard writing may differ greatly between videos, so every region of the target video frame that contains text content can be set as a preset area for text recognition.
Of course, in an actual implementation, other areas of the target video frame may also be set as the preset area; for example, a preset range around the central position of the target video frame may be set as the preset area for text recognition, which is not limited in this application.
Since the preset area for text recognition in the present application can be set in advance, different text recognition scenarios can be flexibly accommodated, giving high adaptability.
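A minimal sketch of restricting recognition to a preset area might look as follows; the `(left, top, width, height)` region format and the per-scene presets are assumptions made for illustration, not part of the patent.

```python
# Illustrative sketch: restrict text recognition to a preset region of the frame.

def crop_region(frame, region):
    """Return the sub-image of `frame` (a 2D list of pixels) covered by `region`."""
    left, top, width, height = region
    return [row[left:left + width] for row in frame[top:top + height]]

# Hypothetical per-scene presets: a fixed slide area for live scenes,
# the whole frame for historical videos whose layout is unknown in advance.
PRESET_REGIONS = {
    "live": (2, 1, 3, 2),        # area where the PPT/blackboard is expected
    "historical": None,          # None = recognize the entire frame
}

def region_for_recognition(frame, scene):
    region = PRESET_REGIONS[scene]
    return frame if region is None else crop_region(frame, region)

frame = [[f"{r}{c}" for c in range(6)] for r in range(4)]  # 4x6 toy "frame"
roi = region_for_recognition(frame, "live")
```

The OCR step would then run only on `roi`, which is what prevents subtitles elsewhere in the frame from being recognized and enhanced by mistake.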
In an optional implementation of this embodiment, where the video to be processed consists of consecutive video frames, when performing text recognition on target video frames in the video to be processed, which video frames are target video frames that need text recognition may be determined according to a preset recognition frequency. That is, performing text recognition on the target video frame in the video to be processed may be implemented as follows:
determining a target video frame in the video to be processed according to a preset recognition frequency;
and performing text recognition on the target video frame.
Specifically, the preset recognition frequency may be a preset frequency for performing text recognition on the video to be processed, that is, how often text recognition is performed. For example, the preset recognition frequency may be real-time recognition, meaning text recognition is performed on every video frame of the video to be processed; in this case each video frame can serve as a target video frame. Alternatively, the preset recognition frequency may be a preset recognition interval, for example, performing text recognition once every preset number of video frames, or once every preset time period. In this case, text recognition is not performed on every video frame but once every several frames, and the video frames determined according to the preset recognition interval as needing text recognition are the target video frames.
In practical applications, if the executing device that performs text recognition on the target video frame is a server, a worker can set the preset recognition frequency in the server in advance; individual users cannot modify it once set, and the server determines the target video frames in the video to be processed based on the preset recognition frequency and performs text recognition. If the executing device that performs text recognition on the target video frame is a client, the user can, through the client, autonomously set the preset recognition frequency for the video to be processed according to the type of video being watched, viewing habits, and the like; that is, the preset recognition frequency can be set on the user's own device and can be modified by different users.
In addition, when to start and when to stop text recognition on the video to be processed can be preset; that is, a start recognition time and an end recognition time can be preset. The start and end recognition times can be determined according to the video content of the video to be processed. For example, content analysis can be performed on the video to be processed: if a PPT slide or blackboard writing is detected to appear, text recognition on the video can be started, i.e., the start recognition time is the moment the PPT slide or blackboard writing appears in the video, and the target video frames are the video frames in which it appears; if the PPT slide or blackboard writing is detected to disappear from the video, text recognition is stopped, i.e., the end recognition time is the moment it disappears from the video.
Of course, in practical applications, besides determining the target video frames through the preset recognition frequency, the target video frames in the video to be processed may also be determined according to key frames. That is, image analysis is performed on each video frame of the video to be processed to determine whether it is a key frame; if it is, text recognition is performed on it as a target video frame, and if not, it is not taken as a target video frame and no text recognition is performed. A key frame may be a video frame in which content such as a PPT slide or blackboard writing changes, or a video frame whose content differs from that of the previous video frame.
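The two ways of choosing target frames described above, a preset recognition interval and key-frame detection by comparing each frame with its predecessor, can be sketched as follows; the interval value and the simple equality-based difference test are assumptions for demonstration only.

```python
# Illustrative sketch of selecting target frames for text recognition.

def targets_by_frequency(num_frames, interval):
    """Every `interval`-th frame is a target; interval=1 means real-time recognition."""
    return [i for i in range(num_frames) if i % interval == 0]

def targets_by_keyframe(frames):
    """A frame is a key frame when its content differs from the previous frame,
    e.g. when a new slide or new blackboard writing appears."""
    targets, previous = [], object()   # sentinel so frame 0 always counts
    for i, frame in enumerate(frames):
        if frame != previous:
            targets.append(i)
        previous = frame
    return targets

# Usage: a toy video where the slide changes twice.
frames = ["slide1", "slide1", "slide2", "slide2", "slide3"]
```

A real implementation would replace the equality test with an image-difference measure, but the selection logic is the same.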
Step 104: taking the recognized text as video enhancement content, and performing text enhancement processing on the target video frame to obtain the play data of the target video frame.
Specifically, on the basis of acquiring the video to be processed and performing text recognition on the target video frame, the recognized text can further be taken as video enhancement content, and text enhancement processing performed on the target video frame to obtain its play data.
It should be noted that video enhancement content may refer to content that displays the text in the target video frame in enhanced form. Text enhancement processing may refer to processing operations that enhance the text contained in a video frame so that it is displayed more clearly; for example, the text recognized from the target video frame may be displayed at a preset position in the frame, so that otherwise unclear text is displayed more clearly at that position.
In an optional implementation of this embodiment, the recognized text may be directly encoded into the target video frame as a subtitle, thereby performing text enhancement processing on the target video frame to obtain its play data. That is, taking the recognized text as video enhancement content and performing text enhancement processing on the target video frame to obtain its play data may be implemented as follows:
encoding the recognized text into the target video frame as a subtitle to obtain the play data of the target video frame.
It should be noted that existing video files mainly contain two types of subtitles. One type is subtitles as additional information, independent of the video data; such subtitles are simply superimposed on the video image according to the time axis when the video file is played. The other type is subtitles as part of the video data; such subtitles are encoded together with the content of the video image to generate the video image, and are embedded in the video image when the video file is played.
In practical applications, the recognized text can be used as part of the target video frame, that is, encoded together with the target video frame to obtain its play data; the play data is then the target video frame with the recognized text embedded in it. Since the recognized text is embedded in the target video frame when the play data is generated, the video image content and the recognized text content in the play data form a single whole, so the recognized text cannot be operated on; it is displayed as an embedded subtitle, and the text in the target video frame is thereby displayed more clearly.
In the present application, the text recognized from the target video frame can be encoded into the target video frame as a subtitle, so that the text in the target video frame is displayed more clearly and stably, improving the display effect, enabling remote personnel watching the video over a network to see the text clearly, and improving the user experience.
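A toy model of the embedded-subtitle variant is sketched below: once the recognized text is encoded into the frame, it is ordinary image content and can no longer be selected or edited. The caption-row overlay scheme is an assumption made purely for illustration, not the patent's encoding method.

```python
# Minimal sketch of the "embedded subtitle" variant: the recognized text becomes
# part of the encoded frame itself.

def embed_subtitle(frame_rows, text):
    """Encode `text` into the frame by appending it as an extra caption row."""
    caption_row = list(text)             # stand-in for rasterized glyphs
    return frame_rows + [caption_row]    # play data = frame with burned-in text

frame = [["."] * 4, ["."] * 4]           # toy 2x4 frame
play_data = embed_subtitle(frame, "HELLO")
# The caption is now indistinguishable from image content: it is just pixels.
```

In a real encoder the text would be rasterized and blended into the pixel buffer before compression; the key property modeled here is that the result is a single image, not image plus separable text.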
In an optional implementation of this embodiment, the recognized text may instead be displayed as additional information independent of the video data. That is, taking the recognized text as video enhancement content and performing text enhancement processing on the target video frame to obtain its play data may be implemented as follows:
taking the recognized text as operable content of the target video frame;
and generating the play data of the target video frame based on the operable content and the target video frame.
Specifically, operable content may refer to text content that can be copied, edited, and so on; for example, the operable content may be a barrage (bullet-screen comment) or a subtitle independent of the video data.
It should be noted that the text recognized from the target video frame may serve as a barrage, a subtitle, or other additional information independent of the video data, i.e., as operable content, and the play data of the target video frame is then generated based on the operable content and the target video frame. In this case the play data contains two parts: the image data of the target video frame, and the text data recognized from the target video frame.
In practical applications, when generating the play data of the target video frame based on the operable content and the target video frame, the operable content can be stored in an enhanced-content data packet and the target video frame in a video data packet. The data in both packets carry time information, so that when the video data is subsequently played, the operable content and the target video frame can be played in alignment along the time axis.
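The two-packet structure just described might be modeled as follows; the packet field names and the exact alignment rule (matching equal timestamps) are assumptions for illustration, not details from the patent.

```python
from dataclasses import dataclass

@dataclass
class VideoPacket:
    timestamp: float
    frame: str                 # stand-in for encoded image data

@dataclass
class EnhancedContentPacket:
    timestamp: float
    operable_text: str         # recognized text, stored separately from the image

def align_for_playback(video_packets, content_packets):
    """Pair each video packet with the operable content carrying the same timestamp."""
    by_time = {p.timestamp: p.operable_text for p in content_packets}
    return [(v.frame, by_time.get(v.timestamp)) for v in video_packets]

video = [VideoPacket(0.0, "frame0"), VideoPacket(1.0, "frame1")]
content = [EnhancedContentPacket(1.0, "recognized slide text")]
timeline = align_for_playback(video, content)
```

Because the text rides in its own packet, the player can render it as selectable, editable content on top of the frame rather than as burned-in pixels.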
In the present application, the recognized text can serve as operable content of the target video frame, independent of the specific video content, so that the recognized text can be copied, edited, and otherwise operated on independently, making it convenient for users to work with the recognized text and improving the user experience.
Step 106: playing the play data of the target video frame.
Specifically, on the basis of taking the recognized text as video enhancement content and performing text enhancement processing on the target video frame to obtain its play data, the play data of the target video frame is further played.
In an optional implementation of this embodiment, playing the play data of the target video frame may be implemented as follows:
transmitting the play data of the target video frame to a playing end for playing; or
playing the play data of the target video frame locally on the device.
If the device that performs the text enhancement processing and obtains the play data of the target video frame is the server, the server may transmit the play data to the playing end, and the playing end plays it according to the timestamps. If the device that obtains the play data of the target video frame is the playing end, the playing end can directly play the play data locally according to the timestamps.
In an optional implementation manner of this embodiment, if the identified text is taken as a barrage of the target video frame, the user may edit a barrage so as to update the barrage, that is, the operation content is the barrage, and at this time, play the playing data of the target video frame, which may be implemented as follows:
during the process of playing the playing data of the target video frame, obtaining an editing operation for a target barrage, wherein the target barrage is any operable content;
and updating the target barrage according to the editing operation to obtain the updated target barrage.
Specifically, the editing operation may refer to an operation of editing the content of the target barrage. It should be noted that if, while watching the playing data of the target video frame, the user finds that a displayed barrage contains an error, the user can edit that barrage's content and input the correct text, obtaining an updated barrage, that is, operable content with the error corrected.
In practical application, if the device that performs the text enhancement processing to obtain the playing data of the target video frame is the server, the operable content (i.e., the barrage) is stored at the server and can be acquired by other playing ends. In this case, while the playing data of the target video frame is being played, the server can receive the user's editing operation on the target barrage through the playing end, update the target barrage according to the received editing operation to obtain the updated target barrage, and then update the target barrage stored in the storage space, ensuring that the stored operable content is the content corrected by the user and can subsequently be synchronized to other playing ends.
In addition, if the device that performs the text enhancement processing to obtain the playing data of the target video frame is the playing end, the operable content (i.e., the barrage) is stored at the playing end and only that playing end can see it. In this case, while playing the playing data of the target video frame, the playing end can detect the user's editing operation on the target barrage, update the target barrage according to the editing operation to obtain the updated target barrage, and then update the target barrage stored in the storage space, ensuring that the stored operable content is the content corrected by the user and is convenient for the user to review later.
According to the method and the device, during the playing of the playing data of the target video frame, the server or the playing end can acquire, via the playing end, the user's editing operation for the target barrage, and update the target barrage according to the editing operation to obtain the updated target barrage, so that misrecognized content can be corrected based on the user's editing operation, ensuring the correctness of the enhanced text content.
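A minimal sketch of this barrage-correction flow, assuming a simple in-memory store (`BarrageStore` and its methods are hypothetical names, not the patent's implementation):

```python
class BarrageStore:
    """Hypothetical store of recognized-text barrages, keyed by
    barrage id, supporting user edit corrections."""

    def __init__(self):
        self._barrages = {}  # barrage_id -> text

    def add(self, barrage_id: int, text: str) -> None:
        self._barrages[barrage_id] = text

    def apply_edit(self, barrage_id: int, corrected_text: str) -> str:
        # Update the stored barrage so later viewers (or the same
        # viewer on review) see the corrected text, not the OCR slip.
        if barrage_id not in self._barrages:
            raise KeyError(f"unknown barrage {barrage_id}")
        self._barrages[barrage_id] = corrected_text
        return corrected_text

    def get(self, barrage_id: int) -> str:
        return self._barrages[barrage_id]
```

Here a viewer would correct a recognition slip such as `"vidao"` to `"video"`, and the stored copy is what gets synchronized onward.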
In an optional implementation manner of this embodiment, after the identified text is used as the operable content to generate the playing data of the target video frame, the user may further copy and store the operable content of the playing data; that is, the operable content is a barrage/subtitle. In this case, the playing processing of the playing data of the target video frame may be implemented as follows:
in the process of playing the playing data of the target video frame, receiving a copying operation for target operable content, wherein the target operable content is any operable content;
copying the target operable content and storing it.
It should be noted that the playing end may provide a function for copying the operable content while playing the playing data of the target video frame. The user may select any operable content and perform a copying operation while watching; the playing end then copies the target operable content and stores the copied content locally. This allows the user to record content while watching the playing data of the target video frame, improving the efficiency of recording content and the user experience.
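The copy-and-store behavior can be sketched as follows (`CopyRecorder` is an assumed name; a real playing end would write to the system clipboard and persistent local storage):

```python
class CopyRecorder:
    """Illustrative playing-end helper: copy any operable content the
    viewer selects and keep a local record of everything copied."""

    def __init__(self):
        self.clipboard = None  # last copied operable content
        self.notes = []        # everything copied, stored locally

    def copy(self, operable_content):
        self.clipboard = operable_content
        self.notes.append(operable_content)
        return operable_content
```

Because the recognized text is operable content rather than pixels, copying is a plain string operation instead of a screenshot.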
According to the video processing method, the video to be processed can be acquired first, character recognition can be performed on the target video frame in the acquired video to be processed, the recognized characters can then be used as video enhancement content to perform character enhancement processing on the target video frame and obtain the playing data of the target video frame, and the playing data of the target video frame can then be played. In this case, by performing character recognition on the target video frame in the acquired video to be processed and processing the recognized characters as video enhancement content separately from the specific video content, the characters in the target video frame can be enhanced and displayed clearly, improving the display effect, so that remote viewers watching the video over a network can see the characters in the video clearly, improving the user experience.
Fig. 2 shows a flowchart of another video processing method according to an embodiment of the present application, which is applied to a playing end, and specifically includes the following steps:
Step 202: receiving, from the server, the video picture acquired by the acquisition end, taking the video picture as the video to be processed, determining a target video frame according to a preset recognition frequency set by the user, and performing character recognition on the target video frame.
It should be noted that, for scenes in which video pictures are acquired in real time, such as live broadcasts and video conferences, the acquisition end may transmit the acquired video pictures to the server in real time, and the server may forward the received video pictures to each playing end in real time. Therefore, when the video to be processed is a video picture acquired in real time, performing the subsequent text enhancement processing at the playing end is preferred: the text display effect is enhanced while data transmission delay at the server is avoided.
In an optional implementation manner of this embodiment, an area for performing text recognition may be preset, that is, text recognition may be performed on the target video frame, and the specific implementation process may be as follows:
determining a preset area in a target video frame;
and carrying out character recognition on a preset area in the target video frame.
Because the positions of content such as PPT slides and blackboard writing in a video are relatively fixed, the region where such content is likely to appear can be determined in advance and set as the preset region. When character recognition is subsequently performed on the target video frame, only the video content of the preset region needs to be recognized, so that the recognition targets the region where the PPT slides, blackboard writing and the like are located and identifies the text they contain, avoiding the erroneous recognition and enhancement of other text in the target video frame (such as subtitles) and ensuring the accuracy of the subsequent text enhancement processing.
In addition, the preset areas can be all areas of the target video frame that include text content, so that character recognition covers every text-bearing area and all of the subsequently recognized text is enhanced, avoiding missing any text in the target video frame, ensuring the coverage of character recognition and improving the user experience. Since the preset area for character recognition can be configured in advance, different character recognition scenes can be flexibly accommodated, giving the method strong adaptability.
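A sketch of region-restricted recognition, with the frame modeled as a grid of characters and the OCR engine left as a pluggable callable (the patent does not prescribe a specific recognizer; all names here are illustrative):

```python
def recognize_preset_regions(frame, regions, ocr):
    """Run text recognition only inside preset regions (e.g. where the
    slides or blackboard are expected), ignoring the rest of the frame.

    frame   -- 2-D grid of pixels/characters (list of rows)
    regions -- list of (x, y, width, height) rectangles
    ocr     -- any recognizer taking a cropped grid; a stand-in here
    """
    results = []
    for x, y, w, h in regions:
        # Crop the rectangle out of the frame, then recognize only it.
        crop = [row[x:x + w] for row in frame[y:y + h]]
        results.append(ocr(crop))
    return results
```

Restricting recognition to the preset rectangles is what keeps subtitles and other incidental text out of the enhancement pass.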
In an optional implementation manner of this embodiment, the video to be processed is formed of continuous video frames, and when character recognition is performed it may be determined, according to a preset recognition frequency, which video frames are target video frames requiring recognition; that is, character recognition of the target video frames in the video to be processed may be implemented as follows:
determining a target video frame in the video to be processed according to a preset recognition frequency;
and performing character recognition on the target video frame.
Of course, in practical applications, besides determining the target video frames through the preset recognition frequency, the target video frames in the video to be processed can also be determined according to key frames. That is, image analysis is performed on each video frame of the video to be processed to determine whether it is a key frame; if so, the video frame is taken as a target video frame and character recognition is performed on it, and if not, it is not taken as a target video frame and no recognition is performed. A key frame may be a video frame in which content such as a PPT slide or blackboard writing changes, or a video frame whose content differs from that of the preceding video frame.
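Both selection strategies — sampling by a preset recognition frequency and picking key frames by frame difference — can be sketched as follows (the names and the pixel-difference heuristic are assumptions for illustration; real systems would use smarter change detection):

```python
def targets_by_frequency(num_frames, fps, recognitions_per_second):
    """Pick target video frames at a preset recognition frequency,
    e.g. one OCR pass per second instead of every frame."""
    step = max(1, round(fps / recognitions_per_second))
    return list(range(0, num_frames, step))

def is_key_frame(prev_frame, frame, threshold=0.1):
    """Crude key-frame test: the frame differs from the previous one
    in more than `threshold` of its pixels (a stand-in for detecting
    slide or blackboard-writing changes)."""
    if prev_frame is None:
        return True  # first frame always counts
    changed = sum(a != b for a, b in zip(prev_frame, frame))
    return changed / len(frame) > threshold
```

Frequency-based selection bounds the recognition cost regardless of content; key-frame selection instead spends recognition only where the on-screen text plausibly changed.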
Step 204: and encoding the identified characters into the target video frame as subtitles to obtain playing data of the target video frame, and playing the playing data of the target video frame locally.
According to the method and the device, characters obtained by identifying the target video frames can be used as subtitles to be encoded into the target video frames, so that the characters in the target video frames can be displayed in a clearer and more stable mode, the display effect is improved, the characters in the video can be clearly seen by remote personnel watching the video through a network, and the user experience is improved.
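As one hedged illustration of this subtitle route, the recognized text could be emitted as SubRip-style cues timed to the interval during which the target video frame is on screen; the patent itself encodes the text into the frames and does not mandate any particular subtitle format:

```python
def srt_timestamp(seconds):
    # SubRip uses HH:MM:SS,mmm with a comma before milliseconds.
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def make_subtitle_cue(index, start, end, recognized_text):
    # One cue for the recognized text of a target video frame.
    return (f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{recognized_text}\n")
```

Emitting the text as cues (rather than re-rasterizing it into pixels) is what lets the player render it crisply at any resolution.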
Step 206: and taking the identified characters as operable contents of the target video frame, and generating playing data of the target video frame based on the operable contents and the target video frame.
According to the method and the device, the identified characters can be used as the operable content of the target video frame, independent of the specific video content, so that operations such as copying and editing can be performed on the identified characters independently, which makes it convenient for the user to process the identified characters and improves the user experience.
It should be noted that, in practical applications, step 204 and step 206 are two different ways of generating the playing data of the target video frame, and either one may be performed.
Step 208: and playing the playing data of the target video frame locally in the equipment.
Step 210: during the process of playing the playing data of the target video frame, detecting an editing operation for a target barrage, wherein the target barrage is any operable content; and updating the target barrage according to the editing operation to obtain the updated target barrage.
According to the method and the device, the playing end can detect the user's editing operation for the target barrage during the playing of the playing data of the target video frame, and update the target barrage according to the editing operation to obtain the updated target barrage, so that misrecognized content can be corrected based on the user's editing operation, ensuring the correctness of the enhanced text content.
Step 212: in the process of playing the playing data of the target video frame, receiving a copying operation for target operable content, wherein the target operable content is any operable content; the target operable content is then copied and stored.
It should be noted that the playing end may provide a function for copying the operable content while playing the playing data of the target video frame. The user may select any operable content and perform a copying operation while watching; the playing end then copies the target operable content and stores the copied content locally. This allows the user to record content while watching the playing data of the target video frame, improving the efficiency of recording content and the user experience.
In practical applications, steps 208-212 are operation steps that can be performed after step 206, and steps 210 and 212 are two operation modes for the operable content, which can be performed either individually or both.
According to the video processing method, the client can acquire the video to be processed first, perform character recognition on the target video frame in the acquired video to be processed, then use the recognized characters as video enhancement content to perform character enhancement processing on the target video frame and obtain the playing data of the target video frame, and then play the playing data of the target video frame. In this case, the client can perform character recognition on the target video frame in the acquired video to be processed and process the recognized characters as video enhancement content separately from the specific video content, so that the characters in the target video frame can be enhanced and displayed clearly, improving the display effect, so that remote viewers watching the video over a network can see the characters in the video clearly, improving the user experience.
Fig. 3 shows a flowchart of yet another video processing method according to an embodiment of the present application, which is applied to a server, and specifically includes the following steps:
Step 302: acquiring a historical video, taking the historical video as the video to be processed, taking each video frame in the video to be processed as a target video frame, and performing character recognition on the target video frames.
It should be noted that, for the historical video scene, the server stores each pre-generated historical video, and the historical video does not need to be transmitted in real time, so performing more data processing at the server causes no transmission delay. In this case, to avoid each playing end performing the text enhancement processing separately and wasting the playing ends' processing resources, the server can perform the text enhancement processing itself. Therefore, when the video to be processed is a historical video, performing the subsequent text enhancement processing at the server is preferred: the text display effect is enhanced while the processing resources of the playing ends are saved.
In an optional implementation manner of this embodiment, an area for performing text recognition may be preset, that is, text recognition may be performed on a target video frame in a video to be processed, and the specific implementation process may be as follows:
determining a preset area in a target video frame;
and carrying out character recognition on a preset area in the target video frame.
Because the positions of content such as PPT slides and blackboard writing in a video are relatively fixed, the region where such content is likely to appear can be determined in advance and set as the preset region. When character recognition is subsequently performed on the target video frame in the video to be processed, only the video content of the preset region needs to be recognized, so that the recognition targets the region where the PPT slides, blackboard writing and the like are located and identifies the text they contain, avoiding the erroneous recognition and enhancement of other text in the target video frame (such as subtitles) and ensuring the accuracy of the subsequent text enhancement processing.
In addition, the preset areas can be all areas of the target video frame that include text content, so that character recognition covers every text-bearing area and all of the subsequently recognized text is enhanced, avoiding missing any text in the target video frame, ensuring the coverage of character recognition and improving the user experience. Since the preset area for character recognition can be configured in advance, different character recognition scenes can be flexibly accommodated, giving the method strong adaptability.
In an optional implementation manner of this embodiment, the video to be processed is formed of continuous video frames, and when character recognition is performed it may be determined, according to a preset recognition frequency, which video frames are target video frames requiring recognition; that is, character recognition of the target video frames in the video to be processed may be implemented as follows:
determining a target video frame in the video to be processed according to a preset recognition frequency;
and performing character recognition on the target video frame.
Of course, in practical applications, besides determining the target video frames through the preset recognition frequency, the target video frames in the video to be processed can also be determined according to key frames. That is, image analysis is performed on each video frame of the video to be processed to determine whether it is a key frame; if so, the video frame is taken as a target video frame and character recognition is performed on it, and if not, it is not taken as a target video frame and no recognition is performed. A key frame may be a video frame in which content such as a PPT slide or blackboard writing changes, or a video frame whose content differs from that of the preceding video frame.
Step 304: and encoding the identified characters into the target video frame as subtitles to obtain playing data of the target video frame, and playing the playing data of the target video frame locally.
According to the method and the device, characters obtained by identifying the target video frames can be used as subtitles to be encoded into the target video frames, so that the characters in the target video frames can be displayed in a clearer and more stable mode, the display effect is improved, the characters in the video can be clearly seen by remote personnel watching the video through a network, and the user experience is improved.
Step 306: and taking the identified characters as operable contents of the target video frame, and generating playing data of the target video frame based on the operable contents and the target video frame.
According to the method and the device, the identified characters can be used as the operable content of the target video frame and are independent of the specific video content, so that the identified characters can be edited independently, a user can conveniently process the identified characters, and user experience is improved.
It should be noted that, in practical applications, step 304 and step 306 are two different ways of generating the playing data of the target video frame, and either one may be performed.
Step 308: and transmitting the playing data of the target video frame to a playing end for playing.
Step 310: and in the process of playing the playing data of the target video frame by the playing end, receiving the editing operation of the user on the basis of the playing end on the target barrage, and updating the target barrage according to the editing operation to obtain the updated target barrage.
According to the method and the device, the server can receive the user's editing operation for the target barrage during the playing of the playing data of the target video frame, and update the target barrage according to the editing operation to obtain the updated target barrage, so that misrecognized content can be corrected based on the user's editing operation, ensuring the correctness of the enhanced text content.
In practical applications, the steps 308-310 are operational steps that may be performed after the step 306.
According to the video processing method, the server can acquire the video to be processed first, perform character recognition on the target video frame in the acquired video to be processed, then use the recognized characters as video enhancement content to perform character enhancement processing on the target video frame and obtain the playing data of the target video frame, and then play the playing data of the target video frame. In this case, the server can perform character recognition on the target video frame in the acquired video to be processed and process the recognized characters as video enhancement content separately from the specific video content, so that the characters in the target video frame can be enhanced and displayed clearly, improving the display effect, so that remote viewers watching the video over a network can see the characters in the video clearly, improving the user experience.
Corresponding to the above method embodiment, the present application further provides an embodiment of a video processing apparatus, and fig. 4 shows a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:
an acquisition module 402, configured to acquire a video to be processed, and perform text recognition on a target video frame in the video to be processed;
the enhancement module 404 is configured to take the identified text as video enhancement content, perform text enhancement processing on the target video frame, and obtain play data of the target video frame;
and a playing module 406 configured to play the playing data of the target video frame.
Optionally, enhancement module 404 is further configured to:
and encoding the identified characters into the target video frame as subtitles to obtain playing data of the target video frame.
Optionally, enhancement module 404 is further configured to:
the identified characters are used as the operable content of the target video frame;
playing data of the target video frame is generated based on the operable content and the target video frame.
Optionally, the operable content is a barrage; enhancement module 404 is further configured to:
during the process of playing the playing data of the target video frame, obtaining an editing operation for a target barrage, wherein the target barrage is any operable content;
and updating the target barrage according to the editing operation to obtain the updated target barrage.
Optionally, the operable content is a barrage/subtitle; enhancement module 404 is further configured to:
in the process of playing the playing data of the target video frame, receiving a copying operation for target operable content, wherein the target operable content is any operable content;
copying the target operable content and storing it.
Optionally, the acquisition module 402 is further configured to:
receiving a video picture acquired by an acquisition end, and taking the video picture as the video to be processed; or
and acquiring a historical video, and taking the historical video as a video to be processed.
Optionally, the acquisition module 402 is further configured to:
determining a preset area in a target video frame;
and carrying out character recognition on a preset area in the target video frame.
Optionally, the acquisition module 402 is further configured to:
determining a target video frame in the video to be processed according to a preset identification frequency;
and performing character recognition on the target video frame.
Optionally, the playback module 406 is further configured to:
transmitting the playing data of the target video frame to a playing end for playing; or
playing the playing data of the target video frame locally on the device.
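Wiring the acquisition module 402, enhancement module 404 and playing module 406 together might look like the following sketch (all callables and names are placeholders assumed for illustration, not the patent's implementation):

```python
class VideoProcessingApparatus:
    """Sketch of the three-module pipeline from Fig. 4."""

    def __init__(self, recognize, enhance, play):
        self.recognize = recognize  # acquisition module: frame -> text
        self.enhance = enhance      # enhancement: (frame, text) -> play data
        self.play = play            # playing module: play data -> result

    def process(self, target_frames):
        results = []
        for frame in target_frames:
            text = self.recognize(frame)          # character recognition
            play_data = self.enhance(frame, text)  # text enhancement
            results.append(self.play(play_data))   # playing processing
        return results
```

Keeping the three stages as separate callables mirrors the module split in the apparatus: each module can run on the server or the playing end without changing the others.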
The video processing device provided by the application can acquire the video to be processed first, perform character recognition on the target video frame in the acquired video to be processed, then use the recognized characters as video enhancement content to perform character enhancement processing on the target video frame and obtain the playing data of the target video frame, and then play the playing data of the target video frame. In this case, by performing character recognition on the target video frame in the acquired video to be processed and processing the recognized characters as video enhancement content separately from the specific video content, the characters in the target video frame can be enhanced and displayed clearly, improving the display effect, so that remote viewers watching the video over a network can see the characters in the video clearly, improving the user experience.
The above is a schematic solution of a video processing apparatus of the present embodiment. It should be noted that, the technical solution of the video processing apparatus and the technical solution of the video processing method belong to the same concept, and details of the technical solution of the video processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the video processing method.
Fig. 5 illustrates a block diagram of a computing device 500, provided in accordance with an embodiment of the present application. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530 and database 550 is used to hold data.
Computing device 500 also includes access device 540, which enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 540 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 5 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
Wherein the processor 520 is configured to execute computer-executable instructions to perform the steps of any video processing method.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the video processing method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the video processing method.
An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform the steps of any video processing method.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the video processing method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the video processing method.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combined actions; however, those skilled in the art should understand that the present application is not limited by the order of actions described, as some steps may, in accordance with the present application, be performed in other orders or simultaneously. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are provided only to aid in explaining the present application. The optional embodiments do not describe all details exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations are possible in light of the teaching of this application. These embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand and utilize the invention well. This application is to be limited only by the claims, together with their full scope and equivalents.

Claims (12)

1. A video processing method, comprising:
acquiring a video to be processed, and performing character recognition on a target video frame in the video to be processed;
taking the identified characters as video enhancement content, and performing character enhancement processing on the target video frame to obtain playing data of the target video frame;
and performing playing processing on the playing data of the target video frame.
2. The video processing method according to claim 1, wherein the step of performing text enhancement processing on the target video frame using the identified text as video enhancement content to obtain play data of the target video frame includes:
and encoding the identified characters into the target video frame as subtitles to obtain playing data of the target video frame.
3. The video processing method according to claim 1, wherein the step of performing text enhancement processing on the target video frame using the identified text as video enhancement content to obtain play data of the target video frame includes:
the identified characters are used as the operable content of the target video frame;
and generating playing data of the target video frame based on the operable content and the target video frame.
4. The video processing method according to claim 3, wherein the operable content is a bullet screen; performing playing processing on the playing data of the target video frame, including:
during playing of the playing data of the target video frame, acquiring an editing operation for a target bullet screen, wherein the target bullet screen is any one of the operable content;
and updating the target bullet screen according to the editing operation to obtain an updated target bullet screen.
5. The video processing method according to claim 3, wherein the operable content is a bullet screen/subtitle; performing playing processing on the playing data of the target video frame, including:
receiving, in the process of playing the playing data of the target video frame, a copy operation for target operable content, wherein the target operable content is any one of the operable content;
and copying and storing the target operable content.
6. The method for video processing according to any one of claims 1 to 5, wherein the acquiring the video to be processed includes:
receiving a video picture acquired by an acquisition end, and taking the video picture as the video to be processed; or
acquiring a historical video, and taking the historical video as the video to be processed.
7. The method for processing video according to any one of claims 1 to 5, wherein the performing text recognition on the target video frame in the video to be processed includes:
determining a preset area in the target video frame;
and carrying out character recognition on a preset area in the target video frame.
8. The method for processing video according to any one of claims 1 to 5, wherein the performing text recognition on the target video frame in the video to be processed includes:
determining a target video frame in the video to be processed according to a preset identification frequency;
and performing character recognition on the target video frame.
9. The video processing method according to any one of claims 1 to 5, wherein the playing processing of the playing data of the target video frame includes:
transmitting the playing data of the target video frame to a playing end for playing; or
playing the playing data of the target video frame locally on the device.
10. A video processing apparatus, comprising:
the acquisition module is configured to acquire a video to be processed and perform character recognition on a target video frame in the video to be processed;
the enhancement module is configured to take the identified characters as video enhancement content and perform character enhancement processing on the target video frame to obtain playing data of the target video frame;
and the playing module is configured to play the playing data of the target video frame.
11. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer executable instructions and the processor is configured to execute the computer executable instructions to implement the steps of the video processing method of any one of claims 1 to 9.
12. A computer readable storage medium, characterized in that it stores computer executable instructions which, when executed by a processor, implement the steps of the video processing method of any one of claims 1 to 9.
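For illustration only (this sketch is not part of the claims or the original disclosure), the claimed pipeline can be modeled in Python. Everything below is a hypothetical toy: `recognize_text` stubs out a real OCR engine, frames are plain dictionaries rather than decoded video, and all names are invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class PlayData:
    """Playing data for a target video frame: the frame plus its enhancement."""
    frame_index: int
    subtitles: list = field(default_factory=list)   # claim 2: text encoded as subtitles
    operable: list = field(default_factory=list)    # claim 3: text as operable content

def recognize_text(frame, region=None):
    """Placeholder OCR. A real system would run a text-recognition engine
    over the frame pixels, optionally restricted to a preset region (claim 7)."""
    text = frame.get("text", "")
    if region is not None:                 # toy stand-in for spatial cropping
        start, end = region
        text = text[start:end]
    return text

def select_target_frames(video, recognition_frequency):
    """Claim 8: determine target frames according to a preset recognition frequency."""
    return [f for i, f in enumerate(video) if i % recognition_frequency == 0]

def enhance(frame, text):
    """Claims 2/3: build playing data from the recognized text."""
    pd = PlayData(frame_index=frame["index"])
    pd.subtitles.append(text)
    pd.operable.append({"kind": "bullet", "text": text})
    return pd

def edit_bullet(play_data, old_text, new_text):
    """Claim 4: apply an editing operation to a target bullet screen."""
    for item in play_data.operable:
        if item["text"] == old_text:
            item["text"] = new_text
    return play_data

def copy_operable(play_data, text, clipboard):
    """Claim 5: copy and store target operable content."""
    for item in play_data.operable:
        if item["text"] == text:
            clipboard.append(item["text"])
    return clipboard

# Toy "video": each frame carries an index and some burned-in text.
video = [{"index": i, "text": f"caption {i}"} for i in range(10)]
targets = select_target_frames(video, recognition_frequency=3)  # frames 0, 3, 6, 9
play = [enhance(f, recognize_text(f)) for f in targets]
edit_bullet(play[0], "caption 0", "edited caption")
clipboard = copy_operable(play[1], "caption 3", [])
```

In a real implementation, `recognize_text` would wrap an OCR library and `PlayData` would carry encoded frames; claims 2 and 3 differ only in whether the recognized text is encoded into the frame as subtitles or attached as operable (editable and copyable) content.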
CN202111267020.1A 2021-10-28 2021-10-28 Video processing method and device Pending CN116055797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111267020.1A CN116055797A (en) 2021-10-28 2021-10-28 Video processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111267020.1A CN116055797A (en) 2021-10-28 2021-10-28 Video processing method and device

Publications (1)

Publication Number Publication Date
CN116055797A true CN116055797A (en) 2023-05-02

Family

ID=86118734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111267020.1A Pending CN116055797A (en) 2021-10-28 2021-10-28 Video processing method and device

Country Status (1)

Country Link
CN (1) CN116055797A (en)

Similar Documents

Publication Publication Date Title
US11308993B2 (en) Short video synthesis method and apparatus, and device and storage medium
WO2019205872A1 (en) Video stream processing method and apparatus, computer device and storage medium
US20220188357A1 (en) Video generating method and device
US9137586B2 (en) Content creation method and media cloud server
US10897658B1 (en) Techniques for annotating media content
CN112118395B (en) Video processing method, terminal and computer readable storage medium
KR20150083355A (en) Augmented media service providing method, apparatus thereof, and system thereof
WO2017157135A1 (en) Media information processing method, media information processing device and storage medium
CN113038185B (en) Bullet screen processing method and device
CN111010598A (en) Screen capture application method and smart television
CN113225585A (en) Video definition switching method and device, electronic equipment and storage medium
CN108881119B (en) Method, device and system for video concentration
CN110415318B (en) Image processing method and device
US8896708B2 (en) Systems and methods for determining, storing, and using metadata for video media content
CN116055797A (en) Video processing method and device
CN115499677A (en) Audio and video synchronization detection method and device based on live broadcast
KR20140134126A (en) Content creation method and apparatus
KR101997909B1 (en) Program and recording medium for extracting ai image learning parameters for resolution restoration
US20210303853A1 (en) Systems and methods for automated tracking on a handheld device using a remote camera
US20210303830A1 (en) Systems and methods for automated tracking using a client device
US20150179228A1 (en) Synchronized movie summary
CN110691256B (en) Video associated information processing method and device, server and storage medium
KR20190140813A (en) Image learning system that performs resolution restoration function based on object
CN114466223B (en) Video data processing method and system for coding technology
EP4057631A1 (en) Method and apparatus for live streaming, server, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination