CN113786605A - Video processing method, apparatus and computer readable storage medium - Google Patents
- Publication number
- CN113786605A (application CN202110971576.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- common elements
- file
- sound effect
- displayed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/30—Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers
- A63F13/35—Details of game servers
- A63F13/355—Performing operations on behalf of clients with restricted processing capabilities, e.g. servers transform changing game scene into an encoded video stream for transmitting to a mobile phone or a thin client
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/231—Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/262—Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
- H04N21/26208—Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists the scheduling operation being performed under constraints
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4662—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
- H04N21/4665—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms involving classification methods, e.g. Decision trees
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4781—Games
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/50—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
- A63F2300/53—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers details of basic data processing
- A63F2300/538—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers details of basic data processing for performing operations on behalf of the game client, e.g. rendering
Abstract
The application discloses a video processing method, a video processing apparatus, and a computer-readable storage medium. The video processing method includes the following steps: acquiring the common elements that share the same sound effect across the video frames of a video file of a target application; extracting the common elements from the video frames; generating association information that links the common elements to the video frames of the video file; and sending the common elements, the video file with the common elements extracted, and the association information to the user side, so that the user side can call the common elements according to the association information when displaying the video file. Because the common elements sharing the same sound effect are extracted from the video file, they do not need to be transmitted repeatedly with the video stream, which reduces the transmitted video volume and shortens the startup time of the cloud application.
Description
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video processing method and apparatus, and a computer-readable storage medium.
Background
Cloud games are currently very popular. They mainly rely on two approaches to video compression: reducing image quality, and real-time encoding and streaming. The former sacrifices a certain amount of image quality, while the latter gives a poor experience in latency-sensitive games (such as FPS games).
Disclosure of Invention
The embodiments of the present application aim to solve the problems that existing video compression approaches degrade image quality and prolong the startup time of cloud applications.
In order to achieve the above object, one aspect of the present application provides a video processing method applied to a server, the method including:
acquiring common elements with the same sound effect in each video frame of a video file of a target application;
extracting common elements in the video frame;
generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
and sending the common elements, the video file with the common elements extracted, and the association information to the user side, so that the user side can call the common elements according to the association information when displaying the video file.
Optionally, the step of acquiring common elements with the same sound effect in each video frame of the video file of the target application includes:
and extracting n frames of images from the video frames of the video file that share the same sound effect, and comparing the images to obtain the common elements.
Optionally, before the step of acquiring the common elements with the same sound effect in each video frame of the video file of the target application, the method includes:
acquiring the number of times each independent sound effect file in the video file of the target application is called;
and taking the video frames corresponding to the independent sound effect files whose call counts exceed a preset count threshold as the video frames with the same sound effect in the video file.
Optionally, the step of acquiring the number of times each independent sound effect file in the video file of the target application is called includes:
acquiring the audio similarity between the independent sound effect files in the video file of the target application;
treating independent sound effect files whose audio similarity exceeds a preset value as the same target sound effect file;
and acquiring the number of times the target sound effect file is called, to obtain the call counts of the independent sound effect files.
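The grouping of sound effect files by audio similarity and the subsequent call counting can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the per-file feature vectors (e.g., averaged spectra), the cosine-similarity measure, and the 0.95 threshold are all assumptions made for the example.

```python
def cosine_similarity(a, b):
    # Compare two audio feature vectors (e.g. averaged spectra).
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def group_sound_effects(features, threshold=0.95):
    # Merge sound effect files whose similarity exceeds the threshold
    # into one target sound effect group; return a group id per file.
    group_of, reps = {}, []
    for name, vec in features.items():
        for gid, rep in enumerate(reps):
            if cosine_similarity(vec, rep) > threshold:
                group_of[name] = gid
                break
        else:
            reps.append(vec)
            group_of[name] = len(reps) - 1
    return group_of

def call_counts(call_log, group_of):
    # Number of times each merged target sound effect is invoked.
    counts = {}
    for name in call_log:
        gid = group_of[name]
        counts[gid] = counts.get(gid, 0) + 1
    return counts
```

Two nearly identical sound effect files then land in the same group, and the group's total count is compared against the preset count threshold.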
In order to achieve the above object, another aspect of the present application provides a video processing method applied to a user side, the method including:
acquiring a video frame to be displayed;
calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
and adding the common elements to the video frames to be displayed, and displaying the video frames to be displayed after the common elements are added.
Optionally, before the step of calling the common element in the video frame to be displayed according to the associated information of the video frame to be displayed, the method includes:
determining, according to the association information of the video frame to be displayed, whether the common elements are pre-stored locally;
when the common elements are not pre-stored locally, sending a download request for the common elements to the server;
and downloading the common elements in the video frame to be displayed from the server according to the download request.
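The local-cache check and download fallback above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `.bin` file naming and the `download` callback that stands in for the server request are assumptions made for the example.

```python
import os

def resolve_common_element(element_id, cache_dir, download):
    # Return the local path of a common element, fetching it from the
    # server only when it is not already pre-stored on the user side.
    path = os.path.join(cache_dir, f"{element_id}.bin")
    if not os.path.exists(path):
        data = download(element_id)  # hypothetical server request
        with open(path, "wb") as f:
            f.write(data)
    return path
```

Repeated calls for the same element then hit the local cache, so each common element crosses the network at most once.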
In addition, to achieve the above object, another aspect of the present application further provides a video processing apparatus, which includes a first obtaining module, a compressing module, a processing module, and a sending module, wherein:
the first acquisition module is used for acquiring common elements with the same sound effect in each video frame of the video file of the target application;
the compression module is used for extracting common elements in the video frames;
the processing module is used for generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
the sending module is used for sending the common elements, the video files after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when the video files are displayed.
In addition, to achieve the above object, another aspect of the present application further provides a video processing apparatus, including a second obtaining module, a calling module, and a display module, where:
the second acquisition module is used for acquiring a video frame to be displayed;
the calling module is used for calling the common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
the display module is used for adding the common elements to the video frames to be displayed and displaying the video frames to be displayed after the common elements are added.
In addition, in order to achieve the above object, another aspect of the present application further provides a video processing apparatus that includes a memory, a processor, and a video processing program stored in the memory and executable on the processor, where the processor implements the steps of the video processing method described above when executing the video processing program.
In addition, to achieve the above object, another aspect of the present application further provides a computer readable storage medium having a video processing program stored thereon, where the video processing program, when executed by a processor, implements the steps of the video processing method as described above.
The application provides a video processing method that includes: acquiring the common elements that share the same sound effect across the video frames of a video file of a target application; extracting the common elements from the video frames; generating association information that links the common elements to the video frames of the video file; and sending the common elements, the video file with the common elements extracted, and the association information to the user side, so that the user side can call the common elements according to the association information when displaying the video file. Because the common elements sharing the same sound effect are extracted from the video file, they do not need to be transmitted repeatedly with the video stream, which reduces the transmitted video volume and shortens the startup time of the cloud application.
Drawings
Fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a schematic flowchart illustrating a first embodiment of a video processing method according to the present application;
FIG. 3 is a schematic flowchart of the steps performed before acquiring the common elements with the same sound effect in each video frame of the video file of the target application;
fig. 4 is a schematic flowchart of a video processing method according to a third embodiment of the present application;
FIG. 5 is a schematic diagram of a first module of the video processing apparatus of the present application;
fig. 6 is a schematic diagram of a second module of the video processing apparatus according to the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The main solution of the embodiments of the present application is as follows: acquiring the common elements that share the same sound effect across the video frames of a video file of a target application; extracting the common elements from the video frames; generating association information that links the common elements to the video frames of the video file; and sending the common elements, the video file with the common elements extracted, and the association information to the user side, so that the user side can call the common elements according to the association information when displaying the video file.
A cloud game video contains a large number of repeated elements, and existing video compression approaches cannot avoid transmitting these repeated elements many times, so the startup time of the cloud game is long and the user experience suffers. The present application addresses this by acquiring the common elements that share the same sound effect across the video frames of a video file of a target application; extracting the common elements from the video frames; generating association information that links the common elements to the video frames; and sending the common elements, the video file with the common elements extracted, and the association information to the user side, so that the user side can call the common elements according to the association information when displaying the video file. Because the common elements sharing the same sound effect are extracted from the video file, they do not need to be transmitted repeatedly with the video stream, which reduces the transmitted video volume and shortens the startup time of the cloud application.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a terminal device in a hardware operating environment according to an embodiment of the present application.
As shown in fig. 1, the terminal device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory), and may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal device configuration shown in fig. 1 does not limit the terminal device: it may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 1, a video processing program may be included in a memory 1005, which is a kind of computer-readable storage medium.
In the terminal device shown in fig. 1, the network interface 1004 is mainly used for data communication with the background server, and the user interface 1003 is mainly used for data communication with the client (user side). When the terminal is a server, the processor 1001 may be configured to call the video processing program in the memory 1005 and perform the following operations:
acquiring common elements with the same sound effect in each video frame of a video file of a target application;
extracting common elements in the video frame;
generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
and sending the common elements, the video file after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when displaying the video file.
When the terminal is a user terminal, the processor 1001 may be configured to call the video processing program in the memory 1005, and perform the following operations:
acquiring a video frame to be displayed;
calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
and adding the common elements to the video frames to be displayed, and displaying the video frames to be displayed after the common elements are added.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a video processing method according to a first embodiment of the present application.
It should be noted that although a logical order is shown in the flowchart, in some cases the steps may be performed in an order different from the one shown or described here.
The video processing method of the embodiment is applied to a server and comprises the following steps:
step S10, common elements with the same sound effect in each video frame of the video file of the target application are obtained;
it should be noted that the target application of the present application may be a cloud application or a common application, where the cloud application refers to an application in which a terminal and a service (cloud) end interact with each other, the terminal operates in a synchronous cloud, and occupies a local space and also reserves terminal data through cloud backup. The application explains the analysis of the video processing method by cloud game application.
In a game, each specific sound effect is bound one-to-one to picture elements, so identical picture elements can be found simply by finding the parts with identical sound. In one embodiment, within the same game, videos at the same or different level progress are selected according to the players' game progress. Video clips (i.e., sequences of video frames) with the same sound effect are located in these videos, n key frames (n = 1, 2, 3, …) are extracted from each clip, and image differentiation comparison is performed to obtain the differentiation elements in the clips. For example, the Structural Similarity Index (SSIM) or other image comparison methods can be used to compare each image frame with the other image frames; the differentiation elements are then subtracted from the clips, leaving the common elements (i.e., the repeated elements).
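The subtraction of differentiation elements described above can be sketched as follows. This is not code from the patent: it is a minimal illustration that treats frames as 2-D pixel grids and keeps only the pixels identical across all sampled key frames, standing in for the SSIM-based comparison the description mentions.

```python
def common_pixel_mask(frames):
    # frames: equally sized 2-D pixel grids sampled from clips that
    # share the same sound effect. A pixel belongs to the common
    # elements when it is identical in every sampled key frame;
    # differing pixels are the differentiation elements and are dropped.
    first = frames[0]
    h, w = len(first), len(first[0])
    return [[all(f[y][x] == first[y][x] for f in frames[1:])
             for x in range(w)] for y in range(h)]
```

A real system would compare local windows (as SSIM does) rather than single pixels, but the intersection-across-frames idea is the same.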
Further, the obtained common elements are input into an object recognition model for classification and recognition training so that a continuous action can be recognized. For example, the punch of a game character is a continuous action: the fist is thrown out first and then retracted. The object recognition model can recognize all the actions of a common element during the period in which its sound effect plays. In one embodiment, because the sound within a segment is continuous and the motion is continuous, different videos are aligned by their sound; since sound and picture are bound, aligning the sound also aligns the pictures. The elements in the aligned frames are then detected with an edge detection method: first, two clearly different frames are chosen (picture similarity comparison can be used to find two video frames with a large difference), and the objects in the two frames are cross-compared by image comparison to find objects with high similarity (a threshold can be set, for example 100%). Each object with high similarity is then compared against the elements of the next frame, and this is repeated until, after n consecutive frames, the number of high-similarity objects no longer decreases; those objects are the common elements. In this way, object recognition defines how a common element changes over the period in which its sound effect plays.
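The iterate-until-the-count-stops-shrinking loop above can be sketched as follows. This is an illustrative sketch only: the object "fingerprints" are assumed to come from a hypothetical upstream edge-detection and image-comparison step, and the stability criterion of three unchanged frames is an example value.

```python
def stable_common_objects(frame_objects, n_stable=3):
    # frame_objects: per-frame sets of object fingerprints produced by
    # edge detection and cross-comparison (hypothetical upstream step).
    # Objects that survive intersection across n_stable consecutive
    # frames without the count shrinking are taken as common elements.
    candidates = set(frame_objects[0])
    unchanged = 0
    for objs in frame_objects[1:]:
        new = candidates & set(objs)
        unchanged = unchanged + 1 if len(new) == len(candidates) else 0
        candidates = new
        if unchanged >= n_stable:
            break
    return candidates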
Step S20, extracting common elements in the video frame;
after the common elements of the video frames are obtained, the common elements need to be extracted from the video frames, that is, the common elements need to be extracted from the video frames, so that the common elements do not need to be transmitted when the video streams are transmitted, and the volume of the transmitted video is saved. In order to increase the transmission efficiency of the video stream, the extracted common elements need to be compressed, wherein the common elements can be divided into different compression types. For example, since different levels are provided in a game, the elements of each level are mostly different, but some elements are across all levels, such as UI parts (scores, names, head portraits, etc.), so for common elements across all levels, the common elements can be classified as separately compressed during compression, so that the common elements are valid in all game levels. While for the generic elements specific to the customs, another type of compression may be employed for compression. In one embodiment, the compression type of the common element may be determined by obtaining the number of times the common element appears in the image frames, for example, determining whether the number of image frames corresponding to the extracted common element is greater than N, and if so, indicating that the common element appears in N video segments, and at this time, considering the common element as a common element across all the level, and compressing such common element according to a single compression type. After obtaining a plurality of separately compressed classifications, the separately compressed classifications need to be deduplicated to obtain non-duplicated portions. 
The common element compression packages are then stored in a common element compression library. Compression libraries corresponding to different game types are built on the server, and when a user plays a game, the corresponding common element compression package can be loaded from the library. When the cloud game video is encoded, the object search capability is used to find the key frames that contain content from the common element compression library; those common elements are given a specific mark that prompts the user side to download them, and the server records which common elements have been downloaded.
Step S30, generating the association information of the common elements and the video frames of the video files according to the common elements in the video frames;
In this embodiment, when the common elements are extracted, the progress sequence of the game is obtained, the common elements are numbered according to that progress sequence, and the game progress at which each common element first appears is recorded; recording the appearances in order makes the common elements easy to call later. When a common element is numbered, its position in the image frame, such as its coordinates, is also recorded, so that the user side can call the common element in a video frame quickly and conveniently from its number and coordinates.
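One possible shape for the association information described above is sketched below. The record and field names (`element_id`, `first_progress`, `frame_positions`) are illustrative choices, not terms from the claims.

```python
from dataclasses import dataclass

@dataclass
class ElementAssociation:
    # One record linking a common element to the frames that reference
    # it; field names are illustrative, not taken from the claims.
    element_id: int          # number assigned in game-progress order
    first_progress: str      # game progress at first appearance
    frame_positions: dict    # frame index -> (x, y) coordinates

def build_association(element_id, first_progress, occurrences):
    # occurrences: iterable of (frame_index, x, y) tuples recorded
    # while the common element is being extracted.
    positions = {f: (x, y) for f, x, y in occurrences}
    return ElementAssociation(element_id, first_progress, positions)
```

The user side can then resolve "element number + coordinates" lookups directly from `frame_positions` without scanning the video.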
Step S40, the common elements, the video file after the common elements are extracted, and the associated information are sent to a user side, so that the user side can call the common elements according to the associated information when displaying the video file.
The server sends the common elements, the video file after the common elements are extracted, and the associated information to the user side wirelessly, so that the user side calls the common elements according to the associated information when displaying the video file; for example, the user side calls the common elements required by the video file according to the number + coordinates of each common element.
In the embodiment, common elements with the same sound effect in each video frame of a video file of a target application are obtained; extracting common elements in the video frame; generating association information of the common elements and the video frames of the video file according to the common elements in the video frames; and sending the common elements, the video file after extracting the common elements and the associated information to the user side so that the user side can call the common elements according to the associated information when displaying the video file. Therefore, by extracting the common elements with the same sound effect in the video file, the common elements do not need to be repeatedly transmitted when the video stream is transmitted, so that the transmitted video volume is saved, and the starting time of the cloud application is shortened.
Further, referring to fig. 3, a second embodiment of the video processing method of the present application is proposed.
The video processing method is applied to a server, and the difference between the second embodiment of the video processing method and the first embodiment is that the step of acquiring common elements with the same sound effect in each video frame of a video file of a target application is preceded by the following steps:
step S11, obtaining the calling times of each independent sound effect file in the video file of the target application;
and step S12, taking the video frame corresponding to the independent sound effect file with the calling times larger than the preset time threshold value as the video frame with the same sound effect in the video file.
In this embodiment, deep learning is used to perform sound effect separation on the audio in the video file to obtain each independent sound effect file. The separated independent sound effect files may not be perfectly restored, the most notable symptom being inconsistent audio fingerprints. A voice similarity score can therefore be used to merge independent sound effect files with high similarity: for example, audio features are extracted with MFCC + GMM (voiceprint extraction and recognition), and a classification model such as an SVM (Support Vector Machine) classifies or scores the voiceprints, so that independent sound effect files with similar scores, i.e., high voice similarity, are obtained. Alternatively, the similarity of the feature vectors of two segments of audio can be compared directly to find independent sound effect files with high similarity. All separated independent sound effect files are then numbered, with highly similar files merged under the same number. For example, if the similarity between independent sound effect file A and independent sound effect file B is greater than 95%, files A and B are given the same number.
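A sketch of the direct feature-vector comparison mentioned above: cosine similarity between per-file feature vectors (e.g. averaged MFCCs), with files above a 0.95 similarity merged under one number. The feature extraction itself is out of scope here; the vectors are assumed to be precomputed, and the function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def number_sound_files(features, threshold=0.95):
    """features: {file_id: feature_vector}. Highly similar files share a number."""
    numbers, representatives = {}, []   # representatives: (number, vector)
    for fid, vec in features.items():
        for num, rep in representatives:
            if cosine_similarity(vec, rep) > threshold:
                numbers[fid] = num      # merge under the existing number
                break
        else:
            numbers[fid] = len(representatives)
            representatives.append((len(representatives), vec))
    return numbers
```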
During the game, the separated independent sound effect files can be detected in real time using audio fingerprints. Whenever an independent sound effect file is detected as being called, the count for its number is incremented by 1, indicating one call of that file. The number of calls of each independent sound effect file can thus be obtained from the count of its number. The video frames corresponding to independent sound effect files whose call count is greater than a preset threshold are then taken as the video frames with the same sound effect in the video file, filtering out independent sound effect files that are rarely called.
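The counting and filtering step above can be sketched as follows: each audio-fingerprint hit reports a (sound_number, frame_index) pair, and frames whose sound is called more often than the preset threshold are kept as "same sound effect" frames. The data shape is an illustrative assumption.

```python
from collections import Counter

def frames_with_repeated_sound(detections, threshold):
    """detections: list of (sound_number, frame_index) fingerprint hits.
    Returns the frames whose sound's call count exceeds the threshold."""
    counts = Counter(num for num, _ in detections)   # calls per sound number
    return sorted({frame for num, frame in detections if counts[num] > threshold})
```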
Because sounds and picture elements in a game correspond one to one, finding heavily repeated audio also reveals a large number of repeatedly compressible picture elements in the video. An independent sound effect file with few calls indicates that its picture element appears few times, so its compression value is small; it will be appreciated that the more frequently a sound occurs, the greater the value of compressing its picture element.
By counting the calls of the independent sound effect files through their numbers, and taking the video frames whose call counts exceed the preset threshold as the video frames with the same sound effect in the game video, the video frames with the same sound effect in the game video can be obtained quickly and accurately.
Further, there may be various types of sounds in a game, such as action sound effects, special-effect sound effects, event sound effects, map sound effects (including BGM), voice, and so on. These sounds are bound one to one with certain elements in the picture (such as characters and animals), and when a sound repeats, the corresponding picture elements also repeat. These sound effects may also be triggered at the same time and output as a mixed sound, so the mixed sound needs to be separated to obtain each independent sound effect file.
In one embodiment, audio track separation may be performed using deep learning; the following takes Spleeter (an audio source separation tool) as an example.
Training the model based on the Unet neural network:
a. During training, a group of audio data (such as the mixed sound, map sound, action sound, special-effect sound and event sound) is aligned on the time axis, the spectrum of each signal is extracted, and the magnitude spectrum is then calculated from the spectrum.
b. The magnitude spectrum of the mixed sound is input into the map sound U-Net, action sound U-Net, special-effect sound U-Net and event sound U-Net respectively to obtain predicted magnitude spectra of the map sound, action sound, special-effect sound and event sound. The distance between each prediction and its standard result is computed and averaged as the loss function, and the internal parameters of the four U-Nets are continuously updated as data is input.
c. After the magnitude spectra of the map sound, action sound, special-effect sound and event sound are predicted, Spleeter squares each of them to obtain the map sound energy m_eng, action sound energy v_eng, special-effect sound energy s_eng and event sound energy e_eng. The proportion of the map sound in each frequency band of the mixed sound at each time is then calculated as m_mask = m_eng / (m_eng + v_eng + s_eng + e_eng), and the masks v_mask, s_mask and e_mask of the action sound, special-effect sound and event sound are calculated in the same way. Finally, the input mixed sound spectrum is multiplied by m_mask, v_mask, s_mask and e_mask respectively to obtain the map sound, action sound, special-effect sound and event sound spectra, and the inverse STFT (Short-Time Fourier Transform) is applied to obtain the corresponding audio, i.e., each independent sound effect file.
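A sketch of step c: square each predicted magnitude spectrum to get energies, form energy-ratio masks, and apply a mask to the mixed spectrum. Spectra are flattened to simple lists of magnitudes for illustration; the U-Net prediction and inverse STFT are omitted, and `eps` is an added guard against division by zero (an assumption, not in the source).

```python
def energy_masks(m_mag, v_mag, s_mag, e_mag, eps=1e-12):
    """Return (m_mask, v_mask, s_mask, e_mask) energy-ratio masks per bin."""
    masks = ([], [], [], [])
    for m, v, s, e in zip(m_mag, v_mag, s_mag, e_mag):
        m2, v2, s2, e2 = m * m, v * v, s * s, e * e   # magnitudes -> energies
        total = m2 + v2 + s2 + e2 + eps
        for mask, eng in zip(masks, (m2, v2, s2, e2)):
            mask.append(eng / total)                   # ratio of this source
    return masks

def apply_mask(mix_spec, mask):
    """Multiply the mixed spectrum by a mask to isolate one source."""
    return [x * w for x, w in zip(mix_spec, mask)]
```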
The independent sound effect files obtained by audio track separation are not 100% restored and contain some differences, so the sound effects can be normalized by a voice similarity algorithm. For example, when a sound in the cloud game video is played, the played sound effect is matched against the extracted sound effect using voice similarity calculation; when the matching degree is greater than n, the two are considered the same sound, which verifies whether the extracted independent sound effect file is correct.
In one embodiment, since most games are developed with a common game engine, and the sound effect and music files packaged by the game engine are usually stored under conventional paths, each independent sound effect file can be extracted directly from the game files. Under the game directory, the packaged audio resources are usually encrypted, but these files can be unpacked with specific tools to obtain each independent sound effect file in the game. For example, audio files such as the game's music and sound effects are located in the installation directory of the game application, and conventional audio files can be found by searching file extensions. For some games, the audio files are packaged into other formats by the engine, so their directories can be located by folder-naming rules. Taking an Android game as an example, the extraction proceeds as follows:
a. Change the file suffix to zip or rar and decompress it to obtain the folder.
b. In its directory, find subdirectories whose names contain audio, music, sound, BGM, etc. Inspect the header string of a file with a text editor (such as Notepad) to determine the packaging format; for example, a header of UnityFS indicates that the file is packaged by Unity. Besides UnityFS, RIFF, BKHD, etc. are also common package headers.
c. Acquire the audio data: a packaged audio file cannot be played directly by an audio player, so the audio data must be extracted. For example, a Unity file can be extracted with a debugging tool of the Unity Studio class. For other file types, the portion before the FSB5 marker can be deleted in a hexadecimal editor (such as UltraEdit), the result saved as an FSB file, and the FSB file converted to the ogg audio format with a third-party tool, yielding each independent sound effect file in the game.
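The header check in step b can be sketched as a match of the file's first bytes against known package signatures. Mapping BKHD to a Wwise soundbank and FSB5 to an FMOD soundbank is the editor's assumption; the source only lists the header names.

```python
KNOWN_HEADERS = {
    b"UnityFS": "unity bundle",
    b"RIFF": "riff container",
    b"BKHD": "wwise soundbank",   # assumption: BKHD is the Wwise bank magic
    b"FSB5": "fmod soundbank",    # assumption: FSB5 is the FMOD bank magic
}

def detect_package_format(header_bytes):
    """Match the first bytes of a packaged resource against known signatures."""
    for magic, fmt in KNOWN_HEADERS.items():
        if header_bytes.startswith(magic):
            return fmt
    return "unknown"
```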
According to the embodiment, the audio track in the video file is separated in a deep learning mode to obtain each independent sound effect file, so that the independent sound effect files in the video file can be accurately extracted.
Further, referring to fig. 4, a third embodiment of the video processing method of the present application is proposed.
The video processing method is applied to a user side, and comprises the following steps:
step S50, acquiring a video frame to be displayed;
step S60, calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
step S70, adding the generic element to the video frame to be displayed, and displaying the video frame to be displayed after the generic element is added.
In this embodiment, the current game type is acquired, and the video frame to be displayed is acquired according to the game type, where different game types and their corresponding video frames to be displayed share the same ID, which associates the game type with the video frames. After the user enters the game, the current game progress is obtained (the progress can be confirmed by image recognition, comparison and similar means), the common element compression package is preloaded according to the game progress, and the package is decoded locally to obtain the number + coordinates of each common element. The number + coordinates in the video frame to be displayed are then acquired, and the decoded common elements are called accordingly: for example, the number + coordinates in the video frame to be displayed are matched against the number + coordinates of the decoded common elements, the successfully matched common elements are added to the video frame at their coordinates, and the resulting frame is displayed. When the video file is encoded, each encoded frame image is compared element by element with the common elements to find the repeated parts, which are then removed, replaced with a specific mark (i.e., the number), and encoded together with their coordinates.
When the number + coordinates in the video frame to be displayed are acquired, the local store is queried for a matching pre-stored common element according to the number + coordinates; if no matching common element is pre-stored locally, a download request for the common element is sent to the server, and the corresponding common element is downloaded from the server according to the request.
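The local-cache lookup above can be sketched as follows: for each (number, coordinates) pair in a frame's association info, use the locally pre-stored element if present, otherwise request it from the server through a caller-supplied download function. The download interface and data shapes are illustrative assumptions.

```python
def resolve_common_elements(frame_refs, local_cache, download):
    """frame_refs: [(number, (x, y))]. Returns [(element, (x, y))] to composite."""
    placed = []
    for number, coord in frame_refs:
        element = local_cache.get(number)
        if element is None:
            element = download(number)     # fetch from the server on cache miss
            local_cache[number] = element  # keep it for later frames
        placed.append((element, coord))
    return placed
```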
For videos related to a cloud game but not played as a cloud game, the server's common elements can be packaged and appended to the end of the video. When the video is played, the data is decompressed into memory and then called from memory. The principle is the same as above; the difference is that the former relies on interaction between the server and the local side, while the latter is handled entirely locally.
In the embodiment, the common element is called through the associated information of the video frame to be displayed, then the common element is added into the video frame to be displayed, and the video frame to be displayed after the common element is added is displayed, so that the problem that the cloud application is long in starting time due to the fact that the common element is repeatedly transmitted when the video stream is transmitted is avoided.
In addition, the present application also provides a video processing apparatus, which includes a memory, a processor, and a video processing program stored in the memory and running on the processor, wherein the processor implements the steps of the video processing method when executing the video processing program.
Referring to fig. 5, the video processing apparatus 100 includes a first obtaining module 10, a compressing module 20, a processing module 30, and a transmitting module 40, wherein:
the first obtaining module 10 is configured to obtain common elements with the same sound effect in each video frame of a video file of a target application;
the compression module 20 is configured to extract common elements in the video frames;
the processing module 30 is configured to generate association information between the common element and a video frame of the video file according to the common element in the video frame;
the sending module 40 is configured to send the common elements, the video file after the common elements are extracted, and the associated information to a user side, so that the user side calls the common elements according to the associated information when displaying the video file;
further, the first obtaining module 10 includes a first obtaining unit;
the first acquisition unit is used for extracting n frames of images from video frames of the video file with the same sound effect to perform image comparison to obtain the common elements.
Further, the first obtaining unit is further configured to obtain the number of calls of each independent sound effect file in the video file of the target application;
the first obtaining unit is further configured to use the video frame corresponding to the independent sound effect file with the calling frequency greater than a preset frequency threshold value as the video frame with the same sound effect in the video file.
Further, the first obtaining unit comprises a first obtaining subunit;
the first obtaining subunit is configured to obtain a voice similarity of each independent sound effect file in the video file of the target application;
the first obtaining subunit is further configured to use the independent sound effect file with the voice similarity greater than a preset value as the same target sound effect file;
the first obtaining subunit is further configured to obtain the calling times of each target sound effect file to obtain the calling times of each independent sound effect file.
Referring to fig. 6, the video processing apparatus 100 includes a second obtaining module 50, a calling module 60, and a display module 70, wherein:
the second obtaining module 50 is configured to obtain a video frame to be displayed;
the calling module 60 is configured to call a common element in the video frame to be displayed according to the associated information of the video frame to be displayed;
the display module 70 is configured to add the common element to the video frame to be displayed, and display the video frame to be displayed after the common element is added.
Further, the calling module 60 includes a judging unit, a sending unit and a downloading unit;
the judging unit is used for judging whether the common elements are prestored locally according to the associated information of the video frames to be displayed;
the sending unit is configured to send a download request of the common element to a server when the common element is not pre-stored locally;
and the downloading unit is used for downloading the common elements in the video frames to be displayed from the server according to the downloading request.
The implementation of the functions of each module of the video processing apparatus is similar to the process in the above method embodiment, and is not repeated here.
Furthermore, the present application also provides a computer-readable storage medium having stored thereon a video processing program which, when executed by a processor, implements the steps of the above video processing method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
While alternative embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following appended claims be interpreted as including alternative embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A video processing method applied to a server, the method comprising:
acquiring common elements with the same sound effect in each video frame of a video file of a target application;
extracting common elements in the video frame;
generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
and sending the common elements, the video file after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when displaying the video file.
2. The video processing method according to claim 1, wherein the step of obtaining common elements having the same sound effect in each video frame of the video file of the target application comprises:
and extracting n frames of images from the video frames of the video file with the same sound effect to perform image comparison to obtain the common elements.
3. The video processing method according to claim 1, wherein the step of obtaining common elements having the same sound effect in each video frame of the video file of the target application is preceded by:
acquiring the calling times of each independent sound effect file in the video file of the target application;
and taking the video frame corresponding to the independent sound effect file with the calling times larger than the preset time threshold value as the video frame with the same sound effect in the video file.
4. The video processing method according to claim 3, wherein the step of obtaining the number of calls for each independent sound effect file in the video file of the target application comprises:
acquiring voice similarity of each independent sound effect file in the video file of the target application;
taking the independent sound effect file with the voice similarity larger than a preset value as the same target sound effect file;
and acquiring the calling times of the target sound effect files to obtain the calling times of the independent sound effect files.
5. A video processing method is applied to a user side, and the method comprises the following steps:
acquiring a video frame to be displayed;
calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
and adding the common elements to the video frames to be displayed, and displaying the video frames to be displayed after the common elements are added.
6. The video processing method according to claim 5, wherein the step of calling the common element in the video frame to be displayed according to the associated information of the video frame to be displayed is preceded by:
judging whether the common elements are prestored locally according to the associated information of the video frames to be displayed;
when the common elements are not pre-stored locally, sending a downloading request of the common elements to a server;
and downloading the common elements in the video frames to be displayed from the server according to the downloading request.
7. A video processing apparatus, comprising a first obtaining module, a compressing module, a processing module, and a sending module, wherein:
the first acquisition module is used for acquiring common elements with the same sound effect in each video frame of the video file of the target application;
the compression module is used for extracting common elements in the video frames;
the processing module is used for generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
the sending module is used for sending the common elements, the video files after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when the video files are displayed.
8. A video processing apparatus, comprising a second obtaining module, a calling module, and a display module, wherein:
the second acquisition module is used for acquiring a video frame to be displayed;
the calling module is used for calling the common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
the display module is used for adding the common elements to the video frames to be displayed and displaying the video frames to be displayed after the common elements are added.
9. A video processing apparatus comprising a memory, a processor and a video processing program stored on the memory and running on the processor, the processor when executing the video processing program implementing the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a video processing program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110971576.2A CN113786605B (en) | 2021-08-23 | 2021-08-23 | Video processing method, apparatus and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113786605A true CN113786605A (en) | 2021-12-14 |
CN113786605B CN113786605B (en) | 2024-03-22 |
Family
ID=78876328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110971576.2A Active CN113786605B (en) | 2021-08-23 | 2021-08-23 | Video processing method, apparatus and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113786605B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114900715A (en) * | 2022-04-27 | 2022-08-12 | 深圳元象信息科技有限公司 | Video data processing method, terminal and storage medium |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6184182A (en) * | 1984-10-02 | 1986-04-28 | Victor Co Of Japan Ltd | Generating device for repetitive sound signal synchronized with video signal system |
CN104113768A (en) * | 2014-06-26 | 2014-10-22 | 小米科技有限责任公司 | Associated information generation method and device |
KR20180028588A (en) * | 2016-09-08 | 2018-03-19 | 주식회사 이타기술 | Method and apparatus for adaptive frame synchronizaion |
CN110113677A (en) * | 2018-02-01 | 2019-08-09 | 阿里巴巴集团控股有限公司 | The generation method and device of video subject |
CN110213610A (en) * | 2019-06-13 | 2019-09-06 | 北京奇艺世纪科技有限公司 | A kind of live scene recognition methods and device |
CN110677716A (en) * | 2019-08-20 | 2020-01-10 | 咪咕音乐有限公司 | Audio processing method, electronic device, and storage medium |
CN110740390A (en) * | 2019-11-12 | 2020-01-31 | 成都索贝数码科技股份有限公司 | video and audio credible playing method for generating associated abstract based on interframe extraction |
CN111355977A (en) * | 2020-04-16 | 2020-06-30 | 广东小天才科技有限公司 | Optimized storage method and device for live webcast video |
CN111669612A (en) * | 2019-03-08 | 2020-09-15 | 腾讯科技(深圳)有限公司 | Live broadcast-based information delivery method and device and computer-readable storage medium |
CN112272327A (en) * | 2020-10-26 | 2021-01-26 | 腾讯科技(深圳)有限公司 | Data processing method, device, storage medium and equipment |
CN112367551A (en) * | 2020-10-30 | 2021-02-12 | 维沃移动通信有限公司 | Video editing method and device, electronic equipment and readable storage medium |
CN112616061A (en) * | 2020-12-04 | 2021-04-06 | Oppo广东移动通信有限公司 | Live broadcast interaction method and device, live broadcast server and storage medium |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114900715A (en) * | 2022-04-27 | 2022-08-12 | 深圳元象信息科技有限公司 | Video data processing method, terminal and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113786605B (en) | 2024-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110246512B (en) | Sound separation method, device and computer readable storage medium | |
KR101578279B1 (en) | Methods and systems for identifying content in a data stream | |
US10178365B1 (en) | System and method for combining audio tracks with video files | |
CN109871490B (en) | Media resource matching method and device, storage medium and computer equipment | |
CN109474850B (en) | Motion pixel video special effect adding method and device, terminal equipment and storage medium | |
KR20140114238A (en) | Method for generating and displaying image coupled audio | |
WO2023197979A1 (en) | Data processing method and apparatus, and computer device and storage medium | |
CN113177538B (en) | Video cycle identification method and device, computer equipment and storage medium | |
CN111586466B (en) | Video data processing method and device and storage medium | |
CN114286171B (en) | Video processing method, device, equipment and storage medium | |
CN109286848B (en) | Terminal video information interaction method and device and storage medium | |
CN113786605B (en) | Video processing method, apparatus and computer readable storage medium | |
CN111488813B (en) | Video emotion marking method and device, electronic equipment and storage medium | |
CN113515998A (en) | Video data processing method and device and readable storage medium | |
CN109376145A (en) | The method for building up of movie dialogue database establishes device and storage medium | |
CN111008287A (en) | Audio and video processing method and device, server and storage medium | |
CN112533009B (en) | User interaction method, system, storage medium and terminal equipment | |
CN108744498B (en) | Virtual game quick starting method based on double VR equipment | |
CN113268635B (en) | Video processing method, device, server and computer readable storage medium | |
CN114554297B (en) | Page screenshot method and device, electronic equipment and storage medium | |
EP2136314A1 (en) | Method and system for generating multimedia descriptors | |
CN116543796B (en) | Audio processing method and device, computer equipment and storage medium | |
CN110880326B (en) | Voice interaction system and method | |
US20240062546A1 (en) | Information processing device, information processing method, and recording medium | |
CN108875315B (en) | Method, system, and medium for transforming fingerprints to detect unauthorized media content items |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |