CN113786605A - Video processing method, apparatus and computer readable storage medium - Google Patents

Video processing method, apparatus and computer readable storage medium

Info

Publication number
CN113786605A
Authority
CN
China
Prior art keywords
video
common elements
file
sound effect
displayed
Prior art date
Legal status
Granted
Application number
CN202110971576.2A
Other languages
Chinese (zh)
Other versions
CN113786605B (en)
Inventor
李立锋
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202110971576.2A
Publication of CN113786605A
Application granted
Publication of CN113786605B
Legal status: Active

Classifications

    • A63F13/355 Performing operations on behalf of clients with restricted processing capabilities, e.g. servers transform a changing game scene into an encoded video stream for transmitting to a mobile phone or a thin client
    • H04N21/2187 Source of audio or video content: live feed
    • H04N21/231 Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N21/233 Processing of audio elementary streams
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/26208 Content or additional data distribution scheduling performed under constraints
    • H04N21/4665 Learning process for intelligent management characterized by learning algorithms involving classification methods, e.g. decision trees
    • H04N21/4781 Supplemental services: games
    • A63F2300/538 Details of game servers: basic data processing performed on behalf of the game client, e.g. rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a video processing method, a video processing apparatus and a computer-readable storage medium. The video processing method includes: acquiring common elements that share the same sound effect across the video frames of a video file of a target application; extracting the common elements from the video frames; generating association information between the common elements and the video frames of the video file according to the common elements in the video frames; and sending the common elements, the video file with the common elements extracted, and the association information to the user side, so that the user side can call the common elements according to the association information when displaying the video file. By extracting the common elements that share the same sound effect in the video file, the common elements need not be transmitted repeatedly with the video stream, which reduces the volume of transmitted video and shortens the start-up time of the cloud application.

Description

Video processing method, apparatus and computer readable storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video processing method and apparatus, and a computer-readable storage medium.
Background
Cloud gaming is currently very popular. Cloud games mainly achieve video compression by reducing image quality or by real-time encoded streaming; the former sacrifices a certain amount of image quality, while the latter gives a poor experience in latency-sensitive games (such as FPS games).
Disclosure of Invention
The embodiments of the present application aim to solve the problems that conventional video compression methods degrade image quality and prolong the start-up time of cloud applications.
In order to achieve the above object, an aspect of the present application provides a video processing method, where the video processing method is applied to a server, and the method includes:
acquiring common elements with the same sound effect in each video frame of a video file of a target application;
extracting common elements in the video frame;
generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
and sending the common elements, the video file after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when displaying the video file.
Optionally, the step of acquiring common elements with the same sound effect in each video frame of the video file of the target application includes:
and extracting n frames of images from the video frames of the video file with the same sound effect to perform image comparison to obtain the common elements.
Optionally, before the step of obtaining common elements with the same sound effect in each video frame of the video file of the target application, the method includes:
acquiring the calling times of each independent sound effect file in the video file of the target application;
and taking the video frame corresponding to the independent sound effect file with the calling times larger than the preset time threshold value as the video frame with the same sound effect in the video file.
Optionally, the step of obtaining the number of times of calling each independent sound effect file in the video file of the target application includes:
acquiring voice similarity of each independent sound effect file in the video file of the target application;
taking the independent sound effect file with the voice similarity larger than a preset value as the same target sound effect file;
and acquiring the calling times of the target sound effect files to obtain the calling times of the independent sound effect files.
In order to achieve the above object, another aspect of the present application provides a video processing method applied to a user side, the method including:
acquiring a video frame to be displayed;
calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
and adding the common elements to the video frames to be displayed, and displaying the video frames to be displayed after the common elements are added.
Optionally, before the step of calling the common element in the video frame to be displayed according to the associated information of the video frame to be displayed, the method includes:
judging whether the common elements are prestored locally according to the associated information of the video frames to be displayed;
when the common elements are not pre-stored locally, sending a downloading request of the common elements to a server;
and downloading the common elements in the video frames to be displayed from the server according to the downloading request.
In addition, to achieve the above object, another aspect of the present application further provides a video processing apparatus, which includes a first obtaining module, a compressing module, a processing module, and a sending module, wherein:
the first acquisition module is used for acquiring common elements with the same sound effect in each video frame of the video file of the target application;
the compression module is used for extracting common elements in the video frames;
the processing module is used for generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
the sending module is used for sending the common elements, the video files after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when the video files are displayed.
In addition, to achieve the above object, another aspect of the present application further provides a video processing apparatus, including a second obtaining module, a calling module, and a display module, where:
the second acquisition module is used for acquiring a video frame to be displayed;
the calling module is used for calling the common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
the display module is used for adding the common elements to the video frames to be displayed and displaying the video frames to be displayed after the common elements are added.
In addition, in order to achieve the above object, another aspect of the present application further provides a video processing apparatus, which includes a memory, a processor, and a video processing program stored in the memory and running on the processor, wherein the processor implements the steps of the video processing method as described above when executing the video processing program.
In addition, to achieve the above object, another aspect of the present application further provides a computer readable storage medium having a video processing program stored thereon, where the video processing program, when executed by a processor, implements the steps of the video processing method as described above.
The application provides a video processing method, which comprises the steps of obtaining common elements with the same sound effect in each video frame of a video file of a target application; extracting common elements in the video frame; generating association information of the common elements and the video frames of the video file according to the common elements in the video frames; and sending the common elements, the video file after extracting the common elements and the associated information to the user side so that the user side can call the common elements according to the associated information when displaying the video file. Therefore, by extracting the common elements with the same sound effect in the video file, the common elements do not need to be repeatedly transmitted when the video stream is transmitted, so that the transmitted video volume is saved, and the starting time of the cloud application is shortened.
Drawings
Fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a schematic flowchart illustrating a first embodiment of a video processing method according to the present application;
FIG. 3 is a schematic flowchart of the video processing method of the present application, showing the steps before obtaining common elements with the same sound effect in each video frame of the video file of the target application;
fig. 4 is a schematic flowchart of a video processing method according to a third embodiment of the present application;
FIG. 5 is a schematic diagram of a first module of the video processing apparatus of the present application;
fig. 6 is a schematic diagram of a second module of the video processing apparatus according to the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The main solution of the embodiment of the application is as follows: acquiring common elements with the same sound effect in each video frame of a video file of a target application; extracting common elements in the video frame; generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames; and sending the common elements, the video file after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when displaying the video file.
A cloud game video contains a large number of repeated elements, and conventional video compression cannot avoid calling and transmitting these repeated elements over and over, so the start-up time of the cloud game is long and the user experience suffers. The present application acquires common elements that share the same sound effect across the video frames of a video file of a target application; extracts the common elements from the video frames; generates association information between the common elements and the video frames of the video file according to the common elements in the video frames; and sends the common elements, the video file with the common elements extracted, and the association information to the user side, so that the user side can call the common elements according to the association information when displaying the video file. By extracting the common elements that share the same sound effect in the video file, the common elements need not be transmitted repeatedly with the video stream, which reduces the volume of transmitted video and shortens the start-up time of the cloud application.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a terminal device in a hardware operating environment according to an embodiment of the present application.
As shown in fig. 1, the terminal device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory), and may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal device configuration shown in fig. 1 does not constitute a limitation of the terminal device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a video processing program may be included in a memory 1005, which is a kind of computer-readable storage medium.
In the terminal device shown in fig. 1, the network interface 1004 is mainly used for data communication with the background server; the user interface 1003 is mainly used for data communication with a client (user side); when the terminal is a server, the processor 1001 may be configured to call a video processing program in the memory 1005 and perform the following operations:
acquiring common elements with the same sound effect in each video frame of a video file of a target application;
extracting common elements in the video frame;
generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
and sending the common elements, the video file after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when displaying the video file.
When the terminal is a user terminal, the processor 1001 may be configured to call the video processing program in the memory 1005, and perform the following operations:
acquiring a video frame to be displayed;
calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
and adding the common elements to the video frames to be displayed, and displaying the video frames to be displayed after the common elements are added.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a video processing method according to a first embodiment of the present application.
It should be noted that although a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different from that shown or described herein.
The video processing method of the embodiment is applied to a server and comprises the following steps:
step S10, common elements with the same sound effect in each video frame of the video file of the target application are obtained;
it should be noted that the target application of the present application may be a cloud application or a common application, where the cloud application refers to an application in which a terminal and a service (cloud) end interact with each other, the terminal operates in a synchronous cloud, and occupies a local space and also reserves terminal data through cloud backup. The application explains the analysis of the video processing method by cloud game application.
In a game, a specific sound effect is bound one-to-one to a picture element, so identical picture elements can be found simply by finding the segments with the same sound. In an embodiment, for the same game, videos at the same or different level progress are selected according to the players' game progress, video clips (i.e., sequences of video frames) with the same sound effect are obtained from these videos, and n key frames (n = 1, 2, 3, ..., n) are extracted from each clip for image comparison. The comparison, for example using the Structural Similarity Index (SSIM) or another image-differencing method, compares each image frame with the other frames to locate the differentiated elements in the clip; subtracting these differentiated elements then leaves the common elements (i.e., the repeated elements).
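As a concrete illustration of this frame-comparison step, the following is a minimal Python sketch (not from the patent; it assumes 8-bit grayscale numpy frames and uses scikit-image's SSIM, with all names illustrative):

```python
import numpy as np
from skimage.metrics import structural_similarity

def common_element_mask(frames, sim_thresh=0.95):
    """Boolean mask of pixels that stay structurally similar across all
    n key frames that share the same sound effect (8-bit grayscale)."""
    base = frames[0]
    mask = np.ones(base.shape, dtype=bool)
    for frame in frames[1:]:
        # full=True yields a per-pixel SSIM map alongside the global score
        _, ssim_map = structural_similarity(base, frame,
                                            data_range=255, full=True)
        # low-similarity pixels are differentiated elements; subtract them
        mask &= ssim_map >= sim_thresh
    return mask  # True where the common (repeated) elements remain
```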
Further, the obtained common elements are input into an object recognition model for classification and recognition training so that a continuous action can be recognized; for example, the punch of a game character is a continuous action in which the fist is first thrown out and then drawn back. The object recognition model can recognize all actions of a common element during the period in which its sound effect is produced. In one embodiment, because the sound within a segment is continuous, the motion is also continuous, so different videos are aligned by their sound; since sound and picture are bound together, aligning the audio also aligns the pictures. Elements in the aligned frames are then detected by edge detection: two clearly different frames are found first (picture-similarity comparison can locate two video frames with a large difference), and the objects in them are cross-compared by image comparison to find the objects with high similarity (a threshold, e.g., 100%, can be set). Each such object is then compared with the elements of the next picture, and this is repeated until the number of high-similarity objects stops decreasing over n consecutive pictures; the surviving objects are the common elements. In this way, object recognition defines the change process of a common element over the period in which its sound effect is produced, as shown in the sketch below.
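The iterative "compare until the set stops shrinking" procedure might be sketched as follows (an assumption-laden illustration, not the patent's implementation: candidates come from Canny edges and contours, and normalized cross-correlation stands in for the image comparison):

```python
import cv2

def persistent_objects(frames, corr_thresh=0.99, patience=5, min_area=50):
    """Keep candidate objects that keep reappearing across aligned frames;
    survivors after the set stops shrinking are treated as common elements."""
    # propose candidate objects in the first frame via edge detection
    edges = cv2.Canny(frames[0], 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) > min_area]
    patches = [frames[0][y:y + h, x:x + w] for (x, y, w, h) in boxes]

    stable_for = 0
    for frame in frames[1:]:
        survivors = []
        for patch in patches:
            # normalized cross-correlation: does this patch reappear anywhere?
            res = cv2.matchTemplate(frame, patch, cv2.TM_CCOEFF_NORMED)
            if res.size and res.max() >= corr_thresh:
                survivors.append(patch)
        # count how long the candidate set has stopped shrinking
        stable_for = stable_for + 1 if len(survivors) == len(patches) else 0
        patches = survivors
        if stable_for >= patience:  # unchanged over n consecutive pictures
            break
    return patches  # the common elements' image patches
```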
Step S20, extracting common elements in the video frame;
after the common elements of the video frames are obtained, the common elements need to be extracted from the video frames, that is, the common elements need to be extracted from the video frames, so that the common elements do not need to be transmitted when the video streams are transmitted, and the volume of the transmitted video is saved. In order to increase the transmission efficiency of the video stream, the extracted common elements need to be compressed, wherein the common elements can be divided into different compression types. For example, since different levels are provided in a game, the elements of each level are mostly different, but some elements are across all levels, such as UI parts (scores, names, head portraits, etc.), so for common elements across all levels, the common elements can be classified as separately compressed during compression, so that the common elements are valid in all game levels. While for the generic elements specific to the customs, another type of compression may be employed for compression. In one embodiment, the compression type of the common element may be determined by obtaining the number of times the common element appears in the image frames, for example, determining whether the number of image frames corresponding to the extracted common element is greater than N, and if so, indicating that the common element appears in N video segments, and at this time, considering the common element as a common element across all the level, and compressing such common element according to a single compression type. After obtaining a plurality of separately compressed classifications, the separately compressed classifications need to be deduplicated to obtain non-duplicated portions. And further storing the common element compression package into a common element compression library, wherein common element compression libraries corresponding to different game types are established in the server, and when the user plays games, the corresponding common element compression package can be loaded from the common element compression libraries. When the cloud game video is coded, the object searching capability is used, the key frame is found to comprise the content of the common element compression library, the common element is set with a specific mark, the user is reminded to download the common element, and meanwhile, the server side records the downloaded common element.
Step S30, generating the association information of the common elements and the video frames of the video files according to the common elements in the video frames;
In this embodiment, when the common elements are extracted, the progress sequence of the game is obtained; the common elements are numbered in that progress order, and the game progress at each element's first appearance is recorded in sequence, which makes the elements convenient to call later. When a common element is numbered, the position information of the element within its image frame, such as its coordinates, is also recorded, so that the user side can conveniently call the common element in a video frame by its number and coordinates.
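For illustration, the association information could be recorded in a structure like the following (a hypothetical layout; the patent specifies only the number, the first-appearance progress, and the coordinates):

```python
from dataclasses import dataclass

@dataclass
class ElementAssociation:
    """Association information linking one common element to the video."""
    number: int          # assigned in game-progress order
    first_progress: str  # game progress at the element's first appearance
    frame_index: int     # image frame in which the element occurs
    x: int               # top-left coordinate of the element in the frame
    y: int

# the user side looks elements up by number and restores them at (x, y)
```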
Step S40, the commonalities, the video files after the commonalities are extracted and the associated information are sent to a user side, so that the user side can call the commonalities according to the associated information when displaying the video files.
The server sends the common elements, the video file with the common elements extracted, and the association information to the user side wirelessly, so that the user side calls the common elements according to the association information when displaying the video file, for example by looking up the required common elements through their number plus coordinates.
In this embodiment, common elements that share the same sound effect across the video frames of a video file of a target application are acquired; the common elements are extracted from the video frames; association information between the common elements and the video frames is generated; and the common elements, the video file with the common elements extracted, and the association information are sent to the user side, so that the user side can call the common elements according to the association information when displaying the video file. By extracting the common elements that share the same sound effect, the common elements need not be transmitted repeatedly with the video stream, which reduces the transmitted video volume and shortens the start-up time of the cloud application.
Further, referring to fig. 3, a second embodiment of the video processing method of the present application is proposed.
The video processing method is applied to a server, and the difference between the second embodiment of the video processing method and the first embodiment is that the step of acquiring common elements with the same sound effect in each video frame of a video file of a target application is preceded by the following steps:
step S11, obtaining the calling times of each independent sound effect file in the video file of the target application;
and step S12, taking the video frame corresponding to the independent sound effect file with the calling times larger than the preset time threshold value as the video frame with the same sound effect in the video file.
In this embodiment, deep learning is used to separate the audio of the video file into individual independent sound effect files. The separated files may not be restored perfectly, the most notable symptom being inconsistent audio fingerprints. A voice-similarity score can therefore be used to merge highly similar independent sound effect files: for example, audio features (voiceprints) are extracted with MFCC + GMM, and a classification model such as an SVM (Support Vector Machine) classifies or scores the voiceprints to find files with similar scores, i.e., high voice similarity; alternatively, the similarity of the feature vectors of two pieces of audio can be compared directly. All separated independent sound effect files are then numbered, with highly similar files merged under the same number; for example, if the similarity between independent sound effect files A and B exceeds 95%, A and B receive the same number. During play, the separated independent sound effect files are detected in real time by audio fingerprinting; whenever a file is detected as called, the count of its number is incremented by 1, indicating one call. The number of calls of each independent sound effect file is thus obtained from the counts per number, the video frames corresponding to independent sound effect files whose call count exceeds the preset threshold are taken as the video frames with the same sound effect in the video file, and rarely called independent sound files are filtered out.
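A rough sketch of the number-merging and call-counting scheme (illustrative only: cosine similarity over mean MFCC vectors stands in for the MFCC+GMM voiceprint scoring and SVM classification mentioned above):

```python
import numpy as np
import librosa

def mfcc_signature(path):
    """Mean MFCC vector as a crude stand-in for a voiceprint."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

def assign_numbers(paths, sim_thresh=0.95):
    """Merge separated sound-effect files whose similarity exceeds the
    threshold under one number; new files get the next free number."""
    numbers, sigs = {}, []
    for path in paths:
        sig = mfcc_signature(path)
        for n, ref in enumerate(sigs):
            cos = sig @ ref / (np.linalg.norm(sig) * np.linalg.norm(ref))
            if cos > sim_thresh:
                numbers[path] = n           # same number as the similar file
                break
        else:
            sigs.append(sig)
            numbers[path] = len(sigs) - 1   # new number
    return numbers

# during play: counts[numbers[detected_file]] += 1 on every detected call
```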
Because sounds and picture elements in a game correspond one-to-one, once a large amount of repeatedly called audio is found, a large number of repeatedly compressible picture elements in the video can be found as well. An independent sound effect file that is rarely called indicates that its picture element also appears rarely, so compressing it has little value; the more often a sound occurs, the greater the value of compression.
By counting the calls of the independent sound files through their numbers and taking the video clips whose call count exceeds the preset threshold as the clips with the same sound effect in the game video, the clips with the same sound effect can be acquired quickly and accurately.
Further, a game may contain several types of sound, such as action sound effects, special-effect sound effects, event sound effects, map sound effects (including BGM), and voice. These sounds are bound one-to-one to certain elements in the picture (such as characters and animals); when a sound repeats, the corresponding picture element repeats as well. These sound effects may also be triggered simultaneously and output as a mixture, so the mixture needs to be separated into the individual independent sound effect files.
In one embodiment, audio track separation can be performed with deep learning; the following takes Spleeter (an audio source separation tool) as the example.
The model is trained on the U-Net neural network:
a. During training, a group of audio tracks (e.g., the mixture, map sound, action sound, special-effect sound and event sound) is aligned on the time axis, the spectrum of each track is extracted, and the magnitude spectrum is computed from the spectrum.
b. The magnitude spectrum of the mixture is fed into the map-sound U-Net, action-sound U-Net, special-effect-sound U-Net and event-sound U-Net respectively, yielding the predicted map-sound, action-sound, special-effect-sound and event-sound magnitude spectra. The distance between each prediction and its reference is computed, and the average of these distances serves as the loss function; the internal parameters of the four U-Nets are updated continuously as data is fed in.
c. After the magnitude spectra of the map sound, action sound, special-effect sound and event sound are predicted, Spleeter squares each of them to obtain the map-sound energy m_eng, action-sound energy v_eng, special-effect-sound energy s_eng and event-sound energy e_eng. The proportion of the map sound on each frequency band at each time is then computed as m_mask = m_eng / (m_eng + v_eng + s_eng + e_eng), and the proportions v_mask, s_mask and e_mask of the action, special-effect and event sounds are computed in the same way. Finally, the input mixture spectrum is multiplied by m_mask, v_mask, s_mask and e_mask respectively to obtain the map-sound, action-sound, special-effect-sound and event-sound spectra, and the inverse STFT (short-time Fourier transform) recovers the corresponding audio, i.e., each independent sound effect file.
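The soft-mask computation in step c can be sketched as follows (assuming librosa for the STFT; `eng_specs` holds the four predicted magnitude spectra, shaped like the mixture STFT, and all names are illustrative):

```python
import numpy as np
import librosa

def mask_and_separate(mix, eng_specs, n_fft=2048):
    """Soft-mask separation as in step c: square each predicted magnitude
    spectrum into an energy, normalize into per-band masks, and apply the
    masks to the mixture STFT before the inverse transform."""
    stft = librosa.stft(mix, n_fft=n_fft)
    energies = {name: np.abs(spec) ** 2 for name, spec in eng_specs.items()}
    total = sum(energies.values()) + 1e-10   # avoid division by zero
    sources = {}
    for name, eng in energies.items():
        # e.g. m_mask = m_eng / (m_eng + v_eng + s_eng + e_eng)
        mask = eng / total
        sources[name] = librosa.istft(stft * mask, n_fft=n_fft)
    return sources  # each entry is one independent sound effect waveform
```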
The independent sound effect files obtained by audio track separation are not restored 100%; small differences remain, so the sound effects can be normalized with a speech-similarity algorithm. For example, when a sound in the cloud game video is played, the played sound effect is matched against the extracted sound effect by speech-similarity calculation; if the matching degree exceeds n, the two are the same sound, which confirms whether the extracted independent sound effect file is correct.
In one embodiment, since most games are built with a common game engine and the sound effect and music files packaged by the engine are usually stored under conventional paths, each independent sound effect file can be extracted directly from the game files. Under the game directory, the packaged audio resources are usually encrypted, but specific tools can unpack these files to obtain each independent sound effect file in the game. For example, audio files such as the game's music and sound effects reside in the installation directory of the game application, and conventional audio files can be found by searching for their extensions. In some games the audio files are packaged into other formats by the engine, in which case the directories can be searched by folder-naming rules. The extraction procedure is illustrated below with an Android game:
a. Change the file suffix to zip or rar and decompress it to obtain the folder.
b. In the resulting directory, look for subdirectories whose names contain audio, music, sound, BGM, and so on. Inspect the header of each file with a text editor (e.g., Notepad) to determine the packaging format: a header of UnityFS, for example, indicates Unity packaging; besides UnityFS, RIFF and BKHD are also common package headers.
c. Acquire the audio data: a packaged audio file cannot be played directly by an audio player, so the audio data must be extracted. For a Unity file, a debugging tool of the Unity Studio kind can perform the extraction. For other types, the part before the FSB5 header can be deleted in a hex editor (such as UltraEdit) and the remainder saved as an FSB file, which a third-party tool then converts to the ogg audio format; in this way each independent sound effect file in the game is obtained.
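A minimal Python sketch of the "delete everything before the FSB5 header" step (equivalent to the manual hex-editor operation; converting the resulting .fsb file to ogg still requires a third-party tool):

```python
def carve_fsb(packed_path, out_path):
    """Drop everything before the FSB5 magic bytes and save the
    remainder as an .fsb file."""
    with open(packed_path, "rb") as f:
        data = f.read()
    idx = data.find(b"FSB5")
    if idx == -1:
        raise ValueError("no FSB5 header found")
    with open(out_path, "wb") as f:
        f.write(data[idx:])
```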
In this embodiment, the audio track in the video file is separated by deep learning to obtain each independent sound effect file, so the independent sound effect files in the video file can be extracted accurately.
Further, referring to fig. 4, a third embodiment of the video processing method of the present application is proposed.
The video processing method is applied to a user side, and comprises the following steps:
step S50, acquiring a video frame to be displayed;
step S60, calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
step S70, adding the generic element to the video frame to be displayed, and displaying the video frame to be displayed after the generic element is added.
In this embodiment, the current game type is acquired and the video frames to be displayed are acquired according to the game type; each game type and its corresponding video frames carry the same ID, which associates the two. After a user enters a game, the current game progress is obtained (it can be confirmed by image recognition, comparison and similar means), the common-element compression package is preloaded according to that progress, and the package is decoded locally to obtain the number plus coordinates of each common element. The number plus coordinates in a video frame to be displayed are then read, and the decoded common elements are called accordingly: the number plus coordinates in the frame are matched against those of the decoded common elements, each successfully matched element is added to the frame at its coordinates, and the frame is displayed with the common elements restored. Correspondingly, when the video file is encoded, each encoded frame image is compared element-by-element with the common elements to find the repeated parts, which are then deducted, replaced by a specific mark (i.e., the number), and encoded together with their coordinates.
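The client-side restoration step (matching markers and pasting elements back at their coordinates) might look like this minimal sketch (assuming numpy image arrays and that each element fits inside the frame; names are illustrative):

```python
def render_frame(frame, markers, element_store):
    """Paste decoded common elements back into a frame to be displayed.
    `markers` are the (number, x, y) tuples encoded in place of the
    deducted repeated regions; `element_store` maps number -> image array."""
    out = frame.copy()
    for number, x, y in markers:
        elem = element_store[number]
        h, w = elem.shape[:2]
        out[y:y + h, x:x + w] = elem  # restore the repeated part in place
    return out
```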
When the number plus coordinates in a video frame to be displayed are acquired, the local store is queried by number plus coordinates for a matching pre-stored common element; if no matching element is pre-stored locally, a download request for the common element is sent to the server, and the corresponding common element is downloaded from the server according to the request.
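A sketch of the local-cache check and download request (the HTTP endpoint here is hypothetical, purely to illustrate the flow):

```python
import os
import requests  # the download endpoint below is hypothetical

def ensure_element(number, cache_dir, server_url):
    """Return a locally cached common element, downloading it on a miss."""
    path = os.path.join(cache_dir, f"{number}.bin")
    if not os.path.exists(path):                 # not pre-stored locally
        resp = requests.get(f"{server_url}/elements/{number}")  # download request
        resp.raise_for_status()
        with open(path, "wb") as f:
            f.write(resp.content)
    return path
```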
For videos that are related to a cloud game but are not the cloud game itself, the server's common elements can be packaged and appended to the end of the video. When the video is played, the data is decompressed into memory and then called from memory. The principle is the same as above; the difference is that the former relies on interaction between the server and the local device, while the latter is handled entirely locally.
In this embodiment, the common elements are called through the association information of the video frame to be displayed, added to the frame, and the frame is displayed with the common elements added, which avoids the long cloud-application start-up times caused by repeatedly transmitting common elements with the video stream.
In addition, the present application also provides a video processing apparatus, which includes a memory, a processor, and a video processing program stored in the memory and running on the processor, wherein the processor implements the steps of the video processing method when executing the video processing program.
Referring to fig. 5, the video processing apparatus 100 includes a first obtaining module 10, a compressing module 20, a processing module 30, and a transmitting module 40, wherein:
the first obtaining module 10 is configured to obtain common elements with the same sound effect in each video frame of a video file of a target application;
the compression module 20 is configured to extract common elements in the video frames;
the processing module 30 is configured to generate association information between the common element and a video frame of the video file according to the common element in the video frame;
the sending module 40 is configured to send the common elements, the video file after the common elements are extracted, and the associated information to a user side, so that the user side calls the common elements according to the associated information when displaying the video file;
further, the first obtaining module 10 includes a first obtaining unit;
the first acquisition unit is used for extracting n frames of images from video frames of the video file with the same sound effect to perform image comparison to obtain the common elements.
Further, the first obtaining unit is further configured to obtain the number of calls of each independent sound effect file in the video file of the target application;
the first obtaining unit is further configured to use the video frame corresponding to the independent sound effect file with the calling frequency greater than a preset frequency threshold value as the video frame with the same sound effect in the video file.
Further, the first obtaining unit comprises a first obtaining subunit;
the first obtaining subunit is configured to obtain a voice similarity of each independent sound effect file in the video file of the target application;
the first obtaining subunit is further configured to use the independent sound effect file with the voice similarity greater than a preset value as the same target sound effect file;
the first obtaining subunit is further configured to obtain the calling times of each target sound effect file to obtain the calling times of each independent sound effect file.
Referring to fig. 6, the video processing apparatus 100 includes a second obtaining module 50, a calling module 60, and a display module 70, wherein:
the second obtaining module 50 is configured to obtain a video frame to be displayed;
the calling module 60 is configured to call a common element in the video frame to be displayed according to the associated information of the video frame to be displayed;
the display module 70 is configured to add the common element to the video frame to be displayed, and display the video frame to be displayed after the common element is added.
Further, the calling module 60 includes a judging unit, a sending unit and a downloading unit;
the judging unit is used for judging whether the common elements are prestored locally according to the associated information of the video frames to be displayed;
the sending unit is configured to send a download request of the common element to a server when the common element is not pre-stored locally;
and the downloading unit is used for downloading the common elements in the video frames to be displayed from the server according to the downloading request.
The implementation of the functions of each module of the video processing apparatus is similar to the process in the above method embodiment, and is not repeated here.
Furthermore, the present application also provides a computer-readable storage medium having a video processing program stored thereon, which, when executed by a processor, implements the steps of the video processing method described above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In claims enumerating several units, several of these units may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While alternative embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following appended claims be interpreted as including alternative embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A video processing method applied to a server, the method comprising:
acquiring common elements with the same sound effect in each video frame of a video file of a target application;
extracting common elements in the video frame;
generating the association information of the common elements and the video frames of the video file according to the common elements in the video frames;
and sending the common elements, the video file after the common elements are extracted and the associated information to a user side so that the user side can call the common elements according to the associated information when displaying the video file.
2. The video processing method according to claim 1, wherein the step of obtaining common elements having the same sound effect in each video frame of the video file of the target application comprises:
and extracting n frames of images from the video frames of the video file with the same sound effect to perform image comparison to obtain the common elements.
3. The video processing method according to claim 1, wherein the step of obtaining common elements having the same sound effect in each video frame of the video file of the target application is preceded by:
acquiring the calling times of each independent sound effect file in the video file of the target application;
and taking the video frame corresponding to the independent sound effect file with the calling times larger than the preset time threshold value as the video frame with the same sound effect in the video file.
4. The video processing method according to claim 3, wherein the step of obtaining the number of calls for each independent sound effect file in the video file of the target application comprises:
acquiring voice similarity of each independent sound effect file in the video file of the target application;
taking the independent sound effect file with the voice similarity larger than a preset value as the same target sound effect file;
and acquiring the calling times of the target sound effect files to obtain the calling times of the independent sound effect files.
5. A video processing method is applied to a user side, and the method comprises the following steps:
acquiring a video frame to be displayed;
calling common elements in the video frame to be displayed according to the associated information of the video frame to be displayed;
and adding the common elements to the video frames to be displayed, and displaying the video frames to be displayed after the common elements are added.
6. The video processing method according to claim 5, wherein the step of calling the common element in the video frame to be displayed according to the associated information of the video frame to be displayed is preceded by:
judging whether the common elements are prestored locally according to the associated information of the video frames to be displayed;
when the common elements are not pre-stored locally, sending a downloading request of the common elements to a server;
and downloading the common elements in the video frames to be displayed from the server according to the downloading request.
7. A video processing apparatus, comprising a first acquisition module, a compression module, a processing module and a sending module, wherein:
the first acquisition module is configured to acquire common elements shared by video frames having the same sound effect in a video file of a target application;
the compression module is configured to extract the common elements from the video frames;
the processing module is configured to generate association information between the common elements and the video frames of the video file according to the common elements in the video frames;
and the sending module is configured to send the common elements, the video file from which the common elements have been extracted, and the association information to a user terminal, so that the user terminal calls the common elements according to the association information when displaying the video file.
8. A video processing apparatus, comprising a second acquisition module, a calling module and a display module, wherein:
the second acquisition module is configured to acquire a video frame to be displayed;
the calling module is configured to call the common elements of the video frame to be displayed according to the association information of the video frame to be displayed;
and the display module is configured to add the common elements to the video frame to be displayed and display the video frame to be displayed after the common elements have been added.
9. A video processing apparatus, comprising a memory, a processor, and a video processing program stored in the memory and executable on the processor, wherein the processor, when executing the video processing program, implements the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a video processing program is stored on the computer-readable storage medium, and the video processing program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202110971576.2A 2021-08-23 2021-08-23 Video processing method, apparatus and computer readable storage medium Active CN113786605B (en)

Priority Applications (1)

Application Number: CN202110971576.2A (CN113786605B)
Priority Date: 2021-08-23; Filing Date: 2021-08-23
Title: Video processing method, apparatus and computer readable storage medium

Applications Claiming Priority (1)

Application Number: CN202110971576.2A (CN113786605B)
Priority Date: 2021-08-23; Filing Date: 2021-08-23
Title: Video processing method, apparatus and computer readable storage medium

Publications (2)

Publication Number | Publication Date
CN113786605A | 2021-12-14
CN113786605B (granted) | 2024-03-22

Family

ID=78876328

Family Applications (1)

Application Number: CN202110971576.2A (CN113786605B, Active)
Title: Video processing method, apparatus and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113786605B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900715A (en) * 2022-04-27 2022-08-12 深圳元象信息科技有限公司 Video data processing method, terminal and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6184182A (en) * 1984-10-02 1986-04-28 Victor Co Of Japan Ltd Generating device for repetitive sound signal synchronized with video signal system
CN104113768A (en) * 2014-06-26 2014-10-22 小米科技有限责任公司 Associated information generation method and device
KR20180028588A (en) * 2016-09-08 2018-03-19 주식회사 이타기술 Method and apparatus for adaptive frame synchronizaion
CN110113677A (en) * 2018-02-01 2019-08-09 阿里巴巴集团控股有限公司 The generation method and device of video subject
CN110213610A (en) * 2019-06-13 2019-09-06 北京奇艺世纪科技有限公司 A kind of live scene recognition methods and device
CN110677716A (en) * 2019-08-20 2020-01-10 咪咕音乐有限公司 Audio processing method, electronic device, and storage medium
CN110740390A (en) * 2019-11-12 2020-01-31 成都索贝数码科技股份有限公司 video and audio credible playing method for generating associated abstract based on interframe extraction
CN111355977A (en) * 2020-04-16 2020-06-30 广东小天才科技有限公司 Optimized storage method and device for live webcast video
CN111669612A (en) * 2019-03-08 2020-09-15 腾讯科技(深圳)有限公司 Live broadcast-based information delivery method and device and computer-readable storage medium
CN112272327A (en) * 2020-10-26 2021-01-26 腾讯科技(深圳)有限公司 Data processing method, device, storage medium and equipment
CN112367551A (en) * 2020-10-30 2021-02-12 维沃移动通信有限公司 Video editing method and device, electronic equipment and readable storage medium
CN112616061A (en) * 2020-12-04 2021-04-06 Oppo广东移动通信有限公司 Live broadcast interaction method and device, live broadcast server and storage medium


Also Published As

Publication Number: CN113786605B; Publication Date: 2024-03-22

Similar Documents

Publication Title
CN110246512B (en) Sound separation method, device and computer readable storage medium
KR101578279B1 (en) Methods and systems for identifying content in a data stream
US10178365B1 (en) System and method for combining audio tracks with video files
CN109871490B (en) Media resource matching method and device, storage medium and computer equipment
CN109474850B (en) Motion pixel video special effect adding method and device, terminal equipment and storage medium
KR20140114238A (en) Method for generating and displaying image coupled audio
WO2023197979A1 (en) Data processing method and apparatus, and computer device and storage medium
CN113177538B (en) Video cycle identification method and device, computer equipment and storage medium
CN111586466B (en) Video data processing method and device and storage medium
CN114286171B (en) Video processing method, device, equipment and storage medium
CN109286848B (en) Terminal video information interaction method and device and storage medium
CN113786605B (en) Video processing method, apparatus and computer readable storage medium
CN111488813B (en) Video emotion marking method and device, electronic equipment and storage medium
CN113515998A (en) Video data processing method and device and readable storage medium
CN109376145A (en) The method for building up of movie dialogue database establishes device and storage medium
CN111008287A (en) Audio and video processing method and device, server and storage medium
CN112533009B (en) User interaction method, system, storage medium and terminal equipment
CN108744498B (en) Virtual game quick starting method based on double VR equipment
CN113268635B (en) Video processing method, device, server and computer readable storage medium
CN114554297B (en) Page screenshot method and device, electronic equipment and storage medium
EP2136314A1 (en) Method and system for generating multimedia descriptors
CN116543796B (en) Audio processing method and device, computer equipment and storage medium
CN110880326B (en) Voice interaction system and method
US20240062546A1 (en) Information processing device, information processing method, and recording medium
CN108875315B (en) Method, system, and medium for transforming fingerprints to detect unauthorized media content items

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant