CN109640112B - Video processing method, device, equipment and storage medium - Google Patents

Video processing method, device, equipment and storage medium

Info

Publication number
CN109640112B
Authority
CN
China
Prior art keywords
video
information
processed
characteristic information
audio characteristic
Prior art date
Legal status
Active
Application number
CN201910037302.9A
Other languages
Chinese (zh)
Other versions
CN109640112A
Inventor
乔文彤
Current Assignee
Guangzhou Huya Information Technology Co Ltd
Original Assignee
Guangzhou Huya Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Information Technology Co Ltd
Priority to CN201910037302.9A
Publication of CN109640112A
Application granted
Publication of CN109640112B
Status: Active

Classifications

    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD] (H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION)
    • H04N 21/233 Processing of audio elementary streams
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the invention disclose a video processing method, a video processing apparatus, a device and a storage medium. The method comprises the following steps: acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises at least one of channel information, voiceprint information and system voice prompt information; determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information; and adding a video tag corresponding to the video characteristic parameters to the video to be processed. This technical scheme improves the richness of video tags and the quality of viewers' understanding of video content.

Description

Video processing method, device, equipment and storage medium
Technical Field
Embodiments of the present invention relate to video processing technologies, and in particular, to a video processing method, an apparatus, a device, and a storage medium.
Background
With the gradual development of online video and the growing richness of video content, users' requirements for the video watching experience are becoming higher and higher.
In the prior art, tag extraction for game videos mainly generates video tag content by recognizing the video picture. The resulting tags are therefore limited to content that can be displayed on screen, the tag content is too narrow, and the quality of a viewer's understanding of the game video while watching it is reduced.
Disclosure of Invention
Embodiments of the present invention provide a video processing method, apparatus, device, and storage medium, so as to improve the richness of video tags and improve the quality of understanding of video content by viewers.
In a first aspect, an embodiment of the present invention provides a video processing method, including:
acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information;
determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information;
and adding the video label corresponding to the video characteristic parameter to the video to be processed.
In a second aspect, an embodiment of the present invention further provides a video processing apparatus, where the apparatus includes:
the information acquisition module is used for acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information;
the parameter determining module is used for determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information;
and the label adding module is used for adding the video label corresponding to the video characteristic parameter to the video to be processed.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement a video processing method as in any of the embodiments of the present invention.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the video processing method according to any one of the embodiments of the present invention.
According to the embodiments of the invention, audio characteristic information in a video to be processed is obtained, the audio characteristic information comprising at least one of channel information, voiceprint information and system voice prompt information; a video characteristic parameter corresponding to the video to be processed is determined according to the audio characteristic information; and a video tag corresponding to the video characteristic parameter is added to the video to be processed. Richer video tags are thereby obtained from the audio characteristic information of the video. This solves the problems in the prior art that video tag content is provided only from the video picture, so that the tag content is too narrow and the quality of video understanding is reduced; it improves the richness of the video tags and the quality of viewers' understanding of the video content.
Drawings
Fig. 1a is a schematic flowchart of a video processing method according to an embodiment of the present invention;
FIG. 1b is a schematic diagram of a video tag display mode according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a video processing method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a video processing apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a schematic flowchart of a video processing method according to an embodiment of the present invention. The method is applicable to tagging video content and can be executed by a video processing apparatus, which may be composed of hardware and/or software and may generally be integrated in a server or in any computer device with video processing functions. The method specifically comprises the following steps:
s110, acquiring audio characteristic information in the video to be processed, wherein the audio characteristic information comprises: at least one of vocal tract information, voiceprint information, and system voice prompt information.
Because tags obtained from image recognition alone tend to be simplistic and miss content that cannot be discerned from the picture, this embodiment uses the audio characteristic information of the video for accurate recognition, so that the richness of the tags is increased across multiple dimensions during video processing and live broadcasting. A video tag may be keyword information that labels highlight content in the video.
In this embodiment, the video to be processed may be, for example, a game video, which may be a recorded video segment or a live video stream; this is not limited here. The audio characteristic information in the video to be processed may be sound data of the video, such as the sound effects of a game video, and includes at least one of channel information, voiceprint information and system voice prompt information. The channel information may be multi-dimensional sound information carried by stereo channels, the voiceprint information may include characteristics such as volume and sound-wave features, and the system voice prompt information may be, for example, a key-event prompt sound produced when a key event is triggered in the game system.
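For illustration only, the following sketch shows one way the channel and voiceprint information described above could be pulled out of a video's audio track, assuming the audio has already been demuxed to a stereo WAV file (for example with ffmpeg); the file path and the particular feature set are hypothetical and not prescribed by this disclosure.

```python
# Minimal sketch, assuming the video's audio track is available as a stereo WAV file.
import librosa
import numpy as np

def extract_audio_features(wav_path: str) -> dict:
    # Keep both channels so left/right (channel) information is preserved.
    y, sr = librosa.load(wav_path, sr=None, mono=False)  # y.shape == (2, n) for stereo
    left, right = y[0], y[1]

    # Channel information: per-channel energy, a simple cue for direction.
    left_rms = float(np.sqrt(np.mean(left ** 2)))
    right_rms = float(np.sqrt(np.mean(right ** 2)))

    # Voiceprint-style information: overall loudness plus a spectral envelope (MFCCs).
    mono = librosa.to_mono(y)
    volume = float(np.sqrt(np.mean(mono ** 2)))
    mfcc = librosa.feature.mfcc(y=mono, sr=sr, n_mfcc=13).mean(axis=1)

    return {"left_rms": left_rms, "right_rms": right_rms, "volume": volume, "mfcc": mfcc}
```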
And S120, determining a video characteristic parameter corresponding to the video to be processed according to the audio characteristic information.
In this embodiment, the video characteristic parameter may be a characteristic parameter that characterizes key-event content in the video, for example data identifying a player's highlight moments in a game video. Because some operation data cannot be displayed directly on the game video picture, the video characteristic parameters can be obtained by analyzing the audio characteristic information of the video. For example, when an enemy player in a game video starts shooting, the enemy shooting distance is not displayed on the video picture, so the distance cannot be determined directly from the picture, but it can be estimated from the loudness of the gunshot.
For example, a preset algorithm may be used to recognize the audio characteristic information extracted from the video to be processed, and the video characteristic parameters corresponding to the video to be processed are obtained from the recognition result. For instance, the audio characteristic information extracted from the video may be converted to text, and text or data matching preset keywords is screened out of the recognition result as the video characteristic parameters of the video to be processed.
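As a purely illustrative sketch of this keyword-screening idea (not part of the original disclosure), the text produced by a speech-to-text step could be matched against preset keywords roughly as follows; the keyword patterns and the example transcript are hypothetical.

```python
# Minimal sketch: screen a transcript of the audio against preset keywords
# and pull out the associated values as video characteristic parameters.
import re

PRESET_KEYWORDS = {
    "shooting_distance_m": re.compile(r"enemy.*?(\d+)\s*meters"),
    "kill_streak": re.compile(r"(\d+)\s*kill streak"),
}

def screen_feature_parameters(transcript: str) -> dict:
    params = {}
    for name, pattern in PRESET_KEYWORDS.items():
        match = pattern.search(transcript)
        if match:
            params[name] = int(match.group(1))
    return params

print(screen_feature_parameters("enemy spotted about 100 meters away"))
# {'shooting_distance_m': 100}
```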
Determining the video characteristic parameters of the video to be processed from the audio characteristic information has the advantage that more helpful tags can be abstracted comprehensively across multiple dimensions, creating tag dimensions that image recognition cannot reach, which greatly improves the quality of viewers' understanding of the video content.
And S130, adding the video label corresponding to the video characteristic parameter to the video to be processed.
In this embodiment, different video characteristic parameters may correspond to different video tags so that the video characteristic parameters can be tagged. For example, if the recognized video characteristic parameter of the video to be processed is an enemy shooting distance of 100 meters, the corresponding video tag may be "100". One video characteristic parameter may correspond to one video tag, or several video characteristic parameters may jointly correspond to one video tag; this is not limited here.
In one optional implementation, the video tag can be used as a keyword of the video to be processed and displayed below the display interface corresponding to the video, so that viewers can select videos they are interested in according to the video tags.
In another optional implementation, adding a video tag corresponding to a video feature parameter to a video to be processed includes: acquiring a video time period corresponding to the video characteristic parameters in a video to be processed; and displaying the video label corresponding to the video characteristic parameter in the video display picture corresponding to the video time period.
For example, the video playing time at which the video characteristic parameter is obtained can be recorded, and a preset playing period after that time is taken as the video time period corresponding to the video characteristic parameter in the video to be processed. The corresponding video tag is then displayed in the video display picture during that video time period, helping viewers better understand the video content.
As a practical example, in fig. 1b, when the video playing time is 3 minutes 05 seconds and the video characteristic parameter is recognized as an enemy shooting distance of 100 meters, the video tag 11 is displayed on the corresponding video display screen 1 from 3 minutes 05 seconds to 3 minutes 35 seconds of playback.
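A minimal sketch of this time-window idea follows (illustrative only); the 30-second display window mirrors the 3:05 to 3:35 example above, and the data structure and label format are hypothetical.

```python
# Minimal sketch: attach a video tag to the time window in which it should be shown.
from dataclasses import dataclass

@dataclass
class VideoTag:
    label: str
    start_s: float  # when to start displaying the tag (seconds into the video)
    end_s: float    # when to stop displaying it

def make_tag(parameter_name: str, value, detected_at_s: float,
             display_window_s: float = 30.0) -> VideoTag:
    return VideoTag(label=f"{parameter_name}: {value}",
                    start_s=detected_at_s,
                    end_s=detected_at_s + display_window_s)

tag = make_tag("enemy distance", "100 m", detected_at_s=185.0)  # 3 min 05 s
print(tag)  # VideoTag(label='enemy distance: 100 m', start_s=185.0, end_s=215.0)
```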
On the basis of the foregoing embodiment, optionally, after adding the video tag corresponding to the video feature parameter to the video to be processed, the method further includes: scoring the video to be processed according to the video tags; and recommending and displaying the video to be processed according to the score.
A specific scoring scheme may assign each video tag a score value, accumulate the score values of the tags added to the video to be processed, use the result as the video's score, and preferentially recommend and display videos with high scores. The video tags can also be divided into different types with different weights: when calculating the score, the score value of each video tag is multiplied by the weight of the type it belongs to, and the weighted values of all tags added to the video are accumulated. Both methods are applicable to this embodiment, and neither is required here.
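For illustration, both scoring schemes described above could be sketched as follows; the score values, tag types and weights are hypothetical examples, not values taken from this disclosure.

```python
# Minimal sketch: plain accumulation of per-tag scores, or accumulation weighted by tag type.
TAG_SCORES = {"kill_streak": 5, "headshot": 4, "enemy_distance": 2}
TAG_TYPES = {"kill_streak": "combat", "headshot": "combat", "enemy_distance": "info"}
TYPE_WEIGHTS = {"combat": 1.5, "info": 1.0}

def score_video(tags, weighted=False):
    total = 0.0
    for tag in tags:
        score = TAG_SCORES.get(tag, 0)
        if weighted:
            score *= TYPE_WEIGHTS.get(TAG_TYPES.get(tag, "info"), 1.0)
        total += score
    return total

videos = {"video_a": ["kill_streak", "headshot"], "video_b": ["enemy_distance"]}
# Videos with higher scores are recommended and displayed first.
ranking = sorted(videos, key=lambda v: score_video(videos[v], weighted=True), reverse=True)
print(ranking)
```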
According to the technical scheme of this embodiment, audio characteristic information in the video to be processed is obtained, the audio characteristic information comprising at least one of channel information, voiceprint information and system voice prompt information; the video characteristic parameter corresponding to the video to be processed is determined according to the audio characteristic information; and the video tag corresponding to the video characteristic parameter is added to the video to be processed. Richer video tags are obtained from the audio characteristic information of the video, which solves the prior-art problems that tag content provided only from the video picture is too narrow and reduces video understanding quality, improves the richness of the video tags, and improves the quality of viewers' understanding of the video content.
Example two
Fig. 2 is a flowchart illustrating a video processing method according to a second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment and provides a preferred video processing method. Specifically, determining the video characteristic parameters corresponding to the video to be processed according to the audio characteristic information is further refined as: inputting the audio characteristic information into a pre-trained voice recognition model to obtain the video characteristic parameters corresponding to the video to be processed.
The video processing method provided by the embodiment specifically comprises the following steps:
s210, acquiring audio characteristic information in the video to be processed, wherein the audio characteristic information comprises: at least one of vocal tract information, voiceprint information, and system voice prompt information.
S220, inputting the audio characteristic information into a pre-trained voice recognition model to obtain video characteristic parameters corresponding to the video to be processed.
In this embodiment, the audio characteristic information may first be vectorized, and the resulting feature vector is then input into the pre-trained voice recognition model. The voice recognition model recognizes the input audio characteristic information and outputs the corresponding video characteristic parameters. Specifically, the voice recognition model may be a model trained with a preset machine learning algorithm.
The voice recognition model may operate as follows: when audio characteristic information is input, the model performs voice recognition on it, analyzes the recognized features, and determines whether the input contains a corresponding characteristic parameter. If so, that characteristic parameter is output as the video characteristic parameter of the video to be processed; if not, nothing is output. For example, if a game video containing enemy gunshots is input into the voice recognition model, the model performs voice recognition and feature analysis on the video's audio characteristic information and can then output the corresponding enemy shooting distance.
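As an illustrative sketch only, the vectorize-then-recognize step could look roughly like the following, with a small recurrent network standing in for the sound recognition model; the architecture, confidence threshold and class labels are hypothetical, and the random tensor is a placeholder for a real MFCC sequence.

```python
# Minimal inference sketch: feed a frame-level feature sequence to a recurrent model
# and only report a video characteristic parameter when the model is confident.
import torch
import torch.nn as nn

class SoundRecognitionModel(nn.Module):
    def __init__(self, n_features: int = 13, n_classes: int = 4):
        super().__init__()
        self.rnn = nn.GRU(n_features, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(x)       # h: (num_layers, batch, hidden)
        return self.head(h[-1])  # class logits

model = SoundRecognitionModel()
model.eval()

features = torch.randn(1, 200, 13)  # placeholder for an MFCC sequence (batch, frames, features)
with torch.no_grad():
    probs = torch.softmax(model(features), dim=-1)
best = int(probs.argmax())
if float(probs[0, best]) > 0.8:
    print(f"video characteristic parameter class: {best}")
else:
    print("no characteristic parameter detected")  # the model outputs nothing
```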
The benefit of using the voice recognition model in this embodiment is that the accuracy and real-time performance of sound recognition can be improved, which in turn improves the accuracy of video tag addition.
Optionally, before the audio feature information is input into a pre-trained voice recognition model to obtain a video feature parameter corresponding to a video to be processed, the method further includes: acquiring an audio characteristic information sample with a target video characteristic parameter label; and training the set artificial intelligence model by using the audio characteristic information sample to obtain the voice recognition model.
The audio characteristic information samples may be extracted from live videos on a live streaming platform or downloaded from the Internet through a search engine; this is not limited here. Taking extraction from live videos on a live streaming platform as an example, several game live rooms are found on the target platform, several segments of audio signals with typical sound characteristics are extracted from those rooms, and the extracted segments are labelled with the corresponding video characteristic parameter labels to obtain the audio characteristic information samples. Specifically, the samples may be labelled by manual evaluation, that is, the audio signals with typical sound characteristics obtained from each live room are manually labelled with the corresponding video characteristic parameter labels and serve as audio characteristic information samples for the different video characteristic parameters.
In this embodiment, the artificial intelligence model may be a training model based on a machine learning algorithm, for example a Recurrent Neural Network (RNN). An RNN is an artificial neural network in which connections between nodes form directed cycles, so the internal state of the network can exhibit dynamic temporal behavior. Unlike feed-forward neural networks, an RNN can use its internal memory to process input sequences of arbitrary length, which makes it well suited to tasks such as unsegmented handwriting recognition and speech recognition. Specifically, training the artificial intelligence model is a process of adjusting the parameters of the neural network; through continuous training, optimal network parameters are obtained, and the configured artificial intelligence model with those optimal parameters is the model to be obtained. For example, after obtaining a number of audio characteristic information samples with target video characteristic parameter labels, the configured artificial intelligence model is trained with these samples, and its network parameters are adjusted continuously until the model can recognize the target video characteristic parameters from input audio characteristic information, thereby yielding the sound recognition model.
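A minimal training sketch under the same assumptions is shown below; it reuses the hypothetical SoundRecognitionModel class from the earlier sketch, and the sample tensors, labels and hyperparameters are placeholders rather than values from this disclosure.

```python
# Minimal training sketch: adjust the network parameters on labelled audio
# characteristic information samples (feature sequence, target parameter label).
import torch
import torch.nn as nn

def train(model: nn.Module, samples, epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for features, label in samples:
            optimizer.zero_grad()
            logits = model(features.unsqueeze(0))        # add a batch dimension
            loss = loss_fn(logits, torch.tensor([label]))
            loss.backward()
            optimizer.step()                             # continuously adjust the parameters
    return model

# Manually labelled clips: (MFCC sequence, target video characteristic parameter id).
samples = [(torch.randn(150, 13), 2), (torch.randn(180, 13), 0)]
trained = train(SoundRecognitionModel(), samples)        # class defined in the earlier sketch
```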
Optionally, the video to be processed comprises a shooting game video. Correspondingly, inputting the audio characteristic information into a pre-trained voice recognition model to obtain the video characteristic parameters corresponding to the video to be processed comprises: inputting the channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy in the shooting game video; or inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy in the shooting game video and the enemy's shooting distance; or inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy in the shooting game video, the enemy's shooting distance and the firearm type.
For example, in a shooting game video, the sound effects produced by gunfire can be recognized through the audio channels and the voiceprint, so specific shooting data can be obtained by inputting the channel information and/or the voiceprint information into a pre-trained sound recognition model. Specifically, inputting channel information containing the enemy's gunshot sound effect into the sound recognition model yields the enemy's direction; inputting voiceprint information containing the enemy's gunshot sound effect yields the enemy's shooting distance and/or the firearm type used by the enemy; and inputting voiceprint information containing the gunshot sound effect of the player's own side yields the firearm type used by the player's own side.
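For illustration only, the channel-based direction cue could be sketched as a simple inter-channel level comparison on a short stereo clip of the gunshot; the threshold and the three direction labels are hypothetical, and a trained model as described above would normally replace this hand-written rule.

```python
# Minimal sketch: estimate which side an enemy gunshot came from by comparing
# left/right channel energy in a short stereo clip.
import numpy as np

def estimate_direction(left: np.ndarray, right: np.ndarray, threshold_db: float = 3.0) -> str:
    left_rms = np.sqrt(np.mean(left ** 2)) + 1e-12
    right_rms = np.sqrt(np.mean(right ** 2)) + 1e-12
    diff_db = 20 * np.log10(left_rms / right_rms)  # inter-channel level difference
    if diff_db > threshold_db:
        return "enemy to the left"
    if diff_db < -threshold_db:
        return "enemy to the right"
    return "enemy roughly ahead or behind"

rng = np.random.default_rng(0)
shot = rng.normal(size=4410)                        # 0.1 s of placeholder gunshot audio
print(estimate_direction(shot * 1.0, shot * 0.3))   # louder on the left channel
```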
Optionally, the video to be processed includes a multiplayer online tactical sports game video; correspondingly, the audio characteristic information is input into a pre-trained voice recognition model, and video characteristic parameters corresponding to the video to be processed are obtained, and the method comprises the following steps: and inputting system voice prompt information in the audio characteristic information into a pre-trained voice recognition model to obtain game event keywords corresponding to the multi-player online tactical competitive game video.
For example, in a MOBA (Multiplayer Online Battle Arena) game video, the character used by a player may make a specific sound, or the system may issue a voice prompt when the player triggers a specific game event; specific player operation data can therefore be obtained by inputting the system voice prompt information into a pre-trained voice recognition model. Specifically, inputting system voice prompt information containing kill-streak voice prompts into the pre-trained voice recognition model can output keywords of the game's kill-streak events, such as the player's kill-streak count.
And S230, adding the video label corresponding to the video characteristic parameter to the video to be processed.
According to the technical scheme of this embodiment, after the audio characteristic information in the video to be processed is obtained, it is input into the pre-trained sound recognition model to obtain the video characteristic parameter corresponding to the video to be processed, and the video tag corresponding to the video characteristic parameter is added to the video to be processed. Recognizing the video sound effects with the sound recognition model yields rich video tags across more dimensions; in addition to improving the richness of the video tags and the quality of viewers' understanding of the video content, this improves the accuracy and real-time performance of sound recognition and thus the accuracy of video tag addition.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a video processing apparatus according to a third embodiment of the present invention. Referring to fig. 3, the video processing apparatus includes an information obtaining module 310, a parameter determining module 320 and a tag adding module 330, which are described in detail below.
An information obtaining module 310, configured to obtain audio characteristic information in a video to be processed, where the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information;
a parameter determining module 320, configured to determine, according to the audio feature information, a video feature parameter corresponding to the video to be processed;
a tag adding module 330, configured to add a video tag corresponding to the video feature parameter to the video to be processed.
The video processing apparatus provided by this embodiment obtains the audio characteristic information in the video to be processed, where the audio characteristic information includes at least one of channel information, voiceprint information and system voice prompt information, determines the video characteristic parameter corresponding to the video to be processed according to the audio characteristic information, and adds the video tag corresponding to the video characteristic parameter to the video to be processed. Richer video tags are obtained from the audio characteristic information of the video, which solves the prior-art problems that tag content provided only from the video picture is too narrow and reduces video understanding quality, improves the richness of the video tags, and improves the quality of viewers' understanding of the video content.
Optionally, the parameter determining module 320 may include:
and the information input submodule is used for inputting the audio characteristic information into a pre-trained voice recognition model to obtain a video characteristic parameter corresponding to the video to be processed.
Optionally, the parameter determining module 320 may further include:
the sample acquisition submodule is used for acquiring an audio characteristic information sample with a target video characteristic parameter label before the audio characteristic information is input into a pre-trained voice recognition model and the video characteristic parameter corresponding to the video to be processed is obtained;
and the model training submodule is used for training a set artificial intelligence model by using the audio characteristic information sample to obtain the voice recognition model.
Optionally, the video to be processed includes a shooting game video;
correspondingly, the information input submodule may be specifically configured to:
inputting the channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video; or
inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video and the enemy's shooting distance; or inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of the enemy corresponding to the shooting game video, the enemy's shooting distance and the firearm type.
Optionally, the video to be processed includes a multiplayer online tactical sports game video;
correspondingly, the information input submodule may be specifically configured to:
and inputting system voice prompt information in the audio characteristic information into a pre-trained voice recognition model to obtain game event keywords corresponding to the multi-player online tactical competitive game video.
Optionally, the tag adding module 330 may be specifically configured to:
acquiring a video time period corresponding to the video characteristic parameter in the video to be processed;
and displaying the video label corresponding to the video characteristic parameter in the video display picture corresponding to the video time period.
Optionally, the video processing apparatus may further include:
the video scoring module is used for scoring the video to be processed according to the video tags after the video tags corresponding to the video characteristic parameters are added to the video to be processed;
and the video recommending module is used for recommending and displaying the video to be processed according to the grade.
The above product can execute the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example four
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in fig. 4, the computer device provided in this embodiment includes a processor 41 and a memory 42. The number of processors in the computer device may be one or more; fig. 4 shows one processor 41. The processor 41 and the memory 42 may be connected by a bus or in other ways; fig. 4 shows a bus connection.
The video processing apparatus provided in the above-described embodiment is integrated into the processor 41 of the computer device in this embodiment. Further, the memory 42 in the computer device is used as a computer readable storage medium for storing one or more programs, which may be software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the video processing method in the embodiment of the present invention (for example, the modules in the video processing apparatus shown in fig. 3 include the information obtaining module 310, the parameter determining module 320, and the tag adding module 330). The processor 41 executes various functional applications of the device and data processing by executing software programs, instructions and modules stored in the memory 42, that is, implements the video processing method in the above-described method embodiment.
The memory 42 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 42 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 42 may further include memory located remotely from processor 41, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And, when one or more programs included in the above-described computer apparatus are executed by the one or more processors 41, the programs perform the following operations:
acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information; determining a video characteristic parameter corresponding to the video to be processed according to the audio characteristic information; and adding the video tag corresponding to the video characteristic parameter to the video to be processed.
EXAMPLE five
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a video processing apparatus, the computer program implements a video processing method according to an embodiment of the present invention, the method comprising: acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information; determining a video characteristic parameter corresponding to the video to be processed according to the audio characteristic information; and adding the video tag corresponding to the video characteristic parameter to the video to be processed.
Of course, the computer-readable storage medium provided in the embodiments of the present invention, when being executed, is not limited to implement the method operations described above, and may also implement the relevant operations in the video processing method provided in any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the video processing apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (7)

1. A video processing method, comprising:
acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information and voiceprint information;
determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information, wherein the video characteristic parameters are used for representing key event content in the video;
the determining the video characteristic parameter corresponding to the video to be processed according to the audio characteristic information includes: identifying audio characteristic information extracted from a video to be processed by adopting a preset algorithm, and acquiring video characteristic parameters corresponding to the video to be processed according to an identification result;
adding a video label corresponding to the video characteristic parameter to the video to be processed;
the video to be processed comprises shooting game video;
correspondingly, inputting the audio characteristic information into a pre-trained voice recognition model to obtain video characteristic parameters corresponding to the video to be processed, including:
and inputting the sound channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video.
2. The method according to claim 1, before inputting the audio feature information into a pre-trained voice recognition model to obtain video feature parameters corresponding to the video to be processed, further comprising:
acquiring an audio characteristic information sample with a target video characteristic parameter label;
and training a set artificial intelligence model by using the audio characteristic information sample to obtain the voice recognition model.
3. The method of claim 1, wherein the pending video comprises a shooting game-like video;
correspondingly, inputting the audio characteristic information into a pre-trained voice recognition model to obtain a video characteristic parameter corresponding to the video to be processed, and further comprising:
inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video and the enemy's shooting distance; or
inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of the enemy corresponding to the shooting game video, the enemy's shooting distance and the firearm type.
4. The method according to claim 1, wherein adding the video tag corresponding to the video feature parameter to the video to be processed comprises:
acquiring a video time period corresponding to the video characteristic parameter in the video to be processed;
and displaying the video label corresponding to the video characteristic parameter in the video display picture corresponding to the video time period.
5. A video processing apparatus, comprising:
the information acquisition module is used for acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information and voiceprint information;
the parameter determining module is used for determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information, wherein the video characteristic parameters are used for representing key event contents in the video;
the determining the video characteristic parameter corresponding to the video to be processed according to the audio characteristic information includes: identifying audio characteristic information extracted from a video to be processed by adopting a preset algorithm, and acquiring video characteristic parameters corresponding to the video to be processed according to an identification result;
the label adding module is used for adding the video label corresponding to the video characteristic parameter to the video to be processed;
the video processing device also comprises an information input submodule;
the video to be processed comprises shooting game video; correspondingly, the information input submodule is specifically configured to:
and inputting the sound channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video.
6. A computer device, the device comprising:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the video processing method of any one of claims 1-4.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the video processing method according to any one of claims 1 to 4.
CN201910037302.9A 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium Active CN109640112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910037302.9A CN109640112B (en) 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910037302.9A CN109640112B (en) 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109640112A CN109640112A (en) 2019-04-16
CN109640112B true CN109640112B (en) 2021-11-23

Family

ID=66061982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910037302.9A Active CN109640112B (en) 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109640112B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677722A (en) * 2019-09-29 2020-01-10 上海依图网络科技有限公司 Video processing method, and apparatus, medium, and system thereof
CN111031392A (en) * 2019-12-23 2020-04-17 广州视源电子科技股份有限公司 Media file playing method, system, device, storage medium and processor
CN111447489A (en) * 2020-04-02 2020-07-24 北京字节跳动网络技术有限公司 Video processing method and device, readable medium and electronic equipment
CN111885414B (en) * 2020-07-24 2023-03-21 腾讯科技(深圳)有限公司 Data processing method, device and equipment and readable storage medium
CN114095738A (en) 2020-07-30 2022-02-25 京东方科技集团股份有限公司 Video and live broadcast processing method, live broadcast system, electronic device, terminal and medium
CN111901668B (en) * 2020-09-07 2022-06-24 三星电子(中国)研发中心 Video playing method and device
CN113038175B (en) * 2021-02-26 2023-03-24 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978963A (en) * 2014-04-08 2015-10-14 富士通株式会社 Speech recognition apparatus, method and electronic equipment
CN107357875A (en) * 2017-07-04 2017-11-17 北京奇艺世纪科技有限公司 A kind of voice search method, device and electronic equipment
CN107483879A (en) * 2016-06-08 2017-12-15 中兴通讯股份有限公司 Video marker method, apparatus and video frequency monitoring method and system
CN107507625A (en) * 2016-06-14 2017-12-22 讯飞智元信息科技有限公司 Sound source distance determines method and device
CN107770614A (en) * 2016-08-18 2018-03-06 中国电信股份有限公司 The label producing method and device of content of multimedia
CN108563670A (en) * 2018-01-12 2018-09-21 武汉斗鱼网络科技有限公司 Video recommendation method, device, server and computer readable storage medium
CN109126132A (en) * 2018-08-02 2019-01-04 Oppo广东移动通信有限公司 Position indicating method, device, storage medium and the electronic equipment of game role

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6697564B1 (en) * 2000-03-03 2004-02-24 Siemens Corporate Research, Inc. Method and system for video browsing and editing by employing audio
JP5622744B2 (en) * 2009-11-06 2014-11-12 株式会社東芝 Voice recognition device
CN107527617A (en) * 2017-09-30 2017-12-29 上海应用技术大学 Monitoring method, apparatus and system based on voice recognition
CN108806668A (en) * 2018-06-08 2018-11-13 国家计算机网络与信息安全管理中心 A kind of audio and video various dimensions mark and model optimization method
CN108962216B (en) * 2018-06-12 2021-02-02 北京市商汤科技开发有限公司 Method, device, equipment and storage medium for processing speaking video
CN109166586B (en) * 2018-08-02 2023-07-07 平安科技(深圳)有限公司 Speaker identification method and terminal

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978963A (en) * 2014-04-08 2015-10-14 富士通株式会社 Speech recognition apparatus, method and electronic equipment
CN107483879A (en) * 2016-06-08 2017-12-15 中兴通讯股份有限公司 Video marker method, apparatus and video frequency monitoring method and system
CN107507625A (en) * 2016-06-14 2017-12-22 讯飞智元信息科技有限公司 Sound source distance determines method and device
CN107770614A (en) * 2016-08-18 2018-03-06 中国电信股份有限公司 The label producing method and device of content of multimedia
CN107357875A (en) * 2017-07-04 2017-11-17 北京奇艺世纪科技有限公司 A kind of voice search method, device and electronic equipment
CN108563670A (en) * 2018-01-12 2018-09-21 武汉斗鱼网络科技有限公司 Video recommendation method, device, server and computer readable storage medium
CN109126132A (en) * 2018-08-02 2019-01-04 Oppo广东移动通信有限公司 Position indicating method, device, storage medium and the electronic equipment of game role

Also Published As

Publication number Publication date
CN109640112A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
CN109640112B (en) Video processing method, device, equipment and storage medium
CN108769823B (en) Direct broadcasting room display methods, device, equipment
US10824874B2 (en) Method and apparatus for processing video
WO2019228302A1 (en) Live broadcast room display method, apparatus and device, and storage medium
US20170169018A1 (en) Method and Electronic Device for Recommending Media Data
CN107463698B (en) Method and device for pushing information based on artificial intelligence
CN109194978A (en) Live video clipping method, device and electronic equipment
CN110347872B (en) Video cover image extraction method and device, storage medium and electronic equipment
CN110557659B (en) Video recommendation method and device, server and storage medium
US20160317933A1 (en) Automatic game support content generation and retrieval
CN111757170B (en) Video segmentation and marking method and device
CN110267116A (en) Video generation method, device, electronic equipment and computer-readable medium
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN110347866B (en) Information processing method, information processing device, storage medium and electronic equipment
CN109618236A (en) Video comments treating method and apparatus
CN108460122B (en) Video searching method, storage medium, device and system based on deep learning
JP2018525675A (en) Method and device for generating live text broadcast content using past broadcast text
Habibian et al. Recommendations for recognizing video events by concept vocabularies
CN110072140A (en) A kind of video information reminding method, device, equipment and storage medium
CN111147871B (en) Singing recognition method and device in live broadcast room, server and storage medium
US20210394060A1 (en) Method and system for automatically generating video highlights for a video game player using artificial intelligence (ai)
CN114095742A (en) Video recommendation method and device, computer equipment and storage medium
KR102586286B1 (en) Contextual digital media processing systems and methods
CN105848737B (en) Analysis device, recording medium, and analysis method
US11410706B2 (en) Content pushing method for display device, pushing device and display device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant