CN109640112B - Video processing method, device, equipment and storage medium - Google Patents

Video processing method, device, equipment and storage medium

Info

Publication number
CN109640112B
Authority
CN
China
Prior art keywords
video
information
processed
characteristic information
audio characteristic
Prior art date
Legal status
Active
Application number
CN201910037302.9A
Other languages
Chinese (zh)
Other versions
CN109640112A
Inventor
乔文彤
Current Assignee
Guangzhou Huya Information Technology Co Ltd
Original Assignee
Guangzhou Huya Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Information Technology Co Ltd
Priority to CN201910037302.9A
Publication of CN109640112A
Application granted
Publication of CN109640112B
Status: Active

Classifications

    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD] (H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION)
    • H04N 21/233 Processing of audio elementary streams
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the invention disclose a video processing method, a video processing apparatus, a device and a storage medium. The method comprises the following steps: acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises at least one of channel information, voiceprint information and system voice prompt information; determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information; and adding a video tag corresponding to the video characteristic parameters to the video to be processed. This technical scheme improves the richness of video tags and the quality of viewers' understanding of video content.

Description

Video processing method, device, equipment and storage medium
Technical Field
Embodiments of the present invention relate to video processing technologies, and in particular, to a video processing method, an apparatus, a device, and a storage medium.
Background
With the gradual development of online video and the growing richness of video content, users' requirements for the video watching experience are becoming higher and higher.
In the prior art, tag extraction for game videos mainly generates video tag content by recognizing the video picture. The resulting tags are therefore limited to content that can be displayed on screen, the tag content is too narrow, and the quality of a viewer's understanding of the game video while watching it is reduced.
Disclosure of Invention
Embodiments of the present invention provide a video processing method, apparatus, device, and storage medium, so as to improve the richness of video tags and improve the quality of understanding of video content by viewers.
In a first aspect, an embodiment of the present invention provides a video processing method, including:
acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information;
determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information;
and adding the video label corresponding to the video characteristic parameter to the video to be processed.
In a second aspect, an embodiment of the present invention further provides a video processing apparatus, where the apparatus includes:
the information acquisition module is used for acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information;
the parameter determining module is used for determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information;
and the label adding module is used for adding the video label corresponding to the video characteristic parameter to the video to be processed.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement a video processing method as in any of the embodiments of the present invention.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the video processing method according to any one of the embodiments of the present invention.
According to the embodiments of the invention, audio characteristic information in a video to be processed is obtained, the audio characteristic information comprising at least one of channel information, voiceprint information and system voice prompt information; a video characteristic parameter corresponding to the video to be processed is determined according to the audio characteristic information; and a video tag corresponding to the video characteristic parameter is added to the video to be processed. Richer video tags are thereby obtained from the audio characteristic information of the video. This solves the problems in the prior art that video tag content is provided only from the video picture, so that the tag content is too narrow and the quality of video understanding is reduced; it improves the richness of the video tags and the quality of viewers' understanding of the video content.
Drawings
Fig. 1a is a schematic flowchart of a video processing method according to an embodiment of the present invention;
FIG. 1b is a schematic diagram of a video tag display mode according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a video processing method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a video processing apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a schematic flowchart of a video processing method according to an embodiment of the present invention. The method is applicable to tagging video content and can be executed by a video processing apparatus, which may be composed of hardware and/or software and may generally be integrated in a server or in any computer device with video processing functions. The method specifically comprises the following steps:
s110, acquiring audio characteristic information in the video to be processed, wherein the audio characteristic information comprises: at least one of vocal tract information, voiceprint information, and system voice prompt information.
Because tags obtained from image recognition alone tend to be simplistic and miss content that cannot be discerned from the picture, this embodiment uses the audio characteristic information of the video for accurate recognition, so that the richness of the tags is increased across multiple dimensions during video processing and live broadcasting. A video tag may be keyword information that labels highlight content in the video.
In this embodiment, the video to be processed may be, for example, a game video, which may be a recorded video segment or a live video stream; this is not limited here. The audio characteristic information in the video to be processed may be sound data of the video, such as the sound effects of a game video, and includes at least one of channel information, voiceprint information and system voice prompt information. The channel information may be multi-dimensional sound information carried by stereo channels, the voiceprint information may include characteristics such as volume and sound-wave features, and the system voice prompt information may be, for example, a key-event prompt sound produced when a key event is triggered in the game system.
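For illustration only, the following sketch shows one way the channel and voiceprint information described above could be pulled out of a video's audio track, assuming the audio has already been demuxed to a stereo WAV file (for example with ffmpeg); the file path and the particular feature set are hypothetical and not prescribed by this disclosure.

```python
# Minimal sketch, assuming the video's audio track is available as a stereo WAV file.
import librosa
import numpy as np

def extract_audio_features(wav_path: str) -> dict:
    # Keep both channels so left/right (channel) information is preserved.
    y, sr = librosa.load(wav_path, sr=None, mono=False)  # y.shape == (2, n) for stereo
    left, right = y[0], y[1]

    # Channel information: per-channel energy, a simple cue for direction.
    left_rms = float(np.sqrt(np.mean(left ** 2)))
    right_rms = float(np.sqrt(np.mean(right ** 2)))

    # Voiceprint-style information: overall loudness plus a spectral envelope (MFCCs).
    mono = librosa.to_mono(y)
    volume = float(np.sqrt(np.mean(mono ** 2)))
    mfcc = librosa.feature.mfcc(y=mono, sr=sr, n_mfcc=13).mean(axis=1)

    return {"left_rms": left_rms, "right_rms": right_rms, "volume": volume, "mfcc": mfcc}
```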
And S120, determining a video characteristic parameter corresponding to the video to be processed according to the audio characteristic information.
In this embodiment, the video characteristic parameter may be a characteristic parameter that characterizes key-event content in the video, for example data identifying a player's highlight moments in a game video. Because some operation data cannot be displayed directly on the game video picture, the video characteristic parameters can be obtained by analyzing the audio characteristic information of the video. For example, when an enemy player in a game video starts shooting, the enemy shooting distance is not displayed on the video picture, so the distance cannot be determined directly from the picture, but it can be estimated from the loudness of the gunshot.
For example, a preset algorithm may be used to recognize the audio characteristic information extracted from the video to be processed, and the video characteristic parameters corresponding to the video to be processed are obtained from the recognition result. For instance, the audio characteristic information extracted from the video may be converted to text, and text or data matching preset keywords is screened out of the recognition result as the video characteristic parameters of the video to be processed.
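As a purely illustrative sketch of this keyword-screening idea (not part of the original disclosure), the text produced by a speech-to-text step could be matched against preset keywords roughly as follows; the keyword patterns and the example transcript are hypothetical.

```python
# Minimal sketch: screen a transcript of the audio against preset keywords
# and pull out the associated values as video characteristic parameters.
import re

PRESET_KEYWORDS = {
    "shooting_distance_m": re.compile(r"enemy.*?(\d+)\s*meters"),
    "kill_streak": re.compile(r"(\d+)\s*kill streak"),
}

def screen_feature_parameters(transcript: str) -> dict:
    params = {}
    for name, pattern in PRESET_KEYWORDS.items():
        match = pattern.search(transcript)
        if match:
            params[name] = int(match.group(1))
    return params

print(screen_feature_parameters("enemy spotted about 100 meters away"))
# {'shooting_distance_m': 100}
```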
Determining the video characteristic parameters of the video to be processed from the audio characteristic information has the advantage that more helpful tags can be abstracted comprehensively across multiple dimensions, creating tag dimensions that image recognition cannot reach, which greatly improves the quality of viewers' understanding of the video content.
And S130, adding the video label corresponding to the video characteristic parameter to the video to be processed.
In this embodiment, different video characteristic parameters may correspond to different video tags so that the video characteristic parameters can be tagged. For example, if the recognized video characteristic parameter of the video to be processed is an enemy shooting distance of 100 meters, the corresponding video tag may be "100". One video characteristic parameter may correspond to one video tag, or several video characteristic parameters may jointly correspond to one video tag; this is not limited here.
In one optional implementation, the video tag can be used as a keyword of the video to be processed and displayed below the display interface corresponding to the video, so that viewers can select videos they are interested in according to the video tags.
In another optional implementation, adding a video tag corresponding to a video feature parameter to a video to be processed includes: acquiring a video time period corresponding to the video characteristic parameters in a video to be processed; and displaying the video label corresponding to the video characteristic parameter in the video display picture corresponding to the video time period.
For example, the video playing time at which the video characteristic parameter is obtained can be recorded, and a preset playing period after that time is taken as the video time period corresponding to the video characteristic parameter in the video to be processed. The corresponding video tag is then displayed in the video display picture during that video time period, helping viewers better understand the video content.
As a practical example, in fig. 1b, when the video playing time is 3 minutes 05 seconds and the video characteristic parameter is recognized as an enemy shooting distance of 100 meters, the video tag 11 is displayed on the corresponding video display screen 1 from 3 minutes 05 seconds to 3 minutes 35 seconds of playback.
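A minimal sketch of this time-window idea follows (illustrative only); the 30-second display window mirrors the 3:05 to 3:35 example above, and the data structure and label format are hypothetical.

```python
# Minimal sketch: attach a video tag to the time window in which it should be shown.
from dataclasses import dataclass

@dataclass
class VideoTag:
    label: str
    start_s: float  # when to start displaying the tag (seconds into the video)
    end_s: float    # when to stop displaying it

def make_tag(parameter_name: str, value, detected_at_s: float,
             display_window_s: float = 30.0) -> VideoTag:
    return VideoTag(label=f"{parameter_name}: {value}",
                    start_s=detected_at_s,
                    end_s=detected_at_s + display_window_s)

tag = make_tag("enemy distance", "100 m", detected_at_s=185.0)  # 3 min 05 s
print(tag)  # VideoTag(label='enemy distance: 100 m', start_s=185.0, end_s=215.0)
```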
On the basis of the foregoing embodiment, optionally, after adding the video tag corresponding to the video feature parameter to the video to be processed, the method further includes: scoring the video to be processed according to the video tags; and recommending and displaying the video to be processed according to the score.
A specific scoring scheme may assign each video tag a score value, accumulate the score values of the tags added to the video to be processed, use the result as the video's score, and preferentially recommend and display videos with high scores. The video tags can also be divided into different types with different weights: when calculating the score, the score value of each video tag is multiplied by the weight of the type it belongs to, and the weighted values of all tags added to the video are accumulated. Both methods are applicable to this embodiment, and neither is required here.
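For illustration, both scoring schemes described above could be sketched as follows; the score values, tag types and weights are hypothetical examples, not values taken from this disclosure.

```python
# Minimal sketch: plain accumulation of per-tag scores, or accumulation weighted by tag type.
TAG_SCORES = {"kill_streak": 5, "headshot": 4, "enemy_distance": 2}
TAG_TYPES = {"kill_streak": "combat", "headshot": "combat", "enemy_distance": "info"}
TYPE_WEIGHTS = {"combat": 1.5, "info": 1.0}

def score_video(tags, weighted=False):
    total = 0.0
    for tag in tags:
        score = TAG_SCORES.get(tag, 0)
        if weighted:
            score *= TYPE_WEIGHTS.get(TAG_TYPES.get(tag, "info"), 1.0)
        total += score
    return total

videos = {"video_a": ["kill_streak", "headshot"], "video_b": ["enemy_distance"]}
# Videos with higher scores are recommended and displayed first.
ranking = sorted(videos, key=lambda v: score_video(videos[v], weighted=True), reverse=True)
print(ranking)
```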
According to the technical scheme of this embodiment, audio characteristic information in the video to be processed is obtained, the audio characteristic information comprising at least one of channel information, voiceprint information and system voice prompt information; the video characteristic parameter corresponding to the video to be processed is determined according to the audio characteristic information; and the video tag corresponding to the video characteristic parameter is added to the video to be processed. Richer video tags are obtained from the audio characteristic information of the video, which solves the prior-art problems that tag content provided only from the video picture is too narrow and reduces video understanding quality, improves the richness of the video tags, and improves the quality of viewers' understanding of the video content.
Example two
Fig. 2 is a flowchart illustrating a video processing method according to a second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment and provides a preferred video processing method. Specifically, determining the video characteristic parameters corresponding to the video to be processed according to the audio characteristic information is further refined as: inputting the audio characteristic information into a pre-trained voice recognition model to obtain the video characteristic parameters corresponding to the video to be processed.
The video processing method provided by the embodiment specifically comprises the following steps:
s210, acquiring audio characteristic information in the video to be processed, wherein the audio characteristic information comprises: at least one of vocal tract information, voiceprint information, and system voice prompt information.
S220, inputting the audio characteristic information into a pre-trained voice recognition model to obtain video characteristic parameters corresponding to the video to be processed.
In this embodiment, the audio characteristic information may first be vectorized, and the resulting feature vector is then input into the pre-trained voice recognition model. The voice recognition model recognizes the input audio characteristic information and outputs the corresponding video characteristic parameters. Specifically, the voice recognition model may be a model trained with a preset machine learning algorithm.
The voice recognition model may operate as follows: when audio characteristic information is input, the model performs voice recognition on it, analyzes the recognized features, and determines whether the input contains a corresponding characteristic parameter. If so, that characteristic parameter is output as the video characteristic parameter of the video to be processed; if not, nothing is output. For example, if a game video containing enemy gunshots is input into the voice recognition model, the model performs voice recognition and feature analysis on the video's audio characteristic information and can then output the corresponding enemy shooting distance.
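As an illustrative sketch only, the vectorize-then-recognize step could look roughly like the following, with a small recurrent network standing in for the sound recognition model; the architecture, confidence threshold and class labels are hypothetical, and the random tensor is a placeholder for a real MFCC sequence.

```python
# Minimal inference sketch: feed a frame-level feature sequence to a recurrent model
# and only report a video characteristic parameter when the model is confident.
import torch
import torch.nn as nn

class SoundRecognitionModel(nn.Module):
    def __init__(self, n_features: int = 13, n_classes: int = 4):
        super().__init__()
        self.rnn = nn.GRU(n_features, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(x)       # h: (num_layers, batch, hidden)
        return self.head(h[-1])  # class logits

model = SoundRecognitionModel()
model.eval()

features = torch.randn(1, 200, 13)  # placeholder for an MFCC sequence (batch, frames, features)
with torch.no_grad():
    probs = torch.softmax(model(features), dim=-1)
best = int(probs.argmax())
if float(probs[0, best]) > 0.8:
    print(f"video characteristic parameter class: {best}")
else:
    print("no characteristic parameter detected")  # the model outputs nothing
```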
The benefit of using the voice recognition model in this embodiment is that the accuracy and real-time performance of sound recognition can be improved, which in turn improves the accuracy of video tag addition.
Optionally, before the audio feature information is input into a pre-trained voice recognition model to obtain a video feature parameter corresponding to a video to be processed, the method further includes: acquiring an audio characteristic information sample with a target video characteristic parameter label; and training the set artificial intelligence model by using the audio characteristic information sample to obtain the voice recognition model.
The audio characteristic information samples may be extracted from live videos on a live streaming platform or downloaded from the Internet through a search engine; this is not limited here. Taking extraction from live videos on a live streaming platform as an example, several game live rooms are found on the target platform, several segments of audio signals with typical sound characteristics are extracted from those rooms, and the extracted segments are labelled with the corresponding video characteristic parameter labels to obtain the audio characteristic information samples. Specifically, the samples may be labelled by manual evaluation, that is, the audio signals with typical sound characteristics obtained from each live room are manually labelled with the corresponding video characteristic parameter labels and serve as audio characteristic information samples for the different video characteristic parameters.
In this embodiment, the artificial intelligence model may be a training model based on a machine learning algorithm, for example a Recurrent Neural Network (RNN). An RNN is an artificial neural network in which connections between nodes form directed cycles, so the internal state of the network can exhibit dynamic temporal behavior. Unlike feed-forward neural networks, an RNN can use its internal memory to process input sequences of arbitrary length, which makes it well suited to tasks such as unsegmented handwriting recognition and speech recognition. Specifically, training the artificial intelligence model is a process of adjusting the parameters of the neural network; through continuous training, optimal network parameters are obtained, and the configured artificial intelligence model with those optimal parameters is the model to be obtained. For example, after obtaining a number of audio characteristic information samples with target video characteristic parameter labels, the configured artificial intelligence model is trained with these samples, and its network parameters are adjusted continuously until the model can recognize the target video characteristic parameters from input audio characteristic information, thereby yielding the sound recognition model.
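A minimal training sketch under the same assumptions is shown below; it reuses the hypothetical SoundRecognitionModel class from the earlier sketch, and the sample tensors, labels and hyperparameters are placeholders rather than values from this disclosure.

```python
# Minimal training sketch: adjust the network parameters on labelled audio
# characteristic information samples (feature sequence, target parameter label).
import torch
import torch.nn as nn

def train(model: nn.Module, samples, epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for features, label in samples:
            optimizer.zero_grad()
            logits = model(features.unsqueeze(0))        # add a batch dimension
            loss = loss_fn(logits, torch.tensor([label]))
            loss.backward()
            optimizer.step()                             # continuously adjust the parameters
    return model

# Manually labelled clips: (MFCC sequence, target video characteristic parameter id).
samples = [(torch.randn(150, 13), 2), (torch.randn(180, 13), 0)]
trained = train(SoundRecognitionModel(), samples)        # class defined in the earlier sketch
```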
Optionally, the video to be processed comprises a shooting game video. Correspondingly, inputting the audio characteristic information into a pre-trained voice recognition model to obtain the video characteristic parameters corresponding to the video to be processed comprises: inputting the channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy in the shooting game video; or inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy in the shooting game video and the enemy's shooting distance; or inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy in the shooting game video, the enemy's shooting distance and the firearm type.
For example, in a shooting game video, the sound effects produced by gunfire can be recognized through the audio channels and the voiceprint, so specific shooting data can be obtained by inputting the channel information and/or the voiceprint information into a pre-trained sound recognition model. Specifically, inputting channel information containing the enemy's gunshot sound effect into the sound recognition model yields the enemy's direction; inputting voiceprint information containing the enemy's gunshot sound effect yields the enemy's shooting distance and/or the firearm type used by the enemy; and inputting voiceprint information containing the gunshot sound effect of the player's own side yields the firearm type used by the player's own side.
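For illustration only, the channel-based direction cue could be sketched as a simple inter-channel level comparison on a short stereo clip of the gunshot; the threshold and the three direction labels are hypothetical, and a trained model as described above would normally replace this hand-written rule.

```python
# Minimal sketch: estimate which side an enemy gunshot came from by comparing
# left/right channel energy in a short stereo clip.
import numpy as np

def estimate_direction(left: np.ndarray, right: np.ndarray, threshold_db: float = 3.0) -> str:
    left_rms = np.sqrt(np.mean(left ** 2)) + 1e-12
    right_rms = np.sqrt(np.mean(right ** 2)) + 1e-12
    diff_db = 20 * np.log10(left_rms / right_rms)  # inter-channel level difference
    if diff_db > threshold_db:
        return "enemy to the left"
    if diff_db < -threshold_db:
        return "enemy to the right"
    return "enemy roughly ahead or behind"

rng = np.random.default_rng(0)
shot = rng.normal(size=4410)                        # 0.1 s of placeholder gunshot audio
print(estimate_direction(shot * 1.0, shot * 0.3))   # louder on the left channel
```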
Optionally, the video to be processed includes a multiplayer online tactical sports game video; correspondingly, the audio characteristic information is input into a pre-trained voice recognition model, and video characteristic parameters corresponding to the video to be processed are obtained, and the method comprises the following steps: and inputting system voice prompt information in the audio characteristic information into a pre-trained voice recognition model to obtain game event keywords corresponding to the multi-player online tactical competitive game video.
For example, in a MOBA (Multiplayer Online Battle Arena) game video, the character used by a player may make a specific sound, or the system may issue a voice prompt when the player triggers a specific game event; specific player operation data can therefore be obtained by inputting the system voice prompt information into a pre-trained voice recognition model. Specifically, inputting system voice prompt information containing kill-streak voice prompts into the pre-trained voice recognition model can output keywords of the game's kill-streak events, such as the player's kill-streak count.
And S230, adding the video label corresponding to the video characteristic parameter to the video to be processed.
According to the technical scheme of this embodiment, after the audio characteristic information in the video to be processed is obtained, it is input into the pre-trained sound recognition model to obtain the video characteristic parameter corresponding to the video to be processed, and the video tag corresponding to the video characteristic parameter is added to the video to be processed. Recognizing the video sound effects with the sound recognition model yields rich video tags across more dimensions; in addition to improving the richness of the video tags and the quality of viewers' understanding of the video content, this improves the accuracy and real-time performance of sound recognition and thus the accuracy of video tag addition.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a video processing apparatus according to a third embodiment of the present invention. Referring to fig. 3, the video processing apparatus includes an information obtaining module 310, a parameter determining module 320 and a tag adding module 330, which are described in detail below.
An information obtaining module 310, configured to obtain audio characteristic information in a video to be processed, where the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information;
a parameter determining module 320, configured to determine, according to the audio feature information, a video feature parameter corresponding to the video to be processed;
a tag adding module 330, configured to add a video tag corresponding to the video feature parameter to the video to be processed.
The video processing apparatus provided by this embodiment obtains the audio characteristic information in the video to be processed, where the audio characteristic information includes at least one of channel information, voiceprint information and system voice prompt information, determines the video characteristic parameter corresponding to the video to be processed according to the audio characteristic information, and adds the video tag corresponding to the video characteristic parameter to the video to be processed. Richer video tags are obtained from the audio characteristic information of the video, which solves the prior-art problems that tag content provided only from the video picture is too narrow and reduces video understanding quality, improves the richness of the video tags, and improves the quality of viewers' understanding of the video content.
Optionally, the parameter determining module 320 may include:
and the information input submodule is used for inputting the audio characteristic information into a pre-trained voice recognition model to obtain a video characteristic parameter corresponding to the video to be processed.
Optionally, the parameter determining module 320 may further include:
the sample acquisition submodule is used for acquiring an audio characteristic information sample with a target video characteristic parameter label before the audio characteristic information is input into a pre-trained voice recognition model and the video characteristic parameter corresponding to the video to be processed is obtained;
and the model training submodule is used for training a set artificial intelligence model by using the audio characteristic information sample to obtain the voice recognition model.
Optionally, the video to be processed includes a shooting game video;
correspondingly, the information input submodule may be specifically configured to:
inputting the channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video; or
inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video and the enemy's shooting distance; or inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of the enemy corresponding to the shooting game video, the enemy's shooting distance and the firearm type.
Optionally, the video to be processed includes a multiplayer online tactical sports game video;
correspondingly, the information input submodule may be specifically configured to:
and inputting system voice prompt information in the audio characteristic information into a pre-trained voice recognition model to obtain game event keywords corresponding to the multi-player online tactical competitive game video.
Optionally, the tag adding module 330 may be specifically configured to:
acquiring a video time period corresponding to the video characteristic parameter in the video to be processed;
and displaying the video label corresponding to the video characteristic parameter in the video display picture corresponding to the video time period.
Optionally, the video processing apparatus may further include:
the video scoring module is used for scoring the video to be processed according to the video tags after the video tags corresponding to the video characteristic parameters are added to the video to be processed;
and the video recommending module is used for recommending and displaying the video to be processed according to the grade.
The above product can execute the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example four
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in fig. 4, the computer device provided in this embodiment includes a processor 41 and a memory 42. The number of processors in the computer device may be one or more; fig. 4 shows one processor 41. The processor 41 and the memory 42 may be connected by a bus or in other ways; fig. 4 shows a bus connection.
The video processing apparatus provided in the above-described embodiment is integrated into the processor 41 of the computer device in this embodiment. Further, the memory 42 in the computer device is used as a computer readable storage medium for storing one or more programs, which may be software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the video processing method in the embodiment of the present invention (for example, the modules in the video processing apparatus shown in fig. 3 include the information obtaining module 310, the parameter determining module 320, and the tag adding module 330). The processor 41 executes various functional applications of the device and data processing by executing software programs, instructions and modules stored in the memory 42, that is, implements the video processing method in the above-described method embodiment.
The memory 42 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 42 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 42 may further include memory located remotely from processor 41, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And, when one or more programs included in the above-described computer apparatus are executed by the one or more processors 41, the programs perform the following operations:
acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information; determining a video characteristic parameter corresponding to the video to be processed according to the audio characteristic information; and adding the video tag corresponding to the video characteristic parameter to the video to be processed.
EXAMPLE five
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a video processing apparatus, the computer program implements a video processing method according to an embodiment of the present invention, the method comprising: acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information, voiceprint information and system voice prompt information; determining a video characteristic parameter corresponding to the video to be processed according to the audio characteristic information; and adding the video tag corresponding to the video characteristic parameter to the video to be processed.
Of course, the computer-readable storage medium provided in the embodiments of the present invention, when being executed, is not limited to implement the method operations described above, and may also implement the relevant operations in the video processing method provided in any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the video processing apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (7)

1. A video processing method, comprising:
acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information and voiceprint information;
determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information, wherein the video characteristic parameters are used for representing key event content in the video;
the determining the video characteristic parameter corresponding to the video to be processed according to the audio characteristic information includes: identifying audio characteristic information extracted from a video to be processed by adopting a preset algorithm, and acquiring video characteristic parameters corresponding to the video to be processed according to an identification result;
adding a video label corresponding to the video characteristic parameter to the video to be processed;
the video to be processed comprises shooting game video;
correspondingly, inputting the audio characteristic information into a pre-trained voice recognition model to obtain video characteristic parameters corresponding to the video to be processed, including:
and inputting the sound channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video.
2. The method according to claim 1, before inputting the audio feature information into a pre-trained voice recognition model to obtain video feature parameters corresponding to the video to be processed, further comprising:
acquiring an audio characteristic information sample with a target video characteristic parameter label;
and training a set artificial intelligence model by using the audio characteristic information sample to obtain the voice recognition model.
3. The method of claim 1, wherein the pending video comprises a shooting game-like video;
correspondingly, inputting the audio characteristic information into a pre-trained voice recognition model to obtain a video characteristic parameter corresponding to the video to be processed, and further comprising:
inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video and the enemy's shooting distance; or
inputting the channel information and the voiceprint information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of the enemy corresponding to the shooting game video, the enemy's shooting distance and the firearm type.
4. The method according to claim 1, wherein adding the video tag corresponding to the video feature parameter to the video to be processed comprises:
acquiring a video time period corresponding to the video characteristic parameter in the video to be processed;
and displaying the video label corresponding to the video characteristic parameter in the video display picture corresponding to the video time period.
5. A video processing apparatus, comprising:
the information acquisition module is used for acquiring audio characteristic information in a video to be processed, wherein the audio characteristic information comprises: at least one of channel information and voiceprint information;
the parameter determining module is used for determining video characteristic parameters corresponding to the video to be processed according to the audio characteristic information, wherein the video characteristic parameters are used for representing key event contents in the video;
the determining the video characteristic parameter corresponding to the video to be processed according to the audio characteristic information includes: identifying audio characteristic information extracted from a video to be processed by adopting a preset algorithm, and acquiring video characteristic parameters corresponding to the video to be processed according to an identification result;
the label adding module is used for adding the video label corresponding to the video characteristic parameter to the video to be processed;
the video processing device also comprises an information input submodule;
the video to be processed comprises shooting game video; correspondingly, the information input submodule is specifically configured to:
and inputting the sound channel information in the audio characteristic information into a pre-trained sound recognition model to obtain the direction of an enemy corresponding to the shooting game video.
6. A computer device, the device comprising:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the video processing method of any one of claims 1-4.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the video processing method according to any one of claims 1 to 4.
CN201910037302.9A 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium Active CN109640112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910037302.9A CN109640112B (en) 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910037302.9A CN109640112B (en) 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109640112A CN109640112A (en) 2019-04-16
CN109640112B true CN109640112B (en) 2021-11-23

Family

ID=66061982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910037302.9A Active CN109640112B (en) 2019-01-15 2019-01-15 Video processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109640112B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677722A (en) * 2019-09-29 2020-01-10 上海依图网络科技有限公司 Video processing method, and apparatus, medium, and system thereof
CN111031392A (en) * 2019-12-23 2020-04-17 广州视源电子科技股份有限公司 Media file playing method, system, device, storage medium and processor
CN111447489A (en) * 2020-04-02 2020-07-24 北京字节跳动网络技术有限公司 Video processing method and device, readable medium and electronic equipment
CN111885414B (en) * 2020-07-24 2023-03-21 腾讯科技(深圳)有限公司 Data processing method, device and equipment and readable storage medium
CN114095738A (en) 2020-07-30 2022-02-25 京东方科技集团股份有限公司 Video and live broadcast processing method, live broadcast system, electronic device, terminal and medium
CN111901668B (en) * 2020-09-07 2022-06-24 三星电子(中国)研发中心 Video playing method and device
CN113038175B (en) * 2021-02-26 2023-03-24 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978963A (en) * 2014-04-08 2015-10-14 富士通株式会社 Speech recognition apparatus, method and electronic equipment
CN107357875A (en) * 2017-07-04 2017-11-17 北京奇艺世纪科技有限公司 A kind of voice search method, device and electronic equipment
CN107483879A (en) * 2016-06-08 2017-12-15 中兴通讯股份有限公司 Video marker method, apparatus and video frequency monitoring method and system
CN107507625A (en) * 2016-06-14 2017-12-22 讯飞智元信息科技有限公司 Sound source distance determines method and device
CN107770614A (en) * 2016-08-18 2018-03-06 中国电信股份有限公司 The label producing method and device of content of multimedia
CN108563670A (en) * 2018-01-12 2018-09-21 武汉斗鱼网络科技有限公司 Video recommendation method, device, server and computer readable storage medium
CN109126132A (en) * 2018-08-02 2019-01-04 Oppo广东移动通信有限公司 Position indicating method, device, storage medium and the electronic equipment of game role

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6697564B1 (en) * 2000-03-03 2004-02-24 Siemens Corporate Research, Inc. Method and system for video browsing and editing by employing audio
JP5622744B2 (en) * 2009-11-06 2014-11-12 株式会社東芝 Voice recognition device
CN107527617A (en) * 2017-09-30 2017-12-29 上海应用技术大学 Monitoring method, apparatus and system based on voice recognition
CN108806668A (en) * 2018-06-08 2018-11-13 国家计算机网络与信息安全管理中心 A kind of audio and video various dimensions mark and model optimization method
CN108962216B (en) * 2018-06-12 2021-02-02 北京市商汤科技开发有限公司 Method, device, equipment and storage medium for processing speaking video
CN109166586B (en) * 2018-08-02 2023-07-07 平安科技(深圳)有限公司 Speaker identification method and terminal

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978963A (en) * 2014-04-08 2015-10-14 富士通株式会社 Speech recognition apparatus, method and electronic equipment
CN107483879A (en) * 2016-06-08 2017-12-15 中兴通讯股份有限公司 Video marker method, apparatus and video frequency monitoring method and system
CN107507625A (en) * 2016-06-14 2017-12-22 讯飞智元信息科技有限公司 Sound source distance determines method and device
CN107770614A (en) * 2016-08-18 2018-03-06 中国电信股份有限公司 The label producing method and device of content of multimedia
CN107357875A (en) * 2017-07-04 2017-11-17 北京奇艺世纪科技有限公司 A kind of voice search method, device and electronic equipment
CN108563670A (en) * 2018-01-12 2018-09-21 武汉斗鱼网络科技有限公司 Video recommendation method, device, server and computer readable storage medium
CN109126132A (en) * 2018-08-02 2019-01-04 Oppo广东移动通信有限公司 Position indicating method, device, storage medium and the electronic equipment of game role

Also Published As

Publication number Publication date
CN109640112A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
CN109640112B (en) Video processing method, device, equipment and storage medium
CN108769823B (en) Direct broadcasting room display methods, device, equipment
US10824874B2 (en) Method and apparatus for processing video
WO2019228302A1 (en) Live broadcast room display method, apparatus and device, and storage medium
US20170169018A1 (en) Method and Electronic Device for Recommending Media Data
CN107463698B (en) Method and device for pushing information based on artificial intelligence
CN109194978A (en) Live video clipping method, device and electronic equipment
CN110347872B (en) Video cover image extraction method and device, storage medium and electronic equipment
CN110557659B (en) Video recommendation method and device, server and storage medium
US20160317933A1 (en) Automatic game support content generation and retrieval
CN111757170B (en) Video segmentation and marking method and device
CN110267116A (en) Video generation method, device, electronic equipment and computer-readable medium
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN110347866B (en) Information processing method, information processing device, storage medium and electronic equipment
CN109618236A (en) Video comments treating method and apparatus
CN108460122B (en) Video searching method, storage medium, device and system based on deep learning
JP2018525675A (en) Method and device for generating live text broadcast content using past broadcast text
Habibian et al. Recommendations for recognizing video events by concept vocabularies
CN110072140A (en) A kind of video information reminding method, device, equipment and storage medium
CN111147871B (en) Singing recognition method and device in live broadcast room, server and storage medium
US20210394060A1 (en) Method and system for automatically generating video highlights for a video game player using artificial intelligence (ai)
CN114095742A (en) Video recommendation method and device, computer equipment and storage medium
KR102586286B1 (en) Contextual digital media processing systems and methods
CN105848737B (en) Analysis device, recording medium, and analysis method
US11410706B2 (en) Content pushing method for display device, pushing device and display device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant