CN111953910A - Video processing method and device based on artificial intelligence and electronic equipment - Google Patents

Video processing method and device based on artificial intelligence and electronic equipment

Info

Publication number
CN111953910A
Authority
CN
China
Prior art keywords
game
video
commentary
comment
artificial intelligence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010800282.9A
Other languages
Chinese (zh)
Other versions
CN111953910B (en)
Inventor
林少彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010800282.9A
Publication of CN111953910A
Application granted
Publication of CN111953910B
Legal status: Active
Anticipated expiration


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/85Providing additional services to players
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/131Protocols for games, networked simulations or virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses an artificial-intelligence-based video processing method, which comprises the following steps: acquiring game record data and a corresponding game video; extracting at least one game feature and the corresponding game instruction time from the game record data; obtaining a matched commentary strategy model according to the at least one game feature; generating corresponding commentary speech based on the commentary strategy model; and synthesizing a commentary video from the commentary speech and the game video such that the timeline start point of the commentary speech matches the game instruction time. The method can accurately, dynamically, and automatically generate professional commentary content for a game, provides a fast and intelligent automatic commentary service for online games, realizes AI game commentary, can generate commentary videos automatically, and greatly improves the efficiency of commentary video processing.

Description

Video processing method and device based on artificial intelligence and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video processing method and apparatus based on artificial intelligence, and an electronic device.
Background
With the rapid development of science and technology, live video streaming has become part of daily entertainment and communication. The video presented to users during a live broadcast combines rich elements such as images, text, and the host's commentary, and has become increasingly popular on the Internet. One of the most popular categories of live video at present is the live broadcast of MOBA (Multiplayer Online Battle Arena) games.
However, in current game live broadcasts, corresponding game commentary cannot be provided intelligently for a match; only a human host can provide commentary according to the state of the game. The whole video synthesis process requires manual participation: producing a single highlight-review video requires a cumbersome, fully manual pipeline of segment selection, script writing, video editing, voice recording, and video synthesis. The long production cycle and the large amount of manual labor mean that highlight-review videos cannot be produced widely and in volume.
Artificial Intelligence (AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. The basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Disclosure of Invention
The embodiments of the present application provide an artificial intelligence based game video processing method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an artificial intelligence based game video processing method, which includes:
acquiring game record data and a corresponding game video;
extracting at least one game feature and the corresponding game instruction time from the game record data;
obtaining a matched commentary strategy model according to the at least one game feature;
generating corresponding commentary speech based on the commentary strategy model; and
synthesizing a commentary video from the commentary speech and the game video such that the timeline start point of the commentary speech matches the game instruction time.
In a second aspect, an embodiment of the present application provides an artificial intelligence based video processing apparatus, including:
a data acquisition module configured to acquire game record data and a corresponding game video;
a game instruction acquisition module configured to extract at least one game feature and the corresponding game instruction time from the game record data;
a commentary strategy obtaining module configured to obtain a matched commentary strategy model according to the at least one game feature;
a commentary speech generating module configured to generate corresponding commentary speech based on the commentary strategy model; and
a video synthesis module configured to synthesize a commentary video from the commentary speech and the game video such that the timeline start point of the commentary speech matches the game instruction time.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory; one or more processors coupled to the memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the artificial intelligence based game video processing method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing program code that can be invoked by a processor to execute the artificial intelligence based video processing method provided in the first aspect.
The artificial intelligence based video processing system provided by the embodiments of the application can accurately, dynamically, and automatically generate professional commentary content for MOBA games, provides a fast and intelligent automatic commentary service for online games, realizes AI game commentary, can generate commentary videos automatically, and greatly improves the efficiency of commentary video processing.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows an architecture diagram of an artificial intelligence based video processing system according to an embodiment of the present application.
Fig. 2 shows a schematic diagram illustrating a policy model in an artificial intelligence based video processing system according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating identification of the game time in a game video in an artificial intelligence based video processing system according to an embodiment of the present application.
Fig. 4 is a schematic diagram illustrating alignment of a game instruction frame and a video image frame in an artificial intelligence based video processing system according to an embodiment of the present application.
Fig. 5 shows a screenshot of a commentary video generated by an artificial intelligence based video processing system according to an embodiment of the present application.
Fig. 6 shows a flowchart of an artificial intelligence based video processing method according to an embodiment of the present application.
Fig. 7 shows a flowchart of generating commentary speech in the method shown in fig. 6.
Fig. 8 shows a flowchart of an artificial intelligence based video processing method according to an embodiment of the present application.
Fig. 9 is a flowchart illustrating another artificial intelligence based video processing method according to an embodiment of the present application.
Fig. 10 is a flowchart illustrating another artificial intelligence based video processing method according to an embodiment of the present application.
Fig. 11 is a flowchart illustrating another artificial intelligence based video processing method according to an embodiment of the present application.
Fig. 12 shows a schematic diagram of extracting game features and commentary strategies from a third party commentary video.
Fig. 13 is a flowchart illustrating another artificial intelligence based video processing method according to an embodiment of the present application.
Fig. 14 shows a block diagram of an artificial intelligence based video processing apparatus according to an embodiment of the present application.
Fig. 15 is a hardware block diagram of an artificial intelligence based video processing electronic device according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application.
Definition of terms
MOBA (Multiplayer Online Battle Arena): a multiplayer online tactical competitive game. In this genre, players are typically divided into two teams that compete against each other on a shared game map, with each player controlling a selected character through an interface; equipment is usually purchased during combat.
Referring to fig. 1, an architecture diagram of the artificial intelligence based video processing system of the present application is shown. As shown in fig. 1, the input data of the video processing system are game record data and a game video, and the output is a commentary video.
The game record data may include: game instruction data, game statistics, and the like.
Game instruction data are the commands issued during a match to control the movement, skills, actions, and other behaviors of in-game elements such as player characters and non-player characters. They generally exist in the form of instruction frames, each of which contains several game operation instructions. In general, the frame rate of the game instruction frames is slightly higher than the limit of human reaction time.
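By way of illustration only, one possible in-memory layout of an instruction frame is sketched below; all field names are assumptions, since the patent does not define a concrete format:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GameInstruction:
    actor_id: int   # player character or NPC the command applies to
    action: str     # e.g. "move", "cast_skill", "attack"
    params: dict    # action-specific payload, e.g. target coordinates

@dataclass
class InstructionFrame:
    frame_no: int   # monotonically increasing frame index
    game_time_ms: int  # in-game time carried by the frame
    instructions: List[GameInstruction] = field(default_factory=list)
```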
A commentary video refers to a video with commentary subtitles, a video with commentary speech, or a video with both commentary subtitles and commentary speech.
As shown in fig. 1, the video processing system mainly includes the following steps when processing video:
in step S1, the game frame processing obtains the comment strategy.
Specifically, step S1 may include the steps of:
step S11, extracting game features from the game instruction data; and
step S12, mining commentary strategies from the game features.
A game feature refers to a behavior of a player-controlled character or a Non-Player Character (NPC) in the game, or a specific game state produced by the interaction of such behaviors with one another or with the game map environment. NPCs may include mobile monsters, soldiers, and super soldiers, as well as immobile defense towers, crystals, and the like.
Behavior here refers to character movement, attacks, and skill releases, as well as other interactions such as issuing warning signals, rally signals, attack signals, retreat signals, and the like.
The game state here refers to the state of a player or NPC, such as health, level, position, equipment, skill cooldown, performance, revival, death, respawn, score, towers destroyed, win rate, gold, mana, energy-bar value, and failed skill casts such as flashing into a wall or a skill hitting nothing.
Based on the game instructions, the game features can be obtained by computing in-game element attributes in real time (for example, a player's new map coordinates after moving, or a skill's cooldown after a hero releases it).
Fig. 2 discloses a schematic diagram of how commentary strategies are mined from game features. First, the basic game elements can be summarized into four categories: hero, NPC, team fight, and statistics. For each category, two or more levels of subcategories are expanded step by step from its basic attributes and features. For example, the skill category of a hero can be expanded into subcategories such as the hero's first skill, second skill, and ultimate; each skill subcategory can in turn be expanded into subcategories such as whether the hero cast the ultimate, whom the ultimate was cast on (an enemy hero or a jungle monster), and which skill was combined with the ultimate. Expanding step by step in this way yields the commentary strategy model.
For the commentary strategy model, the inputs are game features and the outputs are commentary strategies.
Each commentary strategy matches one or more game features (for example, a hero flash-engaging with an ultimate, or an ultimate hitting all five enemy heroes). Each commentary strategy also has a corresponding commentary generation strategy, so the corresponding commentary text can be generated according to the game features.
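As a minimal sketch of such a rule-matching model, the rule table and feature names below are illustrative assumptions; the patent only specifies that each strategy is triggered by one or more matched game features:

```python
# Hypothetical rule table: (required feature set, commentary strategy id).
COMMENTARY_RULES = [
    ({"hero_ultimate_cast", "hit_five_enemies"}, "team_wipe_ultimate"),
    ({"player_killed_by_tower"}, "tower_dive_fail"),
    ({"key_monster_respawn"}, "objective_spawn"),
]

def match_strategies(game_features: set[str]) -> list[str]:
    """Return every commentary strategy whose required features are all present."""
    return [sid for needed, sid in COMMENTARY_RULES if needed <= game_features]

# e.g. match_strategies({"hero_ultimate_cast", "hit_five_enemies"})
# -> ["team_wipe_ultimate"]
```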
In step S2, the video frames are processed to obtain a sampled video.
It will be appreciated that, to insert commentary subtitles or speech into a game video, the timelines of the subtitles and audio must be aligned with the timeline of the video. If the game instruction data and the video data are synchronized in time, no additional alignment is needed. In most cases, however, the game video recording may not be synchronized with the game instruction data, or the game video may have been clipped, so that its timeline is offset from that of the game instruction data or even discontinuous.
Specifically, the first step is to decode the game video to extract the image frames. It can be understood that a video is generally compressed and stored according to a certain standard, and decoding according to the corresponding specification yields each video frame image. Of course, a third-party decoder library can also be used to extract the frames directly. For example, FFmpeg is open-source free software that can record, convert, and stream audio and video in many formats; it includes libavcodec, an audio/video codec library used by many projects, and libavformat, an audio/video container format library.
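For instance, frame extraction can be driven from Python through the FFmpeg command line; a minimal sketch, assuming ffmpeg is installed and on PATH (file names are illustrative):

```python
import subprocess
from pathlib import Path

def extract_frames(video: str, out_dir: str, fps: int | None = None) -> None:
    """Decode a game video into numbered PNG image frames with the FFmpeg CLI."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = ["ffmpeg", "-i", video]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]   # optionally resample while decoding
    cmd.append(f"{out_dir}/%06d.png")  # one image file per decoded frame
    subprocess.run(cmd, check=True)

# extract_frames("game.mp4", "frames", fps=20)
```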
Referring to fig. 3, a schematic diagram of an image frame of the game video is shown; the current game time is displayed at the top-middle of the frame. In one embodiment, a screenshot of the time region can be cropped, and image recognition can then be used to identify the corresponding game time. For example, an MNIST-style handwritten digit recognition process can be applied (the MNIST database is a large database of handwritten digits commonly used to train image processing systems) to obtain the game time within each frame. Through these steps, the game time corresponding to each frame of the video can be obtained.
As shown in fig. 3, box 101 marks the boundary of the cropped time region. With a fine-tuned digit recognition model, the current game-time digits can be recognized as 4, 3, 5, corresponding to a game time of 4 minutes 35 seconds.
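The shape of this recognition step can be sketched as follows; the crop box and the three-digit layout are assumptions based on fig. 3, and classify_digit stands in for the fine-tuned digit recognition model:

```python
from typing import Callable
from PIL import Image

def read_game_time(frame_path: str,
                   box: tuple[int, int, int, int],
                   classify_digit: Callable[[Image.Image], int]) -> int:
    """Crop the clock region of one frame and return the game time in seconds."""
    clock = Image.open(frame_path).convert("L").crop(box)
    w, h = clock.size
    step = w // 3  # assume three equally wide digit cells: M, S, S
    digits = [classify_digit(clock.crop((i * step, 0, (i + 1) * step, h)))
              for i in range(3)]
    minutes, seconds = digits[0], digits[1] * 10 + digits[2]
    return minutes * 60 + seconds  # e.g. digits 4, 3, 5 -> 275 seconds
```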
In step S3, the game instruction frame is aligned with the video image frame.
As described in step S1, each commentary subtitle or audio segment corresponds to a game feature at a certain game time, and the game instruction frames need to be aligned with the video image frames so that the subtitles or audio can be embedded into the video accurately.
In one embodiment, it is sufficient for the game instruction time and the game time in the video to agree within an allowable error, for example within 1 second. As described in step S2, the game time can be identified from the game video image frames, while a game instruction frame generally carries the corresponding game time directly, so the two can be matched by game time.
In one embodiment, to improve the accuracy of the match between the subtitles or audio and the game time, each subtitle or audio segment needs to be positioned precisely on a specific video frame; that is, each game instruction frame is mapped onto a specific video frame. Denote the frame rate of the game instruction frames by fps1 and the frame rate of the game video by fps2.
In one embodiment, fps1 is the same as fps2, and the game instruction frames can be put into one-to-one correspondence with the game video image frames without additional processing.
In one embodiment, fps1 is smaller than fps2; the game video image frames can be downsampled so that the frame rate of the sampled video equals fps1, after which the game instruction frames correspond one-to-one with the sampled image frames.
In one embodiment, fps1 is larger than fps2; the game video image frames can be interpolated, that is, intermediate frames are generated intelligently from the preceding and following image frames, so that the frame rate of the video equals fps1 and the game instruction frames correspond one-to-one with the image frames.
After the above frame-rate alignment, the game instruction frames and the sampled image frames have the same frame rate, i.e., the same number of frames per second. Combined with the preceding game-time identification, the game time corresponding to each sampled image frame is known, so the two sequences can be aligned in time. For example, if instruction frames 1-20 correspond to the 1st second, they must be matched to the 20 sampled image frames whose recognized game time is the 1st second; the instruction frames are then put into one-to-one correspondence with the video image frames in chronological order, which completes the data frame alignment.
Referring to fig. 4, which shows the alignment of game instruction frames with video image frames: guided by the recognized game time, the original 40 FPS game video is sampled down to 20 FPS, and the game instruction frames are then aligned with the sampled frames one by one. In this way, subtitles and audio can be mapped accurately onto specific video frames.
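The index arithmetic behind this sampling can be sketched as follows, a simplification that assumes constant frame rates and a common time origin:

```python
def align_instruction_to_video(fps1: float, fps2: float,
                               n_instr_frames: int) -> list[int]:
    """Map each game instruction frame index to a video image frame index.

    With fps1 < fps2 the video is effectively downsampled (e.g. 40 FPS
    -> 20 FPS keeps every 2nd image); with fps1 > fps2 the same formula
    points several instruction frames at the nearest (interpolated) image.
    """
    ratio = fps2 / fps1
    return [round(i * ratio) for i in range(n_instr_frames)]

# align_instruction_to_video(20, 40, 4) -> [0, 2, 4, 6]
```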
In step S4, the commentary speech is generated.
After the alignment relationship between the game instruction frames and the game video has been determined, the generated commentary text or commentary speech needs to be embedded at the corresponding position in the video. In one embodiment, the commentary text can be converted into commentary speech using Text-To-Speech (TTS) technology.
TTS technology applies linguistics and psychology together and, supported by an embedded chip and a neural network design, intelligently converts text into a natural speech stream. TTS converts text in real time, with conversion times measured in seconds.
Under the control of a dedicated intelligent speech controller, the rhythm of the synthesized speech is smooth, so that listeners perceive it as natural, without the flat, mechanical feel of machine speech. TTS speech synthesis uses authentic Mandarin as the standard pronunciation, can synthesize speech rapidly at 120-150 Chinese characters per minute with a reading speed of 3-4 characters per second, and gives the user clear, pleasant timbre and coherent, fluent intonation.
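As one concrete possibility, an offline engine such as pyttsx3 can stand in for the TTS service described here; a hedged sketch (rate value and file names are illustrative, and the available voices depend on the system):

```python
import pyttsx3

def synthesize_commentary(text: str, wav_path: str) -> None:
    """Render one commentary line to a WAV file with the local TTS engine."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 200)      # approximate speaking speed
    engine.save_to_file(text, wav_path)  # queue rendering to a file
    engine.runAndWait()                  # block until synthesis finishes

# synthesize_commentary("The ultimate hit all five enemy heroes!", "line_0001.wav")
```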
In step S5, the commentary video is synthesized.
Video merging is the inverse operation of video decoding: the video image frames are encoded and compressed according to a predetermined video standard and stored, while the commentary subtitles and speech are merged into the game video, finally yielding the commentary video.
Referring to fig. 5, a schematic diagram of the synthesized commentary video is shown. At the corresponding game time, the subtitle 102 and the speech are automatically inserted into the video.
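This muxing step of S5 could again be delegated to FFmpeg; a sketch assuming an ffmpeg build with libass for subtitle burning, and assuming the game video already carries an audio track (paths are illustrative):

```python
import subprocess

def mux_commentary(video: str, narration: str, srt: str,
                   start_ms: int, out: str) -> None:
    """Burn subtitles into the picture and mix delayed narration into the audio."""
    delay = f"{start_ms}|{start_ms}"       # adelay takes ms per channel
    fc = (f"[0:v]subtitles={srt}[v];"      # overlay the subtitle layer
          f"[1:a]adelay={delay}[na];"      # shift narration to its timeline start
          f"[0:a][na]amix=inputs=2[a]")    # mix game audio with narration
    subprocess.run(["ffmpeg", "-i", video, "-i", narration,
                    "-filter_complex", fc,
                    "-map", "[v]", "-map", "[a]", out], check=True)
```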
By the above artificial intelligence based video processing method, professional commentary content for MOBA games can be generated accurately, dynamically, and automatically, providing a fast and intelligent automatic commentary service for online games; AI game commentary is realized, commentary videos can be generated automatically, and the efficiency of commentary video processing is greatly improved.
Referring to fig. 6, a flow of a game video processing method provided by an embodiment of the present application is shown, where the method includes:
step S101, game record data and corresponding game video are obtained.
The game record data may include: game instruction data, game statistics, and the like.
Game instruction data are the commands issued during a match to control the movement, skills, actions, and other behaviors of in-game elements such as player characters and non-player characters. They generally exist in the form of instruction frames, each of which contains several game operation instructions.
The game video may be a third-party recording, which may also have been edited. The game video may also be generated by dynamically rendering a replay based on the above game record data.
Step S102, at least one game feature and corresponding game instruction time are extracted from the game record data.
A game feature refers to a behavior of a player-controlled character or a Non-Player Character (NPC) in the game, or a specific game state produced by the interaction of such behaviors with one another or with the game map environment. NPCs may include mobile monsters, soldiers, and super soldiers, as well as immobile defense towers, crystals, and the like.
Behavior here refers to character movement, attacks, and skill releases, as well as other interactions such as issuing warning signals, rally signals, attack signals, retreat signals, and the like.
The game state here refers to the state of a player or NPC, such as health, level, position, equipment, skill cooldown, performance, revival, death, respawn, score, towers destroyed, win rate, gold, mana, energy-bar value, and failed skill casts such as flashing into a wall or a skill hitting nothing.
Based on the game instructions, the game features can be obtained by computing in-game element attributes in real time (for example, a player's new map coordinates after moving, or a skill's cooldown after a hero releases it). Examples include a player being killed by a soldier, by a jungle monster, or by a defense tower.
Step S103, obtaining a matched commentary strategy model according to the at least one game feature.
The strategy model can be an empirical rule-matching model or a neural network model. When a rule-matching model is employed, each commentary strategy matches one or more game features (for example, a hero flash-engaging with an ultimate, or an ultimate hitting all five enemy heroes). When a neural network model is adopted, it is trained on existing manual commentary data and game record data; the input of the neural network model is the game features, and the output is a commentary strategy.
Step S104, generating corresponding commentary speech based on the commentary strategy model.
As described above, each commentary strategy has a corresponding commentary generation strategy, and the corresponding commentary text can be generated according to the game features. Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like. In this embodiment, the video processing system should give the most appropriate commentary for different game features.
As shown in fig. 7, in a specific embodiment, step S104 may include:
step S1041, obtaining a corpus matched with the comment policy from a predefined comment corpus.
The corpus here refers to various templates written manually for generating commentary.
A corpus entry can also be a fixed expression; for example, when a certain key jungle monster respawns, a fixed commentary line can be given.
Step S1042, generating the commentary text according to the corpus.
When the commentary text is generated, the slots in the template are replaced dynamically so that the commentary fits the current scene. Of course, if the corpus entry is a fixed commentary line, it is output directly.
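A minimal sketch of this dynamic replacement; the slot names and template strings below are invented for illustration:

```python
# Hypothetical corpus: commentary strategy id -> template with feature slots.
TEMPLATES = {
    "team_wipe_ultimate": "{hero}'s ultimate hit {count} enemy heroes, massive damage!",
    "objective_spawn": "Watch out, {monster} has respawned; a team fight is coming.",
}

def render_commentary(strategy_id: str, features: dict) -> str:
    """Fill the template slots with the current game features."""
    return TEMPLATES[strategy_id].format(**features)

# render_commentary("team_wipe_ultimate", {"hero": "Minotaur", "count": 5})
```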
Step S1043, generating the commentary speech from the commentary text using a text-to-speech technique.
In one embodiment, the commentary text can be converted into commentary speech using Text-To-Speech (TTS) technology.
TTS technology applies linguistics and psychology together and, supported by an embedded chip and a neural network design, intelligently converts text into a natural speech stream. TTS converts text in real time, with conversion times measured in seconds.
Under the control of a dedicated intelligent speech controller, the rhythm of the synthesized speech is smooth, so that listeners perceive it as natural, without the flat, mechanical feel of machine speech. TTS speech synthesis uses authentic Mandarin as the standard pronunciation, can synthesize speech rapidly at 120-150 Chinese characters per minute with a reading speed of 3-4 characters per second, and gives the user clear, pleasant timbre and coherent, fluent intonation.
It should be noted that the method of generating the commentary shown in fig. 7 is merely an example; the commentary may also be generated using a neural network.
Step S105, synthesizing a commentary video according to the commentary speech and the game video, with the timeline start point of the commentary speech matched to the game instruction time.
Synthesizing the commentary video involves processing both the speech and the subtitles. Generally, the commentary speech and the video are stored in different streams, and what must be handled is timeline alignment, that is, synchronizing the timeline start point of the speech for a given commentary line with the corresponding video frame.
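Given the per-frame game times recognized in step S2, locating that start point reduces to a lookup; a sketch assuming the instruction's game time actually appears among the recognized frame times:

```python
def speech_start_seconds(instr_time_s: int,
                         frame_game_time: dict[int, int],
                         fps2: float) -> float:
    """Find where on the video timeline a commentary line must start.

    frame_game_time maps video frame index -> recognized game time (s),
    as produced by the digit-recognition step.
    """
    first = min(i for i, t in frame_game_time.items() if t == instr_time_s)
    return first / fps2  # video frame index -> video timeline seconds
```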
By the above artificial intelligence based video processing method, professional commentary content for MOBA games can be generated accurately, dynamically, and automatically, providing a fast and intelligent automatic commentary service for online games; AI game commentary is realized, commentary videos can be generated automatically, and the efficiency of commentary video processing is greatly improved.
Referring to fig. 8, a flow of a game video processing method according to an embodiment of the present application is shown, which is similar to the method shown in fig. 6, except that after step S103, the method further includes:
and step S106, generating a corresponding caption according to the caption strategy model.
Subtitles are generally handled in one of two ways. One way uses a separate subtitle file; in this case the subtitle timeline must likewise be aligned, and once the alignment is done, video player software displays the subtitles automatically. This is convenient because no additional processing of the video is required. In the other way, the subtitles serve directly as an image layer of the video and are merged with the original video data; the subtitle layer must then be superimposed accurately on the corresponding video frames.
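For the separate-subtitle-file mode, the aligned cues can be written out in the common SRT format; a minimal sketch:

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def write_srt(cues, path):
    # cues: iterable of (start_s, end_s, text) on the video timeline
    with open(path, "w", encoding="utf-8") as f:
        for n, (start, end, text) in enumerate(cues, 1):
            f.write(f"{n}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n\n")
```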
According to the artificial intelligence based video processing method provided by this embodiment, subtitles are added automatically in addition to the automatically embedded commentary speech, which further improves video processing efficiency.
Referring to fig. 9, a flow of a game video processing method according to an embodiment of the present application is shown, which is similar to the method shown in fig. 8, except that the method further includes:
in step S107, a game video is acquired.
The game video may be a third-party recording, which may also have been edited. The game video may also be generated by dynamically rendering a replay based on the above game record data.
Step S108, decoding the game video to obtain video frames.
Video encoding is the conversion of a file in one video format into a file in another video format by compression. The most important codec standards in video streaming are the International Telecommunication Union's H.261, H.263, and H.264, Motion JPEG (M-JPEG), and the MPEG series of standards from the Moving Picture Experts Group of the International Organization for Standardization.
Video is a continuous sequence of images consisting of successive frames, each frame being one image. Owing to the persistence-of-vision effect of the human eye, a frame sequence played at a sufficient rate is perceived as continuous motion. Because successive frames are extremely similar, the original video must be encoded and compressed to remove spatial and temporal redundancy for convenient storage and transmission.
Video decoding is the inverse process, that is, decoding according to the corresponding video coding standard to recover the video frames. If the frame rate of a video is 40 FPS, one second of decoded video contains 40 images.
Step S109, identifying the game time corresponding to each video frame.
Referring to fig. 3, a schematic diagram of an image frame of the game video is shown; the current game time is displayed at the top-middle of the frame. In one embodiment, a screenshot of the time region can be cropped, and image recognition can then be used to identify the corresponding game time. For example, an MNIST-style handwritten digit recognition process can be applied (the MNIST database is a large database of handwritten digits commonly used to train image processing systems) to obtain the game time within each frame. Through these steps, the game time corresponding to each frame of the video can be obtained.
According to the artificial intelligence based video processing method provided by this embodiment, identifying the game time of the game video makes it convenient to align the timelines when the commentary video is synthesized.
Referring to fig. 10, a flow of a game video processing method according to an embodiment of the present application is shown, which is similar to the method shown in fig. 9, except that the method further includes:
Step S110, aligning the game instruction frames with the video frames.
Referring to fig. 4, which shows the alignment of game instruction frames with video image frames: guided by the recognized game time, the original 40 FPS game video is sampled down to 20 FPS, and the game instruction frames are then aligned with the sampled frames one by one. In this way, subtitles and audio can be mapped accurately onto specific video frames.
Step S110 may be performed in parallel with step S104. That is, the alignment of the game instruction frames with the video frames is completed while the commentary speech is being generated by TTS; once both operations are finished, the video can be synthesized. In this way, the waiting time for TTS to generate the commentary speech is hidden, which greatly improves video processing efficiency.
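A sketch of this parallelism with Python's standard concurrent.futures; generate_speech and align_frames are trivial stand-ins for the real steps S104 and S110:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_speech(lines):       # stand-in for the TTS step (S104)
    return [f"{i}.wav" for i, _ in enumerate(lines)]

def align_frames(instr, video):   # stand-in for the alignment step (S110)
    return list(zip(instr, video))

def prepare_assets(lines, instr_frames, video_frames):
    # Run TTS and frame alignment concurrently so TTS latency is hidden.
    with ThreadPoolExecutor(max_workers=2) as pool:
        speech = pool.submit(generate_speech, lines)
        align = pool.submit(align_frames, instr_frames, video_frames)
        return speech.result(), align.result()
```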
Referring to fig. 11, a flow of a game video processing method according to an embodiment of the present application is shown, which is similar to the method shown in fig. 10, except that the method further includes:
in step S201, a third party comment video is acquired.
A third-party commentary video is a game commentary video produced by a third-party host, and can be obtained from video live-streaming platforms.
Step S202, extracting third-party commentary from the third-party commentary video.
Referring to fig. 12, a screenshot of a third-party commentary video is shown, containing a commentary line to the effect of "this Shenmenxi threw a mixed bomb with really high damage, respect". The commentary can be recognized from the video by speech recognition techniques, or from the screenshot by image recognition techniques.
Step S203, extracting a corresponding third party commentary strategy and game features from the third party commentary.
Taking "this Shenmenxi threw a mixed bomb with really high damage, respect" as an example, the game features that can be extracted include: Shenmenxi (the hero) and mixed bomb (the ultimate).
Step S204, training and updating the commentary strategy model with the third-party commentary strategies and game features.
According to the artificial intelligence based video processing method described above, the commentary strategy model can be trained with commentary videos from third-party hosts, so that the output of the commentary strategy model is more diversified, more commentary points can be mined, and more accessible, higher-quality commentary can be produced.
Referring to fig. 13, a flow of a game video processing method according to an embodiment of the present application is shown, which is similar to the method shown in fig. 11, except that the method further includes:
step S301, obtaining the barrage information input by the user when the commentary video is played.
A bullet screen refers to a commentary subtitle that pops up when a video is viewed over a network. The commentary subtitles often include comments or point of interest information about the video and commentary by the user.
Step S302, extracting user interest features from the bullet-screen comments.
Step S303, training and updating the commentary strategy model according to the user interest features.
According to the artificial intelligence based video processing method described above, the commentary strategy model can be trained with the bullet-screen comments posted while the commentary video is played, so that the output of the commentary strategy model is more diversified, more commentary points can be mined, and more accessible, higher-quality commentary can be produced.
Referring to fig. 14, a block diagram of an artificial intelligence based game video processing device according to an embodiment of the present application is shown, the device including:
a data obtaining module 31, configured to obtain game record data and a corresponding game video;
a game instruction acquisition module 32, configured to extract at least one game feature and corresponding game instruction time from the game record data;
a commentary strategy obtaining module 33, configured to obtain a matched commentary strategy model according to the at least one game feature;
a commentary speech generating module 34, configured to generate corresponding commentary speech based on the commentary strategy model; and
a video synthesis module 35, configured to synthesize a commentary video according to the commentary speech and the game video such that the timeline start point of the commentary speech matches the game instruction time.
According to the artificial intelligence based video processing apparatus described above, professional commentary content for MOBA games can be generated accurately, dynamically, and automatically, providing a fast and intelligent automatic commentary service for online games; AI game commentary is realized, commentary videos can be generated automatically, and the efficiency of commentary video processing is greatly improved.
Referring to fig. 15, a computer device provided by an exemplary embodiment of the present application is shown. The computer device 1 includes a processor 10, a main memory 11, a non-volatile memory 12, and a wireless module 13. The processor 10 is connected to the main memory 11 via a first bus 16; it should be understood that the first bus 16 is merely illustrative and is not limited to a physical bus, and any hardware architecture and technique for connecting the main memory 11 to the processor 10 may be used.
The main memory 11 is typically a volatile memory, such as a Dynamic Random Access Memory (DRAM).
The non-volatile memory 12 and the wireless module 13 are connected to the first bus 16 via an input/output (IO) bus 17, through which they can interact with the processor 10. The IO bus may be, for example, a Peripheral Component Interconnect (PCI) bus or a PCI Express (PCI-E) high-speed serial computer expansion bus.
The non-volatile memory 12 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM, and may serve as a non-transitory computer-readable storage medium.
The wireless module 13 may connect to the cloud server 20 via wireless signals.
The non-volatile memory 12 stores a video processing program 120. When started, the video processing program 120 can execute the video processing method described above, so that professional commentary content for MOBA games is generated accurately, dynamically, and automatically, providing a fast and intelligent automatic commentary service for online games and realizing AI game commentary.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A video processing method based on artificial intelligence is characterized by comprising the following steps:
acquiring game record data and a corresponding game video;
extracting at least one game feature and corresponding game instruction time from the game record data;
acquiring a matched comment strategy model according to the at least one game feature;
generating corresponding commentary speech based on the commentary strategy model; and
synthesizing a commentary video according to the commentary speech and the game video, the timeline start point of the commentary speech being matched with the game instruction time.
2. The artificial intelligence based video processing method of claim 1, wherein generating the corresponding commentary speech based on the commentary strategy model comprises:
obtaining a corpus entry matched with the commentary strategy from a predefined commentary corpus;
generating commentary text according to the corpus entry; and
generating the commentary speech from the commentary text based on a text-to-speech technique.
3. The artificial intelligence based video processing method of claim 2, wherein the method further comprises:
generating subtitles according to the commentary;
the synthesizing of the commentary video according to the commentary voice and the game video further comprises inserting the subtitles into the commentary video.
4. The artificial intelligence based video processing method of claim 1, wherein the method further comprises:
decoding the game video to obtain a video frame; and
intercepting at least a partial region of the video frame and performing image recognition to obtain the game time corresponding to the video frame.
5. The artificial intelligence based video processing method of claim 4, wherein the game record data includes a plurality of game instruction frames, a frame rate of the game instruction frames being a first frame rate, the method further comprising:
and processing the video frame to enable the frame rate of the video frame to be the same as the first frame rate.
6. The artificial intelligence based video processing method of claim 1, wherein the method further comprises:
acquiring a third party comment video;
extracting third-party commentary from the third-party commentary video;
extracting a corresponding third party comment strategy and game characteristics from the third party commentary; and
and training and updating the comment strategy model by adopting the third-party comment strategy and game features.
7. The artificial intelligence based video processing method of claim 1, wherein the method further comprises:
acquiring bullet screen information input by a user when the commentary video is played;
extracting user interest characteristics from the bullet screen information; and
and training and updating the comment strategy model according to the user interest characteristics.
8. An artificial intelligence based video processing apparatus, the apparatus comprising:
the data acquisition module is used for acquiring game record data and a corresponding game video;
the game instruction acquisition module is used for extracting at least one game feature and corresponding game instruction time from the game record data;
the comment strategy obtaining module is used for obtaining the matched comment strategy model according to the at least one game feature;
the comment voice generating module is used for generating corresponding comment voice based on the comment strategy model; and
and the video synthesis module is used for synthesizing a comment video according to the comment voice and the game video so that the time line starting point of the comment voice is matched with the game instruction time.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 7.
CN202010800282.9A 2020-08-11 2020-08-11 Video processing method and device based on artificial intelligence and electronic equipment Active CN111953910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010800282.9A CN111953910B (en) 2020-08-11 2020-08-11 Video processing method and device based on artificial intelligence and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010800282.9A CN111953910B (en) 2020-08-11 2020-08-11 Video processing method and device based on artificial intelligence and electronic equipment

Publications (2)

Publication Number Publication Date
CN111953910A (en) 2020-11-17
CN111953910B (en) 2024-05-14

Family

ID=73332445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010800282.9A Active CN111953910B (en) 2020-08-11 2020-08-11 Video processing method and device based on artificial intelligence and electronic equipment

Country Status (1)

Country Link
CN (1) CN111953910B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009125107A (en) * 2007-11-20 2009-06-11 Nhn Corp Multiplayer online game system and play-by-play game commentary control method
CN107423274A (en) * 2017-06-07 2017-12-01 北京百度网讯科技有限公司 Commentary content generating method, device and storage medium based on artificial intelligence
CN110841287A (en) * 2019-11-22 2020-02-28 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer-readable storage medium and computer equipment
CN110971964A (en) * 2019-12-12 2020-04-07 腾讯科技(深圳)有限公司 Intelligent comment generation and playing method, device, equipment and storage medium
CN111290724A (en) * 2020-02-07 2020-06-16 腾讯科技(深圳)有限公司 Online virtual comment method, device and medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022134943A1 (en) * 2020-12-25 2022-06-30 腾讯科技(深圳)有限公司 Explanation video generation method and apparatus, and server and storage medium
CN114697685B (en) * 2020-12-25 2023-05-23 腾讯科技(深圳)有限公司 Method, device, server and storage medium for generating comment video
CN114697685A (en) * 2020-12-25 2022-07-01 腾讯科技(深圳)有限公司 Comment video generation method, comment video generation device, server and storage medium
CN114697741B (en) * 2020-12-30 2023-06-30 腾讯科技(深圳)有限公司 Multimedia information playing control method and related equipment
CN114697741A (en) * 2020-12-30 2022-07-01 腾讯科技(深圳)有限公司 Multimedia information playing control method and related equipment
CN112631814A (en) * 2021-01-05 2021-04-09 网易(杭州)网络有限公司 Game plot dialogue playing method and device, storage medium and electronic equipment
CN112631814B (en) * 2021-01-05 2024-05-28 网易(杭州)网络有限公司 Game scenario dialogue playing method and device, storage medium and electronic equipment
CN113209640B (en) * 2021-07-09 2021-09-24 腾讯科技(深圳)有限公司 Comment generation method, device, equipment and computer-readable storage medium
CN113209640A (en) * 2021-07-09 2021-08-06 腾讯科技(深圳)有限公司 Comment generation method, device, equipment and computer-readable storage medium
CN113648660A (en) * 2021-08-16 2021-11-16 网易(杭州)网络有限公司 Behavior sequence generation method and device for non-player character
CN113648660B (en) * 2021-08-16 2024-05-28 网易(杭州)网络有限公司 Method and device for generating behavior sequence of non-player character
CN115103222A (en) * 2022-06-24 2022-09-23 湖南快乐阳光互动娱乐传媒有限公司 Video audio track processing method and related equipment
CN117240983A (en) * 2023-11-16 2023-12-15 湖南快乐阳光互动娱乐传媒有限公司 Method and device for automatically generating sound drama
CN117240983B (en) * 2023-11-16 2024-01-26 湖南快乐阳光互动娱乐传媒有限公司 Method and device for automatically generating sound drama

Also Published As

Publication number Publication date
CN111953910B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN111953910B (en) Video processing method and device based on artificial intelligence and electronic equipment
US11679333B2 (en) Methods and systems for generating a video game stream based on an obtained game log
CN110971964B (en) Intelligent comment generation and playing method, device, equipment and storage medium
US10987596B2 (en) Spectator audio analysis in online gaming environments
US10293260B1 (en) Player audio analysis in online gaming environments
US9898850B2 (en) Support and complement device, support and complement method, and recording medium for specifying character motion or animation
WO2021155692A1 (en) Online virtual commentary method and device, and a medium
CN116311456A (en) Personalized virtual human expression generating method based on multi-mode interaction information
CN110806803A (en) Integrated interactive system based on virtual reality and multi-source information fusion
CN112423093B (en) Game video generation method, device, server and storage medium
CN113209640B (en) Comment generation method, device, equipment and computer-readable storage medium
CN113497946A (en) Video processing method and device, electronic equipment and storage medium
WO2023071166A1 (en) Data processing method and apparatus, and storage medium and electronic apparatus
WO2022249522A1 (en) Information processing device, information processing method, and information processing system
CN112233647A (en) Information processing apparatus and method, and computer-readable storage medium
CN111160051B (en) Data processing method, device, electronic equipment and storage medium
CN114697685B (en) Method, device, server and storage medium for generating comment video
CN111790158A (en) Game scene editing method and device, electronic equipment and readable storage medium
CN112752142B (en) Dubbing data processing method and device and electronic equipment
EP4345814A1 (en) Video-generation system
US20230206535A1 (en) System to convert expression input into a complex full body animation, in real time or from recordings, analyzed over time
US20240024783A1 (en) Contextual scene enhancement
Ajiwe et al. SUBJECTIVE EFFECT OF SOUND DESIGN IN NOLLYWOOD FILM: A CRITICAL APPRAISAL OF ANDY AMENECHI'S EGG OF LIFE
Saito et al. SMASH corpus: A spontaneous speech corpus recording third-person audio commentaries on gameplay
CN118210902A (en) Information processing method, apparatus, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant