CN116074566B - Game video highlight recording method, device, equipment and storage medium - Google Patents

Publication number
CN116074566B
Authority
CN
China
Prior art keywords
highlight
frame
video
audio
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202310062849.0A
Other languages
Chinese (zh)
Other versions
CN116074566A (en)
Inventor
刘超
张南
程诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Mingdong Tianxia Network Technology Co ltd
Original Assignee
Shenzhen Mingdong Tianxia Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Mingdong Tianxia Network Technology Co ltd
Priority to CN202310062849.0A
Publication of CN116074566A
Application granted
Publication of CN116074566B
Legal status: Expired - Fee Related

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433: Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334: Recording operations
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/45: Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662: Learning process for intelligent management characterized by learning algorithms
    • H04N21/4666: Learning process for intelligent management characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
    • H04N21/47: End-user applications
    • H04N21/478: Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4781: Games

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a method, device, equipment and storage medium for recording game video highlights, and relates to the technical field of video processing. The method obtains audio and video stream data generated in real time during a game and, from the current audio frame in that data, derives a current image to be recognized that carries spectral feature information, by means of fast Fourier transform, frequency-point amplitude encoding and drawing. The image is input into a highlight frame classification model pre-trained on the basis of a CNN and highlight audio frames to obtain a current classification result. When the highlight confidence in the classification result is greater than or equal to a preset threshold, at least one video frame synchronous with the current audio frame is recorded and saved to obtain a game video highlight clip. Highlight moments are thus recognized indirectly, with one audio frame standing in for multiple video frames, which greatly simplifies the process, improves recognition efficiency and reduces the required computing resources.

Description

A method, device, equipment and storage medium for recording game video highlights

Technical field

The invention belongs to the technical field of video processing, and specifically relates to a method, device, equipment and storage medium for recording game video highlights.

Background

With the rapid development of computer technology, players have developed a wide variety of demands on the gaming experience. A prominent one is that players want to review their own highlight moments in a game, for example scenes of defeating opponents several times in a row (such as a two-kill, three-kill or five-kill streak). This means that the platform running the game needs to identify highlight moments automatically during play and record and save them, so that they can be pushed to the player for review after the game ends.

Existing highlight recognition schemes mainly decide whether a picture is a highlight from the video image itself. A target highlight picture of the game video is obtained in advance; the current frame of the game video is then taken, in real time, as the image to be recognized; a perceptual hashing algorithm computes the hash values of the target highlight picture and of the image to be recognized; and finally, if the distance between the two hash values is less than a preset threshold, the image to be recognized is treated as a highlight picture. However, as game video frame rates rise (for example to 120 frames per second or higher), this scheme becomes cumbersome and consumes a large amount of computing resources. How to provide a new, simplified highlight recognition scheme that reduces the required computing resources is therefore a problem that those skilled in the art urgently need to study.

Summary of the invention

The purpose of the invention is to provide a method, device, computer equipment and computer-readable storage medium for recording game video highlights, in order to solve the problems that existing highlight recognition schemes are cumbersome and consume a large amount of computing resources.

To achieve the above purpose, the invention adopts the following technical solutions:

In a first aspect, a method for recording game video highlights is provided, comprising:

obtaining audio and video stream data generated in real time during a game;

performing fast Fourier transform processing on the current audio frame in the audio and video stream data to obtain a current spectrum;

encoding K amplitudes in the current spectrum, which correspond one-to-one to K frequency points, as red-green-blue (RGB) three-channel color values to obtain current data to be recognized containing K RGB values, where K is a natural number not less than 64 and the K frequency points are equally spaced within the frequency range of human hearing;

drawing, from the K RGB values of the current data to be recognized, a current image to be recognized with a k*k pixel matrix, where k is a natural number not less than the square root of K;

inputting the current image to be recognized into a highlight frame classification model pre-trained on the basis of a convolutional neural network (CNN) and highlight audio frames to obtain a current classification result, where a highlight audio frame is an audio frame synchronous with a target highlight picture of the game video and serves as a positive sample for training the model to classify highlight frames;

when the highlight confidence in the current classification result is greater than or equal to a preset confidence threshold, recording and saving at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip, where the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame.

The above provides a new scheme that identifies game highlight moments indirectly from the synchronous audio frame. After the audio and video stream data generated in real time during the game is obtained, a current image to be recognized carrying spectral feature information is derived from the current audio frame by fast Fourier transform, frequency-point amplitude encoding and drawing. The image is then input into the highlight frame classification model pre-trained on the basis of a CNN and highlight audio frames to obtain the current classification result. Finally, when the highlight confidence in the classification result is greater than or equal to the preset confidence threshold, at least one video frame synchronous with the current audio frame is recorded and saved to obtain a game video highlight clip. Because one audio frame stands in for multiple video frames, the process is greatly simplified, recognition efficiency is improved and the required computing resources are reduced, which facilitates practical application and adoption.
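The front half of this pipeline (spectrum, K frequency points, k*k layout) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the hearing range is taken as 20 Hz to 20 kHz, the frame length must be a power of two, k is chosen as the smallest value satisfying k*k >= K, and pixels are filled row by row with black padding. None of these choices, nor the function names, come from the source.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for i in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + twiddle
        out[i + n // 2] = even[i] - twiddle
    return out


def sample_amplitudes(frame, sample_rate, K=64, f_lo=20.0, f_hi=20000.0):
    """Return K amplitudes at frequency points equally spaced in [f_lo, f_hi].

    The 20 Hz to 20 kHz bounds are an assumption; the source only says
    "the frequency range of human hearing".
    """
    spectrum = fft(frame)
    n = len(frame)
    amplitudes = []
    for i in range(K):
        freq = f_lo + i * (f_hi - f_lo) / (K - 1)
        # Map the frequency to the nearest FFT bin, clamped to the Nyquist bin.
        bin_index = min(round(freq * n / sample_rate), n // 2)
        amplitudes.append(abs(spectrum[bin_index]))
    return amplitudes


def image_side(K):
    """Smallest natural number k with k*k >= K (one valid choice of k)."""
    k = math.isqrt(K)
    return k if k * k == K else k + 1


def layout_pixels(rgb_values, fill=(0, 0, 0)):
    """Arrange K RGB tuples row by row into a k*k matrix, padding with `fill`."""
    k = image_side(len(rgb_values))
    padded = list(rgb_values) + [fill] * (k * k - len(rgb_values))
    return [padded[r * k:(r + 1) * k] for r in range(k)]
```

With K = 64 the image is exactly 8*8 and no padding is needed; the resulting matrix would then be handed to whatever image library and classifier the deployment uses.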

In one possible design, encoding the K amplitudes that correspond one-to-one to the K frequency points as RGB three-channel color values comprises:

converting the K amplitudes, by changing the numerical unit, into values to be converted that share the same numerical unit and each lie within the interval [0, 16777215];

converting each value to be converted from a decimal number into a binary number;

padding the binary number with zeros on the left to obtain a 24-bit binary number;

converting the first 8 bits of the 24-bit binary number into a decimal number to obtain the red channel value of the RGB three-channel color value;

converting the middle 8 bits of the 24-bit binary number into a decimal number to obtain the green channel value of the RGB three-channel color value;

converting the last 8 bits of the 24-bit binary number into a decimal number to obtain the blue channel value of the RGB three-channel color value.
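The encoding steps above can be illustrated with a short sketch. The 24-bit split follows the text directly; the unit conversion is not specified in the source, so `to_convertible_value` uses a linear rescale against a maximum amplitude purely as a hypothetical stand-in.

```python
MAX_24BIT = 16777215  # 2**24 - 1, the upper bound given in the text


def to_convertible_value(amplitude, max_amplitude):
    """Hypothetical unit conversion: linearly rescale into [0, 16777215]."""
    if max_amplitude <= 0:
        return 0
    scaled = round(amplitude / max_amplitude * MAX_24BIT)
    return max(0, min(MAX_24BIT, scaled))


def encode_rgb(value):
    """Split an integer in [0, 16777215] into (red, green, blue) channel values."""
    if not 0 <= value <= MAX_24BIT:
        raise ValueError("value must lie in [0, 16777215]")
    bits = format(value, "b").zfill(24)  # decimal -> binary, left-padded to 24 bits
    red = int(bits[0:8], 2)      # first 8 bits
    green = int(bits[8:16], 2)   # middle 8 bits
    blue = int(bits[16:24], 2)   # last 8 bits
    return red, green, blue
```

For example, `encode_rgb(65536)` yields `(1, 0, 0)` because 65536 is 2 to the power 16, which sets only the lowest bit of the red byte. The same result can be obtained with shifts and masks; the string form is used here only because it mirrors the steps as written.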

In one possible design, the CNN adopts a ResNet50, MobileNet or VGG16 network structure.

In one possible design, recording and saving at least one video frame synchronous with the current audio frame when the highlight confidence in the current classification result is greater than or equal to the preset confidence threshold, to obtain a game video highlight clip, comprises:

when the highlight confidence in the current classification result is greater than or equal to the preset confidence threshold, determining whether the number of audio frames located between the most recent preceding highlight frame and the current audio frame is equal to zero, where the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame, and the most recent preceding highlight frame is the audio frame that precedes the current audio frame in the audio and video stream data and whose highlight confidence is greater than or equal to the preset confidence threshold;

if the number of audio frames is equal to zero, recording and saving at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip; otherwise, further determining whether the number of audio frames is greater than or equal to a preset frame-count threshold;

if the number of audio frames is greater than or equal to the preset frame-count threshold, recording and saving at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip; otherwise, recording and saving at least one video frame in the audio and video stream data that is synchronous with the intermediate audio frames and the current audio frame to obtain a game video highlight clip, where the intermediate audio frames are the at least one audio frame located between the most recent preceding highlight frame and the current audio frame in the audio and video stream data.
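The branching above amounts to a gap-filling rule: a short run of non-highlight audio frames between two highlight frames is recorded as well, while a long run is not. A minimal sketch, assuming audio frames are identified by integer indices and that the first highlight frame of a session (with no predecessor) is simply recorded on its own; that last case is an assumption, since the source does not describe it.

```python
def frames_to_record(current_idx, prev_highlight_idx, gap_threshold):
    """Return the audio frame indices whose synchronous video should be saved."""
    if prev_highlight_idx is None:
        # Assumption: with no earlier highlight frame, record the current frame only.
        return [current_idx]
    gap = current_idx - prev_highlight_idx - 1  # audio frames strictly in between
    if gap == 0 or gap >= gap_threshold:
        return [current_idx]
    # Short gap: also record the intermediate frames so the clip stays continuous.
    return list(range(prev_highlight_idx + 1, current_idx + 1))


def select_frames(confidences, conf_threshold, gap_threshold):
    """Apply the rule over a sequence of per-frame highlight confidences."""
    selected = []
    prev = None
    for idx, conf in enumerate(confidences):
        if conf >= conf_threshold:
            selected.extend(frames_to_record(idx, prev, gap_threshold))
            prev = idx
    return selected
```

With confidences `[0.9, 0.2, 0.95, 0.1, 0.1, 0.1, 0.1, 0.8]`, a confidence threshold of 0.5 and a frame-count threshold of 3, frame 1 is pulled in to bridge frames 0 and 2, while the four-frame gap before frame 7 is left out.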

In one possible design, after a game video highlight clip is obtained, the method further comprises:

determining whether the most recent preceding game video highlight clip and the newly obtained game video highlight clip are continuous in time;

if they are continuous in time, merging the two game video highlight clips into one; otherwise, further determining whether the duration of the most recent preceding game video highlight clip is less than or equal to a preset duration threshold;

if the duration is less than or equal to the preset duration threshold, deleting the saved most recent preceding game video highlight clip.
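The merge-or-discard rule can be sketched as follows. Clips are represented here as hypothetical `(start, end)` tuples in seconds, and "continuous in time" is interpreted as the new clip starting no later than `tolerance` seconds after the previous one ends; both the representation and the tolerance parameter are assumptions, not details from the source.

```python
def add_clip(clips, new_clip, min_duration, tolerance=0.0):
    """Return a new clip list after applying the merge-or-discard rule.

    clips: list of (start, end) tuples in time order.
    new_clip: the newly recorded (start, end) tuple.
    min_duration: clips this short or shorter are dropped when isolated.
    """
    result = list(clips)
    if result:
        prev_start, prev_end = result[-1]
        new_start, new_end = new_clip
        if new_start - prev_end <= tolerance:
            # Continuous in time: merge the two clips into one.
            result[-1] = (prev_start, max(prev_end, new_end))
            return result
        if prev_end - prev_start <= min_duration:
            # Isolated and too short: delete the previous clip.
            result.pop()
    result.append(new_clip)
    return result
```

Note that the duration check is applied to the previous clip, not the new one, because only at this point is it known that the previous clip will not be extended by a merge.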

In one possible design, the method further comprises:

at the end of the game, collecting all game video highlight clips recorded during the game to obtain at least one game video highlight clip;

for each of the at least one game video highlight clip, computing the corresponding highlight confidence sum by accumulation according to the following formula:

$$GCT_k = \sum_{n=1}^{N_k} GC_{k,n}$$

where k is a positive integer, $GCT_k$ is the highlight confidence sum of the k-th game video highlight clip among the at least one game video highlight clip, $N_k$ is the total number of audio frames captured synchronously with the k-th game video highlight clip, n is a positive integer, and $GC_{k,n}$ is the highlight confidence of the n-th of those audio frames;

sorting the at least one game video highlight clip in descending order of highlight confidence sum to obtain a sequence of game video highlight clips;

pushing the first M game video highlight clips in the sequence to the player, where M is a preset positive integer less than or equal to K, and K here denotes the total number of clips in the sequence.
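The ranking step can be sketched in a few lines. The clip identifiers and the dictionary layout are hypothetical; only the summation and the descending sort come from the text.

```python
def confidence_total(frame_confidences):
    """Highlight confidence sum of one clip: add up its per-audio-frame confidences."""
    return sum(frame_confidences)


def top_clips(clips, m):
    """Return the ids of the m clips with the largest confidence sums.

    clips: dict mapping a clip id to the list of highlight confidences of
    the audio frames captured synchronously with that clip.
    """
    ranked = sorted(clips, key=lambda cid: confidence_total(clips[cid]), reverse=True)
    return ranked[:m]
```

Because the score is a sum rather than a mean, longer clips tend to rank higher than short ones with the same per-frame confidence, which is consistent with the stated aim of pushing content with the most review value.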

In one possible design, after a game video highlight clip is obtained, the method further comprises:

randomly extracting one video frame from the game video highlight clip;

processing the video frame with a perceptual hashing algorithm to obtain image fingerprint information of the video frame;

determining whether the number of differing data bits between the image fingerprint information of the video frame and the image fingerprint information of the target highlight picture of the game video is greater than or equal to a preset bit-count threshold;

if so, deleting the recorded and saved game video highlight clip.
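This verification step compares two fingerprints by counting differing bits (a Hamming distance). The sketch below covers only the sampling and comparison; the fingerprint itself is supplied by a caller-provided `fingerprint_fn`, a hypothetical placeholder for whatever perceptual hashing implementation is used, and fingerprints are assumed to be non-negative integers.

```python
import random


def hamming_distance(fp_a, fp_b):
    """Number of differing bits between two integer fingerprints."""
    return bin(fp_a ^ fp_b).count("1")


def should_delete_clip(clip_frames, fingerprint_fn, target_fp, bit_threshold, rng=random):
    """Decide whether a recorded clip fails the visual check.

    clip_frames: the video frames of the clip (any objects fingerprint_fn accepts).
    fingerprint_fn: hypothetical callable returning an integer fingerprint.
    target_fp: fingerprint of the target highlight picture.
    bit_threshold: clips differing by this many bits or more are deleted.
    """
    frame = rng.choice(clip_frames)  # randomly extract one video frame
    return hamming_distance(fingerprint_fn(frame), target_fp) >= bit_threshold
```

Since only one frame per clip is hashed, the cost of this check stays small compared with hashing every video frame, which is the prior-art approach the document argues against.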

In a second aspect, a device for recording game video highlights is provided, comprising a data acquisition module, a Fourier transform processing module, a frequency-point amplitude encoding module, an image drawing module, a highlight frame classification module and a video frame saving module;

the data acquisition module is configured to obtain audio and video stream data generated in real time during a game;

the Fourier transform processing module is communicatively connected to the data acquisition module and is configured to perform fast Fourier transform processing on the current audio frame in the audio and video stream data to obtain a current spectrum;

the frequency-point amplitude encoding module is communicatively connected to the Fourier transform processing module and is configured to encode K amplitudes in the current spectrum, which correspond one-to-one to K frequency points, as RGB three-channel color values to obtain current data to be recognized containing K RGB values, where K is a natural number not less than 64 and the K frequency points are equally spaced within the frequency range of human hearing;

the image drawing module is communicatively connected to the frequency-point amplitude encoding module and is configured to draw, from the K RGB values of the current data to be recognized, a current image to be recognized with a k*k pixel matrix, where k is a natural number not less than the square root of K;

the highlight frame classification module is communicatively connected to the image drawing module and is configured to input the current image to be recognized into a highlight frame classification model pre-trained on the basis of a convolutional neural network (CNN) and highlight audio frames to obtain a current classification result, where a highlight audio frame is an audio frame synchronous with a target highlight picture of the game video and serves as a positive sample for training the model to classify highlight frames;

the video frame saving module is communicatively connected to both the data acquisition module and the highlight frame classification module and is configured to record and save, when the highlight confidence in the current classification result is greater than or equal to a preset confidence threshold, at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip, where the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame.

In a third aspect, the invention provides a computer device comprising a memory, a processor and a transceiver communicatively connected in sequence, where the memory is configured to store a computer program, the transceiver is configured to send and receive voice signals, and the processor is configured to read the computer program and execute the method for recording game video highlights described in the first aspect or in any possible design of the first aspect.

In a fourth aspect, the invention provides a computer-readable storage medium storing instructions which, when run on a computer, execute the method for recording game video highlights described in the first aspect or in any possible design of the first aspect.

In a fifth aspect, the invention provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method for recording game video highlights described in the first aspect or in any possible design of the first aspect.

Beneficial effects of the above scheme:

(1) The invention provides a new scheme that identifies game highlight moments indirectly from the synchronous audio frame. After the audio and video stream data generated in real time during the game is obtained, a current image to be recognized carrying spectral feature information is derived from the current audio frame by fast Fourier transform, frequency-point amplitude encoding and drawing. The image is then input into the highlight frame classification model pre-trained on the basis of a CNN and highlight audio frames to obtain the current classification result. Finally, when the highlight confidence in the classification result is greater than or equal to the preset confidence threshold, at least one video frame synchronous with the current audio frame is recorded and saved to obtain a game video highlight clip. Because one audio frame stands in for multiple video frames, the process is greatly simplified, recognition efficiency is improved and the required computing resources are reduced, which facilitates practical application and adoption;

(2) Adjacent clips can be merged and isolated momentary clips removed, which keeps the recorded game video highlight clips from becoming too fragmented, so that content with high review value is pushed to the player;

(3) At the end of the game, the review value of each game video highlight clip recorded during the game can be computed, and the content with the highest review value pushed to the player on that basis, which further improves the player's gaming experience and facilitates practical application and adoption.

Description of the drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic flowchart of the method for recording game video highlights provided by an embodiment of the present application.

Figure 2 is a schematic structural diagram of the device for recording game video highlights provided by an embodiment of the present application.

Figure 3 is a schematic structural diagram of the computer device provided by an embodiment of the present application.

Detailed description

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the invention is briefly introduced below with reference to the drawings and the description of the embodiments or the prior art. The following description of the structures in the drawings covers merely some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort. It should be noted that the description of these embodiments is intended to help in understanding the invention and does not limit it.

It should be understood that although the terms first, second and so on may be used herein to describe various objects, these objects should not be limited by these terms. The terms are used only to distinguish one object from another. For example, a first object could be called a second object, and similarly a second object could be called a first object, without departing from the scope of the example embodiments of the invention.

It should be understood that the term "and/or", where it appears herein, merely describes an association between related objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, B alone, or both A and B. As a further example, "A, B and/or C" can mean any one of A, B and C or any combination of them. The term "/and", where it appears herein, describes another kind of association and indicates that two relationships are possible; for example, "A/and B" can mean A alone, or both A and B. In addition, the character "/", where it appears herein, generally indicates an "or" relationship between the objects before and after it.

Embodiment:

As shown in Figure 1, the game video highlight recording method provided in the first aspect of this embodiment may be executed by, but is not limited to, a computer device that has certain computing resources and is capable of running a game program. Examples include a platform server, a personal computer (PC, a multi-purpose computer whose size, price, and performance suit personal use; desktops, laptops, netbooks, tablets, and ultrabooks are all personal computers), a smartphone, a personal digital assistant (PDA), or a wearable device. As shown in Figure 1, the game video highlight recording method may include, but is not limited to, the following steps S1 to S6.

S1. Obtain the audio and video stream data generated in real time during the game.

In step S1, to give players a basic sense of immersion, existing game programs present the game video picture and the synchronous game audio to the player in real time during play. The audio and video stream data that is generated in real time during the game and used to present this video picture and audio can therefore be obtained with existing conventional techniques.

S2. Perform fast Fourier transform processing on the current audio frame in the audio and video stream data to obtain the current spectrum.

In step S2, the current audio frame is the smallest unit of data used to present the current game audio. It may specifically be, but is not limited to, an AAC (Advanced Audio Coding) audio frame or an MP3 (Moving Picture Experts Group Audio Layer III) audio frame; the former contains 1024 samples and the latter contains 1152 samples. The current audio frame can therefore be obtained from the audio and video stream data in a conventional manner. The playback duration of the current audio frame is inversely related to the audio sampling rate (the higher the sampling rate, the shorter the frame duration). For example, for a 1024-sample frame the playback duration is 21.32 milliseconds at a sampling rate of 48 kHz and 46.43 milliseconds at a sampling rate of 22.05 kHz. So that one audio frame can later stand in for more video frames during highlight recognition, a lower audio sampling rate is preferred for the current audio frame, for example 8 kHz or 11.025 kHz. The fast Fourier transform is an existing, commonly used signal processing method and is not described further here.
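As an illustration of step S2, the following is a minimal sketch of a fast Fourier transform applied to one audio frame. It assumes the frame has already been decoded to 1024 PCM samples (a power of two, as for AAC); the function names `fft`, `magnitude_spectrum`, and `frame_duration_ms` are hypothetical and not part of the described method, and a production system would more likely call an optimized FFT library.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(samples):
    """Amplitudes of bins 0..N/2; bin b corresponds to b * sample_rate / N Hz."""
    spectrum = fft(samples)
    return [abs(c) for c in spectrum[: len(samples) // 2 + 1]]


def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Playback duration of one audio frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


# Synthetic 1024-sample frame (an assumption for the demo): a pure tone on bin 64.
N = 1024
frame = [math.sin(2 * math.pi * 64 * t / N) for t in range(N)]
mags = magnitude_spectrum(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
```

With a real-valued frame only bins 0 to N/2 carry independent information, which is why the sketch discards the mirrored upper half of the spectrum.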

S3. Encode each of the K amplitudes in the current spectrum, which correspond one-to-one to K frequency points, as a red-green-blue (RGB) three-channel color value to obtain the current data to be recognized, which contains K RGB values, where K is a natural number not less than 64 and the K frequency points are equally spaced within the frequency range of human hearing.

In step S3, because the current audio frame is used to present the current game audio to the player, the spectral features of the highlight sound that accompanies a highlight scene such as a double kill, triple kill, or penta kill necessarily lie within the frequency range of human hearing. The K frequency points are therefore preferably equally spaced within that range. The range of human hearing is generally 20 Hz to 20 kHz; with a frequency interval of 10 Hz, for example, K is 1999, which satisfies the requirement of being not less than 64. A specific encoding may include, but is not limited to, the following: converting the K amplitudes, by a change of numerical unit, into values to be converted that share the same unit and each lie in the interval [0, 16777215]; converting each value to be converted from decimal to binary; left-padding the binary number with zeros to obtain a 24-bit binary number; converting the first 8 bits of the 24-bit binary number to decimal to obtain the red channel value of the RGB color value; converting the middle 8 bits to decimal to obtain the green channel value; and converting the last 8 bits to decimal to obtain the blue channel value.
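The 24-bit encoding described in step S3 can be sketched as follows. The text does not specify how amplitudes are rescaled into [0, 16777215], so linear scaling against a caller-supplied `max_amplitude` is an assumption made here, and the names `amplitude_to_rgb` and `encode_amplitudes` are hypothetical.

```python
def amplitude_to_rgb(amplitude, max_amplitude):
    """Map an amplitude in [0, max_amplitude] to (R, G, B) via a 24-bit integer."""
    if max_amplitude <= 0:
        raise ValueError("max_amplitude must be positive")
    # Assumption: linear scaling with clipping; the source only says "change of unit".
    clipped = min(max(amplitude, 0.0), max_amplitude)
    value = round(clipped / max_amplitude * 0xFFFFFF)  # integer in [0, 16777215]
    red = (value >> 16) & 0xFF   # first (most significant) 8 bits
    green = (value >> 8) & 0xFF  # middle 8 bits
    blue = value & 0xFF          # last 8 bits
    return red, green, blue


def encode_amplitudes(amplitudes, max_amplitude):
    """Encode K amplitudes as a list of K RGB triples."""
    return [amplitude_to_rgb(a, max_amplitude) for a in amplitudes]
```

Bit shifts and masks are equivalent to the zero-padded binary split described in the text, without building intermediate strings.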

S4. According to the K RGB values of the current data to be recognized, draw the current image to be recognized with a pixel matrix of k*k, where k is a natural number not less than the square root of K.

In step S4, the current image to be recognized may be drawn by, but not limited to, taking the i-th RGB value as the RGB value of the pixel in row floor((i-1)/k)+1 and column i-k*floor((i-1)/k), where i is a natural number between 1 and K and floor() denotes the round-down function. The remaining pixels can be filled by common means such as zero padding or padding with the average RGB value, so that a rectangular image to be recognized is obtained. For example, 1999 amplitudes yield an image to be recognized with a 45*45 pixel matrix, and 396 amplitudes yield one with a 20*20 pixel matrix. In addition, the initially obtained image may be too small for a convolutional neural network to classify well. Therefore, after the current image to be recognized is obtained, the method further includes: when K is smaller than a preset quantity threshold, enlarging the current image to be recognized to obtain a corresponding image to be recognized of standard size, for example an image of 64*64 pixels.
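The drawing in step S4 can be sketched as below. Row-major placement, zero padding, and nearest-neighbour enlargement are assumptions chosen for illustration; the helper names `side_length`, `draw_image`, and `upscale_nearest` are hypothetical.

```python
import math


def side_length(num_values):
    """Smallest natural number k with k * k >= num_values."""
    k = math.isqrt(num_values)
    return k if k * k == num_values else k + 1


def draw_image(rgb_values, pad=(0, 0, 0)):
    """Place K RGB triples into a k x k grid in row-major order, padding the rest."""
    k = side_length(len(rgb_values))
    image = [[pad] * k for _ in range(k)]
    for i, rgb in enumerate(rgb_values, start=1):  # i is 1-based, as in the text
        row = (i - 1) // k  # 0-based row index, i.e. floor((i-1)/k)
        col = (i - 1) % k   # 0-based column index
        image[row][col] = rgb
    return image


def upscale_nearest(image, target):
    """Nearest-neighbour enlargement of a square image to target x target pixels."""
    k = len(image)
    return [
        [image[r * k // target][c * k // target] for c in range(target)]
        for r in range(target)
    ]
```

For instance, `side_length(1999)` gives 45 and `side_length(396)` gives 20, matching the sizes quoted in the text.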

S5. Input the current image to be recognized into a highlight frame classification model that has been pre-trained based on a convolutional neural network (CNN) and highlight audio frames, to obtain the current classification result, where a highlight audio frame is an audio frame synchronous with a target highlight scene of the game video and is used to provide the highlight frame classification model with positive samples for highlight frame classification training.

In step S5, the convolutional neural network (CNN) is an existing feedforward neural network with a deep structure based on convolution operations. It includes, but is not limited to, an input layer, convolutional layers, activation layers, pooling layers, fully connected layers, and an output layer, and the output layer can use the normalized exponential (Softmax) function to classify images. Specifically, the CNN may adopt, but is not limited to, the ResNet50, MobileNet, or VGG16 network structure. The highlight audio frames need to be collected in advance; for example, audio frames synchronous with target highlight scenes of the game video such as a double kill, triple kill, or penta kill can be used as the highlight audio frames. The highlight audio frames provide positive samples for highlight frame classification training in a manner that includes, but is not limited to, the following: performing the fast Fourier transform on a highlight audio frame to obtain a highlight audio spectrum; encoding each of the K amplitudes in the highlight audio spectrum, which correspond one-to-one to the K frequency points, as an RGB three-channel color value to obtain positive sample data containing K RGB values; and finally drawing a positive sample image with a k*k pixel matrix from the K RGB values of the positive sample data. The pre-training of the highlight frame classification model may include, but is not limited to, the following: inputting multiple positive sample images obtained from different highlight audio frames into the CNN-based classification model for training; when, during training, the training-set accuracy reaches a preset high-value range and its variation is smaller than a preset amplitude threshold, adjusting the learning rate with the adaptive gradient (AdaGrad) algorithm and continuing training; and stopping training when the learning-rate adjustment is smaller than a preset adjustment threshold, which yields the trained highlight frame classification model. AdaGrad is an existing algorithm that adapts the learning rate using the square root of the accumulated sum of squared historical gradients. In addition, following the way positive samples are provided, non-highlight audio frames can be used to provide negative sample images for highlight frame classification training, and these negative sample images are also input into the CNN-based classification model during training to ensure the correctness of subsequent highlight frame classification. Some positive and negative sample images can also be combined into a test sample set, so that after training is completed the trained highlight frame classification model can be tested on this set to verify that it classifies highlight frames accurately.
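To make the AdaGrad learning-rate adaptation mentioned in step S5 concrete, here is a minimal sketch of the standard AdaGrad update applied to a toy two-parameter objective. It is not the CNN training procedure itself; the function name `adagrad_step`, the base learning rate, and the toy objective are assumptions for illustration only.

```python
import math


def adagrad_step(params, grads, accum, base_lr=0.1, eps=1e-8):
    """One AdaGrad update; accum holds the running sum of squared gradients."""
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g                                  # accumulate squared gradient
        effective_lr = base_lr / (math.sqrt(a) + eps)  # per-parameter learning rate
        new_params.append(p - effective_lr * g)
        new_accum.append(a)
    return new_params, new_accum


# Toy objective (an assumption): f(w) = w0^2 + 10 * w1^2, gradient (2*w0, 20*w1).
w, acc = [1.0, 1.0], [0.0, 0.0]
for _ in range(200):
    grad = [2 * w[0], 20 * w[1]]
    w, acc = adagrad_step(w, grad, acc, base_lr=0.5)
```

Because the accumulated sum only grows, the effective learning rate of each parameter shrinks over time, which is the behaviour the stopping criterion in the text relies on.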

S6. When the highlight confidence in the current classification result is greater than or equal to a preset confidence threshold, record and save at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip, where the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame.

In step S6, the confidence is conventional information output by classification, and the preset confidence threshold serves as the criterion for deciding whether to classify the current audio frame as a highlight frame; it may be 50%, for example. If the highlight confidence is greater than or equal to the preset confidence threshold, the current audio frame can be classified as a highlight frame, which in turn indicates that the at least one video frame in the audio and video stream data that is synchronous with the current audio frame shows a game video highlight. The at least one video frame therefore needs to be recorded and saved as a game video highlight clip. Because the playback duration of the current audio frame is relatively long, for example 46.43 milliseconds, at a video frame rate of 120 frames per second the at least one video frame amounts to about 6 frames. One audio frame can thus stand in for about 6 video frames in indirectly recognizing highlight scenes, which greatly simplifies the process, improves recognition efficiency, and reduces the computing resources required.
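The "about 6 video frames per audio frame" relationship in step S6 can be checked with a short sketch. It assumes 1024-sample audio frames at 22,050 Hz (a frame duration of roughly 46.4 ms) and a 120 fps video track, and it treats a video frame as synchronous with an audio frame when its start time falls inside the audio frame's time span; that criterion and the function name are assumptions.

```python
def _ceil_div(a, b):
    """Ceiling of a / b for non-negative integers, without floating point."""
    return -(-a // b)


def video_frames_for_audio_frame(audio_index, samples_per_frame, sample_rate_hz, video_fps):
    """Indices of video frames whose start time lies within the given audio frame.

    All arguments are integers, so the boundaries are computed exactly.
    """
    start_num = audio_index * samples_per_frame * video_fps
    end_num = (audio_index + 1) * samples_per_frame * video_fps
    first = _ceil_div(start_num, sample_rate_hz)
    last = _ceil_div(end_num, sample_rate_hz) - 1
    return list(range(first, last + 1))
```

Under these assumptions consecutive audio frames cover 6, 6, then 5 video frames, and the ranges never overlap, so each video frame is attributed to exactly one audio frame.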

In step S6, the playback duration of one audio frame is only tens of milliseconds, whereas a game video highlight such as a double kill, triple kill, or penta kill lasts several seconds or longer. Within those seconds, some audio frames may be classified as non-highlight frames, and the video frames synchronous with them would then be missing from the recording. To avoid this and preserve the completeness of the game video highlight, preferably, recording and saving at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip, when the highlight confidence in the current classification result is greater than or equal to the preset confidence threshold, includes but is not limited to the following steps S61 to S63.

S61. When the highlight confidence in the current classification result is greater than or equal to the preset confidence threshold, determine whether the number of audio frames located between the most recent previous highlight frame and the current audio frame is equal to zero, where the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame, and the most recent previous highlight frame is the audio frame in the audio and video stream data that precedes the current audio frame and whose highlight confidence is greater than or equal to the preset confidence threshold.

In step S61, the most recent previous highlight frame is the historical audio frame most recently determined to be a highlight frame.

S62. If the number of audio frames is equal to zero, record and save at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip; otherwise, further determine whether the number of audio frames is greater than or equal to a preset frame-count threshold.

In step S62, the preset frame-count threshold can be determined in advance from the audio frame playback duration and the typical duration of a target highlight scene of the game video. For example, when the audio frame playback duration is 46.43 milliseconds, the preset frame-count threshold may be set to 21 frames.

S63. If the number of audio frames is greater than or equal to the preset frame-count threshold, record and save at least one video frame in the audio and video stream data that is synchronous with the current audio frame to obtain a game video highlight clip; otherwise, record and save at least one video frame in the audio and video stream data that is synchronous with the intermediate audio frames and the current audio frame to obtain a game video highlight clip, where the intermediate audio frames are the at least one audio frame in the audio and video stream data located between the most recent previous highlight frame and the current audio frame.

In step S63, the intermediate audio frames are audio frames that were classified as non-highlight frames while a game video highlight was still in progress. Under the above condition, recording and saving the video frames synchronous with both the intermediate audio frames and the current audio frame therefore preserves the completeness of the game video highlight.
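The decision logic of steps S61 to S63 can be sketched as a single pass over per-frame confidences. The function name `frames_to_record` and the batch-style list input are assumptions for illustration; the described method works on a live stream, one audio frame at a time.

```python
def frames_to_record(confidences, conf_threshold=0.5, gap_threshold=21):
    """Return sorted audio-frame indices whose synchronous video frames are kept."""
    recorded = set()
    last_highlight = None  # index of the most recent previous highlight frame
    for idx, conf in enumerate(confidences):
        if conf < conf_threshold:
            continue  # not a highlight frame; it may still be kept later as a gap frame
        if last_highlight is not None:
            gap = idx - last_highlight - 1  # S61: frames between the two highlight frames
            if 0 < gap < gap_threshold:
                # S63 "otherwise" branch: also keep the intermediate frames
                recorded.update(range(last_highlight + 1, idx))
        recorded.add(idx)  # the current highlight frame is always kept
        last_highlight = idx
    return sorted(recorded)
```

For example, with `gap_threshold=3` the confidences `[0.9, 0.1, 0.2, 0.8]` keep all four frames, while with `gap_threshold=2` only frames 0 and 3 are kept.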

After step S6, because an audio frame stands in for multiple synchronous video frames in indirectly recognizing highlight scenes, a sampled frame check on the resulting game video highlight clip is needed to ensure the accuracy of this indirect recognition. Preferably, after the game video highlight clip is obtained, the method further includes but is not limited to the following steps S71 to S74: S71. Randomly extract one video frame from the game video highlight clip; S72. Process the video frame with a perceptual hash algorithm to obtain the image fingerprint of the video frame; S73. Determine whether the number of differing bits between the image fingerprint of the video frame and the image fingerprint of the target highlight scene of the game video is greater than or equal to a preset bit-count threshold; S74. If so, delete the recorded and saved game video highlight clip. The perceptual hash algorithm is an existing algorithm for generating fingerprint information from image data, and its principle is not described further here. Based on steps S71 to S74, a sampled frame can be used to check whether the game video highlight clip really shows a game highlight. If it does, the clip is retained; otherwise (that is, when the image fingerprint of the video frame differs substantially from that of the target highlight scene of the game video), the clip needs to be deleted. The preset bit-count threshold may be 10, for example.
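The comparison in steps S73 and S74 reduces to a Hamming distance between two fingerprints. The sketch below assumes the fingerprints are already available as integers (for example 64-bit values produced by some perceptual hash implementation, which is not shown); the function names are hypothetical.

```python
def hamming_distance(fp_a, fp_b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(fp_a ^ fp_b).count("1")


def should_delete_clip(frame_fp, target_fp, bit_threshold=10):
    """S73/S74: delete the clip when the fingerprints differ in too many bits."""
    return hamming_distance(frame_fp, target_fp) >= bit_threshold
```

A clip whose sampled frame differs from the target fingerprint in 9 bits would be kept under the example threshold of 10, while a difference of 10 bits or more would trigger deletion.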

The game video highlight recording method described in steps S1 to S6 thus provides a new scheme for indirectly recognizing game highlights from synchronous audio frames. After the audio and video stream data generated in real time during the game is obtained, the current audio frame in that data is processed by fast Fourier transform, frequency-point amplitude encoding, and drawing to obtain a current image to be recognized that carries spectral feature information. The image is then input into the highlight frame classification model pre-trained based on a convolutional neural network (CNN) and highlight audio frames to obtain the current classification result. Finally, when the highlight confidence in the classification result is greater than or equal to the preset confidence threshold, at least one video frame in the audio and video stream data that is synchronous with the current audio frame is recorded and saved to obtain a game video highlight clip. Using one audio frame in place of multiple video frames to recognize highlights indirectly greatly simplifies the process, improves recognition efficiency, and reduces the computing resources required, which facilitates practical application and adoption.

On the basis of the technical solution of the first aspect, this embodiment further provides possible design one, which merges adjacent clips and removes isolated momentary clips. That is, after a game video highlight clip is obtained, the method further includes but is not limited to the following steps S81 to S83.

S81. Determine whether the most recent previous game video highlight clip and the newly obtained game video highlight clip are continuous in time.

In step S81, whether two clips are continuous in time can be decided based on human visual reaction time (generally 0.1 to 0.4 seconds). For example, if the time between the last frame of the most recent previous clip and the first frame of the newly obtained clip is greater than or equal to 0.4 seconds, the two clips can be considered discontinuous in time and cannot be merged into one clip; otherwise they may be merged into one clip.

S82. If the clips are continuous in time, merge the two game video highlight clips into one game video highlight clip; otherwise, further determine whether the duration of the most recent previous game video highlight clip is less than or equal to a preset duration threshold.

In step S82, because the most recent previous game video highlight clip is not merged with the newly obtained one, its duration is now fixed. It is therefore necessary to compare its duration with the threshold to decide whether it is an isolated momentary clip; if it is, it needs to be deleted because the highlight is too short. The preset duration threshold can be determined from the shortest duration of a highlight scene, for example 1 second.

S83. If the duration is less than or equal to the preset duration threshold, delete the saved most recent previous game video highlight clip.

Possible design one thus merges adjacent clips and removes isolated momentary clips, which prevents the recorded game video highlight clips from becoming too fragmented and allows content with high replay value to be pushed to players.
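Steps S81 to S83 can be sketched as a merge-then-filter pass over clip time spans. This is a batch simplification and an assumption: the text describes an incremental check each time a new clip arrives, whereas the sketch applies the same rules to a finished, chronologically ordered list, including the final clip. The function name and the `(start_s, end_s)` tuple representation are hypothetical.

```python
def consolidate_clips(clips, max_gap=0.4, min_duration=1.0):
    """Merge clips separated by less than max_gap seconds, then drop short ones.

    clips: list of (start_s, end_s) tuples in chronological order.
    """
    merged = []
    for start, end in clips:
        if merged and start - merged[-1][1] < max_gap:
            # S82: continuous in time, so extend the previous clip
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # S83: a clip whose duration is <= min_duration is an isolated momentary clip
    return [(s, e) for s, e in merged if e - s > min_duration]
```

The default thresholds mirror the example values in the text (0.4 seconds for continuity, 1 second for the minimum duration).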

On the basis of the technical solution of possible design one, this embodiment further provides possible design two, which pushes content with high replay value to players. That is, the method further includes but is not limited to the following steps S91 to S94.

S91. At the end of the game, collect all game video highlight clips recorded during the game to obtain at least one game video highlight clip.

S92. For each game video highlight clip among the at least one game video highlight clip, compute the corresponding highlight confidence sum by accumulation according to the following formula:

$$GCT_k = \sum_{n=1}^{N_k} GC_{k,n}$$

where k is a positive integer, $GCT_k$ is the highlight confidence sum of the k-th game video highlight clip among the at least one game video highlight clip, $N_k$ is the total number of audio frames captured synchronously with the k-th game video highlight clip, n is a positive integer, and $GC_{k,n}$ is the highlight confidence of the n-th of those audio frames.

In step S92, the above formula shows that the highlight confidence sum of each clip is positively correlated both with the number of synchronous audio frames and with the highlight confidence of those audio frames. It can therefore reflect the replay value of the clip: the longer the clip and the more highlight content it contains, the higher its replay value.

S93. Sort the at least one game video highlight clip in descending order of highlight confidence sum to obtain a game video highlight clip sequence.

S94. Push the first M game video highlight clips in the game video highlight clip sequence to the game player, where M is a preset positive integer less than or equal to K, and K here denotes the total number of clips in the game video highlight clip sequence.

In step S94, if K is 100, M may be 10, for example; that is, the 10 game video highlight clips with the highest replay value are pushed to the game player.
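Steps S92 to S94 amount to summing per-frame confidences for each clip, sorting, and taking the first M clips. The following sketch assumes the clips are held in a dictionary that maps a clip identifier to its list of per-audio-frame highlight confidences; that representation and the function names are hypothetical.

```python
def confidence_sum(frame_confidences):
    """S92: highlight confidence sum of one clip (sum over its audio frames)."""
    return sum(frame_confidences)


def top_clips(clips, m):
    """S93/S94: clip ids ordered by descending confidence sum, truncated to m."""
    ranked = sorted(clips, key=lambda cid: confidence_sum(clips[cid]), reverse=True)
    return ranked[:m]
```

Because the score is a plain sum rather than an average, a long clip of moderately confident frames can outrank a short clip of very confident ones, which matches the stated intent of rewarding both length and highlight content.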

Possible design two thus computes, at the end of the game, the replay value of every game video highlight clip recorded during the game and pushes the content with the highest replay value to the game player, which further improves the player's gaming experience.

As shown in Figure 2, the second aspect of this embodiment provides a virtual device for implementing the game video highlight recording method described in the first aspect, possible design one or possible design two, comprising a data acquisition module, a Fourier transform processing module, a frequency point amplitude encoding module, a to-be-recognized image drawing module, a highlight frame classification module and a video frame saving module;

The data acquisition module is used to acquire audio-video stream data generated in real time during the game;

The Fourier transform processing module is communicatively connected to the data acquisition module and is used to perform fast Fourier transform processing on the current audio frame in the audio-video stream data to obtain a current spectrum;

The frequency point amplitude encoding module is communicatively connected to the Fourier transform processing module and is used to encode K amplitudes, which are in the current spectrum and correspond one-to-one to K frequency points, into red-green-blue (RGB) three-channel color values respectively, to obtain current to-be-recognized data containing K RGB values, where K is a natural number not less than 64 and the K frequency points are equally spaced within the human auditory frequency range;

The to-be-recognized image drawing module is communicatively connected to the frequency point amplitude encoding module and is used to draw, from the K RGB values of the current to-be-recognized data, a current to-be-recognized image whose pixel matrix is k*k, where k is a natural number not less than the square root of K;

The highlight frame classification module is communicatively connected to the to-be-recognized image drawing module and is used to input the current to-be-recognized image into a highlight frame classification model pre-trained on the basis of a convolutional neural network (CNN) and highlight audio frames, to obtain a current classification result, where a highlight audio frame is an audio frame synchronous with a target highlight picture of the game video and serves as a positive sample for training the highlight frame classification model;

The video frame saving module is communicatively connected to the data acquisition module and to the highlight frame classification module and is used to record and save, when the highlight confidence in the current classification result is greater than or equal to a preset confidence threshold, at least one video frame that is in the audio-video stream data and synchronous with the current audio frame, to obtain a game video highlight clip, where the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame.

For the working process, working details and technical effects of the device provided in the second aspect of this embodiment, refer to the game video highlight recording method described in the first aspect, possible design one or possible design two; they are not repeated here.
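As a non-limiting illustration of how the spectrum and image-layout steps might fit together, the following Python sketch samples K equally spaced frequency points and computes the side length k of the square image. Several things here are assumptions rather than part of the embodiment: the 20 Hz to 20 kHz bounds for the auditory range, the function names, the use of a direct DFT evaluated at the K frequencies in place of a full FFT, and the padding of unused pixels with a fill value.

```python
import cmath
import math


def magnitudes_at(samples, sample_rate, k_points, f_lo=20.0, f_hi=20000.0):
    """Return (frequency, magnitude) pairs at k_points equally spaced frequencies.

    A direct DFT sum is evaluated at each frequency; this stands in for
    reading the same points out of an FFT spectrum. The band limits f_lo
    and f_hi are assumed values for the human auditory range.
    """
    step = (f_hi - f_lo) / (k_points - 1)
    result = []
    for i in range(k_points):
        freq = f_lo + i * step
        acc = sum(
            x * cmath.exp(-2j * math.pi * freq * n / sample_rate)
            for n, x in enumerate(samples)
        )
        result.append((freq, abs(acc)))
    return result


def grid_side(k_points):
    """Smallest natural number k such that k*k >= K."""
    k = math.isqrt(k_points)
    return k if k * k == k_points else k + 1


def to_grid(values, fill=0):
    """Lay K values out row by row in a k*k grid, padding with `fill` (assumed)."""
    k = grid_side(len(values))
    padded = list(values) + [fill] * (k * k - len(values))
    return [padded[r * k:(r + 1) * k] for r in range(k)]
```

With K = 64 the grid is exactly 8*8; with K = 65 it grows to 9*9 and the remaining 16 pixels take the fill value.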

As shown in Figure 3, the third aspect of this embodiment provides a physical device for implementing the game video highlight recording method described in the first aspect, possible design one or possible design two, comprising a memory, a processor and a transceiver communicatively connected in sequence, where the memory is used to store a computer program, the transceiver is used to send and receive voice signals, and the processor is used to read the computer program and execute the game video highlight recording method described in the first aspect, possible design one or possible design two. As specific examples, the memory may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), flash memory, first-in-first-out memory (First Input First Output, FIFO) and/or first-in-last-out memory (First Input Last Output, FILO), and so on.

For the working process, working details and technical effects of the device provided in the third aspect of this embodiment, refer to the game video highlight recording method described in the first aspect, possible design one or possible design two; they are not repeated here.

The fourth aspect of this embodiment provides a computer-readable storage medium storing instructions for the game video highlight recording method described in the first aspect, possible design one or possible design two; that is, instructions are stored on the computer-readable storage medium, and when the instructions are run on a computer, the game video highlight recording method described in the first aspect, possible design one or possible design two is executed. The computer-readable storage medium is a carrier for storing data and may include, but is not limited to, a floppy disk, an optical disk, a hard disk, flash memory, a USB flash drive and/or a Memory Stick. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device.

For the working process, working details and technical effects of the computer-readable storage medium provided in the fourth aspect of this embodiment, refer to the game video highlight recording method described in the first aspect, possible design one or possible design two; they are not repeated here.

The fifth aspect of this embodiment provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the game video highlight recording method described in the first aspect, possible design one or possible design two. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device.

Finally, it should be noted that the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for recording a video highlight of a game, comprising:
acquiring audio and video stream data generated in real time in the game process;
performing fast Fourier transform processing on the current audio frame in the audio-video stream data to obtain a current frequency spectrum;
encoding K magnitudes, which are in the current frequency spectrum and correspond one-to-one to K frequency points, into RGB three-channel color values respectively, to obtain current data to be recognized containing K RGB values, wherein K is a natural number not less than 64, and the K frequency points are equally spaced within the human auditory frequency range;
drawing, according to the K RGB values of the current data to be recognized, a current image to be recognized whose pixel matrix is k*k, wherein k is a natural number not smaller than the square root of K;
inputting the current image to be recognized into a highlight frame classification model pre-trained on the basis of a convolutional neural network (CNN) and highlight audio frames, to obtain a current classification result, wherein a highlight audio frame is an audio frame synchronous with a target highlight picture of the game video and is used to provide the highlight frame classification model with positive samples for highlight frame classification training;
when the highlight confidence in the current classification result is greater than or equal to a preset confidence threshold, recording and saving at least one video frame that is in the audio-video stream data and synchronous with the current audio frame, to obtain a game video highlight clip, wherein the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame.
2. The method of claim 1, wherein encoding the K magnitudes, which correspond one-to-one to the K frequency points, into RGB three-channel color values respectively comprises:
converting the K magnitudes, by unit conversion, into values to be converted that share the same numerical unit and each lie in the interval [0, 16777215];
converting each value to be converted from a decimal number to a binary number;
padding the binary number with leading zeros to obtain a 24-bit binary number;
converting the first 8 bits of the 24-bit binary number into a decimal number to obtain the red channel color value of the RGB three-channel color values;
converting the middle 8 bits of the 24-bit binary number into a decimal number to obtain the green channel color value of the RGB three-channel color values;
and converting the last 8 bits of the 24-bit binary number into a decimal number to obtain the blue channel color value of the RGB three-channel color values.
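By way of non-limiting illustration, the 24-bit split recited in claim 2 can be sketched in Python as follows. The function names are hypothetical, and the linear scaling used for the unit conversion is an assumption, since the claim does not fix a particular conversion.

```python
def scale_magnitude(magnitude, max_magnitude):
    """Map a magnitude into [0, 16777215] by linear scaling (assumed conversion)."""
    if max_magnitude <= 0:
        return 0
    ratio = min(max(magnitude / max_magnitude, 0.0), 1.0)
    return round(ratio * 0xFFFFFF)


def value_to_rgb(value):
    """Split an integer in [0, 16777215] into (red, green, blue) channel values."""
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("value must lie in [0, 16777215]")
    # Decimal -> binary, then left-pad with zeros to 24 bits.
    bits = format(value, "b").zfill(24)
    # First, middle and last 8 bits become the R, G and B channels.
    return int(bits[0:8], 2), int(bits[8:16], 2), int(bits[16:24], 2)
```

For example, the value 65536 has only bit 16 set, so it lands entirely in the red channel as (1, 0, 0).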
3. The game video highlight recording method according to claim 1, wherein the CNN adopts a ResNet50, MobileNet or VGG16 network structure.
4. The game video highlight recording method according to claim 1, wherein, when the highlight confidence in the current classification result is greater than or equal to the preset confidence threshold, recording and saving at least one video frame that is in the audio-video stream data and synchronous with the current audio frame to obtain a game video highlight clip comprises:
when the highlight confidence in the current classification result is greater than or equal to the preset confidence threshold, judging whether the number of audio frames between the previous latest highlight frame and the current audio frame is equal to zero, wherein the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame, and the previous latest highlight frame is the most recent audio frame that precedes the current audio frame in the audio-video stream data and whose highlight confidence is greater than or equal to the preset confidence threshold;
if the number of audio frames is equal to zero, recording and saving at least one video frame that is in the audio-video stream data and synchronous with the current audio frame to obtain a game video highlight clip; otherwise, further judging whether the number of audio frames is greater than or equal to a preset frame number threshold;
if the number of audio frames is greater than or equal to the preset frame number threshold, recording and saving at least one video frame that is in the audio-video stream data and synchronous with the current audio frame to obtain a game video highlight clip; otherwise, recording and saving at least one video frame that is in the audio-video stream data and synchronous with the intermediate audio frames and the current audio frame to obtain a game video highlight clip, wherein the intermediate audio frames are the at least one audio frame in the audio-video stream data located between the previous latest highlight frame and the current audio frame.
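By way of non-limiting illustration, the gap handling recited in claim 4 can be sketched in Python as follows. Audio frames are identified by integer indices, which is an assumption, as are the function name and the handling of the very first highlight frame (the claim does not address the case where no previous highlight frame exists).

```python
def audio_frames_to_record(current, last_highlight, gap_threshold):
    """Return indices of audio frames whose synchronous video frames are saved.

    `current` is the index of the current highlight audio frame and
    `last_highlight` the index of the previous latest highlight frame,
    or None if there is none yet (assumed case: record the current frame only).
    """
    if last_highlight is None:
        return [current]
    # Number of audio frames strictly between the two highlight frames.
    gap = current - last_highlight - 1
    if gap == 0 or gap >= gap_threshold:
        # Adjacent frames, or a gap too long to bridge: record the current frame only.
        return [current]
    # Short gap: also record the intermediate frames so the clip stays continuous.
    return list(range(last_highlight + 1, current + 1))
```

With a threshold of 5, a highlight at index 10 following one at index 7 also pulls in frames 8 and 9, whereas one following index 2 is recorded on its own.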
5. The method of claim 1, wherein after obtaining the game video highlight clip, the method further comprises:
judging whether the previous latest game video highlight clip is continuous in time with the most recently obtained game video highlight clip;
if they are continuous in time, merging the two game video highlight clips into one game video highlight clip; otherwise, further judging whether the duration of the previous latest game video highlight clip is less than or equal to a preset duration threshold;
and if the duration is less than or equal to the preset duration threshold, deleting the saved previous latest game video highlight clip.
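By way of non-limiting illustration, the merge-or-delete logic of claim 5 can be sketched in Python as follows. Representing a clip as a (start, end) pair and treating "continuous in time" as the previous clip ending exactly where the new one starts are both assumptions, as is the function name.

```python
def add_clip(clips, new_clip, min_duration):
    """Return the updated clip list after a new (start, end) clip is obtained."""
    if not clips:
        return [new_clip]
    prev_start, prev_end = clips[-1]
    new_start, new_end = new_clip
    if prev_end == new_start:
        # Continuous in time: merge the two clips into one.
        return clips[:-1] + [(prev_start, new_end)]
    if prev_end - prev_start <= min_duration:
        # Previous clip is isolated and too short: delete it.
        return clips[:-1] + [new_clip]
    return clips + [new_clip]
```

Checking the previous clip only once a non-continuous clip arrives means a short clip is never deleted while it could still be extended by a merge.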
6. The method of claim 5, further comprising:
collecting, at the end of the game, all game video highlight clips recorded during the game to obtain at least one game video highlight clip;
for each game video highlight clip in the at least one game video highlight clip, calculating a corresponding highlight confidence sum by accumulation according to the following formula:
$$GCT_k = \sum_{n=1}^{N_k} GC_{k,n}$$
wherein k is a positive integer, $GCT_k$ is the highlight confidence sum of the k-th game video highlight clip in the at least one game video highlight clip, $N_k$ is the total number of audio frames synchronous with the k-th game video highlight clip, n is a positive integer, and $GC_{k,n}$ is the highlight confidence of the n-th audio frame of those audio frames;
sorting the at least one game video highlight clip by highlight confidence sum from high to low to obtain a game video highlight clip sequence;
pushing the first M game video highlight clips in the game video highlight clip sequence to the game player, wherein M is a preset positive integer less than or equal to K, and K is the total number of clips in the game video highlight clip sequence.
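By way of non-limiting illustration, the accumulation, sorting and selection recited in claim 6 can be sketched in Python as follows. The dictionary layout (clip identifier mapped to the per-audio-frame highlight confidences) and the function name are assumptions.

```python
def top_clips(clip_confidences, m):
    """Return the identifiers of the M clips with the highest confidence sums.

    `clip_confidences` maps a clip identifier to the list of highlight
    confidences of the audio frames synchronous with that clip.
    """
    if not 1 <= m <= len(clip_confidences):
        raise ValueError("M must be a positive integer not exceeding K")
    # Highlight confidence sum per clip: accumulate over its audio frames.
    totals = {cid: sum(confs) for cid, confs in clip_confidences.items()}
    # Sort from high to low and keep the first M clips.
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:m]
```

Because the score is a sum rather than a mean, a long clip of moderately confident frames can outrank a short clip of very confident ones.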
7. The method of claim 1, wherein after obtaining the game video highlight clip, the method further comprises:
randomly extracting a video frame from the game video highlight clip;
performing image processing on the video frame using a perceptual hash algorithm to obtain image fingerprint information of the video frame;
judging whether the number of differing data bits between the image fingerprint information of the video frame and the image fingerprint information of the game video target highlight picture is greater than or equal to a preset bit number threshold;
if so, deleting the recorded and saved game video highlight clip.
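By way of non-limiting illustration, the fingerprint comparison recited in claim 7 can be sketched in Python as follows. The sketch assumes that each image fingerprint is already available as an integer bit string (for example a 64-bit hash); how the perceptual hash itself is computed is outside the sketch, and the function names are hypothetical.

```python
def hamming_distance(fingerprint_a, fingerprint_b):
    """Count the bit positions in which two integer fingerprints differ."""
    return bin(fingerprint_a ^ fingerprint_b).count("1")


def should_delete(clip_fingerprint, target_fingerprint, bit_threshold):
    """True when the sampled frame differs from the target picture in too many bits."""
    return hamming_distance(clip_fingerprint, target_fingerprint) >= bit_threshold
```

A small distance means the sampled frame resembles the target highlight picture, so the clip is kept; a distance at or above the threshold marks the clip as a likely false positive.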
8. A game video highlight recording device, comprising a data acquisition module, a Fourier transform processing module, a frequency point amplitude encoding module, a to-be-recognized image drawing module, a highlight frame classification module and a video frame saving module;
the data acquisition module is used for acquiring audio-video stream data generated in real time during the game;
the Fourier transform processing module is communicatively connected to the data acquisition module and is used for performing fast Fourier transform processing on the current audio frame in the audio-video stream data to obtain a current frequency spectrum;
the frequency point amplitude encoding module is communicatively connected to the Fourier transform processing module and is used for encoding K magnitudes, which are in the current frequency spectrum and correspond one-to-one to K frequency points, into RGB three-channel color values respectively, to obtain current data to be recognized containing K RGB values, wherein K is a natural number not less than 64, and the K frequency points are equally spaced within the human auditory frequency range;
the to-be-recognized image drawing module is communicatively connected to the frequency point amplitude encoding module and is used for drawing, according to the K RGB values of the current data to be recognized, a current image to be recognized whose pixel matrix is k*k, wherein k is a natural number not smaller than the square root of K;
the highlight frame classification module is communicatively connected to the to-be-recognized image drawing module and is used for inputting the current image to be recognized into a highlight frame classification model pre-trained on the basis of a convolutional neural network (CNN) and highlight audio frames, to obtain a current classification result, wherein a highlight audio frame is an audio frame synchronous with a target highlight picture of the game video and is used to provide the highlight frame classification model with positive samples for highlight frame classification training;
the video frame saving module is communicatively connected to the data acquisition module and to the highlight frame classification module, and is used for recording and saving, when the highlight confidence in the current classification result is greater than or equal to a preset confidence threshold, at least one video frame that is in the audio-video stream data and synchronous with the current audio frame, to obtain a game video highlight clip, wherein the highlight confidence is the confidence with which the current classification result classifies the current audio frame as a highlight frame.
9. A computer device comprising a memory, a processor and a transceiver in communication connection in sequence, wherein the memory is configured to store a computer program, the transceiver is configured to transmit and receive data, and the processor is configured to read the computer program and perform the game video highlight recording method according to any one of claims 1 to 7.
10. A computer readable storage medium having instructions stored thereon which, when executed on a computer, perform the method of recording a game video highlight according to any one of claims 1 to 7.
CN202310062849.0A 2023-01-13 2023-01-13 Game video highlight recording method, device, equipment and storage medium Expired - Fee Related CN116074566B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310062849.0A CN116074566B (en) 2023-01-13 2023-01-13 Game video highlight recording method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310062849.0A CN116074566B (en) 2023-01-13 2023-01-13 Game video highlight recording method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116074566A CN116074566A (en) 2023-05-05
CN116074566B true CN116074566B (en) 2023-10-20

Family

ID=86176462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310062849.0A Expired - Fee Related CN116074566B (en) 2023-01-13 2023-01-13 Game video highlight recording method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116074566B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625661A (en) * 2020-05-14 2020-09-04 国家计算机网络与信息安全管理中心 Audio and video segment classification method and device
CN112788200A (en) * 2020-12-04 2021-05-11 光大科技有限公司 Method and device for determining frequency spectrum information, storage medium and electronic device
WO2021163882A1 (en) * 2020-02-18 2021-08-26 深圳市欢太科技有限公司 Game screen recording method and apparatus, and computer-readable storage medium
CN113643728A (en) * 2021-08-12 2021-11-12 荣耀终端有限公司 Audio recording method, electronic device, medium, and program product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10390082B2 (en) * 2016-04-01 2019-08-20 Oath Inc. Computerized system and method for automatically detecting and rendering highlights from streaming videos

Also Published As

Publication number Publication date
CN116074566A (en) 2023-05-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20231020