CN116016983A - Method, device, equipment and storage medium for identifying competition picture - Google Patents

Info

Publication number: CN116016983A
Application number: CN202211709337.0A
Authority: CN (China)
Prior art keywords: live, picture, model, video, pictures
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 胡阳
Original and current assignee: Apollo Intelligent Connectivity Beijing Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN202211709337.0A
Publication of CN116016983A

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a method, an apparatus, a device, and a storage medium for identifying a game picture, and relates to the field of artificial intelligence, in particular to deep learning and computer vision technology. The method comprises the following steps: in response to determining that the user has switched a live video to the background, acquiring the live video in real time; identifying live pictures in the live video by using a picture identification model, and determining whether the live pictures are game pictures; and, in response to determining that a live picture is a game picture, generating and displaying a reminder pop-up window to remind the user to continue watching the live video. The method detects and judges in real time whether the live picture is a game picture, so that the user can be promptly reminded to resume watching the live game once play has resumed.

Description

Method, device, equipment and storage medium for identifying competition picture
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to deep learning and computer vision technologies, and specifically to a method, a device, equipment, and a storage medium for identifying a game picture.
Background
Live video broadcasts of basketball events such as the NBA, CBA, and FIBA have a wide audience.
According to the rules of basketball, a game contains several pauses and inter-quarter breaks, whose duration varies from tens of seconds to tens of minutes. During these breaks the broadcaster typically switches the live view to a video advertisement or a commentator shot. Many viewers are not interested in this content: they do not want to be disturbed during a pause in the game, but they do hope to return to the live broadcast and continue watching the game as soon as play resumes.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for identifying a game screen.
According to a first aspect of the present disclosure, there is provided a method for identifying a game picture, including: in response to determining that the user has switched a live video to the background, acquiring the live video in real time; identifying live pictures in the live video by using a picture identification model, and determining whether the live pictures are game pictures; and, in response to determining that a live picture is a game picture, generating and displaying a reminder pop-up window to remind the user to continue watching the live video.
According to a second aspect of the present disclosure, there is provided a device for identifying a game picture, including: an acquisition module configured to acquire a live video in real time in response to determining that the user has switched the live video to the background; an identification module configured to identify live pictures in the live video by using a picture identification model and determine whether the live pictures are game pictures; and a generation module configured to, in response to determining that a live picture is a game picture, generate and display a reminder pop-up window to remind the user to continue watching the live video.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method as described in any one of the implementations of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram to which the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method of identifying a game screen according to the present disclosure;
FIG. 3 is a flow chart of another embodiment of a method of identifying a game screen according to the present disclosure;
FIG. 4 is a schematic diagram of the structure of the modified pre-training model;
FIG. 5 is an application scenario diagram of a method of recognition of a game screen of the present disclosure;
FIG. 6 is a schematic diagram of a structure of one embodiment of an identification device of a game screen according to the present disclosure;
fig. 7 is a block diagram of an electronic device for implementing a method of recognizing a game screen according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which an embodiment of a game screen recognition method or a game screen recognition apparatus of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or transmit information or the like. Various client applications can be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablets, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-described electronic devices. Which may be implemented as a plurality of software or software modules, or as a single software or software module. The present invention is not particularly limited herein.
The server 105 may provide various services. For example, the server 105 may analyze and process live video acquired from the terminal devices 101, 102, 103 and generate processing results (e.g., reminder popups).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be noted that, the method for identifying a game screen provided in the embodiments of the present disclosure is generally executed by the server 105, and accordingly, the device for identifying a game screen is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of identifying a game screen according to the present disclosure is shown. The method for identifying the competition picture comprises the following steps:
step 201, in response to determining that the user switches the live video to the background, acquiring the live video in real time.
In this embodiment, the execution subject of the method for identifying a game picture (e.g., the server 105 shown in fig. 1) may acquire the live video in real time once it determines that the user has switched the live video to the background. The user can find a live game stream through the browser and watch it; when the game reaches a pause or an inter-quarter break, which generally lasts from tens of seconds to tens of minutes, an advertisement may be played or the live shot may be switched to a commentator. The user is generally not interested in the advertisement or the commentary shot, and at this moment switches the live video to the background to do other things. The execution subject detects that the user has switched the live video to the background and then continuously acquires the live video from the background in real time; that is, it exploits the fact that video on a PC (Personal Computer) continues to play in the background, and captures the video picture being played locally. Optionally, the execution subject creates a separate browser page containing a video control (<video>), captures the live video that has been switched to the background using the screen-capturing capability provided by a browser plug-in, and uses the captured live video stream as the video source of the video control.
And 202, identifying live pictures in the live video by using a picture identification model, and determining whether the live pictures are match pictures.
In this embodiment, the execution subject identifies the live pictures in the live video by using the picture identification model to determine whether each live picture is a game picture; that is, the execution subject performs real-time image classification inference on each frame of the live picture, thereby detecting and judging in real time whether the live picture is a game picture. Here, the picture identification model is obtained by applying a transfer learning method to a pre-trained model. First, the execution subject obtains the pre-trained model MobileNet, a model provided by the TensorFlow.js framework for mobile and embedded vision applications. In this embodiment, the transfer learning technique is applied to MobileNet: the last 6 layers of the MobileNet model, which are responsible for classification, are deleted; the first 87 layers, which are responsible for feature extraction, are retained; and 2 new layers are spliced after them, dedicated in this embodiment to classifying whether the live picture is a game picture. This yields the picture identification model of this embodiment.
It should be noted that, the transfer learning is a machine learning method, that is, the model developed for the task a is used as an initial point and reused in the process of developing the model for the task B, that is, the transfer learning refers to that a pre-trained model is reused in another task.
After the picture identification model is generated, the execution subject performs real-time image classification reasoning on each frame of the live broadcast picture by using the picture identification model, so as to determine whether the live broadcast picture is a competition picture.
In addition, training and reasoning of the picture identification model in the embodiment are performed in a local browser of a user, so that live broadcast blocking caused by occupation of network bandwidth is avoided.
In step 203, in response to determining that the live broadcast picture is a game picture, a reminder pop-up is generated and displayed.
In this embodiment, when determining that the live picture is a game picture, the execution subject generates and displays a reminder pop-up window to remind the user to continue watching the live video. If the execution subject determines, based on the identification result of the picture identification model on the live picture, that the current live picture is a game picture, it generates a reminder pop-up window and invokes the capability of the browser plug-in to send a pop-up reminder to the user that the game has resumed, so that the user can return to the live broadcast in time to watch the game. Compared with the prior art, which can only identify game scenes and game data, the picture identification model in this embodiment identifies whether the current picture is a game picture at all; that is, the current live picture is classified as a game picture or a non-game picture, where non-game pictures may include advertisement pictures, commentator pictures, and the like, so that the user is promptly reminded to continue watching as soon as a game picture is identified. As an example, when the execution subject determines that the live picture is a game picture, it generates a pop-up window displaying "Reminder: the live game has resumed" and shows the pop-up window in the current browser page. It should be noted that the style of the pop-up window and the reminding mode may be set according to actual needs, which is not specifically limited in this embodiment.
The method for identifying a game picture provided by this embodiment first acquires the live video in real time in response to determining that the user has switched the live video to the background; then identifies the live pictures in the live video by using a picture identification model and determines whether the live pictures are game pictures; and finally, in response to determining that a live picture is a game picture, generates and displays a reminder pop-up window to remind the user to continue watching the live video. By performing real-time image classification inference on each frame of the live picture playing in the background, the method detects and judges in real time whether the live picture is a game picture, so that the user can be reminded to continue watching the live game as soon as the game resumes, avoiding missed game content. In addition, the training and inference of the picture identification model in this embodiment are performed in the user's local browser, which avoids live-stream stuttering caused by occupying network bandwidth.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user's personal information comply with the relevant laws and regulations and do not violate public order and good customs.
With continued reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of a method of identifying a game screen according to the present disclosure. The method for identifying the competition picture comprises the following steps:
in step 301, in response to determining that the user switches live video to background, a browser page is created.
In this embodiment, the execution subject of the method for recognizing a game screen (e.g., the server 105 shown in fig. 1) creates a browser page when it is determined that the user switches live video to background, where the browser page contains a video control. That is, the executing host creates an independent browser page containing a video control when determining that the user switches the live video to the background.
And 302, acquiring live video from the background in real time by utilizing a plug-in of the browser.
In this embodiment, the executing body captures live video that has been switched to the background by using the screen capturing capability provided by the browser plug-in.
And step 303, taking the live video as a video source of the video control.
In this embodiment, the executing entity uses the captured live video stream as the video source of the video control. Thereby realizing the acquisition of live video from the background.
In some optional implementations of this embodiment, after step 303, the method for identifying a game screen further includes: setting the size of the video control to be a preset size which is the same as the size of the input image received by the picture identification model; image data is read from the video control.
In this implementation, the pre-trained model MobileNet provided by the TensorFlow.js framework receives input images of size 224px x 224px (px: pixel). Therefore, setting the size of the video control to 224px x 224px allows the image data required by the model to be read from the video control directly using the API (tf.browser.fromPixels) provided by TensorFlow.js, which facilitates the subsequent identification and inference on the read image data.
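The frame-preparation step can be illustrated with a small, self-contained sketch. `resizeRGBA` is a hypothetical helper (it is not named in this disclosure) that performs the kind of nearest-neighbour downscale a canvas or tf.browser.fromPixels pipeline would apply when bringing a captured frame to the model's 224x224 input size, assuming RGBA pixel data such as a canvas `getImageData` buffer would provide:

```javascript
// Nearest-neighbour resize of an RGBA pixel buffer (hypothetical helper,
// standing in for the canvas / tf.browser.fromPixels step described above).
function resizeRGBA(src, srcW, srcH, dstW, dstH) {
  const dst = new Uint8ClampedArray(dstW * dstH * 4);
  for (let y = 0; y < dstH; y++) {
    const sy = Math.floor((y * srcH) / dstH); // nearest source row
    for (let x = 0; x < dstW; x++) {
      const sx = Math.floor((x * srcW) / dstW); // nearest source column
      const si = (sy * srcW + sx) * 4;
      const di = (y * dstW + x) * 4;
      for (let c = 0; c < 4; c++) dst[di + c] = src[si + c]; // copy R,G,B,A
    }
  }
  return dst;
}
```

In the browser the same effect is usually obtained by drawing the video frame onto a fixed-size canvas, which is why sizing the video control to 224x224 lets the frame be fed to the model without an extra resize.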
And 304, loading a pre-training model, and modifying the pre-training model by using a transfer learning method.
In this embodiment, the execution body loads the pre-training model, and reforms the pre-training model by using a migration learning method. That is, the executing main body loads the pre-training model MobileNet first, and reforms the pre-training model by using the migration learning method, so as to obtain a reformed pre-training model, and the reformed pre-training model can better classify and infer the obtained live broadcast picture. It should be noted that, the transfer learning is a machine learning method, that is, the model developed for the task a is used as an initial point and reused in the process of developing the model for the task B, that is, the transfer learning refers to that a pre-trained model is reused in another task.
In some alternative implementations of the present embodiment, step 304 includes: and reserving a feature extraction layer of the pre-training model, and splicing a classification layer after the feature extraction layer to obtain the modified pre-training model, wherein the classification layer is used for determining whether the live broadcast picture is a competition picture or not.
In this implementation, the transfer learning technique is applied to the pre-trained model MobileNet, specifically as follows: the last 6 layers of the MobileNet model, which are responsible for classification, are deleted; the first 87 layers, which are responsible for feature extraction, are retained; and 2 classification layers, dedicated in this embodiment to classifying whether the live picture is a game picture, are spliced after them, thereby obtaining the modified pre-trained model. With further reference to fig. 4, which is a schematic structural diagram of the modified pre-trained model: the pre-trained model MobileNet is loaded first, and the first 87 layers, which extract features from the images, are cut off from it, with output shape [null, 7, 7, 256]. These 87 layers are taken as the new model head and flattened, giving a flattened head of shape [null, 12544]. Then the 2 classification layers (dense1 and dense2), dedicated in this embodiment to classifying whether the live picture is a game picture, are spliced behind it; the output shape of the spliced classification layers is [null, 2]. This yields the modified pre-trained model. Applying transfer learning to the modification of a pre-trained model improves the efficiency of the modification.
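As a quick consistency check on the shapes above (a sketch, not code from the disclosure): the retained 87-layer feature extractor outputs [null, 7, 7, 256], so flattening it must yield 7x7x256 features, which the two spliced dense layers then reduce to 2 classes:

```javascript
// Shape bookkeeping for the modified model described above.
const featureShape = [7, 7, 256]; // output of the retained 87 MobileNet layers
const flattenedSize = featureShape.reduce((a, b) => a * b, 1);
console.log(flattenedSize); // 12544, matching the flattened shape [null, 12544]

// The two spliced classification layers (dense1, dense2) end in 2 units:
// one score for "game picture", one for "non-game picture".
const numClasses = 2; // final output shape [null, 2]
```

This also explains why the flattened head in fig. 4 has exactly 12544 features before the dense layers.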
And step 305, training the reformed pre-training model to obtain a picture recognition model.
In this embodiment, the execution subject trains the modified pre-trained model, thereby obtaining the picture identification model. It should be noted that, compared with the original pre-trained model, only the last 2 layers of the modified model are changed; therefore, when training the modified pre-trained model, only the last 2 layers, which are responsible for classification inference, need to be trained, which greatly improves the training efficiency of the model.
In some alternative implementations of the present embodiment, step 305 includes: acquiring real-time annotation data, where the real-time annotation data are obtained by the user annotating the live pictures in real time; and, in response to determining that a model training request is received, training the modified pre-trained model with the real-time annotation data to obtain the picture identification model.
In this implementation, 2 annotation buttons are provided to the user. The user observes through the video control whether the live picture is a game picture and completes the annotation of one frame of image by clicking the corresponding button; the execution subject thereby acquires the user's real-time annotation data. After the annotation is finished, the user can click a training button to send a model training request; when the execution subject receives the model training request, it trains the modified pre-trained model with the acquired real-time annotation data to obtain the picture identification model. Experience shows that annotating about 20 frames each of game pictures and non-game pictures achieves a good training effect. To better fit the live pictures of different games, this embodiment adopts a scheme in which the user annotates and trains locally, offline, and on the spot, which improves the training efficiency of the model.
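The two-button labelling flow above can be sketched as a small sample store. All names here (`makeLabelStore`, `readyToTrain`) are hypothetical illustrations, and the 20-frames-per-class readiness rule follows the usage experience stated in this embodiment:

```javascript
// Minimal sketch of the in-browser annotation flow described above.
const MIN_PER_CLASS = 20; // ~20 labelled frames per class, per the text

function makeLabelStore() {
  const samples = { game: [], nonGame: [] };
  return {
    // Called when the user clicks one of the 2 annotation buttons
    // for the frame currently shown in the video control.
    label(frame, isGame) {
      samples[isGame ? 'game' : 'nonGame'].push(frame);
    },
    // Enough data on both classes to expect a good training effect?
    readyToTrain() {
      return samples.game.length >= MIN_PER_CLASS &&
             samples.nonGame.length >= MIN_PER_CLASS;
    },
    counts() {
      return { game: samples.game.length, nonGame: samples.nonGame.length };
    },
  };
}
```

Once `readyToTrain()` holds, clicking the training button would hand the collected samples to the modified model for local fitting.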
And 306, carrying out classification reasoning on the read image data by using the picture identification model, and determining whether the image data is a game picture or not based on a classification reasoning result.
In this embodiment, the execution subject performs classification inference on the read multi-frame image data by using the trained picture identification model, and determines whether the image data represent a game picture based on the classification inference results corresponding to the multiple frames.
In some alternative implementations of the present embodiment, step 306 includes: and determining that the image data is the game picture in response to determining that the classification inference results of the continuous preset number of frame image data are all game pictures and the confidence level of the classification inference results are larger than a preset threshold value.
In this implementation, if the classification inference results of a preset number of consecutive frames are all "game picture" and the confidence of each result is greater than a preset threshold, the image data are determined to be a game picture. Here the preset number of frames may be set to 10 consecutive frames and the preset threshold to 0.999; of course, the specific values may be set according to actual requirements, which is not limited in this embodiment. This allows accurately determining whether the current picture is a game picture.
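The decision rule above is a simple sliding-window check over per-frame inference results. A minimal sketch (function and record names are illustrative, not from the disclosure):

```javascript
// Only report "game picture" when the last WINDOW per-frame inferences
// all classified the frame as a game picture with confidence above THRESHOLD.
const WINDOW = 10;       // preset number of consecutive frames
const THRESHOLD = 0.999; // preset confidence threshold

function isGameConfirmed(history) {
  // history: array of { label: 'game' | 'nonGame', confidence: number },
  // oldest first, one entry per inferred frame.
  if (history.length < WINDOW) return false;
  return history
    .slice(-WINDOW)
    .every((r) => r.label === 'game' && r.confidence > THRESHOLD);
}
```

Requiring the whole window to agree suppresses the sporadic outlier results that a single-frame decision would be vulnerable to.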
In step 307, in response to determining that the live view is a game view, a reminder pop is generated and displayed.
In this embodiment, the executing host generates and displays a reminding popup window to remind the user to continue watching the live video when determining that the live video is a match video. Step 307 is substantially identical to step 203 of the foregoing embodiment, and specific implementation may refer to the foregoing description of step 203, which is not repeated herein.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the method for identifying a game picture in this embodiment adopts a real-time data-annotation scheme. It adapts well to live pictures that differ across events and venues, and avoids inaccurate classification caused by similar pictures being absent from a model training set, thereby further improving the training efficiency and identification accuracy of the picture identification model.
With continued reference to fig. 5, one application scenario diagram of the method of recognition of a game screen of the present disclosure is shown. In the application scene, when a user watches the live game on a browser page, if the game in the current live video has a break, the user can switch the live video to the background, at this time, the execution subject can capture the video picture being played locally by utilizing the characteristic that the PC end video can be continuously played in the background.
Step 501, a Chrome extension procedure is started.
First, the execution subject starts the Chrome extension program.
Step 502, automatically create a tab page and invoke Chrome's screen-capture capability.
The execution subject automatically creates an independent browser page containing a <video> control and invokes the screen-capturing capability provided by the Chrome browser plug-in to capture the live picture that has been switched to the background.
Step 503, taking the captured live broadcast picture as a video source of the video control.
Step 504, load the TensorFlow. Js framework and the MobileNet pre-training model.
The execution subject loads the pre-trained model MobileNet provided by the TensorFlow.js framework. The pre-trained model receives input images of size 224x224; based on this, the size of the <video> control is also set to 224x224, so that the image data required by the model can be read from the <video> control directly using the API (tf.browser.fromPixels) provided by TensorFlow.js.
In step 505, the MobileNet is modified by using the transfer learning.
The transfer learning technique is applied to the pre-trained model MobileNet: the last 6 layers of the MobileNet model, which are responsible for classification, are deleted; the first 87 layers, which are responsible for feature extraction, are retained; and 2 layers, dedicated in this embodiment to classifying whether the live picture is a game picture, are spliced after them.
Step 506, training a model based on the user annotated image.
Then, in order to better fit the live pictures of different games, this embodiment adopts a scheme in which the user annotates and trains locally, offline, and on the spot. 2 annotation buttons are provided to the user; the user observes through the <video> control whether the live picture is a game picture and completes the annotation of one frame of image by clicking the corresponding button. Experience shows that annotating about 20 frames each of game pictures and non-game pictures achieves a good training effect. After the annotation is completed, the user clicks the training button to start training the model. Because the transfer learning technique is applied, only the last 2 spliced layers of the model need to be trained; the training time is only about 1 minute, which improves the training efficiency of the model.
And 507, reasoning by using the trained model.
In addition, a button is provided for starting inference. After the user clicks it, the model obtained in the previous training step is used to run inference on the live picture in the <video> control; the inference speed can reach tens of frames per second, which meets the real-time requirement.
Step 508, judge whether the inference results of more than 10 consecutive frames are all "game picture" with confidence greater than 0.999.
To avoid interference from sporadic outliers in the inference results, a threshold is set before notifying the user: step 509 is executed when the classification inference results of more than 10 consecutive frames are all "game picture" and the classification confidence is greater than 0.999; otherwise, step 507 is executed again, that is, inference on the live picture continues.
Step 509, popping up a window to remind the user to return to the live broadcast.
Using the capabilities of the Chrome extension, a popup prompt is sent to the user to indicate that the game has resumed, so that the user can return to the live broadcast in time to watch it.
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for identifying game pictures. The apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 6, the apparatus 600 for identifying game pictures of the present embodiment includes: an acquisition module 601, an identification module 602 and a generation module 603. The acquisition module 601 is configured to acquire a live video in real time in response to determining that the user has switched the live video to the background; the identification module 602 is configured to identify live pictures in the live video using a picture identification model and determine whether the live pictures are game pictures; the generation module 603 is configured to generate and display a reminder popup to remind the user to continue watching the live video in response to determining that a live picture is a game picture.
In the apparatus 600 for identifying game pictures of the present embodiment, the specific processing of the acquisition module 601, the identification module 602 and the generation module 603, and the technical effects thereof, may refer to the descriptions of steps 201 to 203 in the embodiment corresponding to fig. 2, and are not repeated here.
In some optional implementations of the present embodiment, the acquisition module is further configured to: create a browser page in response to determining that the user has switched the live video to the background, wherein the browser page contains a video control; acquire the live video from the background in real time using a plug-in of the browser; and use the live video as the video source of the video control.
In some optional implementations of this embodiment, the apparatus 600 for identifying game pictures further includes: a setting module configured to set the size of the video control to a preset size that is the same as the size of the input image accepted by the picture identification model; and a reading module configured to read image data from the video control.
In some optional implementations of this embodiment, the identification module includes: a loading sub-module configured to load a pre-trained model and modify it using a transfer learning method; a training sub-module configured to train the modified pre-trained model to obtain the picture identification model; and an inference sub-module configured to perform classification inference on the read image data using the picture identification model and determine, based on the classification inference result, whether the image data is a game picture.
In some optional implementations of the present embodiment, the loading sub-module is further configured to: retain the feature extraction layers of the pre-trained model and splice a classification layer after them to obtain the modified pre-trained model, wherein the classification layer is used to determine whether a live picture is a game picture.
In some optional implementations of the present embodiment, the training sub-module is further configured to: acquire real-time annotation data, wherein the real-time annotation data is obtained by the user annotating live pictures in real time; and, in response to determining that a model training request has been received, train the modified pre-trained model with the real-time annotation data to obtain the picture identification model.
In some optional implementations of the present embodiment, the inference sub-module is further configured to: determine that the image data is a game picture in response to determining that the classification inference results for a preset number of consecutive frames of image data are all game pictures and that the confidence of each classification inference result is greater than a preset threshold.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702 and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the methods and processes described above, for example, the method of identifying game pictures. For example, in some embodiments, the method of identifying game pictures may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method of identifying game pictures described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method of identifying game pictures in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (17)

1. A method for identifying a game picture, comprising:
in response to determining that a user has switched a live video to the background, acquiring the live video in real time;
identifying live pictures in the live video by using a picture identification model, and determining whether the live pictures are game pictures;
and generating and displaying a reminder popup to remind the user to continue watching the live video, in response to determining that a live picture is a game picture.
2. The method of claim 1, wherein the acquiring the live video in real time in response to determining that the user has switched the live video to the background comprises:
in response to determining that the user has switched the live video to the background, creating a browser page, wherein the browser page contains a video control;
acquiring the live video from the background in real time by utilizing a plug-in of the browser;
and taking the live video as a video source of the video control.
3. The method of claim 2, further comprising:
setting the size of the video control to be a preset size, wherein the preset size is the same as the size of an input image received by the picture identification model;
and reading image data from the video control.
4. The method according to claim 3, wherein the identifying live pictures in the live video using a picture identification model and determining whether the live pictures are game pictures comprises:
loading a pre-trained model, and modifying the pre-trained model by using a transfer learning method;
training the modified pre-trained model to obtain the picture identification model;
and performing classification inference on the read image data by using the picture identification model, and determining whether the image data is a game picture based on the classification inference result.
5. The method of claim 4, wherein the modifying the pre-trained model by using a transfer learning method comprises:
and retaining the feature extraction layers of the pre-trained model, and splicing a classification layer after the feature extraction layers to obtain the modified pre-trained model, wherein the classification layer is used for determining whether a live picture is a game picture.
6. The method of claim 4, wherein the training the modified pre-trained model to obtain the picture identification model comprises:
acquiring real-time annotation data, wherein the real-time annotation data is obtained by a user annotating live pictures in real time;
and in response to determining that a model training request is received, training the modified pre-trained model by using the real-time annotation data to obtain the picture identification model.
7. The method of claim 4, wherein the performing classification inference on the read image data using the picture identification model and determining whether the image data is a game picture based on the classification inference result comprises:
and determining that the image data is a game picture in response to determining that the classification inference results of a preset number of consecutive frames of image data are all game pictures and that the confidence of each classification inference result is greater than a preset threshold.
8. An apparatus for identifying a game picture, comprising:
an acquisition module configured to acquire a live video in real time in response to determining that a user has switched the live video to the background;
an identification module configured to identify live pictures in the live video by using a picture identification model and determine whether the live pictures are game pictures;
and a generation module configured to generate and display a reminder popup for reminding the user to continue watching the live video in response to determining that a live picture is a game picture.
9. The apparatus of claim 8, wherein the acquisition module is further configured to:
in response to determining that the user has switched the live video to the background, creating a browser page, wherein the browser page contains a video control;
acquiring the live video from the background in real time by utilizing a plug-in of the browser;
and taking the live video as a video source of the video control.
10. The apparatus of claim 9, further comprising:
the setting module is configured to set the size of the video control to be a preset size, and the preset size is the same as the size of the input image received by the picture identification model;
a reading module configured to read image data from the video control.
11. The apparatus of claim 10, wherein the identification module comprises:
the loading sub-module is configured to load a pre-trained model, and modify the pre-trained model by using a transfer learning method;
the training sub-module is configured to train the modified pre-trained model to obtain the picture identification model;
and the inference sub-module is configured to perform classification inference on the read image data by using the picture identification model, and determine whether the image data is a game picture based on the classification inference result.
12. The apparatus of claim 11, wherein the loading sub-module is further configured to:
and retaining the feature extraction layers of the pre-trained model, and splicing a classification layer after the feature extraction layers to obtain the modified pre-trained model, wherein the classification layer is used for determining whether a live picture is a game picture.
13. The apparatus of claim 11, wherein the training sub-module is further configured to:
acquiring real-time annotation data, wherein the real-time annotation data is obtained by a user annotating live pictures in real time;
and in response to determining that a model training request is received, training the modified pre-trained model by using the real-time annotation data to obtain the picture identification model.
14. The apparatus of claim 11, wherein the inference sub-module is further configured to:
and determining that the image data is a game picture in response to determining that the classification inference results of a preset number of consecutive frames of image data are all game pictures and that the confidence of each classification inference result is greater than a preset threshold.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202211709337.0A 2022-12-29 2022-12-29 Method, device, equipment and storage medium for identifying competition picture Pending CN116016983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211709337.0A CN116016983A (en) 2022-12-29 2022-12-29 Method, device, equipment and storage medium for identifying competition picture

Publications (1)

Publication Number Publication Date
CN116016983A true CN116016983A (en) 2023-04-25

Family

ID=86020504


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105007527A (en) * 2015-08-05 2015-10-28 腾讯科技(深圳)有限公司 Data live-broadcast method and data live-broadcast device
WO2016168522A1 (en) * 2015-04-15 2016-10-20 Cyanogen Inc. System and method for triggering an alert for reminding a user to commence a live communications session
CN108712407A (en) * 2018-05-08 2018-10-26 北京酷我科技有限公司 A kind of audio/video live broadcasting method and its system based on browser
CN112188221A (en) * 2020-09-24 2021-01-05 广州虎牙科技有限公司 Play control method and device, computer equipment and storage medium
US20210258539A1 (en) * 2017-12-14 2021-08-19 Dwango Co., Ltd. Server and program
CN114302157A (en) * 2021-12-23 2022-04-08 广州津虹网络传媒有限公司 Attribute tag identification and multicast event detection method, device, equipment and medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
J. HUANG et al.: "Intelligent Video Surveillance of Tourist Attractions Based on Virtual Reality Technology", IEEE ACCESS, vol. 8, 31 August 2020 (2020-08-31), pages 159220-159233, XP011807971, DOI: 10.1109/ACCESS.2020.3020637 *
庄宇宁 et al.: "融媒体新闻客户端界面设计研究" [Research on interface design of converged-media news clients], 包装工程 [Packaging Engineering], vol. 42, no. 02, 20 January 2021 (2021-01-20), pages 202-209 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination