CN108805091B - Method and apparatus for generating a model

Publication number
CN108805091B
Authority
CN
China
Prior art keywords
video
training sample
sample
training
model
Prior art date
Legal status
Active
Application number
CN201810617804.4A
Other languages
Chinese (zh)
Other versions
CN108805091A
Inventor
李伟健
王长虎
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201810617804.4A priority Critical patent/CN108805091B/en
Publication of CN108805091A publication Critical patent/CN108805091A/en
Priority to PCT/CN2018/116339 priority patent/WO2019237657A1/en
Application granted granted Critical
Publication of CN108805091B publication Critical patent/CN108805091B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiments of the present application disclose a method and apparatus for generating a model. One embodiment of the method comprises: acquiring a training sample set and dividing it into a preset number of training sample groups, where each training sample comprises a sample video and a sample recognition result labeled in advance for the sample video, the sample recognition result indicating whether the sample video was obtained by shooting a screen on which a sample object is displayed; for each training sample group among the preset number of groups, training, by a machine learning method, an initial video recognition model corresponding to that group, taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output; and generating a video recognition model based on the obtained initial video recognition models. This embodiment yields a model that can be used to recognize videos and enriches the ways in which models can be generated.

Description

Method and apparatus for generating a model
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating a model.
Background
At present, sharing information by shooting videos has become an important way for people to share information. In practice, some users record videos shot by other users (for example, by filming a screen on which those videos are played) in order to pass them off as their own.
Understandably, recording other users' videos in this way often has adverse effects such as copyright infringement and unfairness, so an information-sharing platform may need to identify such videos and intercept them.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating a model and a method and a device for identifying a video.
In a first aspect, an embodiment of the present application provides a method for generating a model, the method comprising: acquiring a training sample set and dividing it into a preset number of training sample groups, where each training sample comprises a sample video and a sample recognition result labeled in advance for the sample video, the sample video being obtained by shooting a sample object and the sample recognition result indicating whether the sample video was obtained by shooting a screen on which the sample object is displayed; for each training sample group among the preset number of groups, training, by a machine learning method, an initial video recognition model corresponding to that group, taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output; and generating a video recognition model based on the obtained initial video recognition models.
In some embodiments, training, for each training sample group among the preset number of training sample groups, an initial video recognition model corresponding to that group comprises: selecting a training sample group from the preset number of training sample groups as a candidate training sample group, and executing the following training steps based on the candidate training sample group and an initial model: taking the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, training the initial model by a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group exists among the preset number of training sample groups; and, in response to determining that no unselected training sample group exists, obtaining the preset number of initial video recognition models.
In some embodiments, the training further comprises: in response to determining that an unselected training sample group exists, selecting a training sample group from the unselected training sample groups as a new candidate training sample group, taking the most recently obtained initial video recognition model as a new initial model, and continuing to execute the training steps.
In some embodiments, the training comprises: determining values characterizing the quality of the preset number of training sample groups; and, based on the determined values, selecting the optimal training sample group from the preset number of training sample groups as a candidate training sample group and executing the following training steps based on the candidate training sample group and an initial model: taking the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, training the initial model by a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group exists among the preset number of training sample groups; and, in response to determining that no unselected training sample group exists, obtaining the preset number of initial video recognition models.
In some embodiments, the training further comprises: in response to determining that an unselected training sample group exists, selecting, based on the determined values, the optimal training sample group from the unselected training sample groups as a new candidate training sample group, taking the most recently obtained initial video recognition model as a new initial model, and continuing to execute the training steps.
In some embodiments, determining values characterizing the quality of the preset number of training sample groups comprises: acquiring a preset verification sample set, where each verification sample comprises a verification video and a verification recognition result labeled in advance for the verification video; and, for each training sample group among the preset number of training sample groups, executing the following steps: taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as output, training by a machine learning method a to-be-verified video recognition model corresponding to the group; inputting the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model corresponding to the group to obtain actual recognition results; determining the loss of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generating, based on the determined loss values, a value characterizing the quality of the group.
In some embodiments, generating a video recognition model based on the obtained initial video recognition models comprises: assigning weights to the obtained initial video recognition models based on the determined values; and fusing the obtained initial video recognition models based on the assigned weights to generate the video recognition model.
In some embodiments, generating a video recognition model based on the obtained initial video recognition models comprises: determining the most recently obtained initial video recognition model as the video recognition model.
In a second aspect, an embodiment of the present application provides an apparatus for generating a model, the apparatus comprising: a sample acquisition unit configured to acquire a training sample set and divide it into a preset number of training sample groups, where each training sample comprises a sample video and a sample recognition result labeled in advance for the sample video, the sample video being obtained by shooting a sample object and the sample recognition result indicating whether the sample video was obtained by shooting a screen displaying the sample object; a model training unit configured to, for each training sample group among the preset number of training sample groups, train by a machine learning method an initial video recognition model corresponding to that group, taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output; and a model generation unit configured to generate a video recognition model based on the obtained initial video recognition models.
In some embodiments, the model training unit comprises: a first execution module configured to select a training sample group from the preset number of training sample groups as a candidate training sample group and, based on the candidate training sample group and an initial model, execute the following training steps: taking the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, training the initial model by a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group exists among the preset number of training sample groups; and, in response to determining that no unselected training sample group exists, obtaining the preset number of initial video recognition models.
In some embodiments, the model training unit further comprises: a second execution module configured to, in response to determining that an unselected training sample group exists, select a training sample group from the unselected training sample groups as a new candidate training sample group, take the most recently obtained initial video recognition model as a new initial model, and continue to execute the training steps.
In some embodiments, the model training unit comprises: a value determination module configured to determine values characterizing the quality of the preset number of training sample groups; and a third execution module configured to select, based on the determined values, the optimal training sample group from the preset number of training sample groups as a candidate training sample group and, based on the candidate training sample group and an initial model, execute the following training steps: taking the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, training the initial model by a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group exists among the preset number of training sample groups; and, in response to determining that no unselected training sample group exists, obtaining the preset number of initial video recognition models.
In some embodiments, the model training unit further comprises: a fourth execution module configured to, in response to determining that an unselected training sample group exists, select, based on the determined values, the optimal training sample group from the unselected training sample groups as a new candidate training sample group, take the most recently obtained initial video recognition model as a new initial model, and continue to execute the training steps.
In some embodiments, the value determination module comprises: a sample acquisition module configured to acquire a preset verification sample set, where each verification sample comprises a verification video and a verification recognition result labeled in advance for the verification video; and a value generation module configured to execute, for each training sample group among the preset number of training sample groups, the following steps: taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as output, training by a machine learning method a to-be-verified video recognition model corresponding to the group; inputting the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model corresponding to the group to obtain actual recognition results; determining the loss of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generating, based on the determined loss values, a value characterizing the quality of the group.
In some embodiments, the model generation unit comprises: a weight assignment module configured to assign weights to the obtained initial video recognition models based on the determined values; and a model fusion module configured to fuse the obtained initial video recognition models based on the assigned weights to generate the video recognition model.
In a third aspect, an embodiment of the present application provides a method for recognizing a video, the method comprising: acquiring a to-be-recognized video, where the to-be-recognized video is obtained by shooting an object; and inputting the to-be-recognized video into a video recognition model generated by the method described in any embodiment of the first aspect, generating a recognition result corresponding to the to-be-recognized video, where the recognition result indicates whether the to-be-recognized video was obtained by shooting a screen on which the object is displayed.
In a fourth aspect, an embodiment of the present application provides an apparatus for recognizing a video, the apparatus comprising: a video acquisition unit configured to acquire a to-be-recognized video, where the to-be-recognized video is obtained by shooting an object; and a result generation unit configured to input the to-be-recognized video into a video recognition model generated by the method described in any embodiment of the first aspect and generate a recognition result corresponding to the to-be-recognized video, where the recognition result indicates whether the to-be-recognized video was obtained by shooting a screen on which the object is displayed.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; storage means having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as described in any of the embodiments of the first and third aspects above.
In a sixth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method described in any of the first and third aspects above.
The method and apparatus for generating a model provided by the embodiments of the present application acquire a training sample set and divide it into a preset number of training sample groups, where each training sample comprises a sample video and a sample recognition result labeled in advance for the sample video, the sample video being obtained by shooting a sample object and the sample recognition result indicating whether the sample video was obtained by shooting a screen displaying the sample object. Then, for each training sample group among the preset number of groups, an initial video recognition model corresponding to that group is trained by a machine learning method, taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output. Finally, a video recognition model is generated based on the obtained initial video recognition models. A model that can be used to recognize videos is thereby obtained, enriching the ways in which models can be generated.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a model according to the present application;
FIG. 3 is a schematic illustration of an application scenario of a method for generating a model according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for generating a model according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for generating a model according to the present application;
FIG. 6 is a flow diagram of one embodiment of a method for identifying videos, according to the present application;
FIG. 7 is a block diagram illustrating an embodiment of an apparatus for identifying video according to the present application;
FIG. 8 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 for a method for generating a model, an apparatus for generating a model, a method for identifying a video, or an apparatus for identifying a video to which embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a video recognition application, a social application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
When the terminals 101, 102 are hardware, a video capture device may also be installed thereon. The video acquisition equipment can be various equipment capable of realizing the function of acquiring video, such as a camera, a sensor and the like. The user 110 may capture video using a video capture device on the terminal 101, 102.
Database server 104 may be a database server that provides various services. For example, a database server may have a sample set stored therein. The sample set contains a large number of samples. The sample can include a sample video and a sample identification result pre-labeled for the sample video. In this way, the user 110 may also select a sample from a set of samples stored by the database server 104 via the terminals 101, 102.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using the samples in the sample set sent by the terminal 101, 102, and may send the training result (e.g., the generated video recognition model) to the terminal 101, 102. In this way, the user can apply the generated video recognition model for video recognition.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for generating a model or the method for identifying a video provided in the embodiments of the present application is generally performed by the server 105. Accordingly, the means for generating a model or the means for identifying a video are also typically provided in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present application is shown. The method for generating the model comprises the following steps:
Step 201, obtaining a training sample set, and dividing the training sample set into a preset number of training sample groups.
In this embodiment, the executing entity of the method for generating a model (e.g., the server shown in fig. 1) may obtain a training sample set from a database server (e.g., the database server 104 shown in fig. 1) or a terminal (e.g., the terminals 101 and 102 shown in fig. 1) through a wired or wireless connection, and divide the training sample set into a preset number of training sample groups. Each training sample comprises a sample video and a sample recognition result labeled in advance for the sample video. The sample video may be a video obtained by shooting a sample object. The sample object may be any of various things, such as a person, an object such as an animal, or a behavior such as running or swimming.
In this embodiment, the sample recognition result may include, but is not limited to, at least one of the following: characters, numbers, symbols. The sample recognition result may be used to indicate whether the sample video is a video obtained by shooting a screen on which the sample object is displayed. For example, the sample recognition result may be the number 1 or the number 0, where 1 indicates that the sample video was obtained by shooting a screen displaying the sample object, and 0 indicates that it was not.
In this embodiment, the executing entity may divide the training sample set into a preset number of training sample groups in various ways. For example, it may divide the set into equal-sized groups, or it may divide the set so that the number of training samples in each of the preset number of groups is greater than or equal to a preset threshold. Note that the preset number can be set in advance by a technician.
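A minimal sketch of this division step follows. It assumes the training sample set is a Python list of (sample_video, sample_label) pairs; the function name and the round-robin strategy are illustrative conveniences, not part of the embodiment.

```python
import random

def split_into_groups(samples, num_groups):
    """Divide a training sample set into a preset number of groups.

    Hypothetical helper: `samples` is a list of
    (sample_video, sample_label) pairs.
    """
    assert len(samples) >= num_groups, "need at least one sample per group"
    shuffled = samples[:]        # copy so the caller's list is untouched
    random.shuffle(shuffled)     # avoid ordering bias between groups
    # Round-robin assignment: group sizes differ by at most one sample,
    # which realizes the "equal division" strategy described above.
    return [shuffled[i::num_groups] for i in range(num_groups)]
```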
Step 202, regarding a training sample group in a preset number of training sample groups, taking a sample video of a training sample in the training sample group as an input, taking a sample recognition result corresponding to the input sample video as an expected output, and training by using a machine learning method to obtain an initial video recognition model corresponding to the training sample group.
In this embodiment, for the training sample group in the preset number of training sample groups obtained in step 201, the executing entity may use the sample video of the training sample in the training sample group as an input, use the sample recognition result corresponding to the input sample video as an expected output, and train and obtain the initial video recognition model corresponding to the training sample group by using a machine learning method. The initial video recognition model is a model obtained by training with training samples in a training sample group, and can be used for determining a final video recognition model.
As an example, for each training sample group among the preset number of training sample groups, a preset initial model (e.g., a convolutional neural network (CNN) or a residual network (ResNet)) may be trained, finally yielding the preset number of initial video recognition models, one per training sample group. Specifically, for each training sample group, the executing entity may input the sample videos of the training samples in the group into the initial model to obtain recognition results corresponding to the input sample videos, then train the initial model by a machine learning method with the sample recognition results corresponding to the input sample videos as the expected output of the initial model, and determine the trained initial model as that group's initial video recognition model.
In some optional implementations of this embodiment, the executing entity may obtain the preset number of initial video recognition models from the preset number of training sample groups through the following steps:
step 2021, selecting a training sample group from a preset number of training sample groups as a candidate training sample group.
In this embodiment, the executing entity may select a training sample group from the preset number of training sample groups obtained in step 201 as a candidate training sample group, and execute the training steps of step 2022 to step 2024. The manner of selecting the training sample group is not limited in this application; for example, the selection may be random, or a training sample group containing more training samples may be selected first.
Step 2022: taking the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, training the initial model by a machine learning method to obtain an initial video recognition model.
Specifically, the executing entity may obtain the initial video recognition model corresponding to the candidate training sample group through the following steps:
the executing agent may select a training sample from the candidate training sample set, and execute the following steps: inputting a sample video of the selected training sample into the initial model to obtain a recognition result; taking a sample identification result corresponding to the input sample video as expected output of the initial model, and adjusting parameters of the initial model based on the obtained identification result and the sample identification result; determining whether the training sample which is not selected exists in the candidate training sample group; and determining the adjusted initial model as the initial video recognition model corresponding to the candidate training sample group in response to the fact that the unselected training samples do not exist. It should be noted that the selection manner of the training samples is not limited in the present application. For example, the selection may be random, or a training sample with better definition of the sample video may be preferentially selected.
Step 2023: determining whether any unselected training sample group exists among the preset number of training sample groups.
Step 2024: in response to determining that no unselected training sample group exists, obtaining the preset number of initial video recognition models.
It can be understood that, when there is no unselected training sample group in the preset number of training sample groups, that is, for each training sample group in the preset number of training sample groups, a corresponding initial video recognition model is generated by training, so that the executing body may obtain the preset number of initial video recognition models in response to determining that there is no unselected training sample group in the preset number of training sample groups.
Optionally, in response to determining that an unselected training sample group exists, the executing entity may further select a training sample group from the unselected training sample groups as a new candidate training sample group, take the most recently obtained initial video recognition model as the new initial model, and continue to execute training steps 2022-2024, as sketched below.
In this implementation, the executing entity uses the initial video recognition model trained on an earlier-selected training sample group as the initial model for the subsequently selected group, so the sample data can be used effectively to generate a more accurate initial video recognition model.
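The chaining just described can be sketched as follows, reusing the hypothetical train_on_group() above; copying each trained model before moving to the next group is an added assumption so that one model per group can be returned.

```python
import copy

def train_sequentially(initial_model, groups):
    """Steps 2022-2024: the model trained on one group becomes the initial
    model for the next; one initial video recognition model per group."""
    model = initial_model
    initial_models = []
    for group in groups:                  # loop until no group is unselected
        model = train_on_group(model, group)
        initial_models.append(copy.deepcopy(model))  # snapshot this group's model
    return initial_models                 # the preset number of models
```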
Step 203, generating a video recognition model based on the obtained initial video recognition models.
In this embodiment, the executing entity may generate the video recognition model based on the initial video recognition models obtained in step 202.
Specifically, the executing entity may select one of the obtained initial video recognition models as the video recognition model, or process the obtained initial video recognition models to obtain the video recognition model.
As an example, the executing entity may assign the same weight to each of the initial video recognition models based on the number of the obtained initial video recognition models, and further fuse the obtained initial video recognition models based on the assigned weights to obtain the video recognition models.
For example, suppose the obtained initial video recognition models are y = ax + b and y = cx + d, where x is the independent variable representing the model input, y is the dependent variable representing the model output, a and b are the coefficients of the first initial video recognition model, and c and d are the coefficients of the second. Since two initial video recognition models were obtained, each is assigned a weight of 0.5 (0.5 = 1 ÷ 2). Fusing y = ax + b and y = cx + d with these weights, i.e., y = 0.5(ax + b) + 0.5(cx + d), gives the video recognition model y = 0.5(a + c)x + 0.5(b + d).
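Restated as code, using the same toy linear models; the tuple representation (a, b) for y = ax + b is an illustrative assumption.

```python
def fuse_equal_weight(linear_models):
    """Fuse linear models y = a_i*x + b_i with equal weights 1/n."""
    n = len(linear_models)
    a = sum(m[0] for m in linear_models) / n
    b = sum(m[1] for m in linear_models) / n
    return a, b

# Two models (a, b) and (c, d) each get weight 0.5 = 1 / 2, so the
# fused model is y = 0.5*(a + c)*x + 0.5*(b + d), matching the example above.
```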
In some optional implementations of this embodiment, based on the initial video recognition models obtained in steps 2021-2024 of the above optional implementation, the executing entity may directly determine the most recently obtained initial video recognition model as the video recognition model.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present embodiment. In the application scenario of fig. 3, a terminal 301 used by a user may have a model training application installed thereon. After the user opens the application and uploads the training sample set or the storage path of the training sample set, the server 302 providing background support for the application may run a method for generating a model, including:
first, a training sample set 303 may be obtained and the training sample set 303 may be divided into two (a preset number of) training sample groups 304, 305, where the training samples include a sample video and a sample recognition result pre-labeled for the sample video, the sample video being a video obtained by shooting a sample object, and the sample recognition result being used to indicate whether the sample video is a video obtained by shooting a screen displaying the sample object.
Then, for the training sample group 304, the executing entity may take the sample video of the training sample in the training sample group as input, take the sample recognition result corresponding to the input sample video as expected output, and train by using a machine learning method to obtain an initial video recognition model 306 corresponding to the training sample group; for the training sample group 305, the executing entity may obtain an initial video recognition model 307 corresponding to the training sample group by training using a machine learning method, with the sample video of the training sample in the training sample group as an input, and with the sample recognition result corresponding to the input sample video as an expected output.
Finally, the executing entity may generate a video recognition model 308 based on the obtained initial video recognition model 306 and the initial video recognition model 307.
At this time, the server 302 may also transmit prompt information indicating that the model training is completed to the terminal 301. The prompt message may be a voice and/or text message. In this way, the user can acquire the video identification model at a preset storage position.
In the method provided by the above embodiment of the present application, a training sample set is obtained and divided into a preset number of training sample groups; then, for each training sample group among the preset number of groups, an initial video recognition model corresponding to that group is trained by a machine learning method, taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output; finally, a video recognition model is generated based on the obtained initial video recognition models. A model that can be used to recognize videos is thereby obtained, enriching the ways in which models can be generated.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for generating a model is shown. The process 400 of the method for generating a model includes the steps of:
Step 401, obtaining a training sample set, and dividing the training sample set into a preset number of training sample groups.
In this embodiment, the executing entity of the method for generating a model (e.g., the server shown in fig. 1) may obtain a training sample set from a database server (e.g., the database server 104 shown in fig. 1) or a terminal (e.g., the terminals 101 and 102 shown in fig. 1) through a wired or wireless connection, and divide the training sample set into a preset number of training sample groups.
It should be noted that step 401 may be implemented in a similar manner to step 201 in the foregoing embodiment. Accordingly, the above description regarding step 201 is also applicable to step 401 of this embodiment, and is not repeated here.
Step 402, determining values characterizing the quality of the preset number of training sample groups.
In this embodiment, for the preset number of training sample groups obtained in step 401, the executing entity may determine values characterizing the quality of the training sample groups. It may do so in various ways; for example, it may count the number of training samples contained in each training sample group and use that count as the group's quality value. Understandably, the more training samples a group contains, the more parameter adjustments can be made to the initial model and the more accurate the trained initial recognition model may be, which is why the number of training samples in a group can serve as its quality value.
Here, the correspondence between the magnitude of the value and the quality can be set in advance by a technician: either the larger the value, the better the training sample group, or the smaller the value, the better the training sample group.
In some optional implementations of this embodiment, the executing entity may determine the values characterizing the quality of the preset number of training sample groups as follows.
First, the executing entity may obtain a preset verification sample set, where each verification sample comprises a verification video and a verification recognition result labeled in advance for that video.
Then, for each training sample group among the preset number of training sample groups, the executing entity may execute the following steps: taking the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as output, train by a machine learning method a to-be-verified video recognition model corresponding to the group; input the verification videos of the verification samples in the verification sample set into that to-be-verified video recognition model to obtain actual recognition results; determine the loss of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generate, based on the determined loss values, a value characterizing the quality of the group.
Here, the loss value characterizes the difference between the actual output and the expected output. Understandably, the smaller the difference, the more accurate the trained to-be-verified video recognition model, and hence the better the training sample group used. Based on this relationship between the loss value and the group's quality, the executing entity may generate the quality value from the determined loss value in various ways. For example, the loss value itself may be used as the quality value, in which case a smaller value indicates a better training sample group; or the reciprocal of the loss value may be used, in which case a larger value indicates a better training sample group.
Note that the executing entity may compute the loss of the actual recognition results relative to the corresponding verification recognition results using any of various preset loss functions; for example, the L2 norm may be used as the loss function.
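A minimal sketch of this scoring step, under the same PyTorch conventions as the earlier sketches; the reciprocal convention (larger value, better group) follows one of the options described above, and the small smoothing constant is an added assumption to avoid division by zero.

```python
import torch

def group_quality(model_to_verify, verification_set):
    """Score one training sample group via its model's loss on the
    verification sample set, using the L2 norm as the loss function."""
    losses = []
    with torch.no_grad():
        for video, label in verification_set:
            actual = model_to_verify(video).view(-1)   # actual recognition result
            expected = torch.tensor([float(label)])    # verification result
            losses.append(torch.norm(actual - expected, p=2).item())
    mean_loss = sum(losses) / len(losses)
    return 1.0 / (mean_loss + 1e-8)   # reciprocal of the loss value
```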
Step 403, based on the determined values, selecting the optimal training sample group from the preset number of training sample groups as a candidate training sample group.
In this embodiment, the executing entity may select an optimal training sample set from the preset number of training sample sets obtained in step 401 as a candidate training sample set based on the value determined in step 402, and execute the training steps from step 404 to step 406.
It should be noted that this embodiment specifically selects the optimal training sample group from the preset number of groups as the candidate training sample group. Therefore, under the convention that a larger quality value indicates a better group, the executing entity selects the training sample group corresponding to the largest determined value as the candidate training sample group; under the convention that a smaller value indicates a better group, it selects the training sample group corresponding to the smallest determined value.
Step 404, taking the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, training the initial model by a machine learning method to obtain an initial video recognition model.
Specifically, the executing entity may obtain the initial video recognition model corresponding to the candidate training sample group through the following steps:
the executing agent may select a training sample from the candidate training sample set, and execute the following steps: inputting a sample video of the selected training sample into the initial model to obtain a recognition result; taking a sample identification result corresponding to the input sample video as expected output of the initial model, and adjusting parameters of the initial model based on the obtained identification result and the sample identification result; determining whether the training sample which is not selected exists in the candidate training sample group; and determining the adjusted initial model as the initial video recognition model corresponding to the candidate training sample group in response to the fact that the unselected training samples do not exist. It should be noted that the selection manner of the training samples is not limited in the present application. For example, the selection may be random, or a training sample with better definition of the sample video may be preferentially selected.
Step 405, determining whether there is an unselected training sample group in a preset number of training sample groups.
Step 406, in response to determining that no unselected training sample group exists, obtaining the preset number of initial video recognition models.
It can be understood that, when there is no unselected training sample group in the preset number of training sample groups, that is, for each training sample group in the preset number of training sample groups, a corresponding initial video recognition model is generated by training, so that the executing body may obtain the preset number of initial video recognition models in response to determining that there is no unselected training sample group in the preset number of training sample groups.
In some optional implementations of this embodiment, in response to determining that an unselected training sample group exists, the executing entity may further select, based on the determined values, the optimal training sample group from the unselected training sample groups as a new candidate training sample group, take the most recently obtained initial video recognition model as the new initial model, and continue to execute training steps 404-406, as sketched below.
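Under the larger-is-better convention, this best-first loop reduces to visiting the groups in descending order of their quality values; a short sketch reusing the train_sequentially() helper above:

```python
def train_best_first(initial_model, groups, quality_values):
    """Steps 403-406: always pick the best unselected group next."""
    order = sorted(range(len(groups)),
                   key=lambda i: quality_values[i], reverse=True)
    return train_sequentially(initial_model, [groups[i] for i in order])
```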
Step 407, generating a video recognition model based on the obtained initial video recognition model.
In this embodiment, the executing entity may generate the video recognition model based on the initial video recognition models obtained in step 406.
Specifically, the executing entity may select one of the obtained initial video recognition models as the video recognition model, or process the obtained initial video recognition models to obtain the video recognition model.
In some optional implementations of this embodiment, the executing entity may generate the video recognition model as follows. First, it may assign weights to the obtained initial video recognition models based on the values determined in step 402. Then, it may fuse the obtained initial video recognition models based on the assigned weights to generate the video recognition model. Specifically, the executing entity may judge the quality of each training sample group from the determined values and assign weights in various ways, such that the initial video recognition model corresponding to a better training sample group receives a larger weight and the model corresponding to a worse group receives a smaller weight, as in the sketch below.
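One way to realize "better group, larger weight" is to normalize the quality values into weights that sum to one; a sketch following the linear toy models used earlier (normalization by the sum is an assumption, just one of the "various ways" mentioned above):

```python
def fuse_by_quality(linear_models, quality_values):
    """Weight each initial model's coefficients by its group's quality."""
    total = sum(quality_values)
    weights = [q / total for q in quality_values]  # better group, larger weight
    a = sum(w * m[0] for w, m in zip(weights, linear_models))
    b = sum(w * m[1] for w, m in zip(weights, linear_models))
    return a, b   # fused video recognition model y = a*x + b
```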
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating a model in this embodiment highlights determining values characterizing the quality of the preset number of training sample groups and then selecting, based on those values, which training sample group to train on next. The scheme described in this embodiment can therefore train first on a better training sample group to obtain a more accurate initial video recognition model, so that subsequent training only needs to make smaller adjustments to that model, improving the efficiency of model generation.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating a model, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a model of the present embodiment includes: a sample acquisition unit 501, a model training unit 502, and a model generation unit 503. The sample acquiring unit 501 is configured to acquire a training sample set, and divide the training sample set into a preset number of training sample groups, where a training sample includes a sample video and a sample identification result labeled in advance for the sample video, the sample video is a video obtained by shooting a sample object, and the sample identification result is used to indicate whether the sample video is a video obtained by shooting a screen displaying the sample object; the model training unit 502 is configured to, for a training sample group in a preset number of training sample groups, take a sample video of a training sample in the training sample group as an input, take a sample recognition result corresponding to the input sample video as an expected output, and train by using a machine learning method to obtain an initial video recognition model corresponding to the training sample group; the model generation unit 503 is configured to generate a video recognition model based on the obtained initial video recognition model.
In this embodiment, the sample acquiring unit 501 of the apparatus 500 for generating a model may acquire a training sample set from a database server (e.g., the database server 104 shown in fig. 1) or a terminal (e.g., the terminals 101 and 102 shown in fig. 1) through a wired connection manner or a wireless connection manner, and divide the training sample set into a preset number of training sample groups. The training samples comprise sample videos and sample identification results which are labeled in advance aiming at the sample videos. The sample video may be a video obtained by photographing a sample object. The sample object may be various things.
In this embodiment, the sample recognition result may include, but is not limited to, at least one of the following: characters, numbers, symbols. The sample recognition result may be used to indicate whether the sample video is a video obtained by photographing a screen on which the above sample object is displayed.
In this embodiment, the sample acquiring unit 501 may divide the training sample set into a preset number of training sample groups in various ways. It should be noted that the preset number can be preset by a technician.
In this embodiment, for a training sample group in a preset number of training sample groups obtained by the sample obtaining unit 501, the model training unit 502 may take a sample video of a training sample in the training sample group as an input, take a sample recognition result corresponding to the input sample video as an expected output, and train by using a machine learning method to obtain an initial video recognition model corresponding to the training sample group. The initial video recognition model is a model obtained by training with training samples in a training sample group, and can be used for determining a final video recognition model.
In this embodiment, the model generation unit 503 may generate the video recognition model based on the initial video recognition model obtained by the model training unit 502.
Specifically, the model generation unit 503 may select one of the obtained initial video recognition models as the video recognition model, or process the obtained initial video recognition models to obtain the video recognition model.
In some optional implementations of this embodiment, the model training unit 502 may include: a first execution module (not shown in the figure) configured to select a training sample set from a preset number of training sample sets as a candidate training sample set, and based on the candidate training sample set and the initial model, execute the following training steps: taking a sample video of a training sample in a candidate training sample group as input, taking a sample identification result corresponding to the input sample video as expected output, and training an initial model by using a machine learning method to obtain an initial video identification model; determining whether an unselected training sample group exists in a preset number of training sample groups; in response to determining that there are no unselected training sample sets, a preset number of initial video recognition models are obtained.
In some optional implementations of this embodiment, the model training unit 502 may further include: and a second execution module (not shown in the figure) configured to, in response to determining that the unselected training sample set exists, select the training sample set from the unselected training sample set as a new candidate training sample set, use the newly obtained initial video recognition model as a new initial model, and continue to execute the training step.
In some optional implementations of this embodiment, the model training unit 502 may include: a value determining module (not shown in the figures) configured to determine a value characterizing the quality of each of the preset number of training sample groups; a third executing module (not shown in the figure) configured to select an optimal training sample group from the preset number of training sample groups as a candidate training sample group based on the determined values, and execute the following training steps based on the candidate training sample group and the initial model: taking a sample video of a training sample in the candidate training sample group as input, taking the sample identification result corresponding to the input sample video as expected output, and training the initial model by using a machine learning method to obtain an initial video identification model; determining whether an unselected training sample group exists among the preset number of training sample groups; in response to determining that no unselected training sample group exists, obtaining the preset number of initial video recognition models.
In some optional implementations of this embodiment, the model training unit 502 may further include: a fourth executing module (not shown in the figure) configured to, in response to determining that unselected training sample groups exist, select an optimal training sample group from the unselected training sample groups as a new candidate training sample group based on the determined values, and continue to execute the training steps by using the most recently obtained initial video recognition model as a new initial model.
In some optional implementations of this embodiment, the value determining module (not shown in the figure) may include: a sample acquisition module (not shown in the figure) configured to acquire a preset verification sample set, where each verification sample includes a verification video and a verification recognition result pre-labeled for that video; a value generation module (not shown in the figure) configured to perform, for each training sample group of the preset number of training sample groups, the following steps: taking a sample video of a training sample in the group of training samples as input, taking the sample identification result corresponding to the input sample video as output, and training by using a machine learning method to obtain a to-be-verified video identification model corresponding to the group of training samples; inputting the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model corresponding to the group to obtain actual recognition results; determining loss values of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generating, based on the determined loss values, a value characterizing the quality of the group of training samples.
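A hedged sketch of this value-determination step is given below: a to-be-verified model is trained on each group, its loss is measured on a fixed verification sample set, and the loss is mapped to a quality value (lower loss yields a higher value). The make_model factory, the predict interface, and the 1/(1+loss) mapping are assumptions of this illustration, not the embodiment's formula; train_initial_model is the hypothetical routine sketched earlier.

def score_groups(groups, make_model, verification_samples, loss_fn):
    # verification_samples: list of (verification_video, verification_result) pairs
    values = []
    for group in groups:
        candidate = train_initial_model(make_model(), group)  # to-be-verified model
        total_loss = 0.0
        for video, expected in verification_samples:
            actual = candidate.predict(video)        # actual recognition result
            total_loss += loss_fn(actual, expected)  # loss vs. pre-labeled result
        mean_loss = total_loss / len(verification_samples)
        values.append(1.0 / (1.0 + mean_loss))       # lower loss -> higher value
    return values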
In some optional implementations of this embodiment, the model generating unit 503 may include: a weight assignment module (not shown in the figure) configured to assign weights to the obtained initial video recognition models based on the determined values; and a model fusion module (not shown in the figure) configured to fuse the obtained initial video recognition models based on the assigned weights to generate the video recognition model.
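The embodiment does not fix a particular fusion formula. One plausible reading, sketched below under that assumption, normalizes the quality values into weights and averages the initial models' scores; the predict_score interface on each model is likewise an assumption of this illustration.

class FusedVideoRecognizer:
    def __init__(self, initial_models, values):
        total = float(sum(values))
        self.models = initial_models
        self.weights = [value / total for value in values]  # value-proportional weights

    def predict_score(self, video):
        # Weighted average of each initial model's score that the video was
        # obtained by shooting a screen displaying the object.
        return sum(weight * model.predict_score(video)
                   for weight, model in zip(self.weights, self.models))

    def predict(self, video, threshold=0.5):
        return self.predict_score(video) >= threshold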
In some optional implementations of this embodiment, the model generating unit 503 may be further configured to determine the most recently obtained initial video recognition model as the video recognition model.
In the apparatus 500 provided by the foregoing embodiment of the present application, the sample obtaining unit 501 obtains a training sample set and divides it into a preset number of training sample groups, where each training sample includes a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result indicates whether the sample video was obtained by shooting a screen displaying the sample object. For each training sample group among the preset number of groups, the model training unit 502 takes the sample videos of the training samples in the group as input, takes the corresponding sample recognition results as expected output, and obtains an initial video recognition model for the group by training with a machine learning method. Finally, the model generating unit 503 generates the video recognition model based on the obtained initial video recognition models. A model usable for recognizing videos is thereby obtained, which enriches the ways in which models can be generated.
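For orientation only, the hypothetical sketches above can be tied together as follows. Every name here (training_samples, make_model, verification_samples, loss_fn) is an assumption of this illustration, not an interface defined by the embodiment, and the better-groups-first ordering reflects the optional implementations described above.

groups = make_groups(training_samples, preset_number=5)
values = score_groups(groups, make_model, verification_samples, loss_fn)
ordered = sorted(zip(values, groups), key=lambda pair: pair[0], reverse=True)
sorted_values = [value for value, group in ordered]
sorted_groups = [group for value, group in ordered]
initial_models = train_over_groups(sorted_groups, make_model())
video_recognition_model = FusedVideoRecognizer(initial_models, sorted_values)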
Referring to fig. 6, a flow 600 of one embodiment of a method for identifying videos provided herein is shown. The method for identifying a video may include the steps of:
step 601, obtaining a video to be identified.
In the present embodiment, the execution subject (for example, the server 105 shown in fig. 1) of the method for identifying a video may acquire the to-be-identified video through a wired or wireless connection. For example, the execution subject may obtain a video stored in a database server (e.g., database server 104 shown in fig. 1), or may receive a video captured by a terminal (e.g., terminals 101 and 102 shown in fig. 1) or another device.
In this embodiment, the video to be recognized may be a video obtained by shooting an object. The object may be any of various things, such as a person or an animal, or a behavior, such as running or swimming.
Step 602, inputting the video to be recognized into the video recognition model, and generating a recognition result corresponding to the video to be recognized.
In this embodiment, the executing entity may input the video to be recognized obtained in step 601 into the video recognition model, so as to generate a recognition result corresponding to the video to be recognized. The recognition result may be used to indicate whether the video to be recognized is a video obtained by shooting a screen displaying the object.
In this embodiment, the video recognition model may be generated using the method described above in the embodiment of fig. 2. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
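A minimal sketch of steps 601-602, assuming the hypothetical model object from the sketches above and an already-loaded video object; both assumptions are for illustration only.

def identify_video(video_recognition_model, video_to_identify):
    # Step 601 has obtained the video to be identified (e.g., from a database
    # server or a terminal); step 602 feeds it to the video recognition model.
    recognition_result = video_recognition_model.predict(video_to_identify)
    # True indicates a video obtained by shooting a screen displaying the object.
    return recognition_result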
It should be noted that the method for identifying a video according to the present embodiment may be used to test the video recognition models generated by the foregoing embodiments, and the video recognition model may then be further optimized according to the test results. The method may also be a practical application of the video recognition model generated by the foregoing embodiments. Performing video identification with a video recognition model generated as described in the foregoing embodiments enables detection of videos obtained by recording a screen, thereby improving the accuracy of video identification.
With continuing reference to FIG. 7, the present application provides one embodiment of an apparatus for identifying video as an implementation of the method illustrated in FIG. 6 above. The embodiment of the device corresponds to the embodiment of the method shown in fig. 6, and the device can be applied to various electronic devices.
As shown in fig. 7, the apparatus 700 for recognizing a video of the present embodiment may include: a video acquisition unit 701 and a result generation unit 702. The video acquiring unit 701 is configured to acquire a video to be identified, where the video to be identified is a video obtained by shooting an object; the result generating unit 702 is configured to input the video to be recognized into the model generated by the method described in the embodiment of fig. 2 and generate a recognition result corresponding to the video to be recognized, where the recognition result is used to indicate whether the video to be recognized is a video obtained by shooting a screen displaying the object.
It will be understood that the elements described in the apparatus 700 correspond to various steps in the method described with reference to fig. 6. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 700 and the units included therein, and will not be described herein again.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a touch panel, a keyboard, a mouse, a camera, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 801.

It should be noted that the computer readable medium of the present application can be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a sample acquisition unit, a model training unit, and a model generation unit. As another example, it can also be described as: a processor includes an acquisition unit, a training unit, and a generation unit. Where the names of these units do not in some cases constitute a limitation on the units themselves, for example, a sample acquisition unit may also be described as a "unit that acquires a set of training samples".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: the method comprises the steps of obtaining a training sample set, and dividing the training sample set into a preset number of training sample groups, wherein training samples comprise sample videos and sample identification results labeled in advance aiming at the sample videos, the sample videos are videos obtained by shooting sample objects, and the sample identification results are used for indicating whether the sample videos are videos obtained by shooting screens displaying the sample objects; for a training sample group in a preset number of training sample groups, taking a sample video of a training sample in the training sample group as input, taking a sample recognition result corresponding to the input sample video as expected output, and training by utilizing a machine learning method to obtain an initial video recognition model corresponding to the training sample group; and generating a video identification model based on the obtained initial video identification model.
Further, the one or more programs, when executed by the electronic device, may further cause the electronic device to: acquire a video to be identified, wherein the video to be identified is a video obtained by shooting an object; and input the video to be recognized into the video recognition model, generating a recognition result corresponding to the video to be recognized, wherein the recognition result is used for indicating whether the video to be recognized is a video obtained by shooting a screen displaying the object. The video recognition model may be generated using the method for generating a model as described in the embodiments above.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. A method for generating a model, comprising:
the method comprises the steps of obtaining a training sample set, and dividing the training sample set into a preset number of training sample groups, wherein training samples comprise sample videos and sample identification results labeled in advance for the sample videos, the sample videos are videos obtained by shooting sample objects, and the sample identification results are used for indicating whether the sample videos are videos obtained by shooting screens displaying the sample objects;
for the training sample group in the preset number of training sample groups, taking the sample video of the training sample in the training sample group as input, taking the sample recognition result corresponding to the input sample video as expected output, and obtaining an initial video recognition model corresponding to the training sample group by training with a machine learning method, specifically comprising: based on the quality of the preset number of training sample groups, selecting an optimal training sample group from the preset number of training sample groups as a candidate training sample group, and based on the candidate training sample group and an initial model, executing the following training steps: taking a sample video of a training sample in the candidate training sample group as input, taking the sample identification result corresponding to the input sample video as expected output, and training the initial model by using a machine learning method to obtain an initial video identification model; determining whether an unselected training sample group exists among the preset number of training sample groups; in response to determining that an unselected training sample group exists, selecting an optimal training sample group from the unselected training sample groups as a new candidate training sample group, taking the most recently obtained initial video recognition model as a new initial model, and continuing to execute the training steps;
and generating a video identification model based on the obtained initial video identification model.
2. The method according to claim 1, wherein for the training sample group in the preset number of training sample groups, taking a sample video of a training sample in the training sample group as an input, taking a sample recognition result corresponding to the input sample video as an expected output, and training by using a machine learning method to obtain an initial video recognition model corresponding to the training sample group, includes:
in response to determining that there is no unselected training sample group, obtaining the preset number of initial video recognition models.
3. The method according to claim 1, wherein for the training sample group in the preset number of training sample groups, taking a sample video of a training sample in the training sample group as an input, taking a sample recognition result corresponding to the input sample video as an expected output, and training by using a machine learning method to obtain an initial video recognition model corresponding to the training sample group, includes:
determining a numerical value characterizing the quality of the preset number of training sample groups;
based on the determined values, selecting an optimal training sample group from the preset number of training sample groups as a candidate training sample group, and based on the candidate training sample group and the initial model, performing the following training steps: taking a sample video of a training sample in the candidate training sample group as input, taking the sample identification result corresponding to the input sample video as expected output, and training the initial model by using a machine learning method to obtain an initial video identification model; determining whether an unselected training sample group exists among the preset number of training sample groups; obtaining the preset number of initial video recognition models in response to determining that there is no unselected training sample group;
in response to determining that an unselected training sample group exists, selecting, based on the determined values, an optimal training sample group from the unselected training sample groups as a new candidate training sample group, taking the most recently obtained initial video recognition model as a new initial model, and continuing to perform the training steps.
4. The method of claim 3, wherein the determining a value characterizing the quality of the preset number of training sample groups comprises:
acquiring a preset verification sample set, wherein the verification samples comprise verification videos and verification identification results labeled in advance for the verification videos;
for each training sample group of the preset number of training sample groups, executing the following steps: taking a sample video of a training sample in the group of training samples as input, taking the sample identification result corresponding to the input sample video as output, and training by using a machine learning method to obtain a to-be-verified video identification model corresponding to the group of training samples; inputting the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model corresponding to the group to obtain actual recognition results; determining loss values of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generating, based on the determined loss values, a value characterizing the quality of the group of training samples.
5. The method according to one of claims 3-4, wherein said generating a video recognition model based on the obtained initial video recognition model comprises:
assigning weights to the obtained initial video recognition models based on the determined values;
and fusing the obtained initial video recognition models based on the assigned weights to generate the video recognition model.
6. The method according to one of claims 2 to 4, wherein said generating a video recognition model based on the obtained initial video recognition model comprises:
and determining the most recently obtained initial video recognition model as the video recognition model.
7. An apparatus for generating a model, comprising:
the device comprises a sample acquisition unit, a comparison unit and a comparison unit, wherein the sample acquisition unit is configured to acquire a training sample set and divide the training sample set into a preset number of training sample groups, wherein training samples comprise sample videos and sample identification results labeled in advance for the sample videos, the sample videos are videos obtained by shooting sample objects, and the sample identification results are used for indicating whether the sample videos are videos obtained by shooting screens displaying the sample objects;
the model training unit is configured to, for a training sample group in the preset number of training sample groups, take a sample video of a training sample in the training sample group as an input, take a sample recognition result corresponding to the input sample video as an expected output, and train by using a machine learning method to obtain an initial video recognition model corresponding to the training sample group, and specifically includes: a first execution module configured to select an optimal training sample group from the preset number of training sample groups as a candidate training sample group based on the quality of the preset number of training sample groups, and execute the following training steps based on the candidate training sample group and an initial model: taking a sample video of a training sample in the candidate training sample group as input, taking the sample identification result corresponding to the input sample video as expected output, and training the initial model by using a machine learning method to obtain an initial video identification model; determining whether an unselected training sample group exists among the preset number of training sample groups; and a second execution module configured to, in response to determining that an unselected training sample group exists, select an optimal training sample group from the unselected training sample groups as a new candidate training sample group, use the most recently obtained initial video recognition model as a new initial model, and continue to execute the training steps;
a model generation unit configured to generate a video recognition model based on the obtained initial video recognition model.
8. The apparatus of claim 7, wherein the first execution module is further configured to obtain the preset number of initial video recognition models in response to determining that there is no unselected training sample group.
9. The apparatus of claim 7, wherein the model training unit comprises:
a value determination module configured to determine a value characterizing the quality of the preset number of training sample groups;
a third executing module configured to select an optimal training sample group from the preset number of training sample groups as a candidate training sample group based on the determined values, and execute the following training steps based on the candidate training sample group and the initial model: taking a sample video of a training sample in the candidate training sample group as input, taking the sample identification result corresponding to the input sample video as expected output, and training the initial model by using a machine learning method to obtain an initial video identification model; determining whether an unselected training sample group exists among the preset number of training sample groups; obtaining the preset number of initial video recognition models in response to determining that there is no unselected training sample group;
and a fourth execution module configured to, in response to determining that an unselected training sample group exists, select, based on the determined values, an optimal training sample group from the unselected training sample groups as a new candidate training sample group, use the most recently obtained initial video recognition model as a new initial model, and continue to execute the training steps.
10. The apparatus of claim 9, wherein the numerical determination module comprises:
the system comprises a sample acquisition module, a verification module and a verification module, wherein the sample acquisition module is configured to acquire a preset verification sample set, and the verification samples comprise verification videos and verification identification results which are labeled in advance for the verification videos;
a value generation module configured to perform, for each training sample group of the preset number of training sample groups, the following steps: taking a sample video of a training sample in the group of training samples as input, taking the sample identification result corresponding to the input sample video as output, and training by using a machine learning method to obtain a to-be-verified video identification model corresponding to the group of training samples; inputting the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model corresponding to the group to obtain actual recognition results; determining loss values of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generating, based on the determined loss values, a value characterizing the quality of the group of training samples.
11. The apparatus according to one of claims 9-10, wherein the model generation unit comprises:
a weight assignment module configured to assign weights to the obtained initial video recognition models based on the determined values;
and a model fusion module configured to fuse the obtained initial video recognition models based on the assigned weights to generate the video recognition model.
12. The apparatus according to one of claims 8-10, wherein the model generation unit is further configured to:
and determining the most recently obtained initial video recognition model as the video recognition model.
13. A method for identifying videos, comprising:
acquiring a video to be identified, wherein the video to be identified is a video obtained by shooting an object;
inputting the video to be recognized into a video recognition model generated by adopting the method according to any one of claims 1 to 6, and generating a recognition result corresponding to the video to be recognized, wherein the recognition result is used for indicating whether the video to be recognized is a video obtained by shooting a screen displaying the object.
14. An apparatus for identifying video, comprising:
a video acquisition unit configured to acquire a video to be identified, wherein the video to be identified is a video obtained by shooting a subject;
a result generating unit, configured to input the video to be recognized into a video recognition model generated by the method according to any one of claims 1 to 6, and generate a recognition result corresponding to the video to be recognized, wherein the recognition result is used for indicating whether the video to be recognized is a video obtained by shooting a screen displaying the object.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6 and 13.
16. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the method according to any one of claims 1-6, 13.
CN201810617804.4A 2018-06-15 2018-06-15 Method and apparatus for generating a model Active CN108805091B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810617804.4A CN108805091B (en) 2018-06-15 2018-06-15 Method and apparatus for generating a model
PCT/CN2018/116339 WO2019237657A1 (en) 2018-06-15 2018-11-20 Method and device for generating model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810617804.4A CN108805091B (en) 2018-06-15 2018-06-15 Method and apparatus for generating a model

Publications (2)

Publication Number Publication Date
CN108805091A (en) 2018-11-13
CN108805091B (en) 2021-08-10

Family

ID=64086183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810617804.4A Active CN108805091B (en) 2018-06-15 2018-06-15 Method and apparatus for generating a model

Country Status (2)

Country Link
CN (1) CN108805091B (en)
WO (1) WO2019237657A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805091B (en) * 2018-06-15 2021-08-10 北京字节跳动网络技术有限公司 Method and apparatus for generating a model
CN109492128B (en) * 2018-10-30 2020-01-21 北京字节跳动网络技术有限公司 Method and apparatus for generating a model
CN109816023B (en) * 2019-01-29 2022-01-04 北京字节跳动网络技术有限公司 Method and device for generating picture label model
CN109740018B (en) * 2019-01-29 2021-03-02 北京字节跳动网络技术有限公司 Method and device for generating video label model
CN110007755A (en) * 2019-03-15 2019-07-12 百度在线网络技术(北京)有限公司 Object event triggering method, device and its relevant device based on action recognition
CN110009101B (en) * 2019-04-11 2020-09-25 北京字节跳动网络技术有限公司 Method and apparatus for generating a quantized neural network
CN111949860B (en) * 2019-05-15 2022-02-08 北京字节跳动网络技术有限公司 Method and apparatus for generating a relevance determination model
CN110619537A (en) * 2019-06-18 2019-12-27 北京无限光场科技有限公司 Method and apparatus for generating information
CN113138847A (en) * 2020-01-19 2021-07-20 京东数字科技控股有限公司 Computer resource allocation scheduling method and device based on federal learning
CN112200218B (en) * 2020-09-10 2023-06-20 浙江大华技术股份有限公司 Model training method and device and electronic equipment
CN112101566A (en) * 2020-09-11 2020-12-18 石化盈科信息技术有限责任公司 Prediction model training method, price prediction method, storage medium, and electronic device
CN112101464B (en) * 2020-09-17 2024-03-15 西安锐思数智科技股份有限公司 Deep learning-based image sample data acquisition method and device
CN112149807A (en) * 2020-09-28 2020-12-29 北京百度网讯科技有限公司 Method and device for processing user characteristic information
CN112819078B (en) * 2021-02-04 2023-12-15 上海明略人工智能(集团)有限公司 Iteration method and device for picture identification model
CN112925785A (en) * 2021-03-29 2021-06-08 中国建设银行股份有限公司 Data cleaning method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529598A (en) * 2016-11-11 2017-03-22 北京工业大学 Classification method and system based on imbalanced medical image data set

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833569A (en) * 2010-04-08 2010-09-15 中国科学院自动化研究所 Method for automatically identifying film human face image
US9767540B2 (en) * 2014-05-16 2017-09-19 Adobe Systems Incorporated Patch partitions and image processing
CN104598972A (en) * 2015-01-22 2015-05-06 清华大学 Quick training method of large-scale data recurrent neutral network (RNN)
CN105354543A (en) * 2015-10-29 2016-02-24 小米科技有限责任公司 Video processing method and apparatus
CN105912500B (en) * 2016-03-30 2017-11-14 百度在线网络技术(北京)有限公司 Machine learning model generation method and device
CN107766868A (en) * 2016-08-15 2018-03-06 中国联合网络通信集团有限公司 A kind of classifier training method and device
CN107992783A (en) * 2016-10-26 2018-05-04 上海银晨智能识别科技有限公司 Face image processing process and device
CN106529008B (en) * 2016-11-01 2019-11-26 天津工业大学 A kind of double integrated offset minimum binary modeling methods based on Monte Carlo and LASSO
CN106897746B (en) * 2017-02-28 2020-03-03 北京京东尚科信息技术有限公司 Data classification model training method and device
CN107423673A (en) * 2017-05-11 2017-12-01 上海理湃光晶技术有限公司 A kind of face identification method and system
CN107657243B (en) * 2017-10-11 2019-07-02 电子科技大学 Neural network Radar range profile's target identification method based on genetic algorithm optimization
CN107766940B (en) * 2017-11-20 2021-07-23 北京百度网讯科技有限公司 Method and apparatus for generating a model
CN107967491A (en) * 2017-12-14 2018-04-27 北京木业邦科技有限公司 Machine learning method, device, electronic equipment and the storage medium again of plank identification
CN108805091B (en) * 2018-06-15 2021-08-10 北京字节跳动网络技术有限公司 Method and apparatus for generating a model


Also Published As

Publication number Publication date
WO2019237657A1 (en) 2019-12-19
CN108805091A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108805091B (en) Method and apparatus for generating a model
CN108830235B (en) Method and apparatus for generating information
CN109858445B (en) Method and apparatus for generating a model
CN108520220B (en) Model generation method and device
CN108960316B (en) Method and apparatus for generating a model
CN109492128B (en) Method and apparatus for generating a model
CN111476871B (en) Method and device for generating video
CN109101919B (en) Method and apparatus for generating information
WO2020000879A1 (en) Image recognition method and apparatus
CN109829432B (en) Method and apparatus for generating information
CN108197618B (en) Method and device for generating human face detection model
CN109376267B (en) Method and apparatus for generating a model
CN109993150B (en) Method and device for identifying age
CN109981787B (en) Method and device for displaying information
CN109034069B (en) Method and apparatus for generating information
JP7112537B2 (en) Information processing method and device, electronic device, computer-readable storage medium and program
CN110084317B (en) Method and device for recognizing images
CN109214501B (en) Method and apparatus for identifying information
CN109145783B (en) Method and apparatus for generating information
CN108510084B (en) Method and apparatus for generating information
CN110070076B (en) Method and device for selecting training samples
CN110211121B (en) Method and device for pushing model
CN111523413A (en) Method and device for generating face image
US10318639B2 (en) Intelligent action recommendation
CN113395538B (en) Sound effect rendering method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder
Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Douyin Vision Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: Tiktok vision (Beijing) Co.,Ltd.
CP01 Change in the name or title of a patent holder
Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Tiktok vision (Beijing) Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.