WO2019237657A1 - Method and apparatus for generating a model - Google Patents

Method and apparatus for generating a model

Info

Publication number
WO2019237657A1
WO2019237657A1 PCT/CN2018/116339 CN2018116339W WO2019237657A1 WO 2019237657 A1 WO2019237657 A1 WO 2019237657A1 CN 2018116339 W CN2018116339 W CN 2018116339W WO 2019237657 A1 WO2019237657 A1 WO 2019237657A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
sample
training
training sample
model
Prior art date
Application number
PCT/CN2018/116339
Other languages
English (en)
French (fr)
Inventor
李伟健
王长虎
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2019237657A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for generating a model.
  • The embodiments of the present application provide a method and a device for generating a model, and a method and a device for identifying a video.
  • In a first aspect, an embodiment of the present application provides a method for generating a model.
  • The method includes: obtaining a training sample set and dividing the training sample set into a preset number of training sample groups, where each training sample includes a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result indicates whether the sample video is a video obtained by shooting a screen displaying the sample object; for each training sample group in the preset number of training sample groups, using the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and training with a machine learning method to obtain an initial video recognition model corresponding to the group; and generating a video recognition model based on the obtained initial video recognition models.
  • Training an initial video recognition model for each training sample group, with the sample videos of the training samples in the group as input and the corresponding sample recognition results as expected output, includes: selecting a training sample group from the preset number of training sample groups as a candidate training sample group, and performing the following training steps based on the candidate training sample group and an initial model: using the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as expected output, training the initial model with a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group remains among the preset number of training sample groups; and, in response to determining that no unselected training sample group remains, obtaining the preset number of initial video recognition models.
  • Training an initial video recognition model for each training sample group further includes: in response to determining that an unselected training sample group remains, selecting a training sample group from the unselected training sample groups as a new candidate training sample group, using the most recently obtained initial video recognition model as the new initial model, and continuing to perform the training steps.
  • Training an initial video recognition model for each training sample group, with the sample videos of the training samples in the group as input and the corresponding sample recognition results as expected output, includes: determining values that characterize the quality of the preset number of training sample groups; selecting, based on the determined values, the best training sample group from the preset number of training sample groups as a candidate training sample group; and performing the following training steps based on the candidate training sample group and an initial model: using the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as expected output, training the initial model with a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group remains among the preset number of training sample groups; and, in response to determining that no unselected training sample group remains, obtaining the preset number of initial video recognition models.
  • Training an initial video recognition model for each training sample group further includes: in response to determining that an unselected training sample group remains, selecting, based on the determined values, the best training sample group from the unselected training sample groups as a new candidate training sample group, using the most recently obtained initial video recognition model as the new initial model, and continuing to perform the training steps.
  • Determining the values that characterize the quality of the preset number of training sample groups includes: obtaining a preset verification sample set, where each verification sample includes a verification video and a verification recognition result pre-labeled for the verification video; and, for each training sample group in the preset number of training sample groups, performing the following steps: using the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, training with a machine learning method to obtain a to-be-verified video recognition model corresponding to the group; inputting the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model to obtain actual recognition results; determining a loss value of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generating, based on the determined loss value, a value that characterizes the quality of the training sample group.
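  • The text does not say how a loss value is turned into a quality value. The following is a minimal sketch under stated assumptions: each to-be-verified model is a callable returning the probability that a video was shot from a screen, the loss is the mean binary cross-entropy, and the value is 1 / (1 + loss). The names mean_log_loss and quality_value, the stub models, and the file names are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical types: a "video" is just an identifier here, and a model is any
# callable returning the probability that the video was shot from a screen.
Video = str
Model = Callable[[Video], float]


def mean_log_loss(model: Model, verification: List[Tuple[Video, int]]) -> float:
    """Mean binary cross-entropy of the model on (video, label) pairs."""
    eps = 1e-7  # keeps log() finite when a model outputs exactly 0 or 1
    total = 0.0
    for video, label in verification:
        p = min(max(model(video), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(verification)


def quality_value(loss: float) -> float:
    """Assumed mapping: a lower loss gives a higher value in (0, 1]."""
    return 1.0 / (1.0 + loss)


# Stub models standing in for models trained on two different groups.
def confident_model(video: Video) -> float:
    return 0.9 if video.startswith("screen") else 0.1


def guessing_model(video: Video) -> float:
    return 0.5


verification_set = [("screen_a.mp4", 1), ("direct_b.mp4", 0)]
values = [quality_value(mean_log_loss(m, verification_set))
          for m in (confident_model, guessing_model)]
```

  • Under these assumptions, the group whose model has the lower verification loss receives the higher value, so it would be selected first as the candidate training sample group.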
  • Generating a video recognition model based on the obtained initial video recognition models includes: assigning weights to the obtained initial video recognition models based on the determined values; and fusing the obtained initial video recognition models based on the assigned weights to generate the video recognition model.
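  • One possible reading of weighted fusion is sketched below. It assumes that the weights are the quality values normalized to sum to 1 and that fusion means a weighted average of the models' output probabilities; neither choice is stated in the text, and assign_weights and fuse are hypothetical names.

```python
from typing import Callable, List, Sequence

# Hypothetical model type: maps a video identifier to a probability.
Model = Callable[[str], float]


def assign_weights(values: Sequence[float]) -> List[float]:
    """Normalize quality values so that the weights sum to 1 (an assumption)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def fuse(models: Sequence[Model], weights: Sequence[float]) -> Model:
    """Return a model whose output is the weighted average of the inputs' outputs."""
    def fused(video: str) -> float:
        return sum(w * m(video) for m, w in zip(models, weights))
    return fused


# Two stub initial models with constant outputs, and quality values 3.0 and 1.0.
weights = assign_weights([3.0, 1.0])
fused_model = fuse([lambda video: 0.8, lambda video: 0.4], weights)
```

  • Here the first model's output counts three times as much as the second's, because its group was assumed to have the higher quality value.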
  • Generating a video recognition model based on the obtained initial video recognition models includes determining the most recently obtained initial video recognition model as the video recognition model.
  • In a second aspect, an embodiment of the present application provides a device for generating a model.
  • The device includes: a sample obtaining unit configured to obtain a training sample set and divide the training sample set into a preset number of training sample groups, where each training sample includes a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result indicates whether the sample video is a video obtained by shooting a screen displaying the sample object; a model training unit configured to, for each training sample group in the preset number of training sample groups, use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and train with a machine learning method to obtain an initial video recognition model corresponding to the group; and a model generating unit configured to generate a video recognition model based on the obtained initial video recognition models.
  • The model training unit includes: a first execution module configured to select a training sample group from the preset number of training sample groups as a candidate training sample group, and to perform the following training steps based on the candidate training sample group and an initial model: using the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as expected output, training the initial model with a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group remains among the preset number of training sample groups; and, in response to determining that no unselected training sample group remains, obtaining the preset number of initial video recognition models.
  • The model training unit further includes: a second execution module configured to, in response to determining that an unselected training sample group remains, select a training sample group from the unselected training sample groups as a new candidate training sample group, use the most recently obtained initial video recognition model as the new initial model, and continue to perform the training steps.
  • The model training unit includes: a value determination module configured to determine values that characterize the quality of the preset number of training sample groups; and a third execution module configured to select, based on the determined values, the best training sample group from the preset number of training sample groups as a candidate training sample group, and to perform the following training steps based on the candidate training sample group and an initial model: using the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as expected output, training the initial model with a machine learning method to obtain an initial video recognition model; determining whether any unselected training sample group remains among the preset number of training sample groups; and, in response to determining that no unselected training sample group remains, obtaining the preset number of initial video recognition models.
  • The model training unit further includes: a fourth execution module configured to, in response to determining that an unselected training sample group remains, select, based on the determined values, the best training sample group from the unselected training sample groups as a new candidate training sample group, use the most recently obtained initial video recognition model as the new initial model, and continue to perform the training steps.
  • The value determination module includes: a sample acquisition module configured to obtain a preset verification sample set, where each verification sample includes a verification video and a verification recognition result pre-labeled for the verification video; and a value generation module configured to perform the following steps for each training sample group in the preset number of training sample groups: using the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, training with a machine learning method to obtain a to-be-verified video recognition model corresponding to the group; inputting the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model to obtain actual recognition results; determining a loss value of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and generating, based on the determined loss value, a value that characterizes the quality of the training sample group.
  • The model generation unit includes: a weight allocation module configured to assign weights to the obtained initial video recognition models based on the determined values; and a model fusion module configured to fuse the obtained initial video recognition models based on the assigned weights to generate the video recognition model.
  • In a third aspect, an embodiment of the present application provides a method for identifying a video.
  • The method includes: obtaining a video to be identified, where the video to be identified is a video obtained by shooting an object; and inputting the video to be identified into a video recognition model generated by the method described in any one of the embodiments of the first aspect, to generate a recognition result corresponding to the video to be identified, where the recognition result indicates whether the video to be identified is a video obtained by shooting a screen displaying the object.
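  • The identification step can be sketched as follows. This assumes that the generated video recognition model is a callable returning the probability that a video was shot from a screen, and that a threshold of 0.5 turns that probability into a result of 1 (shot from a screen) or 0 (not shot from a screen). The function name identify, the threshold, and the stub model are hypothetical.

```python
from typing import Callable


def identify(video: str, model: Callable[[str], float], threshold: float = 0.5) -> int:
    """Return 1 if the model judges the video to be shot from a screen, else 0."""
    return 1 if model(video) >= threshold else 0


# Stub standing in for a trained video recognition model.
def stub_model(video: str) -> float:
    return 0.9 if "rescreen" in video else 0.2


result_a = identify("rescreen_clip.mp4", stub_model)
result_b = identify("direct_clip.mp4", stub_model)
```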
  • In a fourth aspect, an embodiment of the present application provides a device for identifying a video.
  • The device includes: a video obtaining unit configured to obtain a video to be identified, where the video to be identified is a video obtained by shooting an object; and a result generating unit configured to input the video to be identified into a video recognition model generated by the method described in any one of the embodiments of the first aspect, to generate a recognition result corresponding to the video to be identified, where the recognition result indicates whether the video to be identified is a video obtained by shooting a screen displaying the object.
  • In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any one of the first aspect and the third aspect.
  • In a sixth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored. When the program is executed by a processor, the method described in any one of the first aspect and the third aspect is implemented.
  • The method and device for generating a model provided by the embodiments of the present application obtain a training sample set and divide it into a preset number of training sample groups, where each training sample includes a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result indicates whether the sample video is a video obtained by shooting a screen displaying the sample object. Then, for each training sample group in the preset number of training sample groups, the sample videos of the training samples in the group are used as input and the sample recognition results corresponding to the input sample videos as expected output, and an initial video recognition model corresponding to the group is obtained by training with a machine learning method. Finally, a video recognition model is generated based on the obtained initial video recognition models, so that a model usable for video recognition is obtained, which helps to enrich the ways in which models can be generated.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for generating a model according to the present application;
  • FIG. 3 is a schematic diagram of an application scenario of a method for generating a model according to the present application;
  • FIG. 4 is a flowchart of still another embodiment of a method for generating a model according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for generating a model according to the present application.
  • FIG. 6 is a flowchart of an embodiment of a method for identifying a video according to the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of an apparatus for identifying a video according to the present application.
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 of a method for generating a model, a device for generating a model, a method for identifying a video, or a device for identifying a video to which embodiments of the present application can be applied.
  • the system architecture 100 may include terminals 101 and 102, a network 103, a database server 104, and a server 105.
  • the network 103 is used to provide a medium for a communication link between the terminals 101, 102, the database server 104, and the server 105.
  • the network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user 110 can use the terminals 101 and 102 to interact with the server 105 through the network 103 to receive or send messages and the like.
  • Terminals 101 and 102 can be installed with various client applications, such as model training applications, video recognition applications, social applications, payment applications, web browsers, and instant messaging tools.
  • the terminals 101 and 102 here may be hardware or software.
  • When the terminals 101 and 102 are hardware, they can be various electronic devices with display screens, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, laptop computers, and desktop computers.
  • When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
  • When the terminals 101 and 102 are hardware, a video capture device may also be installed on them.
  • the video capture device can be a variety of devices that can capture video, such as cameras, sensors, and so on.
  • the user 110 may use a video capture device on the terminals 101 and 102 to capture video.
  • the database server 104 may be a database server that provides various services.
  • for example, the database server may store a sample set.
  • the sample set contains a large number of samples.
  • the sample may include a sample video and a sample recognition result pre-labeled for the sample video.
  • the user 110 can also select samples from the sample set stored in the database server 104 through the terminals 101 and 102.
  • the server 105 may also be a server that provides various services, such as a background server that provides support for various applications displayed on the terminals 101 and 102.
  • the background server can use the samples in the sample set sent by the terminals 101 and 102 to train the initial model, and can send the training results (such as the generated video recognition model) to the terminals 101 and 102. In this way, the user can apply the generated video recognition model for video recognition.
  • the database server 104 and the server 105 here may also be hardware or software. When they are hardware, they can be implemented as a distributed server cluster consisting of multiple servers or as a single server. When they are software, they can be implemented as multiple software or software modules (for example, to provide distributed services), or they can be implemented as a single software or software module. It is not specifically limited here.
  • the method for generating a model or the method for identifying a video provided by the embodiment of the present application is generally executed by the server 105. Accordingly, a device for generating a model or a device for identifying a video is also generally provided in the server 105.
  • the database server 104 may not be set in the system architecture 100.
  • It should be understood that the numbers of terminals, networks, database servers, and servers in FIG. 1 are merely exemplary. There may be any number of terminals, networks, database servers, and servers, depending on implementation needs.
  • Referring to FIG. 2, a flowchart 200 of one embodiment of a method for generating a model according to the present application is shown.
  • the method for generating a model includes the following steps:
  • Step 201: Obtain a training sample set, and divide the training sample set into a preset number of training sample groups.
  • An execution subject of the method for generating a model (for example, the server shown in FIG. 1) may obtain a training sample set from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1) through a wired or wireless connection, and divide the training sample set into a preset number of training sample groups.
  • the training samples include sample videos and sample recognition results pre-labeled for the sample videos.
  • The sample video may be a video obtained by shooting a sample object. Sample objects can be various things, such as people or animals, or behaviors such as running and swimming.
  • the sample recognition result may include, but is not limited to, at least one of the following: text, numbers, and symbols.
  • the sample recognition result may be used to indicate whether the sample video is a video obtained by shooting a screen displaying the sample object.
  • For example, the sample recognition result may be the number 1 or the number 0, where the number 1 indicates that the sample video is a video obtained by shooting a screen displaying the sample object, and the number 0 indicates that the sample video is not a video obtained by shooting a screen displaying the sample object.
  • the execution body may divide the training sample set into a preset number of training sample groups in various ways.
  • For example, the above execution body may divide the training sample set equally into the preset number of training sample groups, or may divide it so that the number of training samples in each of the preset number of training sample groups is greater than or equal to a preset threshold. It should be noted that the preset number can be set in advance by a technician.
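  • A minimal sketch of the equal-division option follows. It assumes that the samples are (video path, label) pairs and that they are shuffled before being dealt out round-robin; the shuffle, the seed, and the name split_into_groups are illustrative assumptions, not part of the described method.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(samples: Sequence[T], num_groups: int, seed: int = 0) -> List[List[T]]:
    """Split samples into num_groups groups whose sizes differ by at most one."""
    if num_groups <= 0 or num_groups > len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: randomize before dividing
    # Round-robin slicing: group i takes items i, i + num_groups, i + 2 * num_groups, ...
    return [shuffled[i::num_groups] for i in range(num_groups)]


# Ten hypothetical samples labeled 1 (shot from a screen) or 0 (not).
training_set = [(f"video_{i}.mp4", i % 2) for i in range(10)]
groups = split_into_groups(training_set, num_groups=3)
group_sizes = [len(g) for g in groups]  # [4, 3, 3]
```

  • To enforce the alternative rule, a caller could additionally check that the smallest group size is at least the preset threshold and lower the number of groups if it is not.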
  • Step 202: For each training sample group in the preset number of training sample groups, use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and train with a machine learning method to obtain the initial video recognition model corresponding to the group.
  • For each training sample group in the preset number of training sample groups, the execution body may use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and train with a machine learning method to obtain the initial video recognition model corresponding to the group.
  • the initial video recognition model is a model trained using training samples in a training sample group, and can be used to determine the final video recognition model.
  • The initial model may be a preset model, such as a convolutional neural network (CNN) or a residual network (ResNet).
  • the execution body may input a sample video of the training samples in the group of training samples into an initial model, and obtain a recognition result corresponding to the input sample video.
  • Then, taking the sample recognition result corresponding to the input sample video as the expected output of the initial model, the execution body may train the initial model using the machine learning method and determine the trained initial model as the initial video recognition model.
  • the above-mentioned execution subject may obtain a preset number of initial video recognition models based on a preset number of training sample groups by the following steps:
  • Step 2021: Select a training sample group as a candidate training sample group from a preset number of training sample groups.
  • the execution body may select a training sample group as a candidate training sample group from a preset number of training sample groups obtained in step 201, and perform the training steps of steps 2022 to 2024.
  • the selection method of the training sample group is not limited in this application. For example, it can be randomly selected, or a training sample group with more training samples is preferentially selected.
  • Step 2022: Use the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as expected output, and train the initial model using a machine learning method to obtain an initial video recognition model.
  • the execution body can obtain the initial video recognition model corresponding to the candidate training sample group by the following steps:
  • The above execution body may select training samples from the candidate training sample group and perform the following steps: input the sample video of the selected training sample into the initial model to obtain a recognition result; taking the sample recognition result corresponding to the input sample video as the expected output of the initial model, adjust the parameters of the initial model based on the obtained recognition result and the sample recognition result; determine whether any unselected training sample remains in the candidate training sample group; and, in response to determining that no unselected training sample remains, determine the adjusted initial model as the initial video recognition model corresponding to the candidate training sample group.
  • the manner of selecting training samples is not limited in this application. For example, it may be randomly selected, or training samples with better sharpness of the sample video may be preferentially selected.
  • Step 2023: Determine whether there are unselected training sample groups in the preset number of training sample groups.
  • Step 2024: In response to determining that there are no unselected training sample groups, a preset number of initial video recognition models are obtained.
  • the above-mentioned execution body may obtain a preset number of initial video recognition models in response to determining that there is no unselected training sample group in the preset number of training sample groups.
  • In response to determining that an unselected training sample group exists, the above-mentioned execution body may also select a training sample group from the unselected training sample groups as a new candidate training sample group, use the most recently obtained initial video recognition model as the new initial model, and continue to perform training steps 2022 to 2024 described above.
  • In this way, the above-mentioned execution body may use the initial video recognition model obtained by training on one selected training sample group as the initial model for the training sample group selected next, thereby making effective use of the sample data and generating a more accurate initial video recognition model.
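  • Steps 2021 to 2024 can be sketched as a loop in which each group's training starts from the model produced by the previous group. The sketch below is illustrative only: it replaces the CNN or ResNet with a tiny logistic regression over hypothetical per-video feature vectors, selects groups in list order, and uses made-up hyperparameters. The names predict, train_on_group, and train_sequentially are assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sample type: (feature vector extracted from a video, label 1 or 0).
Sample = Tuple[Sequence[float], int]


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Probability that the video was shot from a screen; weights[0] is the bias."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-z))


def train_on_group(weights: Sequence[float], group: Sequence[Sample],
                   lr: float = 0.5, epochs: int = 50) -> List[float]:
    """Step 2022: adjust parameters sample by sample; returns a new weight list."""
    weights = list(weights)  # copy so the caller's model is left untouched
    for _ in range(epochs):
        for features, label in group:
            error = predict(weights, features) - label
            weights[0] -= lr * error
            for j, x in enumerate(features):
                weights[j + 1] -= lr * error * x
    return weights


def train_sequentially(groups: Sequence[Sequence[Sample]],
                       initial_weights: Sequence[float]) -> List[List[float]]:
    """Train one model per group, warm-starting each from the previous one."""
    models: List[List[float]] = []
    current = list(initial_weights)
    unselected = list(groups)
    while unselected:                                 # step 2023: any group left?
        candidate = unselected.pop(0)                 # step 2021: pick a candidate group
        current = train_on_group(current, candidate)  # step 2022: train from latest model
        models.append(current)
    return models                                     # step 2024: one model per group


groups = [
    [([0.0], 0), ([1.0], 1)],
    [([0.1], 0), ([0.9], 1)],
]
models = train_sequentially(groups, initial_weights=[0.0, 0.0])
final_model = models[-1]
```

  • Because the second group's training starts from the first group's result rather than from scratch, the final model has effectively seen all of the sample data, which is the benefit the passage above describes.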
  • Step 203: Generate a video recognition model based on the obtained initial video recognition model.
  • Based on the initial video recognition models obtained in step 202, the execution subject may generate a video recognition model.
  • the execution body may select an initial video recognition model from the obtained initial video recognition models as a video recognition model, or process the obtained initial video recognition models to obtain a video recognition model.
  • For example, the above execution body may assign the same weight to each initial video recognition model based on the number of obtained initial video recognition models and then, based on the assigned weights, fuse the obtained initial video recognition models to obtain a video recognition model.
  • In an example with a first and a second initial video recognition model, x is an independent variable representing the model input, y is a dependent variable representing the model output, a and b are the coefficients of the first initial video recognition model, and c and d are the coefficients of the second initial video recognition model.
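  • As a worked illustration of equal-weight fusion, assume both initial models are linear in x. The linear form and the weight of 1/2 are assumptions inferred from the variable definitions, not formulas stated in the text:

```latex
\begin{aligned}
\text{first initial model:}  \quad & y = a x + b \\
\text{second initial model:} \quad & y = c x + d \\
\text{fused model:}          \quad & y = \tfrac{1}{2}(a x + b) + \tfrac{1}{2}(c x + d)
                                     = \tfrac{a + c}{2}\, x + \tfrac{b + d}{2}
\end{aligned}
```

  • With two models, each receives weight 1/2, so under this assumption the fused coefficients are simply the averages of the corresponding coefficients.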
  • Alternatively, the execution body may directly determine the most recently obtained initial video recognition model as the video recognition model.
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to this embodiment.
  • a terminal 301 used by a user may be installed with a model training application.
  • the server 302 that provides background support for the application can run a method for generating a model, including:
  • First, a training sample set 303 can be obtained and divided into two (the preset number) training sample groups 304 and 305, where each training sample includes a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result indicates whether the sample video is a video obtained by shooting a screen on which the sample object is displayed.
  • Then, for the training sample group 304, the above-mentioned execution body may use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and train with a machine learning method to obtain the initial video recognition model 306 corresponding to the group; for the training sample group 305, the above-mentioned execution body may likewise use the sample videos of the training samples in the group as input and the corresponding sample recognition results as expected output, and train with a machine learning method to obtain the initial video recognition model 307 corresponding to the group.
  • the above-mentioned execution subject may generate a video recognition model 308 based on the obtained initial video recognition model 306 and the initial video recognition model 307.
  • the server 302 may also send prompt information to the terminal 301 to indicate completion of model training.
  • the prompt information may be voice and / or text information. In this way, the user can obtain the video recognition model in a preset storage location.
  • the method provided by the foregoing embodiments of the present application obtains a training sample set and divides it into a preset number of training sample groups; then, for each training sample group of the preset number of training sample groups, uses the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output, and trains with a machine learning method to obtain the initial video recognition model corresponding to that group; and finally generates a video recognition model based on the obtained initial video recognition models. In this way, a model that can be used for video recognition is obtained, which helps enrich the ways in which models can be generated.
  • a flowchart 400 of yet another embodiment of a method for generating a model is shown.
  • the process 400 of the method for generating a model includes the following steps:
  • Step 401 Obtain a training sample set, and divide the training sample set into a preset number of training sample groups.
  • an execution subject (for example, the server shown in FIG. 1) of the method for generating a model may obtain a training sample set from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1) through a wired or wireless connection, and divide the training sample set into a preset number of training sample groups.
  • step 401 may be implemented in a manner similar to step 201 in the foregoing embodiment. Accordingly, the above description of step 201 is also applicable to step 401 of this embodiment, and details are not described herein again.
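  • The grouping in step 401 can be sketched as follows. The text only says that the set is divided into a preset number of groups; the shuffled round-robin split, the function name `split_into_groups`, and the toy `(video identifier, label)` samples are assumptions made for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(samples: Sequence[T], group_count: int, seed: int = 0) -> List[List[T]]:
    """Shuffle the samples and deal them round-robin into `group_count` groups."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Group i receives elements i, i + group_count, i + 2 * group_count, ...
    return [shuffled[i::group_count] for i in range(group_count)]


# Hypothetical training samples: (video identifier, label), where label 1 means
# "shot from a screen displaying the sample object" and 0 means "shot directly".
training_set = [(f"video_{i}.mp4", i % 2) for i in range(10)]
groups = split_into_groups(training_set, group_count=3)
```

  • Other splits (for example by source or by capture time) would fit the description equally well; the only property relied on here is that every training sample lands in exactly one group.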
  • Step 402 Determine a value used to characterize the pros and cons of a preset number of training sample groups.
  • the above-mentioned execution subject may determine a value used to characterize the merits of the preset number of training sample groups.
  • the above-mentioned execution body may use various methods to determine a value used to characterize the pros and cons of a preset number of training sample groups.
  • the above-mentioned execution body may determine the number of training samples included in each training sample group, and determine that number as the value used to characterize the pros and cons of the training sample group.
  • the correspondence between the magnitude of the value and the degree of pros and cons can be set in advance by a technician. Specifically, it may be that the larger the value, the better the training sample group; or it may be that the smaller the value, the better the training sample group.
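  • A minimal sketch of the count-based variant, assuming the groups are held in a dictionary keyed by a hypothetical group name. Whether a larger count means a better group is a convention chosen by the technician and is not decided by this code.

```python
from typing import Dict, Sequence


def size_based_values(groups: Dict[str, Sequence]) -> Dict[str, int]:
    """Use the number of training samples in each group as its quality value."""
    return {name: len(samples) for name, samples in groups.items()}


# Hypothetical groups of (video identifier, label) training samples.
example_groups = {
    "group_1": [("a.mp4", 1), ("b.mp4", 0), ("c.mp4", 1)],
    "group_2": [("d.mp4", 0)],
}
size_values = size_based_values(example_groups)
```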
  • the above-mentioned execution body may determine a value used to characterize the pros and cons of a preset number of training sample groups through the following steps:
  • the execution entity may obtain a preset verification sample set, where the verification sample includes a verification video and a verification recognition result previously marked for the verification video.
  • for a training sample group of the preset number of training sample groups, the above-mentioned execution body may perform the following steps: use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the output, and train with a machine learning method to obtain the to-be-verified video recognition model corresponding to the group; input the verification videos of the verification samples in the verification sample set into that to-be-verified video recognition model to obtain actual recognition results; determine the loss value of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and, based on the determined loss value, generate a value used to characterize the pros and cons of the training sample group.
  • the loss value can be used to characterize the difference between the actual output and the expected output. It can be understood that the smaller this difference is, the more accurate the to-be-verified video recognition model is, and therefore the better the training sample group used to train it. Based on this relationship between the loss value and the pros and cons of the training sample group, the above-mentioned execution body can generate the value used to characterize the pros and cons of the training sample group from the determined loss value in various ways. For example, the loss value can be used directly as that value; in that case, the smaller the value, the better the training sample group.
  • alternatively, the reciprocal of the loss value can be determined as the value used to characterize the pros and cons of the training sample group. In that case, the larger the value, the better the training sample group.
  • the above-mentioned execution body may calculate the loss value of the actual recognition result obtained with respect to the verification recognition result corresponding to the input verification video by using various preset loss functions.
  • for example, the L2 norm may be used as the loss function to calculate the loss value.
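  • The verification-based value can be sketched as follows. The feature vectors, the callable `toy_model`, and the small epsilon in the reciprocal are assumptions introduced for illustration; the text itself only fixes the idea of an L2-norm loss and of using either the loss or its reciprocal as the value.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A verification sample pairs a (hypothetical) feature vector extracted from a
# verification video with its pre-labeled result: 1.0 = screen recording, 0.0 = not.
VerificationSample = Tuple[Sequence[float], float]


def l2_loss(predicted: Sequence[float], expected: Sequence[float]) -> float:
    """L2 norm of the difference between actual and expected outputs."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, expected)))


def group_value(
    model: Callable[[Sequence[float]], float],
    verification_set: List[VerificationSample],
    use_reciprocal: bool = False,
) -> float:
    """Score the training group that produced `model` on the verification set."""
    predicted = [model(features) for features, _ in verification_set]
    expected = [label for _, label in verification_set]
    loss = l2_loss(predicted, expected)
    if use_reciprocal:
        # Larger is better; the epsilon (an assumption) avoids division by zero.
        return 1.0 / (loss + 1e-12)
    return loss  # smaller is better


# Toy to-be-verified model: thresholds the first feature.
def toy_model(features: Sequence[float]) -> float:
    return 1.0 if features[0] > 0.5 else 0.0


verification_set = [([0.9], 1.0), ([0.1], 0.0), ([0.8], 0.0), ([0.2], 1.0)]
loss_value = group_value(toy_model, verification_set)
reciprocal_value = group_value(toy_model, verification_set, use_reciprocal=True)
```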
  • Step 403 Based on the determined value, an optimal training sample group is selected as a candidate training sample group from a preset number of training sample groups.
  • based on the value determined in step 402, the execution body may select the optimal training sample group as the candidate training sample group from the preset number of training sample groups obtained in step 401, and perform the training steps of steps 404 to 406.
  • because this embodiment selects the optimal training sample group from the preset number of training sample groups as the candidate training sample group, the selection depends on how the value is defined. When a larger value indicates a better training sample group, the above-mentioned execution body may select the training sample group corresponding to the largest determined value as the candidate training sample group; when a smaller value indicates a better training sample group, the above-mentioned execution body may select the training sample group corresponding to the smallest determined value as the candidate training sample group.
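  • The selection rule in step 403 reduces to taking a maximum or a minimum, depending on the convention for the value. A minimal sketch, with hypothetical group names and values:

```python
from typing import Dict


def select_candidate_group(values: Dict[str, float], larger_is_better: bool) -> str:
    """Return the name of the optimal group under the chosen convention."""
    if not values:
        raise ValueError("no training sample groups to select from")
    chooser = max if larger_is_better else min
    return chooser(values, key=lambda name: values[name])


# Hypothetical values determined in step 402.
group_values = {"group_a": 0.8, "group_b": 0.3, "group_c": 0.5}
best_when_larger = select_candidate_group(group_values, larger_is_better=True)
best_when_smaller = select_candidate_group(group_values, larger_is_better=False)
```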
  • Step 404 Use the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, and train the initial model with a machine learning method to obtain an initial video recognition model.
  • the execution body can obtain the initial video recognition model corresponding to the candidate training sample group by the following steps:
  • the above execution body may select training samples from the candidate training sample group and perform the following steps: input the sample video of the selected training sample into the initial model to obtain a recognition result; take the sample recognition result corresponding to the input sample video as the expected output of the initial model, and adjust the parameters of the initial model based on the obtained recognition result and the sample recognition result; determine whether there are unselected training samples in the candidate training sample group; and, in response to there being no unselected training samples, determine the adjusted initial model as the initial video recognition model corresponding to the candidate training sample group.
  • the manner of selecting training samples is not limited in this application. For example, it may be randomly selected, or training samples with better sharpness of the sample video may be preferentially selected.
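  • The per-sample loop above can be sketched with a deliberately small stand-in model. The text does not specify the model type, so the logistic-regression classifier over a hypothetical per-video feature vector, the learning rate, and the in-order sample selection are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (hypothetical video features, label)


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Probability that the video was shot from a screen (sigmoid of a linear score)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_on_group(
    weights: Sequence[float], bias: float, group: Sequence[Sample], learning_rate: float = 0.5
) -> Tuple[List[float], float]:
    """Visit each training sample once and adjust the parameters after each one."""
    weights = list(weights)
    unselected = list(group)
    while unselected:  # stop once no unselected training sample remains
        features, label = unselected.pop(0)  # select a sample (here: in order)
        error = predict(weights, bias, features) - label  # actual minus expected output
        for i, x in enumerate(features):
            weights[i] -= learning_rate * error * x
        bias -= learning_rate * error
    return weights, bias


# Toy candidate group: a positive feature means "screen recording".
candidate_group = [([1.0], 1), ([-1.0], 0)] * 5
trained_weights, trained_bias = train_on_group([0.0], 0.0, candidate_group)
```

  • Random selection, or preferring sharper sample videos as mentioned above, would only change how the next sample is taken from `unselected`.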
  • Step 405 Determine whether there are unselected training sample groups in the preset number of training sample groups.
  • Step 406 Obtain a preset number of initial video recognition models in response to determining that there are no unselected training sample groups.
  • the above-mentioned execution body may obtain a preset number of initial video recognition models in response to determining that there is no unselected training sample group in the preset number of training sample groups.
  • the above-mentioned execution body may also, in response to determining that there is an unselected training sample group, select the optimal training sample group from the unselected training sample groups based on the determined value as a new candidate training sample group, use the most recently obtained initial video recognition model as the new initial model, and continue to perform the training steps 404 to 406.
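  • Steps 403 to 406 together form a loop over the groups in order of quality, with each stage starting from the model produced by the previous one. The sketch below keeps the training routine abstract: `train_fn`, the dictionary layout, and the toy numeric "model" are assumptions used only to make the control flow visible.

```python
from typing import Callable, Dict, List, TypeVar

Model = TypeVar("Model")
Group = TypeVar("Group")


def train_sequentially(
    groups: Dict[str, Group],
    values: Dict[str, float],
    initial_model: Model,
    train_fn: Callable[[Model, Group], Model],
    larger_is_better: bool = True,
) -> List[Model]:
    """Train on the best remaining group each round, warm-starting from the last model."""
    unselected = dict(groups)
    chooser = max if larger_is_better else min
    current = initial_model
    models: List[Model] = []
    while unselected:  # step 405: any unselected training sample group left?
        name = chooser(unselected, key=lambda n: values[n])  # step 403
        # step 404: train_fn is expected to return a new model rather than mutate.
        current = train_fn(current, unselected.pop(name))
        models.append(current)
    return models  # step 406: one initial video recognition model per group


# Toy stand-ins: a "model" is a number and "training" adds the group's sum to it.
toy_groups = {"a": [1, 2], "b": [10], "c": [100]}
toy_values = {"a": 0.2, "b": 0.9, "c": 0.5}
toy_models = train_sequentially(toy_groups, toy_values, 0, lambda m, g: m + sum(g))
```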
  • Step 407 Generate a video recognition model based on the obtained initial video recognition model.
  • the execution subject may generate a video recognition model.
  • the execution body may select an initial video recognition model from the obtained initial recognition models as a video recognition model, or process the obtained initial video recognition models to obtain a video recognition model.
  • the execution body may generate a video recognition model through the following steps: First, the execution body may assign weights to the obtained initial video recognition models based on the values determined in step 402. Then, the above-mentioned execution subject may fuse the obtained initial video recognition models based on the assigned weights to generate a video recognition model. Specifically, the execution body may determine the pros and cons of each training sample group based on the determined values, and then assign weights to the obtained initial video recognition models in various ways, so that the initial video recognition model corresponding to a better training sample group receives a larger weight, and the initial video recognition model corresponding to a poorer training sample group receives a smaller weight.
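  • One way to read the weighted fusion is as a weighted average of model parameters. This is a sketch under assumptions: the text leaves the weighting scheme open ("in various ways"), so normalizing the values (or their reciprocals) into weights, and representing each model as a dictionary of named coefficients with identical keys, are illustrative choices only.

```python
from typing import Dict, List

Params = Dict[str, float]


def assign_weights(values: List[float], larger_is_better: bool = True) -> List[float]:
    """Turn per-group quality values into weights that sum to 1 (values must be positive)."""
    scores = list(values) if larger_is_better else [1.0 / v for v in values]
    total = sum(scores)
    return [s / total for s in scores]


def fuse_models(models: List[Params], weights: List[float]) -> Params:
    """Weighted average of the coefficients of models that share the same structure."""
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    return {
        name: sum(w * m[name] for m, w in zip(models, weights))
        for name in models[0]
    }


# Two hypothetical linear models y = slope * x + intercept.
first_model = {"slope": 2.0, "intercept": 1.0}
second_model = {"slope": 4.0, "intercept": 3.0}
fusion_weights = assign_weights([3.0, 1.0])  # the first group is judged better
fused_model = fuse_models([first_model, second_model], fusion_weights)
```

  • Averaging the outputs of the models instead of their parameters would be another reasonable reading of "fuse"; the choice depends on the model family.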
  • the process 400 of the method for generating a model in this embodiment highlights the step of determining values used to characterize the pros and cons of the preset number of training sample groups, and the step of selecting, based on the determined values, training sample groups from the preset number of training sample groups for training. Therefore, the solution described in this embodiment can first train with a better training sample group to obtain a more accurate initial video recognition model, so that subsequent training only needs to make minor adjustments to that model, which improves the efficiency of model generation.
  • this application provides an embodiment of a device for generating a model.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • the apparatus 500 for generating a model in this embodiment includes a sample acquisition unit 501, a model training unit 502, and a model generation unit 503.
  • the sample acquisition unit 501 is configured to obtain a training sample set and divide the training sample set into a preset number of training sample groups, where the training samples include a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result is used to indicate whether the sample video is a video obtained by shooting a screen on which the sample object is displayed;
  • the model training unit 502 is configured to, for each training sample group of the preset number of training sample groups, use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output, and train with a machine learning method to obtain the initial video recognition model corresponding to that group;
  • the model generation unit 503 is configured to generate a video recognition model based on the obtained initial video recognition models.
  • the sample acquisition unit 501 of the apparatus 500 for generating a model may acquire a training sample set from a database server (such as the database server 104 shown in FIG. 1) or a terminal (such as the terminals 101 and 102 shown in FIG. 1) through a wired or wireless connection, and divide the training sample set into a preset number of training sample groups.
  • the training samples include sample videos and sample recognition results pre-labeled for the sample videos.
  • the sample video may be a video obtained by shooting a sample object. Sample objects can be various things.
  • the sample recognition result may include, but is not limited to, at least one of the following: text, numbers, and symbols.
  • the sample recognition result may be used to indicate whether the sample video is a video obtained by shooting a screen displaying the sample object.
  • the sample acquisition unit 501 may divide the training sample set into a preset number of training sample groups in various ways. It should be noted that the preset number can be set in advance by a technician.
  • the model training unit 502 may use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output, and train with a machine learning method to obtain the initial video recognition model corresponding to that group.
  • the initial video recognition model is a model trained using training samples in a training sample group, and can be used to determine the final video recognition model.
  • the model generation unit 503 may generate a video recognition model.
  • the execution body may select an initial video recognition model from the obtained initial video recognition models as a video recognition model, or process the obtained initial video recognition models to obtain a video recognition model.
  • the model training unit 502 may include: a first execution module (not shown in the figure) configured to select a training sample group from the preset number of training sample groups as a candidate training sample group, and, based on the candidate training sample group and an initial model, perform the following training steps: use the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, and train the initial model with a machine learning method to obtain an initial video recognition model; determine whether there are unselected training sample groups in the preset number of training sample groups; and, in response to determining that there are no unselected training sample groups, obtain the preset number of initial video recognition models.
  • the model training unit 502 may further include a second execution module (not shown in the figure) configured to, in response to determining that there is an unselected training sample group, select a training sample group from the unselected training sample groups as a new candidate training sample group, use the most recently obtained initial video recognition model as the new initial model, and continue to perform the training steps.
  • the model training unit 502 may include: a value determination module (not shown in the figure) configured to determine values used to characterize the pros and cons of the preset number of training sample groups; and a third execution module (not shown in the figure) configured to select, based on the determined values, an optimal training sample group from the preset number of training sample groups as a candidate training sample group, and, based on the candidate training sample group and an initial model, perform the following training steps: use the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as the expected output, and train the initial model with a machine learning method to obtain an initial video recognition model; determine whether there are unselected training sample groups in the preset number of training sample groups; and, in response to determining that there are no unselected training sample groups, obtain the preset number of initial video recognition models.
  • the model training unit 502 may further include a fourth execution module (not shown in the figure) configured to, in response to determining that there is an unselected training sample group, select, based on the determined values, the optimal training sample group from the unselected training sample groups as a new candidate training sample group, use the most recently obtained initial video recognition model as the new initial model, and continue to perform the training steps.
  • the value determination module may include: a sample acquisition module (not shown in the figure) configured to obtain a preset verification sample set, where a verification sample includes a verification video and a verification recognition result pre-labeled for the verification video; and a value generation module (not shown in the figure) configured to perform the following steps for a training sample group of the preset number of training sample groups: use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the output, and train with a machine learning method to obtain the to-be-verified video recognition model corresponding to the group; input the verification videos of the verification samples in the verification sample set into that to-be-verified video recognition model to obtain actual recognition results; determine the loss value of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and, based on the determined loss value, generate a value used to characterize the pros and cons of the training sample group.
  • the model generation unit 503 may include: a weight allocation module (not shown in the figure) configured to assign weights to the obtained initial video recognition models based on the determined values; and a model fusion module (not shown in the figure) configured to fuse the obtained initial video recognition models based on the assigned weights to generate a video recognition model.
  • the model generating unit 503 may be further configured to determine the initial video recognition model obtained last time as a video recognition model.
  • the apparatus 500 obtains a training sample set through the sample acquisition unit 501 and divides the training sample set into a preset number of training sample groups, where the training samples include a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result is used to indicate whether the sample video is a video obtained by shooting a screen displaying the sample object. Then, for each training sample group of the preset number of training sample groups, the model training unit 502 uses the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as the expected output, and trains with a machine learning method to obtain the initial video recognition model corresponding to that group. Finally, the model generation unit 503 generates a video recognition model based on the obtained initial video recognition models, so that a model that can be used for identifying videos is obtained, which helps enrich the ways in which models can be generated.
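  • The division of the apparatus 500 into three units can be pictured as three pluggable callables. This is only a structural sketch: the class name, the field names, and the toy stand-ins for acquisition, training, and generation are assumptions, and real units would operate on videos and trained models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ModelGenerationApparatus:
    """Composes stand-ins for units 501 (acquire), 502 (train), and 503 (generate)."""

    sample_acquisition_unit: Callable[[], List[List[Any]]]
    model_training_unit: Callable[[List[Any]], Any]
    model_generation_unit: Callable[[List[Any]], Any]

    def run(self) -> Any:
        groups = self.sample_acquisition_unit()  # training sample groups
        initial_models = [self.model_training_unit(g) for g in groups]
        return self.model_generation_unit(initial_models)


# Toy stand-ins: a "model" is the mean of its group, and the generation unit
# returns the most recently obtained initial model.
apparatus = ModelGenerationApparatus(
    sample_acquisition_unit=lambda: [[1, 2], [3, 4, 5]],
    model_training_unit=lambda group: sum(group) / len(group),
    model_generation_unit=lambda models: models[-1],
)
generated_model = apparatus.run()
```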
  • FIG. 6 illustrates a process 600 of an embodiment of a method for identifying a video provided by the present application.
  • the method for identifying a video may include the following steps:
  • Step 601 Obtain a video to be identified.
  • an execution subject of the method for identifying a video may obtain the video to be identified through a wired connection method or a wireless connection method.
  • the above-mentioned execution body may obtain a video stored therein from a database server (for example, the database server 104 shown in FIG. 1), or may receive a video collected by a terminal (for example, the terminals 101 and 102 shown in FIG. 1) or other devices .
  • the video to be identified may be a video obtained by shooting an object.
  • Objects can be various things, such as people, animals, or behaviors such as running and swimming.
  • Step 602 Input a video to be identified into a video recognition model, and generate a recognition result corresponding to the video to be identified.
  • the execution body may input the video to be identified obtained in step 601 into a video recognition model, so as to generate a recognition result corresponding to the video to be identified.
  • the recognition result may be used to indicate whether the video to be recognized is a video obtained by shooting a screen displaying the foregoing object.
  • the video recognition model may be generated by using the method described in the embodiment of FIG. 2 described above.
  • For a specific generation process refer to the related description of the embodiment in FIG. 2, and details are not described herein again.
  • the method for identifying videos in this embodiment may be used to test the video recognition models generated by the foregoing embodiments. Based on the test results, the video recognition model can be continuously optimized. This method may also be a practical application method of the video recognition model generated by the foregoing embodiments. Adopting the video recognition models generated by the above embodiments for video recognition can detect the video obtained by recording the screen and help improve the accuracy of video recognition.
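  • Steps 601 and 602 amount to a single forward pass followed by a decision. In the sketch below the video is represented by a hypothetical feature vector, the model is any callable returning a score between 0 and 1, and the 0.5 threshold is an assumption; none of these details are fixed by the text.

```python
from typing import Callable, Sequence


def identify_video(
    video_features: Sequence[float],
    video_recognition_model: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> bool:
    """Return True if the video is judged to have been shot from a screen."""
    score = video_recognition_model(video_features)  # step 602: run the model
    return score >= threshold


# Toy stand-in for a generated video recognition model.
def toy_recognition_model(features: Sequence[float]) -> float:
    return 0.9 if features[0] > 0 else 0.1


is_screen_recording = identify_video([1.0], toy_recognition_model)
is_direct_shot = not identify_video([-1.0], toy_recognition_model)
```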
  • the present application provides an embodiment of an apparatus for identifying a video.
  • the device embodiment corresponds to the method embodiment shown in FIG. 6, and the device can be specifically applied to various electronic devices.
  • the apparatus 700 for identifying video in this embodiment may include a video obtaining unit 701 and a result generating unit 702.
  • the video acquisition unit 701 is configured to acquire a video to be identified, where the video to be identified is a video obtained by photographing an object;
  • the result generation unit 702 is configured to input the video to be identified into a model generated by the method described in the embodiment of FIG. 2 above, so as to generate a recognition result corresponding to the video to be identified, where the recognition result is used to indicate whether the video to be identified is a video obtained by shooting a screen displaying the object.
  • FIG. 8 shows a schematic structural diagram of a computer system 800 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 8 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803.
  • in the RAM 803, various programs and data required for the operation of the system 800 are also stored.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • An input / output (I / O) interface 805 is also connected to the bus 804.
  • the following components are connected to the I / O interface 805: an input portion 806 including a touch screen, a keyboard, a mouse, a camera device, etc.; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 809 performs communication processing via a network such as the Internet.
  • the drive 810 is also connected to the I / O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication section 809, and / or installed from a removable medium 811.
  • the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programming read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is included in baseband or propagated as part of a carrier wave, and which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • the functions labeled in the blocks may also occur in a different order than those labeled in the drawings. For example, two blocks represented one after the other may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described unit may also be provided in a processor, for example, it may be described as: a processor includes a sample acquisition unit, a model training unit, and a model generation unit.
  • a processor includes an acquisition unit, a training unit, and a generation unit.
  • the names of these units do not constitute a limitation on the unit itself in some cases.
  • a sample acquisition unit may also be described as a “unit that acquires a training sample set”.
  • the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the electronic device, the electronic device is configured to: obtain a training sample set and divide the training sample set into a preset number of training sample groups, where the training samples include sample videos and sample recognition results pre-labeled for the sample videos, the sample videos are videos obtained by shooting sample objects, and the sample recognition results are used to indicate whether the sample videos are videos obtained by shooting a screen displaying the sample objects.
  • the electronic device when the one or more programs are executed by the electronic device, the electronic device may be further configured to: obtain a video to be identified, where the video to be identified is a video obtained by photographing an object; and input the video to be identified into the video identification
  • a recognition result corresponding to the video to be recognized is generated, where the recognition result is used to indicate whether the video to be recognized is a video obtained by shooting a screen of a display object.
  • the video recognition model may be generated by using the method for generating a model as described in the foregoing embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

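A method and an apparatus for generating a model. The method includes: obtaining a training sample set and dividing the training sample set into a preset number of training sample groups (201), where each training sample includes a sample video and a sample recognition result pre-labeled for the sample video, and the sample recognition result indicates whether the sample video is a video obtained by shooting a screen displaying a sample object; for each training sample group among the preset number of training sample groups, using the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and training with a machine learning method to obtain an initial video recognition model corresponding to the group (202); and generating a video recognition model based on the obtained initial video recognition models (203). The method yields a model that can be used to recognize videos and enriches the ways in which models can be generated.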

Description

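Method and apparatus for generating a model
This patent application claims priority to Chinese patent application No. 201810617804.4, filed on June 15, 2018 by Beijing ByteDance Network Technology Co., Ltd. and entitled "Method and Apparatus for Generating a Model", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for generating a model.
Background
At present, sharing information by shooting videos has become an important mode of information sharing in daily life. In practice, many users record videos shot by other users in order to pass them off as videos they shot themselves.
It can be understood that recording other users' videos often leads to adverse effects such as infringement and harm to fairness. Platforms used for information sharing can therefore identify such videos and then block them.
Summary
Embodiments of the present application propose a method and an apparatus for generating a model, and a method and an apparatus for recognizing a video.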
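In a first aspect, an embodiment of the present application provides a method for generating a model. The method includes: obtaining a training sample set and dividing the training sample set into a preset number of training sample groups, where each training sample includes a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result indicates whether the sample video is a video obtained by shooting a screen displaying the sample object; for each training sample group among the preset number of training sample groups, using the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and training with a machine learning method to obtain an initial video recognition model corresponding to the group; and generating a video recognition model based on the obtained initial video recognition models.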
在一些实施例中,对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,包括:从预设数量个训练样本组中选取训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
在一些实施例中,对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,还包括:响应于确定存在未被选取的训练样本组,从未被选取的训练样本组中选取训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行训练步骤。
在一些实施例中,对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,包括:确定用于表征预设数量个训练样本组的优劣程度的数值;基于所确定的数值,从预设数量个训练样本组中选取最优的训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对 应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
在一些实施例中,对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,还包括:响应于确定存在未被选取的训练样本组,基于所确定的数值,从未被选取的训练样本组中选取最优的训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行训练步骤。
在一些实施例中,确定用于表征预设数量个训练样本组的优劣程度的数值,包括:获取预先设置的验证样本集合,其中,验证样本包括验证用视频和针对验证用视频预先标注的验证用识别结果;对于预设数量个训练样本组中的训练样本组,执行以下步骤:将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为输出,利用机器学习方法训练得到该组训练样本所对应的待验证视频识别模型;将验证样本集合中的验证样本的验证用视频输入该组训练样本所对应的待验证视频识别模型,获得实际识别结果,确定实际识别结果相对于所输入的验证用视频所对应的验证用识别结果的损失值,基于所确定的损失值,生成用于表征该组训练样本组的优劣程度的数值。
在一些实施例中,基于所得到的初始视频识别模型,生成视频识别模型,包括:基于所确定的数值,为所获得的初始视频识别模型分配权重;基于所分配的权重,对所获得的初始视频识别模型进行融合,生成视频识别模型。
在一些实施例中,基于所得到的初始视频识别模型,生成视频识别模型,包括:将最后一次获得的初始视频识别模型确定为视频识别模型。
第二方面,本申请实施例提供了一种用于生成模型的装置,该装 置包括:样本获取单元,被配置成获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组,其中,训练样本包括样本视频和针对样本视频预先标注的样本识别结果,样本视频为对样本对象进行拍摄所获得的视频,样本识别结果用于指示样本视频是否为对显示样本对象的屏幕进行拍摄所获得的视频;模型训练单元,被配置成对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型;模型生成单元,被配置成基于所得到的初始视频识别模型,生成视频识别模型。
在一些实施例中,模型训练单元包括:第一执行模块,被配置成从预设数量个训练样本组中选取训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
在一些实施例中,模型训练单元还包括:第二执行模块,被配置成响应于确定存在未被选取的训练样本组,从未被选取的训练样本组中选取训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行训练步骤。
在一些实施例中,模型训练单元包括:数值确定模块,被配置成确定用于表征预设数量个训练样本组的优劣程度的数值;第三执行模块,被配置成基于所确定的数值,从预设数量个训练样本组中选取最优的训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定预设数量个训练样本组中是否存在未被选取的训练样本组; 响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
在一些实施例中,模型训练单元还包括:第四执行模块,被配置成响应于确定存在未被选取的训练样本组,基于所确定的数值,从未被选取的训练样本组中选取最优的训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行训练步骤。
在一些实施例中,数值确定模块包括:样本获取模块,被配置成获取预先设置的验证样本集合,其中,验证样本包括验证用视频和针对验证用视频预先标注的验证用识别结果;数值生成模块,被配置成对于预设数量个训练样本组中的训练样本组,执行以下步骤:将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为输出,利用机器学习方法训练得到该组训练样本所对应的待验证视频识别模型;将验证样本集合中的验证样本的验证用视频输入该组训练样本所对应的待验证视频识别模型,获得实际识别结果,确定实际识别结果相对于所输入的验证用视频所对应的验证用识别结果的损失值,基于所确定的损失值,生成用于表征该组训练样本组的优劣程度的数值。
在一些实施例中,模型生成单元包括:权重分配模块,被配置成基于所确定的数值,为所获得的初始视频识别模型分配权重;模型融合模块,被配置成基于所分配的权重,对所获得的初始视频识别模型进行融合,生成视频识别模型。
第三方面,本申请实施例提供了一种用于识别视频的方法,该方法包括:获取待识别视频,其中,待识别视频为对对象进行拍摄所获得的视频;将待识别视频输入采用如上述第一方面中任一实施例所描述的方法生成的视频识别模型中,生成待识别视频所对应的识别结果,其中,识别结果用于指示待识别视频是否为对显示对象的屏幕进行拍摄所获得的视频。
第四方面,本申请实施例提供了一种用于识别视频的装置,该装置包括:视频获取单元,被配置成获取待识别视频,其中,待识别视 频为对对象进行拍摄所获得的视频;结果生成单元,被配置成将待识别视频输入采用如上述第一方面中任一实施例所描述的方法生成的视频识别模型中,生成待识别视频所对应的识别结果,其中,识别结果用于指示待识别视频是否为对显示对象的屏幕进行拍摄所获得的视频。
第五方面,本申请实施例提供了一种电子设备,包括:一个或多个处理器;存储装置,其上存储有一个或多个程序,当一个或多个程序被一个或多个处理器执行,使得一个或多个处理器实现如上述第一方面和第三方面中任一实施例所描述的方法。
第六方面,本申请实施例提供了一种计算机可读介质,其上存储有计算机程序,该程序被处理器执行时实现如上述第一方面和第三方面中任一实施例所描述的方法。
本申请实施例提供的用于生成模型的方法和装置,通过获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组,其中,训练样本包括样本视频和针对样本视频预先标注的样本识别结果,样本视频为对样本对象进行拍摄所获得的视频,样本识别结果用于指示样本视频是否为对显示样本对象的屏幕进行拍摄所获得的视频,而后对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,最后基于所得到的初始视频识别模型,生成视频识别模型,从而能够得到一种可以用于识别视频的模型,且有助于丰富模型的生成方式。
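Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application may be applied;
FIG. 2 is a flowchart of one embodiment of the method for generating a model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present application;
FIG. 4 is a flowchart of another embodiment of the method for generating a model according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of the apparatus for generating a model according to the present application;
FIG. 6 is a flowchart of one embodiment of the method for recognizing a video according to the present application;
FIG. 7 is a schematic structural diagram of one embodiment of the apparatus for recognizing a video according to the present application;
FIG. 8 is a schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which the method for generating a model, the apparatus for generating a model, the method for recognizing a video, or the apparatus for recognizing a video of the embodiments of the present application may be applied.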
如图1所示,系统架构100可以包括终端101、102,网络103、数据库服务器104和服务器105。网络103用以在终端101、102,数据库服务器104与服务器105之间提供通信链路的介质。网络103可以包括各种连接类型,例如有线、无线通信链路或者光纤电缆等等。
用户110可以使用终端101、102通过网络103与服务器105进行交互,以接收或发送消息等。终端101、102上可以安装有各种客户端 应用,例如模型训练类应用、视频识别类应用、社交类应用、支付类应用、网页浏览器和即时通讯工具等。
这里的终端101、102可以是硬件,也可以是软件。当终端101、102为硬件时,可以是具有显示屏的各种电子设备,包括但不限于智能手机、平板电脑、电子书阅读器、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、膝上型便携计算机和台式计算机等等。当终端101、102为软件时,可以安装在上述所列举的电子设备中。其可以实现成多个软件或软件模块(例如用来提供分布式服务),也可以实现成单个软件或软件模块。在此不做具体限定。
当终端101、102为硬件时,其上还可以安装有视频采集设备。视频采集设备可以是各种能实现采集视频功能的设备,如摄像头、传感器等等。用户110可以利用终端101、102上的视频采集设备来采集视频。
数据库服务器104可以是提供各种服务的数据库服务器。例如数据库服务器中可以存储有样本集合。样本集合中包含有大量的样本。其中,样本可以包括样本视频以及针对样本视频预先标注的样本识别结果。这样,用户110也可以通过终端101、102,从数据库服务器104所存储的样本集合中选取样本。
服务器105也可以是提供各种服务的服务器,例如对终端101、102上显示的各种应用提供支持的后台服务器。后台服务器可以利用终端101、102发送的样本集合中的样本,对初始模型进行训练,并可以将训练结果(如生成的视频识别模型)发送给终端101、102。这样,用户可以应用生成的视频识别模型进行视频识别。
这里的数据库服务器104和服务器105同样可以是硬件,也可以是软件。当它们为硬件时,可以实现成多个服务器组成的分布式服务器集群,也可以实现成单个服务器。当它们为软件时,可以实现成多个软件或软件模块(例如用来提供分布式服务),也可以实现成单个软件或软件模块。在此不做具体限定。
需要说明的是,本申请实施例所提供的用于生成模型的方法或用 于识别视频的方法一般由服务器105执行。相应地,用于生成模型的装置或用于识别视频的装置一般也设置于服务器105中。
需要指出的是,在服务器105可以实现数据库服务器104的相关功能的情况下,系统架构100中可以不设置数据库服务器104。
应该理解,图1中的终端、网络、数据库服务器和服务器的数目仅仅是示意性的。根据实现需要,可以具有任意数目的终端、网络、数据库服务器和服务器。
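Continuing to refer to FIG. 2, a flow 200 of one embodiment of the method for generating a model according to the present application is shown. The method for generating a model includes the following steps:
Step 201: obtain a training sample set, and divide the training sample set into a preset number of training sample groups.
In this embodiment, the execution body of the method for generating a model (for example, the server shown in FIG. 1) may obtain the training sample set from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1) over a wired or wireless connection, and divide the training sample set into a preset number of training sample groups. Each training sample includes a sample video and a sample recognition result pre-labeled for the sample video. The sample video may be a video obtained by shooting a sample object. The sample object may be any of various things, for example an object such as a person or an animal, or an activity such as running or swimming.
In this embodiment, the sample recognition result may include, but is not limited to, at least one of the following: text, numbers, symbols. The sample recognition result may be used to indicate whether the sample video is a video obtained by shooting a screen displaying the sample object. For example, the sample recognition result may include the number 1 and the number 0, where 1 indicates that the sample video is a video obtained by shooting a screen displaying the sample object, and 0 indicates that it is not.
In this embodiment, the execution body may divide the training sample set into the preset number of training sample groups in various ways. For example, it may divide the set into equal parts, or divide it so that the number of training samples in each of the preset number of training sample groups is greater than or equal to a preset threshold. It should be noted that the preset number may be set in advance by a technician.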
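The division in step 201 can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_groups` and the parameter `min_group_size` (standing in for the preset threshold) are hypothetical and do not come from the application, which leaves the division method open.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(samples: Sequence[T], num_groups: int,
                      min_group_size: int = 1) -> List[List[T]]:
    """Split samples into num_groups contiguous groups of near-equal size.

    min_group_size plays the role of the preset threshold (an assumed name).
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    if len(samples) < num_groups * min_group_size:
        raise ValueError("not enough samples to meet the minimum group size")
    base, extra = divmod(len(samples), num_groups)
    groups: List[List[T]] = []
    start = 0
    for i in range(num_groups):
        # The first `extra` groups take one additional sample each.
        size = base + (1 if i < extra else 0)
        groups.append(list(samples[start:start + size]))
        start += size
    return groups
```

With ten samples and three groups, the sketch produces groups of sizes 4, 3, and 3; an exact equal split results whenever the sample count is a multiple of the preset number.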
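Step 202: for each training sample group among the preset number of training sample groups, use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and train with a machine learning method to obtain an initial video recognition model corresponding to the group.
In this embodiment, for each training sample group obtained in step 201, the execution body may use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and train with a machine learning method to obtain the initial video recognition model corresponding to the group. Here, an initial video recognition model is a model trained on the training samples of a training sample group, and may be used to determine the final video recognition model.
As an example, each of the preset number of training sample groups may be used to train a preset initial model (for example, a convolutional neural network (CNN) or a residual network (ResNet)), finally yielding a preset number of initial video recognition models, one per training sample group. Specifically, for each training sample group, the execution body may input the sample videos of the training samples in the group into the initial model to obtain recognition results for the input sample videos, then take the sample recognition results corresponding to the input sample videos as the expected output of the initial model, train the initial model with a machine learning method, and determine the trained initial model as an initial video recognition model.
In some optional implementations of this embodiment, the execution body may obtain the preset number of initial video recognition models from the preset number of training sample groups through the following steps:
Step 2021: select a training sample group from the preset number of training sample groups as a candidate training sample group.
In this embodiment, the execution body may select a training sample group from the preset number of training sample groups obtained in step 201 as the candidate training sample group, and perform the training steps of steps 2022 to 2024. The way in which the training sample group is selected is not limited in the present application. For example, it may be selected at random, or training sample groups containing more training samples may be selected first.
Step 2022: use the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as expected output, and train the initial model with a machine learning method to obtain an initial video recognition model.
Specifically, the execution body may obtain the initial video recognition model corresponding to the candidate training sample group through the following steps:
The execution body may select a training sample from the candidate training sample group and perform the following steps: input the sample video of the selected training sample into the initial model to obtain a recognition result; take the sample recognition result corresponding to the input sample video as the expected output of the initial model, and adjust the parameters of the initial model based on the obtained recognition result and the sample recognition result; determine whether the candidate training sample group contains any training sample that has not been selected; and, in response to there being no unselected training sample, determine the adjusted initial model as the initial video recognition model corresponding to the candidate training sample group. It should be noted that the way in which training samples are selected is not limited in the present application. For example, they may be selected at random, or training samples whose sample videos have better clarity may be selected first.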
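The per-sample procedure above (input a sample, compare the result with the label, adjust the parameters, repeat until no unselected sample remains) can be illustrated with a deliberately small stand-in model. The sketch assumes, beyond what the application states, that each sample video has already been reduced to a numeric feature vector and that the initial model is a logistic regression updated by stochastic gradient descent; the application itself names CNN and ResNet as example initial models. The names `predict`, `train_on_group`, and `lr` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector assumed to be extracted from a sample video, label 1 or 0)
Sample = Tuple[Sequence[float], int]


def predict(params: Sequence[float], features: Sequence[float]) -> float:
    """Estimate how likely the video was shot from a screen (label 1).

    params holds one weight per feature followed by a bias term.
    """
    z = params[-1] + sum(w * x for w, x in zip(params, features))
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def train_on_group(params: Sequence[float], group: Sequence[Sample],
                   lr: float = 0.1) -> List[float]:
    """Visit every sample of the group once and adjust the parameters."""
    params = list(params)  # work on a copy; the caller's model is untouched
    for features, label in group:
        # Difference between the obtained result and the expected output.
        error = predict(params, features) - label
        for i, x in enumerate(features):
            params[i] -= lr * error * x
        params[-1] -= lr * error
    return params
```

The loop ends when every sample in the group has been visited, which corresponds to the "no unselected training sample" condition; the returned parameters stand for the initial video recognition model of that group.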
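Step 2023: determine whether the preset number of training sample groups contains any training sample group that has not been selected.
Step 2024: in response to determining that no unselected training sample group exists, obtain the preset number of initial video recognition models.
It can be understood that when no unselected training sample group remains among the preset number of training sample groups, a corresponding initial video recognition model has been trained for every one of the training sample groups. The execution body may therefore, in response to determining that no unselected training sample group exists, obtain the preset number of initial video recognition models.
Optionally, in response to determining that an unselected training sample group exists, the execution body may also select a training sample group from the unselected training sample groups as a new candidate training sample group, take the most recently obtained initial video recognition model as the new initial model, and continue to perform the training steps 2022 to 2024.
In this implementation, the execution body may use the initial video recognition model trained on an earlier-selected training sample group as the initial model for the training sample group selected next. In this way, the sample data can be used effectively and a more accurate initial video recognition model can be generated.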
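The control flow of steps 2021 to 2024, including the optional warm start from the most recently obtained model, might look like the sketch below. The function name `train_groups_sequentially` and the callback `train_fn` are hypothetical; `train_fn` is assumed to return a new model rather than modify its argument in place, so that each intermediate model is preserved.

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")
Group = TypeVar("Group")


def train_groups_sequentially(
    groups: Sequence[Group],
    initial_model: Model,
    train_fn: Callable[[Model, Group], Model],
) -> List[Model]:
    """Train on each group in turn, seeding each run with the latest model."""
    unselected = list(groups)
    model = initial_model
    initial_models: List[Model] = []
    while unselected:  # step 2023: does an unselected group remain?
        candidate = unselected.pop(0)  # step 2021: any selection rule works here
        model = train_fn(model, candidate)  # step 2022: train the current model
        initial_models.append(model)
    return initial_models  # step 2024: one model per training sample group
```

The last element of the returned list is the model that has seen every group, which is the one the optional implementation of step 203 may adopt directly as the video recognition model.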
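Step 203: generate a video recognition model based on the obtained initial video recognition models.
In this embodiment, the execution body may generate the video recognition model based on the initial video recognition models obtained in step 202.
Specifically, the execution body may select one of the obtained initial video recognition models as the video recognition model, or process the obtained initial video recognition models to obtain the video recognition model.
As an example, the execution body may assign the same weight to each initial video recognition model based on the number of initial video recognition models obtained, and then fuse the obtained initial video recognition models based on the assigned weights to obtain the video recognition model.
For example, suppose the obtained initial video recognition models are "y=ax+b" and "y=cx+d", where x is the independent variable and represents the input of the model; y is the dependent variable and represents the output of the model; a and b are the coefficients of the first initial video recognition model; and c and d are the coefficients of the second. Because two initial video recognition models were obtained, the weight assigned to each is 0.5 (0.5=1÷2). The models "y=ax+b" and "y=cx+d" can then be fused based on the assigned weights to obtain the video recognition model "y=0.5(a+c)x+0.5(b+d)" (that is, y=0.5*(ax+b)+0.5*(cx+d)).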
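The equal-weight fusion of the two linear models can be checked numerically with a short sketch. The representation of a model as a `(slope, intercept)` pair and the name `fuse_linear_models` are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple

LinearModel = Tuple[float, float]  # (slope, intercept) of y = slope * x + intercept


def fuse_linear_models(models: Sequence[LinearModel],
                       weights: Optional[Sequence[float]] = None) -> LinearModel:
    """Fuse linear models by a weighted sum of their coefficients."""
    if weights is None:
        # Equal weights: 1 divided by the number of models.
        weights = [1.0 / len(models)] * len(models)
    slope = sum(w * a for w, (a, _) in zip(weights, models))
    intercept = sum(w * b for w, (_, b) in zip(weights, models))
    return slope, intercept
```

For a=2, b=1, c=4, d=3 the fused model is y=3x+2, matching 0.5(a+c)x+0.5(b+d). Averaging coefficients coincides with averaging outputs only because these models are linear in their coefficients; for a nonlinear model such as a neural network, a weighted sum of the models' outputs is a different operation from a weighted sum of their parameters.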
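In some optional implementations of this embodiment, based on the initial video recognition models obtained through steps 2021 to 2024 of the optional implementation above, the execution body may directly determine the last obtained initial video recognition model as the video recognition model.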
继续参见图3,图3是根据本实施例的模型生成的方法的应用场景的一个示意图。在图3的应用场景中,用户所使用的终端301上可以安装有模型训练类应用。当用户打开该应用,并上传训练样本集合或训练样本集合的存储路径后,对该应用提供后台支持的服务器302可以运行用于生成模型的方法,包括:
首先,可以获取训练样本集合303以及将训练样本集合303划分成两个(预设数量个)训练样本组304、305,其中,训练样本包括样 本视频和针对样本视频预先标注的样本识别结果,样本视频为对样本对象进行拍摄所获得的视频,样本识别结果用于指示样本视频是否为对显示样本对象的屏幕进行拍摄所获得的视频。
然后,对于训练样本组304,上述执行主体可以将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型306;对于训练样本组305,上述执行主体可以将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型307。
最后,上述执行主体可以基于所得到的初始视频识别模型306和初始视频识别模型307,生成视频识别模型308。
此时,服务器302还可以向终端301发送用于指示模型训练完成的提示信息。该提示信息可以是语音和/或文字信息。这样,用户可以在预设的存储位置获取到视频识别模型。
本申请的上述实施例提供的方法通过获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组,而后对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,最后基于所得到的初始视频识别模型,生成视频识别模型,从而能够得到一种可以用于识别视频的模型,且有助于丰富模型的生成方式。
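With further reference to FIG. 4, a flow 400 of another embodiment of the method for generating a model is shown. The flow 400 of the method for generating a model includes the following steps:
Step 401: obtain a training sample set, and divide the training sample set into a preset number of training sample groups.
In this embodiment, the execution body of the method for generating a model (for example, the server shown in FIG. 1) may obtain the training sample set from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1) over a wired or wireless connection, and divide the training sample set into a preset number of training sample groups.
It should be noted that step 401 may be implemented in a manner similar to step 201 in the foregoing embodiment. Accordingly, the description of step 201 above also applies to step 401 of this embodiment and is not repeated here.
Step 402: determine values that characterize the quality of the preset number of training sample groups.
In this embodiment, for the preset number of training sample groups obtained in step 401, the execution body may determine values characterizing their quality, and it may do so in various ways. For example, the execution body may determine the number of training samples included in each training sample group and use that number as the value characterizing the quality of the group. It can be understood that the more training samples a training sample group includes, the more times the parameters of the initial model may be adjusted, and the more accurate the trained initial video recognition model may be. The execution body may therefore determine the values characterizing the quality of the preset number of training sample groups according to the number of training samples each group includes.
It should be noted that the correspondence between the magnitude of the value and the quality may be set in advance by a technician. Specifically, the correspondence may be set so that a larger value means a better training sample group, or so that a smaller value means a better training sample group.
In some optional implementations of this embodiment, the execution body may determine the values characterizing the quality of the preset number of training sample groups through the following steps:
First, the execution body may obtain a preset verification sample set, where each verification sample includes a verification video and a verification recognition result pre-labeled for the verification video.
Then, for each training sample group among the preset number of training sample groups, the execution body may perform the following steps: use the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and train with a machine learning method to obtain a to-be-verified video recognition model corresponding to the group; input the verification videos of the verification samples in the verification sample set into the to-be-verified video recognition model corresponding to the group to obtain actual recognition results; determine the loss value of the actual recognition results relative to the verification recognition results corresponding to the input verification videos; and, based on the determined loss value, generate the value characterizing the quality of the training sample group.
Here, the loss value may be used to characterize the difference between the actual output and the expected output. It can be understood that the smaller this difference, the more accurate the trained to-be-verified video recognition model, and hence the better the training sample group used. Based on this relationship between the loss value and the quality of the training sample group, the execution body may generate the value characterizing the quality of the training sample group from the determined loss value in various ways. For example, the loss value may be used directly as the value, in which case a smaller value means a better training sample group; or the reciprocal of the loss value may be used as the value, in which case a larger value means a better training sample group.
It should be noted here that the execution body may use any of various preset loss functions to calculate the loss value of the obtained actual recognition results relative to the verification recognition results corresponding to the input verification videos. For example, the L2 norm may be used as the loss function.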
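A minimal sketch of the loss-based quality value follows, assuming that recognition results are numeric (for example 1 and 0) so that an L2 norm can be taken over the verification set. The names `l2_loss`, `quality_value`, `use_reciprocal`, and the small constant `eps` (added only to avoid division by zero) are hypothetical.

```python
import math
from typing import Sequence


def l2_loss(actual: Sequence[float], expected: Sequence[float]) -> float:
    """L2 norm of the difference between actual and expected results."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))


def quality_value(loss: float, use_reciprocal: bool = False,
                  eps: float = 1e-12) -> float:
    """Turn a loss value into a value characterizing a group's quality.

    use_reciprocal=False: the loss itself is the value (smaller is better).
    use_reciprocal=True: the reciprocal is the value (larger is better).
    """
    return 1.0 / (loss + eps) if use_reciprocal else loss
```

Whichever form is used, the same convention has to be applied when groups are later compared, since the two forms rank the groups in opposite numeric directions.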
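Step 403: based on the determined values, select the best training sample group from the preset number of training sample groups as the candidate training sample group.
In this embodiment, based on the values determined in step 402, the execution body may select the best training sample group from the preset number of training sample groups obtained in step 401 as the candidate training sample group, and perform the training steps of steps 404 to 406.
It should be noted that this embodiment selects the best training sample group from the preset number of training sample groups as the candidate training sample group. Therefore, when a larger value means a better training sample group, the execution body may select the training sample group corresponding to the largest determined value as the candidate training sample group; when a smaller value means a better training sample group, it may select the training sample group corresponding to the smallest determined value.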
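The selection rule can be written compactly once the direction of the correspondence is fixed. In this sketch, `values[i]` is the quality value of group `i`, and the flag `larger_is_better` encodes the preset correspondence; both names, and the helper `selection_order`, are illustrative assumptions.

```python
from typing import Iterable, List, Sequence


def select_best_group(values: Sequence[float], unselected: Iterable[int],
                      larger_is_better: bool) -> int:
    """Return the index of the best group among those not yet selected."""
    pick = max if larger_is_better else min
    return pick(unselected, key=lambda i: values[i])


def selection_order(values: Sequence[float], larger_is_better: bool) -> List[int]:
    """Order in which groups would be selected across repeated rounds."""
    unselected = list(range(len(values)))
    order: List[int] = []
    while unselected:
        best = select_best_group(values, unselected, larger_is_better)
        order.append(best)
        unselected.remove(best)
    return order
```

With values 0.4, 0.1, and 0.9, the groups are visited in the order 2, 0, 1 when larger is better and 1, 0, 2 when smaller is better.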
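Step 404: use the sample videos of the training samples in the candidate training sample group as input and the sample recognition results corresponding to the input sample videos as expected output, and train the initial model with a machine learning method to obtain an initial video recognition model.
Specifically, the execution body may obtain the initial video recognition model corresponding to the candidate training sample group through the following steps:
The execution body may select a training sample from the candidate training sample group and perform the following steps: input the sample video of the selected training sample into the initial model to obtain a recognition result; take the sample recognition result corresponding to the input sample video as the expected output of the initial model, and adjust the parameters of the initial model based on the obtained recognition result and the sample recognition result; determine whether the candidate training sample group contains any training sample that has not been selected; and, in response to there being no unselected training sample, determine the adjusted initial model as the initial video recognition model corresponding to the candidate training sample group. It should be noted that the way in which training samples are selected is not limited in the present application. For example, they may be selected at random, or training samples whose sample videos have better clarity may be selected first.
Step 405: determine whether the preset number of training sample groups contains any training sample group that has not been selected.
Step 406: in response to determining that no unselected training sample group exists, obtain the preset number of initial video recognition models.
It can be understood that when no unselected training sample group remains among the preset number of training sample groups, a corresponding initial video recognition model has been trained for every one of the training sample groups. The execution body may therefore, in response to determining that no unselected training sample group exists, obtain the preset number of initial video recognition models.
In some optional implementations of this embodiment, in response to determining that an unselected training sample group exists, the execution body may also select, based on the determined values, the best training sample group from the unselected training sample groups as a new candidate training sample group, take the most recently obtained initial video recognition model as the new initial model, and continue to perform the training steps 404 to 406.
Step 407: generate a video recognition model based on the obtained initial video recognition models.
In this embodiment, the execution body may generate the video recognition model based on the initial video recognition models obtained in step 406.
Specifically, the execution body may select one of the obtained initial video recognition models as the video recognition model, or process the obtained initial video recognition models to obtain the video recognition model.
In some optional implementations of this embodiment, the execution body may generate the video recognition model through the following steps. First, it may assign weights to the obtained initial video recognition models based on the values determined in step 402. Then, it may fuse the obtained initial video recognition models based on the assigned weights to generate the video recognition model. Specifically, the execution body may determine the quality of each training sample group based on the determined values, and then assign weights to the obtained initial video recognition models in various ways, so that the initial video recognition model corresponding to a better training sample group receives a larger weight and the one corresponding to a worse training sample group receives a smaller weight.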
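One of the "various ways" of assigning weights could be simple normalization of the quality values, sketched below. This particular scheme is an assumption made for illustration, not something the application prescribes; it also assumes that all quality values are positive and that each model is a callable returning a numeric score. The names `allocate_weights` and `fused_prediction` are hypothetical.

```python
from typing import Callable, List, Sequence


def allocate_weights(values: Sequence[float],
                     larger_is_better: bool = True) -> List[float]:
    """Normalize quality values into weights that sum to 1.

    When smaller values mean better groups, reciprocals are used so that
    better groups still receive larger weights.
    """
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive quality values")
    scores = list(values) if larger_is_better else [1.0 / v for v in values]
    total = sum(scores)
    return [s / total for s in scores]


def fused_prediction(models: Sequence[Callable[[object], float]],
                     weights: Sequence[float], video: object) -> float:
    """Weighted sum of the scores the individual models give to one video."""
    return sum(w * m(video) for w, m in zip(weights, models))
```

Fusing at the level of model outputs, as done here, works for any model type; fusing the coefficients directly, as in the linear example of the first embodiment, gives the same result only for models that are linear in their coefficients.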
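As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for generating a model in this embodiment highlights the steps of determining values that characterize the quality of the preset number of training sample groups and then selecting training sample groups for training based on those values. The scheme described in this embodiment can therefore train on the better training sample groups first to obtain a more accurate initial video recognition model, so that subsequent training only needs to make smaller adjustments to that model, which improves the efficiency of model generation.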
进一步参考图5,作为对上述各图所示方法的实现,本申请提供了一种用于生成模型的装置的一个实施例,该装置实施例与图2所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。
如图5所示,本实施例的用于生成模型的装置500包括:样本获取单元501、模型训练单元502和模型生成单元503。其中,样本获取单元501被配置成获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组,其中,训练样本包括样本视频和针对样本视频预先标注的样本识别结果,样本视频为对样本对象进行拍摄所获得的视频,样本识别结果用于指示样本视频是否为对显示样本对象的屏幕 进行拍摄所获得的视频;模型训练单元502被配置成对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型;模型生成单元503被配置成基于所得到的初始视频识别模型,生成视频识别模型。
在本实施例中,用于生成模型的装置500的样本获取单元501可以通过有线连接方式或者无线连接方式从数据库服务器(例如图1所示的数据库服务器104)或者终端(例如图1所示的终端101、102)获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组。其中,训练样本包括样本视频和针对样本视频预先标注的样本识别结果。样本视频可以为对样本对象进行拍摄所获得的视频。样本对象可以为各种事物。
在本实施例中,样本识别结果可以包括但不限于以下至少一项:文字、数字、符号。样本识别结果可以用于指示样本视频是否为对显示上述样本对象的屏幕进行拍摄所获得的视频。
在本实施例中,样本获取单元501可以采用各种方式将训练样本集合划分成预设数量个训练样本组。需要说明的是,上述预设数量可以由技术人员预先设置。
在本实施例中,对于样本获取单元501中得到的预设数量个训练样本组中的训练样本组,模型训练单元502可以将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型。其中,初始视频识别模型是利用训练样本组中的训练样本训练得到的模型,可以用于确定最终的视频识别模型。
在本实施例中,基于模型训练单元502所得到的初始视频识别模型,模型生成单元503可以生成视频识别模型。
具体的,上述执行主体可以从所得到的初始视频识别模型中选取一个初始视频识别模型作为视频识别模型,或者对所得到的初始视频识别模型进行处理,获得视频识别模型。
在本实施例的一些可选的实现方式中,模型训练单元502可以包括:第一执行模块(图中未示出),被配置成从预设数量个训练样本组中选取训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
在本实施例的一些可选的实现方式中,模型训练单元502还可以包括:第二执行模块(图中未示出),被配置成响应于确定存在未被选取的训练样本组,从未被选取的训练样本组中选取训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行训练步骤。
在本实施例的一些可选的实现方式中,模型训练单元502可以包括:数值确定模块(图中未示出),被配置成确定用于表征预设数量个训练样本组的优劣程度的数值;第三执行模块(图中未示出),被配置成基于所确定的数值,从预设数量个训练样本组中选取最优的训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
在本实施例的一些可选的实现方式中,模型训练单元502还可以包括:第四执行模块(图中未示出),被配置成响应于确定存在未被选取的训练样本组,基于所确定的数值,从未被选取的训练样本组中选取最优的训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行训练步骤。
在本实施例的一些可选的实现方式中,数值确定模块(图中未示 出)可以包括:样本获取模块(图中未示出),被配置成获取预先设置的验证样本集合,其中,验证样本包括验证用视频和针对验证用视频预先标注的验证用识别结果;数值生成模块(图中未示出),被配置成对于预设数量个训练样本组中的训练样本组,执行以下步骤:将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为输出,利用机器学习方法训练得到该组训练样本所对应的待验证视频识别模型;将验证样本集合中的验证样本的验证用视频输入该组训练样本所对应的待验证视频识别模型,获得实际识别结果,确定实际识别结果相对于所输入的验证用视频所对应的验证用识别结果的损失值,基于所确定的损失值,生成用于表征该组训练样本组的优劣程度的数值。
在本实施例的一些可选的实现方式中,模型生成单元503可以包括:权重分配模块(图中未示出),被配置成基于所确定的数值,为所获得的初始视频识别模型分配权重;模型融合模块(图中未示出),被配置成基于所分配的权重,对所获得的初始视频识别模型进行融合,生成视频识别模型。
在本实施例的一些可选的实现方式中,模型生成单元503可以进一步被配置成:将最后一次获得的初始视频识别模型确定为视频识别模型。
本申请的上述实施例提供的装置500通过样本获取单元501获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组,其中,训练样本包括样本视频和针对样本视频预先标注的样本识别结果,样本视频为对样本对象进行拍摄所获得的视频,样本识别结果用于指示样本视频是否为对显示样本对象的屏幕进行拍摄所获得的视频,而后对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,模型训练单元502将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,最后模型生成单元503基于所得到的初始视频识别模型,生成视频识别模型,从而能够得到一种可以用于识别视频的模型,且有助于丰富模型的生成方式。
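Referring to FIG. 6, a flow 600 of one embodiment of the method for recognizing a video provided by the present application is shown. The method for recognizing a video may include the following steps:
Step 601: obtain a video to be recognized.
In this embodiment, the execution body of the method for recognizing a video (for example, the server 105 shown in FIG. 1) may obtain the video to be recognized over a wired or wireless connection. For example, the execution body may obtain a video stored in a database server (for example, the database server 104 shown in FIG. 1), or receive a video captured by a terminal (for example, the terminals 101 and 102 shown in FIG. 1) or another device.
In this embodiment, the video to be recognized may be a video obtained by shooting an object. The object may be any of various things, for example an object such as a person or an animal, or an activity such as running or swimming.
Step 602: input the video to be recognized into the video recognition model to generate a recognition result corresponding to the video to be recognized.
In this embodiment, the execution body may input the video to be recognized obtained in step 601 into the video recognition model to generate the recognition result corresponding to the video to be recognized. The recognition result may be used to indicate whether the video to be recognized is a video obtained by shooting a screen displaying the object.
In this embodiment, the video recognition model may be generated by the method described in the embodiment of FIG. 2 above. For the specific generation process, see the description of the embodiment of FIG. 2, which is not repeated here.
It should be noted that the method for recognizing a video of this embodiment may be used to test the video recognition models generated by the foregoing embodiments, and the video recognition model can then be continuously optimized according to the test results. The method may also be the practical application of the video recognition models generated by the foregoing embodiments. Using those models for video recognition makes it possible to detect videos obtained by recording a screen and helps improve the accuracy of video recognition.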
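Step 602 can be sketched as a thin wrapper around the trained model. The sketch assumes, beyond what the application specifies, that the video recognition model is a callable returning a score between 0 and 1 and that a fixed `threshold` converts that score into the 1/0 recognition result used for the sample labels; `recognize` and `threshold` are hypothetical names.

```python
from typing import Callable


def recognize(video_recognition_model: Callable[[object], float],
              video: object, threshold: float = 0.5) -> int:
    """Return 1 if the video is judged to have been shot from a screen, else 0."""
    score = video_recognition_model(video)
    return 1 if score >= threshold else 0
```

A platform could then block or flag videos for which the result is 1; the threshold itself would need to be tuned against labeled test videos.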
继续参见图7,作为对上述图6所示方法的实现,本申请提供了一种用于识别视频的装置的一个实施例。该装置实施例与图6所示的 方法实施例相对应,该装置具体可以应用于各种电子设备中。
如图7所示,本实施例的用于识别视频的装置700可以包括:视频获取单元701和结果生成单元702。其中,视频获取单元701被配置成获取待识别视频,其中,待识别视频为对对象进行拍摄所获得的视频;结果生成单元702被配置成将待识别视频输入采用如上述图2实施例所描述的方法生成的模型中,生成待识别视频所对应的识别结果,其中,识别结果用于指示待识别视频是否为对显示对象的屏幕进行拍摄所获得的视频。
可以理解的是,该装置700中记载的诸单元与参考图6描述的方法中的各个步骤相对应。由此,上文针对方法描述的操作、特征以及产生的有益效果同样适用于装置700及其中包含的单元,在此不再赘述。
下面参见图8,其示出了适于用来实现本申请实施例的电子设备的计算机系统800的结构示意图。图8示出的电子设备仅仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图8所示,计算机系统800包括中央处理单元(CPU)801,其可以根据存储在只读存储器(ROM)802中的程序或者从存储部分808加载到随机访问存储器(RAM)803中的程序而执行各种适当的动作和处理。在RAM 803中,还存储有系统800操作所需的各种程序和数据。CPU 801、ROM 802以及RAM 803通过总线804彼此相连。输入/输出(I/O)接口805也连接至总线804。
以下部件连接至I/O接口805:包括触摸屏、键盘、鼠标、摄像装置等的输入部分806;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分807;包括硬盘等的存储部分808;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分809。通信部分809经由诸如因特网的网络执行通信处理。驱动器810也根据需要连接至I/O接口805。可拆卸介质811,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器810上,以便于从其上读出的计算机程序根据需要被安装入存储部分808。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分809从网络上被下载和安装,和/或从可拆卸介质811被安装。在该计算机程序被中央处理单元(CPU)801执行时,执行本申请的方法中限定的上述功能。需要说明的是,本申请的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、电线、光缆、RF等等,或者上述的任意合适的组合。
附图中的流程图和框图,图示了按照本申请各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码 的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本申请实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。所描述的单元也可以设置在处理器中,例如,可以描述为:一种处理器包括样本获取单元、模型训练单元和模型生成单元。再例如,也可以描述为:一种处理器包括获取单元、训练单元和生成单元。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定,例如,样本获取单元还可以被描述为“获取训练样本集合的单元”。
作为另一方面,本申请还提供了一种计算机可读介质,该计算机可读介质可以是上述实施例中描述的电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组,其中,训练样本包括样本视频和针对样本视频预先标注的样本识别结果,样本视频为对样本对象进行拍摄所获得的视频,样本识别结果用于指示样本视频是否为对显示样本对象的屏幕进行拍摄所获得的视频;对于预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型;基于所得到的初始视频识别模型,生成视频识别模型。
此外,当上述一个或者多个程序被该电子设备执行时,还可以使得该电子设备:获取待识别视频,其中,待识别视频为对对象进行拍 摄所获得的视频;将待识别视频输入视频识别模型中,生成待识别视频所对应的识别结果,其中,识别结果用于指示待识别视频是否为对显示对象的屏幕进行拍摄所获得的视频。视频识别模型可以是采用如上述各实施例所描述的用于生成模型的方法而生成的。
以上描述仅为本申请的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本申请中所涉及的发明范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述发明构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本申请中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。

Claims (20)

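  1. A method for generating a model, comprising:
    obtaining a training sample set, and dividing the training sample set into a preset number of training sample groups, wherein a training sample comprises a sample video and a sample recognition result pre-labeled for the sample video, the sample video is a video obtained by shooting a sample object, and the sample recognition result is used to indicate whether the sample video is a video obtained by shooting a screen displaying the sample object;
    for a training sample group among the preset number of training sample groups, using the sample videos of the training samples in the group as input and the sample recognition results corresponding to the input sample videos as expected output, and training with a machine learning method to obtain an initial video recognition model corresponding to the group; and
    generating a video recognition model based on the obtained initial video recognition models.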
  2. 根据权利要求1所述的方法,其中,所述对于所述预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,包括:
    从所述预设数量个训练样本组中选取训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定所述预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
  3. 根据权利要求2所述的方法,其中,所述对于所述预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输 出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,还包括:
    响应于确定存在未被选取的训练样本组,从未被选取的训练样本组中选取训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行所述训练步骤。
  4. 根据权利要求1所述的方法,其中,所述对于所述预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,包括:
    确定用于表征所述预设数量个训练样本组的优劣程度的数值;
    基于所确定的数值,从所述预设数量个训练样本组中选取最优的训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定所述预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
  5. 根据权利要求4所述的方法,其中,所述对于所述预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型,还包括:
    响应于确定存在未被选取的训练样本组,基于所确定的数值,从未被选取的训练样本组中选取最优的训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行所述训练步骤。
  6. 根据权利要求4所述的方法,其中,所述确定用于表征所述预设数量个训练样本组的优劣程度的数值,包括:
    获取预先设置的验证样本集合,其中,验证样本包括验证用视频和针对验证用视频预先标注的验证用识别结果;
    对于所述预设数量个训练样本组中的训练样本组,执行以下步骤:将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为输出,利用机器学习方法训练得到该组训练样本所对应的待验证视频识别模型;将验证样本集合中的验证样本的验证用视频输入该组训练样本所对应的待验证视频识别模型,获得实际识别结果,确定实际识别结果相对于所输入的验证用视频所对应的验证用识别结果的损失值,基于所确定的损失值,生成用于表征该组训练样本组的优劣程度的数值。
  7. 根据权利要求4-6之一所述的方法,其中,所述基于所得到的初始视频识别模型,生成视频识别模型,包括:
    基于所确定的数值,为所获得的初始视频识别模型分配权重;
    基于所分配的权重,对所获得的初始视频识别模型进行融合,生成视频识别模型。
  8. 根据权利要求2-6之一所述的方法,其中,所述基于所得到的初始视频识别模型,生成视频识别模型,包括:
    将最后一次获得的初始视频识别模型确定为视频识别模型。
  9. 一种用于生成模型的装置,包括:
    样本获取单元,被配置成获取训练样本集合,以及将训练样本集合划分成预设数量个训练样本组,其中,训练样本包括样本视频和针对样本视频预先标注的样本识别结果,所述样本视频为对样本对象进行拍摄所获得的视频,所述样本识别结果用于指示所述样本视频是否为对显示所述样本对象的屏幕进行拍摄所获得的视频;
    模型训练单元,被配置成对于所述预设数量个训练样本组中的训练样本组,将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法训练得到该组训练样本所对应的初始视频识别模型;
    模型生成单元,被配置成基于所得到的初始视频识别模型,生成视频识别模型。
  10. 根据权利要求9所述的装置,其中,所述模型训练单元包括:
    第一执行模块,被配置成从所述预设数量个训练样本组中选取训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定所述预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
  11. 根据权利要求10所述的装置,其中,所述模型训练单元还包括:
    第二执行模块,被配置成响应于确定存在未被选取的训练样本组,从未被选取的训练样本组中选取训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行所述训练步骤。
  12. 根据权利要求9所述的装置,其中,所述模型训练单元包括:
    数值确定模块,被配置成确定用于表征所述预设数量个训练样本组的优劣程度的数值;
    第三执行模块,被配置成基于所确定的数值,从所述预设数量个训练样本组中选取最优的训练样本组作为候选训练样本组,以及基于候选训练样本组和初始模型,执行以下训练步骤:将候选训练样本组 中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为期望输出,利用机器学习方法对初始模型进行训练,获得初始视频识别模型;确定所述预设数量个训练样本组中是否存在未被选取的训练样本组;响应于确定不存在未被选取的训练样本组,获得预设数量个初始视频识别模型。
  13. 根据权利要求12所述的装置,其中,所述模型训练单元还包括:
    第四执行模块,被配置成响应于确定存在未被选取的训练样本组,基于所确定的数值,从未被选取的训练样本组中选取最优的训练样本组作为新的候选训练样本组,将最近一次获得的初始视频识别模型作为新的初始模型,继续执行所述训练步骤。
  14. 根据权利要求12所述的装置,其中,所述数值确定模块包括:
    样本获取模块,被配置成获取预先设置的验证样本集合,其中,验证样本包括验证用视频和针对验证用视频预先标注的验证用识别结果;
    数值生成模块,被配置成对于所述预设数量个训练样本组中的训练样本组,执行以下步骤:将该组训练样本中的训练样本的样本视频作为输入,将所输入的样本视频所对应的样本识别结果作为输出,利用机器学习方法训练得到该组训练样本所对应的待验证视频识别模型;将验证样本集合中的验证样本的验证用视频输入该组训练样本所对应的待验证视频识别模型,获得实际识别结果,确定实际识别结果相对于所输入的验证用视频所对应的验证用识别结果的损失值,基于所确定的损失值,生成用于表征该组训练样本组的优劣程度的数值。
  15. 根据权利要求12-14之一所述的装置,其中,所述模型生成单元包括:
    权重分配模块,被配置成基于所确定的数值,为所获得的初始视频识别模型分配权重;
    模型融合模块,被配置成基于所分配的权重,对所获得的初始视频识别模型进行融合,生成视频识别模型。
  16. 根据权利要求10-14之一所述的装置,其中,所述模型生成单元进一步被配置成:
    将最后一次获得的初始视频识别模型确定为视频识别模型。
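  17. A method for recognizing a video, comprising:
    obtaining a video to be recognized, wherein the video to be recognized is a video obtained by shooting an object; and
    inputting the video to be recognized into a video recognition model generated by the method according to any one of claims 1-8 to generate a recognition result corresponding to the video to be recognized, wherein the recognition result is used to indicate whether the video to be recognized is a video obtained by shooting a screen displaying the object.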
  18. 一种用于识别视频的装置,包括:
    视频获取单元,被配置成获取待识别视频,其中,所述待识别视频为对对象进行拍摄所获得的视频;
    结果生成单元,被配置成将所述待识别视频输入采用如权利要求1-8之一所述的方法生成的视频识别模型中,生成所述待识别视频所对应的识别结果,其中,所述识别结果用于指示所述待识别视频是否为对显示所述对象的屏幕进行拍摄所获得的视频。
  19. 一种电子设备,包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-8、17中任一所述的方法。
  20. 一种计算机可读介质,其上存储有计算机程序,其中,所述 计算机程序被处理器执行时实现如权利要求1-8、17中任一所述的方法。
PCT/CN2018/116339 2018-06-15 2018-11-20 用于生成模型的方法和装置 WO2019237657A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810617804.4A CN108805091B (zh) 2018-06-15 2018-06-15 用于生成模型的方法和装置
CN201810617804.4 2018-06-15

Publications (1)

Publication Number Publication Date
WO2019237657A1 true WO2019237657A1 (zh) 2019-12-19

Family

ID=64086183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116339 WO2019237657A1 (zh) 2018-06-15 2018-11-20 用于生成模型的方法和装置

Country Status (2)

Country Link
CN (1) CN108805091B (zh)
WO (1) WO2019237657A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101566A (zh) * 2020-09-11 2020-12-18 石化盈科信息技术有限责任公司 预测模型训练方法、价格预测方法、存储介质及电子设备
CN112101464A (zh) * 2020-09-17 2020-12-18 西安泽塔云科技股份有限公司 基于深度学习的影像样本数据的获取方法和装置
CN112149807A (zh) * 2020-09-28 2020-12-29 北京百度网讯科技有限公司 用户特征信息的处理方法和装置
CN112200218A (zh) * 2020-09-10 2021-01-08 浙江大华技术股份有限公司 一种模型训练方法、装置及电子设备
CN112819078A (zh) * 2021-02-04 2021-05-18 上海明略人工智能(集团)有限公司 一种识别模型的迭代方法和装置
CN112925785A (zh) * 2021-03-29 2021-06-08 中国建设银行股份有限公司 数据清洗方法和装置
CN113138847A (zh) * 2020-01-19 2021-07-20 京东数字科技控股有限公司 基于联邦学习的计算机资源分配调度方法和装置
CN113807122A (zh) * 2020-06-11 2021-12-17 阿里巴巴集团控股有限公司 模型训练方法、对象识别方法及装置、存储介质

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805091B (zh) * 2018-06-15 2021-08-10 北京字节跳动网络技术有限公司 用于生成模型的方法和装置
CN109492128B (zh) * 2018-10-30 2020-01-21 北京字节跳动网络技术有限公司 用于生成模型的方法和装置
CN109740018B (zh) * 2019-01-29 2021-03-02 北京字节跳动网络技术有限公司 用于生成视频标签模型的方法和装置
CN109816023B (zh) * 2019-01-29 2022-01-04 北京字节跳动网络技术有限公司 用于生成图片标签模型的方法和装置
CN110007755A (zh) * 2019-03-15 2019-07-12 百度在线网络技术(北京)有限公司 基于动作识别的物体事件触发方法、装置及其相关设备
CN110009101B (zh) * 2019-04-11 2020-09-25 北京字节跳动网络技术有限公司 用于生成量化神经网络的方法和装置
CN111949860B (zh) * 2019-05-15 2022-02-08 北京字节跳动网络技术有限公司 用于生成相关度确定模型的方法和装置
CN110619537A (zh) * 2019-06-18 2019-12-27 北京无限光场科技有限公司 用于生成信息的方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833569A (zh) * 2010-04-08 2010-09-15 中国科学院自动化研究所 一种对电影人脸图像进行自动标识的方法
CN105354543A (zh) * 2015-10-29 2016-02-24 小米科技有限责任公司 视频处理方法及装置
CN107766940A (zh) * 2017-11-20 2018-03-06 北京百度网讯科技有限公司 用于生成模型的方法和装置
CN108805091A (zh) * 2018-06-15 2018-11-13 北京字节跳动网络技术有限公司 用于生成模型的方法和装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9767540B2 (en) * 2014-05-16 2017-09-19 Adobe Systems Incorporated Patch partitions and image processing
CN104598972A (zh) * 2015-01-22 2015-05-06 清华大学 一种大规模数据回归神经网络快速训练方法
CN105912500B (zh) * 2016-03-30 2017-11-14 百度在线网络技术(北京)有限公司 机器学习模型生成方法和装置
CN107766868A (zh) * 2016-08-15 2018-03-06 中国联合网络通信集团有限公司 一种分类器训练方法及装置
CN107992783A (zh) * 2016-10-26 2018-05-04 上海银晨智能识别科技有限公司 人脸图像处理方法及装置
CN106529008B (zh) * 2016-11-01 2019-11-26 天津工业大学 一种基于蒙特卡罗及lasso的双集成偏最小二乘建模方法
CN106529598B (zh) * 2016-11-11 2020-05-08 北京工业大学 一种基于不均衡医疗图像数据集的分类方法与系统
CN106897746B (zh) * 2017-02-28 2020-03-03 北京京东尚科信息技术有限公司 数据分类模型训练方法和装置
CN107423673A (zh) * 2017-05-11 2017-12-01 上海理湃光晶技术有限公司 一种人脸识别方法及系统
CN107657243B (zh) * 2017-10-11 2019-07-02 电子科技大学 基于遗传算法优化的神经网络雷达一维距离像目标识别方法
CN107967491A (zh) * 2017-12-14 2018-04-27 北京木业邦科技有限公司 木板识别的机器再学习方法、装置、电子设备及存储介质

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113138847A (zh) * 2020-01-19 2021-07-20 京东数字科技控股有限公司 基于联邦学习的计算机资源分配调度方法和装置
CN113807122A (zh) * 2020-06-11 2021-12-17 阿里巴巴集团控股有限公司 模型训练方法、对象识别方法及装置、存储介质
CN112200218A (zh) * 2020-09-10 2021-01-08 浙江大华技术股份有限公司 一种模型训练方法、装置及电子设备
CN112200218B (zh) * 2020-09-10 2023-06-20 浙江大华技术股份有限公司 一种模型训练方法、装置及电子设备
CN112101566A (zh) * 2020-09-11 2020-12-18 石化盈科信息技术有限责任公司 预测模型训练方法、价格预测方法、存储介质及电子设备
CN112101464A (zh) * 2020-09-17 2020-12-18 西安泽塔云科技股份有限公司 基于深度学习的影像样本数据的获取方法和装置
CN112101464B (zh) * 2020-09-17 2024-03-15 西安锐思数智科技股份有限公司 基于深度学习的影像样本数据的获取方法和装置
CN112149807A (zh) * 2020-09-28 2020-12-29 北京百度网讯科技有限公司 用户特征信息的处理方法和装置
CN112819078A (zh) * 2021-02-04 2021-05-18 上海明略人工智能(集团)有限公司 一种识别模型的迭代方法和装置
CN112819078B (zh) * 2021-02-04 2023-12-15 上海明略人工智能(集团)有限公司 一种图片识别模型的迭代方法和装置
CN112925785A (zh) * 2021-03-29 2021-06-08 中国建设银行股份有限公司 数据清洗方法和装置

Also Published As

Publication number Publication date
CN108805091A (zh) 2018-11-13
CN108805091B (zh) 2021-08-10

Similar Documents

Publication Publication Date Title
WO2019237657A1 (zh) 用于生成模型的方法和装置
WO2019242222A1 (zh) 用于生成信息的方法和装置
CN111476871B (zh) 用于生成视频的方法和装置
WO2020000879A1 (zh) 图像识别方法和装置
WO2020000876A1 (zh) 用于生成模型的方法和装置
CN110298906B (zh) 用于生成信息的方法和装置
CN110288682B (zh) 用于控制三维虚拟人像口型变化的方法和装置
CN108416310B (zh) 用于生成信息的方法和装置
CN108900776A (zh) 用于确定响应时间的方法和装置
CN109993150B (zh) 用于识别年龄的方法和装置
CN111523413B (zh) 生成人脸图像的方法和装置
CN109034069B (zh) 用于生成信息的方法和装置
CN109981787B (zh) 用于展示信息的方法和装置
WO2020029608A1 (zh) 用于检测电极片毛刺的方法和装置
CN110084317B (zh) 用于识别图像的方法和装置
CN109862100B (zh) 用于推送信息的方法和装置
CN109214501B (zh) 用于识别信息的方法和装置
CN110211121B (zh) 用于推送模型的方法和装置
CN108510084B (zh) 用于生成信息的方法和装置
US11750898B2 (en) Method for generating target video, apparatus, server, and medium
CN110046571B (zh) 用于识别年龄的方法和装置
CN108921138B (zh) 用于生成信息的方法和装置
CN107402878B (zh) 测试方法和装置
CN109034085B (zh) 用于生成信息的方法和装置
CN111260756B (zh) 用于发送信息的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18922382

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18922382

Country of ref document: EP

Kind code of ref document: A1