WO2020000876A1 - Method and apparatus for generating a model - Google Patents

Method and apparatus for generating a model

Info

Publication number
WO2020000876A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
sequence
training
video
actual
Prior art date
Application number
PCT/CN2018/116175
Other languages
English (en)
French (fr)
Inventor
李伟健
许世坤
王长虎
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020000876A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for generating a model.
  • the captured video is usually processed to generate tags used to characterize the content displayed in the video.
  • the embodiments of the present application provide a method and a device for generating a model, and a method and a device for identifying a video.
  • an embodiment of the present application provides a method for generating a model, including: obtaining a training sample set, where each training sample includes a sample video containing a sample object and a sample label sequence determined in advance for the sample object in the sample video.
  • the sample label is used to characterize the content indicated by the sample object.
  • the sample label sequence includes at least two sample labels.
  • the sample labels in the sample label sequence have a hierarchical relationship.
  • the content corresponding to a low-level sample label belongs to the content corresponding to a high-level sample label.
  • the method further includes selecting a training sample from the training sample set and performing the following training steps: inputting the sample video in the selected training sample into the initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as the actual label sequence; determining, based on the actual label sequence and the sample label sequence in the selected training sample, whether training of the initial model is complete; and, in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model.
  • selecting a candidate tag sequence as the actual tag sequence from the at least two candidate tag sequences includes: for each candidate tag sequence in the at least two candidate tag sequences, determining the probability corresponding to the candidate tag sequence based on the probabilities corresponding to the candidate tags in that sequence; and selecting the candidate tag sequence with the highest probability as the actual tag sequence.
  • determining the probability corresponding to the candidate tag sequence based on the probabilities corresponding to the candidate tags in the candidate tag sequence includes: multiplying the probabilities corresponding to the candidate tags in the candidate tag sequence to obtain a product; and determining the obtained product as the probability corresponding to the candidate tag sequence.
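The product rule described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names are hypothetical, and the probabilities are borrowed from the cat example given later in the description.

```python
from math import prod


def sequence_probability(sequence):
    """Multiply the per-tag probabilities of one candidate tag sequence."""
    return prod(probability for _, probability in sequence)


def select_actual_sequence(candidates):
    """Return the candidate tag sequence with the highest product probability."""
    return max(candidates, key=sequence_probability)


# Each candidate sequence is a list of (tag, probability) pairs,
# ordered from the highest level to the lowest level.
candidates = [
    [("livestock", 0.8), ("cat", 0.5)],
    [("poultry", 0.5), ("chicken", 0.4)],
]
best = select_actual_sequence(candidates)
```

Here the first sequence scores 0.8 × 0.5 and the second 0.5 × 0.4, so the sequence ending in "cat" is selected.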
  • determining whether training of the initial model is complete based on the actual label sequence and the sample label sequence in the selected training sample includes: for each actual label in the actual label sequence, determining a loss value of the actual label relative to the sample label in the sample label sequence that corresponds to the actual label; and determining, based on the determined loss values, whether training of the initial model is complete.
  • determining whether training of the initial model is complete based on the determined loss values includes: for each actual label in the actual label sequence, determining the level corresponding to the actual label, and determining whether the loss value corresponding to the actual label is less than or equal to the loss threshold preset for the determined level; and, in response to determining that the loss values corresponding to the actual labels in the actual label sequence are all less than or equal to the corresponding loss thresholds, determining that training of the initial model is complete.
  • determining whether training of the initial model is complete based on the determined loss values includes: determining the level corresponding to each actual label in the actual label sequence; obtaining the weights preset for labels of different levels, and computing a weighted sum of the determined loss values based on the obtained weights; determining the obtained weighted sum as the total loss value of the actual label sequence relative to the sample label sequence; and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determining that training of the initial model is complete.
  • the method further includes: in response to determining that training of the initial model is not complete, adjusting relevant parameters in the initial model, selecting a training sample from the training samples that have not yet been selected, and continuing to perform the training steps, using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.
  • an embodiment of the present application provides a device for generating a model.
  • the device includes a sample obtaining unit configured to obtain a training sample set, where each training sample includes a sample video containing a sample object and a sample tag sequence determined in advance for the sample object in the sample video.
  • the sample tag is used to characterize the content indicated by the sample object.
  • the sample tag sequence includes at least two sample tags.
  • the sample tags in the sample tag sequence have a hierarchical relationship.
  • the content corresponding to a low-level sample tag belongs to the content corresponding to a high-level sample tag; the first execution unit is configured to select a training sample from the training sample set and execute the following training steps: inputting the sample video in the selected training sample into the initial model to obtain at least two candidate tag sequences corresponding to the sample object in the sample video; selecting a candidate tag sequence from the at least two candidate tag sequences as the actual tag sequence; determining, based on the actual tag sequence and the sample tag sequence in the selected training sample, whether training of the initial model is complete; and, in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model.
  • the first execution unit includes: a probability determination module configured to determine, for each candidate tag sequence in the at least two candidate tag sequences, the probability corresponding to the candidate tag sequence based on the probabilities corresponding to the candidate tags in that sequence; and a sequence selection module configured to select the candidate tag sequence with the highest probability as the actual tag sequence.
  • the probability determination module is further configured to: multiply the probabilities corresponding to the candidate tags in the candidate tag sequence to obtain a product; and determine the obtained product as the probability corresponding to the candidate tag sequence.
  • the first execution unit includes: a loss determination module configured to determine, for an actual label in the actual label sequence, a loss value of the actual label relative to a sample label in a sample label sequence corresponding to the actual label.
  • a model determination module configured to determine whether the initial model is trained based on the determined loss value.
  • the model determination module is further configured to: for actual labels in the actual label sequence, determine a level corresponding to the actual label, and determine whether a loss value corresponding to the actual label is less than or equal to the determined level A preset loss threshold; in response to determining that the loss values corresponding to the actual labels in the actual label sequence are less than or equal to the corresponding loss threshold, it is determined that the initial model training is completed.
  • the model determination module is further configured to: determine the level corresponding to each actual label in the actual label sequence; obtain the weights set in advance for labels of different levels, and compute a weighted sum of the determined loss values based on the obtained weights; determine the obtained weighted sum as the total loss value of the actual label sequence relative to the sample label sequence; and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determine that training of the initial model is complete.
  • the apparatus further comprises: a second execution unit configured to, in response to determining that training of the initial model is not complete, adjust relevant parameters in the initial model, select a training sample from the training samples that have not yet been selected, and continue to perform the training steps, using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.
  • an embodiment of the present application provides a method for identifying a video.
  • the method includes: obtaining a to-be-recognized video including an object; and inputting the to-be-recognized video into a video recognition model generated by the method described in any one of the embodiments of the first aspect, to generate a tag sequence corresponding to the object in the to-be-recognized video, where the tag is used to represent the content indicated by the object, the tag sequence includes at least two tags, the tags in the tag sequence have a hierarchical relationship, and the content corresponding to a low-level tag belongs to the content corresponding to a high-level tag.
  • an embodiment of the present application provides a device for identifying a video.
  • the device includes: a video acquiring unit configured to acquire a video to be identified that includes an object; and a sequence generating unit configured to input the video to be identified into a video recognition model generated by the method described in any one of the embodiments of the first aspect, to generate a tag sequence corresponding to the object in the video to be identified, where the tag is used to characterize the content indicated by the object, and the tag sequence includes at least two tags.
  • the tags in the tag sequence have a hierarchical relationship, and the content corresponding to the tag with the lower rank belongs to the content corresponding to the tag with the higher rank.
  • an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any one of the embodiments of the first aspect and the third aspect.
  • an embodiment of the present application provides a computer-readable medium on which a computer program is stored.
  • when the program is executed by a processor, the method described in any one of the embodiments of the first aspect and the third aspect is implemented.
  • the method and device for generating a model provided in the embodiments of the present application acquire a training sample set, where each training sample includes a sample video containing a sample object and a sample tag sequence determined in advance for the sample object in the sample video.
  • the sample tag sequence includes at least two sample tags.
  • the sample tags in the sample tag sequence have a hierarchical relationship. The content corresponding to a lower-level sample tag belongs to the content corresponding to a higher-level sample tag.
  • a training sample is then selected from the training sample set, and the following training steps are performed: the sample video in the selected training sample is input into the initial model to obtain at least two candidate tag sequences corresponding to the sample object in the sample video; a candidate tag sequence is selected from the at least two candidate tag sequences as the actual tag sequence; whether training of the initial model is complete is determined based on the actual tag sequence and the sample tag sequence in the selected training sample; and, in response to determining that training of the initial model is complete, the trained initial model is used as a video recognition model. In this way, a model that can be used to identify videos is obtained, which helps enrich the ways in which models can be generated.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for generating a model according to the present application
  • FIG. 3 is a schematic diagram of an application scenario of a method for generating a model according to the present application
  • FIG. 4 is a flowchart of still another embodiment of a method for generating a model according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of a device for generating a model according to the present application.
  • FIG. 6 is a flowchart of an embodiment of a method for identifying a video according to the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of an apparatus for identifying a video according to the present application.
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 of a method for generating a model, a device for generating a model, a method for identifying a video, or a device for identifying a video to which embodiments of the present application can be applied.
  • the system architecture 100 may include terminals 101 and 102, a network 103, a database server 104, and a server 105.
  • the network 103 is used to provide a medium for a communication link between the terminals 101, 102, the database server 104, and the server 105.
  • the network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user 110 can use the terminals 101 and 102 to interact with the server 105 through the network 103 to receive or send messages and the like.
  • Terminals 101 and 102 can be installed with various client applications, such as model training applications, video recognition applications, social applications, payment applications, web browsers, and instant messaging tools.
  • the terminals 101 and 102 here may be hardware or software.
  • when the terminals 101 and 102 are hardware, they can be various electronic devices with display screens, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, laptop computers, desktop computers, and so on.
  • when the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
  • a video capture device may also be installed on the terminals 101 and 102.
  • the video capture device can be a variety of devices that can capture video, such as cameras, sensors, and so on.
  • the user 110 may use a video capture device on the terminals 101 and 102 to capture video.
  • the database server 104 may be a database server that provides various services.
  • for example, the database server may store a sample set.
  • the sample set contains a large number of samples.
  • the sample may include a sample video including a sample object and a sample label sequence determined in advance for the sample object in the sample video. In this way, the user 110 can also select samples from the sample set stored in the database server 104 through the terminals 101 and 102.
  • the server 105 may also be a server that provides various services, such as a background server that provides support for various applications displayed on the terminals 101 and 102.
  • the background server can use the samples in the sample set sent by the terminals 101 and 102 to train the initial model, and can send the training results (such as the generated video recognition model) to the terminals 101 and 102. In this way, the user can apply the generated video recognition model for video recognition.
  • the database server 104 and the server 105 here may also be hardware or software. When they are hardware, they can be implemented as a distributed server cluster consisting of multiple servers or as a single server. When they are software, they can be implemented as multiple software or software modules (for example, to provide distributed services), or they can be implemented as a single software or software module. It is not specifically limited here.
  • the method for generating a model or the method for identifying a video provided by the embodiment of the present application is generally executed by the server 105. Accordingly, a device for generating a model or a device for identifying a video is also generally provided in the server 105.
  • the database server 104 may not be set in the system architecture 100.
  • the numbers of terminals, networks, database servers, and servers in FIG. 1 are merely exemplary. There may be any number of terminals, networks, database servers, and servers, as required by the implementation.
  • a flowchart 200 of one embodiment of a method for generating a model according to the present application is shown.
  • the method for generating a model includes the following steps:
  • Step 201 Obtain a training sample set.
  • an execution subject of the method for generating a model (for example, the server shown in FIG. 1) may obtain the training sample set, through a wired or wireless connection, from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1).
  • the training sample may include a sample video including a sample object and a sample label sequence determined in advance for the sample object in the sample video.
  • the sample object may be an image corresponding to the content that was shot when the sample video was obtained, and it may be an image of various things (that is, the shooting content may be various things), such as a person image, an animal image, a behavior image, and so on.
  • Sample tags can be used to characterize what the sample object indicates. Sample tags can include but are not limited to at least one of the following: text, numbers, symbols, pictures.
  • the sample tag sequence may include at least two sample tags. The sample tags in the sample tag sequence have a hierarchical relationship, and the content corresponding to the sample tag with a lower rank belongs to the content corresponding to the sample tag with a higher rank.
  • For example, the sample video is a video obtained by shooting a cat, that is, the sample object in the sample video is a cat image.
  • the sample tag sequence corresponding to the cat image in the sample video is "animal; pet; cat", that is, it includes three sample tags: "animal", "pet", and "cat". Cats are pets, and pets are animals. Therefore, the sample tag "animal" has the highest level, the sample tag "pet" has the middle level, and the sample tag "cat" has the lowest level.
  • a technician can determine a sample tag sequence corresponding to a sample object in a sample video in advance.
  • the sample tag sequence corresponding to the sample object in the sample video can be labeled manually. Alternatively, only the lowest-level sample tag corresponding to the sample object can be labeled manually, and the sample tag sequence can then be determined from that lowest-level sample tag based on a hierarchical relationship between tags established in advance (for example, a level correspondence table).
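Deriving a full sample tag sequence from a manually labeled lowest-level tag can be sketched as a walk up a parent table. The table contents and the name `expand_to_sequence` are assumptions made for illustration; the application does not specify how the level correspondence table is stored.

```python
# Hypothetical level correspondence table: each tag maps to its parent tag.
# Tags without an entry are treated as top-level tags.
PARENT = {"cat": "pet", "dog": "pet", "pet": "animal"}


def expand_to_sequence(lowest_tag, parent=PARENT):
    """Build the tag sequence, highest level first, from the lowest-level tag."""
    chain = [lowest_tag]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return list(reversed(chain))


sequence = expand_to_sequence("cat")
```

With this table, labeling a video only as "cat" yields the three-level sequence used in the example above.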
  • Step 202 Select training samples from the training sample set.
  • the above-mentioned executing subject may select training samples from the training sample set obtained in step 201, and perform the training steps of steps 203 to 206.
  • the method for selecting training samples is not limited in this application. For example, it may be randomly selected, or training samples with better sharpness of the sample video in the training samples may be preferentially selected.
  • Step 203 Input the sample video in the selected training sample into the initial model, and obtain at least two candidate tag sequences corresponding to the sample object in the sample video.
  • the execution body may input the sample video in the selected training sample into the initial model (such as a convolutional neural network (CNN), ResNet, etc.) to obtain at least two candidate tag sequences corresponding to the sample object in the sample video.
  • the candidate label is an intermediate result obtained by inputting the sample video into the initial model.
  • the execution body inputs the sample video in the selected training sample into the initial model and can obtain multiple candidate tags corresponding to the sample object in the sample video. At least two candidate tag sequences can then be obtained according to the hierarchical relationship between the tags.
  • For example, the execution body inputs a sample video including a cat image (the sample object) into the initial model and obtains multiple candidate tags, such as "livestock (80%); cat (50%); poultry (50%); chicken (40%)". Since cats belong to livestock and chickens belong to poultry, two candidate tag sequences can be obtained: "livestock (80%); cat (50%)" and "poultry (50%); chicken (40%)".
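One way to group flat candidate tags into candidate tag sequences is sketched below. It assumes the hierarchy is available as a child-to-parent mapping and that the model's output is a dictionary of tag probabilities; both representations, and the function name, are hypothetical.

```python
def build_candidate_sequences(tag_probs, parent):
    """Group candidate tags into sequences ordered from highest to lowest level."""
    # A tag is a leaf if no other candidate tag names it as its parent.
    used_as_parent = {parent[t] for t in tag_probs if parent.get(t) in tag_probs}
    leaves = [t for t in tag_probs if t not in used_as_parent]

    sequences = []
    for leaf in leaves:
        chain = [leaf]
        # Walk upward while the parent is itself a candidate tag.
        while parent.get(chain[-1]) in tag_probs:
            chain.append(parent[chain[-1]])
        sequences.append([(t, tag_probs[t]) for t in reversed(chain)])
    return sequences


tag_probs = {"livestock": 0.8, "cat": 0.5, "poultry": 0.5, "chicken": 0.4}
parent = {"cat": "livestock", "chicken": "poultry"}
sequences = build_candidate_sequences(tag_probs, parent)
```

For the example values this produces the two sequences named in the text, one ending in "cat" and one ending in "chicken".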
  • Step 204 Select a candidate tag sequence as an actual tag sequence from at least two candidate tag sequences.
  • the execution body may select a candidate tag sequence as an actual tag sequence from at least two candidate tag sequences obtained in step 203.
  • the actual label sequence is the final result used as the output of the initial model.
  • the execution body may select a candidate tag sequence as the actual tag sequence from the at least two candidate tag sequences in various ways. For example, the sequence may be selected randomly; alternatively, the execution body may select based on the probability of the lowest-level candidate tag in each candidate tag sequence (that is, select the candidate tag sequence whose lowest-level candidate tag has the highest probability as the actual tag sequence).
  • Continuing the example above, the probability corresponding to "cat" (50%) is greater than the probability corresponding to "chicken" (40%), so the execution body may use the candidate tag sequence "livestock (80%); cat (50%)", which contains the candidate tag "cat", as the actual tag sequence.
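The lowest-level selection rule can be sketched in a few lines. It assumes each candidate tag sequence is a list of (tag, probability) pairs ordered from the highest level to the lowest, which is an illustrative representation rather than one given in the application.

```python
def select_by_lowest_level(candidates):
    """Pick the sequence whose last (lowest-level) tag has the highest probability."""
    return max(candidates, key=lambda sequence: sequence[-1][1])


candidates = [
    [("livestock", 0.8), ("cat", 0.5)],
    [("poultry", 0.5), ("chicken", 0.4)],
]
actual_sequence = select_by_lowest_level(candidates)
```

Only the last element of each sequence is compared, so the probabilities of the higher-level tags do not affect the choice under this rule.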
  • Step 205 Determine whether the initial model is trained based on the actual label sequence and the sample label sequence in the selected training sample.
  • the execution subject may determine whether the initial model is trained.
  • for each actual label in the actual label sequence, the execution body may determine whether the sample label at the same level in the sample label sequence is the same as the actual label. If every actual label in the actual label sequence is the same as the corresponding sample label in the sample label sequence, it can be determined that training of the initial model is complete.
  • the execution body may also determine whether training of the initial model is complete through the following steps: first, for each actual label in the actual label sequence, the execution body may determine the loss value of the actual label relative to the sample label in the sample label sequence that corresponds to the actual label. Then, the execution body may determine whether training of the initial model is complete based on the determined loss values.
  • the loss value can be used to characterize the difference between the actual output and the expected output.
  • various preset loss functions can be used to calculate the loss value of the actual label relative to the sample label corresponding to the actual label. For example, the L2 norm can be used as the loss function to calculate the loss value.
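A minimal sketch of an L2-norm loss follows. The application does not say how labels are encoded, so the sketch assumes the actual label is a vector of predicted class scores and the sample label is a one-hot vector over the same classes; the class order and values are hypothetical.

```python
import math


def l2_loss(predicted, target):
    """Euclidean (L2) distance between a predicted vector and a target vector."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))


# Hypothetical class order: ["cat", "dog", "chicken"].
predicted_scores = [0.5, 0.4, 0.1]  # assumed model output for the lowest level
sample_one_hot = [1.0, 0.0, 0.0]    # sample label "cat"
loss = l2_loss(predicted_scores, sample_one_hot)
```

The loss is zero only when the predicted vector equals the one-hot target and grows as the prediction moves away from it.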
  • the execution body may determine whether training of the initial model is complete through the following steps: first, for each actual label in the actual label sequence, determine the level corresponding to the actual label, and determine whether the loss value corresponding to the actual label is less than or equal to the loss threshold set in advance for that level. Then, in response to determining that the loss values corresponding to the actual labels in the actual label sequence are all less than or equal to the corresponding loss thresholds, determine that training of the initial model is complete.
  • For example, the actual tag sequence is "animal; cat", and the sample tag sequence is "animal; dog". The level corresponding to the actual tag "animal" is high, and the level corresponding to the actual tag "cat" is low.
  • For high-level tags, the preset loss threshold can be 5; for low-level tags, the preset loss threshold can be 1. Therefore, the execution body can determine whether the loss value corresponding to the actual tag "animal" is less than or equal to the loss threshold 5, and whether the loss value corresponding to the actual tag "cat" is less than or equal to the loss threshold 1.
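The per-level threshold check can be sketched as below. The thresholds 5 and 1 come from the example; the level names, the function name, and the loss values passed in are hypothetical.

```python
# Loss thresholds preset per level, taken from the example above.
LOSS_THRESHOLDS = {"high": 5.0, "low": 1.0}


def training_complete(losses_by_level, thresholds=LOSS_THRESHOLDS):
    """Return True only if every label's loss is within its level's threshold."""
    return all(loss <= thresholds[level] for level, loss in losses_by_level)


# Hypothetical loss values: "animal" matches the sample label, "cat" does not.
done_when_close = training_complete([("high", 0.0), ("low", 0.8)])
done_when_far = training_complete([("high", 0.0), ("low", 6.0)])
```

A looser threshold at the high level tolerates more error on coarse tags, while the tighter low-level threshold demands precision on the most specific tag.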
  • the execution body may further determine whether the initial model has been trained through the following steps: First, the level corresponding to the actual label in the actual label sequence may be determined. Then, the weights set in advance for the tags of different levels may be obtained, and based on the obtained weights, the determined loss value may be weighted and summed to obtain a weighted summation value. Finally, the obtained weighted sum can be determined as the total loss value of the actual label sequence relative to the sample label sequence, and in response to determining that the total loss value is less than or equal to a preset total loss threshold, it is determined that the initial model training is completed.
  • For example, the level corresponding to the actual tag "animal" is high, and the level corresponding to the actual tag "cat" is low.
  • For high-level tags, the preset weight can be 0.4; for low-level tags, the preset weight can be 0.6.
  • The total loss threshold set in advance by a technician may be 5, and the execution body may determine that training of the initial model is complete in response to determining that the total loss value 3.6 is less than the total loss threshold 5.
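The weighted-sum check can be sketched as follows. The weights 0.4 and 0.6 and the threshold 5 come from the example; the text does not state the per-label loss values, so the values 0 and 6 below are hypothetical, chosen only because they give a weighted sum of about 3.6.

```python
LEVEL_WEIGHTS = {"high": 0.4, "low": 0.6}  # preset weights from the example
TOTAL_LOSS_THRESHOLD = 5.0                 # preset total loss threshold


def total_loss(losses_by_level, weights=LEVEL_WEIGHTS):
    """Weighted sum of per-label loss values, weighted by each label's level."""
    return sum(weights[level] * loss for level, loss in losses_by_level)


# Hypothetical per-label losses for "animal" (high) and "cat" (low).
losses = [("high", 0.0), ("low", 6.0)]
total = total_loss(losses)
is_complete = total <= TOTAL_LOSS_THRESHOLD
```

Giving the low level the larger weight makes errors on the most specific tag count more toward the total than errors on the coarse tag.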
  • Step 206 In response to determining that the initial model training is completed, use the trained initial model as a video recognition model.
  • the above-mentioned execution subject may, in response to determining that the initial model training is completed, use the trained initial model as a video recognition model.
  • in response to determining that training of the initial model is not complete, the execution body may also adjust relevant parameters in the initial model (for example, when the initial model is a convolutional neural network, use back-propagation to modify the weights in each convolutional layer of the initial model), select a training sample from the training samples that have not yet been selected, and continue to perform training steps 203-206, using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.
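The overall control flow of steps 202 to 206, including the adjust-and-retry branch, can be sketched as a loop. All callables here are hypothetical placeholders, and the toy usage replaces the video model with a single numeric weight purely so that the loop can run; it is not a video recognition model.

```python
def train(model, samples, forward, loss_fn, is_done, adjust):
    """Run the training steps over samples, each used at most once.

    Returns the trained model, or None if the samples run out first.
    """
    for video, sample_sequence in samples:
        actual_sequence = forward(model, video)           # steps 203-204
        loss = loss_fn(actual_sequence, sample_sequence)  # step 205
        if is_done(loss):
            return model                                  # step 206
        # Training not complete: adjust parameters and move to the next sample.
        model = adjust(model, video, actual_sequence, sample_sequence)
    return None


# Toy stand-in: the "model" is one weight w, a "video" is a number x,
# and the expected output is the number 2 * x.
samples = [(1.0, 2.0)] * 50
trained = train(
    model=0.0,
    samples=samples,
    forward=lambda w, x: w * x,
    loss_fn=lambda pred, target: abs(pred - target),
    is_done=lambda loss: loss <= 0.05,
    adjust=lambda w, x, pred, target: w - 0.1 * (pred - target) * x,
)
```

In a real setting, `forward` would run the network and select the actual tag sequence, `is_done` would apply one of the completion checks described above, and `adjust` would perform a back-propagation update.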
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to this embodiment.
  • a terminal 301 used by a user may be installed with a model training application.
  • the server 302 that provides background support for the application can run a method for generating a model, including:
  • a training sample set 303 can be obtained.
  • the training sample may include a sample video including a sample object and a sample label sequence determined in advance for the sample object in the sample video.
  • the sample label can be used to characterize the content indicated by the sample object.
  • the sample label sequence can include at least two sample labels.
  • the sample labels in the sample label sequence have a hierarchical relationship.
  • the content corresponding to a low-level sample label belongs to the content corresponding to a high-level sample label.
  • a training sample 3031 can be selected from the training sample set 303.
  • the following training steps may be performed: the sample video 30311 in the selected training sample 3031 is input into the initial model 304 to obtain the candidate tag sequences 3051 and 3052 corresponding to the sample object in the sample video 30311; a candidate tag sequence is selected from the candidate tag sequences 3051 and 3052 as the actual tag sequence 306; whether training of the initial model 304 is complete is determined based on the actual tag sequence 306 and the sample tag sequence 30312 in the selected training sample 3031; and, in response to determining that training is complete, the trained initial model 304 is used as the video recognition model 307.
  • the server 302 may also send prompt information to the terminal 301 to indicate the completion of model training.
  • The prompt information may be voice and/or text information. In this way, the user can obtain the video recognition model in a preset storage location.
  • The method provided by the foregoing embodiment of the present application obtains a training sample set, where each training sample includes a sample video containing a sample object and a sample label sequence determined in advance for the sample object in the sample video, and the sample labels are used to characterize the content indicated by the sample object.
  • The sample tag sequence includes at least two sample tags.
  • The sample tags in the sample tag sequence have a hierarchical relationship: the content corresponding to a lower-level sample tag belongs to the content corresponding to a higher-level sample tag.
  • Training samples are then selected from the training sample set and the following training steps are performed: inputting the sample video in the selected training sample into the initial model to obtain at least two candidate tag sequences corresponding to the sample object in the sample video; selecting a candidate tag sequence from the at least two candidate tag sequences as the actual tag sequence; determining, based on the actual tag sequence and the sample tag sequence in the selected training sample, whether the initial model is trained; and, in response to determining that the initial model training is completed, using the trained initial model as a video recognition model. In this way, a model that can be used to identify videos is obtained, and the ways of generating models are enriched.
  • FIG. 4 shows a process 400 of yet another embodiment of the method for generating a model.
  • the process 400 of the method for generating a model includes the following steps:
  • Step 401 Obtain a training sample set.
  • An execution subject (for example, the server shown in FIG. 1) of the method for generating a model may obtain the training sample set, through a wired or wireless connection, from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1).
  • the training sample may include a sample video including a sample object and a sample label sequence determined in advance for the sample object in the sample video.
  • Step 402 Select training samples from the training sample set.
  • the above-mentioned executing subject may select training samples from the training sample set obtained in step 401, and perform the training steps of steps 403 to 406.
  • the method for selecting training samples is not limited in this application. For example, it may be randomly selected, or training samples with better sharpness of the sample video in the training samples may be preferentially selected.
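The two selection strategies mentioned above (random choice, or preferring sharper sample videos) can be sketched as follows. This is an illustrative sketch only: the `TrainingSample` fields, including the precomputed `sharpness` score, are assumptions and do not come from the application.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    video_id: str              # hypothetical identifier of the sample video
    label_sequence: List[str]  # sample label sequence, highest level first
    sharpness: float           # hypothetical precomputed sharpness score


def select_random(pool: List[TrainingSample], rng: random.Random) -> TrainingSample:
    """Remove and return a uniformly random sample from the unselected pool."""
    return pool.pop(rng.randrange(len(pool)))


def select_sharpest(pool: List[TrainingSample]) -> TrainingSample:
    """Remove and return the sample whose video has the highest sharpness."""
    best_index = max(range(len(pool)), key=lambda i: pool[i].sharpness)
    return pool.pop(best_index)
```

Popping the chosen sample from the pool keeps the remaining entries equal to the set of unselected training samples.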
  • Step 403 Input the sample video in the selected training sample into the initial model, and obtain at least two candidate tag sequences corresponding to the sample object in the sample video.
  • The execution body may input the sample video in the selected training sample into the initial model (such as a convolutional neural network (CNN), a residual network (ResNet), etc.) to obtain at least two candidate tag sequences corresponding to the sample object in the sample video.
  • Step 404 For each candidate tag sequence among the at least two candidate tag sequences, determine the probability corresponding to the candidate tag sequence based on the probabilities corresponding to the candidate tags in that sequence.
  • the execution body may determine the probability corresponding to the candidate tag sequence based on the probability corresponding to the candidate tag in the candidate tag sequence.
  • The above-mentioned execution body may average the probabilities corresponding to the candidate tags in the candidate tag sequence, and use the result as the probability corresponding to the candidate tag sequence.
  • The above-mentioned execution body may also determine the probability corresponding to the candidate tag sequence by the following steps: first, multiply the probabilities corresponding to the candidate tags in the candidate tag sequence to obtain a product; then, determine the obtained product as the probability corresponding to the candidate tag sequence.
  • Step 405 Select the candidate tag sequence with the highest probability as the actual tag sequence.
  • the execution body may select the candidate tag sequence with the highest probability as the actual tag sequence.
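Steps 404 and 405 can be sketched as follows: score each candidate tag sequence by either the mean or the product of its per-tag probabilities, then keep the highest-scoring one. The function names and the example tags and probabilities are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[List[str], List[float]]  # (tags, per-tag probabilities)


def mean_probability(probs: Sequence[float]) -> float:
    """Sequence probability as the average of the per-tag probabilities."""
    return sum(probs) / len(probs)


def product_probability(probs: Sequence[float]) -> float:
    """Sequence probability as the product of the per-tag probabilities."""
    return math.prod(probs)


def select_actual_sequence(
    candidates: Sequence[Candidate],
    score: Callable[[Sequence[float]], float] = product_probability,
) -> List[str]:
    """Return the tags of the candidate sequence with the highest score."""
    best_tags, _ = max(candidates, key=lambda c: score(c[1]))
    return best_tags


candidates = [
    (["animal", "cat"], [0.5, 0.5]),   # mean 0.50, product 0.25
    (["vehicle", "car"], [0.9, 0.2]),  # mean 0.55, product about 0.18
]
by_product = select_actual_sequence(candidates)
by_mean = select_actual_sequence(candidates, score=mean_probability)
```

The two scoring rules can disagree, as in this example: the product penalizes a sequence that contains one low-probability tag more strongly than the mean does.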
  • Step 406 Determine whether the initial model training is completed based on the actual label sequence and the sample label sequence in the selected training sample.
  • the execution subject may determine whether the initial model has been trained.
  • Step 407 In response to determining that the initial model training is completed, use the completed initial model as a video recognition model.
  • the above-mentioned execution subject may, in response to determining that the initial model training is completed, use the trained initial model as a video recognition model.
  • Steps 401, 402, 403, 406, and 407 can be implemented in a manner similar to steps 201, 202, 203, 205, and 206 in the foregoing embodiment.
  • The above description of steps 201, 202, 203, 205, and 206 also applies to steps 401, 402, 403, 406, and 407 of this embodiment, and details are not repeated here.
  • The process 400 of the method for generating a model in this embodiment highlights the steps of determining the probability corresponding to each candidate tag sequence and selecting the candidate tag sequence with the highest probability as the actual tag sequence.
  • this application provides an embodiment of a device for generating a model.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • The apparatus 500 for generating a model in this embodiment includes a sample acquisition unit 501 and a first execution unit 502.
  • the sample obtaining unit 501 is configured to obtain a training sample set, where the training samples include a sample video including a sample object and a predetermined sample label sequence for the sample object in the sample video.
  • The sample label is used to characterize the content indicated by the sample object.
  • the sample tag sequence includes at least two sample tags, the sample tags in the sample tag sequence have a hierarchical relationship, and the content corresponding to the sample tag with a lower rank belongs to the content corresponding to the sample tag with a higher rank;
  • The first execution unit 502 is configured to select training samples from the training sample set and perform the following training steps: input the sample video in the selected training sample into the initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; select a candidate tag sequence from the at least two candidate tag sequences as the actual tag sequence; determine whether the initial model is trained based on the actual tag sequence and the sample tag sequence in the selected training sample; and, in response to determining that the initial model training is completed, use the trained initial model as a video recognition model.
  • The sample acquisition unit 501 of the apparatus 500 for generating a model may acquire a training sample set, through a wired or wireless connection, from a database server (such as the database server 104 shown in FIG. 1) or a terminal (such as the terminals 101 and 102 shown in FIG. 1).
  • the training sample may include a sample video including a sample object and a sample label sequence determined in advance for the sample object in the sample video.
  • The sample object may be an image corresponding to the shooting content when the sample video is obtained, and the sample object may be an image of various things (that is, the shooting content may be various things), such as a person image, an animal image, a behavior image, and so on.
  • Sample tags can be used to characterize what the sample object indicates. Sample tags can include but are not limited to at least one of the following: text, numbers, symbols, pictures.
  • the sample tag sequence may include at least two sample tags. The sample tags in the sample tag sequence have a hierarchical relationship, and the content corresponding to the sample tag with a lower rank belongs to the content corresponding to the sample tag with a higher rank.
  • a technician can determine a sample tag sequence corresponding to a sample object in a sample video in advance.
  • The sample tag sequence corresponding to the sample object in the sample video can be manually labeled; alternatively, the lowest-level sample tag corresponding to the sample object in the sample video can be manually labeled, and the sample tag sequence corresponding to the sample object can then be determined from that lowest-level sample tag based on a hierarchical relationship between tags established in advance.
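Deriving a full sample tag sequence from a manually labeled lowest-level tag can be sketched as a walk up a parent map. The `PARENT` table and its tags are hypothetical examples; the application does not specify how the pre-established hierarchy is stored.

```python
from typing import Dict, List, Optional

# Hypothetical hierarchy: each tag maps to its higher-level (parent) tag.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "cat": "animal",
    "Persian cat": "cat",
}


def expand_tag_sequence(lowest: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the tag sequence from the highest level down to `lowest`."""
    sequence: List[str] = []
    current: Optional[str] = lowest
    while current is not None:
        if current in sequence:           # guard against a cyclic hierarchy
            raise ValueError(f"cycle detected at tag {current!r}")
        sequence.append(current)
        current = parent[current]         # KeyError if the tag is unknown
    sequence.reverse()                    # highest level first
    return sequence


full_sequence = expand_tag_sequence("Persian cat", PARENT)
```

Because a sample tag sequence must contain at least two tags, a top-level tag such as "animal" alone would not be a valid lowest-level annotation under this scheme.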
  • the first execution unit 502 may select training samples from the training sample set obtained by the sample acquisition unit 501, and execute the training steps of steps 5021 to 5024.
  • the method for selecting training samples is not limited in this application.
  • Step 5021 The sample video in the selected training sample is input into the initial model, and at least two candidate tag sequences corresponding to the sample object in the sample video are obtained.
  • The first execution unit 502 may input the sample video in the selected training sample into an initial model (such as a convolutional neural network (CNN), a residual network (ResNet), etc.) to obtain at least two candidate label sequences corresponding to the sample object in the sample video.
  • Step 5022 A candidate tag sequence is selected as the actual tag sequence from the at least two candidate tag sequences.
  • the first execution unit 502 may select a candidate tag sequence as an actual tag sequence from at least two candidate tag sequences obtained in step 5021.
  • the first execution unit 502 may select the candidate tag sequence as the actual tag sequence from at least two candidate tag sequences in various ways.
  • Step 5023 Determine whether the initial model training is completed based on the actual label sequence and the sample label sequence in the selected training sample.
  • the first execution unit 502 may determine whether the initial model has been trained.
  • Step 5024 In response to determining that the initial model training is completed, use the trained initial model as a video recognition model.
  • The first execution unit 502 may, in response to determining that the initial model training is completed, use the trained initial model as a video recognition model.
  • The first execution unit 502 may include: a probability determination module (not shown in the figure) configured to determine, for each candidate tag sequence among the at least two candidate tag sequences, the probability corresponding to the candidate tag sequence based on the probabilities corresponding to the candidate tags in that sequence; and a sequence selection module configured to select the candidate tag sequence with the highest probability as the actual tag sequence.
  • The probability determination module may be further configured to: multiply the probabilities corresponding to the candidate labels in the candidate label sequence to obtain a product; and determine the obtained product as the probability corresponding to the candidate tag sequence.
  • The first execution unit 502 may include: a loss determination module (not shown in the figure) configured to determine, for each actual label in the actual label sequence, the loss value of that actual label relative to the corresponding sample label in the sample label sequence; and a model determination module configured to determine whether the initial model training is completed based on the determined loss values.
  • The model determination module may be further configured to: for each actual label in the actual label sequence, determine the level corresponding to the actual label, and determine whether the loss value corresponding to the actual label is less than or equal to the loss threshold preset for the determined level; and, in response to determining that the loss values corresponding to the actual labels in the actual label sequence are all less than or equal to the corresponding loss thresholds, determine that the initial model training is completed.
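The per-level threshold check can be sketched as follows. This is a minimal sketch that assumes the per-label loss values have already been computed and that levels are identified by integers; both conventions, and the function name, are illustrative.

```python
from typing import Dict


def training_complete_per_level(
    loss_by_level: Dict[int, float],
    threshold_by_level: Dict[int, float],
) -> bool:
    """True only if every label's loss is within the threshold for its level."""
    return all(
        loss <= threshold_by_level[level]
        for level, loss in loss_by_level.items()
    )


# Hypothetical values: level 1 is the highest level, level 2 the one below it.
thresholds = {1: 0.1, 2: 0.3}
finished = training_complete_per_level({1: 0.05, 2: 0.2}, thresholds)
not_finished = training_complete_per_level({1: 0.05, 2: 0.31}, thresholds)
```

Separate thresholds let a coarse level be held to a stricter standard than a fine-grained one, or the reverse, depending on how the thresholds are preset.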
  • The model determination module may be further configured to: determine the level corresponding to each actual label in the actual label sequence; obtain weights preset for labels of different levels, and perform a weighted summation of the determined loss values based on the obtained weights to obtain a weighted sum; determine the obtained weighted sum as the total loss value of the actual label sequence relative to the sample label sequence; and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determine that the initial model training is completed.
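The weighted-sum criterion can be sketched as follows. The integer level keys, the weights, and the threshold values are illustrative assumptions; the per-label loss values are taken as given.

```python
from typing import Dict


def total_loss(loss_by_level: Dict[int, float], weight_by_level: Dict[int, float]) -> float:
    """Weighted sum of the per-label losses, using the weight preset for each level."""
    return sum(weight_by_level[level] * loss for level, loss in loss_by_level.items())


def training_complete_weighted(
    loss_by_level: Dict[int, float],
    weight_by_level: Dict[int, float],
    total_threshold: float,
) -> bool:
    """True if the total loss does not exceed the preset total loss threshold."""
    return total_loss(loss_by_level, weight_by_level) <= total_threshold


losses = {1: 0.5, 2: 0.25}   # hypothetical per-level loss values
weights = {1: 2.0, 2: 4.0}   # hypothetical preset weights per level
combined = total_loss(losses, weights)  # 2.0 * 0.5 + 4.0 * 0.25
```

Unlike a check with one threshold per level, a single total threshold allows a small loss at one level to offset a larger loss at another.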
  • The apparatus 500 may further include a second execution unit (not shown in the figure) configured to, in response to determining that the initial model is not yet trained, adjust relevant parameters in the initial model, select a training sample from the training samples that have not yet been selected, and continue to execute training steps 5021-5024, using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.
  • The apparatus 500 obtains a training sample set through the sample obtaining unit 501, where each training sample includes a sample video containing a sample object and a sample tag sequence determined in advance for the sample object in the sample video.
  • The sample tags are used to characterize the content indicated by the sample object.
  • The sample tag sequence includes at least two sample tags.
  • The sample tags in the sample tag sequence have a hierarchical relationship: the content corresponding to a lower-level sample tag belongs to the content corresponding to a higher-level sample tag.
  • The first execution unit 502 then selects training samples from the training sample set and performs the following training steps: inputting the sample video in the selected training sample into the initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate tag sequence from the at least two candidate tag sequences as the actual tag sequence; determining whether the initial model has been trained based on the actual tag sequence and the sample tag sequence in the selected training sample; and, in response to determining that the initial model training is completed, using the trained initial model as a video recognition model. In this way, a model that can be used for video recognition is obtained, and the ways of generating models are enriched.
  • FIG. 6 illustrates a process 600 of an embodiment of a method for identifying a video provided by the present application.
  • the method for identifying a video may include the following steps:
  • Step 601 Obtain a video to be identified including an object.
  • an execution subject of the method for identifying a video may obtain a video to be identified including an object through a wired connection method or a wireless connection method.
  • the above-mentioned execution body may obtain a video stored therein from a database server (for example, the database server 104 shown in FIG. 1), or may receive a video collected by a terminal (for example, the terminals 101 and 102 shown in FIG. 1) or other devices .
  • The video to be identified is a video on which recognition is to be performed.
  • the object may be an image corresponding to the captured content when the video to be identified is captured, and the object may be an image of various things (that is, the captured content may be various things), such as a person image, an animal image, a behavior image, and the like.
  • Step 602 Input a video to be identified into a video recognition model, and generate a tag sequence corresponding to an object in the video to be identified.
  • the above-mentioned execution body may input the video to be identified obtained in step 601 into a video recognition model, thereby generating a tag sequence corresponding to the object in the video to be identified.
  • the label may be used to characterize the content indicated by the object, and the label may include but is not limited to at least one of the following: text, numbers, symbols, and pictures.
  • the tag sequence may include at least two tags. The tags in the tag sequence have a hierarchical relationship, and the content corresponding to the tag with the lower rank belongs to the content corresponding to the tag with the higher rank.
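The structural constraints on a generated tag sequence (at least two tags, each lower-level tag belonging to the tag above it) can be checked with a sketch like the following. The `PARENT` map and its tags are hypothetical; the application does not prescribe such a check.

```python
from typing import Dict, List, Optional

# Hypothetical hierarchy: each tag maps to its higher-level (parent) tag.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "cat": "animal",
    "vehicle": None,
    "car": "vehicle",
}


def is_valid_tag_sequence(sequence: List[str], parent: Dict[str, Optional[str]]) -> bool:
    """Check length >= 2 and that each tag's parent is the tag just before it."""
    if len(sequence) < 2:
        return False
    return all(
        parent.get(lower) == higher
        for higher, lower in zip(sequence, sequence[1:])
    )


ok = is_valid_tag_sequence(["animal", "cat"], PARENT)
mismatched = is_valid_tag_sequence(["animal", "car"], PARENT)
```

Such a check could be applied to the output of step 602 when testing a generated video recognition model.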
  • the video recognition model may be generated by using the method described in the embodiment of FIG. 2 described above.
  • For a specific generation process refer to the related description of the embodiment in FIG. 2, and details are not described herein again.
  • the method for identifying videos in this embodiment may be used to test the video recognition models generated by the foregoing embodiments. Based on the test results, the video recognition model can be continuously optimized. This method may also be a practical application method of the video recognition model generated by the foregoing embodiments. Adopting the video recognition models generated by the above embodiments for video recognition can detect the video obtained by recording the screen and help improve the accuracy of video recognition.
  • the present application provides an embodiment of an apparatus for identifying a video.
  • This device embodiment corresponds to the method embodiment shown in FIG. 6, and the device can be specifically applied to various electronic devices.
  • the apparatus 700 for identifying a video in this embodiment may include a video acquiring unit 701 and a sequence generating unit 702.
  • the video acquisition unit 701 is configured to acquire the video to be identified including the object;
  • The sequence generation unit 702 is configured to input the video to be identified into a model generated by using the method described in the embodiment of FIG. 2 above, to generate a tag sequence corresponding to the object in the video to be identified.
  • the label may be used to characterize the content indicated by the object, and the label may include but is not limited to at least one of the following: text, numbers, symbols, and pictures.
  • the tag sequence may include at least two tags.
  • the tags in the tag sequence have a hierarchical relationship, and the content corresponding to the tag with the lower rank belongs to the content corresponding to the tag with the higher rank.
  • FIG. 8 shows a schematic structural diagram of a computer system 800 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 8 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • The computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803.
  • The RAM 803 also stores various programs and data required for the operation of the system 800.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804.
  • The following components are connected to the I/O interface 805: an input portion 806 including a touch screen, a keyboard, a mouse, a camera device, etc.; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage section 808; and a communication section 809 including a network interface card such as a LAN card or a modem.
  • the communication section 809 performs communication processing via a network such as the Internet.
  • The drive 810 is also connected to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • The computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811.
  • the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is included in baseband or propagated as part of a carrier wave, and which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • the functions labeled in the blocks may also occur in a different order than those labeled in the drawings. For example, two blocks represented one after the other may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described unit may also be provided in a processor, for example, it may be described as: a processor includes a sample acquisition unit and a first execution unit.
  • the names of these units do not constitute a limitation on the unit itself in some cases.
  • a sample acquisition unit can also be described as a “unit that acquires a training sample set”.
  • The present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.
  • The computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: obtain a training sample set, where each training sample includes a sample video containing a sample object and a sample tag sequence determined in advance for the sample object in the sample video, and the sample tags are used to characterize the content indicated by the sample object.
  • The sample tag sequence includes at least two sample tags.
  • The sample tags in the sample tag sequence have a hierarchical relationship: the content corresponding to a lower-level sample tag belongs to the content corresponding to a higher-level sample tag.
  • The electronic device is further caused to select training samples from the training sample set and perform the following training steps: inputting the sample video in the selected training sample into the initial model to obtain at least two candidate tag sequences corresponding to the sample object in the sample video; selecting a candidate tag sequence from the at least two candidate tag sequences as the actual tag sequence; determining whether the initial model has been trained based on the actual tag sequence and the sample tag sequence in the selected training sample; and, in response to determining that the initial model training is completed, using the trained initial model as a video recognition model.
  • The electronic device may also be caused to: obtain a video to be identified that includes an object; and input the video to be identified into the video recognition model to generate a tag sequence corresponding to the object in the video to be identified, where the tags are used to characterize the content indicated by the object.
  • the tag sequence includes at least two tags.
  • the tags in the tag sequence have a hierarchical relationship, and the content corresponding to the tag with the lower rank belongs to the content corresponding to the tag with the higher rank.
  • the video recognition model may be generated by using the method for generating a model as described in the foregoing embodiments.

Abstract

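A method and an apparatus for generating a model. The method includes: obtaining a training sample set (201), where each training sample includes a sample video containing a sample object and a sample label sequence determined in advance for the sample object in the sample video; selecting a training sample from the training sample set (202), and performing the following training steps: inputting the sample video in the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video (203); selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence (204); determining, based on the actual label sequence and the sample label sequence in the selected training sample, whether training of the initial model is completed (205); and, in response to determining that training of the initial model is completed, using the trained initial model as a video recognition model (206). The method yields a model for identifying videos and enriches the ways in which models can be generated.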

Description

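Method and apparatus for generating a model
This patent application claims priority to Chinese patent application No. 201810679114.1, filed on June 27, 2018 by Beijing ByteDance Network Technology Co., Ltd. (北京字节跳动网络技术有限公司) and entitled "Method and apparatus for generating a model", the entire contents of which are incorporated herein by reference.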
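Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for generating a model.
Background
At present, sharing information by shooting videos has become an important mode of information sharing in people's lives. In practice, to improve users' viewing experience, the captured video is usually processed to generate tags that characterize the content displayed in the video.
Summary
The embodiments of the present application provide a method and an apparatus for generating a model, and a method and an apparatus for identifying a video.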
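In a first aspect, an embodiment of the present application provides a method for generating a model. The method includes: obtaining a training sample set, where each training sample includes a sample video containing a sample object and a sample label sequence determined in advance for the sample object in the sample video, the sample labels are used to characterize the content indicated by the sample object, the sample label sequence includes at least two sample labels, the sample labels in the sample label sequence have a hierarchical relationship, and the content corresponding to a lower-level sample label belongs to the content corresponding to a higher-level sample label; selecting a training sample from the training sample set, and performing the following training steps: inputting the sample video in the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence in the selected training sample, whether training of the initial model is completed; and, in response to determining that training of the initial model is completed, using the trained initial model as a video recognition model.
In some embodiments, selecting a candidate label sequence from the at least two candidate label sequences as the actual label sequence includes: for each candidate label sequence among the at least two candidate label sequences, determining the probability corresponding to that candidate label sequence based on the probabilities corresponding to the candidate labels in it; and selecting the candidate label sequence with the highest probability as the actual label sequence.
In some embodiments, determining the probability corresponding to the candidate label sequence based on the probabilities corresponding to the candidate labels in it includes: multiplying the probabilities corresponding to the candidate labels in the candidate label sequence to obtain a product; and determining the obtained product as the probability corresponding to the candidate label sequence.
In some embodiments, determining whether training of the initial model is completed based on the actual label sequence and the sample label sequence in the selected training sample includes: for each actual label in the actual label sequence, determining the loss value of that actual label relative to the corresponding sample label in the sample label sequence; and determining, based on the determined loss values, whether training of the initial model is completed.
In some embodiments, determining whether training of the initial model is completed based on the determined loss values includes: for each actual label in the actual label sequence, determining the level corresponding to that actual label, and determining whether the loss value corresponding to that actual label is less than or equal to a loss threshold preset for the determined level; and, in response to determining that the loss values corresponding to the actual labels in the actual label sequence are all less than or equal to the corresponding loss thresholds, determining that training of the initial model is completed.
In some embodiments, determining whether training of the initial model is completed based on the determined loss values includes: determining the level corresponding to each actual label in the actual label sequence; obtaining weights preset for labels of different levels, and performing a weighted summation of the determined loss values based on the obtained weights to obtain a weighted sum; determining the obtained weighted sum as the total loss value of the actual label sequence relative to the sample label sequence; and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determining that training of the initial model is completed.
In some embodiments, the method further includes: in response to determining that training of the initial model is not completed, adjusting relevant parameters in the initial model, selecting a training sample from the training samples that have not yet been selected, and continuing to perform the training steps, using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.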
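In a second aspect, an embodiment of the present application provides an apparatus for generating a model. The apparatus includes: a sample acquisition unit configured to obtain a training sample set, where each training sample includes a sample video containing a sample object and a sample label sequence determined in advance for the sample object in the sample video, the sample labels are used to characterize the content indicated by the sample object, the sample label sequence includes at least two sample labels, the sample labels in the sample label sequence have a hierarchical relationship, and the content corresponding to a lower-level sample label belongs to the content corresponding to a higher-level sample label; and a first execution unit configured to select a training sample from the training sample set and perform the following training steps: inputting the sample video in the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence in the selected training sample, whether training of the initial model is completed; and, in response to determining that training of the initial model is completed, using the trained initial model as a video recognition model.
In some embodiments, the first execution unit includes: a probability determination module configured to determine, for each candidate label sequence among the at least two candidate label sequences, the probability corresponding to that candidate label sequence based on the probabilities corresponding to the candidate labels in it; and a sequence selection module configured to select the candidate label sequence with the highest probability as the actual label sequence.
In some embodiments, the probability determination module is further configured to: multiply the probabilities corresponding to the candidate labels in the candidate label sequence to obtain a product; and determine the obtained product as the probability corresponding to the candidate label sequence.
In some embodiments, the first execution unit includes: a loss determination module configured to determine, for each actual label in the actual label sequence, the loss value of that actual label relative to the corresponding sample label in the sample label sequence; and a model determination module configured to determine, based on the determined loss values, whether training of the initial model is completed.
In some embodiments, the model determination module is further configured to: for each actual label in the actual label sequence, determine the level corresponding to that actual label, and determine whether the loss value corresponding to that actual label is less than or equal to a loss threshold preset for the determined level; and, in response to determining that the loss values corresponding to the actual labels in the actual label sequence are all less than or equal to the corresponding loss thresholds, determine that training of the initial model is completed.
In some embodiments, the model determination module is further configured to: determine the level corresponding to each actual label in the actual label sequence; obtain weights preset for labels of different levels, and perform a weighted summation of the determined loss values based on the obtained weights to obtain a weighted sum; determine the obtained weighted sum as the total loss value of the actual label sequence relative to the sample label sequence; and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determine that training of the initial model is completed.
In some embodiments, the apparatus further includes: a second execution unit configured to, in response to determining that training of the initial model is not completed, adjust relevant parameters in the initial model, select a training sample from the training samples that have not yet been selected, and continue to perform the training steps, using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.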
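In a third aspect, an embodiment of the present application provides a method for recognizing a video, the method comprising: acquiring a to-be-recognized video containing an object; and inputting the to-be-recognized video into a video recognition model generated by the method described in any embodiment of the first aspect, to generate a label sequence corresponding to the object in the to-be-recognized video, wherein a label characterizes the content indicated by the object, the label sequence includes at least two labels, and the labels in the label sequence are hierarchically related, the content corresponding to a lower-level label belonging to the content corresponding to a higher-level label.
In a fourth aspect, an embodiment of the present application provides an apparatus for recognizing a video, the apparatus comprising: a video acquisition unit configured to acquire a to-be-recognized video containing an object; and a sequence generation unit configured to input the to-be-recognized video into a video recognition model generated by the method described in any embodiment of the first aspect, to generate a label sequence corresponding to the object in the to-be-recognized video, wherein a label characterizes the content indicated by the object, the label sequence includes at least two labels, and the labels in the label sequence are hierarchically related, the content corresponding to a lower-level label belonging to the content corresponding to a higher-level label.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any embodiment of the first aspect and the third aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any embodiment of the first aspect and the third aspect.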
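The method and apparatus for generating a model provided by the embodiments of the present application acquire a training sample set, wherein a training sample includes a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video, a sample label characterizes the content indicated by the sample object, the sample label sequence includes at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label. A training sample is then selected from the training sample set and the following training steps are performed: inputting the sample video of the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model. In this way, a model that can be used to recognize videos is obtained, which also helps to enrich the ways in which models can be generated.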
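Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application can be applied;
FIG. 2 is a flowchart of one embodiment of the method for generating a model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present application;
FIG. 4 is a flowchart of another embodiment of the method for generating a model according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of the apparatus for generating a model according to the present application;
FIG. 6 is a flowchart of one embodiment of the method for recognizing a video according to the present application;
FIG. 7 is a schematic structural diagram of one embodiment of the apparatus for recognizing a video according to the present application;
FIG. 8 is a schematic structural diagram of a computer system of an electronic device suitable for implementing the embodiments of the present application.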
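Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.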
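FIG. 1 shows an exemplary system architecture 100 to which the method for generating a model, the apparatus for generating a model, the method for recognizing a video, or the apparatus for recognizing a video of the embodiments of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminals 101 and 102, a network 103, a database server 104, and a server 105. The network 103 is the medium that provides communication links among the terminals 101 and 102, the database server 104, and the server 105. The network 103 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user 110 may use the terminals 101 and 102 to interact with the server 105 through the network 103, for example to receive or send messages. Various client applications may be installed on the terminals 101 and 102, such as model training applications, video recognition applications, social applications, payment applications, web browsers, and instant messaging tools.
The terminals 101 and 102 may be hardware or software. When they are hardware, they may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
When the terminals 101 and 102 are hardware, a video capture device may also be installed on them. The video capture device may be any device capable of capturing video, such as a camera or a sensor. The user 110 may use the video capture device on the terminals 101 and 102 to capture video.
The database server 104 may be a database server that provides various services. For example, the database server may store a sample set containing a large number of samples, where a sample may include a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video. The user 110 may thus also select samples, through the terminals 101 and 102, from the sample set stored on the database server 104.
The server 105 may likewise be a server that provides various services, for example a back-end server that supports the applications displayed on the terminals 101 and 102. The back-end server may train the initial model using samples from the sample set sent by the terminals 101 and 102, and may send the training result (for example, the generated video recognition model) to the terminals 101 and 102. The user can then apply the generated video recognition model to recognize videos.
The database server 104 and the server 105 may also be hardware or software. When they are hardware, they may be implemented as a distributed server cluster composed of multiple servers or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the method for generating a model or the method for recognizing a video provided by the embodiments of the present application is generally executed by the server 105. Accordingly, the apparatus for generating a model or the apparatus for recognizing a video is generally also provided in the server 105.
It should be pointed out that, where the server 105 can implement the relevant functions of the database server 104, the system architecture 100 need not include the database server 104.
It should be understood that the numbers of terminals, networks, database servers, and servers in FIG. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as the implementation requires.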
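With continued reference to FIG. 2, a flow 200 of one embodiment of the method for generating a model according to the present application is shown. The method for generating a model includes the following steps:
Step 201: acquire a training sample set.
In this embodiment, the execution body of the method for generating a model (for example, the server 105 shown in FIG. 1) may acquire the training sample set from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1) through a wired or wireless connection. A training sample may include a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video.
In this embodiment, the sample object may be the image corresponding to whatever was filmed when the sample video was shot. The sample object may be an image of any kind of thing (that is, the filmed content may be anything), such as an image of a person, an animal, or a behavior. A sample label may be used to characterize the content indicated by the sample object, and may include, but is not limited to, at least one of the following: text, numbers, symbols, pictures. A sample label sequence may include at least two sample labels. The sample labels in a sample label sequence are hierarchically related: the content corresponding to a lower-level sample label belongs to the content corresponding to a higher-level sample label.
As an example, suppose the sample video was obtained by filming a cat, so the sample object in the sample video is the image of the cat. Here, the sample label sequence predetermined for the cat image in this sample video is "animal; pet; cat", that is, three sample labels: "animal", "pet", and "cat". Since a cat is a pet and a pet is an animal, the sample label "animal" has the highest level, the sample label "pet" the next highest, and the sample label "cat" the lowest.
It should be noted that a technician may predetermine the sample label sequence corresponding to the sample object in a sample video. Specifically, the whole sample label sequence may be annotated manually; alternatively, only the lowest-level sample label may be annotated manually, and the sample label sequence may then be derived from that lowest-level label using a pre-established hierarchical relationship between labels (for example, a level correspondence table).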
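The second annotation option can be sketched as follows. This is a minimal illustration, not the application's implementation: the `PARENT` table and the function name `build_label_sequence` are hypothetical, and the table simply encodes the "animal; pet; cat" example.

```python
# Hypothetical level correspondence table: each label maps to its parent label.
# The top-level label maps to None.
PARENT = {"cat": "pet", "pet": "animal", "animal": None}


def build_label_sequence(lowest_label, parent=PARENT):
    """Walk up the hierarchy from the annotated lowest-level label.

    Returns the labels ordered from highest level to lowest level.
    """
    sequence = []
    label = lowest_label
    while label is not None:
        sequence.append(label)
        label = parent[label]
    return list(reversed(sequence))


print(build_label_sequence("cat"))  # ['animal', 'pet', 'cat']
```

With such a table, an annotator only needs to supply the most specific label; the higher levels follow mechanically.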
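Step 202: select a training sample from the training sample set.
In this embodiment, the execution body may select a training sample from the training sample set obtained in step 201 and perform the training steps of steps 203 to 206. The way the training sample is selected is not limited in the present application. For example, it may be selected at random, or training samples whose sample videos have better clarity may be selected first.
Step 203: input the sample video of the selected training sample into the initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video.
In this embodiment, the execution body may input the sample video of the selected training sample into the initial model (for example, a convolutional neural network (CNN) or a residual network (ResNet)) to obtain at least two candidate label sequences corresponding to the sample object in the sample video.
It should be understood that, for a machine learning model, feeding a set of data into the model usually yields multiple results, each with an associated probability, and the model then outputs the result with the highest probability as its final result. In this embodiment, the candidate labels are the intermediate results obtained by feeding the sample video into the initial model. Here, by inputting the sample video of the selected training sample into the initial model, the execution body can obtain multiple candidate labels corresponding to the sample object in the sample video, and can then form at least two candidate label sequences according to the hierarchical relationship between the labels.
As an example, the execution body inputs a sample video containing a cat image (the sample object) into the initial model and obtains multiple candidate labels, for example "livestock (80%); cat (50%); poultry (50%); chicken (40%)". Since a cat belongs to livestock and a chicken belongs to poultry, two candidate label sequences are obtained: "livestock (80%); cat (50%)" and "poultry (50%); chicken (40%)".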
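One way to picture how scored candidate labels turn into candidate label sequences is the sketch below. The application does not say how the grouping is done; the parent table, the dictionary representation, and the rule "start from every candidate that is not the parent of another candidate" are all assumptions made for illustration.

```python
# Hypothetical hierarchy for the example: label -> parent label (None = top level).
PARENT = {"livestock": None, "cat": "livestock",
          "poultry": None, "chicken": "poultry"}


def build_candidate_sequences(candidates, parent=PARENT):
    """Group scored candidate labels into hierarchy-ordered sequences.

    `candidates` maps label -> probability. Each returned sequence is a list
    of (label, probability) pairs ordered from highest to lowest level.
    """
    # A candidate is a "leaf" if no other candidate names it as its parent.
    used_as_parent = {parent.get(label) for label in candidates}
    leaves = [label for label in candidates if label not in used_as_parent]

    sequences = []
    for leaf in leaves:
        chain = []
        label = leaf
        while label is not None and label in candidates:
            chain.append((label, candidates[label]))
            label = parent.get(label)
        sequences.append(list(reversed(chain)))
    return sequences


scores = {"livestock": 0.8, "cat": 0.5, "poultry": 0.5, "chicken": 0.4}
for seq in build_candidate_sequences(scores):
    print(seq)
```

For the scores in the example this yields the two sequences named in the text, one ending in "cat" and one ending in "chicken".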
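Step 204: select a candidate label sequence from the at least two candidate label sequences as the actual label sequence.
In this embodiment, the execution body may select a candidate label sequence from the at least two candidate label sequences obtained in step 203 as the actual label sequence. The actual label sequence is the final result output by the initial model.
Here, the execution body may select the actual label sequence from the at least two candidate label sequences in various ways. For example, it may select one at random; or it may select based on the probability of the lowest-level candidate label in each candidate label sequence (that is, it takes the candidate label sequence whose lowest-level candidate label has the highest probability as the actual label sequence).
As an example, given the candidate label sequences "livestock (80%); cat (50%)" and "poultry (50%); chicken (40%)", the probability of "cat" (50%) is greater than the probability of "chicken" (40%), so the execution body may take the candidate label sequence containing "cat", namely "livestock (80%); cat (50%)", as the actual label sequence.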
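The lowest-level selection rule amounts to an argmax over the last element of each sequence. A minimal sketch, assuming each candidate label sequence is a list of (label, probability) pairs ordered from highest to lowest level (a representation chosen here, not specified by the application):

```python
def select_by_lowest_level(sequences):
    """Pick the sequence whose lowest-level (last) label has the highest probability."""
    return max(sequences, key=lambda seq: seq[-1][1])


candidates = [
    [("livestock", 0.8), ("cat", 0.5)],
    [("poultry", 0.5), ("chicken", 0.4)],
]
actual = select_by_lowest_level(candidates)
print(actual)  # [('livestock', 0.8), ('cat', 0.5)]
```

Note that this rule ignores the probabilities of the higher-level labels entirely, which is the gap the embodiment of FIG. 4 addresses.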
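Step 205: determine, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete.
In this embodiment, based on the actual label sequence obtained in step 204 and the sample label sequence of the selected training sample, the execution body may determine whether training of the initial model is complete.
As an example, for each actual label in the actual label sequence, the execution body may determine whether the sample label of the same level in the sample label sequence is identical to that actual label. If every actual label in the actual label sequence is identical to its corresponding sample label in the sample label sequence, training of the initial model may be determined to be complete.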
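The level-by-level comparison can be sketched as below, assuming both sequences are plain lists of label strings ordered from highest to lowest level, so that list position stands in for level. Treating sequences of different lengths as a mismatch is an assumption; the application does not address that case.

```python
def sequences_match(actual, sample):
    """Return True if every actual label equals the sample label at the same level."""
    if len(actual) != len(sample):
        return False  # assumption: a missing level counts as a mismatch
    return all(a == s for a, s in zip(actual, sample))


print(sequences_match(["animal", "pet", "cat"], ["animal", "pet", "cat"]))  # True
print(sequences_match(["animal", "cat"], ["animal", "dog"]))                # False
```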
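In some optional implementations of this embodiment, the execution body may also determine whether training of the initial model is complete through the following steps. First, for each actual label in the actual label sequence, the execution body may determine a loss value of that actual label relative to its corresponding sample label in the sample label sequence. Then, the execution body may determine, based on the determined loss values, whether training of the initial model is complete. It should be noted that a loss value characterizes the difference between the actual output and the expected output. In practice, various preset loss functions may be used to calculate the loss value of an actual label relative to its corresponding sample label; for example, the L2 norm may be used as the loss function.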
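As a rough illustration of an L2-norm loss for one level, the sketch below assumes that the model's output for that level is a vector of per-class scores and that the sample label is encoded as a target vector of the same length (for instance one-hot). The application does not specify how labels are encoded, so this representation is hypothetical.

```python
import math


def l2_loss(predicted, target):
    """L2 norm of the difference between a predicted vector and a target vector."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))


# Hypothetical two-class level ("cat", "dog") with a one-hot target for "cat".
print(l2_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0, prediction matches the target
print(l2_loss([3.0, 0.0], [0.0, 4.0]))  # 5.0
```

One such loss value would be computed per level of the actual label sequence and then passed to one of the completion tests described next.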
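In some optional implementations of this embodiment, the execution body may determine, based on the determined loss values, whether training of the initial model is complete through the following steps. First, for each actual label in the actual label sequence, it may determine the level of that actual label and determine whether the loss value of that actual label is less than or equal to the loss threshold preset for the determined level. Then, in response to determining that the loss value of every actual label in the actual label sequence is less than or equal to its corresponding loss threshold, it may determine that training of the initial model is complete.
As an illustration, suppose the actual label sequence is "animal; cat" and the sample label sequence is "animal; dog". The actual label "animal" is at the high level and the actual label "cat" is at the low level. The loss threshold preset by a technician may be 5 for high-level labels and 1 for low-level labels. The execution body may therefore determine whether the loss value of the actual label "animal" is less than or equal to the loss threshold 5, and whether the loss value of the actual label "cat" is less than or equal to the loss threshold 1. If the loss value of "animal" is less than or equal to 5 and the loss value of "cat" is less than or equal to 1, it may determine that training of the initial model is complete.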
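The per-level threshold test is a conjunction over levels. A minimal sketch using the thresholds from the example (5 for the high level, 1 for the low level); the dictionary layout and the loss values passed in are made up for illustration:

```python
# Thresholds from the example above, keyed by a hypothetical level name.
LEVEL_THRESHOLDS = {"high": 5.0, "low": 1.0}


def training_complete_by_level(losses_by_level, thresholds=LEVEL_THRESHOLDS):
    """True only if every level's loss is <= the threshold preset for that level."""
    return all(loss <= thresholds[level]
               for level, loss in losses_by_level.items())


print(training_complete_by_level({"high": 0.0, "low": 0.8}))  # True
print(training_complete_by_level({"high": 0.0, "low": 6.0}))  # False
```

Because every level must pass its own threshold, a large error at the low level cannot be offset by a perfect prediction at the high level.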
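In some optional implementations of this embodiment, the execution body may also determine whether training of the initial model is complete through the following steps. First, it may determine the level of each actual label in the actual label sequence. Then, it may obtain the weights preset for labels of different levels and compute a weighted sum of the determined loss values based on those weights. Finally, it may determine the weighted sum as the total loss value of the actual label sequence relative to the sample label sequence and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determine that training of the initial model is complete.
As an illustration, take again the actual label sequence "animal; cat" and the sample label sequence "animal; dog": the actual label "animal" is at the high level and the actual label "cat" is at the low level. The weight preset by a technician may be 0.4 for high-level labels and 0.6 for low-level labels. If the loss value of the actual label "animal" is determined to be 0 and the loss value of the actual label "cat" is determined to be 6, the execution body may compute the weighted sum of these loss values using the weights, obtaining 3.6 (3.6 = 0 × 0.4 + 6 × 0.6) as the total loss value. If the total loss threshold preset by the technician is 5, the execution body may, in response to determining that the total loss value 3.6 is less than the total loss threshold 5, determine that training of the initial model is complete.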
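The weighted-sum test from the example can be reproduced in a few lines. The weights (0.4, 0.6), the loss values (0, 6), and the total threshold (5) are the ones given in the text; the dictionary layout and function names are hypothetical.

```python
import math

# Weights from the example above, keyed by a hypothetical level name.
LEVEL_WEIGHTS = {"high": 0.4, "low": 0.6}


def total_loss(losses_by_level, weights=LEVEL_WEIGHTS):
    """Weighted sum of the per-level loss values."""
    return sum(weights[level] * loss for level, loss in losses_by_level.items())


def training_complete_by_total(losses_by_level, total_threshold=5.0):
    """True if the weighted total loss is <= the preset total loss threshold."""
    return total_loss(losses_by_level) <= total_threshold


losses = {"high": 0.0, "low": 6.0}
# Floating-point arithmetic gives a value very close to 3.6.
print(math.isclose(total_loss(losses), 3.6))  # True
print(training_complete_by_total(losses))     # True
```

Compared with the per-level test, the same loss values (0 and 6) now pass, because the low-level error is averaged against the perfect high-level prediction.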
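Step 206: in response to determining that training of the initial model is complete, use the trained initial model as the video recognition model.
In this embodiment, the execution body may, in response to determining that training of the initial model is complete, use the trained initial model as the video recognition model.
Optionally, the execution body may also, in response to determining that training of the initial model is not complete, adjust relevant parameters of the initial model (for example, when the initial model is a convolutional neural network, use backpropagation to modify the weights of the convolutional layers in the initial model), select a training sample from the training samples that have not yet been selected, and continue to perform training steps 203 to 206 using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.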
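The overall control flow of steps 202 to 206, including the optional "adjust and continue" branch, might look like the sketch below. All four callables (`forward`, `select`, `is_complete`, `adjust`) are hypothetical stand-ins for the model's forward pass, the sequence selection of step 204, the completion test of step 205, and the parameter update. Returning `None` when the samples run out is an assumption, since the application does not describe that case.

```python
import random


def train(samples, model, forward, select, is_complete, adjust, seed=0):
    """Run the training steps until the completion test passes.

    `samples` is a list of (sample_video, sample_label_sequence) pairs.
    Returns the trained model, or None if every sample has been used first.
    """
    remaining = list(samples)
    random.Random(seed).shuffle(remaining)   # random selection is one allowed option
    while remaining:
        video, sample_sequence = remaining.pop()   # pick a not-yet-selected sample
        candidates = forward(model, video)         # step 203
        actual = select(candidates)                # step 204
        if is_complete(actual, sample_sequence):   # step 205
            return model                           # step 206
        model = adjust(model, actual, sample_sequence)
    return None


# Toy usage: the "model" is just a counter that predicts correctly after two updates.
toy_samples = [("video%d" % i, ["animal", "cat"]) for i in range(5)]
trained = train(
    toy_samples,
    model=0,
    forward=lambda m, v: [["animal", "cat"]] if m >= 2 else [["animal", "dog"]],
    select=lambda cands: cands[0],
    is_complete=lambda actual, sample: actual == sample,
    adjust=lambda m, actual, sample: m + 1,
)
print(trained)  # 2
```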
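With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to this embodiment. In the application scenario of FIG. 3, a model training application may be installed on the terminal 301 used by the user. After the user opens the application and uploads a training sample set or the storage path of a training sample set, the server 302 that provides back-end support for the application may run the method for generating a model, which includes the following:
First, a training sample set 303 may be acquired. A training sample may include a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video. A sample label may be used to characterize the content indicated by the sample object, the sample label sequence may include at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label.
Then, a training sample 3031 may be selected from the training sample set 303.
Next, for the selected training sample 3031, the following training steps may be performed: inputting the sample video 30311 of the selected training sample 3031 into an initial model 304 to obtain candidate label sequences 3051 and 3052 corresponding to the sample object in the sample video 30311; selecting one of the candidate label sequences 3051 and 3052 as an actual label sequence 306; determining, based on the actual label sequence 306 and the sample label sequence 30312 of the selected training sample 3031, whether training of the initial model 304 is complete; and in response to determining that it is complete, using the trained initial model 304 as a video recognition model 307.
At this point, the server 302 may also send the terminal 301 prompt information indicating that model training is complete. The prompt information may be voice and/or text information. The user can then obtain the video recognition model from a preset storage location.
The method provided by the above embodiment of the present application acquires a training sample set, wherein a training sample includes a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video, a sample label characterizes the content indicated by the sample object, the sample label sequence includes at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label. A training sample is then selected from the training sample set and the following training steps are performed: inputting the sample video of the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model. In this way, a model that can be used to recognize videos is obtained, which also helps to enrich the ways in which models can be generated.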
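With further reference to FIG. 4, a flow 400 of another embodiment of the method for generating a model is shown. The flow 400 of the method for generating a model includes the following steps:
Step 401: acquire a training sample set.
In this embodiment, the execution body of the method for generating a model (for example, the server 105 shown in FIG. 1) may acquire the training sample set from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1) through a wired or wireless connection. A training sample may include a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video.
Step 402: select a training sample from the training sample set.
In this embodiment, the execution body may select a training sample from the training sample set obtained in step 401 and perform the training steps of steps 403 to 407. The way the training sample is selected is not limited in the present application. For example, it may be selected at random, or training samples whose sample videos have better clarity may be selected first.
Step 403: input the sample video of the selected training sample into the initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video.
In this embodiment, the execution body may input the sample video of the selected training sample into the initial model (for example, a convolutional neural network (CNN) or a residual network (ResNet)) to obtain at least two candidate label sequences corresponding to the sample object in the sample video.
Step 404: for each candidate label sequence of the at least two candidate label sequences, determine the probability corresponding to that candidate label sequence based on the probabilities corresponding to the candidate labels in it.
In this embodiment, for each candidate label sequence of the at least two candidate label sequences, the execution body may determine the probability corresponding to that candidate label sequence based on the probabilities corresponding to the candidate labels in it.
Specifically, as an example, the execution body may compute the mean of the probabilities corresponding to the candidate labels in a candidate label sequence and take the result as the probability corresponding to that candidate label sequence. For example, for the candidate label sequence "livestock (80%); cat (50%)", the execution body obtains a sequence probability of 65% (65% = (80% + 50%) ÷ 2).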
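The mean-based sequence probability is straightforward to compute. A minimal sketch, assuming a candidate label sequence is a list of (label, probability) pairs (a representation chosen here for illustration):

```python
import math


def mean_probability(sequence):
    """Arithmetic mean of the label probabilities in a candidate label sequence."""
    probabilities = [p for _, p in sequence]
    return sum(probabilities) / len(probabilities)


score = mean_probability([("livestock", 0.8), ("cat", 0.5)])
print(math.isclose(score, 0.65))  # True, matching the 65% in the example
```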
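In some optional implementations of this embodiment, the execution body may also determine the probability corresponding to a candidate label sequence through the following steps: first, multiply together the probabilities corresponding to the candidate labels in the candidate label sequence to obtain a product; then, determine the product as the probability corresponding to that candidate label sequence.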
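The product variant can be sketched in the same style, again assuming a sequence is a list of (label, probability) pairs. The numbers reuse the earlier livestock/poultry example; the application itself gives no worked figures for the product.

```python
import math


def product_probability(sequence):
    """Product of the label probabilities in a candidate label sequence."""
    return math.prod(p for _, p in sequence)  # math.prod requires Python 3.8+


print(math.isclose(product_probability([("livestock", 0.8), ("cat", 0.5)]), 0.4))    # True
print(math.isclose(product_probability([("poultry", 0.5), ("chicken", 0.4)]), 0.2))  # True
```

Unlike the mean, the product penalizes a sequence heavily if any single level has a low probability, and it shrinks as sequences get longer.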
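Step 405: select the candidate label sequence with the highest probability as the actual label sequence.
In this embodiment, based on the probabilities of the candidate label sequences determined in step 404, the execution body may select the candidate label sequence with the highest probability as the actual label sequence.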
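Steps 404 and 405 together are a scored argmax. The sketch below takes the scoring rule as a parameter so that either the mean or the product can be plugged in; the (label, probability) representation and the function names are assumptions made for illustration.

```python
import math
import statistics


def select_actual_sequence(sequences, score):
    """Return the candidate label sequence with the highest score."""
    return max(sequences, key=score)


def mean_score(sequence):
    return statistics.mean(p for _, p in sequence)


def product_score(sequence):
    return math.prod(p for _, p in sequence)


candidates = [
    [("livestock", 0.8), ("cat", 0.5)],
    [("poultry", 0.5), ("chicken", 0.4)],
]
print(select_actual_sequence(candidates, mean_score))
print(select_actual_sequence(candidates, product_score))
```

For this example both scoring rules pick the "livestock; cat" sequence, but they can disagree when one sequence has a high top-level probability and a very low bottom-level one.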
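Step 406: determine, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete.
In this embodiment, based on the actual label sequence obtained in step 405 and the sample label sequence of the selected training sample, the execution body may determine whether training of the initial model is complete.
Step 407: in response to determining that training of the initial model is complete, use the trained initial model as the video recognition model.
In this embodiment, the execution body may, in response to determining that training of the initial model is complete, use the trained initial model as the video recognition model.
It should be noted that steps 401, 402, 403, 406, and 407 may be implemented in a manner similar to steps 201, 202, 203, 205, and 206 of the preceding embodiment. Accordingly, the descriptions above of steps 201, 202, 203, 205, and 206 also apply to steps 401, 402, 403, 406, and 407 of this embodiment and are not repeated here.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for generating a model in this embodiment highlights the steps of determining the probability corresponding to each candidate label sequence and selecting the actual label sequence by sequence probability. The scheme described in this embodiment can therefore make combined use of all the candidate labels in a candidate label sequence, which makes the information processing more comprehensive and accurate; and determining the probability of a candidate label sequence from the probabilities of its candidate labels is simple and convenient, which can improve the efficiency of information generation.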
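With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides one embodiment of an apparatus for generating a model. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.
As shown in FIG. 5, the apparatus 500 for generating a model of this embodiment includes a sample acquisition unit 501 and a first execution unit 502. The sample acquisition unit 501 is configured to acquire a training sample set, wherein a training sample includes a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video, a sample label characterizes the content indicated by the sample object, the sample label sequence includes at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label. The first execution unit 502 is configured to select a training sample from the training sample set and perform the following training steps: inputting the sample video of the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model.
In this embodiment, the sample acquisition unit 501 of the apparatus 500 for generating a model may acquire the training sample set from a database server (for example, the database server 104 shown in FIG. 1) or a terminal (for example, the terminals 101 and 102 shown in FIG. 1) through a wired or wireless connection. A training sample may include a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video.
In this embodiment, the sample object may be the image corresponding to whatever was filmed when the sample video was shot. The sample object may be an image of any kind of thing (that is, the filmed content may be anything), such as an image of a person, an animal, or a behavior. A sample label may be used to characterize the content indicated by the sample object, and may include, but is not limited to, at least one of the following: text, numbers, symbols, pictures. A sample label sequence may include at least two sample labels. The sample labels in a sample label sequence are hierarchically related: the content corresponding to a lower-level sample label belongs to the content corresponding to a higher-level sample label.
It should be noted that a technician may predetermine the sample label sequence corresponding to the sample object in a sample video. Specifically, the whole sample label sequence may be annotated manually; alternatively, only the lowest-level sample label may be annotated manually, and the sample label sequence may then be derived from that lowest-level label using a pre-established hierarchical relationship between labels.
In this embodiment, the first execution unit 502 may select a training sample from the training sample set obtained by the sample acquisition unit 501 and perform the training steps of steps 5021 to 5024. The way the training sample is selected is not limited in the present application.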
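Step 5021: input the sample video of the selected training sample into the initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video.
In this embodiment, the first execution unit 502 may input the sample video of the selected training sample into the initial model (for example, a convolutional neural network (CNN) or a residual network (ResNet)) to obtain at least two candidate label sequences corresponding to the sample object in the sample video.
Step 5022: select a candidate label sequence from the at least two candidate label sequences as the actual label sequence.
In this embodiment, the first execution unit 502 may select a candidate label sequence from the at least two candidate label sequences obtained in step 5021 as the actual label sequence.
Here, the first execution unit 502 may select the actual label sequence from the at least two candidate label sequences in various ways.
Step 5023: determine, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete.
In this embodiment, based on the actual label sequence obtained in step 5022 and the sample label sequence of the selected training sample, the first execution unit 502 may determine whether training of the initial model is complete.
Step 5024: in response to determining that training of the initial model is complete, use the trained initial model as the video recognition model.
In this embodiment, the first execution unit 502 may, in response to determining that training of the initial model is complete, use the trained initial model as the video recognition model.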
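In some optional implementations of this embodiment, the first execution unit 502 may include: a probability determination module (not shown in the figure) configured to determine, for each candidate label sequence of the at least two candidate label sequences, the probability corresponding to that candidate label sequence based on the probabilities corresponding to the candidate labels in it; and a sequence selection module configured to select the candidate label sequence with the highest probability as the actual label sequence.
In some optional implementations of this embodiment, the probability determination module may be further configured to: multiply together the probabilities corresponding to the candidate labels in the candidate label sequence to obtain a product; and determine the product as the probability corresponding to the candidate label sequence.
In some optional implementations of this embodiment, the first execution unit 502 may include: a loss determination module (not shown in the figure) configured to determine, for each actual label in the actual label sequence, a loss value of that actual label relative to its corresponding sample label in the sample label sequence; and a model determination module (not shown in the figure) configured to determine, based on the determined loss values, whether training of the initial model is complete.
In some optional implementations of this embodiment, the model determination module may be further configured to: for each actual label in the actual label sequence, determine the level of that actual label, and determine whether the loss value of that actual label is less than or equal to a loss threshold preset for the determined level; and in response to determining that the loss value of every actual label in the actual label sequence is less than or equal to its corresponding loss threshold, determine that training of the initial model is complete.
In some optional implementations of this embodiment, the model determination module may be further configured to: determine the level of each actual label in the actual label sequence; obtain weights preset for labels of different levels, and compute a weighted sum of the determined loss values based on the obtained weights; and determine the weighted sum as the total loss value of the actual label sequence relative to the sample label sequence, and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determine that training of the initial model is complete.
In some optional implementations of this embodiment, the apparatus 500 may further include: a second execution unit (not shown in the figure) configured to, in response to determining that training of the initial model is not complete, adjust relevant parameters of the initial model, select a training sample from the training samples that have not yet been selected, and continue to perform training steps 5021 to 5024 using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.
It should be understood that the units described in the apparatus 500 correspond to the steps of the method described with reference to FIG. 2. The operations, features, and beneficial effects described above for the method therefore also apply to the apparatus 500 and the units it contains, and are not repeated here.
In the apparatus 500 provided by the above embodiment of the present application, the sample acquisition unit 501 acquires a training sample set, wherein a training sample includes a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video, a sample label characterizes the content indicated by the sample object, the sample label sequence includes at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label. The first execution unit 502 then selects a training sample from the training sample set and performs the following training steps: inputting the sample video of the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model. In this way, a model that can be used to recognize videos is obtained, which also helps to enrich the ways in which models can be generated.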
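Referring to FIG. 6, a flow 600 of one embodiment of the method for recognizing a video provided by the present application is shown. The method for recognizing a video may include the following steps:
Step 601: acquire a to-be-recognized video containing an object.
In this embodiment, the execution body of the method for recognizing a video (for example, the server 105 shown in FIG. 1) may acquire the to-be-recognized video containing an object through a wired or wireless connection. For example, the execution body may obtain a video stored in a database server (for example, the database server 104 shown in FIG. 1), or may receive a video captured by a terminal (for example, the terminals 101 and 102 shown in FIG. 1) or another device.
In this embodiment, the to-be-recognized video is the video on which recognition is to be performed. The object may be the image corresponding to whatever was filmed when the to-be-recognized video was shot. The object may be an image of any kind of thing (that is, the filmed content may be anything), such as an image of a person, an animal, or a behavior.
Step 602: input the to-be-recognized video into the video recognition model to generate a label sequence corresponding to the object in the to-be-recognized video.
In this embodiment, the execution body may input the to-be-recognized video acquired in step 601 into the video recognition model, thereby generating the label sequence corresponding to the object in the to-be-recognized video. A label may be used to characterize the content indicated by the object, and may include, but is not limited to, at least one of the following: text, numbers, symbols, pictures. The label sequence may include at least two labels. The labels in the label sequence are hierarchically related: the content corresponding to a lower-level label belongs to the content corresponding to a higher-level label.
In this embodiment, the video recognition model may be generated by the method described in the embodiment of FIG. 2 above. For the specific generation process, see the description of the embodiment of FIG. 2, which is not repeated here.
It should be noted that the method for recognizing a video of this embodiment can be used to test the video recognition model generated by the above embodiments, and the video recognition model can then be continuously optimized according to the test results. The method can also be the practical application of the video recognition model generated by the above embodiments. Using the video recognition model generated by the above embodiments for video recognition enables detection of videos obtained by screen recording and helps to improve the accuracy of video recognition.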
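With continued reference to FIG. 7, as an implementation of the method shown in FIG. 6, the present application provides one embodiment of an apparatus for recognizing a video. This apparatus embodiment corresponds to the method embodiment shown in FIG. 6, and the apparatus can be applied in various electronic devices.
As shown in FIG. 7, the apparatus 700 for recognizing a video of this embodiment may include a video acquisition unit 701 and a sequence generation unit 702. The video acquisition unit 701 is configured to acquire a to-be-recognized video containing an object; the sequence generation unit 702 is configured to input the to-be-recognized video into a model generated by the method described in the embodiment of FIG. 2 above, to generate a label sequence corresponding to the object in the to-be-recognized video. A label may be used to characterize the content indicated by the object, and may include, but is not limited to, at least one of the following: text, numbers, symbols, pictures. The label sequence may include at least two labels. The labels in the label sequence are hierarchically related: the content corresponding to a lower-level label belongs to the content corresponding to a higher-level label.
It should be understood that the units described in the apparatus 700 correspond to the steps of the method described with reference to FIG. 6. The operations, features, and beneficial effects described above for the method therefore also apply to the apparatus 700 and the units it contains, and are not repeated here.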
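Referring now to FIG. 8, a schematic structural diagram of a computer system 800 of an electronic device suitable for implementing the embodiments of the present application is shown. The electronic device shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a touch screen, a keyboard, a mouse, a camera, and the like; an output section 807 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.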
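In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable medium may be any tangible medium that contains or stores a program, where the program can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that computer-readable medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.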
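The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including a sample acquisition unit and a first execution unit. The names of these units do not, in some cases, limit the units themselves; for example, the sample acquisition unit may also be described as "a unit that acquires a training sample set".
In another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a training sample set, wherein a training sample includes a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video, a sample label characterizes the content indicated by the sample object, the sample label sequence includes at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label; and select a training sample from the training sample set and perform the following training steps: inputting the sample video of the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model.
In addition, when the one or more programs are executed by the electronic device, they may also cause the electronic device to: acquire a to-be-recognized video containing an object; and input the to-be-recognized video into a video recognition model to generate a label sequence corresponding to the object in the to-be-recognized video, wherein a label characterizes the content indicated by the object, the label sequence includes at least two labels, and the labels in the label sequence are hierarchically related, the content corresponding to a lower-level label belonging to the content corresponding to a higher-level label. The video recognition model may be generated by the method for generating a model described in the above embodiments.
The above description covers only the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features; it should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.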

Claims (18)

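  1. A method for generating a model, comprising:
    acquiring a training sample set, wherein a training sample includes a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video, a sample label characterizes the content indicated by the sample object, the sample label sequence includes at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label;
    selecting a training sample from the training sample set, and performing the following training steps: inputting the sample video of the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model.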
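  2. The method according to claim 1, wherein said selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence comprises:
    for each candidate label sequence of the at least two candidate label sequences, determining the probability corresponding to that candidate label sequence based on the probabilities corresponding to the candidate labels in it;
    selecting the candidate label sequence with the highest probability as the actual label sequence.
  3. The method according to claim 2, wherein said determining the probability corresponding to the candidate label sequence based on the probabilities corresponding to the candidate labels in it comprises:
    multiplying together the probabilities corresponding to the candidate labels in the candidate label sequence to obtain a product;
    determining the product as the probability corresponding to the candidate label sequence.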
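  4. The method according to claim 1, wherein said determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete comprises:
    for each actual label in the actual label sequence, determining a loss value of that actual label relative to its corresponding sample label in the sample label sequence;
    determining, based on the determined loss values, whether training of the initial model is complete.
  5. The method according to claim 4, wherein said determining, based on the determined loss values, whether training of the initial model is complete comprises:
    for each actual label in the actual label sequence, determining the level of that actual label, and determining whether the loss value of that actual label is less than or equal to a loss threshold preset for the determined level;
    in response to determining that the loss value of every actual label in the actual label sequence is less than or equal to its corresponding loss threshold, determining that training of the initial model is complete.
  6. The method according to claim 4, wherein said determining, based on the determined loss values, whether training of the initial model is complete comprises:
    determining the level of each actual label in the actual label sequence;
    obtaining weights preset for labels of different levels, and computing a weighted sum of the determined loss values based on the obtained weights;
    determining the weighted sum as the total loss value of the actual label sequence relative to the sample label sequence, and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determining that training of the initial model is complete.
  7. The method according to any one of claims 1 to 6, wherein the method further comprises:
    in response to determining that training of the initial model is not complete, adjusting relevant parameters of the initial model, selecting a training sample from the training samples that have not yet been selected, and continuing to perform the training steps using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.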
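  8. An apparatus for generating a model, comprising:
    a sample acquisition unit configured to acquire a training sample set, wherein a training sample includes a sample video containing a sample object and a sample label sequence predetermined for the sample object in the sample video, a sample label characterizes the content indicated by the sample object, the sample label sequence includes at least two sample labels, and the sample labels in the sample label sequence are hierarchically related, the content corresponding to a lower-level sample label belonging to the content corresponding to a higher-level sample label;
    a first execution unit configured to select a training sample from the training sample set and perform the following training steps: inputting the sample video of the selected training sample into an initial model to obtain at least two candidate label sequences corresponding to the sample object in the sample video; selecting a candidate label sequence from the at least two candidate label sequences as an actual label sequence; determining, based on the actual label sequence and the sample label sequence of the selected training sample, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, using the trained initial model as a video recognition model.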
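  9. The apparatus according to claim 8, wherein the first execution unit comprises:
    a probability determination module configured to determine, for each candidate label sequence of the at least two candidate label sequences, the probability corresponding to that candidate label sequence based on the probabilities corresponding to the candidate labels in it;
    a sequence selection module configured to select the candidate label sequence with the highest probability as the actual label sequence.
  10. The apparatus according to claim 9, wherein the probability determination module is further configured to:
    multiply together the probabilities corresponding to the candidate labels in the candidate label sequence to obtain a product;
    determine the product as the probability corresponding to the candidate label sequence.
  11. The apparatus according to claim 8, wherein the first execution unit comprises:
    a loss determination module configured to determine, for each actual label in the actual label sequence, a loss value of that actual label relative to its corresponding sample label in the sample label sequence;
    a model determination module configured to determine, based on the determined loss values, whether training of the initial model is complete.
  12. The apparatus according to claim 11, wherein the model determination module is further configured to:
    for each actual label in the actual label sequence, determine the level of that actual label, and determine whether the loss value of that actual label is less than or equal to a loss threshold preset for the determined level;
    in response to determining that the loss value of every actual label in the actual label sequence is less than or equal to its corresponding loss threshold, determine that training of the initial model is complete.
  13. The apparatus according to claim 11, wherein the model determination module is further configured to:
    determine the level of each actual label in the actual label sequence;
    obtain weights preset for labels of different levels, and compute a weighted sum of the determined loss values based on the obtained weights;
    determine the weighted sum as the total loss value of the actual label sequence relative to the sample label sequence, and, in response to determining that the total loss value is less than or equal to a preset total loss threshold, determine that training of the initial model is complete.
  14. The apparatus according to any one of claims 8 to 13, wherein the apparatus further comprises:
    a second execution unit configured to, in response to determining that training of the initial model is not complete, adjust relevant parameters of the initial model, select a training sample from the training samples that have not yet been selected, and continue to perform the training steps using the most recently adjusted initial model as the initial model and the most recently selected training sample as the selected training sample.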
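  15. A method for recognizing a video, comprising:
    acquiring a to-be-recognized video containing an object;
    inputting the to-be-recognized video into a video recognition model generated by the method according to any one of claims 1 to 7, to generate a label sequence corresponding to the object in the to-be-recognized video, wherein a label characterizes the content indicated by the object, the label sequence includes at least two labels, and the labels in the label sequence are hierarchically related, the content corresponding to a lower-level label belonging to the content corresponding to a higher-level label.
  16. An apparatus for recognizing a video, comprising:
    a video acquisition unit configured to acquire a to-be-recognized video containing an object;
    a sequence generation unit configured to input the to-be-recognized video into a video recognition model generated by the method according to any one of claims 1 to 7, to generate a label sequence corresponding to the object in the to-be-recognized video, wherein a label characterizes the content indicated by the object, the label sequence includes at least two labels, and the labels in the label sequence are hierarchically related, the content corresponding to a lower-level label belonging to the content corresponding to a higher-level label.
  17. An electronic device, comprising:
    one or more processors;
    a storage device for storing one or more programs;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 7 and 15.
  18. A computer-readable medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7 and 15.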
PCT/CN2018/116175 2018-06-27 2018-11-19 用于生成模型的方法和装置 WO2020000876A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810679114.1 2018-06-27
CN201810679114.1A CN108960316B (zh) 2018-06-27 2018-06-27 用于生成模型的方法和装置

Publications (1)

Publication Number Publication Date
WO2020000876A1 true WO2020000876A1 (zh) 2020-01-02

Family

ID=64487219

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116175 WO2020000876A1 (zh) 2018-06-27 2018-11-19 用于生成模型的方法和装置

Country Status (2)

Country Link
CN (1) CN108960316B (zh)
WO (1) WO2020000876A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509562A (zh) * 2020-11-09 2021-03-16 北京有竹居网络技术有限公司 用于文本后处理的方法、装置、电子设备和介质
CN112541705A (zh) * 2020-12-23 2021-03-23 北京百度网讯科技有限公司 生成用户行为评估模型的方法、装置、设备以及存储介质
CN112712003A (zh) * 2020-12-25 2021-04-27 华南理工大学 一种用于骨骼动作序列识别的联合标签数据增强方法
CN112989023A (zh) * 2021-03-25 2021-06-18 北京百度网讯科技有限公司 标签推荐方法、装置、设备、存储介质及计算机程序产品
CN113744708A (zh) * 2021-09-07 2021-12-03 腾讯音乐娱乐科技(深圳)有限公司 模型训练方法、音频评价方法、设备及可读存储介质
CN115062709A (zh) * 2022-06-21 2022-09-16 腾讯科技(深圳)有限公司 模型优化方法、装置、设备、存储介质及程序产品

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476258B (zh) * 2019-01-24 2024-01-05 杭州海康威视数字技术股份有限公司 一种基于注意力机制的特征提取方法、装置及电子设备
CN110598869B (zh) * 2019-08-27 2024-01-19 创新先进技术有限公司 基于序列模型的分类方法、装置、电子设备
CN111311309A (zh) * 2020-01-19 2020-06-19 百度在线网络技术(北京)有限公司 用户满意度确定方法、装置、设备和介质
CN111352965B (zh) * 2020-02-18 2023-09-08 腾讯科技(深圳)有限公司 序列挖掘模型的训练方法、序列数据的处理方法及设备
CN111507089B (zh) * 2020-06-09 2022-09-09 平安科技(深圳)有限公司 基于深度学习模型的文献分类方法、装置和计算机设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542024A (zh) * 2011-12-21 2012-07-04 电子科技大学 一种视频资源语义标签的标定方法
CN105677735A (zh) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 一种视频搜索方法及装置
US20160188592A1 (en) * 2014-12-24 2016-06-30 Facebook, Inc. Tag prediction for images or video content items
CN105913072A (zh) * 2016-03-31 2016-08-31 乐视控股(北京)有限公司 视频分类模型的训练方法和视频分类方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233668A1 (en) * 2006-04-03 2007-10-04 International Business Machines Corporation Method, system, and computer program product for semantic annotation of data in a software system
US8316302B2 (en) * 2007-05-11 2012-11-20 General Instrument Corporation Method and apparatus for annotating video content with metadata generated using speech recognition technology
EP2557524A1 (en) * 2011-08-09 2013-02-13 Teclis Engineering, S.L. Method for automatic tagging of images in Internet social networks
CN106326462B (zh) * 2016-08-30 2019-08-09 北京奇艺世纪科技有限公司 一种视频索引分级方法及装置
CN107766940B (zh) * 2017-11-20 2021-07-23 北京百度网讯科技有限公司 用于生成模型的方法和装置
CN107832305A (zh) * 2017-11-28 2018-03-23 百度在线网络技术(北京)有限公司 用于生成信息的方法和装置

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509562A (zh) * 2020-11-09 2021-03-16 北京有竹居网络技术有限公司 用于文本后处理的方法、装置、电子设备和介质
CN112509562B (zh) * 2020-11-09 2024-03-22 北京有竹居网络技术有限公司 用于文本后处理的方法、装置、电子设备和介质
CN112541705A (zh) * 2020-12-23 2021-03-23 北京百度网讯科技有限公司 生成用户行为评估模型的方法、装置、设备以及存储介质
CN112541705B (zh) * 2020-12-23 2024-01-23 北京百度网讯科技有限公司 生成用户行为评估模型的方法、装置、设备以及存储介质
CN112712003A (zh) * 2020-12-25 2021-04-27 华南理工大学 一种用于骨骼动作序列识别的联合标签数据增强方法
CN112712003B (zh) * 2020-12-25 2022-07-26 华南理工大学 一种用于骨骼动作序列识别的联合标签数据增强方法
CN112989023A (zh) * 2021-03-25 2021-06-18 北京百度网讯科技有限公司 标签推荐方法、装置、设备、存储介质及计算机程序产品
CN112989023B (zh) * 2021-03-25 2023-07-28 北京百度网讯科技有限公司 标签推荐方法、装置、设备、存储介质及计算机程序产品
CN113744708A (zh) * 2021-09-07 2021-12-03 腾讯音乐娱乐科技(深圳)有限公司 模型训练方法、音频评价方法、设备及可读存储介质
CN113744708B (zh) * 2021-09-07 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 模型训练方法、音频评价方法、设备及可读存储介质
CN115062709A (zh) * 2022-06-21 2022-09-16 腾讯科技(深圳)有限公司 模型优化方法、装置、设备、存储介质及程序产品

Also Published As

Publication number Publication date
CN108960316A (zh) 2018-12-07
CN108960316B (zh) 2020-10-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18924123

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 17.02.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18924123

Country of ref document: EP

Kind code of ref document: A1