WO2019223361A1 - Video analysis method and apparatus


Info

Publication number
WO2019223361A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
analyzed
image
target identifier
preset
Prior art date
Application number
PCT/CN2019/073661
Other languages
French (fr)
Chinese (zh)
Inventor
戴威 (Dai Wei)
Original Assignee
北京国双科技有限公司 (Beijing Gridsum Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京国双科技有限公司 (Beijing Gridsum Technology Co., Ltd.)
Publication of WO2019223361A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Definitions

  • the present application relates to the field of video processing, and in particular, to a video analysis method and device.
  • program title sponsorship has become an effective channel for advertisers to promote corporate brands.
  • advertisers embed corporate brand advertisements in TV programs so that viewers notice the embedded advertisements while watching the programs, thereby publicizing the corporate brand.
  • exposure data, such as whether a corporate brand is exposed in a television program, the position of the exposure, and the duration of the exposure, affect the effectiveness of the brand's promotion. Therefore, the exposure data of the corporate brand in TV programs needs to be analyzed, either to find a publicity approach that achieves a better promotional effect for the advertiser's corporate brand, or to analyze the exposure data of competitors' corporate brands.
  • in view of the above problems, the present invention provides a video analysis method and device that overcome the above problems or at least partially solve them.
  • a video analysis method includes:
  • the preset condition includes: the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap, to obtain a detection result;
  • the identifying the target identifier in the video to be analyzed includes:
  • for any one frame of image in the video to be analyzed, the preset model identifies the target identifier in that frame of image according to the following steps:
  • by fully connecting the region sets of the at least two scales, the target identifier in that frame of image is identified.
  • the preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
  • extracting the multi-scale features of the arbitrary frame of images to obtain a multi-scale feature image set includes:
  • the generating candidate regions based on the multi-scale feature image set includes:
  • the multi-scale feature image set is input to the candidate region generation network, and the candidate region is generated by the candidate region generation network.
  • the preset model is trained in the following manner to obtain the trained preset model:
  • the training set includes: a plurality of frames of images to which the target identifier is marked;
  • the corrected image is: an image that has been manually corrected for the incorrect annotation
  • the preset conditions further include:
  • the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage; among the target identifiers whose placeholders at least partially overlap, the total number of target identifiers whose sharpness is greater than a preset sharpness threshold is greater than a preset total number.
  • determining, according to the detection result, the exposure data of the target identifier in the video to be analyzed includes:
  • if the detection result is that the target identifier meets the preset condition, determining that the target identifier is exposed in the video to be analyzed and further determining an exposure parameter, wherein the exposure parameter includes at least one of the following: exposure duration, exposure position;
  • if the detection result is that the target identifier does not meet the preset condition, determining that the target identifier is not exposed in the video to be analyzed.
  • a video analysis device includes:
  • a first identification unit configured to identify a target identifier in the video to be analyzed
  • a detection unit configured to detect whether the identified target identifier meets a preset condition, the preset condition including: the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap, to obtain a detection result;
  • a determining unit configured to determine, according to the detection result, the exposure data of the target identifier in the video to be analyzed.
  • the first identification unit includes:
  • a first input subunit configured to input each frame image in the video to be analyzed into a preset model after training, so that the trained preset model identifies a target identifier in each frame of the video to be analyzed ;
  • the preset model includes:
  • a first extraction unit configured to extract multi-scale features of the arbitrary one-frame image to obtain a multi-scale feature image set
  • a generating unit configured to generate a candidate region based on the multi-scale feature image set
  • a selection unit configured to select a feature image set of at least two scales from the multi-scale feature image set
  • a second extraction unit configured to respectively extract region sets corresponding to the candidate region from the feature image sets of the at least two scales, to obtain region sets of at least two scales corresponding to the feature image sets of the at least two scales;
  • the second recognition unit recognizes the target identifier in the arbitrary one-frame image by fully connecting the region sets of at least two scales.
  • the preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
  • the first extraction unit is specifically configured to extract the multi-scale features of the arbitrary one-frame image by using the underlying feature extraction module to obtain the multi-scale feature image set;
  • the generating unit is specifically configured to input the multi-scale feature image set into the candidate region generating network, and generate the candidate region through the candidate region generating network.
  • the training unit is configured to train the preset model to obtain the trained preset model
  • the training unit includes:
  • a first acquisition subunit configured to acquire a training set, where the training set includes: multiple frames of images to which the target identifier has been labeled;
  • a first training subunit configured to train the preset model by using the multi-frame image to obtain a first preset model
  • a second input subunit configured to input an image in the video to be analyzed into the first preset model
  • a second acquisition subunit configured to acquire an image labeled with the target identifier through the first preset model in the video to be analyzed; the image labeled with the target identifier has an incorrect label;
  • a third acquisition subunit configured to acquire a corrected image;
  • the corrected image is: an image that has been manually corrected for the incorrect annotation;
  • a second training subunit is configured to use the modified image to train the first preset model to obtain the trained preset model.
  • the detection unit is further configured to detect whether the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage, and whether, among those target identifiers, the total number whose sharpness is greater than a preset sharpness threshold is greater than a preset total number.
  • the determining unit includes:
  • a first determining subunit configured to, if the detection result is that the target identifier meets the preset condition, determine that the target identifier is exposed in the video to be analyzed and further determine an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration, exposure position;
  • a second determining subunit is configured to determine that the target identifier is not exposed in the video to be analyzed if the detection result is that the target identifier does not meet the preset condition.
  • a storage medium stores a program, and when the program is executed by a processor, the video analysis method according to any one of the foregoing is implemented.
  • a processor is configured to run a program, and when the program runs, the video analysis method according to any one of the foregoing is performed.
  • the technical solution provided by the present invention identifies the target identifier in the video to be analyzed and detects whether the identified target identifier meets a preset condition; the characteristics of a target identifier that meets the preset condition specifically match the characteristics the target identifier has when it is exposed in the video. In this embodiment, a detection result is obtained by detecting whether the identified target identifier satisfies the preset condition, and the detection result is either that the identified target identifier satisfies the preset condition or that it does not. Therefore, whether the target identifier is exposed can be determined from the detection result: when the target identifier is exposed, the identified target identifier meets the preset condition, and the preset condition requires that the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap.
  • the placeholders of the at least partially overlapping target identifiers reflect the position of the exposed target identifier. Further, according to the position of the exposed target identifier, the playback duration of the exposure at that position in the video to be analyzed can be determined. Accordingly, in the embodiments of the present application, the exposure data of the target identifier in the video to be analyzed can be determined according to the detection result, which saves human resources.
  • FIG. 1 shows a flowchart of an embodiment of a model training method in the present application
  • FIG. 2 shows a schematic diagram of labeling each BMW brand logo in an image by using a box in the present application
  • FIG. 3 shows a flowchart of an embodiment of a method for analyzing target identification in a video in the present application
  • FIG. 4 is a schematic diagram showing a distribution of target identifiers identified in an image included in an image set in the present application
  • FIG. 5 is a schematic structural diagram of an embodiment of an analysis apparatus for target identification in a video in the present application.
  • a model for target recognition is set, and specifically, it can be applied to a scene based on target recognition, for example, image classification and image segmentation.
  • the model architecture can be a Faster-RCNN architecture.
  • the ResNet model is used as the underlying feature extraction model, and the RPN network is used as the candidate region generation network.
  • the ResNet model includes five parts, which are part 1, part 2, part 3, part 4 and part 5, each of which includes a pooling layer and a convolution layer.
  • the processing flow of the model is improved. Specifically, taking the model for image recognition as an example, the specific improvement of the processing flow of the model in this embodiment is introduced.
  • the image to be processed is input into the model, and the convolutional layers in different parts of the model output information of the image to be processed at different scales (different scales of the image can be understood as different resolutions). For example, when the size of the image to be processed is M * M, the convolutional layer of part 1 outputs a first feature image set of size M * M, the convolutional layer of part 2 outputs a second feature image set of size M * M, the convolutional layer of part 3 outputs a third feature image set of size M/2 * M/2, the convolutional layer of part 4 outputs a fourth feature image set of size M/4 * M/4, and the convolutional layer of part 5 outputs a fifth feature image set of size M/8 * M/8.
  • the first feature image set, the second feature image set, the third feature image set, the fourth feature image set, and the fifth feature image set are all composed of multiple layers of images, and the number of image layers in each feature image set is the same as the number of convolution kernels in the convolutional layer corresponding to that feature image set (a code sketch of this multi-scale extraction follows below).
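  • As a concrete illustration of the multi-scale extraction above, the following is a minimal sketch assuming PyTorch/torchvision (the patent names no framework); torchvision's ResNet stage names 'layer1' through 'layer4' stand in for the 'part 1' to 'part 5' terminology used here, and the 512 * 512 input size is illustrative.

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

backbone = resnet50(weights=None)
# Tap the outputs of several stages to obtain feature image sets at
# different scales; deeper stages give coarser, higher-level features.
extractor = create_feature_extractor(
    backbone,
    return_nodes={"layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"},
)

frame = torch.randn(1, 3, 512, 512)  # one frame of the video to be analyzed
features = extractor(frame)
for name, fmap in features.items():
    # c2: (1, 256, 128, 128) ... c5: (1, 2048, 16, 16) for a 512x512 input
    print(name, tuple(fmap.shape))
```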
  • next, the feature image sets of different scales are input into an RPN network, which generates candidate regions; then, at least two feature image sets are selected from the five feature image sets, the region sets corresponding to the candidate regions are extracted from the selected feature image sets respectively, the extracted region sets are unified to a preset length and width, and the region sets unified to the preset size are spliced along the layer dimension.
  • for example, suppose the first feature image set is 128 * 128 * 3, where 3 is the number of layers of first feature images included in the set and 128 * 128 is the size of any one first feature image; the second feature image set is 64 * 64 * 6; the third feature image set is 32 * 32 * 4; the fourth feature image set is 16 * 16 * 2; and the fifth feature image set is 4 * 4 * 3. The meanings of the parameters of the second through fifth feature image sets are the same as those of the first feature image set and are not repeated here.
  • data sampling is performed so that the images included in the selected at least two feature image sets are unified in size, for example to 7 * 7.
  • after the selected at least two feature image sets are unified in size, they are superimposed along the layer dimension. Specifically, suppose the selected feature image sets are the third feature image set and the fifth feature image set, and the sizes of the images included in the two sets are unified to 7 * 7; the two 7 * 7 feature image sets are then superimposed along the layer dimension, and the superimposed feature image set is 7 * 7 * 7 (the 4 layers of the third set plus the 3 layers of the fifth set), as sketched below.
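  • The unify-and-splice step can be sketched as follows, again assuming PyTorch/torchvision; torchvision's roi_align is used here as one possible way to extract and resize the region sets, and the tensor sizes follow the third (32 * 32 * 4) and fifth (4 * 4 * 3) feature image sets of the example above.

```python
import torch
from torchvision.ops import roi_align

feat3 = torch.randn(1, 4, 32, 32)   # third feature image set: 4 layers of 32x32
feat5 = torch.randn(1, 3, 4, 4)     # fifth feature image set: 3 layers of 4x4
# One candidate region in original-image coordinates (batch_idx, x1, y1, x2, y2),
# assuming the original frame is 128x128 as in the first feature image set.
rois = torch.tensor([[0, 16.0, 16.0, 80.0, 80.0]])

# Unify both region sets to the preset 7x7 size; spatial_scale maps image
# coordinates onto each feature map's resolution.
r3 = roi_align(feat3, rois, output_size=7, spatial_scale=32 / 128)
r5 = roi_align(feat5, rois, output_size=7, spatial_scale=4 / 128)

# Splice the unified region sets along the layer (channel) dimension:
# 4 layers + 3 layers stack into a 7x7x7 set that feeds the fully connected layers.
stacked = torch.cat([r3, r5], dim=1)
print(tuple(stacked.shape))  # (1, 7, 7, 7)
```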
  • the model in this embodiment uses the ResNet model as the underlying feature extraction model, uses the RPN network as the candidate region generation network, and uses an improved processing flow to process the input to-be-processed image.
  • based on the candidate regions generated by the RPN, the model in this embodiment extracts the region sets corresponding to the candidate regions from at least two feature image sets respectively, obtaining at least two region sets. Since the at least two region sets come from different feature image sets, and different feature image sets reflect information of the image to be processed at different scales, the model in this embodiment fully connects information of at least two scales of the image to be processed, and therefore recognizes information of the image to be processed at different scales.
  • by contrast, a model with the standard Faster-RCNN architecture fully connects only the region set extracted from the image to be processed at a single scale, so such a model identifies information at only one scale of the processed image.
  • target identifiers of different sizes may exist in the image to be processed, and the characteristics of target identifiers of different sizes are reflected on feature image sets of different scales.
  • the model in this embodiment can identify information in feature image sets of different scales. Therefore, the model in this embodiment can identify information reflected on feature image sets of different scales. Furthermore, compared with a model using a standard Faster-RCNN architecture, the model in this embodiment has a higher recognition accuracy rate for identifying target identifiers of different sizes.
  • FIG. 1 shows a flowchart of an embodiment of a model training method in the present application.
  • this method embodiment may include:
  • Step 101 Obtain a training set.
  • the training process of the model is introduced below, taking a model used for image recognition as an example.
  • the specific image recognition scene is: identifying whether the BMW brand logo exists in the image.
  • a training set for training the model is obtained, where the training set includes a large number of images marked with the BMW brand logo.
  • the large number of images used to compose the training set can be obtained by searching for images containing the BMW brand logo on search platforms such as Baidu or Google, or on other material websites; images containing the BMW brand logo can also be captured from videos, such as live shows, using screen capture software.
  • of course, other methods can also be used to obtain a large number of images containing the BMW brand logo; this step merely provides two ways of obtaining such images and does not restrict the specific method of obtaining them.
  • for each of the acquired images, the BMW brand logo in the image is labeled. Specifically, as shown in FIG. 2, each BMW brand logo in the image is labeled with a box.
  • Step 102 Train the model using the acquired training set to obtain a first model.
  • the model is trained using the large number of images in the acquired training set. Specifically, images labeled with the BMW brand logo are input into the model, the model uses the improved processing flow to identify and label the BMW brand logo in each input image, the labels in the training set are used as the benchmark, and the parameters in the model are automatically adjusted over multiple iterations; when a certain standard is reached, the first model is obtained (a training sketch follows below).
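  • A minimal training sketch, assuming PyTorch/torchvision and using torchvision's off-the-shelf Faster R-CNN as a stand-in for the improved model described here; the tiny inline dataset, learning rate, and fixed step count are illustrative only.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Two classes: background plus the target identifier (the BMW brand logo).
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# A tiny illustrative "training set": one image with one labeled logo box.
images = [torch.rand(3, 256, 256)]
targets = [{"boxes": torch.tensor([[30.0, 30.0, 90.0, 90.0]]),
            "labels": torch.tensor([1])}]

model.train()
for step in range(100):                 # "a certain standard" simplified to a step count
    loss_dict = model(images, targets)  # detection losses (RPN + box head)
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```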
  • Step 103 Input a preset number of frames of images to be identified into the first model.
  • a preset number of frames of images to be identified are input into the first model, and for each input frame, the first model recognizes and labels the BMW brand logo contained in that frame.
  • Step 104 Obtain a preset number of frame images that the first model separately recognizes and labels the target identifier.
  • a preset number of frames of images identified by the first model and labeled with the target identifier are obtained.
  • in practical applications, misidentification occurs, and the labeled target identifier may then be wrong. Therefore, among the preset number of frames identified and labeled with the target identifier by the first model, there are labeled symbols that are not the target identifier; for convenience of description, this embodiment collectively refers to such labeled non-target symbols as error symbols.
  • Step 105 Obtain a preset number of frame images with artificially corrected error symbols.
  • Step 106 Input the corrected preset number of frame images into the first model, and train the first model to obtain a trained model.
  • the corrected preset number of frame images are input to a first model, and the first model is further trained.
  • the process of training the first model in this step is the same as the idea of training the model in step 102.
  • the models obtained after training the first model are collectively referred to as a trained model.
  • the first model is obtained after training the model with the training set. Since the images of the training set are collected from search platforms, after training on them the model only learns the target identifiers as they appear in that training set. In practical applications, the images to be identified may contain similar identifiers that resemble the target identifier. To allow the model to better distinguish the target identifier from similar identifiers, in this embodiment a preset number of frames of images to be identified are input into the first model, error symbols exist among the symbols the first model outputs for labeling the target identifier, the error symbols are manually corrected, and the corrected preset number of frame images are used to train the first model again, obtaining the trained model. Compared with the first model, the trained model has improved recognition accuracy for the target identifier in images to be identified. Therefore, the training method of this embodiment can further improve the model's accuracy in identifying the target identifier in images to be identified.
  • after the trained model is obtained, it is applied in this embodiment to a scenario of analyzing the implantation of target identifiers in a video.
  • FIG. 3 shows a flowchart of an embodiment of a method for analyzing a target identifier in a video in the present application.
  • the method embodiment may include:
  • Step 301 Obtain a video to be analyzed.
  • the video to be analyzed obtained in this step may be an encoded video to be analyzed.
  • Step 302 Decode the obtained video to be analyzed to obtain a decoded video to be analyzed.
  • Step 303 For the decoded video to be analyzed, divide the video into multiple image sets according to the order of the video frames, taking each first preset number of frames as one image set.
  • in practical applications, a target logo embedded in a video is generally played continuously for two to three seconds, where the target logo represents a preset type of logo. For example, if the BMW brand logo in the video needs to be analyzed, the BMW brand logo is the target logo.
  • in the decoded video, about 5 frames of images are played per second, so the images embedded with the target identifier generally appear in 10 to 15 consecutive frames. Therefore, in order to analyze the implantation of the target identifier in the video to be analyzed more accurately, in this step the decoded video is divided, in the order of the video frames, into image sets of a first preset number of frames each, where the preset number can be any number from 5 to 7; the decoded video to be analyzed is thus divided into multiple image sets (a grouping sketch follows below).
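  • The grouping described in this step can be sketched as follows, assuming OpenCV for decoding; the video file name and the 5-frame set size are illustrative of the "first preset number" described above.

```python
import cv2

def split_into_image_sets(video_path, frames_per_set=5):
    """Decode a video and group its frames, in playback order, into image sets."""
    capture = cv2.VideoCapture(video_path)
    image_sets, current = [], []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        current.append(frame)
        if len(current) == frames_per_set:
            image_sets.append(current)
            current = []
    if current:                      # keep the trailing partial set, if any
        image_sets.append(current)
    capture.release()
    return image_sets

sets = split_into_image_sets("program.mp4", frames_per_set=5)  # hypothetical file
```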
  • Step 304 Input the images in each image set into the trained model separately, so that the trained model recognizes the target identifiers in the images contained in each image set.
  • the images in each image set are respectively input into the trained model, and the trained model identifies the target identifier in each frame of image. In practical applications, after the trained model identifies the target logo in the video to be analyzed, the identified target logo is labeled; for example, when the trained model recognizes a BMW brand logo, a box can be used to frame the identified BMW brand logo, and an image in which the identified BMW brand logo is framed by a box is output.
  • Step 305 Obtain an image set labeled with a target identifier corresponding to each image set and output by the trained model.
  • for each of the divided image sets, the corresponding image set labeled with the target identifier and output by the trained model is obtained, yielding multiple recognized image sets.
  • Step 306 Detect whether the target identifier marked in each image set meets a preset condition.
  • after the multiple image sets labeled with target identifiers are obtained, this step detects, for each image set, whether the target identifiers labeled in that image set satisfy a preset condition.
  • taking any one image set as an example, how to detect whether the target identifiers labeled in that image set satisfy the preset condition is introduced below.
  • the preset condition may include that the placeholders of the target identifiers distributed in at least two adjacent images at least partially overlap.
  • the placeholder of a target identifier refers to the spatial area occupied by the target identifier in a reference coordinate system.
  • the following uses a specific scenario as an example to introduce whether the identified target identifiers in the image collection meet a preset condition.
  • the specific scene is: the image set includes 5 frames of images, namely the first frame, the second frame, the third frame, the fourth frame, and the fifth frame, and the target logo is the BMW brand logo;
  • the position distribution of the identified target identifiers in the second frame image, the third frame image, the fourth frame image, and the fifth frame image is shown in FIG. 4.
  • two BMW brand logos are identified in the first frame of image, one distributed at the upper left corner of the image and the other at the lower right corner; two BMW brand logos are likewise identified in the second frame of image.
  • the preset condition is that "the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap". In this scene, the target identifiers distributed in at least two adjacent frames of images are the five BMW brand logos in the first frame image, the second frame image, and the third frame image. It is then determined whether the placeholders of these target logos at least partially overlap: the placeholders of the three BMW brand logos in the lower right corner of the first frame image, the second frame image, and the third frame image overlap. Therefore, the BMW brand identity identified in this image set meets the preset condition.
  • two possible detection results can be obtained: one is that the identified target identifiers in the image set meet the preset condition, and the other is that the identified target identifiers do not meet the preset condition.
  • the preset condition may further include: the overlap ratio between target identifiers whose placeholders at least partially overlap is greater than a preset percentage; and, among the target identifiers whose placeholders at least partially overlap, the total number of target identifiers whose sharpness is greater than a preset sharpness threshold is greater than a preset total number (a sketch of this check follows below).
  • the value range of the preset percentage may be not less than 50%, and the value range of the preset total number may not be less than 5.
  • this embodiment only provides a preferred value range of the preset percentage and the preset total number.
  • the preset percentage and the preset total number can also be determined based on actual conditions. This embodiment does not limit the specific values of the preset percentage and the preset total number.
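  • One way to realize the check described above is sketched below, assuming Python with OpenCV; the (x1, y1, x2, y2) box format, the intersection-over-union overlap measure, and the variance-of-Laplacian sharpness proxy are implementation choices not fixed by the text, and the check is simplified to a single pair of adjacent frames.

```python
import cv2

def overlap_ratio(a, b):
    """Intersection-over-union of two placeholders given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union) if union else 0.0

def sharpness(patch):
    """Variance of the Laplacian of an identifier patch: a common sharpness proxy."""
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def meets_preset_condition(boxes_a, boxes_b, patches,
                           min_ratio=0.5, min_sharp=100.0, min_count=5):
    """boxes_a, boxes_b: identifier placeholders in two adjacent frames;
    patches: cropped identifier regions from the image set."""
    overlaps = any(overlap_ratio(a, b) > min_ratio
                   for a in boxes_a for b in boxes_b)
    sharp_total = sum(1 for p in patches if sharpness(p) > min_sharp)
    return overlaps and sharp_total > min_count
```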
  • Step 307 Determine the exposure data of the target identifier in the video to be analyzed according to the detection result.
  • the exposure data of the target identifier in the video to be analyzed is determined.
  • the exposure data includes: whether the target identifier is exposed, the exposure position, and the exposure duration. Specifically, in this step, if the detection result is that the identified target identifiers in the image set meet the preset condition, it indicates that the target identifier is exposed in that image set; the spatial position occupied by the at least partially overlapping target identifiers is determined as the exposure position of the target identifier; and, based on the exposure position, the number of frames of consecutive images in which the target identifier exists at that exposure position in the video to be analyzed is counted, and the playing duration of the target identifier is determined from that number of frames.
  • if the target identifier has multiple exposure positions in the video to be analyzed, the playing duration for each exposure position is determined separately, and the sum of the playing durations corresponding to all the exposure positions is taken as the total playing duration of the target identifier, as sketched below.
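  • The duration computation can be sketched as follows, assuming a fixed decoded frame rate; the 5 fps value matches the rate mentioned in step 303, and the frame indices are illustrative.

```python
def exposure_duration(frame_indices, fps=5.0):
    """Playing time, in seconds, of the identifier at one exposure position,
    counting the frames in which it appears there."""
    return len(set(frame_indices)) / fps

# e.g. the identifier appears at one position in frames 10-14 and 30-32:
print(exposure_duration([10, 11, 12, 13, 14, 30, 31, 32]))  # 8 frames -> 1.6 s

# Summing over all exposure positions gives the identifier's total playing time.
positions = {"lower_right": [10, 11, 12, 13, 14], "upper_left": [40, 41, 42]}
total = sum(exposure_duration(f) for f in positions.values())
print(total)  # 1.0 s + 0.6 s = 1.6 s
```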
  • if the detection result is that the identified target identifier in the image set does not meet the preset condition, it indicates that the target identifier is not exposed in that image set; and if the target identifier is not exposed in any image set, it indicates that the target identifier is not exposed in the video to be analyzed, in which case there is no exposure position and no exposure duration.
  • in this embodiment, the target identifier in the video to be analyzed is identified, and whether the identified target identifier satisfies a preset condition is detected; the characteristics of a target identifier satisfying the preset condition specifically match the characteristics the target identifier has when it is exposed in the video. A detection result is obtained by detecting whether the identified target identifier satisfies the preset condition, and the detection result is either that the identified target identifier meets the preset condition or that it does not.
  • therefore, in this embodiment, whether the target logo is exposed can be determined according to the detection result; when the target logo is exposed, the identified target logo meets the preset condition.
  • the preset condition requires that the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap.
  • the placeholders of the at least partially overlapping target identifiers reflect the position of the exposed target identifier; further, based on the position of the exposed target identifier, the playing duration of the exposure at that position in the video to be analyzed can be determined. Therefore, in the embodiments of the present application, the exposure data of the target identifier in the video to be analyzed can be determined according to the detection result.
  • the apparatus embodiment may include:
  • An obtaining unit 501 configured to obtain a video to be analyzed
  • a first identification unit 502 configured to identify a target identifier in the video to be analyzed
  • a detection unit 503 is configured to detect whether the identified target identifier meets a preset condition, where the preset condition includes: at least part of the placeholders of the target identifiers distributed in at least two frames of adjacent images overlap to obtain a detection result;
  • a determining unit 504 is configured to determine, according to the detection result, the exposure data of the target identifier in the video to be analyzed.
  • the first identification unit 502 may include:
  • a first input subunit configured to input each frame image in the video to be analyzed into a preset model after training, so that the trained preset model identifies a target identifier in each frame of the video to be analyzed ;
  • the preset model includes:
  • a first extraction unit configured to extract multi-scale features of the arbitrary one-frame image to obtain a multi-scale feature image set
  • a generating unit configured to generate a candidate region based on the multi-scale feature image set
  • a selection unit configured to select a feature image set of at least two scales from the multi-scale feature image set
  • a second extraction unit configured to respectively extract region sets corresponding to the candidate region from the feature image sets of the at least two scales, to obtain region sets of at least two scales corresponding to the feature image sets of the at least two scales;
  • the second recognition unit recognizes the target identifier in the arbitrary one-frame image by fully connecting the region sets of at least two scales.
  • the preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
  • the first extraction unit is specifically configured to extract the multi-scale features of the arbitrary frame of images through the underlying feature extraction module to obtain the multi-scale feature image set;
  • the generating unit is specifically configured to input the multi-scale feature image set into the candidate region generating network, and generate the candidate region through the candidate region generating network.
  • the device may further include: a training unit;
  • the training unit is configured to train the preset model to obtain the trained preset model
  • the training unit includes:
  • a first acquisition subunit configured to acquire a training set, where the training set includes: multiple frames of images to which the target identifier has been labeled;
  • a first training subunit configured to train the preset model by using the multi-frame image to obtain a first preset model
  • a second input subunit configured to input an image in the video to be analyzed into the first preset model
  • a second acquisition subunit configured to acquire an image labeled with the target identifier through the first preset model in the video to be analyzed; the image labeled with the target identifier has an incorrect label;
  • a third acquisition subunit configured to acquire a corrected image;
  • the corrected image is: an image that has been manually corrected for the incorrect annotation;
  • a second training subunit is configured to use the modified image to train the first preset model to obtain the trained preset model.
  • the detection unit 503 is further configured to detect whether the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage, and whether, among those target identifiers, the total number whose sharpness is greater than a preset sharpness threshold is greater than a preset total number.
  • the determining unit 504 may include:
  • a first determining subunit configured to, if the detection result is that the target identifier meets the preset condition, determine that the target identifier is exposed in the video to be analyzed and further determine an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration, exposure position;
  • a second determining subunit is configured to determine that the target identifier is not exposed in the video to be analyzed if the detection result is that the target identifier does not meet the preset condition.
  • the analysis device for the target identification in the video includes a processor and a memory.
  • the acquisition unit, the first identification unit, the detection unit, the determination unit, and the training unit are all stored in the memory as program units, and the processor executes the above program units stored in the memory to implement the corresponding functions.
  • the processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels can be set, and the exposure data of the target logo in the video is analyzed by adjusting the kernel parameters.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
  • An embodiment of the present invention provides a storage medium on which a program is stored, and the video analysis method is implemented when the program is executed by a processor.
  • An embodiment of the present invention provides a processor, where the processor is configured to run a program, and the video analysis method is executed when the program runs.
  • An embodiment of the present invention provides a device.
  • the device includes a processor, a memory, and a program stored on the memory and executable on the processor.
  • when the processor executes the program, the following steps are implemented:
  • for any one frame of image in the video to be analyzed, the preset model identifies the target identifier in that frame of image according to the following steps:
  • by fully connecting the region sets of the at least two scales, the target identifier in that frame of image is identified.
  • the preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
  • extracting the multi-scale features of the arbitrary frame of images to obtain a multi-scale feature image set includes:
  • the generating candidate regions based on the multi-scale feature image set includes:
  • the multi-scale feature image set is input to the candidate region generation network, and the candidate region is generated by the candidate region generation network.
  • the preset model is trained in the following manner to obtain the trained preset model:
  • the training set includes: a plurality of frames of images to which the target identifier is marked;
  • the corrected image is: an image that has been manually corrected for the incorrect annotation
  • the preset condition includes: the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap, to obtain a detection result;
  • the preset conditions may further include:
  • the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage; among the target identifiers whose placeholders at least partially overlap, the total number of target identifiers whose sharpness is greater than a preset sharpness threshold is greater than a preset total number.
  • the exposure data of the target identifier in the video to be analyzed is determined according to the detection result.
  • if the detection result is that the target identifier meets the preset condition, determining that the target identifier is exposed in the video to be analyzed and further determining an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration, exposure position;
  • if the detection result is that the target identifier does not meet the preset condition, determining that the target identifier is not exposed in the video to be analyzed.
  • the device herein may be a server, a PC, a PAD, a mobile phone, or the like.
  • This application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
  • for any one frame of image in the video to be analyzed, the preset model identifies the target identifier in that frame of image according to the following steps:
  • by fully connecting the region sets of the at least two scales, the target identifier in that frame of image is identified.
  • the preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
  • extracting the multi-scale features of the arbitrary frame of images to obtain a multi-scale feature image set includes:
  • the generating candidate regions based on the multi-scale feature image set includes:
  • the multi-scale feature image set is input to the candidate region generation network, and the candidate region is generated by the candidate region generation network.
  • the preset model is trained in the following manner to obtain the trained preset model:
  • the training set includes: a plurality of frames of images to which the target identifier is marked;
  • the corrected image is: an image that has been manually corrected for the incorrect annotation
  • the preset condition includes: the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap, to obtain a detection result;
  • the preset conditions further include:
  • the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage; among the target identifiers whose placeholders at least partially overlap, the total number of target identifiers whose sharpness is greater than a preset sharpness threshold is greater than a preset total number.
  • the exposure data of the target identifier in the video to be analyzed is determined according to the detection result.
  • if the detection result is that the target identifier meets the preset condition, determining that the target identifier is exposed in the video to be analyzed and further determining an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration, exposure position;
  • if the detection result is that the target identifier does not meet the preset condition, determining that the target identifier is not exposed in the video to be analyzed.
  • this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory, random access memory (RAM), and / or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include permanent and non-permanent, removable and non-removable media.
  • Information storage can be accomplished by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which may be used to store information that can be accessed by a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Abstract

Disclosed are a video analysis method and apparatus, the method comprising: acquiring a video to be analyzed; identifying target identifiers in the video to be analyzed; detecting whether the identified target identifiers meet a preset condition, wherein the preset condition comprises: the placeholders of target identifiers distributed in at least two adjacent frames of images at least partially overlapping, so as to obtain a detection result; and determining, according to the detection result, the exposure data, in the video to be analyzed, of the target identifiers meeting the preset condition. By means of the embodiments of the present application, the exposure data of the target identifiers in the video to be analyzed can be determined, and human resources can be saved.

Description

Video analysis method and device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 23, 2018, with application number 201810502120.X and the invention name "A Video Analysis Method and Device", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of video processing, and in particular, to a video analysis method and device.
Background Art
At present, program title sponsorship has become an effective channel for advertisers to promote corporate brands. Specifically, advertisers embed corporate brand advertisements in TV programs so that viewers notice the embedded advertisements while watching the programs, thereby publicizing the corporate brand. In practical applications, exposure data such as whether a corporate brand is exposed in a television program, the position of the exposure, and the duration of the exposure all affect the effectiveness of the brand's promotion. Therefore, the exposure data of the corporate brand in TV programs needs to be analyzed, either to find a publicity approach that achieves a better promotional effect for the advertiser's corporate brand, or to analyze the exposure data of competitors' corporate brands.
At present, professionals watch TV programs and analyze the exposure data, in those programs, of the target identifier representing the corporate brand to be analyzed.
However, having professionals analyze the exposure data of the target identifier in TV programs wastes human resources.
Summary of the Invention
In view of the above problems, the present invention is provided so as to provide a video analysis method and device that overcome the above problems or at least partially solve them.
A video analysis method includes:
acquiring a video to be analyzed;
identifying a target identifier in the video to be analyzed;
detecting whether the identified target identifier satisfies a preset condition, the preset condition including: the placeholders of the target identifiers distributed in at least two adjacent frames of images at least partially overlap, to obtain a detection result;
determining, according to the detection result, the exposure data of the target identifier in the video to be analyzed.
Wherein, the identifying the target identifier in the video to be analyzed includes:
inputting each frame of image in the video to be analyzed into a trained preset model, so that the trained preset model recognizes the target identifier in each frame of image in the video to be analyzed;
wherein, for any one frame of image in the video to be analyzed, the preset model identifies the target identifier in that frame of image according to the following steps:
extracting multi-scale features of the frame of image to obtain a multi-scale feature image set;
generating candidate regions based on the multi-scale feature image set;
selecting feature image sets of at least two scales from the multi-scale feature image set;
respectively extracting region sets corresponding to the candidate regions from the feature image sets of the at least two scales, to obtain region sets of at least two scales corresponding to the feature image sets of the at least two scales;
identifying the target identifier in the frame of image by fully connecting the region sets of the at least two scales.
Wherein, the preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
wherein, the extracting multi-scale features of the frame of image to obtain a multi-scale feature image set includes:
extracting the multi-scale features of the frame of image through the low-level feature extraction module to obtain the multi-scale feature image set;
the generating candidate regions based on the multi-scale feature image set includes:
inputting the multi-scale feature image set into the candidate region generation network, and generating the candidate regions through the candidate region generation network.
Wherein, the preset model is trained in the following manner to obtain the trained preset model:
acquiring a training set, the training set including: multiple frames of images in which the target identifier has been labeled;
training the preset model with the multiple frames of images to obtain a first preset model;
inputting images in the video to be analyzed into the first preset model;
acquiring images in the video to be analyzed in which the target identifier has been labeled by the first preset model, the images labeled with the target identifier containing incorrect labels;
acquiring corrected images, the corrected images being: images in which the incorrect labels have been manually corrected;
training the first preset model with the corrected images to obtain the trained preset model.
Wherein, the preset condition further includes:
the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage; among the target identifiers whose placeholders at least partially overlap, the total number of target identifiers whose sharpness is greater than a preset sharpness threshold is greater than a preset total number.
Wherein, determining, according to the detection result, the exposure data of the target identifier in the video to be analyzed includes:
if the detection result is that the target identifier meets the preset condition, determining that the target identifier is exposed in the video to be analyzed, and further determining an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration, exposure position;
if the detection result is that the target identifier does not meet the preset condition, determining that the target identifier is not exposed in the video to be analyzed.
一种视频分析装置,包括:A video analysis device includes:
获取单元,用于获取待分析视频;An acquisition unit for acquiring a video to be analyzed;
第一识别单元,用于识别所述待分析视频中的目标标识;A first identification unit, configured to identify a target identifier in the video to be analyzed;
A detection unit, configured to detect whether the recognized target identifiers meet a preset condition to obtain a detection result, where the preset condition includes: the placeholders of target identifiers distributed in at least two adjacent frames at least partially overlap;
A determining unit, configured to determine, according to the detection result, the exposure data of the target identifier in the video to be analyzed.
其中,所述第一识别单元,包括:The first identification unit includes:
第一输入子单元,用于将所述待分析视频中的每帧图像输入训练后的预设模型,使得所述训练后的预设模型识别所述待分析视频中每帧图像中的目标标识;A first input subunit, configured to input each frame image in the video to be analyzed into a preset model after training, so that the trained preset model identifies a target identifier in each frame of the video to be analyzed ;
其中,针对所述待分析视频中的任意一帧图像,所述预设模型包括:For any one frame image in the video to be analyzed, the preset model includes:
第一提取单元,用于提取所述任意一帧图像的多尺度特征,得到多尺度特征图像集合;A first extraction unit, configured to extract multi-scale features of the arbitrary one-frame image to obtain a multi-scale feature image set;
生成单元,用于基于所述多尺度的特征图像集合生成候选区域;A generating unit, configured to generate a candidate region based on the multi-scale feature image set;
选取单元,用于从所述多尺度特征图像集合中选取至少两个尺度的特征图像集合;A selection unit, configured to select a feature image set of at least two scales from the multi-scale feature image set;
A second extraction unit, configured to extract, from the feature image sets of the at least two scales, the region sets corresponding to the candidate region, to obtain region sets of at least two scales corresponding to the feature image sets of the at least two scales;
第二识别单元,通过对所述至少两个尺度的区域集合进行全连接,识别出所述任意一帧图像中的所述目标标识。The second recognition unit recognizes the target identifier in the arbitrary one-frame image by fully connecting the region sets of at least two scales.
其中,所述预设模型为:以Faster-RCNN为架构,所述架构包括底层特征提取模型和候选区域生成网络;The preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
其中,所述第一提取单元,具体用于通过所述底层特征提取模块提取所述任意一帧图像的多尺度特征,得到所述多尺度特征图像集合;The first extraction unit is specifically configured to extract the multi-scale features of the arbitrary one-frame image by using the underlying feature extraction module to obtain the multi-scale feature image set;
所述生成单元,具体用于将所述多尺度的特征图像集合输入所述候选区域生成网络,通过所述候选区域生成网络生成所述侯选区域。The generating unit is specifically configured to input the multi-scale feature image set into the candidate region generating network, and generate the candidate region through the candidate region generating network.
其中,还包括:训练单元;Which also includes: training units;
The training unit is configured to train the preset model to obtain the trained preset model;
其中,所述训练单元,包括:The training unit includes:
第一获取子单元,用于获取训练集;所述训练集包括:已标注出所述目标标识的多帧图像;A first acquisition subunit, configured to acquire a training set, where the training set includes: multiple frames of images to which the target identifier has been labeled;
第一训练子单元,用于采用所述多帧图像对所述预设模型进行训练,得到第一预设模型;A first training subunit, configured to train the preset model by using the multi-frame image to obtain a first preset model;
第二输入子单元,用于将所述待分析视频中的图像输入所述第一预设模型;A second input subunit, configured to input an image in the video to be analyzed into the first preset model;
第二获取子单元,用于获取所述待分析视频中经所述第一预设模型标注出所述目标标识的图像;所述标注出所述目标标识的图像中存在错误标注;A second acquisition subunit, configured to acquire an image labeled with the target identifier through the first preset model in the video to be analyzed; the image labeled with the target identifier has an incorrect label;
第三获取子单元,用于获取修正图像;所述修正图像为:经人工对所述错误标注进行修正后的图像;A third acquisition subunit, configured to acquire a corrected image; the corrected image is: an image that has been manually corrected for the incorrect annotation;
第二训练子单元,用于采用所述修正图像对所述第一预设模型进行训练,得到所述训练后的预设模型。A second training subunit is configured to use the modified image to train the first preset model to obtain the trained preset model.
The detection unit is further configured to detect whether the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage, and whether, among those target identifiers, the total number whose sharpness exceeds a preset sharpness threshold is greater than a preset total number.
其中,确定单元,包括:The determining unit includes:
A first determining subunit, configured to determine, when the detection result is that the target identifier meets the preset condition, that the target identifier is exposed in the video to be analyzed, and to further determine an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration and exposure position;
第二确定子单元,用于在所述检测结果为所述目标标识不满足所述预设条件的情况下,确定所述目标标识未在所述待分析视频中曝光。A second determining subunit is configured to determine that the target identifier is not exposed in the video to be analyzed if the detection result is that the target identifier does not meet the preset condition.
一种存储介质,所述存储介质上存储有程序,所述程序被处理器执行时实现上述任意一项所述的视频分析方法。A storage medium stores a program on the storage medium, and when the program is executed by a processor, the video analysis method according to any one of the foregoing is implemented.
一种处理器,所述处理器用于运行程序,其中,所述程序运行时执行上述任意一项所述的视频分析方法。A processor is configured to run a program, and when the program runs, the video analysis method according to any one of the foregoing is performed.
With the above technical solution, the solution provided by the present invention recognizes the target identifier in the video to be analyzed and detects whether the recognized target identifier meets a preset condition; the characteristics of a target identifier that meets the preset condition match the characteristics a target identifier has when it is exposed in the video. In this embodiment, a detection result is obtained by detecting whether the recognized target identifier meets the preset condition, and the detection result is either that the recognized target identifier meets the preset condition or that it does not; therefore, whether the target identifier is exposed can be determined from the detection result. When the target identifier is exposed, that is, the recognized target identifier meets the preset condition, the preset condition requires that the placeholders of the target identifiers distributed in at least two adjacent frames at least partially overlap, and the placeholders of those overlapping target identifiers reveal the position of the exposed target identifier. Further, from the position of the exposed target identifier, the playback duration of the exposed target identifier at that position in the video can be determined. Therefore, in the embodiments of the present application, the exposure data of the target identifier in the video to be analyzed can be determined from the detection result, and human labor can be saved.
The above description is only an overview of the technical solution of the present invention. It is provided so that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention become more apparent; specific embodiments of the present invention are set forth below.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
通过阅读下文优选实施方式的详细描述,各种其他的优点和益处对于本领域普通技术人员将变得清楚明了。附图仅用于示出优选实施方式的目的,而并不认为是对本发明的限制。而且在整个附图中,用相同的参考符号表示相同的部件。在附图中:Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the detailed description of the preferred embodiments below. The drawings are only for the purpose of illustrating preferred embodiments and are not to be considered as limiting the invention. Moreover, the same reference numerals are used throughout the drawings to refer to the same parts. In the drawings:
图1示出了本申请中一种模型训练方法实施例的流程图;FIG. 1 shows a flowchart of an embodiment of a model training method in the present application;
FIG. 2 shows a schematic diagram in the present application in which each BMW brand logo in an image is marked with a box;
图3示出了本申请中一种视频中目标标识的分析方法实施例的流程图;FIG. 3 shows a flowchart of an embodiment of a method for analyzing target identification in a video in the present application;
图4示出了本申请中一种图像集合包含的图像中所识别出的目标标识的分布示意图;FIG. 4 is a schematic diagram showing a distribution of target identifiers identified in an image included in an image set in the present application; FIG.
图5示出了本申请中一种视频中目标标识的分析装置实施例的结构示意图。FIG. 5 is a schematic structural diagram of an embodiment of an analysis apparatus for target identification in a video in the present application.
DETAILED DESCRIPTION OF THE EMBODIMENTS
下面将参照附图更详细地描述本公开的示例性实施例。虽然附图中显示了本公开的示例性实施例,然而应当理解,可以以各种形式实现本公开而不应被这里阐述的实施例所限制。相反,提供这些实施例是为了能够更透彻地理解本公开,并且能够将本公开的范围完整的传达给本领域的技术人员。Hereinafter, exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. On the contrary, these embodiments are provided to enable a thorough understanding of the present disclosure, and to fully convey the scope of the present disclosure to those skilled in the art.
In this embodiment, a model for target recognition is provided. Specifically, it can be applied to scenarios based on target recognition, such as image classification and image segmentation. The model architecture may be the Faster-RCNN architecture, in which a ResNet model serves as the underlying feature extraction model and an RPN serves as the candidate region generation network. The ResNet model includes five parts, namely part 1, part 2, part 3, part 4, and part 5, each of which contains a pooling layer and a convolution layer.
In this embodiment, the processing flow of the model is improved. Taking image recognition as an example, the improvement is as follows. First, the image to be processed is input into the model, and the convolution layers of the different parts of the model output information of the image to be processed at different scales (different scales of an image can be understood as different resolutions). For example, when the image to be processed is of size M*M, the convolution layer of part 1 outputs a first feature image set of size M*M, the convolution layer of part 2 outputs a second feature image set of size M*M, the convolution layer of part 3 outputs a third feature image set of size M/2*M/2, the convolution layer of part 4 outputs a fourth feature image set of size M/4*M/4, and the convolution layer of part 5 outputs a fifth feature image set of size M/8*M/8. Note that each of the first through fifth feature image sets consists of multiple layers of images, and the number of image layers equals the number of convolution kernels in the convolution layer corresponding to that feature image set.
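To make the multi-scale extraction concrete, the following is a minimal PyTorch sketch (an illustration of this description, not code from the patent); torchvision's resnet50 stands in for the ResNet model, with the stem and the four residual stages playing the role of parts 1 through 5, and a recent torchvision version is assumed. Note that the output sizes of a standard torchvision ResNet do not exactly match the M*M, M*M, M/2*M/2, M/4*M/4, M/8*M/8 sizes stated above; the sketch only shows how the five feature image sets are collected.

```python
import torch
import torchvision

class MultiScaleBackbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50(weights=None)
        # Part 1: the stem (conv + pooling); parts 2-5: the four residual stages.
        self.stem = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = torch.nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])

    def forward(self, x):
        feature_sets = []
        x = self.stem(x)
        feature_sets.append(x)          # first feature image set
        for stage in self.stages:
            x = stage(x)
            feature_sets.append(x)      # second to fifth feature image sets
        return feature_sets

backbone = MultiScaleBackbone()
frame = torch.randn(1, 3, 512, 512)     # one M*M video frame (M = 512 here)
for i, f in enumerate(backbone(frame), start=1):
    # the layer count of each set equals the kernel count of its conv layer
    print(f"feature image set {i}: {tuple(f.shape)}")
```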
Next, after parts 1 through 5 output feature image sets at different scales, these feature image sets are input into the RPN, and the RPN generates candidate regions. Then, at least two feature image sets are selected from the five feature image sets, the region sets corresponding to the candidate regions are extracted from each of the selected feature image sets, and the extracted region sets are unified to a preset size in length and width. Finally, the region sets unified to the preset size are stacked along the layer dimension, and the stacked region sets are fully connected to recognize the target to be recognized in the image to be processed. Note that the processing flow above is described with an image recognition scenario as an example; the model in this embodiment can also be used in other scenarios, and this embodiment does not limit the specific scenario of the model.
A concrete example of extracting the region sets, unifying the extracted region sets to a preset size, and stacking the unified region sets along the layer dimension is as follows. Assume the first feature image set is 128*128*3, where 3 is the number of first feature images contained in the set and 128*128 is the size of each first feature image; the second feature image set is 64*64*6, where 6 is the number of second feature images and 64*64 is the size of each second feature image; the third feature image set is 32*32*4, the fourth feature image set is 16*16*2, and the fifth feature image set is 4*4*3, where the parameters of the third, fourth, and fifth feature image sets have the same meaning as those of the first feature image set and are not repeated here.
At least two feature image sets are selected from the first through fifth feature image sets, and the selected feature image sets are resampled so that the images they contain are unified in size, for example, to images of size 7*7.
After the selected feature image sets are unified in size, they are stacked along the layer dimension. Specifically, assume the selected feature image sets are the third feature image set and the fifth feature image set, and the images they contain are unified to a size of 7*7; the two 7*7 feature image sets are then stacked along the layer dimension, and the stacked feature image set is 7*7*7 (the 4 layers of the third set plus the 3 layers of the fifth set).
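The resizing and stacking in this example can be sketched as follows, assuming torchvision's roi_align as the resampling operator (the patent does not name a specific operator) and using the third and fifth feature image sets from the example above:

```python
import torch
from torchvision.ops import roi_align

feat3 = torch.randn(1, 4, 32, 32)   # third feature image set: 32*32*4
feat5 = torch.randn(1, 3, 4, 4)     # fifth feature image set: 4*4*3

# One candidate region in input-image coordinates (batch index, x1, y1, x2, y2),
# assuming a 128*128 input image.
rois = torch.tensor([[0.0, 16.0, 16.0, 80.0, 80.0]])

# Resample the region from each scale to a unified 7*7 size.
r3 = roi_align(feat3, rois, output_size=(7, 7), spatial_scale=32 / 128)
r5 = roi_align(feat5, rois, output_size=(7, 7), spatial_scale=4 / 128)

# Stack along the layer dimension: 4 + 3 = 7 layers of 7*7 regions.
stacked = torch.cat([r3, r5], dim=1)
flattened = stacked.flatten(1)         # fed to the fully connected layers
print(stacked.shape, flattened.shape)  # (1, 7, 7, 7) and (1, 343)
```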
The model in this embodiment uses a ResNet model as the underlying feature extraction model, uses an RPN as the candidate region generation network, and processes the input image with the improved flow: after the RPN generates a candidate region, the region sets corresponding to that candidate region are extracted from at least two feature image sets, yielding at least two region sets. Because the at least two region sets come from different feature image sets, and different feature image sets carry information about the image to be processed at different scales, the model in this embodiment fully connects image information from at least two scales of the image to be processed, so that it recognizes information of the image at different scales. By contrast, a model with the standard Faster-RCNN architecture fully connects only the image region set corresponding to the candidate region extracted from the image to be processed, and therefore recognizes information at only one scale of the image.
In practical applications, target identifiers of different sizes may appear in the image to be processed, and the features of identifiers of different sizes may show up in feature image sets of different scales. Because the model in this embodiment recognizes information in feature image sets of different scales, it achieves, compared with a model with the standard Faster-RCNN architecture, higher recognition accuracy for target identifiers of different sizes.
In this embodiment, the model with the configured architecture is trained. For the specific training process, refer to FIG. 1, which shows a flowchart of an embodiment of a model training method in the present application. The method embodiment may include:
步骤101:获取训练集。Step 101: Obtain a training set.
在本实施例中,还以该模型用于图像识别为例,介绍对该模型的训练过程。具体的图像识别场景为:识别图像中是否存在宝马品牌标识。在本步骤中,获取用于训练该模型的训练集,其中,训练集包括标注出宝马品牌标识的大量图像。In this embodiment, the model is used for image recognition as an example to introduce the training process of the model. The specific image recognition scene is: identifying whether the BMW brand logo exists in the image. In this step, a training set for training the model is obtained, where the training set includes a large number of images marked with the BMW brand logo.
Specifically, the large number of images composing the training set can be obtained as follows: images containing the BMW brand logo are collected from search platforms such as Baidu and Google or from other material websites, or captured from videos such as live broadcasts with screenshot software. Of course, in practical applications, other ways of obtaining a large number of images containing the BMW brand logo may also be used; this step only gives two such ways and does not limit the specific way of obtaining images containing the BMW brand logo.
After a large number of images containing the BMW brand logo are obtained, the BMW brand logo is marked in each acquired frame of image; specifically, as shown in FIG. 2, each BMW brand logo in the image is marked with a box.
步骤102:采用所获取的训练集对模型进行训练,得到第一模型。Step 102: Train the model using the acquired training set to obtain a first model.
After the training set is obtained, in this step the model is trained with the large number of images in the training set. Specifically, the images with the BMW brand logo marked are input into the model; the model recognizes and marks the BMW brand logo in each input image with the improved flow and, taking the marks of the BMW brand logo in the training set as the reference, automatically adjusts its parameters. After the parameters are adjusted many times, when a certain criterion is reached, the first model is obtained.
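As an illustrative sketch of this training step, the following assumes torchvision's built-in Faster R-CNN (an FPN variant, used here only as a stand-in for the modified model of this embodiment) with two classes, background and the BMW brand logo:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Two classes: background and the target logo.
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

# One dummy labeled image; in practice these come from the training set
# of images with the BMW brand logo marked by boxes.
images = [torch.rand(3, 480, 640)]
targets = [{
    "boxes": torch.tensor([[100.0, 120.0, 180.0, 200.0]]),  # a marked logo box
    "labels": torch.tensor([1]),                            # 1 = target logo
}]

for step in range(10):                  # iterate until some criterion is reached
    loss_dict = model(images, targets)  # classification and box regression losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```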
步骤103:将预设数量帧待识别图像输入该第一模型。Step 103: Input a preset number of frames of images to be identified into the first model.
After the model is trained with the training set to obtain the first model, in this step a preset number of frames of images to be recognized are input into the first model; for each input frame of image to be recognized, the first model recognizes and marks the BMW brand logos contained in that frame.
步骤104:获取第一模型分别识别并标注出目标标识的预设数目帧图像。Step 104: Obtain a preset number of frame images that the first model separately recognizes and labels the target identifier.
After the first model recognizes and marks the target identifiers in each input frame of image to be recognized, the preset number of frames of images with the target identifiers recognized and marked by the first model are obtained. In practical applications, when the first model recognizes and marks target identifiers in the images to be recognized, misrecognition can occur, in which case the marked target identifiers are also wrong. Therefore, among the preset number of frames of images obtained in this step, there are symbols that mark non-target identifiers; for convenience of description, this embodiment collectively refers to the symbols that mark non-target identifiers as error symbols.
步骤105:获取将人工对错误符号修正后的预设数量帧图像。Step 105: Obtain a preset number of frame images with artificially corrected error symbols.
The error symbols are corrected manually, that is, the error symbols are identified manually and the target identifiers are marked manually. In this step, the preset number of frames of images with the error symbols manually corrected are obtained.
步骤106:将修正后的预设数量帧图像输入第一模型中,对该第一模型进行训练,得到训练后的模型。Step 106: input the corrected preset number of frame images into the first model, and train the first model to obtain a trained model.
After the preset number of manually corrected frames of images are obtained, in this step the corrected frames are input into the first model, and the first model is trained further. The process of training the first model in this step follows the same idea as training the model in step 102; for the specific training process, refer to step 102, which is not repeated here. For convenience of description, this embodiment collectively refers to the model obtained by training the first model as the trained model.
In this embodiment, the first model is obtained after the model is trained with the training set. Because the images of the training set are collected from search platforms, after training with them the model has only learned the target identifiers in that training set. In practice, images to be recognized may contain similar identifiers that resemble the target identifier. To let the model better distinguish the target identifier from similar identifiers, in this embodiment a preset number of frames of images to be recognized are input into the first model; the symbols output by the first model for marking target identifiers contain error symbols; and the preset number of frames with the error symbols manually corrected are used to train the first model again, giving the trained model. Compared with the first model, the trained model recognizes target identifiers in images to be recognized more accurately; therefore, the training method of this embodiment can further improve the model's recognition accuracy for target identifiers in images to be recognized.
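The two-stage procedure of steps 101 through 106 can be summarized in the following high-level sketch, in which the helpers train, predict, and manually_correct are hypothetical placeholders, not functions defined by the patent:

```python
# High-level sketch of the two-stage training described above. The helpers
# train(), predict(), and manually_correct() are hypothetical placeholders.
def two_stage_training(model, training_set, frames_to_recognize):
    first_model = train(model, training_set)          # step 102: initial training

    corrected_frames = []
    for frame in frames_to_recognize:                 # step 103
        labeled = predict(first_model, frame)         # step 104: may contain
                                                      # error symbols
        corrected_frames.append(manually_correct(labeled))  # step 105

    return train(first_model, corrected_frames)       # step 106: retrain
```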
After the trained model is obtained, in this embodiment the trained model is applied to the scenario of analyzing the placement of a target identifier in a video. Specifically, referring to FIG. 3, a flowchart of an embodiment of a method for analyzing a target identifier in a video in the present application is shown; the method embodiment may include:
步骤301:获取待分析视频。Step 301: Obtain a video to be analyzed.
在本步骤中所获取的待分析视频可以为编码后的待分析视频。The video to be analyzed obtained in this step may be an encoded video to be analyzed.
步骤302:对所获取的待分析视频进行解码,得到解码后的待分析视频。Step 302: Decode the obtained video to be analyzed to obtain a decoded video to be analyzed.
步骤303:对于解码后的待分析视频,按照视频帧的先后顺序,以及以第一预设数量帧的图像作为一个图像集合的原则,将解码后的视频划分为多个图像集合。Step 303: For the decoded video to be analyzed, the decoded video is divided into multiple image sets according to the sequence of the video frames and the principle of using the first preset number of frames as an image set.
In this embodiment, a target identifier placed in a video is generally played continuously for two to three seconds, where the target identifier is a preset identifier; for example, if the BMW brand logo in a video needs to be analyzed, the BMW brand logo is the target identifier. In practical applications, roughly 5 frames are played per second, so the images containing the placed target identifier in the decoded video to be analyzed generally appear in 10 to 15 consecutive frames. Therefore, to analyze the placement of the target identifier more accurately, in this step the decoded video to be analyzed is divided, in the order of its video frames, into image sets of a first preset number of frames each, where the preset number can be any number from 5 to 7. The decoded video to be analyzed is thus divided into multiple image sets.
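Steps 302 and 303 can be sketched as follows, assuming OpenCV as the decoding tool (the patent does not prescribe one) and a first preset number of 5 frames per image set:

```python
import cv2

def split_into_image_sets(video_path, frames_per_set=5):
    """Decode a video and group its frames, in order, into image sets."""
    capture = cv2.VideoCapture(video_path)   # decoding happens inside VideoCapture
    image_sets, current_set = [], []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        current_set.append(frame)
        if len(current_set) == frames_per_set:
            image_sets.append(current_set)
            current_set = []
    if current_set:                          # keep a trailing partial set, if any
        image_sets.append(current_set)
    capture.release()
    return image_sets
```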
步骤304:分别将每个图像集合中的图像输入训练后的模型,使得训练后的模型识别每个图像集合所包含的图像中的目标标识。Step 304: Input the images in each image set into the trained models separately, so that the trained models recognize the target identifiers in the images contained in each image set.
After the decoded video to be analyzed is divided into multiple image sets, in this step the images of each image set are input into the trained model, which recognizes the target identifiers in each frame of image. In practical applications, after recognizing a target identifier in the video to be analyzed, the trained model marks the recognized target identifier; for example, when the trained model recognizes a BMW brand logo, it can frame the recognized logo with a box and output the image with the recognized BMW brand logo framed.
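Continuing the training sketch above, per-frame recognition and marking might look as follows; model, frame_tensor (a 3xHxW float tensor in [0, 1]), and frame_bgr (the decoded frame) are assumed variables, and the 0.5 confidence threshold is illustrative:

```python
import cv2
import torch

# `model` is the trained detection model from the training sketch above.
model.eval()
with torch.no_grad():
    detections = model([frame_tensor])[0]   # boxes, labels, and scores per frame

for box, score in zip(detections["boxes"], detections["scores"]):
    if score < 0.5:                         # an assumed confidence threshold
        continue
    x1, y1, x2, y2 = box.int().tolist()
    cv2.rectangle(frame_bgr, (x1, y1), (x2, y2), (0, 255, 0), 2)  # frame the logo
```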
步骤305:获取训练后的模型所输出的与每个图像集合对应的标注出目标标识的图像集合。Step 305: Obtain an image set labeled with a target identifier corresponding to each image set and output by the trained model.
After the trained model outputs the images marked with the preset symbol, an image set marked with the preset symbol is obtained for each of the divided image sets, giving multiple recognized image sets.
步骤306:检测每个图像集合中所标注出的目标标识是否满足预设条件。Step 306: Detect whether the target identifier marked in each image set meets a preset condition.
After the multiple image sets with target identifiers marked are obtained, in this step it is detected, for each image set, whether the target identifiers marked in that image set meet a preset condition. Taking an arbitrary image set as an example, this step describes how to judge whether the target identifiers marked in that image set meet the preset condition.
其中,预设条件可以包括:分布在相邻的至少两帧图像中的目标标识的占位至少部分重叠。其中,目标标识的占位是指该目标标识在一个基准坐标系中所占的空间区域。Wherein, the preset condition may include that the placeholders of the target identifiers distributed in at least two adjacent images at least partially overlap. The placeholder of the target identifier refers to a space area occupied by the target identifier in a reference coordinate system.
The following uses a specific scenario as an example to describe whether the recognized target identifiers in an image set meet the preset condition. The scenario is: the image set includes 5 frames, namely the first through fifth frames, and the target identifier is the BMW brand logo; the positions of the target identifiers recognized in the five frames are distributed as shown in FIG. 4. Specifically, two BMW brand logos are recognized in the first frame, one in the upper left corner and one in the lower right corner; two are recognized in the second frame, one in the upper right corner and one in the lower right corner; one is recognized in the third frame, in the lower right corner; none is recognized in the fourth frame; and one is recognized in the fifth frame, in the lower right corner. The placeholders of the target identifiers in the lower right corner of the first, second, and third frames overlap.
The preset condition is that "the placeholders of target identifiers distributed in at least two adjacent frames at least partially overlap". In this scenario, the target identifiers distributed in at least two adjacent frames are the 5 BMW brand logos distributed in the first, second, and third frames. It is then judged whether the placeholders of target identifiers in at least two adjacent frames at least partially overlap: the 3 BMW brand logos in the lower right corner of the first, second, and third frames overlap. Therefore, the BMW brand logos recognized in this image set meet the preset condition.
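A minimal sketch of this overlap check follows; the box coordinates are illustrative values that mimic the FIG. 4 layout in a shared reference coordinate system:

```python
def overlap_area(a, b):
    """Overlap of two placeholders a, b given as (x1, y1, x2, y2) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)

def placeholders_overlap_in_adjacent_frames(frames_boxes):
    """frames_boxes: one list of detected logo boxes per frame, in frame order."""
    for previous, current in zip(frames_boxes, frames_boxes[1:]):
        for a in previous:
            if any(overlap_area(a, b) > 0 for b in current):
                return True   # placeholders in two adjacent frames overlap
    return False

# In the FIG. 4 scenario, the lower-right boxes of frames 1-3 overlap:
frames = [
    [(0, 0, 2, 2), (8, 8, 10, 10)],   # frame 1: upper left + lower right
    [(8, 0, 10, 2), (8, 8, 10, 10)],  # frame 2: upper right + lower right
    [(8, 8, 10, 10)],                 # frame 3: lower right
    [],                               # frame 4: none
    [(8, 8, 10, 10)],                 # frame 5: lower right
]
print(placeholders_overlap_in_adjacent_frames(frames))  # True
```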
Detecting whether the recognized target identifiers in an image set meet the preset condition yields one of two detection results: the recognized target identifiers in the image set meet the preset condition, or they do not.
To make the detection result more accurate, in this embodiment the preset condition may further include: the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage; and, among those target identifiers, the total number whose sharpness is greater than a preset sharpness threshold is greater than a preset total number. The preset percentage may be no less than 50%, and the preset total number may be no less than 5.
Note that this embodiment only gives preferred value ranges for the preset percentage and the preset total number; in practical applications, their specific values may also be determined according to the actual situation, and this embodiment does not limit them.
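A sketch of the stricter condition follows. Reading the overlap ratio as intersection-over-union and measuring sharpness by the variance of the Laplacian are both assumptions, since the patent names neither; min_ratio and min_count echo the value ranges above, min_sharpness is illustrative, and overlap_area is the helper from the previous sketch:

```python
import cv2

def overlap_ratio(a, b):
    """Intersection-over-union of two boxes; one reasonable reading of the
    'overlap ratio' above, which the patent does not define precisely."""
    inter = overlap_area(a, b)   # helper from the previous sketch
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def sharpness(gray_patch):
    """Variance of the Laplacian as an assumed sharpness measure."""
    return cv2.Laplacian(gray_patch, cv2.CV_64F).var()

def strict_condition(overlapping_pairs, logo_patches,
                     min_ratio=0.5, min_sharpness=100.0, min_count=5):
    ratio_ok = all(overlap_ratio(a, b) > min_ratio for a, b in overlapping_pairs)
    sharp_count = sum(1 for p in logo_patches if sharpness(p) > min_sharpness)
    return ratio_ok and sharp_count > min_count
```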
步骤307:依据检测结果确定该目标标识在待分析视频中的曝光数据。Step 307: Determine the exposure data of the target identifier in the video to be analyzed according to the detection result.
After the detection result is obtained, in this step the exposure data of the target identifier in the video to be analyzed is determined from the detection result. The exposure data includes whether the identifier is exposed, the exposure position, the exposure duration, and the like. Specifically, if the detection result is that the recognized target identifiers in an image set meet the preset condition, the target identifier is exposed in that image set; the spatial position occupied by the target identifiers whose placeholders at least partially overlap in the at least two adjacent frames is determined as the exposure position of the target identifier; and, based on the exposure position, the number of consecutive frames of images in the video to be analyzed in which the target identifier appears at that exposure position is counted, and the playback duration of the target identifier is determined from that frame count.
Note that, in practical applications, the target identifier may have multiple exposure positions; in that case, the playback duration of the target identifier at each exposure position is determined separately, and the sum of the playback durations over all exposure positions is taken as the total playback duration of the target identifier.
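The conversion from frame counts to playback duration can be sketched as follows; dividing the frame count by the frame rate is an assumed reading of how the frame count determines the duration, and the 5 frames per second matches the figure used earlier in this description:

```python
def duration_at_position(presence_flags, fps):
    """presence_flags[i] is True if the logo occupies this exposure position
    in frame i; the frame count is converted to seconds via the frame rate."""
    return sum(presence_flags) / fps

def total_playback_duration(positions, fps):
    """positions maps each exposure position to its per-frame presence flags;
    the durations of all exposure positions are summed."""
    return sum(duration_at_position(flags, fps) for flags in positions.values())

# e.g. 12 consecutive frames at roughly 5 frames per second -> about 2.4 s
print(total_playback_duration({"lower_right": [True] * 12}, fps=5))
```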
If the detection result is that the recognized target identifiers in an image set do not meet the preset condition, the target identifier is not exposed in that image set; if the target identifier is not exposed in any image set, it is not exposed in the video to be analyzed, in which case there is no exposure position or exposure duration.
In this embodiment, the target identifier in the video to be analyzed is recognized, and it is detected whether the recognized target identifier meets a preset condition; the characteristics of a target identifier that meets the preset condition match the characteristics a target identifier has when it is exposed in the video. A detection result is obtained by detecting whether the recognized target identifier meets the preset condition, and the detection result is either that the recognized target identifier meets the preset condition or that it does not; therefore, whether the target identifier is exposed can be determined from the detection result. When the target identifier is exposed, that is, the recognized target identifier meets the preset condition, the preset condition requires that the placeholders of the target identifiers distributed in at least two adjacent frames at least partially overlap, and the placeholders of those overlapping target identifiers reveal the position of the exposed target identifier. Further, from the position of the exposed target identifier, the playback duration of the exposed target identifier at that position in the video can be determined. Therefore, in the embodiments of the present application, the exposure data of the target identifier in the video to be analyzed can be determined from the detection result.
参考图5,示出了本申请中一种视频中目标标识的分析装置实施例的结构示意图,该装置实施例可以包括:Referring to FIG. 5, a schematic structural diagram of an embodiment of an apparatus for analyzing target identification in a video in the present application is shown. The apparatus embodiment may include:
获取单元501,用于获取待分析视频;An obtaining unit 501, configured to obtain a video to be analyzed;
第一识别单元502,用于识别所述待分析视频中的目标标识;A first identification unit 502, configured to identify a target identifier in the video to be analyzed;
A detection unit 503, configured to detect whether the recognized target identifiers meet a preset condition to obtain a detection result, where the preset condition includes: the placeholders of target identifiers distributed in at least two adjacent frames at least partially overlap;
A determining unit 504, configured to determine, according to the detection result, the exposure data of the target identifier in the video to be analyzed.
其中,第一识别单元502,可以包括:The first identification unit 502 may include:
第一输入子单元,用于将所述待分析视频中的每帧图像输入训练后的预设模型,使得所述训练后的预设模型识别所述待分析视频中每帧图像中的目标标识;A first input subunit, configured to input each frame image in the video to be analyzed into a preset model after training, so that the trained preset model identifies a target identifier in each frame of the video to be analyzed ;
其中,针对所述待分析视频中的任意一帧图像,所述预设模型包括:For any one frame image in the video to be analyzed, the preset model includes:
第一提取单元,用于提取所述任意一帧图像的多尺度特征,得到多尺度特征图像集合;A first extraction unit, configured to extract multi-scale features of the arbitrary one-frame image to obtain a multi-scale feature image set;
生成单元,用于基于所述多尺度的特征图像集合生成候选区域;A generating unit, configured to generate a candidate region based on the multi-scale feature image set;
选取单元,用于从所述多尺度特征图像集合中选取至少两个尺度的特征图像集合;A selection unit, configured to select a feature image set of at least two scales from the multi-scale feature image set;
A second extraction unit, configured to extract, from the feature image sets of the at least two scales, the region sets corresponding to the candidate region, to obtain region sets of at least two scales corresponding to the feature image sets of the at least two scales;
第二识别单元,通过对所述至少两个尺度的区域集合进行全连接,识别出所述任意一帧图像中的所述目标标识。The second recognition unit recognizes the target identifier in the arbitrary one-frame image by fully connecting the region sets of at least two scales.
其中,所述预设模型为:以Faster-RCNN为架构,所述架构包括底层特征提取模型和候选区域生成网络;The preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
其中,所述第一提取单元,具体用于通过所述底层特征提取模块提取 所述任意一帧图像的多尺度特征,得到所述多尺度特征图像集合;Wherein, the first extraction unit is specifically configured to extract the multi-scale features of the arbitrary frame of images through the underlying feature extraction module to obtain the multi-scale feature image set;
所述生成单元,具体用于将所述多尺度的特征图像集合输入所述候选区域生成网络,通过所述候选区域生成网络生成所述侯选区域。The generating unit is specifically configured to input the multi-scale feature image set into the candidate region generating network, and generate the candidate region through the candidate region generating network.
其中,该装置还可以包括:训练单元;The device may further include: a training unit;
该训练单元,用于对所述预设模型进行训练,得到所述训练后的预设模型;The training unit is configured to train the preset model to obtain the trained preset model;
其中,所述训练单元,包括:The training unit includes:
第一获取子单元,用于获取训练集;所述训练集包括:已标注出所述目标标识的多帧图像;A first acquisition subunit, configured to acquire a training set, where the training set includes: multiple frames of images to which the target identifier has been labeled;
第一训练子单元,用于采用所述多帧图像对所述预设模型进行训练,得到第一预设模型;A first training subunit, configured to train the preset model by using the multi-frame image to obtain a first preset model;
第二输入子单元,用于将所述待分析视频中的图像输入所述第一预设模型;A second input subunit, configured to input an image in the video to be analyzed into the first preset model;
第二获取子单元,用于获取所述待分析视频中经所述第一预设模型标注出所述目标标识的图像;所述标注出所述目标标识的图像中存在错误标注;A second acquisition subunit, configured to acquire an image labeled with the target identifier through the first preset model in the video to be analyzed; the image labeled with the target identifier has an incorrect label;
第三获取子单元,用于获取修正图像;所述修正图像为:经人工对所述错误标注进行修正后的图像;A third acquisition subunit, configured to acquire a corrected image; the corrected image is: an image that has been manually corrected for the incorrect annotation;
第二训练子单元,用于采用所述修正图像对所述第一预设模型进行训练,得到所述训练后的预设模型。A second training subunit is configured to use the modified image to train the first preset model to obtain the trained preset model.
The detection unit 503 is further configured to detect whether the overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage, and whether, among those target identifiers, the total number whose sharpness exceeds a preset sharpness threshold is greater than a preset total number.
其中,确定单元504,可以包括:The determining unit 504 may include:
A first determining subunit, configured to determine, when the detection result is that the target identifier meets the preset condition, that the target identifier is exposed in the video to be analyzed, and to further determine an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration and exposure position;
第二确定子单元,用于在所述检测结果为所述目标标识不满足所述预设条件的情况下,确定所述目标标识未在所述待分析视频中曝光。A second determining subunit is configured to determine that the target identifier is not exposed in the video to be analyzed if the detection result is that the target identifier does not meet the preset condition.
The apparatus for analyzing a target identifier in a video includes a processor and a memory. The acquisition unit, first recognition unit, detection unit, determining unit, training unit, and so on are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels can be provided, and the exposure data of the target identifier in the video is analyzed by adjusting kernel parameters.
The memory may include a non-persistent memory, a random access memory (RAM), and/or a non-volatile memory among computer-readable media, such as a read-only memory (ROM) or a flash memory (flash RAM); the memory includes at least one memory chip.
本发明实施例提供了一种存储介质,其上存储有程序,该程序被处理器执行时实现所述视频分析方法。An embodiment of the present invention provides a storage medium on which a program is stored, and the video analysis method is implemented when the program is executed by a processor.
本发明实施例提供了一种处理器,所述处理器用于运行程序,其中,所述程序运行时执行所述视频分析方法。An embodiment of the present invention provides a processor, where the processor is configured to run a program, and the video analysis method is executed when the program runs.
本发明实施例提供了一种设备,设备包括处理器、存储器及存储在存储器上并可在处理器上运行的程序,处理器执行程序时实现以下步骤:An embodiment of the present invention provides a device. The device includes a processor, a memory, and a program stored on the memory and executable on the processor. When the processor executes the program, the following steps are implemented:
获取待分析视频;Get the video to be analyzed;
识别所述待分析视频中的目标标识;Identifying a target identifier in the video to be analyzed;
具体的,将所述待分析视频中的每帧图像输入训练后的预设模型,使得所述训练后的预设模型识别所述待分析视频中每帧图像中的目标标识;Specifically, inputting each frame image in the video to be analyzed into a trained preset model, so that the trained preset model recognizes a target identifier in each frame of the video to be analyzed;
其中,针对所述待分析视频中的任意一帧图像,所述预设模型按照以下步骤识别所述任意一帧图像中的所述目标标识:For any one frame image in the video to be analyzed, the preset model identifies the target identifier in the any one frame image according to the following steps:
提取所述任意一帧图像的多尺度特征,得到多尺度特征图像集合;Extracting the multi-scale features of the arbitrary one-frame image to obtain a multi-scale feature image set;
基于所述多尺度的特征图像集合生成候选区域;Generating candidate regions based on the multi-scale feature image set;
从所述多尺度特征图像集合中选取至少两个尺度的特征图像集合;Selecting a feature image set of at least two scales from the multi-scale feature image set;
分别从所述至少两个尺度的特征图像集合中提取所述候选区域对应的区域集合,得到与所述至少两个尺度的特征图像集合对应的至少两个尺度的区域集合;Respectively extracting a region set corresponding to the candidate region from the feature image set of the at least two scales to obtain a region set of at least two scales corresponding to the feature image set of the at least two scales;
通过对所述至少两个尺度的区域集合进行全连接,识别出所述任意一帧图像中的所述目标标识。By fully connecting the region sets of at least two scales, the target identifier in the arbitrary one-frame image is identified.
其中,所述预设模型为:以Faster-RCNN为架构,所述架构包括底层特征提取模型和候选区域生成网络;The preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
其中,所述提取所述任意一帧图像的多尺度特征,得到多尺度特征图像集合,包括:Wherein, extracting the multi-scale features of the arbitrary frame of images to obtain a multi-scale feature image set includes:
通过所述底层特征提取模块提取所述任意一帧图像的多尺度特征,得到所述多尺度特征图像集合;Extracting the multi-scale features of the arbitrary frame image through the underlying feature extraction module to obtain the multi-scale feature image set;
所述基于所述多尺度的特征图像集合生成候选区域,包括:The generating candidate regions based on the multi-scale feature image set includes:
将所述多尺度的特征图像集合输入所述候选区域生成网络,通过所述候选区域生成网络生成所述侯选区域。The multi-scale feature image set is input to the candidate region generation network, and the candidate region is generated by the candidate region generation network.
其中,通过以下方式对所述预设模型进行训练,得到所述训练后的预设模型:The preset model is trained in the following manner to obtain the trained preset model:
获取训练集;所述训练集包括:已标注出所述目标标识的多帧图像;Acquiring a training set; the training set includes: a plurality of frames of images to which the target identifier is marked;
采用所述多帧图像对所述预设模型进行训练,得到第一预设模型;Using the multi-frame image to train the preset model to obtain a first preset model;
将所述待分析视频中的图像输入所述第一预设模型;Inputting an image in the video to be analyzed into the first preset model;
获取所述待分析视频中经所述第一预设模型标注出所述目标标识的图像;所述标注出所述目标标识的图像中存在错误标注;Acquiring an image labeled with the target identifier in the video to be analyzed through the first preset model; the image labeled with the target identifier has an incorrect label;
获取修正图像;所述修正图像为:经人工对所述错误标注进行修正后的图像;Obtaining a corrected image; the corrected image is: an image that has been manually corrected for the incorrect annotation;
采用所述修正图像对所述第一预设模型进行训练,得到所述训练后的预设模型。Training the first preset model by using the modified image to obtain the trained preset model.
Detecting whether the recognized target identifiers meet a preset condition to obtain a detection result, where the preset condition includes: the placeholders of target identifiers distributed in at least two adjacent frames at least partially overlap;
其中,预设条件还可以包括:The preset conditions may further include:
The overlap ratio between the target identifiers whose placeholders at least partially overlap is greater than a preset percentage; and, among the target identifiers whose placeholders at least partially overlap, the total number of target identifiers whose sharpness is greater than a preset sharpness threshold is greater than a preset total number.
Determining, according to the detection result, the exposure data of the target identifier in the video to be analyzed.
Specifically, when the detection result is that the target identifier meets the preset condition, determining that the target identifier is exposed in the video to be analyzed, and further determining an exposure parameter, where the exposure parameter includes at least one of the following: exposure duration and exposure position;
在所述检测结果为所述目标标识不满足所述预设条件的情况下,确定所述目标标识未在所述待分析视频中曝光。When the detection result is that the target identifier does not satisfy the preset condition, it is determined that the target identifier is not exposed in the video to be analyzed.
The device herein may be a server, a PC, a PAD, a mobile phone, or the like.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
获取待分析视频;Get the video to be analyzed;
识别所述待分析视频中的目标标识;Identifying a target identifier in the video to be analyzed;
具体的,将所述待分析视频中的每帧图像输入训练后的预设模型,使得所述训练后的预设模型识别所述待分析视频中每帧图像中的目标标识;Specifically, inputting each frame image in the video to be analyzed into a trained preset model, so that the trained preset model recognizes a target identifier in each frame of the video to be analyzed;
其中,针对所述待分析视频中的任意一帧图像,所述预设模型按照以下步骤识别所述任意一帧图像中的所述目标标识:For any one frame image in the video to be analyzed, the preset model identifies the target identifier in the any one frame image according to the following steps:
提取所述任意一帧图像的多尺度特征,得到多尺度特征图像集合;Extracting the multi-scale features of the arbitrary one-frame image to obtain a multi-scale feature image set;
基于所述多尺度的特征图像集合生成候选区域;Generating candidate regions based on the multi-scale feature image set;
从所述多尺度特征图像集合中选取至少两个尺度的特征图像集合;Selecting a feature image set of at least two scales from the multi-scale feature image set;
分别从所述至少两个尺度的特征图像集合中提取所述候选区域对应的区域集合,得到与所述至少两个尺度的特征图像集合对应的至少两个尺度的区域集合;Respectively extracting a region set corresponding to the candidate region from the feature image set of the at least two scales to obtain a region set of at least two scales corresponding to the feature image set of the at least two scales;
通过对所述至少两个尺度的区域集合进行全连接,识别出所述任意一帧图像中的所述目标标识。By fully connecting the region sets of at least two scales, the target identifier in the arbitrary one-frame image is identified.
其中,所述预设模型为:以Faster-RCNN为架构,所述架构包括底层特征提取模型和候选区域生成网络;The preset model is based on a Faster-RCNN architecture, and the architecture includes a low-level feature extraction model and a candidate region generation network;
其中,所述提取所述任意一帧图像的多尺度特征,得到多尺度特征图像集合,包括:Wherein, extracting the multi-scale features of the arbitrary frame of images to obtain a multi-scale feature image set includes:
通过所述底层特征提取模块提取所述任意一帧图像的多尺度特征,得到所述多尺度特征图像集合;Extracting the multi-scale features of the arbitrary frame image through the underlying feature extraction module to obtain the multi-scale feature image set;
所述基于所述多尺度的特征图像集合生成候选区域,包括:The generating candidate regions based on the multi-scale feature image set includes:
将所述多尺度的特征图像集合输入所述候选区域生成网络,通过所述候选区域生成网络生成所述侯选区域。The multi-scale feature image set is input to the candidate region generation network, and the candidate region is generated by the candidate region generation network.
The preset model is trained as follows to obtain the trained preset model (a schematic of this two-stage procedure is sketched after the list):

obtaining a training set, the training set including multiple frames of images in which the target identifier has been annotated;

training the preset model on the multiple frames of images to obtain a first preset model;

inputting images from the video to be analyzed into the first preset model;

obtaining images of the video to be analyzed in which the first preset model has annotated the target identifier, where some of these annotations are erroneous;

obtaining corrected images, the corrected images being images in which the erroneous annotations have been manually corrected;

training the first preset model on the corrected images to obtain the trained preset model.
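Schematically, the procedure might look as follows; train_model, predict_and_annotate, and manual_review are hypothetical placeholders standing in for ordinary detector training, inference, and the human correction step, none of which the application ties to a particular implementation.

```python
def bootstrap_training(preset_model, training_set, video_frames):
    """Two-stage training: a hand-annotated training set first, then frames
    from the video to be analyzed with manually corrected machine labels."""
    # Stage 1: train on frames where the target identifier is hand-annotated.
    first_model = train_model(preset_model, training_set)

    # Stage 2: let the first model annotate frames of the video itself;
    # some of these machine annotations will be erroneous.
    machine_annotated = [(frame, predict_and_annotate(first_model, frame))
                         for frame in video_frames]

    # A human reviewer corrects the erroneous annotations.
    corrected_images = manual_review(machine_annotated)

    # Stage 3: continue training on the corrected, in-domain images.
    return train_model(first_model, corrected_images)
```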
detecting whether the identified target identifier satisfies a preset condition to obtain a detection result, the preset condition including: the footprints of the target identifier distributed across at least two adjacent frames of images at least partially overlap;

The preset condition further includes:

the overlap ratio between the target identifiers whose footprints at least partially overlap is greater than a preset percentage; and, among the target identifiers whose footprints at least partially overlap, the total number of target identifiers whose sharpness exceeds a preset sharpness threshold is greater than a preset total. (One way such a check might be implemented is sketched below.)
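A condition of this shape might be checked as in the following sketch, which uses intersection-over-union for the overlap ratio and the variance of the Laplacian as a sharpness proxy; both choices, and all threshold values, are assumptions of the example rather than limitations of the application.

```python
import cv2

def overlap_ratio(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) footprints."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def sharpness(frame, box):
    """Variance of the Laplacian over the box: one common sharpness proxy."""
    x1, y1, x2, y2 = map(int, box)
    patch = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(patch, cv2.CV_64F).var()

def satisfies_preset_condition(frames, boxes, preset_percentage=0.5,
                               sharpness_threshold=100.0, preset_total=1):
    """frames/boxes: target-identifier detections in adjacent frames."""
    # Footprints in adjacent frames must overlap by more than the percentage.
    overlaps = [overlap_ratio(a, b) for a, b in zip(boxes, boxes[1:])]
    if not overlaps or min(overlaps) <= preset_percentage:
        return False
    # More than preset_total of the overlapping identifiers must be sharp.
    sharp = sum(sharpness(f, b) > sharpness_threshold
                for f, b in zip(frames, boxes))
    return sharp > preset_total
```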
determining, according to the detection result, exposure data of the target identifier in the video to be analyzed.

Specifically, when the detection result indicates that the target identifier satisfies the preset condition, it is determined that the target identifier is exposed in the video to be analyzed, and exposure parameters are further determined, the exposure parameters including at least one of exposure duration and exposure position;

when the detection result indicates that the target identifier does not satisfy the preset condition, it is determined that the target identifier has not been exposed in the video to be analyzed.
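To make the exposure data concrete, the sketch below derives an exposure duration from the count of qualifying frames and the frame rate, and an exposure position from the mean footprint centre; both derivations are merely illustrative assumptions.

```python
def exposure_parameters(per_frame_boxes, fps=25.0):
    """per_frame_boxes: one entry per frame, either None or the
    (x1, y1, x2, y2) footprint of a target identifier that passed the
    preset condition. Returns None when the identifier was never exposed."""
    exposed = [box for box in per_frame_boxes if box is not None]
    if not exposed:
        return None  # target identifier not exposed in the video

    duration = len(exposed) / fps  # exposure duration in seconds
    # Exposure position summarized as the mean footprint centre.
    cx = sum((b[0] + b[2]) / 2 for b in exposed) / len(exposed)
    cy = sum((b[1] + b[3]) / 2 for b in exposed) / len(exposed)
    return {"exposure_duration_s": duration, "exposure_position": (cx, cy)}

# Example: identifier visible in 3 of 5 frames of a 25 fps clip.
print(exposure_parameters(
    [None, (10, 10, 60, 40), (12, 11, 62, 41), (13, 12, 63, 42), None]))
```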
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent storage in a computer-readable medium, in the form of random-access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application admits various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims (10)

  1. A video analysis method, comprising:
    obtaining a video to be analyzed;
    identifying a target identifier in the video to be analyzed;
    detecting whether the identified target identifier satisfies a preset condition to obtain a detection result, the preset condition comprising: the footprints of the target identifier distributed across at least two adjacent frames of images at least partially overlap; and
    determining, according to the detection result, exposure data of the target identifier in the video to be analyzed.
  2. The method according to claim 1, wherein identifying the target identifier in the video to be analyzed comprises:
    inputting each frame of the video to be analyzed into a trained preset model, so that the trained preset model identifies the target identifier in each frame of the video to be analyzed;
    wherein, for any frame of the video to be analyzed, the preset model identifies the target identifier in that frame according to the following steps:
    extracting multi-scale features of the frame to obtain a multi-scale feature image set;
    generating candidate regions based on the multi-scale feature image set;
    selecting feature image sets of at least two scales from the multi-scale feature image set;
    extracting, from each of the feature image sets of the at least two scales, the region set corresponding to the candidate regions, to obtain region sets of at least two scales corresponding to the feature image sets of the at least two scales; and
    identifying the target identifier in the frame by fully connecting the region sets of the at least two scales.
  3. The method according to claim 2, wherein the preset model is built on the Faster R-CNN architecture, the architecture comprising a low-level feature extraction module and a candidate region generation network;
    wherein extracting the multi-scale features of the frame to obtain the multi-scale feature image set comprises:
    extracting the multi-scale features of the frame through the low-level feature extraction module to obtain the multi-scale feature image set; and
    generating the candidate regions based on the multi-scale feature image set comprises:
    inputting the multi-scale feature image set into the candidate region generation network, and generating the candidate regions through the candidate region generation network.
  4. The method according to claim 2, wherein the preset model is trained as follows to obtain the trained preset model:
    obtaining a training set, the training set comprising multiple frames of images in which the target identifier has been annotated;
    training the preset model on the multiple frames of images to obtain a first preset model;
    inputting images from the video to be analyzed into the first preset model;
    obtaining images of the video to be analyzed in which the first preset model has annotated the target identifier, wherein some of the annotations are erroneous;
    obtaining corrected images, the corrected images being images in which the erroneous annotations have been manually corrected; and
    training the first preset model on the corrected images to obtain the trained preset model.
  5. The method according to claim 1, wherein the preset condition further comprises:
    the overlap ratio between the target identifiers whose footprints at least partially overlap is greater than a preset percentage; and, among the target identifiers whose footprints at least partially overlap, the total number of target identifiers whose sharpness exceeds a preset sharpness threshold is greater than a preset total.
  6. The method according to claim 1, wherein determining, according to the detection result, the exposure data of the target identifier in the video to be analyzed comprises:
    when the detection result indicates that the target identifier satisfies the preset condition, determining that the target identifier is exposed in the video to be analyzed, and further determining exposure parameters, the exposure parameters comprising at least one of exposure duration and exposure position; and
    when the detection result indicates that the target identifier does not satisfy the preset condition, determining that the target identifier is not exposed in the video to be analyzed.
  7. A video analysis apparatus, comprising:
    an acquisition unit, configured to obtain a video to be analyzed;
    a first identification unit, configured to identify a target identifier in the video to be analyzed;
    a detection unit, configured to detect whether the identified target identifier satisfies a preset condition to obtain a detection result, the preset condition comprising: the footprints of the target identifier distributed across at least two adjacent frames of images at least partially overlap; and
    a determination unit, configured to determine, according to the detection result, exposure data of the target identifier in the video to be analyzed.
  8. The apparatus according to claim 7, wherein the first identification unit comprises:
    a first input subunit, configured to input each frame of the video to be analyzed into a trained preset model, so that the trained preset model identifies the target identifier in each frame of the video to be analyzed;
    wherein, for any frame of the video to be analyzed, the preset model comprises:
    a first extraction unit, configured to extract multi-scale features of the frame to obtain a multi-scale feature image set;
    a generation unit, configured to generate candidate regions based on the multi-scale feature image set;
    a selection unit, configured to select feature image sets of at least two scales from the multi-scale feature image set;
    a second extraction unit, configured to extract, from each of the feature image sets of the at least two scales, the region set corresponding to the candidate regions, to obtain region sets of at least two scales corresponding to the feature image sets of the at least two scales; and
    a second identification unit, configured to identify the target identifier in the frame by fully connecting the region sets of the at least two scales.
  9. A storage medium, wherein a program is stored on the storage medium, and the program, when executed by a processor, implements the video analysis method according to any one of claims 1 to 6.
  10. A processor, wherein the processor is configured to run a program, and the program, when run, performs the video analysis method according to any one of claims 1 to 6.
PCT/CN2019/073661 2018-05-23 2019-01-29 Video analysis method and apparatus WO2019223361A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810502120.XA CN110532833A (en) 2018-05-23 2018-05-23 A kind of video analysis method and device
CN201810502120.X 2018-05-23

Publications (1)

Publication Number Publication Date
WO2019223361A1 true WO2019223361A1 (en) 2019-11-28

Family

ID=68616536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073661 WO2019223361A1 (en) 2018-05-23 2019-01-29 Video analysis method and apparatus

Country Status (2)

Country Link
CN (1) CN110532833A (en)
WO (1) WO2019223361A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496230A (en) * 2020-03-18 2021-10-12 中国电信股份有限公司 Image matching method and system
CN111556337B (en) * 2020-05-15 2021-09-21 腾讯科技(深圳)有限公司 Media content implantation method, model training method and related device
CN113573043B (en) * 2021-01-18 2022-11-08 腾讯科技(深圳)有限公司 Video noise point identification method, storage medium and equipment


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777124A (en) * 2010-01-29 2010-07-14 北京新岸线网络技术有限公司 Method for extracting video text message and device thereof
CN102567982A (en) * 2010-12-24 2012-07-11 浪潮乐金数字移动通信有限公司 Extraction system and method for specific information of video frequency program and mobile terminal
CN107197269B (en) * 2017-07-04 2020-02-21 广东工业大学 Video splicing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020056124A1 (en) * 2000-03-15 2002-05-09 Cameron Hay Method of measuring brand exposure and apparatus therefor
CN105163127A (en) * 2015-09-07 2015-12-16 浙江宇视科技有限公司 Video analysis method and device
CN107122773A (en) * 2017-07-05 2017-09-01 司马大大(北京)智能系统有限公司 A kind of video commercial detection method, device and equipment
CN107679250A (en) * 2017-11-01 2018-02-09 浙江工业大学 A kind of multitask layered image search method based on depth own coding convolutional neural networks
CN107944409A (en) * 2017-11-30 2018-04-20 清华大学 video analysis method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062527A (en) * 2019-12-10 2020-04-24 北京爱奇艺科技有限公司 Video collection flow prediction method and device
CN111062527B (en) * 2019-12-10 2023-12-05 北京爱奇艺科技有限公司 Video traffic collection prediction method and device
CN111027510A (en) * 2019-12-23 2020-04-17 上海商汤智能科技有限公司 Behavior detection method and device and storage medium
CN111046849A (en) * 2019-12-30 2020-04-21 珠海格力电器股份有限公司 Kitchen safety implementation method and device, intelligent terminal and storage medium
CN111310695A (en) * 2020-02-26 2020-06-19 酷黑科技(北京)有限公司 Forced landing method and device and electronic equipment
CN111310695B (en) * 2020-02-26 2023-11-24 酷黑科技(北京)有限公司 Forced landing method and device and electronic equipment
CN111950424A (en) * 2020-08-06 2020-11-17 腾讯科技(深圳)有限公司 Video data processing method and device, computer and readable storage medium
CN112055249A (en) * 2020-09-17 2020-12-08 京东方科技集团股份有限公司 Video frame interpolation method and device
CN113312951B (en) * 2020-10-30 2023-11-07 阿里巴巴集团控股有限公司 Dynamic video target tracking system, related method, device and equipment
CN113312951A (en) * 2020-10-30 2021-08-27 阿里巴巴集团控股有限公司 Dynamic video target tracking system, related method, device and equipment
CN112989934A (en) * 2021-02-05 2021-06-18 方战领 Video analysis method, device and system
CN113191293A (en) * 2021-05-11 2021-07-30 创新奇智(重庆)科技有限公司 Advertisement detection method, device, electronic equipment, system and readable storage medium
CN113825013B (en) * 2021-07-30 2023-11-14 腾讯科技(深圳)有限公司 Image display method and device, storage medium and electronic equipment
CN113825013A (en) * 2021-07-30 2021-12-21 腾讯科技(深圳)有限公司 Image display method and apparatus, storage medium, and electronic device
CN114095722A (en) * 2021-10-08 2022-02-25 钉钉(中国)信息技术有限公司 Definition determining method, device and equipment

Also Published As

Publication number Publication date
CN110532833A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
WO2019223361A1 (en) Video analysis method and apparatus
CN109740670B (en) Video classification method and device
CN109117848B (en) Text line character recognition method, device, medium and electronic equipment
CN107707931B (en) Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment
CN106649316B (en) Video pushing method and device
CN110827247B (en) Label identification method and device
TW201834463A (en) Recommendation method and apparatus for video data
US8879894B2 (en) Pixel analysis and frame alignment for background frames
Yang et al. Lecture video indexing and analysis using video ocr technology
US20150248592A1 (en) Method and device for identifying target object in image
WO2019062388A1 (en) Advertisement effect analysis method and device
CN110827292B (en) Video instance segmentation method and device based on convolutional neural network
CN111147891A (en) Method, device and equipment for acquiring information of object in video picture
CN111160134A (en) Human-subject video scene analysis method and device
Nguyen et al. Semantic prior analysis for salient object detection
US20110216939A1 (en) Apparatus and method for tracking target
CN111836118B (en) Video processing method, device, server and storage medium
CN111798543A (en) Model training method, data processing method, device, equipment and storage medium
CN111541939B (en) Video splitting method and device, electronic equipment and storage medium
CN108229285B (en) Object classification method, object classifier training method and device and electronic equipment
CN113923504B (en) Video preview moving picture generation method and device
CN112348566A (en) Method and device for determining recommended advertisements and storage medium
KR20110087620A (en) Layout based page recognition method for printed medium
CN110019951B (en) Method and equipment for generating video thumbnail
CN111798542B (en) Model training method, data processing device, model training apparatus, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19806421

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19806421

Country of ref document: EP

Kind code of ref document: A1