WO2021051885A1 - Method and apparatus for target annotation - Google Patents

Method and apparatus for target annotation (目标标注的方法及装置)

Info

Publication number
WO2021051885A1
WO2021051885A1 (PCT application PCT/CN2020/093958; CN2020093958W)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
image frame
result
labeling
target
Prior art date
Application number
PCT/CN2020/093958
Other languages
English (en)
French (fr)
Inventor
蒋晨
张伟
程远
Original Assignee
创新先进技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 创新先进技术有限公司 filed Critical 创新先进技术有限公司
Publication of WO2021051885A1 publication Critical patent/WO2021051885A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Definitions

  • One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and device for annotating targets by means of a computer.
  • Automatic recognition in conventional technology is usually based on the annotation of a single picture. Against this background, there is an urgent need for a general-purpose target annotation method (not limited to labeling vehicle damage) that fully considers the association between successive image frames, so as to improve the effectiveness of target annotation.
  • One or more embodiments of this specification describe a method and device for target annotation that can improve the accuracy of damage recognition.
  • According to a first aspect, a method for target annotation based on a video stream is provided, the method comprising: obtaining a current key frame, the current key frame being one of a plurality of key frames determined from the image frames of the video stream; performing target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target; and, based on the annotation result for the current key frame, performing target annotation on the non-key frames that follow the current key frame in the video stream.
  • In one embodiment, the initial plurality of key frames is extracted in either of the following ways: selecting a number of image frames from the video stream as key frames at predetermined time intervals; or inputting the video stream into a pre-trained frame extraction model and determining a number of key frames from the output of that model.
  • In one embodiment, the video stream is a vehicle video, the target is vehicle damage, and the labeling model is trained as follows: a plurality of vehicle pictures is obtained, each vehicle picture corresponding to a sample annotation result, where, if a vehicle picture contains vehicle damage, the annotation result of that single sample includes at least one damage frame, a damage frame being the smallest rectangular frame surrounding a continuous damage area; the labeling model is then trained at least on the plurality of vehicle pictures.
  • In one embodiment, adjacent key frames in the video stream are denoted the first image frame and the second image frame respectively; for the current key frame, the current key frame is the initial first image frame and the frame following the current key frame is the initial second image frame. Performing target annotation on the non-key frames that follow the current key frame, based on the annotation result for the current key frame, then includes: after the annotation of the first image frame is complete, detecting whether the second image frame is a key frame; if the second image frame is not a key frame, detecting the similarity between the second image frame and the first image frame; if that similarity is greater than a preset similarity threshold, mapping the annotation result of the first image frame onto the second image frame to obtain the annotation result of the second image frame; and updating the first and second image frames with the second image frame and the frame following it, respectively, and performing target annotation on the updated second image frame based on the annotation result of the updated first image frame.
  • In one embodiment, the second image frame is determined to be a key frame if the similarity between the second image frame and the first image frame is less than the similarity threshold.
  • In one embodiment, determining the similarity between the second image frame and the first image frame includes: determining a reference area in the first image frame based on the annotation result of the first image frame; processing the reference area of the first image frame and the second image frame with a predetermined convolutional neural network to obtain a first convolution result and a second convolution result respectively; and using the first convolution result as a convolution kernel to convolve the second convolution result, yielding a third convolution result. In the numerical array corresponding to the third convolution result, each value describes the similarity between a corresponding region of the second image frame and the reference area of the first image frame; the similarity between the second image frame and the first image frame is then determined from the largest value in that array.
  • In one embodiment, when the similarity between the second image frame and the first image frame is greater than the preset similarity threshold, mapping the annotation result of the first image frame onto the second image frame to obtain the annotation result of the second image frame includes: annotating, according to the annotation result of the first image frame, the image region of the second image frame that corresponds to the largest value.
  • In one embodiment, determining the reference area in the first image frame based on its annotation result includes: if the annotation result contains a target frame, setting the initial reference area to the area enclosed by the target frame; if it does not, setting the initial reference area to the area at a designated position in the current key frame.
  • In one embodiment, the current key frame also corresponds to a confidence flag, and performing target annotation on the non-key frames that follow the current key frame, based on the annotation result for the current key frame, includes: setting the confidence flag of each non-key frame located after the current key frame and before the next key frame to be the same as the confidence flag corresponding to the annotation result of the current key frame.
  • In one embodiment, the confidence flags include a high-confidence flag and a low-confidence flag. The high-confidence flag corresponds to the case in which the output of the labeling model for the corresponding key frame contains a target frame and the reference area indicates a predetermined target with high confidence; the low-confidence flag corresponds to the case in which that output contains no target frame and the reference area does not indicate a predetermined target. The method further includes adding image frames carrying the high-confidence flag to a target annotation set.
  • a device for marking targets based on a video stream comprising:
  • An acquiring unit configured to acquire a current key frame, where the current key frame is one of a plurality of key frames determined from each image frame of the video stream;
  • The first labeling unit is configured to perform target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target.
  • the second labeling unit is configured to perform target labeling on non-key frames following the current key frame in the video stream based on the labeling result for the current key frame.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
  • According to a fourth aspect, a computing device is provided, including a memory and a processor, the memory storing executable code, and the processor implementing the method of the first aspect when it executes the executable code.
  • Figure 1 shows a schematic diagram of an implementation scenario of an embodiment disclosed in this specification
  • Fig. 2 shows a flow chart of a method for marking a target according to an embodiment
  • FIG. 3 shows a schematic diagram of a process for determining the similarity of image frames in a specific example
  • FIG. 4 shows a specific example of a schematic flow diagram of target labeling based on a video stream
  • Fig. 5 shows a schematic block diagram of a target tagging device according to an embodiment.
  • Figure 1 shows a vehicle inspection scene in which the annotated target is vehicle damage, for example whether damage exists, the damage type, the damaged material, and so on.
  • The vehicle inspection scene can be any scene in which the damage condition of a vehicle needs to be checked, for example inspecting a vehicle for damage when it is insured, or determining the damage condition of a vehicle when a car insurance claim is settled.
  • the user can collect on-site video of the vehicle through a terminal that can collect on-site information, such as a smart phone, a camera, and a sensor.
  • the live video can include one or more video streams, and one video stream is a segment of video.
  • The live video can be sent to a manual inspection platform, which determines the purpose of the inspection and accordingly sends an annotation request to the computing platform together with the live video concerned.
  • The live video concerned can be sent per video stream, with one annotation request for each video stream, or per case, with one annotation request for one or more video streams belonging to the case.
  • The computing platform performs target annotation on the video stream according to the annotation request, using the target annotation method under the architecture of this specification.
  • the target label can be fed back to the manual inspection platform as a pre-labeled result to provide a reference for manual decision-making.
  • the pre-marked results can indicate that the vehicle is not damaged, or the damaged location and type of damage when the vehicle is damaged.
  • the pre-annotated result can be in the form of text, or in the form of an image frame containing vehicle damage.
  • the manual inspection platform and the computing platform shown in FIG. 1 may be integrated together, or may be set separately. In the case of separate settings, the computing platform can be used as a server that provides services for multiple manual inspection platforms.
  • The implementation scenario in Figure 1 is only an example. In some implementations, no manual inspection platform is provided.
  • Instead, the terminal sends the video stream directly to the computing platform, and the computing platform feeds the annotation result back to the terminal, or feeds back an inspection result generated from the annotation result.
  • Specifically, a plurality of key frames is first determined from the video stream, and the key frames are processed in chronological order by a pre-trained labeling model.
  • Processing each key frame yields one annotation result; after the non-key frames following that key frame have been processed, that is, after the image frames between the current key frame and the next key frame have been annotated, the next key frame is processed.
  • When processing the non-key frames after the current key frame, the annotation result of the current key frame is used as a reference, which reduces the amount of data processing.
  • Optionally, image frames that satisfy a condition can be selected from the non-key frames, according to the actual situation, and added to the key frames; a newly selected key frame is processed with the labeling model to obtain its annotation result, and subsequent image frames are processed based on that result.
  • Fig. 2 shows a flow chart of a target labeling method according to an embodiment.
  • the execution subject of the method can be any system, equipment, device, platform or server with computing and processing capabilities.
  • the marked target can be any target in the relevant scene, such as various objects (such as kittens), modules with certain characteristics (such as oval leaves), and so on.
  • the target to be marked can be vehicle parts, vehicle damage, and so on.
  • The target annotation method can include the following steps. Step 201: obtain a current key frame, the current key frame being one of a plurality of key frames determined from the video stream. Step 202: perform target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains the target.
  • Step 203: based on the annotation result for the current key frame, perform target annotation on the non-key frames that follow the current key frame in the video stream.
  • a current key frame is acquired, and the current key frame is one of multiple key frames determined from each image frame of the video stream.
  • a key frame is usually an image frame that can reflect the changing characteristics of the video.
  • Before a video stream is processed, a number of key frames can be extracted from it in advance; these serve as the initial key frames. The key frames of the video stream can be extracted in various reasonable ways.
  • image frames can be selected as key frames from the video stream at predetermined time intervals. For example, for a 30-second video stream, 60 image frames can be extracted as key frames at 0.5 second intervals.
  • the frame extraction model may be trained in advance, the video stream is input to the frame extraction model, and each key frame of the video stream is determined by the output result of the frame extraction model.
  • the frame extraction model is a model that extracts key frames from multiple image frames in a video stream.
  • the frame extraction model can be trained in the following way: multiple video streams are obtained as training samples, and for each image frame in each video stream, image features (such as color features, component Features, etc.), and have manually labeled sample key frames for the corresponding video stream; for each training sample, the image features of each corresponding image frame are sequentially input into the selected model, such as recurrent neural network RNN, LSTM, etc., using The model output results are compared with the sample key frames, and the model parameters are adjusted to train the frame extraction model.
  • the video stream obtained in step 201 may also include preprocessing results of extracting image features from each image frame, which will not be repeated here.
  • In another optional implementation, the frame extraction model can be trained as follows: multiple video streams are obtained as training samples, each corresponding to multiple image frames and manually labeled sample key frames; for each training sample, its image frames are fed in turn into a selected model, such as a recurrent neural network (RNN) or an LSTM, which mines the features of the image frames by itself and outputs a key-frame extraction result; the model output is then compared with the sample key frames and the model parameters are adjusted, thereby training the frame extraction model.
  • the key frames can also be extracted in more effective ways, which will not be repeated here.
  • the key frame acquired in the current process is the current key frame, and the current key frame may be any key frame in the video stream to be processed.
  • the current key frame is target-labeled using the pre-trained labeling model, and the labeling result for the current key frame is obtained.
  • the labeling model is used to label the area containing the predetermined target from the picture through the target frame.
  • the predetermined target may be vehicle components, vehicle damage, and so on.
  • the annotation result of the annotation model can be in the form of pictures or text.
  • In the picture form, for example, the annotated target is circled with a target frame on top of the original picture.
  • the target frame may be the smallest frame of a predetermined shape surrounding the continuous target area, such as the smallest rectangular frame, the smallest circular frame, and the like.
  • the text form is, for example, the target feature marked by text description.
  • the labeling result in text form can be: damaged component + damage degree, such as bumper scratching; damage material + damage degree, such as cracked left front window; and so on.
  • the annotation model can be trained in the following ways:
  • Multiple vehicle pictures are obtained, each corresponding to a sample annotation result.
  • If a vehicle picture contains vehicle damage, the annotation result of that single sample may include at least one damage frame on the original picture (one per damage).
  • A damage frame is the smallest rectangular frame surrounding a continuous damage area (in other embodiments it can also be a circular frame, etc.); otherwise, the annotation result is empty, "no damage", or the original picture itself.
  • In this step 202, a key frame is simply a picture: the current key frame is input into the labeling model, and the output obtained from the model's processing can be taken as the annotation result of the current key frame.
  • If the current key frame contains no predetermined target, the annotation result for it can be empty, the text "no damage", or the original picture itself.
  • Further, in step 203, based on the annotation result for the current key frame, target annotation is performed on the non-key frames that follow the current key frame in the video stream.
  • the non-key frame here is an image frame that has not been determined as a key frame.
  • the non-key frame after the current key frame may be an image frame after the current key frame and before the next key frame.
  • the key frames are marked with targets through the marking model, and the non-key frames are marked with reference to the marking results of the key frames, thereby reducing the amount of data processing.
  • the image frames in the video stream are usually continuously collected at a certain frequency (for example, 24 frames per second), and the pictures between adjacent image frames may have a certain similarity. Adjacent image frames may have multiple similar regions, that is, have a greater degree of similarity. It can be understood that if the similarity of adjacent image frames is small, a sudden change of the picture may occur, and the characteristics of the image frame may change. In this case, the adjacent image frames of the key frame can also be used as the key frame to reflect the feature change of the video stream. Based on this, in the embodiment of this specification, non-key frames can be targeted based on the similarity between image frames.
  • each image frame after the current key frame and before the next key frame may be sequentially compared with the current key frame to determine their similarity. If the similarity is greater than the predetermined threshold, the corresponding image frame is labeled with the labeling result of the current key frame. If the similarity is less than the predetermined threshold, the corresponding image frame is taken as the key frame. According to the time sequence, the newly determined key frame is the next key frame of the current key frame in the current process. Therefore, the newly determined key frame can be acquired as the current key frame next, and the target labeling shown in Figure 2 can be executed. Process. Further, the non-key frames after the newly determined key frame are marked with reference to the marking result of the newly determined key frame.
  • the current key frame and the non-key frames after the current key frame and before the next key frame may be compared between adjacent image frames to determine the similarity of adjacent image frames. If the similarity of adjacent image frames is high, the next image frame is labeled with the labeling result of the previous image frame; otherwise, the next image frame is used as the newly determined key frame, and the target labeling shown in Figure 2 is executed. Process.
  • Specifically, two adjacent image frames can be called the first image frame and the second image frame respectively.
  • For the current key frame, the current key frame is the initial first image frame and the frame following the current key frame is the initial second image frame.
  • After the first image frame has been annotated, it is first detected whether the second image frame is a key frame; if it is, the second image frame is taken as the current key frame and the flow shown in Figure 2 is applied to it.
  • If the second image frame is not a key frame, the similarity between the second image frame and the first image frame is detected. If that similarity is below the preset similarity threshold, the second image frame can likewise be taken as the current key frame and processed with the flow of Figure 2.
  • If the similarity is above the threshold, the annotation result of the first image frame is mapped onto the second image frame to obtain the annotation result of the second image frame.
  • The first and second image frames are then updated with the second image frame and the frame following it: the second image frame becomes the new first image frame and its successor becomes the new second image frame.
  • The above process is repeated until one of the following occurs: the second image frame is the last frame of the video stream and no next frame exists (so the step of updating the second image frame cannot continue); or the updated second image frame is detected to be a key frame, in which case it is taken as the current key frame and processing continues from there.
  • It is also possible to use the annotation result of the current key frame to annotate the subsequent non-key frames in other ways, for example by mapping the annotation result of the current key frame onto other image frames by means of optical flow; this is not repeated here.
  • First, a reference area in the first image frame can be determined based on the annotation result of the first image frame.
  • If the annotation result for the current key frame contains a target frame, the initial reference area is set to the area enclosed by the target frame; if it does not, the initial reference area is set to the area at a designated position in the current key frame.
  • The area at the designated position can be a pre-specified area containing a predetermined number of pixels, such as a 9×9 pixel area at the center of the first image frame, a 9×9 pixel area in its upper left corner, and so on.
  • the reference area in the first image frame is the area marked in the corresponding marking result.
  • the method for determining the similarity between the reference area of the first image frame and the respective areas corresponding to the second image frame can be performed by a method such as pixel value comparison, or can be performed by a similarity model, which is not limited here.
  • The method for determining the similarity between the reference area of the first image frame and the corresponding regions of the second image frame is described below. Suppose the reference area determined from the annotation result of the first image frame is reference area z, and the second image frame is image frame x.
  • On the one hand, reference area z (for example, a 127×127×3 pixel array) is processed by a predetermined convolutional neural network φ to obtain a first convolution result (for example, a 6×6×128 feature array); on the other hand, image frame x (for example, a 255×255×3 pixel array) is processed by the same convolutional neural network φ to obtain a second convolution result (for example, a 22×22×128 feature array).
  • Further, the first convolution result is used as a convolution kernel to convolve the second convolution result, producing a third convolution result (for example, a 17×17×1 numerical array). When an array is convolved with a kernel, sub-arrays more similar to the kernel yield larger values; therefore, in the numerical array of the third convolution result, each value describes the similarity between a corresponding sub-array of the second convolution result and the array of the first convolution result.
  • Since the second convolution result is the processing result of the second image frame, each of its sub-arrays corresponds to a region of the second image frame, and each value of the third convolution result corresponds in turn to one of those sub-arrays. The third convolution result can therefore be regarded as a distribution array of the similarities between the corresponding regions of the second image frame and the reference area of the first image frame.
  • The larger a value, the greater the similarity between the corresponding region of the second image frame and the reference area of the first image frame. Because the second image frame is annotated on the basis of the annotation result of the first image frame, it is only necessary to decide whether the second image frame contains a region corresponding to the reference area of the first image frame. In general, if such a region exists, it is the region of the second image frame with the highest similarity to the reference area. The similarity between the second image frame and the first image frame can thus be determined from the largest value in the numerical array of the third convolution result.
  • That largest value corresponds to the region of the second image frame most similar to the reference area of the first image frame.
  • The similarity between the second image frame and the first image frame can be the largest value itself, or the decimal or fractional value to which the largest value maps after all values in the numerical array of the third convolution result have been normalized.
  • A similarity threshold for the same region appearing in two image frames can be preset. If the similarity determined by the above process is greater than this threshold, the second image frame contains a region corresponding to the reference area of the first image frame, for example both show the left front headlight. Otherwise, if the similarity is less than the threshold, the second image frame contains no region corresponding to the reference area of the first image frame.
  • the second image frame does not include an area corresponding to the reference area of the first image frame, there may be a sudden change in the picture between the second image frame and the first image frame. If the second image frame is not annotated, important information may be missed. Therefore, at this time, the second image frame can be added to the key frame of the video stream. In addition, according to the time sequence, in the next process, the second image frame is acquired as the current key frame for target labeling.
  • According to the above, every image frame can correspond to a reference area, but the practical meaning of the reference area differs.
  • For example, in the vehicle inspection scene, if the annotation result produced by the labeling model for the current key frame contains a target frame, a certain part or piece of material of the vehicle has damage with high confidence; this result can be provided to human reviewers for reference, or can influence the decision.
  • The reference area in that case indicates a predetermined target with high confidence.
  • A reference area obtained by the designated-position method may also be bounded by a frame, but the area it encloses only provides a reference for annotating subsequent image frames and does not indicate a predetermined target; in the vehicle inspection scene it contains no damage. The annotation result of the current key frame can therefore also correspond to a confidence flag.
  • When the output of the labeling model contains a target frame and the reference area indicates a predetermined target with high confidence, the confidence flag of the current key frame is the high-confidence flag. In the vehicle inspection scene shown in Figure 1, the high-confidence flag represents likely vehicle damage, and the corresponding image frame can be output to the manual inspection platform for reference.
  • When the output of the labeling model for the current key frame contains no target frame, the designated-position area is taken as the reference area and the confidence flag of the current key frame can be the low-confidence flag.
  • In the vehicle inspection scene, the low-confidence flag corresponds to vehicle damage with low or zero confidence.
  • The target annotation flow shown in Figure 2 can also include adding the image frames that carry a high-confidence flag to a target annotation set.
  • the target annotation set is used for output to manual inspection, or for computer intelligent decision-making.
  • In a specific implementation, as shown in Figure 4, key frames are first extracted from the received video stream. One key frame is then acquired in chronological order as the current key frame. The current key frame is processed by the labeling model to obtain its annotation result, and it is judged whether the annotation result contains a target frame. If so, the area inside the target frame is taken as the reference area, a high-confidence flag is set for the current key frame (for example by setting a flag to 1), and the current key frame is added to the pre-annotation result set. Otherwise, a low-confidence flag is set for the current key frame (for example by setting the flag to 0). Next, the following image frame is annotated based on the annotation result of the current key frame.
  • First, it is detected whether the next image frame is a key frame. If it is, the next image frame is acquired as the current key frame and the flow continues. Otherwise, the current key frame is treated as the current frame and the similarity between the current frame and the next frame is detected. If the similarity is below the preset similarity threshold, the next frame is added to the key frames, acquired as the current key frame, and the flow continues. Otherwise, the similarity is above the threshold, the next frame is annotated with the annotation result of the current frame, the next frame inherits the confidence flag of the current frame, and it is detected whether that flag is a high-confidence flag.
  • If the confidence flag of the next frame is a high-confidence flag, the next frame is added to the pre-annotation result set, the current frame and next frame are updated with the next frame and the frame after it, and the flow continues until a key frame is detected or the video stream ends. If the confidence flag of the next frame is not a high-confidence flag, the current frame and next frame are updated in the same way and the flow continues until a key frame is detected or the video stream ends.
  • Under the technical idea of this specification, the target annotation flow shown in Figure 2 cannot be bypassed, but it need not be executed in full for every key frame.
  • For example, if the image frame following the current key frame is itself a key frame, there are no non-key frames between the current key frame and the next key frame, and the step of annotating the non-key frames after the current key frame based on its annotation result need not be executed.
  • A device for target annotation includes: an acquiring unit 51 configured to acquire a current key frame, the current key frame being one of a plurality of key frames determined from the image frames of the video stream; a first labeling unit 52 configured to perform target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target; and a second labeling unit 53 configured to perform target annotation on the non-key frames following the current key frame in the video stream, based on the annotation result for the current key frame.
  • the device 500 further includes an extraction unit (not shown) configured to extract the initial multiple key frames in any of the following ways:
  • the video stream is input to the pre-trained frame extraction model, and multiple key frames are determined according to the output result of the frame extraction model.
  • the video stream is a vehicle video
  • the target is vehicle damage.
  • The device 500 may further include a training unit (not shown) configured to train the labeling model as follows:
  • multiple vehicle pictures are obtained, each corresponding to a sample annotation result, where, if a vehicle picture contains vehicle damage, the annotation result of that single sample includes at least one damage frame, a damage frame being the smallest rectangular frame surrounding a continuous damage area;
  • the labeling model is trained at least on the multiple vehicle pictures.
  • According to one possible design, adjacent key frames in the video stream are denoted the first image frame and the second image frame respectively.
  • For the current key frame, the initial first image frame is the current key frame and the initial second image frame is the frame following the current key frame.
  • The second labeling unit 53 is further configured to: after annotation of the first image frame is complete, detect whether the second image frame is a key frame;
  • if the second image frame is not a key frame, detect the similarity between the second image frame and the first image frame;
  • if that similarity is greater than the preset similarity threshold, map the annotation result of the first image frame onto the second image frame to obtain the annotation result of the second image frame;
  • update the first and second image frames with the second image frame and the frame following it, respectively, and perform target annotation on the updated second image frame based on the annotation result of the updated first image frame.
  • If the similarity between the second image frame and the first image frame is less than the similarity threshold, the second image frame is determined to be a key frame.
  • In a further embodiment, the second labeling unit 53 is further configured to determine the similarity between the second image frame and the first image frame by: determining a reference area in the first image frame based on its annotation result; processing the reference area and the second image frame with a predetermined convolutional neural network to obtain a first and a second convolution result; using the first convolution result as a convolution kernel to convolve the second convolution result, yielding a third convolution result whose numerical array describes, value by value, the similarity between each corresponding region of the second image frame and the reference area of the first image frame; and determining the similarity between the second image frame and the first image frame from the largest value in that array.
  • When the similarity between the second image frame and the first image frame is greater than the preset similarity threshold, the second labeling unit 53 is further configured to: annotate, according to the annotation result of the first image frame, the image region of the second image frame corresponding to the largest value.
  • The second labeling unit 53 is further configured to: if the annotation result contains a target frame, set the initial reference area to the area enclosed by the target frame; if it does not, set the initial reference area to the area at the designated position in the current key frame.
  • the current key frame also corresponds to a confidence flag
  • the second labeling unit is further configured to:
  • the confidence flag of each non-key frame after the current key frame and before the next key frame is determined to be consistent with the confidence flag corresponding to the labeling result of the current key frame.
  • The confidence flags include a high-confidence flag and a low-confidence flag.
  • The high-confidence flag corresponds to the case in which the output of the labeling model for the corresponding key frame contains a target frame and the reference area indicates a predetermined target with high confidence; the low-confidence flag corresponds to the case in which that output contains no target frame and the reference area does not indicate a predetermined target.
  • In this case, the device 500 may further include an annotation-result determination unit (not shown) configured to add the image frames carrying a high-confidence flag to the annotation result set.
  • The device 500 shown in Figure 5 is the device embodiment corresponding to the method embodiment shown in Figure 2, and the corresponding descriptions of the method embodiment of Figure 2 also apply to device 500; they are not repeated here.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
  • a computing device including a memory and a processor, the memory is stored with executable code, and when the processor executes the executable code, it implements the method described in conjunction with FIG. 2 method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of this specification provide a method and apparatus for target annotation. According to one embodiment, a current key frame is obtained, the current key frame being one of a plurality of key frames determined from the image frames of a video stream. The current key frame is then processed with a pre-trained labeling model to obtain a first annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target. Next, based on the first annotation result, target annotation is performed on the non-key frames following the current key frame in the video stream. In this way, the effectiveness of target annotation can be improved.

Description

Method and Apparatus for Target Annotation

Technical Field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and apparatus for annotating targets by means of a computer.

Background
In the traditional vehicle-insurance inspection scene, vehicles are usually inspected by professional surveyors of the insurance company. For example, when a vehicle is insured, it must be checked for damage; in a car-insurance claim scenario, the insurance company must dispatch professional loss-assessment personnel to the accident site to survey and assess the damage on the spot. Because manual survey and assessment is required, insurance companies incur large labor costs as well as training costs for professional knowledge. From the ordinary user's perspective, the insuring and claim processes involve long waits for on-site inspection by a human surveyor, and the experience is poor.
In response to this industry pain point of huge labor costs, it has been envisaged to apply artificial intelligence and machine learning to the scene of vehicle damage detection, in the hope of using computer-vision image recognition technology to automatically recognize, from on-site images taken by ordinary users, the vehicle damage condition reflected in the pictures. In this way, labor costs can be greatly reduced and the user experience improved.
However, automatic recognition in conventional technology is usually based on the annotation of a single picture. Against this background, there is an urgent need for a general-purpose target annotation method (not limited to labeling vehicle damage) that fully considers the association between successive image frames, so as to improve the effectiveness of target annotation.
Summary
One or more embodiments of this specification describe a method and apparatus for target annotation that can improve the accuracy of damage recognition.
According to a first aspect, a method for target annotation based on a video stream is provided, the method comprising: obtaining a current key frame, the current key frame being one of a plurality of key frames determined from the image frames of the video stream; performing target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target; and, based on the annotation result for the current key frame, performing target annotation on the non-key frames following the current key frame in the video stream.
In one embodiment, the initial plurality of key frames is extracted in either of the following ways:
selecting a plurality of image frames from the video stream as key frames at predetermined time intervals;
inputting the video stream into a pre-trained frame extraction model and determining a plurality of key frames from the output of the frame extraction model.
In one embodiment, the video stream is a vehicle video, the target is vehicle damage, and the labeling model is trained as follows: a plurality of vehicle pictures is obtained, each vehicle picture corresponding to a sample annotation result, where, if a vehicle picture contains vehicle damage, the annotation result of a single sample includes at least one damage frame, the damage frame being the smallest rectangular frame surrounding a continuous damage area; and the labeling model is trained at least on the plurality of vehicle pictures.
In one embodiment, adjacent key frames in the video stream are denoted the first image frame and the second image frame respectively; for the current key frame, the current key frame is the initial first image frame and the frame following the current key frame is the initial second image frame. Performing target annotation on the non-key frames following the current key frame in the video stream, based on the annotation result for the current key frame, includes: after annotation of the first image frame is complete, detecting whether the second image frame is a key frame; if the second image frame is not a key frame, detecting the similarity between the second image frame and the first image frame; if the similarity between the second image frame and the first image frame is greater than a preset similarity threshold, mapping the annotation result of the first image frame onto the second image frame to obtain the annotation result of the second image frame; and updating the first image frame and the second image frame with the second image frame and the frame following the second image frame, respectively, and performing target annotation on the updated second image frame based on the annotation result of the updated first image frame.
In one embodiment, if the similarity between the second image frame and the first image frame is less than the similarity threshold, the second image frame is determined to be a key frame.
In one embodiment, determining the similarity between the second image frame and the first image frame includes: determining a reference area in the first image frame based on the annotation result of the first image frame; processing the reference area of the first image frame and the second image frame, respectively, with a predetermined convolutional neural network to obtain a first convolution result and a second convolution result; using the first convolution result as a convolution kernel to convolve the second convolution result, yielding a third convolution result, each value in whose numerical array describes the similarity between a corresponding region of the second image frame and the reference area of the first image frame; and determining the similarity between the second image frame and the first image frame based on the largest value in the numerical array corresponding to the third convolution result.
In one embodiment, when the similarity between the second image frame and the first image frame is greater than the preset similarity threshold, mapping the annotation result of the first image frame onto the second image frame to obtain the second annotation result of the second image frame includes: annotating, according to the annotation result of the first image frame, the image region of the second image frame to which the largest value corresponds.
In one embodiment, determining the reference area in the first image frame based on the annotation result of the first image frame includes: if the first annotation result contains a target frame, setting the initial reference area to the area enclosed by the target frame; if the first annotation result does not contain a target frame, setting the initial reference area to the area at a designated position in the current key frame.
In one embodiment, the current key frame also corresponds to a confidence flag, and performing target annotation on the non-key frames following the current key frame, based on the annotation result for the current key frame, includes: setting the confidence flag of each non-key frame after the current key frame and before the next key frame to be consistent with the confidence flag corresponding to the annotation result of the current key frame.
In one embodiment, the confidence flags include a high-confidence flag and a low-confidence flag, where the high-confidence flag corresponds to the case in which the output of the labeling model for the corresponding key frame contains a target frame and the reference area indicates a predetermined target with high confidence, and the low-confidence flag corresponds to the case in which the output of the labeling model for the corresponding key frame contains no target frame and the reference area does not indicate a predetermined target; the method further includes adding the image frames carrying a high-confidence flag to a target annotation set.
According to a second aspect, an apparatus for target annotation based on a video stream is provided, the apparatus comprising:
an acquiring unit configured to acquire a current key frame, the current key frame being one of a plurality of key frames determined from the image frames of the video stream;
a first labeling unit configured to perform target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target;
a second labeling unit configured to perform target annotation on the non-key frames following the current key frame in the video stream, based on the annotation result for the current key frame.
According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first aspect.
According to a fourth aspect, a computing device is provided, including a memory and a processor, the memory storing executable code, and the processor implementing the method of the first aspect when executing the executable code.
With the target annotation method and apparatus provided by the embodiments of this specification, only the key frames of the video stream are processed by the labeling model during target annotation; the non-key frames are annotated using the annotation results of the key frames, which greatly reduces the amount of data processing.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 shows a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;
Figure 2 shows a flow chart of a target annotation method according to an embodiment;
Figure 3 shows a schematic flow diagram of determining the similarity of image frames in a specific example;
Figure 4 shows a schematic flow diagram of target annotation based on a video stream in a specific example;
Figure 5 shows a schematic block diagram of a target annotation apparatus according to an embodiment.
Detailed Description
The solutions provided by this specification are described below with reference to the drawings.
For ease of explanation, the description is given with reference to the specific applicable scenario of an embodiment of this specification shown in Figure 1. Figure 1 shows a vehicle inspection scene in which the annotated target is vehicle damage, for example whether damage exists, the damage type, the damaged material, and so on. The vehicle inspection scene can be any scene in which the damage condition of a vehicle needs to be checked, for example inspecting a vehicle for damage when it is insured, or determining the damage condition of a vehicle when a car insurance claim is settled.
In this implementation scenario, a user can capture live video of the vehicle with a terminal capable of collecting on-site information, such as a smartphone, a camera or a sensor. The live video can include one or more video streams, one video stream being one segment of video. The live video can be sent to a manual inspection platform, which determines the purpose of the inspection and accordingly sends an annotation request to the computing platform, together with the live video concerned. It should be noted that the live video concerned can be sent per video stream, with one annotation request for each video stream, or per case, with one annotation request for one or more video streams belonging to a case. The computing platform performs target annotation on the video stream according to the annotation request, using the target annotation method under the architecture of this specification. In this implementation scenario, the target annotation can be fed back to the manual inspection platform as a pre-annotation result to provide a reference for human decision-making. The pre-annotation result can indicate that the vehicle is undamaged or, when it is damaged, the damaged location, the damage category, and so on. The pre-annotation result can be in text form or in the form of image frames containing the vehicle damage.
The manual inspection platform and the computing platform shown in Figure 1 can be integrated together or set up separately. When they are set up separately, the computing platform can act as a server providing services to multiple manual inspection platforms. The implementation scenario in Figure 1 is only an example; in some implementations, no manual inspection platform is provided, the terminal sends the video stream directly to the computing platform, and the computing platform feeds the annotation result back to the terminal, or feeds back an inspection result generated from the annotation result.
Specifically, in the target annotation method under the architecture of the embodiments of this specification, a plurality of key frames is first determined from the video stream, and each key frame is processed in chronological order by a pre-trained labeling model. Processing each key frame yields one annotation result, and the next key frame is processed only after the non-key frames following the current key frame, that is, the image frames between the current key frame and the next key frame, have been annotated. When the non-key frames after the current key frame are processed, the annotation result of the current key frame is used as a reference, which reduces the amount of data processing. Optionally, when the non-key frames are processed, image frames that satisfy a condition can also be selected from them, according to the actual situation, and added to the key frames; a key frame selected in this way is processed with the labeling model to obtain its annotation result, and subsequent image frames are processed based on that result.
The target annotation method is described in detail below.
Figure 2 shows a flow chart of a target annotation method according to an embodiment. The method can be executed by any system, device, apparatus, platform or server with computing and processing capabilities, for example the computing platform shown in Figure 1. The annotated target can be any target in the relevant scene, for example various objects (such as a kitten) or modules with certain characteristics (such as an oval leaf). In a vehicle inspection scene, the target to be annotated can be a vehicle part, vehicle damage, and so on.
As shown in Figure 2, the target annotation method can include the following steps. Step 201: obtain a current key frame, the current key frame being one of a plurality of key frames determined from the video stream. Step 202: perform target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains the target. Step 203: based on the annotation result for the current key frame, perform target annotation on the non-key frames following the current key frame in the video stream.
First, in step 201, a current key frame is obtained, the current key frame being one of a plurality of key frames determined from the image frames of the video stream. A key frame is usually an image frame that can reflect the changing characteristics of the video. By extracting key frames from the video stream and using the processing results of the key frames to reflect the changing characteristics of the video stream, the amount of data processing can be reduced effectively.
Before a video stream is processed, a number of key frames can be extracted from it in advance; these serve as the initial key frames. The key frames of the video stream can be extracted in various reasonable ways.
In one implementation, image frames can be selected from the video stream as key frames at predetermined time intervals. For example, from a 30-second video stream, 60 image frames can be extracted as key frames at intervals of 0.5 seconds.
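Purely as an illustrative sketch of this fixed-interval strategy (the function name, frame rate and interval are assumptions made for the example, not part of the embodiment):

```python
def select_keyframes_by_interval(num_frames, fps=24.0, interval_s=0.5):
    """Pick initial key-frame indices at a fixed time step, e.g. every 0.5 s."""
    step = max(1, int(round(fps * interval_s)))   # number of frames per sampling step
    return list(range(0, num_frames, step))

# A 30-second clip at 24 frames per second has 720 frames;
# sampling every 0.5 seconds keeps 60 of them as initial key frames.
assert len(select_keyframes_by_interval(720)) == 60
```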
In another implementation, a frame extraction model can be trained in advance; the video stream is input to the frame extraction model, and the key frames of the video stream are determined from its output. The frame extraction model is, as its name suggests, a model that extracts key frames from the image frames of a video stream.
In an optional implementation, the frame extraction model can be trained as follows: multiple video streams are obtained as training samples; for each image frame of each video stream, image features (such as color features, part features, etc.) can be extracted, and manually labeled sample key frames are available for the corresponding video stream; for each training sample, the image features of its image frames are fed in turn into a selected model, such as a recurrent neural network (RNN) or an LSTM, the model output is compared with the sample key frames, and the model parameters are adjusted, thereby training the frame extraction model. In this case, the video stream obtained in step 201 can also include the preprocessing results of extracting image features from the image frames, which are not repeated here.
In another optional implementation, the frame extraction model can also be trained as follows: multiple video streams are obtained as training samples, each corresponding to multiple image frames and manually labeled sample key frames; for each training sample in turn, its image frames are fed one by one into a selected model, such as a recurrent neural network (RNN) or an LSTM, the model mines the features of the image frames by itself and outputs a key-frame extraction result, and the model output is then compared with the sample key frames and the model parameters are adjusted, thereby training the frame extraction model.
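As an illustration only, one training step of such a frame extraction model could be sketched as follows; the PyTorch LSTM, the feature dimensions and the loss are assumptions made for the example, not details fixed by the embodiment:

```python
import torch
import torch.nn as nn

class KeyFrameScorer(nn.Module):
    """Scores every frame of a clip; frames with high scores are taken as key frames."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frame_feats):            # (batch, num_frames, feat_dim)
        h, _ = self.lstm(frame_feats)
        return self.head(h).squeeze(-1)        # (batch, num_frames) key-frame logits

model = KeyFrameScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

frame_feats = torch.randn(4, 720, 128)         # stand-in for per-frame image features
key_labels = torch.zeros(4, 720)               # 1.0 where a frame was hand-labeled as a key frame

loss = loss_fn(model(frame_feats), key_labels) # compare model output with sample key frames
loss.backward()                                # adjust model parameters accordingly
optimizer.step()
```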
In further embodiments, key frames can also be extracted in other effective ways, which are not repeated here.
Each key frame of the video stream to be processed can be handled in turn through the target annotation flow shown in Figure 2. In step 201, the key frame obtained in the current pass is the current key frame, which can be any key frame of the video stream to be processed.
Next, in step 202, target annotation is performed on the current key frame using the pre-trained labeling model, and the annotation result for the current key frame is obtained. The labeling model is used to mark, with a target frame, the region of a picture that contains a predetermined target. In the vehicle inspection scene, the predetermined target can be a vehicle part, vehicle damage, and so on.
The annotation result of the labeling model can be in picture form or in text form. In the picture form, for example, the annotated target is circled with a target frame on top of the original picture; the target frame can be the smallest frame of a predetermined shape surrounding a continuous target area, such as the smallest rectangular frame or the smallest circular frame. In the text form, for example, the annotated target characteristics are described in words; when the target is vehicle damage, a text annotation result can be damaged part plus damage degree, such as a scratched bumper, or damaged material plus damage degree, such as a cracked left front window, and so on.
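To make the "smallest rectangular frame surrounding a continuous damage area" concrete, a minimal sketch is given below; representing the damage region as a binary mask is an assumption made only for this example:

```python
import numpy as np

def min_damage_frame(damage_mask):
    """Smallest axis-aligned rectangle enclosing a connected damage region.

    damage_mask: 2-D boolean array, True where the damage lies.
    Returns (x_min, y_min, x_max, y_max), or None when there is no damage.
    """
    ys, xs = np.nonzero(damage_mask)
    if xs.size == 0:
        return None                     # empty annotation result: "no damage"
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((100, 100), dtype=bool)
mask[40:60, 25:75] = True               # a toy scratch-shaped region
print(min_damage_frame(mask))           # (25, 40, 74, 59)
```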
According to one implementation, when the video stream is a vehicle video and the target is vehicle damage, the labeling model can be trained as follows:
multiple vehicle pictures are obtained, each vehicle picture corresponding to a sample annotation result, where, if a vehicle picture contains vehicle damage, the annotation result of a single sample can include at least one damage frame on the original picture (one per damage), the damage frame being the smallest rectangular frame surrounding a continuous damage area (in other embodiments it can also be a circular frame, etc.); otherwise, the annotation result is empty, "no damage", or the original picture itself;
then, the labeling model is trained at least on these vehicle pictures with their sample annotation results.
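The embodiment does not prescribe a particular detector. Purely as an illustration, the training step could look like the following sketch, which fine-tunes an off-the-shelf torchvision Faster R-CNN with a single "damage" foreground class; the model choice, class count and learning rate are assumptions:

```python
import torch
import torchvision

# Two classes: 0 = background, 1 = damage.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

images = [torch.rand(3, 480, 640)]                          # stand-in vehicle picture
targets = [{
    "boxes": torch.tensor([[120.0, 200.0, 260.0, 280.0]]),  # one damage frame (x0, y0, x1, y1)
    "labels": torch.tensor([1]),
}]

model.train()
losses = model(images, targets)            # detection and classification losses
sum(losses.values()).backward()
optimizer.step()
```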
Thus, in step 202, a key frame is simply a picture: the current key frame is input into the labeling model, and the output obtained from the model's processing can be the annotation result of the current key frame. If the current key frame contains no predetermined target, the annotation result for it can be empty, the text "no damage", or the original picture itself.
Further, in step 203, based on the annotation result for the current key frame, target annotation is performed on the non-key frames following the current key frame in the video stream. A non-key frame here is an image frame that has not been determined to be a key frame; the non-key frames after the current key frame can be the image frames after the current key frame and before the next key frame. In the embodiments of this specification, key frames are annotated by the labeling model, while non-key frames are annotated by reference to the annotation results of key frames, which reduces the amount of data processing.
It can be understood that the image frames of a video stream are usually captured continuously at a certain frequency (for example 24 frames per second), and the pictures of adjacent image frames may be similar to a certain extent; adjacent image frames may share multiple similar regions, that is, have a high similarity. It can also be understood that if the similarity of adjacent image frames is small, some abrupt change of the picture may have occurred and the characteristics of the image frames have changed; in that case, the image frame adjacent to a key frame can itself be used as a key frame to reflect the changing characteristics of the video stream. On this basis, in the embodiments of this specification, non-key frames can be annotated based on the similarity between image frames.
In one embodiment, each image frame after the current key frame and before the next key frame can be compared in turn with the current key frame to determine their similarity. If the similarity is greater than a predetermined threshold, the corresponding image frame is annotated with the annotation result of the current key frame. If the similarity is less than the predetermined threshold, the corresponding image frame is taken as a key frame; in chronological order, this newly determined key frame is the key frame following the current key frame of the current pass, so it can next be obtained as the current key frame and the target annotation flow of Figure 2 executed for it. Further, the non-key frames after the newly determined key frame are annotated by reference to its annotation result.
In another embodiment, the current key frame and the non-key frames after it and before the next key frame can be compared between adjacent image frames to determine the similarity of adjacent image frames. If the similarity of adjacent image frames is high, the later image frame is annotated with the annotation result of the earlier one; otherwise, the later image frame is taken as a newly determined key frame and the target annotation flow of Figure 2 is executed.
Specifically, two adjacent image frames can be called the first image frame and the second image frame respectively; then, for the current key frame, the current key frame is the initial first image frame and the frame following the current key frame is the initial second image frame. After the first image frame has been annotated, it is first detected whether the second image frame is a key frame. If the second image frame is a key frame, it is taken as the current key frame and the flow shown in Figure 2 is executed. If the second image frame is not a key frame, the similarity between the second image frame and the first image frame is detected.
If the similarity between the second image frame and the first image frame is less than the preset similarity threshold, the second image frame can be taken as the current key frame, and the updated current key frame is processed with the flow shown in Figure 2.
If the similarity between the second image frame and the first image frame is greater than the preset similarity threshold, the annotation result of the first image frame is mapped onto the second image frame to obtain the annotation result of the second image frame. The first image frame and the second image frame are then updated with the second image frame and the frame following it, respectively; that is, the second image frame becomes the new first image frame and the frame following it becomes the new second image frame.
The above process is repeated until one of the following occurs:
the second image frame is the last frame of the video stream and there is no next image frame (that is, the step of updating the second image frame cannot continue); or
the updated second image frame is detected to be a key frame, and the second image frame is taken as the current key frame for subsequent processing.
In other embodiments, the annotation result of the current key frame can also be used to annotate the non-key frames after it in other ways, for example by mapping the annotation result of the current key frame onto other image frames by means of optical flow, which is not repeated here.
When determining the similarity of two image frames, methods such as comparing the shapes described by feature points can be used. It can be understood that when a later image frame is annotated using the annotation result of an earlier one, the main purpose is to draw on the annotation result of the earlier frame during the target annotation of the later frame; therefore, in an optional implementation, to reduce the amount of data processing, only one reference area can be taken from the first image frame and used for the similarity judgment with the second image frame.
Taking the aforementioned adjacent first and second image frames as an example, a reference area in the first image frame can first be determined based on the annotation result of the first image frame. Optionally, for the current key frame, if the annotation result for the current key frame contains a target frame, the initial reference area is set to the area enclosed by the target frame; if the annotation result does not contain a target frame, the initial reference area is set to the area at a designated position in the current key frame. The area at the designated position can be a pre-specified area containing a predetermined number of pixels, for example a 9×9 pixel area at the center of the first image frame, a 9×9 pixel area in its upper left corner, and so on. In subsequent image frames, the reference area of the first image frame is the area marked in the corresponding annotation result.
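A minimal sketch of this choice of reference area follows; the annotation format and the default central patch size are taken from the example above, while the function name is an assumption:

```python
def reference_region(frame, target_frame=None, default_size=9):
    """Crop the area that subsequent frames will be compared against.

    frame: H x W x C image array; target_frame: (x0, y0, x1, y1) or None when the
    annotation result contains no target frame (the low-confidence case).
    """
    if target_frame is not None:
        x0, y0, x1, y1 = target_frame
        return frame[y0:y1 + 1, x0:x1 + 1]            # area enclosed by the target frame
    h, w = frame.shape[:2]
    cy, cx, r = h // 2, w // 2, default_size // 2     # fall back to a central 9x9 patch
    return frame[cy - r:cy + r + 1, cx - r:cx + r + 1]
```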
The similarity between the reference area of the first image frame and the corresponding regions of the second image frame can be determined by methods such as pixel-value comparison, or by a similarity model; this is not limited here.
Referring to Figure 3, the determination of the similarity between the reference area of the first image frame and the corresponding regions of the second image frame is explained below, taking a similarity model as an example. Suppose the reference area determined from the annotation result of the first image frame is reference area z and the second image frame is image frame x. On the one hand, reference area z (for example a 127×127×3 pixel array) is processed by a predetermined convolutional neural network φ to obtain a first convolution result (for example a 6×6×128 feature array); on the other hand, image frame x (for example a 255×255×3 pixel array) is processed by the same convolutional neural network φ to obtain a second convolution result (for example a 22×22×128 feature array). Further, the first convolution result is used as a convolution kernel to convolve the second convolution result, producing a third convolution result (for example a 17×17×1 numerical array). It can be understood that when an array is convolved with a kernel, sub-arrays more similar to the kernel produce larger values; therefore, in the numerical array of the third convolution result, each value describes the similarity between a corresponding sub-array of the second convolution result and the array of the first convolution result. Since the second convolution result is the processing result of the second image frame, each of its sub-arrays corresponds to a region of the second image frame, and each value of the third convolution result corresponds in turn to a sub-array of the second convolution result; the third convolution result can thus be regarded as a distribution array of the similarities between the corresponding regions of the second image frame and the reference area of the first image frame. In the numerical array of the third convolution result, the larger a value, the greater the similarity between the corresponding region of the second image frame and the reference area of the first image frame. Because the second image frame is annotated on the basis of the annotation result of the first image frame, it is only necessary to judge whether the second image frame contains a region corresponding to the reference area of the first image frame; in general, if such a region exists, it is the region of the second image frame with the highest similarity to the reference area. The similarity between the second image frame and the first image frame can therefore be determined from the largest value in the numerical array of the third convolution result. That largest value corresponds to the region of the second image frame most similar to the reference area of the first image frame. The similarity between the second image frame and the first image frame can be the largest value itself, or the decimal or fractional value to which the largest value maps after all values in the numerical array of the third convolution result have been normalized.
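The cross-correlation step can be sketched in a few lines of PyTorch using the dimensions given in Figure 3; the small untrained backbone standing in for the network φ is an assumption, chosen only so that the 127×127 and 255×255 inputs map to 6×6×128 and 22×22×128 feature arrays:

```python
import torch
import torch.nn.functional as F

phi = torch.nn.Sequential(                       # stand-in for the shared network phi
    torch.nn.Conv2d(3, 96, 11, stride=2), torch.nn.ReLU(), torch.nn.MaxPool2d(3, 2),
    torch.nn.Conv2d(96, 128, 5), torch.nn.ReLU(), torch.nn.MaxPool2d(3, 2),
    torch.nn.Conv2d(128, 128, 3), torch.nn.ReLU(),
    torch.nn.Conv2d(128, 128, 3), torch.nn.ReLU(),
    torch.nn.Conv2d(128, 128, 3),
)

z = torch.randn(1, 3, 127, 127)                  # reference area z of the first image frame
x = torch.randn(1, 3, 255, 255)                  # second image frame x
z_feat = phi(z)                                  # (1, 128, 6, 6)   first convolution result
x_feat = phi(x)                                  # (1, 128, 22, 22) second convolution result
score_map = F.conv2d(x_feat, z_feat)             # (1, 1, 17, 17)   third convolution result
similarity = score_map.max().item()              # largest value decides the frame similarity
best_idx = score_map.flatten().argmax().item()   # position of the most similar region in x
```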
A similarity threshold for the same region appearing in two image frames can be preset. If the similarity determined by the above process is greater than this threshold, the second image frame contains a region corresponding to the reference area of the first image frame, for example both show the left front headlight. Otherwise, if the similarity determined by the above process is less than the threshold, the second image frame contains no region corresponding to the reference area of the first image frame.
When the second image frame contains no region corresponding to the reference area of the first image frame, there may be an abrupt change of the picture between the second image frame and the first image frame; if the second image frame were not annotated, important information could be missed. Therefore, the second image frame can be added to the key frames of the video stream at this point and, in chronological order, obtained as the current key frame for target annotation in the next pass.
It can be understood from the above that every image frame can correspond to a reference area, but the practical meaning of the reference area differs. For example, in the vehicle inspection scene, if the annotation result produced by the labeling model for the current key frame contains a target frame, a certain part or piece of material of the vehicle has damage with high confidence; this result can be provided to human reviewers for reference, or can influence the decision, and the reference area in that case indicates a predetermined target with high confidence. A reference area obtained by the designated-position method may also be bounded by a frame, but the area it encloses only provides a reference for annotating subsequent image frames and does not indicate a predetermined target, that is, no damage exists in the vehicle inspection scene. The annotation result of the current key frame can therefore also correspond to a confidence flag. When the output of the labeling model contains a target frame and the reference area indicates a predetermined target with high confidence, the confidence flag of the current key frame is the high-confidence flag; in the vehicle inspection scene shown in Figure 1, the high-confidence flag represents likely vehicle damage, and the corresponding image frame can be output to the manual inspection platform for reference. When the output of the labeling model for the current key frame contains no target frame, the designated-position area is taken as the reference area and the confidence flag of the current key frame can be the low-confidence flag; in the vehicle inspection scene, this corresponds to vehicle damage with low or zero confidence.
A person skilled in the art will readily understand that non-key frames after a key frame are annotated by reference to the annotation result of the key frame, so the confidence flags of the annotation results of the subsequent non-key frames can all be consistent with the confidence flag of the current key frame, and the confidence flags can be consulted in the final decision. In this case, the target annotation flow shown in Figure 2 can also include adding the image frames carrying a high-confidence flag to a target annotation set, which is used for output to manual inspection or for intelligent decision-making by a computer.
For a clearer understanding of the technical idea of the embodiments of this specification, refer to Figure 4. In a specific implementation flow, as shown in Figure 4, key frames are first extracted from the received video stream. Then one of the key frames is obtained in chronological order as the current key frame. The current key frame is processed by the labeling model, and the annotation result of the current key frame is obtained. It is judged whether the annotation result contains a target frame. If so, the area inside the target frame is taken as the reference area, a high-confidence flag is set for the current key frame, for example by setting a confidence flag to 1, and the current key frame is added to the pre-annotation result set. Otherwise, a low-confidence flag is set for the current key frame, for example by setting the confidence flag to 0. Next, based on the annotation result of the current key frame, the following image frame is annotated.
First, it is judged whether the next image frame is a key frame. If so, the next image frame is obtained as the current key frame and the flow continues. Otherwise, the current key frame is the current frame, and the similarity between the current frame and the next frame is detected. If the similarity is less than the preset similarity threshold, the next frame is added to the key frames, obtained as the current key frame, and the flow continues. Otherwise, the similarity is greater than the preset similarity threshold, the next frame is annotated with the annotation result of the current frame, and the next frame inherits the confidence flag of the current frame. It is then detected whether the confidence flag of the next frame is the high-confidence flag. If it is, the next frame is added to the pre-annotation results, the current frame and the next frame are updated with the next frame and the frame after it, and the flow continues until a key frame is detected or the video stream ends. If it is not, the current frame and the next frame are updated with the next frame and the frame after it, and the flow continues until a key frame is detected or the video stream ends.
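The loop of Figure 4 can be summarized in the following sketch; `annotate_with_model`, `similarity` and `map_annotation` are hypothetical stand-ins for the labeling model, the similarity check and the result mapping described above, the annotation result is assumed to be a dict with an optional "boxes" list, the first frame is assumed to be a key frame, and the threshold value is an assumption:

```python
def annotate_stream(frames, keyframe_idx, annotate_with_model, similarity, map_annotation,
                    sim_threshold=0.8):
    """Figure-4 style loop: key frames go through the model, later frames inherit their results."""
    results, high_conf, prelabeled = {}, {}, []
    i = 0
    while i < len(frames):
        results[i] = annotate_with_model(frames[i])            # current key frame
        high_conf[i] = bool(results[i].get("boxes"))           # target frame present -> flag = 1
        if high_conf[i]:
            prelabeled.append(i)                               # add to the pre-annotation result set
        cur = i
        while cur + 1 < len(frames) and (cur + 1) not in keyframe_idx:
            nxt = cur + 1
            if similarity(frames[cur], results[cur], frames[nxt]) < sim_threshold:
                keyframe_idx.add(nxt)                          # abrupt change: promote to key frame
                break
            results[nxt] = map_annotation(results[cur], frames[nxt])
            high_conf[nxt] = high_conf[cur]                    # inherit the confidence flag
            if high_conf[nxt]:
                prelabeled.append(nxt)
            cur = nxt
        i = cur + 1                                            # continue with the next key frame
    return results, high_conf, prelabeled
```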
From the above description it can be understood that, under the technical idea of this specification, the target annotation flow shown in Figure 2 is a flow that cannot be bypassed, but it is not a flow that must be executed in full for every key frame. For example, if the image frame following the current key frame is itself a key frame, there are no non-key frames between the current key frame and the next key frame, and step 203, annotating the non-key frames after the current key frame based on the annotation result for the current key frame, need not be executed.
Looking back at the above process, during target annotation only the key frames of the video stream are processed by the labeling model; non-key frames are annotated using the annotation results of the key frames, which greatly reduces the amount of data processing. Further, during the annotation of non-key frames, the annotation result can be transferred to image frames with high similarity, while image frames with low similarity can be re-annotated by the labeling model as key frames, yielding more accurate annotation results. In this way, more effective target annotation can be provided.
According to an embodiment of another aspect, an apparatus for target annotation is also provided. Figure 5 shows a schematic block diagram of a target annotation apparatus according to an embodiment. As shown in Figure 5, the apparatus 500 for target annotation includes: an acquiring unit 51 configured to acquire a current key frame, the current key frame being one of a plurality of key frames determined from the image frames of the video stream; a first labeling unit 52 configured to perform target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target; and a second labeling unit 53 configured to perform target annotation on the non-key frames following the current key frame in the video stream, based on the annotation result for the current key frame.
According to one implementation, the apparatus 500 further includes an extraction unit (not shown) configured to extract the initial plurality of key frames in either of the following ways:
selecting a plurality of image frames from the video stream as key frames at predetermined time intervals;
inputting the video stream into a pre-trained frame extraction model and determining a plurality of key frames from the output of the frame extraction model.
In one implementation, the video stream is a vehicle video and the target is vehicle damage, and the apparatus 500 can further include a training unit (not shown) configured to train the labeling model as follows:
obtaining multiple vehicle pictures, each corresponding to a sample annotation result, where, if a vehicle picture contains vehicle damage, the annotation result of a single sample includes at least one damage frame, the damage frame being the smallest rectangular frame surrounding a continuous damage area;
training the labeling model at least on the multiple vehicle pictures.
According to one possible design, adjacent key frames in the video stream are, for convenience of description, denoted the first image frame and the second image frame respectively; for the current key frame, the initial first image frame is the current key frame and the initial second image frame is the frame following the current key frame;
the second labeling unit 53 is further configured to:
after annotation of the first image frame is complete, detect whether the second image frame is a key frame;
if the second image frame is not a key frame, detect the similarity between the second image frame and the first image frame;
if the similarity between the second image frame and the first image frame is greater than a preset similarity threshold, map the annotation result of the first image frame onto the second image frame to obtain the annotation result of the second image frame;
update the first image frame and the second image frame with the second image frame and the frame following it, respectively, and perform target annotation on the updated second image frame based on the annotation result of the updated first image frame.
If the similarity between the second image frame and the first image frame is less than the preset similarity threshold, the second image frame is determined to be a key frame.
In a further embodiment, the second labeling unit 53 is further configured to determine the similarity between the second image frame and the first image frame by:
determining a reference area in the first image frame based on the annotation result of the first image frame;
processing the reference area of the first image frame and the second image frame, respectively, with a predetermined convolutional neural network to obtain a first convolution result and a second convolution result;
using the first convolution result as a convolution kernel to convolve the second convolution result, yielding a third convolution result, each value in whose numerical array describes the similarity between a corresponding region of the second image frame and the reference area of the first image frame;
determining the similarity between the second image frame and the first image frame based on the largest value in the numerical array corresponding to the third convolution result.
In one embodiment, when the similarity between the second image frame and the first image frame is greater than the preset similarity threshold, the second labeling unit 53 is further configured to:
annotate, according to the annotation result of the first image frame, the image region of the second image frame to which the largest value corresponds.
In one embodiment, the second labeling unit 53 is further configured to:
if the first annotation result contains a target frame, set the initial reference area to the area enclosed by the target frame;
if the first annotation result does not contain a target frame, set the initial reference area to the area at a designated position in the current key frame.
In a further embodiment, the current key frame also corresponds to a confidence flag, and the second labeling unit is further configured to:
set the confidence flag of each non-key frame after the current key frame and before the next key frame to be consistent with the confidence flag corresponding to the annotation result of the current key frame.
The confidence flags include a high-confidence flag and a low-confidence flag. The high-confidence flag corresponds to the case in which the output of the labeling model for the corresponding key frame contains a target frame and the reference area indicates a predetermined target with high confidence; the low-confidence flag corresponds to the case in which the output of the labeling model for the corresponding key frame contains no target frame and the reference area does not indicate a predetermined target.
In this case, the apparatus 500 can further include an annotation-result determination unit (not shown) configured to:
add the image frames carrying a high-confidence flag to the annotation result set.
It should be noted that the apparatus 500 shown in Figure 5 is the apparatus embodiment corresponding to the method embodiment shown in Figure 2, and the corresponding descriptions in the method embodiment of Figure 2 also apply to the apparatus 500; they are not repeated here.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Figure 2.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, the memory storing executable code, and the processor implementing the method described in conjunction with Figure 2 when executing the executable code.
A person skilled in the art should be aware that, in one or more of the above examples, the functions described in the embodiments of this specification can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain in detail the purpose, technical solutions and beneficial effects of the technical idea of this specification. It should be understood that the above are only specific embodiments of the technical idea of this specification and are not intended to limit its scope of protection; any modification, equivalent replacement or improvement made on the basis of the technical solutions of the embodiments of this specification shall be included within the scope of protection of the technical idea of this specification.

Claims (22)

  1. A method for target annotation based on a video stream, the method comprising:
    obtaining a current key frame, the current key frame being one of a plurality of key frames determined from the image frames of the video stream;
    performing target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target;
    based on the annotation result for the current key frame, performing target annotation on the non-key frames following the current key frame in the video stream.
  2. The method of claim 1, wherein the initial plurality of key frames is extracted in either of the following ways:
    selecting a plurality of image frames from the video stream as key frames at predetermined time intervals;
    inputting the video stream into a pre-trained frame extraction model and determining a plurality of key frames from the output of the frame extraction model.
  3. The method of claim 1, wherein the video stream is a vehicle video, the target is vehicle damage, and the labeling model is trained as follows:
    obtaining a plurality of vehicle pictures, each vehicle picture corresponding to a sample annotation result, wherein, if a vehicle picture contains vehicle damage, the annotation result of a single sample includes at least one damage frame, the damage frame being the smallest rectangular frame surrounding a continuous damage area;
    training the labeling model at least on the plurality of vehicle pictures.
  4. The method of any one of claims 1-3, wherein adjacent key frames in the video stream are denoted the first image frame and the second image frame respectively, and, for the current key frame, the initial first image frame is the current key frame and the initial second image frame is the frame following the current key frame;
    performing target annotation on the non-key frames following the current key frame in the video stream, based on the annotation result for the current key frame, comprises:
    after annotation of the first image frame is complete, detecting whether the second image frame is a key frame;
    if the second image frame is not a key frame, detecting the similarity between the second image frame and the first image frame;
    if the similarity between the second image frame and the first image frame is greater than a preset similarity threshold, mapping the annotation result of the first image frame onto the second image frame to obtain the annotation result of the second image frame;
    updating the first image frame and the second image frame with the second image frame and the frame following the second image frame, respectively, and performing target annotation on the updated second image frame based on the annotation result of the updated first image frame.
  5. The method of claim 4, wherein the second image frame is determined to be a key frame if the similarity between the second image frame and the first image frame is less than the similarity threshold.
  6. The method of claim 4, wherein determining the similarity between the second image frame and the first image frame comprises:
    determining a reference area in the first image frame based on the annotation result of the first image frame;
    processing the reference area of the first image frame and the second image frame, respectively, with a predetermined convolutional neural network to obtain a first convolution result and a second convolution result;
    using the first convolution result as a convolution kernel to convolve the second convolution result, yielding a third convolution result, each value in whose numerical array describes the similarity between a corresponding region of the second image frame and the reference area of the first image frame;
    determining the similarity between the second image frame and the first image frame based on the largest value in the numerical array corresponding to the third convolution result.
  7. The method of claim 6, wherein, when the similarity between the second image frame and the first image frame is greater than the preset similarity threshold, mapping the annotation result of the first image frame onto the second image frame to obtain the second annotation result of the second image frame comprises:
    annotating, according to the annotation result of the first image frame, the image region of the second image frame to which the largest value corresponds.
  8. The method of claim 6, wherein determining the reference area in the first image frame based on the annotation result of the first image frame comprises:
    if the first annotation result contains a target frame, setting the initial reference area to the area enclosed by the target frame;
    if the first annotation result does not contain a target frame, setting the initial reference area to the area at a designated position in the current key frame.
  9. The method of claim 8, wherein the current key frame also corresponds to a confidence flag, and performing target annotation on the non-key frames following the current key frame in the video stream, based on the annotation result for the current key frame, comprises:
    setting the confidence flag of each non-key frame after the current key frame and before the next key frame to be consistent with the confidence flag corresponding to the annotation result of the current key frame.
  10. The method of claim 9, wherein the confidence flags include a high-confidence flag and a low-confidence flag, the high-confidence flag corresponding to the case in which the output of the labeling model for the corresponding key frame contains a target frame and the reference area indicates a predetermined target with high confidence, and the low-confidence flag corresponding to the case in which the output of the labeling model for the corresponding key frame does not contain a target frame and the reference area does not indicate a predetermined target;
    the method further comprising:
    adding the image frames carrying a high-confidence flag to a target annotation set.
  11. An apparatus for target annotation based on a video stream, the apparatus comprising:
    an acquiring unit configured to acquire a current key frame, the current key frame being one of a plurality of key frames determined from the image frames of the video stream;
    a first labeling unit configured to perform target annotation on the current key frame using a pre-trained labeling model to obtain an annotation result for the current key frame, the labeling model being used to mark, with a target frame, the region of a picture that contains a predetermined target;
    a second labeling unit configured to perform target annotation on the non-key frames following the current key frame in the video stream, based on the annotation result for the current key frame.
  12. The apparatus of claim 11, further comprising an extraction unit configured to extract the initial plurality of key frames in either of the following ways:
    selecting a plurality of image frames from the video stream as key frames at predetermined time intervals;
    inputting the video stream into a pre-trained frame extraction model and determining a plurality of key frames from the output of the frame extraction model.
  13. The apparatus of claim 11, wherein the video stream is a vehicle video, the target is vehicle damage, and the apparatus further comprises a training unit configured to train the labeling model as follows:
    obtaining a plurality of vehicle pictures, each vehicle picture corresponding to a sample annotation result, wherein, if a vehicle picture contains vehicle damage, the annotation result of a single sample includes at least one damage frame, the damage frame being the smallest rectangular frame surrounding a continuous damage area;
    training the labeling model at least on the plurality of vehicle pictures.
  14. The apparatus of any one of claims 11-13, wherein adjacent key frames in the video stream are denoted the first image frame and the second image frame respectively, and, for the current key frame, the initial first image frame is the current key frame and the initial second image frame is the frame following the current key frame;
    the second labeling unit is further configured to:
    after annotation of the first image frame is complete, detect whether the second image frame is a key frame;
    if the second image frame is not a key frame, detect the similarity between the second image frame and the first image frame;
    if the similarity between the second image frame and the first image frame is greater than a preset similarity threshold, map the annotation result of the first image frame onto the second image frame to obtain the annotation result of the second image frame;
    update the first image frame and the second image frame with the second image frame and the frame following the second image frame, respectively, and perform target annotation on the updated second image frame based on the annotation result of the updated first image frame.
  15. The apparatus of claim 14, wherein the second image frame is determined to be a key frame if the similarity between the second image frame and the first image frame is less than the similarity threshold.
  16. The apparatus of claim 14, wherein the second labeling unit is further configured to determine the similarity between the second image frame and the first image frame by:
    determining a reference area in the first image frame based on the annotation result of the first image frame;
    processing the reference area of the first image frame and the second image frame, respectively, with a predetermined convolutional neural network to obtain a first convolution result and a second convolution result;
    using the first convolution result as a convolution kernel to convolve the second convolution result, yielding a third convolution result, each value in whose numerical array describes the similarity between a corresponding region of the second image frame and the reference area of the first image frame;
    determining the similarity between the second image frame and the first image frame based on the largest value in the numerical array corresponding to the third convolution result.
  17. The apparatus of claim 16, wherein, when the similarity between the second image frame and the first image frame is greater than the preset similarity threshold, the second labeling unit is further configured to:
    annotate, according to the annotation result of the first image frame, the image region of the second image frame to which the largest value corresponds.
  18. The apparatus of claim 16, wherein the second labeling unit is further configured to:
    if the first annotation result contains a target frame, set the initial reference area to the area enclosed by the target frame;
    if the first annotation result does not contain a target frame, set the initial reference area to the area at a designated position in the current key frame.
  19. The apparatus of claim 18, wherein the current key frame also corresponds to a confidence flag, and the second labeling unit is further configured to:
    set the confidence flag of each non-key frame after the current key frame and before the next key frame to be consistent with the confidence flag corresponding to the annotation result of the current key frame.
  20. The apparatus of claim 19, wherein the confidence flags include a high-confidence flag and a low-confidence flag, the high-confidence flag corresponding to the case in which the output of the labeling model for the corresponding key frame contains a target frame and the reference area indicates a predetermined target with high confidence, and the low-confidence flag corresponding to the case in which the output of the labeling model for the corresponding key frame does not contain a target frame and the reference area does not indicate a predetermined target;
    the apparatus further comprising an annotation-result determination unit configured to:
    add the image frames carrying a high-confidence flag to the annotation result set.
  21. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-10.
  22. A computing device comprising a memory and a processor, wherein executable code is stored in the memory, and the processor, when executing the executable code, implements the method of any one of claims 1-10.
PCT/CN2020/093958 2019-09-20 2020-06-02 目标标注的方法及装置 WO2021051885A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910891996.2A CN110705405B (zh) 2019-09-20 2019-09-20 目标标注的方法及装置
CN201910891996.2 2019-09-20

Publications (1)

Publication Number Publication Date
WO2021051885A1 true WO2021051885A1 (zh) 2021-03-25

Family

ID=69196186

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093958 WO2021051885A1 (zh) 2019-09-20 2020-06-02 目标标注的方法及装置

Country Status (2)

Country Link
CN (1) CN110705405B (zh)
WO (1) WO2021051885A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609316A (zh) * 2021-07-27 2021-11-05 支付宝(杭州)信息技术有限公司 媒体内容相似度的检测方法和装置
CN113657173A (zh) * 2021-07-20 2021-11-16 北京搜狗科技发展有限公司 一种数据处理方法、装置和用于数据处理的装置
CN113792600A (zh) * 2021-08-10 2021-12-14 武汉光庭信息技术股份有限公司 一种基于深度学习的视频抽帧方法和系统

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705405B (zh) * 2019-09-20 2021-04-20 创新先进技术有限公司 目标标注的方法及装置
CN111918016A (zh) * 2020-07-24 2020-11-10 武汉烽火众智数字技术有限责任公司 一种视频通话中高效的实时画面标注方法
CN112053323A (zh) * 2020-07-31 2020-12-08 上海图森未来人工智能科技有限公司 单镜头多帧图像数据物体追踪标注方法和装置、存储介质
CN112533060B (zh) * 2020-11-24 2023-03-21 浙江大华技术股份有限公司 一种视频处理方法及装置
CN113343857B (zh) * 2021-06-09 2023-04-18 浙江大华技术股份有限公司 标注方法、装置、存储介质及电子装置
CN115482426A (zh) * 2021-06-16 2022-12-16 华为云计算技术有限公司 视频标注方法、装置、计算设备和计算机可读存储介质
CN113378958A (zh) * 2021-06-24 2021-09-10 北京百度网讯科技有限公司 自动标注方法、装置、设备、存储介质及计算机程序产品
CN113506610A (zh) * 2021-07-08 2021-10-15 联仁健康医疗大数据科技股份有限公司 标注规范生成方法、装置、电子设备及存储介质
CN113657307A (zh) * 2021-08-20 2021-11-16 北京市商汤科技开发有限公司 一种数据标注方法、装置、计算机设备及存储介质
CN113660469A (zh) * 2021-08-20 2021-11-16 北京市商汤科技开发有限公司 一种数据标注方法、装置、计算机设备及存储介质
CN115640422B (zh) * 2022-11-29 2023-12-22 深圳有影传媒有限公司 一种网络传媒视频数据分析监管系统
CN116189063B (zh) * 2023-04-24 2023-07-18 青岛润邦泽业信息技术有限公司 一种用于智能视频监控的关键帧优化方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385640A (zh) * 2016-08-31 2017-02-08 北京旷视科技有限公司 视频标注方法及装置
CN106682595A (zh) * 2016-12-14 2017-05-17 南方科技大学 一种图像内容标注方法和装置
CN110033011A (zh) * 2018-12-14 2019-07-19 阿里巴巴集团控股有限公司 车祸事故处理方法和装置、电子设备
CN110705405A (zh) * 2019-09-20 2020-01-17 阿里巴巴集团控股有限公司 目标标注的方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9471852B1 (en) * 2015-11-11 2016-10-18 International Business Machines Corporation User-configurable settings for content obfuscation
CN106375870B (zh) * 2016-08-31 2019-09-17 北京旷视科技有限公司 视频标注方法及装置
CN107610091A (zh) * 2017-07-31 2018-01-19 阿里巴巴集团控股有限公司 车险图像处理方法、装置、服务器及系统
US20190130583A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Still and slow object tracking in a hybrid video analytics system
CN109684956A (zh) * 2018-12-14 2019-04-26 深源恒际科技有限公司 一种基于深度神经网络的车辆损伤检测方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385640A (zh) * 2016-08-31 2017-02-08 北京旷视科技有限公司 视频标注方法及装置
CN106682595A (zh) * 2016-12-14 2017-05-17 南方科技大学 一种图像内容标注方法和装置
CN110033011A (zh) * 2018-12-14 2019-07-19 阿里巴巴集团控股有限公司 车祸事故处理方法和装置、电子设备
CN110705405A (zh) * 2019-09-20 2020-01-17 阿里巴巴集团控股有限公司 目标标注的方法及装置

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657173A (zh) * 2021-07-20 2021-11-16 北京搜狗科技发展有限公司 一种数据处理方法、装置和用于数据处理的装置
CN113657173B (zh) * 2021-07-20 2024-05-24 北京搜狗科技发展有限公司 一种数据处理方法、装置和用于数据处理的装置
CN113609316A (zh) * 2021-07-27 2021-11-05 支付宝(杭州)信息技术有限公司 媒体内容相似度的检测方法和装置
CN113792600A (zh) * 2021-08-10 2021-12-14 武汉光庭信息技术股份有限公司 一种基于深度学习的视频抽帧方法和系统
CN113792600B (zh) * 2021-08-10 2023-07-18 武汉光庭信息技术股份有限公司 一种基于深度学习的视频抽帧方法和系统

Also Published As

Publication number Publication date
CN110705405A (zh) 2020-01-17
CN110705405B (zh) 2021-04-20

Similar Documents

Publication Publication Date Title
WO2021051885A1 (zh) 目标标注的方法及装置
CN108229509B (zh) 用于识别物体类别的方法及装置、电子设备
US11748399B2 (en) System and method for training a damage identification model
CN113160257B (zh) 图像数据标注方法、装置、电子设备及存储介质
CN110569856B (zh) 样本标注方法及装置、损伤类别的识别方法及装置
CN111639629B (zh) 一种基于图像处理的猪只体重测量方法、装置及存储介质
TWI709085B (zh) 用於對車輛損傷影像進行損傷分割的方法、裝置、電腦可讀儲存媒體和計算設備
WO2020238256A1 (zh) 基于弱分割的损伤检测方法及装置
CN103870597B (zh) 一种无水印图片的搜索方法及装置
WO2020253508A1 (zh) 异常细胞检测方法、装置及计算机可读存储介质
CN112446363A (zh) 一种基于视频抽帧的图像拼接与去重方法及装置
CN115359239A (zh) 风电叶片缺陷检测定位方法、装置、存储介质和电子设备
CN110599453A (zh) 一种基于图像融合的面板缺陷检测方法、装置及设备终端
CN113515655A (zh) 一种基于图像分类的故障识别方法及装置
CN116612417A (zh) 利用视频时序信息的特殊场景车道线检测方法及装置
CN113902740A (zh) 图像模糊程度评价模型的构建方法
CN112967224A (zh) 一种基于人工智能的电子电路板检测系统、方法及介质
CN114821513B (zh) 一种基于多层网络的图像处理方法及装置、电子设备
CN116596903A (zh) 缺陷识别方法、装置、电子设备及可读存储介质
CN115457585A (zh) 作业批改的处理方法、装置、计算机设备及可读存储介质
CN114359931A (zh) 一种快递面单识别方法、装置、计算机设备及存储介质
CN109214398B (zh) 一种从连续图像中量测杆体位置的方法和系统
CN112232272B (zh) 一种激光与视觉图像传感器融合的行人识别方法
CN117218162B (zh) 一种基于ai的全景追踪控视系统
CN112634349A (zh) 一种基于遥感影像的茶园面积估计方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20866283; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20866283; Country of ref document: EP; Kind code of ref document: A1)