CN113707280B - Method, device, medium and computing equipment for expanding labeled data set - Google Patents

Method, device, medium and computing equipment for expanding labeled data set

Info

Publication number
CN113707280B
Authority
CN
China
Prior art keywords
image
frame
frame image
data set
original video
Prior art date
Legal status
Active
Application number
CN202111264090.1A
Other languages
Chinese (zh)
Other versions
CN113707280A (en)
Inventor
赵秋
曾凡
Current Assignee
Xuanwei Beijing Biotechnology Co ltd
Original Assignee
Xuanwei Beijing Biotechnology Co ltd
Priority date
Filing date
Publication date
Application filed by Xuanwei Beijing Biotechnology Co ltd filed Critical Xuanwei Beijing Biotechnology Co ltd
Priority to CN202111264090.1A
Publication of CN113707280A
Application granted
Publication of CN113707280B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 - ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 - ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a method, an apparatus, a medium and a computing device for expanding an annotated data set. The method comprises: acquiring an original video and generating an annotated data set from the original video, the annotated data set comprising annotated images and the annotation files corresponding to the annotated images; comparing each annotated image with each frame image in the original video to obtain the corresponding frame image of the annotated image in the original video; intercepting, from the original video, the frame image sequence within a preset time period before and after the corresponding frame image; and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and adding each frame image in the frame image sequence, together with its annotation file, to the annotated data set. The method rapidly expands the data set, improves data labeling efficiency, and reduces the development cost of the data set.

Description

Method, device, medium and computing equipment for expanding labeled data set
Technical Field
The embodiment of the invention relates to the field of image data expansion, in particular to an annotation data set expansion method, device, medium and computing equipment.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Ultrasonic instruments have become one of the rapid, safe and low-cost medical diagnostic tools of the modern medical industry. Ultrasonic endoscopic images differ from white-light or electronic staining video images: interpreting them requires corresponding anatomical knowledge and a large amount of accumulated experience, so the number of doctors who have mastered ultrasonic scanning skills is very limited. In recent years, with the continuous development of artificial intelligence, the combination of artificial intelligence and medicine has promoted the rapid development of the medical industry. A data set for training an artificial-intelligence image recognition model usually needs tens of thousands or even hundreds of thousands of images, which are generally obtained by having a specialized data labeling company or an internist capture and label frames from videos or images. Because labeling personnel must have the corresponding domain knowledge, labeling personnel are in short supply, manual labeling is inefficient and error-prone, and the result is low labeling precision, low speed and high research and development cost.
At present, some annotation data expansion methods have appeared, but they either transform the images while keeping the labels unchanged, or define a transformation for both the images and the labels. The expanded data obtained in this way fails to significantly improve the segmentation performance of medical images, so the expanded annotation data is of little value for artificial-intelligence-based medical research.
Disclosure of Invention
In view of the above problems in the prior art, it is an object of the present disclosure to provide an annotation data set expansion method, apparatus, medium, and computing device to solve at least the problems in the prior art.
In a first aspect of the embodiments of the present invention, there is provided an annotation data set expansion method, including:
acquiring an original video, and generating a labeled data set according to the original video, wherein the labeled data set comprises a labeled image and a label file corresponding to the labeled image;
comparing the marked image with each frame of image in the original video to obtain a corresponding frame of image of the marked image in the original video;
intercepting frame image sequences in a preset time period before and after the corresponding frame image from the original video;
and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image to the annotated data set.
In one embodiment of the invention, the annotation data set expansion method comprises the following steps:
the generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image comprises:
acquiring an annotation file of a corresponding frame image of the annotated image in the original video according to the annotation file corresponding to the annotated image;
determining a plurality of labeling areas in the corresponding frame image according to the labeling file of the corresponding frame image;
calculating the characteristic value of each labeling area;
and generating an annotation file for each frame image in the frame image sequence according to the characteristic value of each annotation region.
Further, the calculating the feature value of each labeled region includes:
converting the images in the range of the plurality of marked areas into a gray scale image;
obtaining all feature points in each marked region range by using a scale invariant feature transformation algorithm on the gray level image in each marked region range;
and generating a characteristic value array of each labeled area according to all the characteristic points in the range of each labeled area.
Further, the generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region includes:
generating a new labeled area corresponding to each frame image in each frame image of the frame image sequence according to the characteristic value array of each labeled area;
and generating a labeling file according to the new labeling area corresponding to each frame image.
Further, the generating a new labeled region corresponding to each frame image in each frame image of the frame image sequence according to the feature value array of each labeled region includes:
acquiring the coordinates of each characteristic point in the characteristic value array of each marked area;
splitting the coordinates of each characteristic point, and recombining the split abscissa and ordinate to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array:
and generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
Further, the generating a new labeled area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array includes:
acquiring a bounding box of a new labeling area according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the distance between the surrounding frame of the new marked area and the coordinate vertex of the surrounding frame of the marked area;
judging whether the coordinate vertex distance is smaller than a preset transformation threshold value or not;
if so, reserving a new labeling area, and storing the new labeling area into a labeling file.
In another embodiment of the present invention, the annotation data set expansion method comprises:
the comparing the marked image with each frame of image in the original video to obtain the corresponding frame of image of the marked image in the original video includes:
acquiring a timestamp of the marked image and a timestamp of each frame image in the original video;
comparing the time stamp of the marked image with the time stamp of each frame image in the original video;
and taking the frame image in the original video, which has the same time stamp as the marked image, as the corresponding frame image of the marked image in the original video.
Further, the obtaining the timestamp of the labeled image includes:
traversing each image in the labeled data set, and intercepting a first timestamp region of interest of each image;
identifying a first timestamp string in the first timestamp region of interest;
and converting the first time stamp character string according to a preset time stamp format to obtain the time stamp of the marked image.
Further, said intercepting the first time-stamped region of interest of each image comprises:
intercepting the first time stamp region of interest of each image by an OCR algorithm.
Further, the identifying a first timestamp string in the first timestamp interest region includes:
a first time stamp character string in the first time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
Further, acquiring a time stamp of each frame image in the original video, including:
reading an original video, and acquiring each frame of image in the original video;
intercepting a second time stamp region of interest of each frame of image;
identifying a second timestamp string in the second timestamp region of interest;
and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
Further, the taking the frame image of the original video with the same timestamp as the annotated frame image in the original video as the corresponding frame image of the annotated frame image includes:
playing each frame of image in the original video according to the sequence of the frame number;
in the playing process, comparing the time stamp of each frame of image in the original video with the time stamp of the labeled image;
and if the time stamp of each frame of image in the original video is the same as that of the marked image, taking the frame image with the same time stamp as the marked image as the corresponding frame image of the marked image in the original video.
In yet another embodiment of the present invention, the annotation data set augmentation method comprises:
and screening a plurality of frame images in the frame image sequence to obtain effective frame images.
Further, the screening the plurality of frame images in the frame image sequence to obtain the valid frame image includes:
calculating an information entropy matrix of each frame image in the frame image sequence;
calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image;
and screening out effective frame images according to the information entropy matrix change rate.
Further, the screening out the effective frame image according to the information entropy matrix change rate includes:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrix from large to small according to the derivative of the change rate fitting function;
and taking the preset number of frame images sequenced in the front as effective frame images.
Further, the calculating an information entropy matrix of each frame image in the frame image sequence includes:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the information entropy of the adjacent matrix according to the coordinate range of the reduced image;
and obtaining an information entropy matrix of each frame image according to the information entropy of the adjacent matrix.
Further, the preset reduction ratio is 1/10.
Further, the calculating the adjacency matrix information entropy according to the coordinate range of the reduced image includes:
calculating an integral matrix when the adjacent matrix traverses each pixel position of the reduced image from the central point according to the coordinate range of the reduced image;
flattening the whole matrix to obtain a flat function;
calculating the ratio of the points in the flat function, which are the same as the pixels of the reduced image, in the whole matrix;
and calculating the information entropy of the adjacency matrix according to the ratio.
Further, the calculating the information entropy matrix change rate of the adjacent frame image according to the information entropy matrix of each frame image includes:
obtaining correlation coefficients of two adjacent matrixes;
and calculating the information entropy matrix change rate of the adjacent frame images according to the correlation coefficients of the two adjacent matrixes.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and obtaining the correlation coefficient of the two adjacent matrixes by adding an entropy self-defining formula of noise.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and calculating the correlation coefficients of the two adjacent matrices by one or more of cosine similarity, adjusted cosine similarity, Pearson correlation coefficient, Jaccard similarity coefficient, Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
In a second aspect of the embodiments of the present invention, there is provided an annotation data set extension apparatus including:
the acquisition module is used for acquiring an original video and generating a labeled data set according to the original video, wherein the labeled data set comprises a labeled image and a label file corresponding to the labeled image;
the comparison module is used for comparing the marked image with each frame of image in the original video to obtain a corresponding frame of image of the marked image in the original video;
the intercepting module is used for intercepting a frame image sequence in a preset time period before and after the corresponding frame image from the original video;
and the expansion module is used for generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image sequence to the annotated data set.
In a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program for executing the method of any one of the first aspect above.
In a fourth aspect of embodiments of the present invention, there is provided a computing device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to perform the method of any of the first aspect.
According to the method, apparatus, medium and computing device for expanding an annotated data set of the embodiments of the invention, an original video is obtained and an annotated data set is generated from it; the annotated images in the data set are compared with each frame image in the original video to obtain the corresponding frame image of each annotated image; the frame image sequence within a preset time period before and after the corresponding frame image is intercepted from the original video; an annotation file is generated for each frame image in the sequence according to the annotation file of the annotated image; and each frame image in the sequence, together with its annotation file, is added to the annotated data set. This enables rapid expansion of the data set, improves data labeling efficiency, reduces labor expenditure and data set development cost, allows the expanded annotation data to be quickly applied to the data set labeling work of endoscope artificial intelligence, and shortens the training period of the relevant artificial intelligence models.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a flow chart illustrating an augmentation method for annotated data set according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an augmentation apparatus for annotated data set according to an embodiment of the present invention;
FIG. 3 schematically shows a schematic of the structure of a medium according to an embodiment of the invention;
FIG. 4 schematically illustrates a structural diagram of a computing device of an embodiment of the invention;
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, a method, a device, a medium and a computing device for expanding a labeled data set are provided.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Exemplary method
A method for augmentation of annotated data sets according to an exemplary embodiment of the invention is described below with reference to FIG. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
The invention is further described below with reference to specific embodiments.
The embodiment of the invention provides an annotation data set expansion method, which comprises the following steps:
step S101, acquiring an original video, and generating a labeled data set according to the original video, wherein the labeled data set comprises a labeled image and a label file corresponding to the labeled image;
step S102, comparing the marked image with each frame of image in the original video to obtain a corresponding frame of image of the marked image in the original video;
step S103, intercepting frame image sequences in a preset time period before and after the corresponding frame image from the original video;
and step S104, generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image to the annotated data set.
A traditional artificial intelligence image recognition data set usually needs tens of thousands or hundreds of thousands of images, which are captured and labeled by a specialized data labeling company or an internist. Because labeling personnel must have the corresponding domain knowledge, labeling personnel are in short supply, manual labeling is inefficient and error-prone, and labeling precision is low, speed is slow and research and development costs are high. In the related art, the image is transformed while the label is kept unchanged, or a transformation is defined for both the image and the label, so as to expand the original data set; however, the expanded data obtained in this way cannot obviously improve the segmentation performance of medical images, so the expanded annotation data is of little significance for artificial-intelligence-based medical research.
According to the method, the data set can be rapidly expanded, the data labeling efficiency is improved, the labor expenditure is reduced, the data set development cost is reduced, the expanded labeled data can be rapidly applied to the data set labeling work of the endoscope artificial intelligence, and the training period of the relevant artificial intelligence model is shortened.
In some embodiments, the data is, for example, upper gastrointestinal tract ultrasonic endoscope data, and the annotation file records the adjacent organs and/or adjacent structures in the upper gastrointestinal tract ultrasonic endoscope data. According to the method of this embodiment, label identification can be performed on adjacent organs and adjacent structures in upper gastrointestinal tract ultrasonic endoscopy, with assisted target tracking and adaptive boundary screenshots. The adjacent organs include the pancreas, the gallbladder, the bile duct and the like; the adjacent structures include lymph nodes, tumors, cysts, blood vessels and the like. It should be noted that the present application does not limit the specific type of data, and those skilled in the art can select corresponding data according to actual needs.
How the augmentation of the annotated data set is performed is explained below with reference to the accompanying drawings:
First, step S101 is executed to obtain an original video and generate a labeled data set from the original video, where the labeled data set includes labeled images and the label files corresponding to the labeled images. The labeled data set may be obtained by manually (for example, by a doctor) capturing a plurality of frame images at intervals for each part to be identified and then labeling the captured images.
Taking upper gastrointestinal ultrasound data as an example: ultrasound images are manually captured and labeled to form a labeled ultrasound data set, which is stored under a preset storage path. When the ultrasound labeled data set is to be expanded, the labeled data set is read from the preset storage path and the files are traversed to obtain all the images and their corresponding label files, which are stored in corresponding arrays: an image array image_arr and an annotation array label_arr, where each picture corresponds to one annotation file.
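As a concrete illustration of this loading step, the following is a minimal Python sketch; the directory layout (one JSON annotation file sharing each image's base name), the storage of image paths rather than pixel data, and the function name are assumptions for illustration rather than details fixed by the patent.

```python
import json
import os

def load_annotated_dataset(dataset_dir):
    """Traverse the annotated data set and build image_arr / label_arr,
    assuming each image has a same-named .json annotation file."""
    image_arr, label_arr = [], []
    for name in sorted(os.listdir(dataset_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() not in (".png", ".jpg", ".jpeg", ".bmp"):
            continue
        label_path = os.path.join(dataset_dir, base + ".json")
        if not os.path.exists(label_path):
            continue  # skip images that have no matching annotation file
        image_arr.append(os.path.join(dataset_dir, name))
        with open(label_path, "r", encoding="utf-8") as f:
            label_arr.append(json.load(f))
    return image_arr, label_arr
```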
Next, step S102 is executed to compare the annotated image with each frame of image in the original video, and obtain a corresponding frame of image of the annotated image in the original video, which specifically includes:
acquiring a timestamp of the marked image and a timestamp of each frame image in the original video; acquiring a timestamp of the labeled image, comprising: traversing each image in the labeled data set, and intercepting a first timestamp region of interest of each image; identifying a first timestamp string in the first timestamp region of interest; and converting the first time stamp character string according to a preset time stamp format to obtain the time stamp of the marked image.
Specifically, a timestamp array timestamp_arr corresponding to the image array image_arr is created. The image array image_arr is traversed to obtain each picture tmp_img, and the timestamp part of tmp_img is cropped to obtain the timestamp ROI (Region Of Interest). The numeric and symbol OCR (optical character recognition) algorithm identifies the timestamp character string corresponding to the picture tmp_img; if, for example, the recognized result is '15/16/2020 16:05:15', the timestamp character string is converted according to the timestamp format '%d/%M/%Y %H:%M:%S' used in the data set images and video to obtain the time tmp_t mapped to tmp_img, and tmp_t is appended to the end of the timestamp array timestamp_arr.
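The timestamp extraction could be sketched as follows with OpenCV and pytesseract; the ROI coordinates, the strptime format string and the helper name are assumptions chosen for illustration, not values fixed by the patent. The function works on an in-memory BGR image, whether loaded with cv2.imread or grabbed from a video.

```python
from datetime import datetime

import cv2
import pytesseract

TS_FORMAT = "%d/%m/%Y %H:%M:%S"   # assumed on-screen timestamp format
TS_ROI = (10, 10, 260, 40)        # assumed (x, y, w, h) of the timestamp overlay

def read_timestamp(image, roi=TS_ROI, fmt=TS_FORMAT):
    """Crop the timestamp region of interest from a BGR image and OCR it
    into a datetime object."""
    x, y, w, h = roi
    gray = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # --psm 7 treats the crop as a single text line, which suits a timestamp
    text = pytesseract.image_to_string(gray, config="--psm 7")
    return datetime.strptime(text.strip(), fmt)
```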
Acquiring a time stamp of each frame image in an original video, wherein the time stamp comprises the following steps: reading an original video, and acquiring each frame of image in the original video; intercepting a second time stamp region of interest of each frame of image; identifying a second timestamp string in a second timestamp region of interest; and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
Specifically, the original ultrasonic video is read and each frame image video_img is obtained; the timestamp ROI of video_img is cropped, the timestamp character string is identified using the numeric and symbol OCR (optical character recognition) algorithm, and the timestamp character string is converted according to the timestamp format in the video to obtain the time video_t mapped to video_img.
In some embodiments, the time stamp region of interest of each image is intercepted by an OCR algorithm; a time stamp character string in the time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
Comparing the time stamp of the marked image with the time stamp of each frame image in the original video;
and taking the frame image in the original video, which has the same time stamp as the marked image, as the corresponding frame image of the marked image in the original video.
In some embodiments, taking the frame image in the original video whose timestamp is the same as that of the annotated image as the corresponding frame image of the annotated image in the original video specifically includes:
playing each frame of image in the original video according to the sequence of the frame number;
in the playing process, comparing the time stamp of each frame of image in the original video with the time stamp of the labeled image;
and if the time stamp of each frame of image in the original video is the same as that of the marked image, taking the frame image with the same time stamp as the marked image as the corresponding frame image of the marked image in the original video.
Specifically, the timestamp array timestamp_arr is traversed to obtain the time tmp_t corresponding to each image tmp_img. If tmp_t is not equal to video_t, nothing is done and the next frame is compared; if tmp_t is equal to video_t, the frame number video_index is recorded (video_index is incremented by 1 for every video frame played).
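A possible shape of this matching loop, reusing a frame-level timestamp reader such as the read_timestamp sketch above (passed in as a callable), is shown below; the iteration with cv2.VideoCapture is an implementation assumption.

```python
import cv2

def find_matching_frame(video_path, target_ts, read_frame_ts):
    """Play the video frame by frame and return the frame number (video_index)
    of the first frame whose timestamp equals the annotated image's timestamp."""
    cap = cv2.VideoCapture(video_path)
    video_index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                return None            # reached the end without a match
            if read_frame_ts(frame) == target_ts:
                return video_index
            video_index += 1           # incremented by 1 for every frame played
    finally:
        cap.release()
```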
Next, step S103 is executed to intercept a frame image sequence in a preset time period before and after a corresponding frame image from the original video;
and the frames of the original ultrasonic video within N seconds before and after the position of frame number video_index are stored. If the video FPS (Frames Per Second, the number of image frames per second) is 30, the frame image sequence comprises N*30 frame screenshots tmp_shotcut, which are put into a temporary processing array tmp_shotcut_arr in order.
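Extracting the surrounding window of frames might then be sketched as below; seeking with CAP_PROP_POS_FRAMES and the symmetric window size are assumptions about the implementation.

```python
import cv2

def grab_window(video_path, video_index, n_seconds):
    """Collect the frames within n_seconds before and after the matched frame,
    i.e. the temporary processing array tmp_shotcut_arr."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    half = int(round(n_seconds * fps))
    start = max(0, video_index - half)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    tmp_shotcut_arr = []
    for _ in range(2 * half + 1):
        ok, frame = cap.read()
        if not ok:
            break
        tmp_shotcut_arr.append(frame)
    cap.release()
    return tmp_shotcut_arr
```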
Next, step S104 is executed to generate an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and add each frame image in the frame image sequence and its corresponding annotation file to the annotated data set.
The method for generating the annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image specifically comprises the following steps:
step S1041 is performed first: acquiring an annotation file of a corresponding frame image of the annotated image in the original video according to the annotation file corresponding to the annotated image;
Specifically, the annotated image tmp_img and its time tmp_t are located through the frame number video_index, and since the image array image_arr and the annotation array label_arr are in a one-to-one mapping relationship, the annotation file tmp_label corresponding to the frame image at frame number video_index can be obtained.
Next, step S1042 is executed: determining a plurality of labeled areas in the corresponding frame image according to the labeled file of the corresponding frame image;
next, step S1043: calculating the characteristic value of each labeling area;
in some embodiments, calculating the feature value of each labeled region specifically includes:
converting the images in the range of the plurality of marked areas into a gray scale image;
obtaining all feature points in each marked region range by using a scale invariant feature transformation algorithm on the gray level image in each marked region range;
and generating a characteristic value array of each labeled area according to all the characteristic points in the range of each labeled area.
Specifically, the plurality of labeled areas area_1, area_2, ..., area_n in the picture are read from the annotation file tmp_label, where each labeled area area_i contains two vertex coordinates and satisfies
area_i = ((x_i, y_i), (x_i + w_i, y_i + h_i)),
where i is the serial number of the labeled area within the image, x_i and y_i are the horizontal-axis and vertical-axis coordinates of the labeled area in the picture, and w_i and h_i are the width and height of the labeled area, respectively. The image within the range of each area_i is converted into a grayscale image, and the "scale invariant feature transform" (SIFT) algorithm is then called to obtain all feature points within each area range; these form, for each labeled area, a feature value array
F_i = {p_(i,1), p_(i,2), ..., p_(i,m)},
where m is the number of feature points contained in each labeled area.
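A minimal sketch of this per-region step, assuming OpenCV's SIFT implementation (cv2.SIFT_create) stands in for the "scale invariant feature transform" call and that regions are given as (x, y, w, h) tuples:

```python
import cv2

def region_feature_points(image, areas):
    """For each labeled area (x, y, w, h), run SIFT on the grayscale crop and
    return the keypoint coordinates mapped back to full-image coordinates."""
    sift = cv2.SIFT_create()
    feature_arrays = []
    for (x, y, w, h) in areas:
        gray = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        keypoints = sift.detect(gray, None)
        feature_arrays.append([(kp.pt[0] + x, kp.pt[1] + y) for kp in keypoints])
    return feature_arrays
```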
Next, step S1044 is executed: and generating an annotation file for each frame image in the frame image sequence according to the characteristic value of each annotation region.
In some embodiments, generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region specifically includes:
generating a new labeled area corresponding to each frame image in each frame image of the frame image sequence according to the characteristic value array of each labeled area;
and generating a labeling file according to the new labeling area corresponding to each frame image.
In the above step, generating a new labeled region corresponding to each frame image in each frame image of the frame image sequence according to the feature value array of each labeled region includes:
acquiring the coordinates of each characteristic point in the characteristic value array of each marked area;
splitting coordinates of each characteristic point, and recombining the split abscissa and ordinate to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array:
and generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
In some embodiments, generating a new labeled area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array specifically includes:
acquiring a bounding box of the new labeling area according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the distance between the surrounding frame of the new marked area and the coordinate vertex of the surrounding frame of the marked area;
judging whether the distance between the vertexes of the coordinates is smaller than a preset transformation threshold value or not;
if so, reserving the new labeling area, and storing the new labeling area into the labeling file.
Specifically, the feature value arrays are traversed to obtain the feature points of each labeled area; the feature point coordinates are split and recombined into a new abscissa array X_i and a new ordinate array Y_i, and their extremes are found:
x_min = min(X_i), x_max = max(X_i), y_min = min(Y_i), y_max = max(Y_i).
These extremes define a new feature region area'_i = ((x_min, y_min), (x_max, y_max)). By comparing the coordinates of the labeled area area_i with the coordinates of the feature region area'_i, the starting point and the deformation size of the feature bounding box are adjusted dynamically: the vertex distance md between the corresponding corner coordinates of the two bounding boxes is measured with the Minkowski Distance, and delta is a preset transformation threshold; if the distance is within the transformation threshold, the feature region is selected as the result and no further transformation of the region range is carried out.
The new area area'_i is appended to the end of the labeled-area array, so that finally the i-th element of each annotation array label_arr corresponds to one labeled area, and the index and area are composed into a key-value dictionary table label_area_dict, where the key is i and the value is the labeled area.
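The box refitting and the Minkowski-distance check can be condensed into a sketch like the following; the corner pairing and the exponent p are illustrative assumptions rather than the patent's exact formulation.

```python
import numpy as np

def refit_box(points, old_box, delta, p=2):
    """Propose a new bounding box from the feature-point extremes and keep it
    only if its corners stay within Minkowski distance delta of the old box."""
    xs = np.array([pt[0] for pt in points], dtype=float)
    ys = np.array([pt[1] for pt in points], dtype=float)
    new_box = (xs.min(), ys.min(), xs.max(), ys.max())   # (x_min, y_min, x_max, y_max)
    old = np.array(old_box, dtype=float).reshape(2, 2)   # two corner points
    new = np.array(new_box, dtype=float).reshape(2, 2)
    md = (np.abs(old - new) ** p).sum(axis=1) ** (1.0 / p)  # per-corner Minkowski distance
    return new_box if (md < delta).all() else tuple(old_box)
```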
In general, the method identifies and matches annotation areas across multiple frames of the video, selects a new annotation bounding box for the target of the instant screenshot within the continuous frame image sequence, automatically adjusts the boundary of the target-tracking bounding box according to the boundary range of the object, and generates the screenshot and the annotation file at the same time.
In some embodiments, the contents of new_img_arr and new_label_arr are randomly stored into the new arrays img_train, img_test, label_train and label_test, respectively, according to the training/testing ratio set by the service.
A train folder and a test folder are created; img_train and label_train are traversed and their contents saved to the hard disk under the train folder, and img_test and label_test are traversed and their contents saved to the hard disk under the test folder.
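One way to realise this split-and-save step is sketched below; the on-disk layout (per-image JSON files copied next to the images) and the 80/20 default ratio are assumptions.

```python
import json
import os
import random
import shutil

def split_and_save(new_img_arr, new_label_arr, out_dir, ratio=0.8):
    """Randomly split the expanded images/annotations into train and test folders."""
    pairs = list(zip(new_img_arr, new_label_arr))
    random.shuffle(pairs)
    cut = int(len(pairs) * ratio)
    for subset, items in (("train", pairs[:cut]), ("test", pairs[cut:])):
        subset_dir = os.path.join(out_dir, subset)
        os.makedirs(subset_dir, exist_ok=True)
        for img_path, label in items:
            shutil.copy(img_path, subset_dir)
            base = os.path.splitext(os.path.basename(img_path))[0]
            with open(os.path.join(subset_dir, base + ".json"), "w", encoding="utf-8") as f:
                json.dump(label, f)
```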
The doctor only needs to capture several frame images at intervals for each part to be identified; the application then automatically processes the video, automatically matches the manually annotated images to frames in the video, and then automatically tracks, captures, selects and matches, omitting the manual batch labeling step. The quantity is automatically expanded by a factor of about 100 on the basis of the original data set, which greatly improves labeling efficiency, reduces labor expenditure and data set development cost, can be quickly applied to the data set labeling work of ultrasonic-endoscope artificial intelligence, and shortens the training period of the relevant artificial intelligence models.
In another embodiment of the present embodiment, a plurality of frame images in the frame image sequence are further filtered to obtain valid frame images, and specifically,
calculating an information entropy matrix of each frame image in the frame image sequence;
calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image;
and screening out effective frame images according to the information entropy matrix change rate.
Further, screening out effective frame images according to the information entropy matrix change rate comprises:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrixes from large to small according to the change rate fitting function derivative;
and taking the preset number of frame images sequenced in the front as effective frame images.
In some embodiments, calculating an entropy matrix of information for each frame image in the sequence of frame images comprises:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the information entropy of the adjacent matrix according to the coordinate range of the reduced image;
and obtaining an information entropy matrix of each frame image according to the information entropy of the adjacent matrix.
The preset reduction scale is for example 1/10,
the temporary screenshot array tmp_shotcut_arr is traversed to obtain each screenshot tmp_shotcut, and the width and height of each screenshot are reduced by a factor of 10 to obtain a reduced image S. The information entropy of each adjacency matrix is calculated by a user-defined algorithm and the results form an entropy matrix, where N is the adjacency calculation step size, W and H are the pixel width and height of the reduced image, col and row are the pixel indices used as the adjacency matrix traverses the reduced image S, and S(x, y) is the pixel of the image at position (x, y).
In some embodiments, calculating the adjacency matrix information entropy according to the coordinate range of the reduced image includes:
calculating, according to the coordinate range of the reduced image, the whole (integral) matrix obtained when the adjacency matrix traverses each pixel position of the reduced image starting from the center point;
flattening the whole matrix to obtain a flat function;
counting how many element values in the current matrix equal the pixel value of S at that position, and dividing this count by the number of all current pixels to obtain, for each point of the flat function, a ratio (probability) p_k;
calculating the adjacency matrix information entropy from these ratios as E = -sum_k p_k * log2(p_k).
The entropies of all neighbourhoods form the information entropy matrix E_M; with step length N = 5, the adjacency matrix has a size of 10 x 10.
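A simplified stand-in for the entropy matrix computation is sketched below; it replaces the custom adjacency/whole-matrix formulation with a plain block-wise Shannon entropy over the reduced image, so it should be read as an approximation of the idea rather than the claimed formula.

```python
import cv2
import numpy as np

def entropy_matrix(frame, scale=0.1, step=5):
    """Shrink the frame, then compute the Shannon entropy of the pixel values
    inside each (2*step x 2*step) block, yielding an entropy matrix E_M."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    h, w = small.shape
    rows, cols = h // (2 * step), w // (2 * step)
    e_m = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = small[r * 2 * step:(r + 1) * 2 * step,
                          c * 2 * step:(c + 1) * 2 * step]
            _, counts = np.unique(block, return_counts=True)
            p = counts / counts.sum()
            e_m[r, c] = -(p * np.log2(p)).sum()   # Shannon entropy of the block
    return e_m
```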
Specifically, the method for calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image comprises the following steps:
obtaining correlation coefficients of two adjacent matrixes;
and calculating the information entropy matrix change rate of the adjacent frame images according to the correlation coefficients of the two adjacent matrixes.
Obtaining the correlation coefficients of two adjacent matrices includes, but is not limited to, the following ways:
acquiring correlation coefficients of two adjacent matrixes by adding an entropy self-defined formula of noise;
calculating the change rate of the information entropy matrix of adjacent frame images according to the correlation coefficients of the two adjacent matrices: if the length of tmp_shotcut_arr is L, then for each pair of adjacent frames j and j+1 (j = 1, ..., L-1) the correlation coefficient c_j between their entropy matrices is obtained, and each result is inserted into the correlation array tmp_c_arr.
The correlation coefficients of the two adjacent matrices can also be calculated by one or more of the following methods: cosine similarity, adjusted cosine similarity, Pearson correlation coefficient, Jaccard similarity coefficient, Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
It should be noted that the present application does not limit the specific method for obtaining the correlation coefficients of the two adjacent matrices.
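Putting the change-rate ranking together, a Pearson-correlation-based sketch (one of the options listed above) could look like this; sorting the change rates directly replaces the derivative-of-fitting-function step for brevity.

```python
import numpy as np

def select_valid_frames(entropy_matrices, frames, keep):
    """Rank adjacent-frame change rates (1 - correlation of consecutive entropy
    matrices) and keep the frames where the content changes the most."""
    rates = []
    for a, b in zip(entropy_matrices[:-1], entropy_matrices[1:]):
        corr = np.corrcoef(a.ravel(), b.ravel())[0, 1]
        rates.append(1.0 - corr)              # large value = large content change
    order = np.argsort(rates)[::-1]           # sort change rates from large to small
    return [frames[i + 1] for i in order[:keep]]
```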
Because the number of automatically captured target frame images is large, there are many consecutive, highly repetitive images. A data set for artificial intelligence training requires the data to share certain common features, but the content must not be overly repetitive, otherwise the training effect is affected. Therefore, the frame images in the frame image sequence are screened to obtain valid frame images, so that the automatically captured images are properly selected and the effectiveness of the training data is improved.
Exemplary devices
Having described the method of an exemplary embodiment of the present invention, next, an explanation is given of an annotation data set expansion apparatus of an exemplary embodiment of the present invention with reference to fig. 2, the apparatus comprising:
an obtaining module 201, configured to obtain an original video, and generate a labeled data set according to the original video, where the labeled data set includes a labeled image and a label file corresponding to the labeled image;
a comparison module 202, configured to compare the annotated image with each frame of image in the original video, and obtain a corresponding frame of image of the annotated image in the original video;
an intercepting module 203, configured to intercept, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image;
the expansion module 204 is configured to generate an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expand each frame image in the frame image sequence and its corresponding annotation file to the annotated data set.
In an embodiment of this embodiment, the expansion module 204 includes:
the computing unit is used for determining a plurality of labeled areas in the corresponding frame image according to the labeled file of the corresponding frame image and computing the characteristic value of each labeled area;
in some embodiments, the computing unit is configured to convert the images within the plurality of labeled regions into a grayscale image; obtaining all feature points in each marked region range by using a scale invariant feature transformation algorithm on the gray level image in each marked region range; and generating a characteristic value array of each labeled area according to all the characteristic points in the range of each labeled area.
And the generating unit is used for generating an annotation file for each frame image in the frame image sequence according to the characteristic value of each annotation area.
In some embodiments, the generating unit is configured to generate a new labeled region corresponding to each frame image according to the feature value array of each labeled region in each frame image of the frame image sequence; and generating a labeling file according to the new labeling area corresponding to each frame image.
Generating a new labeled area corresponding to each frame image according to the characteristic value array of each labeled area in each frame image of the frame image sequence, wherein the method comprises the following steps:
acquiring the coordinates of each characteristic point in the characteristic value array of each marked area;
splitting the coordinates of each characteristic point, and recombining the split abscissa and ordinate to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array:
and generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
Further, generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array, including:
acquiring a bounding box of the new labeling area according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the distance between the surrounding frame of the new marked area and the coordinate vertex of the surrounding frame of the marked area;
judging whether the coordinate vertex distance is smaller than a preset transformation threshold value or not;
if so, reserving a new labeling area, and storing the new labeling area into a labeling file.
In an embodiment of the present invention, the comparison module 202 includes:
the time stamp obtaining unit is used for obtaining the time stamp of the marked image and the time stamp of each frame image in the original video;
in some embodiments, the timestamp acquisition unit is configured to traverse each image in the annotated dataset, intercepting a first timestamp region of interest of said each image; identifying a first timestamp string in the first timestamp region of interest; and converting the first time stamp character string according to a preset time stamp format to obtain the time stamp of the marked image.
Further, intercepting the first time stamp region of interest of each image by an OCR algorithm. A first time stamp character string in the first time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
Reading an original video, and acquiring each frame of image in the original video; intercepting a second time stamp region of interest of each frame of image; identifying a second timestamp string in the second timestamp region of interest; and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
The comparison unit is used for comparing the time stamp of the marked image with the time stamp of each frame image in the original video;
and the acquisition unit is used for taking the frame image with the same time stamp as the marked image in the original video as the corresponding frame image of the marked image in the original video.
In some embodiments, the obtaining unit is configured to: playing each frame of image in the original video according to the sequence of the frame number; in the playing process, comparing the time stamp of each frame of image in the original video with the time stamp of the labeled image; and if the time stamp of each frame of image in the original video is the same as that of the marked image, taking the frame image with the same time stamp as the marked image as the corresponding frame image of the marked image in the original video.
In an embodiment of the present embodiment, the method further includes:
and the screening module is used for screening a plurality of frame images in the frame image sequence to obtain an effective frame image.
In some embodiments, the screening module is configured to: calculating an information entropy matrix of each frame image in the frame image sequence; calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image; and screening out effective frame images according to the information entropy matrix change rate.
Screening out effective frame images according to the information entropy matrix change rate, wherein the screening out effective frame images comprises the following steps:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrix from large to small according to the derivative of the change rate fitting function;
and taking the preset number of frame images sequenced in the front as effective frame images.
Calculating an information entropy matrix of each frame image in the frame image sequence, including:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the information entropy of the adjacent matrix according to the coordinate range of the reduced image;
and obtaining an information entropy matrix of each frame image according to the information entropy of the adjacent matrix.
The predetermined reduction ratio is 1/10.
Calculating the information entropy of the adjacency matrix according to the coordinate range of the reduced image, wherein the calculation comprises the following steps:
calculating an integral matrix when the adjacent matrix traverses each pixel position of the reduced image from the central point according to the coordinate range of the reduced image;
flattening the whole matrix to obtain a flat function;
calculating the ratio of the points in the flat function, which are the same as the pixels of the reduced image, in the whole matrix;
and calculating the information entropy of the adjacency matrix according to the ratio.
Calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image, and the method comprises the following steps:
obtaining correlation coefficients of two adjacent matrixes;
and calculating the information entropy matrix change rate of the adjacent frame images according to the correlation coefficients of the two adjacent matrixes.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and obtaining the correlation coefficient of the two adjacent matrixes by adding an entropy self-defining formula of noise.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and calculating the correlation coefficients of the two adjacent matrices by one or more of cosine similarity, adjusted cosine similarity, Pearson correlation coefficient, Jaccard similarity coefficient, Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
Exemplary Medium
Having described the apparatus according to the exemplary embodiments of the present invention, a computer-readable storage medium according to an exemplary embodiment of the present invention is described next with reference to FIG. 3. FIG. 3 shows a computer-readable storage medium in the form of an optical disc 30, on which a computer program (i.e., a program product) is stored. When executed by a processor, the computer program implements the steps described in the above method embodiments, for example: acquiring an original video and generating an annotated data set from the original video, where the annotated data set includes an annotated image and an annotation file corresponding to the annotated image; comparing the annotated image with each frame image in the original video to obtain the frame image corresponding to the annotated image in the original video; intercepting, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image; and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence, together with its corresponding annotation file, into the annotated data set. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
Exemplary computing device
Having described the method, medium, and apparatus of exemplary embodiments of the present invention, a computing device of exemplary embodiments of the present invention is next described with reference to FIG. 4.
FIG. 4 illustrates a block diagram of an exemplary computing device 40, which computing device 40 may be a computer system or server, suitable for use in implementing embodiments of the present invention. The computing device 40 shown in FIG. 4 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the present invention.
As shown in fig. 4, components of computing device 40 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples the various system components (including the system memory 402 and the processing unit 401).
Computing device 40 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computing device 40 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 4021 and/or cache memory 4022. Computing device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, ROM 4023 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. The system memory 402 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 4025 having a set (at least one) of program modules 4024 may be stored, for example, in system memory 402, and such program modules 4024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. The program modules 4024 generally perform the functions and/or methods of the embodiments described herein.
Computing device 40 may also communicate with one or more external devices 404, such as a keyboard, pointing device, display, etc. Such communication may be through an input/output (I/O) interface. Also, computing device 40 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through network adapter 406. As shown in FIG. 4, network adapter 406 communicates with other modules of computing device 40, such as processing unit 401, over bus 403. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with computing device 40.
The processing unit 401 executes various functional applications and data processing by running programs stored in the system memory 402, for example: acquiring an original video and generating an annotated data set from the original video, where the annotated data set includes an annotated image and an annotation file corresponding to the annotated image; comparing the annotated image with each frame image in the original video to obtain the frame image corresponding to the annotated image in the original video; intercepting, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image; and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence, together with its corresponding annotation file, into the annotated data set. The specific implementation of each step is not repeated here.
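Tying the pieces together, the following end-to-end sketch shows how such a processing unit might drive the expansion flow, reusing find_corresponding_frame from the earlier sketch; read_timestamp and generate_annotation remain hypothetical stand-ins, and the in-memory list of (frame, annotation) pairs stands in for whatever file layout the annotated data set actually uses.

    import cv2

    def extract_frames(video_path, start, end):
        # Intercept the frame image sequence in the window [start, end] around the
        # corresponding frame image.
        cap = cv2.VideoCapture(video_path)
        cap.set(cv2.CAP_PROP_POS_FRAMES, max(start, 0))
        frames = []
        for _ in range(max(start, 0), end + 1):
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
        cap.release()
        return frames

    def expand_dataset(video_path, labeled_items, window_frames,
                       read_timestamp, generate_annotation):
        # labeled_items: iterable of (annotated_image, annotation_file) pairs.
        expanded = []
        for annotated_image, annotation in labeled_items:
            idx, _ = find_corresponding_frame(video_path,
                                              read_timestamp(annotated_image),
                                              read_timestamp)
            if idx is None:
                continue                        # no frame with an identical timestamp
            for frame in extract_frames(video_path,
                                        idx - window_frames, idx + window_frames):
                expanded.append((frame, generate_annotation(annotation, frame)))
        return expanded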
It should be noted that although several units/modules or sub-units/modules of the data set expansion apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided into and embodied by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments; the division into aspects is for convenience of description only and does not mean that features in those aspects cannot be combined to advantage. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (23)

1. An augmentation method for an annotated data set, comprising:
acquiring an original video, and generating an annotated data set from the original video, wherein the annotated data set comprises an annotated image and an annotation file corresponding to the annotated image;
comparing the annotated image with each frame image in the original video to obtain a frame image corresponding to the annotated image in the original video;
intercepting, from the original video, a frame image sequence within a preset time period adjacent to the corresponding frame image;
generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image to the annotated data set;
the generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image comprises:
acquiring an annotation file of a corresponding frame image of the annotated image in the original video according to the annotation file corresponding to the annotated image;
determining a plurality of annotation regions in the corresponding frame image according to the annotation file of the corresponding frame image;
calculating a feature value of each annotation region;
and generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region.
2. The augmentation method for an annotated data set according to claim 1, wherein the calculating the feature value of each annotation region comprises:
converting the images within the range of the plurality of annotation regions into grayscale images;
obtaining all feature points within the range of each annotation region by applying a scale-invariant feature transform algorithm to the grayscale image within the range of each annotation region;
and generating a feature value array for each annotation region according to all the feature points within the range of that annotation region.
3. The augmentation method for an annotated data set according to claim 2, wherein the generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region comprises:
generating, in each frame image of the frame image sequence, a new annotation region corresponding to that frame image according to the feature value array of each annotation region;
and generating an annotation file according to the new annotation region corresponding to each frame image.
4. The augmentation method for an annotated data set according to claim 3, wherein the generating, in each frame image of the frame image sequence, a new annotation region corresponding to that frame image according to the feature value array of each annotation region comprises:
acquiring the coordinates of each feature point in the feature value array of each annotation region;
splitting the coordinates of each feature point, and recombining the split abscissas and ordinates to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
and generating the new annotation region corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
5. The augmentation method for an annotated data set according to claim 4, wherein the generating of the new annotation region corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array comprises:
acquiring a bounding box of the new annotation region according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the coordinate-vertex distance between the bounding box of the new annotation region and the bounding box of the annotation region;
judging whether the coordinate-vertex distance is smaller than a preset transformation threshold;
and if so, retaining the new annotation region and storing the new annotation region into the annotation file.
6. The augmentation method for an annotated data set according to claim 1, wherein the comparing the annotated image with each frame image in the original video to obtain the frame image corresponding to the annotated image in the original video comprises:
acquiring a timestamp of the annotated image and a timestamp of each frame image in the original video;
comparing the timestamp of the annotated image with the timestamp of each frame image in the original video;
and taking the frame image in the original video whose timestamp is identical to that of the annotated image as the frame image corresponding to the annotated image in the original video.
7. The augmentation method for an annotated data set according to claim 6, wherein the obtaining of the timestamp of the annotated image comprises:
traversing each image in the annotated data set, and intercepting a first timestamp region of interest of each image;
identifying a first timestamp string in the first timestamp region of interest;
and converting the first timestamp string according to a preset timestamp format to obtain the timestamp of the annotated image.
8. The augmentation method of annotation data set of claim 7, wherein the intercepting the first time-stamped region of interest of each image comprises:
intercepting the first time stamp region of interest of each image by an OCR algorithm.
9. The augmentation method for annotated data set according to claim 7, wherein the identifying of the first timestamp string in the first timestamp region of interest comprises:
a first time stamp character string in the first time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
10. The augmentation method of annotation data set according to claim 8, wherein obtaining a time stamp of each frame image in the original video comprises:
reading an original video, and acquiring each frame of image in the original video;
intercepting a second time stamp region of interest of each frame of image;
identifying a second timestamp string in the second timestamp region of interest;
and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
11. The augmentation method for an annotated data set according to claim 6, wherein the taking the frame image in the original video whose timestamp is identical to that of the annotated image as the frame image corresponding to the annotated image in the original video comprises:
playing each frame image in the original video in frame-number order;
during playback, comparing the timestamp of each frame image in the original video with the timestamp of the annotated image;
and if the timestamp of a frame image in the original video is identical to that of the annotated image, taking that frame image as the frame image corresponding to the annotated image in the original video.
12. The augmentation method for annotated data set according to any one of claims 1 to 11, further comprising:
and screening a plurality of frame images in the frame image sequence to obtain effective frame images.
13. The augmentation method for annotated data set according to claim 12, wherein the screening of the plurality of frame images in the sequence of frame images to obtain the valid frame image comprises:
calculating an information entropy matrix of each frame image in the frame image sequence;
calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image;
and screening out effective frame images according to the information entropy matrix change rate.
14. The augmentation method for annotated data set according to claim 13, wherein the filtering out the valid frame image according to the information entropy matrix change rate comprises:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrix from large to small according to the derivative of the change rate fitting function;
and taking the preset number of frame images sequenced in the front as effective frame images.
15. The augmentation method of annotation data set according to claim 13, wherein the calculating of the entropy matrix of information for each frame image in the sequence of frame images comprises:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the adjacency-matrix information entropy according to the coordinate range of the reduced image;
and obtaining the information entropy matrix of each frame image according to the adjacency-matrix information entropy.
16. The augmentation method of claim 15, wherein the predetermined reduction ratio is 1/10.
17. The augmentation method for an annotated data set according to claim 15, wherein the calculating of the adjacency-matrix information entropy according to the coordinate range of the reduced image comprises:
calculating, according to the coordinate range of the reduced image, the overall matrix obtained as the adjacency matrix traverses every pixel position of the reduced image starting from the center point;
flattening the overall matrix to obtain a flattened array;
calculating the proportion, within the overall matrix, of the points in the flattened array whose values are the same as the corresponding pixels of the reduced image;
and calculating the adjacency-matrix information entropy from that proportion.
18. The augmentation method for an annotated data set according to claim 13, wherein the calculating the information-entropy-matrix change rate between adjacent frame images according to the information entropy matrix of each frame image comprises:
obtaining a correlation coefficient between the two adjacent entropy matrices;
and calculating the information-entropy-matrix change rate between the adjacent frame images according to the correlation coefficient between the two adjacent matrices.
19. The augmentation method for an annotated data set according to claim 18, wherein the obtaining of the correlation coefficient between the two adjacent matrices comprises:
obtaining the correlation coefficient between the two adjacent matrices by a custom entropy formula that incorporates added noise.
20. The augmentation method for an annotated data set according to claim 18, wherein the obtaining of the correlation coefficient between the two adjacent matrices comprises:
calculating the correlation coefficient between the two adjacent matrices by one or more of cosine similarity, adjusted cosine similarity, the Pearson correlation coefficient, the Jaccard similarity coefficient, the Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
21. An augmentation apparatus for an annotated data set, comprising:
an acquisition module configured to acquire an original video and generate an annotated data set from the original video, wherein the annotated data set comprises an annotated image and an annotation file corresponding to the annotated image;
a comparison module configured to compare the annotated image with each frame image in the original video to obtain a frame image corresponding to the annotated image in the original video;
an interception module configured to intercept, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image;
an expansion module configured to generate an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and to expand each frame image in the frame image sequence, together with its corresponding annotation file, into the annotated data set; the generating comprising:
acquiring an annotation file of the frame image corresponding to the annotated image in the original video according to the annotation file corresponding to the annotated image;
determining a plurality of annotation regions in the corresponding frame image according to the annotation file of the corresponding frame image;
calculating a feature value of each annotation region;
and generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region.
22. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-20.
23. A computing device, the computing device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor configured to perform the method of any of the preceding claims 1-20.
CN202111264090.1A 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set Active CN113707280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111264090.1A CN113707280B (en) 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111264090.1A CN113707280B (en) 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set

Publications (2)

Publication Number Publication Date
CN113707280A CN113707280A (en) 2021-11-26
CN113707280B (en) 2022-04-08

Family

ID=78647372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111264090.1A Active CN113707280B (en) 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set

Country Status (1)

Country Link
CN (1) CN113707280B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920497B (en) * 2021-12-07 2022-04-08 广东电网有限责任公司东莞供电局 Nameplate recognition model training method, nameplate recognition method and related devices

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210320A (en) * 2019-05-07 2019-09-06 南京理工大学 The unmarked Attitude estimation method of multiple target based on depth convolutional neural networks
CN110717464A (en) * 2019-10-15 2020-01-21 中国矿业大学(北京) Intelligent railway roadbed disease identification method based on radar data
CN110958469A (en) * 2019-12-13 2020-04-03 联想(北京)有限公司 Video processing method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11314799B2 (en) * 2016-07-29 2022-04-26 Splunk Inc. Event-based data intake and query system employing non-text machine data

Also Published As

Publication number Publication date
CN113707280A (en) 2021-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant