CN113707280B - Method, device, medium and computing equipment for expanding labeled data set - Google Patents

Method, device, medium and computing equipment for expanding labeled data set

Info

Publication number
CN113707280B
Authority
CN
China
Prior art keywords
image
frame
frame image
data set
original video
Prior art date
Legal status
Active
Application number
CN202111264090.1A
Other languages
Chinese (zh)
Other versions
CN113707280A (en)
Inventor
赵秋
曾凡
Current Assignee
Xuanwei Beijing Biotechnology Co ltd
Original Assignee
Xuanwei Beijing Biotechnology Co ltd
Priority date
Filing date
Publication date
Application filed by Xuanwei Beijing Biotechnology Co ltd filed Critical Xuanwei Beijing Biotechnology Co ltd
Priority to CN202111264090.1A
Publication of CN113707280A
Application granted
Publication of CN113707280B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 - ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 - ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a method, an apparatus, a medium and a computing device for expanding an annotated data set. The method comprises: acquiring an original video and generating an annotated data set from the original video, the annotated data set comprising annotated images and the annotation files corresponding to the annotated images; comparing each annotated image with each frame image in the original video to obtain the corresponding frame image of the annotated image in the original video; intercepting, from the original video, the frame image sequence within a preset time period before and after the corresponding frame image; and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and adding each frame image in the frame image sequence, together with its annotation file, to the annotated data set. The method rapidly expands the data set, improves data labeling efficiency, and reduces the development cost of the data set.

Description

Method, device, medium and computing equipment for expanding labeled data set
Technical Field
The embodiment of the invention relates to the field of image data expansion, in particular to an annotation data set expansion method, device, medium and computing equipment.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Ultrasonic instruments have become one of the rapid, safe and low-cost medical diagnostic tools of the modern medical industry. Ultrasonic endoscopic images differ from white-light or electronic staining video images: interpreting them requires corresponding anatomical knowledge and a large amount of accumulated experience, so the number of doctors who have mastered ultrasonic scanning skills is very limited. In recent years, with the continuous development of artificial intelligence, the combination of artificial intelligence and medicine has promoted the rapid development of the medical industry. A data set for training an artificial-intelligence image recognition model usually needs tens of thousands or even hundreds of thousands of images, which are generally obtained by having a specialized data labeling company or an internist capture and label frames from videos or images. Because labeling personnel must have the corresponding domain knowledge, labeling personnel are in short supply, manual labeling is inefficient and error-prone, and the result is low labeling precision, low speed and high research and development cost.
At present, some annotation data expansion methods have appeared, but they either transform the images while keeping the labels unchanged, or define a transformation for both the images and the labels. The expanded data obtained in this way fails to significantly improve the segmentation performance of medical images, so the expanded annotation data is of little value for artificial-intelligence-based medical research.
Disclosure of Invention
In view of the above problems in the prior art, it is an object of the present disclosure to provide an annotation data set expansion method, apparatus, medium, and computing device to solve at least the problems in the prior art.
In a first aspect of the embodiments of the present invention, there is provided an annotation data set expansion method, including:
acquiring an original video, and generating a labeled data set according to the original video, wherein the labeled data set comprises a labeled image and a label file corresponding to the labeled image;
comparing the marked image with each frame of image in the original video to obtain a corresponding frame of image of the marked image in the original video;
intercepting frame image sequences in a preset time period before and after the corresponding frame image from the original video;
and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image to the annotated data set.
In one embodiment of the invention, the annotation data set expansion method comprises the following steps:
the generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image comprises:
acquiring an annotation file of a corresponding frame image of the annotated image in the original video according to the annotation file corresponding to the annotated image;
determining a plurality of labeling areas in the corresponding frame image according to the labeling file of the corresponding frame image;
calculating the characteristic value of each labeling area;
and generating an annotation file for each frame image in the frame image sequence according to the characteristic value of each annotation region.
Further, the calculating the feature value of each labeled region includes:
converting the images in the range of the plurality of marked areas into a gray scale image;
obtaining all feature points in each marked region range by using a scale invariant feature transformation algorithm on the gray level image in each marked region range;
and generating a characteristic value array of each labeled area according to all the characteristic points in the range of each labeled area.
Further, the generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region includes:
generating a new labeled area corresponding to each frame image in each frame image of the frame image sequence according to the characteristic value array of each labeled area;
and generating a labeling file according to the new labeling area corresponding to each frame image.
Further, the generating a new labeled region corresponding to each frame image in each frame image of the frame image sequence according to the feature value array of each labeled region includes:
acquiring the coordinates of each characteristic point in the characteristic value array of each marked area;
splitting the coordinates of each characteristic point, and recombining the split abscissa and ordinate to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array:
and generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
Further, the generating a new labeled area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array includes:
acquiring a bounding box of a new labeling area according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the distance between the surrounding frame of the new marked area and the coordinate vertex of the surrounding frame of the marked area;
judging whether the coordinate vertex distance is smaller than a preset transformation threshold value or not;
if so, reserving a new labeling area, and storing the new labeling area into a labeling file.
In another embodiment of the present invention, the annotation data set expansion method comprises:
the comparing the marked image with each frame of image in the original video to obtain the corresponding frame of image of the marked image in the original video includes:
acquiring a timestamp of the marked image and a timestamp of each frame image in the original video;
comparing the time stamp of the marked image with the time stamp of each frame image in the original video;
and taking the frame image in the original video, which has the same time stamp as the marked image, as the corresponding frame image of the marked image in the original video.
Further, the obtaining the timestamp of the labeled image includes:
traversing each image in the labeled data set, and intercepting a first timestamp region of interest of each image;
identifying a first timestamp string in the first timestamp region of interest;
and converting the first time stamp character string according to a preset time stamp format to obtain the time stamp of the marked image.
Further, said intercepting the first time-stamped region of interest of each image comprises:
intercepting the first time stamp region of interest of each image by an OCR algorithm.
Further, the identifying a first timestamp string in the first timestamp interest region includes:
a first time stamp character string in the first time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
Further, acquiring a time stamp of each frame image in the original video, including:
reading an original video, and acquiring each frame of image in the original video;
intercepting a second time stamp region of interest of each frame of image;
identifying a second timestamp string in the second timestamp region of interest;
and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
Further, the taking the frame image of the original video with the same timestamp as the annotated frame image in the original video as the corresponding frame image of the annotated frame image includes:
playing each frame of image in the original video according to the sequence of the frame number;
in the playing process, comparing the time stamp of each frame of image in the original video with the time stamp of the labeled image;
and if the time stamp of each frame of image in the original video is the same as that of the marked image, taking the frame image with the same time stamp as the marked image as the corresponding frame image of the marked image in the original video.
In yet another embodiment of the present invention, the annotation data set augmentation method comprises:
and screening a plurality of frame images in the frame image sequence to obtain effective frame images.
Further, the screening the plurality of frame images in the frame image sequence to obtain the valid frame image includes:
calculating an information entropy matrix of each frame image in the frame image sequence;
calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image;
and screening out effective frame images according to the information entropy matrix change rate.
Further, the screening out the effective frame image according to the information entropy matrix change rate includes:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrix from large to small according to the derivative of the change rate fitting function;
and taking the preset number of frame images sequenced in the front as effective frame images.
Further, the calculating an information entropy matrix of each frame image in the frame image sequence includes:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the information entropy of the adjacent matrix according to the coordinate range of the reduced image;
and obtaining an information entropy matrix of each frame image according to the information entropy of the adjacent matrix.
Further, the preset reduction ratio is 1/10.
Further, the calculating the adjacency matrix information entropy according to the coordinate range of the reduced image includes:
calculating an integral matrix when the adjacent matrix traverses each pixel position of the reduced image from the central point according to the coordinate range of the reduced image;
flattening the whole matrix to obtain a flat function;
calculating the ratio of the points in the flat function, which are the same as the pixels of the reduced image, in the whole matrix;
and calculating the information entropy of the adjacency matrix according to the ratio.
Further, the calculating the information entropy matrix change rate of the adjacent frame image according to the information entropy matrix of each frame image includes:
obtaining correlation coefficients of two adjacent matrixes;
and calculating the information entropy matrix change rate of the adjacent frame images according to the correlation coefficients of the two adjacent matrixes.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and obtaining the correlation coefficient of the two adjacent matrixes by adding an entropy self-defining formula of noise.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and calculating the correlation coefficients of the two adjacent matrices by one or more of cosine similarity, adjusted cosine similarity, Pearson correlation coefficient, Jaccard similarity coefficient, Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
In a second aspect of the embodiments of the present invention, there is provided an annotation data set extension apparatus including:
the acquisition module is used for acquiring an original video and generating a labeled data set according to the original video, wherein the labeled data set comprises a labeled image and a label file corresponding to the labeled image;
the comparison module is used for comparing the marked image with each frame of image in the original video to obtain a corresponding frame of image of the marked image in the original video;
the intercepting module is used for intercepting a frame image sequence in a preset time period before and after the corresponding frame image from the original video;
and the expansion module is used for generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image sequence to the annotated data set.
In a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program for executing the method of any one of the first aspect above.
In a fourth aspect of embodiments of the present invention, there is provided a computing device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to perform the method of any of the first aspect.
According to the method, apparatus, medium and computing device for expanding an annotated data set of the embodiments of the invention, an original video is obtained and an annotated data set is generated from it; the annotated images in the data set are compared with each frame image in the original video to obtain the corresponding frame image of each annotated image; the frame image sequence within a preset time period before and after the corresponding frame image is intercepted from the original video; an annotation file is generated for each frame image in the sequence according to the annotation file of the annotated image; and each frame image in the sequence, together with its annotation file, is added to the annotated data set. This enables rapid expansion of the data set, improves data labeling efficiency, reduces labor expenditure and data set development cost, allows the expanded annotation data to be quickly applied to the data set labeling work of endoscope artificial intelligence, and shortens the training period of the relevant artificial intelligence models.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a flow chart illustrating an augmentation method for annotated data set according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an augmentation apparatus for annotated data set according to an embodiment of the present invention;
FIG. 3 schematically shows a schematic of the structure of a medium according to an embodiment of the invention;
FIG. 4 schematically illustrates a structural diagram of a computing device of an embodiment of the invention;
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, a method, a device, a medium and a computing device for expanding a labeled data set are provided.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Exemplary method
A method for augmentation of annotated data sets according to an exemplary embodiment of the invention is described below with reference to FIG. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
The invention is further described below with reference to specific embodiments.
The embodiment of the invention provides an annotation data set expansion method, which comprises the following steps:
step S101, acquiring an original video, and generating a labeled data set according to the original video, wherein the labeled data set comprises a labeled image and a label file corresponding to the labeled image;
step S102, comparing the marked image with each frame of image in the original video to obtain a corresponding frame of image of the marked image in the original video;
step S103, intercepting frame image sequences in a preset time period before and after the corresponding frame image from the original video;
and step S104, generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image to the annotated data set.
A traditional artificial intelligence image recognition data set usually needs tens of thousands or hundreds of thousands of images, which are captured and labeled by a specialized data labeling company or an internist. Because labeling personnel must have the corresponding domain knowledge, labeling personnel are in short supply, manual labeling is inefficient and error-prone, and labeling precision is low, speed is slow and research and development costs are high. In the related art, the image is transformed while the label is kept unchanged, or a transformation is defined for both the image and the label, so as to expand the original data set; however, the expanded data obtained in this way cannot obviously improve the segmentation performance of medical images, so the expanded annotation data is of little significance for artificial-intelligence-based medical research.
According to the method, the data set can be rapidly expanded, the data labeling efficiency is improved, the labor expenditure is reduced, the data set development cost is reduced, the expanded labeled data can be rapidly applied to the data set labeling work of the endoscope artificial intelligence, and the training period of the relevant artificial intelligence model is shortened.
In some embodiments, the data is, for example, upper gastrointestinal tract ultrasonic endoscope data, and the annotation file records the adjacent organs and/or adjacent structures in the upper gastrointestinal tract ultrasonic endoscope data. According to the method of this embodiment, label identification can be performed on adjacent organs and adjacent structures in upper gastrointestinal tract ultrasonic endoscopy, with assisted target tracking and adaptive boundary screenshots. The adjacent organs include the pancreas, the gallbladder, the bile duct and the like; the adjacent structures include lymph nodes, tumors, cysts, blood vessels and the like. It should be noted that the present application does not limit the specific type of data, and those skilled in the art can select corresponding data according to actual needs.
How the augmentation of the annotated data set is performed is explained below with reference to the accompanying drawings:
First, step S101 is executed to obtain an original video and generate a labeled data set from the original video, where the labeled data set includes labeled images and the label files corresponding to the labeled images. The labeled data set may be obtained by manually (for example, by a doctor) capturing a plurality of frame images at intervals for each part to be identified and then labeling the captured images.
Taking upper gastrointestinal ultrasound data as an example: ultrasound images are manually captured and labeled to form a labeled ultrasound data set, which is stored under a preset storage path. When the ultrasound labeled data set is to be expanded, the labeled data set is read from the preset storage path and the files are traversed to obtain all the images and their corresponding label files, which are stored in corresponding arrays: an image array image_arr and an annotation array label_arr, where each picture corresponds to one annotation file.
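As a concrete illustration of this loading step, the following is a minimal Python sketch; the directory layout (one JSON annotation file sharing each image's base name), the storage of image paths rather than pixel data, and the function name are assumptions for illustration rather than details fixed by the patent.

```python
import json
import os

def load_annotated_dataset(dataset_dir):
    """Traverse the annotated data set and build image_arr / label_arr,
    assuming each image has a same-named .json annotation file."""
    image_arr, label_arr = [], []
    for name in sorted(os.listdir(dataset_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() not in (".png", ".jpg", ".jpeg", ".bmp"):
            continue
        label_path = os.path.join(dataset_dir, base + ".json")
        if not os.path.exists(label_path):
            continue  # skip images that have no matching annotation file
        image_arr.append(os.path.join(dataset_dir, name))
        with open(label_path, "r", encoding="utf-8") as f:
            label_arr.append(json.load(f))
    return image_arr, label_arr
```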
Next, step S102 is executed to compare the annotated image with each frame of image in the original video, and obtain a corresponding frame of image of the annotated image in the original video, which specifically includes:
acquiring a timestamp of the marked image and a timestamp of each frame image in the original video; acquiring a timestamp of the labeled image, comprising: traversing each image in the labeled data set, and intercepting a first timestamp region of interest of each image; identifying a first timestamp string in the first timestamp region of interest; and converting the first time stamp character string according to a preset time stamp format to obtain the time stamp of the marked image.
Specifically, a timestamp array timestamp_arr corresponding to the image array image_arr is created. The image array image_arr is traversed to obtain each picture tmp_img, and the timestamp part of tmp_img is cropped to obtain the timestamp ROI (Region Of Interest). The numeric and symbol OCR (optical character recognition) algorithm identifies the timestamp character string corresponding to the picture tmp_img; if, for example, the recognized result is '15/16/2020 16:05:15', the timestamp character string is converted according to the timestamp format '%d/%M/%Y %H:%M:%S' used in the data set images and video to obtain the time tmp_t mapped to tmp_img, and tmp_t is appended to the end of the timestamp array timestamp_arr.
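The timestamp extraction could be sketched as follows with OpenCV and pytesseract; the ROI coordinates, the strptime format string and the helper name are assumptions chosen for illustration, not values fixed by the patent. The function works on an in-memory BGR image, whether loaded with cv2.imread or grabbed from a video.

```python
from datetime import datetime

import cv2
import pytesseract

TS_FORMAT = "%d/%m/%Y %H:%M:%S"   # assumed on-screen timestamp format
TS_ROI = (10, 10, 260, 40)        # assumed (x, y, w, h) of the timestamp overlay

def read_timestamp(image, roi=TS_ROI, fmt=TS_FORMAT):
    """Crop the timestamp region of interest from a BGR image and OCR it
    into a datetime object."""
    x, y, w, h = roi
    gray = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # --psm 7 treats the crop as a single text line, which suits a timestamp
    text = pytesseract.image_to_string(gray, config="--psm 7")
    return datetime.strptime(text.strip(), fmt)
```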
Acquiring a time stamp of each frame image in an original video, wherein the time stamp comprises the following steps: reading an original video, and acquiring each frame of image in the original video; intercepting a second time stamp region of interest of each frame of image; identifying a second timestamp string in a second timestamp region of interest; and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
Specifically, the original ultrasonic video is read and each frame image video_img is obtained; the timestamp ROI of video_img is cropped, the timestamp character string is identified using the numeric and symbol OCR (optical character recognition) algorithm, and the timestamp character string is converted according to the timestamp format in the video to obtain the time video_t mapped to video_img.
In some embodiments, the time stamp region of interest of each image is intercepted by an OCR algorithm; a time stamp character string in the time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
Comparing the time stamp of the marked image with the time stamp of each frame image in the original video;
and taking the frame image in the original video, which has the same time stamp as the marked image, as the corresponding frame image of the marked image in the original video.
In some embodiments, taking the frame image in the original video whose timestamp is the same as that of the annotated image as the corresponding frame image of the annotated image in the original video specifically includes:
playing each frame of image in the original video according to the sequence of the frame number;
in the playing process, comparing the time stamp of each frame of image in the original video with the time stamp of the labeled image;
and if the time stamp of each frame of image in the original video is the same as that of the marked image, taking the frame image with the same time stamp as the marked image as the corresponding frame image of the marked image in the original video.
Specifically, the timestamp array timestamp_arr is traversed to obtain the time tmp_t corresponding to each image tmp_img. If tmp_t is not equal to video_t, nothing is done and the next frame is compared; if tmp_t is equal to video_t, the frame number video_index is recorded (video_index is incremented by 1 for every video frame played).
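A possible shape of this matching loop, reusing a frame-level timestamp reader such as the read_timestamp sketch above (passed in as a callable), is shown below; the iteration with cv2.VideoCapture is an implementation assumption.

```python
import cv2

def find_matching_frame(video_path, target_ts, read_frame_ts):
    """Play the video frame by frame and return the frame number (video_index)
    of the first frame whose timestamp equals the annotated image's timestamp."""
    cap = cv2.VideoCapture(video_path)
    video_index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                return None            # reached the end without a match
            if read_frame_ts(frame) == target_ts:
                return video_index
            video_index += 1           # incremented by 1 for every frame played
    finally:
        cap.release()
```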
Next, step S103 is executed to intercept a frame image sequence in a preset time period before and after a corresponding frame image from the original video;
and the frames of the original ultrasonic video within N seconds before and after the position of frame number video_index are stored. If the video FPS (Frames Per Second, the number of image frames per second) is 30, the frame image sequence comprises N*30 frame screenshots tmp_shotcut, which are put into a temporary processing array tmp_shotcut_arr in order.
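Extracting the surrounding window of frames might then be sketched as below; seeking with CAP_PROP_POS_FRAMES and the symmetric window size are assumptions about the implementation.

```python
import cv2

def grab_window(video_path, video_index, n_seconds):
    """Collect the frames within n_seconds before and after the matched frame,
    i.e. the temporary processing array tmp_shotcut_arr."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    half = int(round(n_seconds * fps))
    start = max(0, video_index - half)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    tmp_shotcut_arr = []
    for _ in range(2 * half + 1):
        ok, frame = cap.read()
        if not ok:
            break
        tmp_shotcut_arr.append(frame)
    cap.release()
    return tmp_shotcut_arr
```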
Next, step S104 is executed to generate an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and add each frame image in the frame image sequence and its corresponding annotation file to the annotated data set.
The method for generating the annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image specifically comprises the following steps:
step S1041 is performed first: acquiring an annotation file of a corresponding frame image of the annotated image in the original video according to the annotation file corresponding to the annotated image;
Specifically, the annotated image tmp_img and its time tmp_t are located through the frame number video_index, and since the image array image_arr and the annotation array label_arr are in a one-to-one mapping relationship, the annotation file tmp_label corresponding to the frame image at frame number video_index can be obtained.
Next, step S1042 is executed: determining a plurality of labeled areas in the corresponding frame image according to the labeled file of the corresponding frame image;
next, step S1043: calculating the characteristic value of each labeling area;
in some embodiments, calculating the feature value of each labeled region specifically includes:
converting the images in the range of the plurality of marked areas into a gray scale image;
obtaining all feature points in each marked region range by using a scale invariant feature transformation algorithm on the gray level image in each marked region range;
and generating a characteristic value array of each labeled area according to all the characteristic points in the range of each labeled area.
Specifically, the plurality of labeled areas area_1, area_2, ..., area_n in the picture are read from the annotation file tmp_label, where each labeled area area_i contains two vertex coordinates and satisfies
area_i = ((x_i, y_i), (x_i + w_i, y_i + h_i)),
where i is the serial number of the labeled area within the image, x_i and y_i are the horizontal-axis and vertical-axis coordinates of the labeled area in the picture, and w_i and h_i are the width and height of the labeled area, respectively. The image within the range of each area_i is converted into a grayscale image, and the "scale invariant feature transform" (SIFT) algorithm is then called to obtain all feature points within each area range; these form, for each labeled area, a feature value array
F_i = {p_(i,1), p_(i,2), ..., p_(i,m)},
where m is the number of feature points contained in each labeled area.
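A minimal sketch of this per-region step, assuming OpenCV's SIFT implementation (cv2.SIFT_create) stands in for the "scale invariant feature transform" call and that regions are given as (x, y, w, h) tuples:

```python
import cv2

def region_feature_points(image, areas):
    """For each labeled area (x, y, w, h), run SIFT on the grayscale crop and
    return the keypoint coordinates mapped back to full-image coordinates."""
    sift = cv2.SIFT_create()
    feature_arrays = []
    for (x, y, w, h) in areas:
        gray = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        keypoints = sift.detect(gray, None)
        feature_arrays.append([(kp.pt[0] + x, kp.pt[1] + y) for kp in keypoints])
    return feature_arrays
```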
Next, step S1044 is executed: and generating an annotation file for each frame image in the frame image sequence according to the characteristic value of each annotation region.
In some embodiments, generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region specifically includes:
generating a new labeled area corresponding to each frame image in each frame image of the frame image sequence according to the characteristic value array of each labeled area;
and generating a labeling file according to the new labeling area corresponding to each frame image.
In the above step, generating a new labeled region corresponding to each frame image in each frame image of the frame image sequence according to the feature value array of each labeled region includes:
acquiring the coordinates of each characteristic point in the characteristic value array of each marked area;
splitting coordinates of each characteristic point, and recombining the split abscissa and ordinate to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array:
and generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
In some embodiments, generating a new labeled area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array specifically includes:
acquiring a bounding box of the new labeling area according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the distance between the surrounding frame of the new marked area and the coordinate vertex of the surrounding frame of the marked area;
judging whether the distance between the vertexes of the coordinates is smaller than a preset transformation threshold value or not;
if so, reserving the new labeling area, and storing the new labeling area into the labeling file.
Specifically, the feature value arrays are traversed to obtain the feature points of each labeled area; the feature point coordinates are split and recombined into a new abscissa array X_i and a new ordinate array Y_i, and their extremes are found:
x_min = min(X_i), x_max = max(X_i), y_min = min(Y_i), y_max = max(Y_i).
These extremes define a new feature region area'_i = ((x_min, y_min), (x_max, y_max)). By comparing the coordinates of the labeled area area_i with the coordinates of the feature region area'_i, the starting point and the deformation size of the feature bounding box are adjusted dynamically: the vertex distance md between the corresponding corner coordinates of the two bounding boxes is measured with the Minkowski Distance, and delta is a preset transformation threshold; if the distance is within the transformation threshold, the feature region is selected as the result and no further transformation of the region range is carried out.
The new area area'_i is appended to the end of the labeled-area array, so that finally the i-th element of each annotation array label_arr corresponds to one labeled area, and the index and area are composed into a key-value dictionary table label_area_dict, where the key is i and the value is the labeled area.
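The box refitting and the Minkowski-distance check can be condensed into a sketch like the following; the corner pairing and the exponent p are illustrative assumptions rather than the patent's exact formulation.

```python
import numpy as np

def refit_box(points, old_box, delta, p=2):
    """Propose a new bounding box from the feature-point extremes and keep it
    only if its corners stay within Minkowski distance delta of the old box."""
    xs = np.array([pt[0] for pt in points], dtype=float)
    ys = np.array([pt[1] for pt in points], dtype=float)
    new_box = (xs.min(), ys.min(), xs.max(), ys.max())   # (x_min, y_min, x_max, y_max)
    old = np.array(old_box, dtype=float).reshape(2, 2)   # two corner points
    new = np.array(new_box, dtype=float).reshape(2, 2)
    md = (np.abs(old - new) ** p).sum(axis=1) ** (1.0 / p)  # per-corner Minkowski distance
    return new_box if (md < delta).all() else tuple(old_box)
```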
In general, the method identifies and matches annotation areas across multiple frames of the video, selects a new annotation bounding box for the target of the instant screenshot within the continuous frame image sequence, automatically adjusts the boundary of the target-tracking bounding box according to the boundary range of the object, and generates the screenshot and the annotation file at the same time.
In some embodiments, the contents of new_img_arr and new_label_arr are randomly stored into the new arrays img_train, img_test, label_train and label_test, respectively, according to the training/testing ratio set by the service.
A train folder and a test folder are created; img_train and label_train are traversed and their contents saved to the hard disk under the train folder, and img_test and label_test are traversed and their contents saved to the hard disk under the test folder.
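One way to realise this split-and-save step is sketched below; the on-disk layout (per-image JSON files copied next to the images) and the 80/20 default ratio are assumptions.

```python
import json
import os
import random
import shutil

def split_and_save(new_img_arr, new_label_arr, out_dir, ratio=0.8):
    """Randomly split the expanded images/annotations into train and test folders."""
    pairs = list(zip(new_img_arr, new_label_arr))
    random.shuffle(pairs)
    cut = int(len(pairs) * ratio)
    for subset, items in (("train", pairs[:cut]), ("test", pairs[cut:])):
        subset_dir = os.path.join(out_dir, subset)
        os.makedirs(subset_dir, exist_ok=True)
        for img_path, label in items:
            shutil.copy(img_path, subset_dir)
            base = os.path.splitext(os.path.basename(img_path))[0]
            with open(os.path.join(subset_dir, base + ".json"), "w", encoding="utf-8") as f:
                json.dump(label, f)
```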
The doctor only needs to capture several frame images at intervals for each part to be identified; the application then automatically processes the video, automatically matches the manually annotated images to frames in the video, and then automatically tracks, captures, selects and matches, omitting the manual batch labeling step. The quantity is automatically expanded by a factor of about 100 on the basis of the original data set, which greatly improves labeling efficiency, reduces labor expenditure and data set development cost, can be quickly applied to the data set labeling work of ultrasonic-endoscope artificial intelligence, and shortens the training period of the relevant artificial intelligence models.
In another embodiment of the present embodiment, a plurality of frame images in the frame image sequence are further filtered to obtain valid frame images, and specifically,
calculating an information entropy matrix of each frame image in the frame image sequence;
calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image;
and screening out effective frame images according to the information entropy matrix change rate.
Further, screening out effective frame images according to the information entropy matrix change rate comprises:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrixes from large to small according to the change rate fitting function derivative;
and taking the preset number of frame images sequenced in the front as effective frame images.
In some embodiments, calculating an entropy matrix of information for each frame image in the sequence of frame images comprises:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the information entropy of the adjacent matrix according to the coordinate range of the reduced image;
and obtaining an information entropy matrix of each frame image according to the information entropy of the adjacent matrix.
The preset reduction scale is for example 1/10,
the temporary screenshot array tmp_shotcut_arr is traversed to obtain each screenshot tmp_shotcut, and the width and height of each screenshot are reduced by a factor of 10 to obtain a reduced image S. The information entropy of each adjacency matrix is calculated by a user-defined algorithm and the results form an entropy matrix, where N is the adjacency calculation step size, W and H are the pixel width and height of the reduced image, col and row are the pixel indices used as the adjacency matrix traverses the reduced image S, and S(x, y) is the pixel of the image at position (x, y).
In some embodiments, calculating the adjacency matrix information entropy according to the coordinate range of the reduced image includes:
calculating, according to the coordinate range of the reduced image, the whole (integral) matrix obtained when the adjacency matrix traverses each pixel position of the reduced image starting from the center point;
flattening the whole matrix to obtain a flat function;
counting how many element values in the current matrix equal the pixel value of S at that position, and dividing this count by the number of all current pixels to obtain, for each point of the flat function, a ratio (probability) p_k;
calculating the adjacency matrix information entropy from these ratios as E = -sum_k p_k * log2(p_k).
The entropies of all neighbourhoods form the information entropy matrix E_M; with step length N = 5, the adjacency matrix has a size of 10 x 10.
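A simplified stand-in for the entropy matrix computation is sketched below; it replaces the custom adjacency/whole-matrix formulation with a plain block-wise Shannon entropy over the reduced image, so it should be read as an approximation of the idea rather than the claimed formula.

```python
import cv2
import numpy as np

def entropy_matrix(frame, scale=0.1, step=5):
    """Shrink the frame, then compute the Shannon entropy of the pixel values
    inside each (2*step x 2*step) block, yielding an entropy matrix E_M."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    h, w = small.shape
    rows, cols = h // (2 * step), w // (2 * step)
    e_m = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = small[r * 2 * step:(r + 1) * 2 * step,
                          c * 2 * step:(c + 1) * 2 * step]
            _, counts = np.unique(block, return_counts=True)
            p = counts / counts.sum()
            e_m[r, c] = -(p * np.log2(p)).sum()   # Shannon entropy of the block
    return e_m
```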
Specifically, the method for calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image comprises the following steps:
obtaining correlation coefficients of two adjacent matrixes;
and calculating the information entropy matrix change rate of the adjacent frame images according to the correlation coefficients of the two adjacent matrixes.
Obtaining the correlation coefficients of two adjacent matrices includes, but is not limited to, the following ways:
acquiring correlation coefficients of two adjacent matrixes by adding an entropy self-defined formula of noise;
calculating the change rate of the information entropy matrix of adjacent frame images according to the correlation coefficients of the two adjacent matrices: if the length of tmp_shotcut_arr is L, then for each pair of adjacent frames j and j+1 (j = 1, ..., L-1) the correlation coefficient c_j between their entropy matrices is obtained, and each result is inserted into the correlation array tmp_c_arr.
The correlation coefficients of the two adjacent matrices can also be calculated by one or more of the following methods: cosine similarity, adjusted cosine similarity, Pearson correlation coefficient, Jaccard similarity coefficient, Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
It should be noted that the present application does not limit the specific method for obtaining the correlation coefficients of the two adjacent matrices.
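Putting the change-rate ranking together, a Pearson-correlation-based sketch (one of the options listed above) could look like this; sorting the change rates directly replaces the derivative-of-fitting-function step for brevity.

```python
import numpy as np

def select_valid_frames(entropy_matrices, frames, keep):
    """Rank adjacent-frame change rates (1 - correlation of consecutive entropy
    matrices) and keep the frames where the content changes the most."""
    rates = []
    for a, b in zip(entropy_matrices[:-1], entropy_matrices[1:]):
        corr = np.corrcoef(a.ravel(), b.ravel())[0, 1]
        rates.append(1.0 - corr)              # large value = large content change
    order = np.argsort(rates)[::-1]           # sort change rates from large to small
    return [frames[i + 1] for i in order[:keep]]
```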
Because the number of automatically captured target frame images is large, there are many consecutive, highly repetitive images. A data set for artificial intelligence training requires the data to share certain common features, but the content must not be overly repetitive, otherwise the training effect is affected. Therefore, the frame images in the frame image sequence are screened to obtain valid frame images, so that the automatically captured images are properly selected and the effectiveness of the training data is improved.
Exemplary devices
Having described the method of an exemplary embodiment of the present invention, next, an explanation is given of an annotation data set expansion apparatus of an exemplary embodiment of the present invention with reference to fig. 2, the apparatus comprising:
an obtaining module 201, configured to obtain an original video, and generate a labeled data set according to the original video, where the labeled data set includes a labeled image and a label file corresponding to the labeled image;
a comparison module 202, configured to compare the annotated image with each frame of image in the original video, and obtain a corresponding frame of image of the annotated image in the original video;
an intercepting module 203, configured to intercept, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image;
the expansion module 204 is configured to generate an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expand each frame image in the frame image sequence and its corresponding annotation file to the annotated data set.
In an embodiment of this embodiment, the expansion module 204 includes:
the computing unit is used for determining a plurality of labeled areas in the corresponding frame image according to the labeled file of the corresponding frame image and computing the characteristic value of each labeled area;
in some embodiments, the computing unit is configured to convert the images within the plurality of labeled regions into a grayscale image; obtaining all feature points in each marked region range by using a scale invariant feature transformation algorithm on the gray level image in each marked region range; and generating a characteristic value array of each labeled area according to all the characteristic points in the range of each labeled area.
And the generating unit is used for generating an annotation file for each frame image in the frame image sequence according to the characteristic value of each annotation area.
In some embodiments, the generating unit is configured to generate a new labeled region corresponding to each frame image according to the feature value array of each labeled region in each frame image of the frame image sequence; and generating a labeling file according to the new labeling area corresponding to each frame image.
Generating a new labeled area corresponding to each frame image according to the characteristic value array of each labeled area in each frame image of the frame image sequence, wherein the method comprises the following steps:
acquiring the coordinates of each characteristic point in the characteristic value array of each marked area;
splitting the coordinates of each characteristic point, and recombining the split abscissa and ordinate to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array:
and generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
Further, generating a new labeling area corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array, including:
acquiring a bounding box of the new labeling area according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the distance between the surrounding frame of the new marked area and the coordinate vertex of the surrounding frame of the marked area;
judging whether the coordinate vertex distance is smaller than a preset transformation threshold value or not;
if so, reserving a new labeling area, and storing the new labeling area into a labeling file.
In an embodiment of the present invention, the comparison module 202 includes:
the time stamp obtaining unit is used for obtaining the time stamp of the marked image and the time stamp of each frame image in the original video;
in some embodiments, the timestamp acquisition unit is configured to traverse each image in the annotated dataset, intercepting a first timestamp region of interest of said each image; identifying a first timestamp string in the first timestamp region of interest; and converting the first time stamp character string according to a preset time stamp format to obtain the time stamp of the marked image.
Further, intercepting the first time stamp region of interest of each image by an OCR algorithm. A first time stamp character string in the first time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
Reading an original video, and acquiring each frame of image in the original video; intercepting a second time stamp region of interest of each frame of image; identifying a second timestamp string in the second timestamp region of interest; and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
The comparison unit is used for comparing the time stamp of the marked image with the time stamp of each frame image in the original video;
and the acquisition unit is used for taking the frame image with the same time stamp as the marked image in the original video as the corresponding frame image of the marked image in the original video.
In some embodiments, the obtaining unit is configured to: playing each frame of image in the original video according to the sequence of the frame number; in the playing process, comparing the time stamp of each frame of image in the original video with the time stamp of the labeled image; and if the time stamp of each frame of image in the original video is the same as that of the marked image, taking the frame image with the same time stamp as the marked image as the corresponding frame image of the marked image in the original video.
In an embodiment of the present embodiment, the method further includes:
and the screening module is used for screening a plurality of frame images in the frame image sequence to obtain an effective frame image.
In some embodiments, the screening module is configured to: calculating an information entropy matrix of each frame image in the frame image sequence; calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image; and screening out effective frame images according to the information entropy matrix change rate.
Screening out effective frame images according to the information entropy matrix change rate, wherein the screening out effective frame images comprises the following steps:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrix from large to small according to the derivative of the change rate fitting function;
and taking the preset number of frame images sequenced in the front as effective frame images.
Calculating an information entropy matrix of each frame image in the frame image sequence, including:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the information entropy of the adjacent matrix according to the coordinate range of the reduced image;
and obtaining an information entropy matrix of each frame image according to the information entropy of the adjacent matrix.
The predetermined reduction ratio is 1/10.
Calculating the information entropy of the adjacency matrix according to the coordinate range of the reduced image, wherein the calculation comprises the following steps:
calculating an integral matrix when the adjacent matrix traverses each pixel position of the reduced image from the central point according to the coordinate range of the reduced image;
flattening the whole matrix to obtain a flat function;
calculating the ratio of the points in the flat function, which are the same as the pixels of the reduced image, in the whole matrix;
and calculating the information entropy of the adjacency matrix according to the ratio.
Calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image, and the method comprises the following steps:
obtaining correlation coefficients of two adjacent matrixes;
and calculating the information entropy matrix change rate of the adjacent frame images according to the correlation coefficients of the two adjacent matrixes.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and obtaining the correlation coefficient of the two adjacent matrixes by adding an entropy self-defining formula of noise.
Further, the obtaining the correlation coefficients of the two adjacent matrices includes:
and calculating the correlation coefficients of the two adjacent matrices by one or more of cosine similarity, adjusted cosine similarity, Pearson correlation coefficient, Jaccard similarity coefficient, Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
Exemplary Medium
Having described the apparatus according to the exemplary embodiments of the present invention, a computer-readable storage medium according to an exemplary embodiment of the present invention is described next with reference to FIG. 3. FIG. 3 shows a computer-readable storage medium in the form of an optical disc 30, on which a computer program (i.e., a program product) is stored. When executed by a processor, the computer program implements the steps described in the above method embodiments, for example: acquiring an original video and generating an annotated data set from the original video, where the annotated data set includes an annotated image and an annotation file corresponding to the annotated image; comparing the annotated image with each frame image in the original video to obtain the frame image corresponding to the annotated image in the original video; intercepting, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image; and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence, together with its corresponding annotation file, into the annotated data set. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
Exemplary computing device
Having described the method, medium, and apparatus of exemplary embodiments of the present invention, a computing device of exemplary embodiments of the present invention is next described with reference to FIG. 4.
FIG. 4 illustrates a block diagram of an exemplary computing device 40, which computing device 40 may be a computer system or server, suitable for use in implementing embodiments of the present invention. The computing device 40 shown in FIG. 4 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the present invention.
As shown in fig. 4, components of computing device 40 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples the various system components (including the system memory 402 and the processing unit 401).
Computing device 40 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computing device 40 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 4021 and/or cache memory 4022. Computing device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, ROM 4023 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. The system memory 402 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 4025 having a set (at least one) of program modules 4024 may be stored, for example, in system memory 402, and such program modules 4024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. The program modules 4024 generally perform the functions and/or methods of the embodiments described herein.
Computing device 40 may also communicate with one or more external devices 404, such as a keyboard, pointing device, display, etc. Such communication may be through an input/output (I/O) interface. Also, computing device 40 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through network adapter 406. As shown in FIG. 4, network adapter 406 communicates with other modules of computing device 40, such as processing unit 401, over bus 403. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with computing device 40.
The processing unit 401 executes various functional applications and data processing by running programs stored in the system memory 402, for example: acquiring an original video and generating an annotated data set from the original video, where the annotated data set includes an annotated image and an annotation file corresponding to the annotated image; comparing the annotated image with each frame image in the original video to obtain the frame image corresponding to the annotated image in the original video; intercepting, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image; and generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence, together with its corresponding annotation file, into the annotated data set. The specific implementation of each step is not repeated here.
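Tying the pieces together, the following end-to-end sketch shows how such a processing unit might drive the expansion flow, reusing find_corresponding_frame from the earlier sketch; read_timestamp and generate_annotation remain hypothetical stand-ins, and the in-memory list of (frame, annotation) pairs stands in for whatever file layout the annotated data set actually uses.

    import cv2

    def extract_frames(video_path, start, end):
        # Intercept the frame image sequence in the window [start, end] around the
        # corresponding frame image.
        cap = cv2.VideoCapture(video_path)
        cap.set(cv2.CAP_PROP_POS_FRAMES, max(start, 0))
        frames = []
        for _ in range(max(start, 0), end + 1):
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
        cap.release()
        return frames

    def expand_dataset(video_path, labeled_items, window_frames,
                       read_timestamp, generate_annotation):
        # labeled_items: iterable of (annotated_image, annotation_file) pairs.
        expanded = []
        for annotated_image, annotation in labeled_items:
            idx, _ = find_corresponding_frame(video_path,
                                              read_timestamp(annotated_image),
                                              read_timestamp)
            if idx is None:
                continue                        # no frame with an identical timestamp
            for frame in extract_frames(video_path,
                                        idx - window_frames, idx + window_frames):
                expanded.append((frame, generate_annotation(annotation, frame)))
        return expanded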
It should be noted that although several units/modules or sub-units/modules of the data set expansion apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided into and embodied by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments; the division into aspects is for convenience of description only and does not mean that features in those aspects cannot be combined to advantage. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (23)

1. An augmentation method for an annotated data set, comprising:
acquiring an original video, and generating an annotated data set from the original video, wherein the annotated data set comprises an annotated image and an annotation file corresponding to the annotated image;
comparing the annotated image with each frame image in the original video to obtain a frame image corresponding to the annotated image in the original video;
intercepting, from the original video, a frame image sequence within a preset time period adjacent to the corresponding frame image;
generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and expanding each frame image in the frame image sequence and the annotation file corresponding to the frame image to the annotated data set;
the generating an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image comprises:
acquiring an annotation file of a corresponding frame image of the annotated image in the original video according to the annotation file corresponding to the annotated image;
determining a plurality of annotation regions in the corresponding frame image according to the annotation file of the corresponding frame image;
calculating a feature value of each annotation region;
and generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region.
2. The augmentation method for an annotated data set according to claim 1, wherein the calculating the feature value of each annotation region comprises:
converting the images within the range of the plurality of annotation regions into grayscale images;
obtaining all feature points within the range of each annotation region by applying a scale-invariant feature transform algorithm to the grayscale image within the range of each annotation region;
and generating a feature value array for each annotation region according to all the feature points within the range of that annotation region.
3. The augmentation method for an annotated data set according to claim 2, wherein the generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region comprises:
generating, in each frame image of the frame image sequence, a new annotation region corresponding to that frame image according to the feature value array of each annotation region;
and generating an annotation file according to the new annotation region corresponding to each frame image.
4. The augmentation method for an annotated data set according to claim 3, wherein the generating, in each frame image of the frame image sequence, a new annotation region corresponding to that frame image according to the feature value array of each annotation region comprises:
acquiring the coordinates of each feature point in the feature value array of each annotation region;
splitting the coordinates of each feature point, and recombining the split abscissas and ordinates to obtain a new abscissa array and a new ordinate array;
respectively obtaining the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
and generating the new annotation region corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array.
5. The augmentation method for an annotated data set according to claim 4, wherein the generating of the new annotation region corresponding to each frame image according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array comprises:
acquiring a bounding box of the new annotation region according to the maximum value and the minimum value in the new abscissa array and the maximum value and the minimum value in the new ordinate array;
calculating the coordinate-vertex distance between the bounding box of the new annotation region and the bounding box of the annotation region;
judging whether the coordinate-vertex distance is smaller than a preset transformation threshold;
and if so, retaining the new annotation region and storing the new annotation region into the annotation file.
6. The augmentation method for an annotated data set according to claim 1, wherein the comparing the annotated image with each frame image in the original video to obtain the frame image corresponding to the annotated image in the original video comprises:
acquiring a timestamp of the annotated image and a timestamp of each frame image in the original video;
comparing the timestamp of the annotated image with the timestamp of each frame image in the original video;
and taking the frame image in the original video whose timestamp is identical to that of the annotated image as the frame image corresponding to the annotated image in the original video.
7. The augmentation method for an annotated data set according to claim 6, wherein the obtaining of the timestamp of the annotated image comprises:
traversing each image in the annotated data set, and intercepting a first timestamp region of interest of each image;
identifying a first timestamp string in the first timestamp region of interest;
and converting the first timestamp string according to a preset timestamp format to obtain the timestamp of the annotated image.
8. The augmentation method of annotation data set of claim 7, wherein the intercepting the first time-stamped region of interest of each image comprises:
intercepting the first time stamp region of interest of each image by an OCR algorithm.
9. The augmentation method for annotated data set according to claim 7, wherein the identifying of the first timestamp string in the first timestamp region of interest comprises:
a first time stamp character string in the first time stamp region of interest is identified using a numerical and symbolic OCR recognition algorithm.
10. The augmentation method of annotation data set according to claim 8, wherein obtaining a time stamp of each frame image in the original video comprises:
reading an original video, and acquiring each frame of image in the original video;
intercepting a second time stamp region of interest of each frame of image;
identifying a second timestamp string in the second timestamp region of interest;
and converting the second time stamp character string according to a preset time stamp format to obtain the time stamp of each frame image in the original video.
11. The augmentation method for an annotated data set according to claim 6, wherein the taking the frame image in the original video whose timestamp is identical to that of the annotated image as the frame image corresponding to the annotated image in the original video comprises:
playing each frame image in the original video in frame-number order;
during playback, comparing the timestamp of each frame image in the original video with the timestamp of the annotated image;
and if the timestamp of a frame image in the original video is identical to that of the annotated image, taking that frame image as the frame image corresponding to the annotated image in the original video.
12. The augmentation method for annotated data set according to any one of claims 1 to 11, further comprising:
and screening a plurality of frame images in the frame image sequence to obtain effective frame images.
13. The augmentation method for annotated data set according to claim 12, wherein the screening of the plurality of frame images in the sequence of frame images to obtain the valid frame image comprises:
calculating an information entropy matrix of each frame image in the frame image sequence;
calculating the change rate of the information entropy matrix of the adjacent frame image according to the information entropy matrix of each frame image;
and screening out effective frame images according to the information entropy matrix change rate.
14. The augmentation method for annotated data set according to claim 13, wherein the filtering out the valid frame image according to the information entropy matrix change rate comprises:
calculating a derivative of a rate-of-change fitting function;
sorting the change rates of the information entropy matrix from large to small according to the derivative of the change rate fitting function;
and taking the preset number of frame images sequenced in the front as effective frame images.
15. The augmentation method of annotation data set according to claim 13, wherein the calculating of the entropy matrix of information for each frame image in the sequence of frame images comprises:
carrying out equal-scale reduction processing on each frame image according to a preset reduction scale to obtain a reduced image;
acquiring a coordinate range of the reduced image;
calculating the adjacency-matrix information entropy according to the coordinate range of the reduced image;
and obtaining the information entropy matrix of each frame image according to the adjacency-matrix information entropy.
16. The augmentation method of claim 15, wherein the predetermined reduction ratio is 1/10.
17. The augmentation method for an annotated data set according to claim 15, wherein the calculating of the adjacency-matrix information entropy according to the coordinate range of the reduced image comprises:
calculating, according to the coordinate range of the reduced image, the overall matrix obtained as the adjacency matrix traverses every pixel position of the reduced image starting from the center point;
flattening the overall matrix to obtain a flattened array;
calculating the proportion, within the overall matrix, of the points in the flattened array whose values are the same as the corresponding pixels of the reduced image;
and calculating the adjacency-matrix information entropy from that proportion.
18. The augmentation method for an annotated data set according to claim 13, wherein the calculating the information-entropy-matrix change rate between adjacent frame images according to the information entropy matrix of each frame image comprises:
obtaining a correlation coefficient between the two adjacent entropy matrices;
and calculating the information-entropy-matrix change rate between the adjacent frame images according to the correlation coefficient between the two adjacent matrices.
19. The augmentation method for an annotated data set according to claim 18, wherein the obtaining of the correlation coefficient between the two adjacent matrices comprises:
obtaining the correlation coefficient between the two adjacent matrices by a custom entropy formula that incorporates added noise.
20. The augmentation method for an annotated data set according to claim 18, wherein the obtaining of the correlation coefficient between the two adjacent matrices comprises:
calculating the correlation coefficient between the two adjacent matrices by one or more of cosine similarity, adjusted cosine similarity, the Pearson correlation coefficient, the Jaccard similarity coefficient, the Tanimoto coefficient, log-likelihood similarity, mutual information/information gain, and relative entropy/KL divergence.
21. An augmentation apparatus for an annotated data set, comprising:
an acquisition module configured to acquire an original video and generate an annotated data set from the original video, wherein the annotated data set comprises an annotated image and an annotation file corresponding to the annotated image;
a comparison module configured to compare the annotated image with each frame image in the original video to obtain a frame image corresponding to the annotated image in the original video;
an interception module configured to intercept, from the original video, a frame image sequence within a preset time period before and after the corresponding frame image;
an expansion module configured to generate an annotation file for each frame image in the frame image sequence according to the annotation file corresponding to the annotated image, and to expand each frame image in the frame image sequence, together with its corresponding annotation file, into the annotated data set; the generating comprising:
acquiring an annotation file of the frame image corresponding to the annotated image in the original video according to the annotation file corresponding to the annotated image;
determining a plurality of annotation regions in the corresponding frame image according to the annotation file of the corresponding frame image;
calculating a feature value of each annotation region;
and generating an annotation file for each frame image in the frame image sequence according to the feature value of each annotation region.
22. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-20.
23. A computing device, the computing device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor configured to perform the method of any of the preceding claims 1-20.
CN202111264090.1A 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set Active CN113707280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111264090.1A CN113707280B (en) 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111264090.1A CN113707280B (en) 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set

Publications (2)

Publication Number Publication Date
CN113707280A CN113707280A (en) 2021-11-26
CN113707280B (en) 2022-04-08

Family

ID=78647372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111264090.1A Active CN113707280B (en) 2021-10-28 2021-10-28 Method, device, medium and computing equipment for expanding labeled data set

Country Status (1)

Country Link
CN (1) CN113707280B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920497B (en) * 2021-12-07 2022-04-08 广东电网有限责任公司东莞供电局 Nameplate recognition model training method, nameplate recognition method and related devices

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210320A (en) * 2019-05-07 2019-09-06 南京理工大学 The unmarked Attitude estimation method of multiple target based on depth convolutional neural networks
CN110717464A (en) * 2019-10-15 2020-01-21 中国矿业大学(北京) Intelligent railway roadbed disease identification method based on radar data
CN110958469A (en) * 2019-12-13 2020-04-03 联想(北京)有限公司 Video processing method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11314799B2 (en) * 2016-07-29 2022-04-26 Splunk Inc. Event-based data intake and query system employing non-text machine data

Also Published As

Publication number Publication date
CN113707280A (en) 2021-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant