CN113298112B - Integrated data intelligent labeling method and system - Google Patents
Integrated data intelligent labeling method and system
- Publication number
- CN113298112B (application CN202110358429.8A)
- Authority
- CN
- China
- Prior art keywords
- labeling
- sample set
- data
- labeled
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an integrated data intelligent labeling method and system. The method comprises the following steps: receiving a labeling task request of a user, and starting the corresponding labeling task module to execute the labeling task; receiving a first sample set to be labeled, labeling template setting data, and parameter setting data; invoking a matching preprocessing method in a preset preprocessing tool library to preprocess the sample set to be labeled; and, according to the labeling template setting data and the parameter setting data, evaluating the plurality of labeling methods in the started labeling task module in priority order to obtain the labeling method that best matches the labeling task. The method and system realize integrated data labeling of sample sets to be labeled of power voice, text, video, and image types, unify the sample data preprocessing method and the sample labeling method, and avoid the problems of inconsistent labeling modes, repeated data labeling work, and low efficiency across the companies that hold power data.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an integrated data intelligent labeling method and system.
Background
At present, the national power grid company has a variety of artificial intelligence application requirements, for example in service fields such as equipment image recognition and voiceprint recognition. The labeling of multiple types of data, such as power images and voiceprint data, faces the following technical problems:
(1) The volume of multi-type data to be labeled is large, the cost is high, there is no unified labeling standard, and the labeling methods urgently need to be improved and enriched. The national power grid holds a large amount of comprehensive power-related data, but its current labeling relies on manual work, which makes labeling expensive. In addition, the companies within the national power grid label data in different ways, so each company must reprocess data obtained from another company; team efficiency is low, the same data is processed repeatedly, and a large amount of resources is wasted.
(2) The lack of a unified management platform increases data security risks and management cost, so a multi-type integrated data labeling platform needs to be developed for management. The companies of the national power grid use different data management platforms, and limited storage conditions lead to data loss, which creates data security risks.
Disclosure of Invention
In view of the problems in the prior art, the invention provides an integrated data intelligent labeling method, an integrated data intelligent labeling system, an electronic device, and a computer-readable storage medium, which realize integrated data labeling of sample sets to be labeled of power voice, text, video, and image types. The technical scheme is as follows:
in a first aspect, an integrated data intelligent labeling method is provided, including:
receiving a labeling task request of a user, and starting a corresponding labeling task module to execute a labeling task;
receiving a first sample set to be labeled, labeling template setting data, and parameter setting data;
according to the type of the first sample set to be labeled, invoking a matching preprocessing method in a preset preprocessing tool library to preprocess the sample set to be labeled;
according to the labeling template setting data and the parameter setting data, evaluating the plurality of labeling methods in the started labeling task module in priority order to obtain the labeling method that best matches the labeling task;
and auditing the labeling data of the labeled samples, and outputting the samples carrying labeling data that passes the audit.
In one possible implementation manner, the labeling task module includes a first labeling task module, a second labeling task module, a third labeling task module, and a fourth labeling task module, which are respectively used for processing a sample set to be labeled of a power voice, a text, a video, and an image type.
In one possible implementation manner, according to the labeling template setting data and the parameter setting data, a plurality of labeling methods in the started labeling task module are sequentially judged according to a priority order, and a labeling method which is best adapted to the labeling task is obtained, including:
(31) Judging, according to the labeling template setting data and the parameter setting data, whether a labeling method corresponding to the labeling task already exists; if yes, entering step (32); otherwise, entering step (33);
(32) Executing a first labeling method: calling the existing labeling method to execute the labeling task on the first sample set to be labeled;
(33) Judging whether a labeling method of a similar labeling task exists; if so, entering step (34); otherwise, entering step (35);
(34) Executing a second labeling method: obtaining a second labeled sample set from the first sample set to be labeled, the second labeled sample set consisting of samples whose labeling data has passed the audit; training and optimizing the labeling method of the existing similar labeling task based on the second labeled sample set; and labeling the samples of the first sample set to be labeled that are not in the second labeled sample set with the optimized labeling method;
(35) Executing a third labeling method: acquiring the manual labeling participants selected by the user side, sending a data labeling task invitation notice to the participant clients, and receiving the labeling data of the first sample set to be labeled sent by the participant clients.
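The priority-ordered decision of steps (31) to (35) can be illustrated with a minimal Python sketch. The `TaskContext` class, its two fields, and the returned strings are hypothetical names introduced here for illustration; the patent does not prescribe any data structure for this decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskContext:
    """Hypothetical summary of what the platform already knows about a task."""
    exact_method: Optional[str]    # existing method for this exact task, if any
    similar_method: Optional[str]  # method of a sufficiently similar task, if any


def choose_labeling_method(ctx: TaskContext) -> str:
    """Return which labeling method applies, checked in priority order."""
    if ctx.exact_method is not None:
        return "first"   # step (32): reuse the existing method directly
    if ctx.similar_method is not None:
        return "second"  # step (34): fine-tune the similar task's method
    return "third"       # step (35): fall back to manual labeling
```

Checking the cheapest option first means manual labeling is only requested when neither an exact nor a similar method is available.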
In one possible implementation, the step (32) further includes:
(41) Judging whether all samples in the first sample set to be labeled have been labeled; if so, the labeling is complete, that is, all sample data in the first sample set to be labeled is determined to be valid; otherwise, entering step (42);
(42) Executing a fourth labeling method: the samples in the first sample set to be labeled that have been labeled form a third labeled sample set, and the samples that have not been labeled form a fourth sample set to be labeled; the similarity between each sample of the fourth sample set to be labeled and the third labeled sample set is evaluated; if the similarity is greater than a preset second threshold, entering step (44); otherwise, entering step (43);
(43) Forming a fifth sample set to be labeled from the samples of the fourth sample set to be labeled whose similarity is not greater than the preset second threshold, and judging the samples of the fifth sample set to be labeled to be invalid samples;
(44) Forming a sixth sample set to be labeled from the samples of the fourth sample set to be labeled whose similarity is greater than the preset second threshold, and judging whether the number of samples in the sixth sample set to be labeled is greater than a preset value; if so, executing the third labeling method; otherwise, outputting the sixth sample set to be labeled.
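As a rough illustration of steps (42) to (44), the sketch below splits the unlabeled samples by the second threshold and then decides what to do with the remaining candidates. The function name, the dictionary input, and the action strings are assumptions made for this example; how the similarity values are computed is left outside the sketch.

```python
def route_unlabeled(similarities, second_threshold, preset_count):
    """Split unlabeled samples and pick the follow-up action.

    similarities: dict mapping a sample id to its similarity with the
    already-labeled (third) sample set.
    """
    # Step (43): samples at or below the threshold form the fifth set (invalid).
    invalid = [s for s, sim in similarities.items() if sim <= second_threshold]
    # Step (44): samples above the threshold form the sixth set.
    candidates = [s for s, sim in similarities.items() if sim > second_threshold]
    # Too many valid-but-unlabeled samples triggers the third (manual) method.
    if len(candidates) > preset_count:
        action = "third_labeling_method"
    else:
        action = "output"
    return invalid, candidates, action
```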
In one possible implementation manner, the step (35) of receiving annotation data of a first sample set to be annotated sent by a participant client further includes:
constructing a model to be trained based on a preset data annotation model, annotation template setting data and parameter setting data;
training the model to be trained through the received labeling data of the first sample set to be labeled, and obtaining a labeling method matched with the first sample set to be labeled, the labeling template setting data and the parameter setting data.
In one possible implementation, the preprocessing step of the power image data includes denoising, image restoration, and image enhancement processing, where the image enhancement processing includes:
(61) Obtaining a corresponding first high-resolution image and a corresponding second high-resolution image from the image to be labeled;
(62) Setting the weight distribution of corresponding pixel points in the first high-resolution image and the second high-resolution image based on a feature comparison of the corresponding pixel points in the two images;
(63) Fusing the corresponding pixel points of the two high-resolution images based on the weight distribution result.
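The fusion of step (63) can be sketched as a per-pixel weighted sum. This is a minimal NumPy illustration that assumes the two weight maps are already available and sum to 1 at every pixel; the function name `fuse_images` is hypothetical.

```python
import numpy as np


def fuse_images(img1, img2, w1, w2):
    """Fuse two same-sized images pixel by pixel with the given weight maps."""
    img1 = np.asarray(img1, dtype=float)
    img2 = np.asarray(img2, dtype=float)
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    if not (img1.shape == img2.shape == w1.shape == w2.shape):
        raise ValueError("images and weight maps must share one shape")
    # Each output pixel is the weighted sum of the two corresponding pixels.
    return w1 * img1 + w2 * img2
```

With weights a and 1-a, each fused pixel stays between the two source values and lies closer to the one that received the larger weight.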
In one possible implementation manner, the setting the weight distribution of the corresponding pixel points in the first high-resolution image and the second high-resolution image includes acquiring a first weight distribution map and a second weight distribution map, and the acquiring method includes:
(621) Taking the first high-resolution image as an input image;
(622) Taking the input image as the 1st reference image (t = 1), and carrying out the 1st pixel denoising process on the input image using the method of step (623);
(623) Sliding a preset sliding window over the t-th reference image to obtain a plurality of window pixel regions; fitting the image of each window pixel region to the corresponding window pixel region of the input image through a linear function; and, for each pixel point, superposing and fusing the denoising results given by the linear functions of all window pixel regions to which the pixel point belongs, to obtain an output image;
(624) For t > 1, taking the output image of the (t-1)-th pixel denoising process as the t-th reference image, and carrying out the t-th pixel denoising process on the input image using the method of step (623);
(625) Carrying out Gaussian filtering on the difference image between the T-th output image and the input image to obtain a first intermediate image;
(626) Processing the second high-resolution image by the methods of steps (621)-(625) to obtain a second intermediate image;
(627) Comparing the values of corresponding pixel points of the first intermediate image and the second intermediate image; assigning a weight a to the pixel point with the larger value and a weight 1-a to the pixel point with the smaller value, where a > 0.5; and recording the weights of the pixel points of the first intermediate image as the first weight distribution map and the weights of the pixel points of the second intermediate image as the second weight distribution map.
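The window-wise linear fit of step (623) closely resembles the well-known guided image filter, and the sketch below reads it that way; this interpretation, the window radius `r`, the regularizer `eps`, the iteration count, and all function names are assumptions rather than details given in the text. The Gaussian filtering of step (625) is omitted, and ties in step (627) are resolved in favor of the first image, which the source does not specify.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window around each pixel (edges replicated)."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    c = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    # Prepend a zero row and column so that c[i, j] == padded[:i, :j].sum().
    c = np.pad(c, ((1, 0), (1, 0)), mode="constant")
    h, w = img.shape
    total = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return total / (k * k)


def linear_fit_filter(reference, target, r=2, eps=1e-3):
    """Step (623): fit target ~ a * reference + b per window, then average."""
    mean_r = box_mean(reference, r)
    mean_t = box_mean(target, r)
    cov = box_mean(reference * target, r) - mean_r * mean_t
    var = box_mean(reference * reference, r) - mean_r * mean_r
    a = cov / (var + eps)
    b = mean_t - a * mean_r
    # Each pixel blends the linear models of every window that contains it.
    return box_mean(a, r) * reference + box_mean(b, r)


def iterative_denoise(input_image, iterations=3, r=2, eps=1e-3):
    """Steps (622) and (624): the previous output becomes the next reference."""
    x = np.asarray(input_image, dtype=float)
    reference = x  # t = 1 uses the input image itself as the reference
    for _ in range(iterations):
        reference = linear_fit_filter(reference, x, r, eps)
    return reference


def weight_maps(intermediate1, intermediate2, a=0.7):
    """Step (627): weight a for the larger pixel value, 1 - a for the smaller."""
    m1 = np.asarray(intermediate1, dtype=float)
    m2 = np.asarray(intermediate2, dtype=float)
    w1 = np.where(m1 >= m2, a, 1.0 - a)  # ties go to the first image (assumed)
    return w1, 1.0 - w1
```

Because the two maps are complementary, they can be passed directly to a per-pixel weighted sum to carry out the fusion of step (63).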
In a second aspect, an integrated data intelligent labeling system is provided, including:
a task request processing unit, used for receiving a labeling task request of a user, and selecting and starting one labeling task module to execute the labeling task;
a to-be-labeled data acquisition unit, used for receiving the first sample set to be labeled, the labeling template setting data, and the parameter setting data;
a to-be-labeled data preprocessing unit, used for invoking, according to the type of the first sample set to be labeled, a matching preprocessing method in a preset preprocessing tool library to preprocess the sample set to be labeled;
an intelligent labeling unit, used for evaluating, according to the labeling template setting data and the parameter setting data, the plurality of labeling methods in the started labeling task module in priority order to obtain the labeling method that best matches the labeling task;
a labeling result output unit, used for auditing the labeling data of the labeled samples and outputting the samples carrying labeling data that passes the audit.
In a third aspect, an electronic device is provided, the electronic device comprising:
a memory for storing executable instructions;
and a processor for implementing the integrated data intelligent labeling method described above when executing the executable instructions stored in the memory.
In a fourth aspect, a computer readable storage medium is provided, where executable instructions are stored, where the executable instructions when executed by a processor implement the integrated data intelligent labeling method described above.
The integrated data intelligent labeling method and system have the following beneficial effects:
1. Building on the rapid development of emerging technologies such as artificial intelligence and big data, the invention provides an integrated data intelligent labeling platform that realizes unified labeling of multiple types of power service data, turns power service data into a resource and an asset, and provides basic support for the practical application of artificial intelligence in the power grid.
2. The first, second, third, and fourth labeling methods provide intelligent labeling adapted to the characteristics of different sample sets to be labeled, and realize integrated data labeling of sample sets to be labeled of power voice, text, video, and image types. The labeling task is executed by the labeling method judged to best match it, which unifies the sample data preprocessing method and the sample labeling method for the same labeling task and avoids the problems of inconsistent labeling modes, repeated data labeling work, and low efficiency across the companies that hold power data. In addition, labeling methods and labeled samples are shared through the integrated intelligent labeling platform, which further improves the accuracy of intelligent labeling and the diversity of labeling tasks supported by the platform.
Drawings
FIG. 1 is an overall flow chart of an integrated data intelligent labeling method according to an embodiment of the invention;
FIG. 2 is a flow chart of a labeling method for achieving best fit with a labeling task according to an embodiment of the invention;
FIG. 3 is a flowchart of a method for acquiring a first weight distribution diagram and a second weight distribution diagram of an image to be annotated according to an embodiment of the present invention;
FIG. 4 is a block diagram of an integrated data intelligent labeling system in an embodiment of the invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the invention, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the invention.
The embodiment of the invention provides an integrated data intelligent labeling method, which comprises the following steps:
receiving a labeling task request of a user, and starting a corresponding labeling task module to execute the labeling task; in this embodiment, the labeling tasks are divided among several labeling task modules according to the type of data to be labeled, namely a first, a second, a third, and a fourth labeling task module, used respectively for processing sample sets to be labeled of power voice, text, video, and image types;
receiving a first sample set to be labeled, labeling template setting data, and parameter setting data, wherein the labeling template may specify the classification requirement of the labeling task (for example, classifying insulators in images as normal or defective), the labeling form, the labeling format, and the like;
according to the type of the first sample set to be labeled, invoking a matching preprocessing method in a preset preprocessing tool library to preprocess the sample set to be labeled; in this embodiment, the data preprocessing tool library is built on preprocessing and feature extraction techniques for multiple data types such as infrared images, visible light images, and equipment operation voiceprints; preprocessing of power voiceprint data may include denoising, enhancement, feature extraction, and the like, which improves the quality of the voiceprint data; preprocessing of power text data may include marking, normalization, substitution, word segmentation, feature (core vocabulary) extraction, and the like, which prepares the original text corpus for tasks such as text mining or NLP;
according to the labeling template setting data and the parameter setting data, evaluating the plurality of labeling methods in the started labeling task module in priority order to obtain the labeling method that best matches the labeling task;
and auditing the labeling data of the labeled samples, and outputting the samples carrying labeling data that passes the audit.
In the embodiment of the application, corresponding preprocessing tool libraries are built for various types of electric data, and in the multiple labeling methods in the started labeling task module, the labeling method which is optimally matched with the labeling task is obtained to execute the labeling task, so that the unification of a sample data preprocessing method and a sample labeling method of the same labeling task is realized, and the problems of different labeling modes, repeated data labeling operation and low efficiency of various companies of the electric data are avoided. And the sample labeling method and the labeling sample sharing are realized through the integrated intelligent labeling platform, so that the accuracy of intelligent labeling and the labeling task diversity in the integrated intelligent labeling platform are further improved.
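One way to picture the preset preprocessing tool library is as a registry that maps a data type to an ordered list of preprocessing steps. The sketch below is illustrative only: `TOOL_LIBRARY`, `preprocess`, and the two toy text steps are hypothetical names, and only a text pipeline is shown.

```python
def normalize_text(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def segment_words(text):
    """Naive whitespace word segmentation (a stand-in for a real segmenter)."""
    return text.split(" ") if text else []


# Registry: data type -> ordered preprocessing steps (hypothetical contents).
TOOL_LIBRARY = {
    "text": [normalize_text, segment_words],
}


def preprocess(data_type, sample):
    """Run the pipeline registered for the sample's data type."""
    steps = TOOL_LIBRARY.get(data_type)
    if steps is None:
        raise ValueError(f"no preprocessing registered for type {data_type!r}")
    for step in steps:
        sample = step(sample)
    return sample
```

Pipelines for voice, video, and image data would be registered under their own keys in the same way, so every company calling the platform runs the same steps for the same data type.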
According to the marking template setting data and the parameter setting data, the marking methods in the started marking task module are sequentially judged according to the priority order, and the marking method which is best adapted to the marking task is obtained, and the marking method comprises the following steps:
(31) Judging, according to the labeling template setting data and the parameter setting data, whether a labeling method corresponding to the labeling task already exists; if yes, entering step (32); otherwise, entering step (33); the judgment may be based on whether the type of the data to be labeled and the labeling task are the same as those of a task already carried out in the system, for example whether a labeling task for power transmission line insulators has previously been carried out in the system;
(32) Executing a first labeling method: calling the existing labeling method to execute the labeling task on the first sample set to be labeled; in this embodiment, a labeling method may comprise a trained automatic labeling network model, such as a neural network model, or an image feature extraction algorithm/model, a labeling model, and the like; because this method directly reuses the existing labeling method, it improves labeling efficiency and avoids retraining the automatic labeling method;
(33) Judging whether a labeling method of a similar labeling task exists; if yes, entering step (34); otherwise, entering step (35); whether such a method exists may be judged from the similarity between the current labeling task and previously executed labeling tasks (including their labeled samples and labeling data), as follows:
extracting data features from each original labeled sample set carrying labeling data and from the samples of the first sample set to be labeled;
mapping all extracted sample features into the same feature space, and then computing the similarity between each original labeled sample set carrying labeling data and the first sample set to be labeled, wherein the common feature space is a reproducing kernel Hilbert space (RKHS) and the similarity calculation uses the maximum mean discrepancy (MMD) algorithm;
sorting the original labeled sample sets carrying labeling data by their similarity to the first sample set to be labeled, and taking the labeling method used by an original labeled sample set whose similarity is greater than a preset first threshold as the labeling method of a similar labeling task;
(34) Executing a second labeling method: obtaining a second labeled sample set from the first sample set to be labeled, the second labeled sample set consisting of samples whose labeling data has passed the audit; the second labeled sample set may be obtained by having the user label a small selected portion of the first sample set to be labeled; training and optimizing the labeling method of the existing similar labeling task based on the second labeled sample set; and labeling the samples of the first sample set to be labeled that are not in the second labeled sample set with the optimized labeling method; because this method starts from an existing labeling method and a small number of user-labeled samples, and obtains a labeling method matched to the first sample set to be labeled through training and optimization, it reduces the labor cost and repeated work of data labeling and lowers the economic and time cost;
(35) Executing a third labeling method: acquiring the manual labeling participants selected by the user side, sending a data labeling task invitation notice to the participant clients, and receiving the labeling data of the first sample set to be labeled sent by the participant clients; this method handles a brand-new labeling task that the system has never executed, for which manual labeling is preferred; after the labeling is completed, a labeling method for this type of labeling task is trained using the labeled samples as the training set, yielding an automatic labeling method that can be executed by a computer, which comprises the following steps:
constructing a model to be trained based on a preset data labeling model, the labeling template setting data, and the parameter setting data, wherein the preset data labeling model is an algorithm model initially constructed from a preset algorithm and initialized algorithm parameter settings; for example, the preset data labeling model adopts a convolutional neural network algorithm, and the initialization sets the loss function calculation, the back-propagation optimization algorithm, and the like in the network model;
training the model to be trained through the received labeling data of the first sample set to be labeled, and obtaining a labeling method matched with the first sample set to be labeled, the labeling template setting data and the parameter setting data.
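The similarity test of step (33) relies on the maximum mean discrepancy computed in an RKHS. The following NumPy sketch uses the biased MMD estimate with a Gaussian (RBF) kernel. The kernel choice, the bandwidth `gamma`, the mapping from MMD to a similarity score via 1 / (1 + MMD^2), and the function names are all assumptions; the text only states that MMD is used and that sets above a first threshold are kept.

```python
import numpy as np


def rbf_kernel(x, y, gamma):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq = (np.sum(x ** 2, axis=1)[:, None]
          + np.sum(y ** 2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd_squared(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between two feature sets."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    value = (rbf_kernel(x, x, gamma).mean()
             + rbf_kernel(y, y, gamma).mean()
             - 2.0 * rbf_kernel(x, y, gamma).mean())
    return max(float(value), 0.0)  # clip tiny negative rounding errors


def rank_similar_sets(labeled_sets, target, first_threshold, gamma=1.0):
    """Names of labeled sets whose similarity to target exceeds the threshold,
    most similar first. Similarity = 1 / (1 + MMD^2) is an assumed mapping."""
    scored = []
    for name, features in labeled_sets.items():
        similarity = 1.0 / (1.0 + mmd_squared(features, target, gamma))
        if similarity > first_threshold:
            scored.append((similarity, name))
    return [name for _, name in sorted(scored, reverse=True)]
```

A smaller MMD means the two feature distributions are closer, so the assumed mapping turns it into a score that grows toward 1 as the sets become more alike.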
In step (32), the labeling task on the first sample set to be labeled is carried out entirely with the existing labeling method. To account for the validity of the sample data in the first sample set to be labeled, the method further includes:
(41) Judging whether all samples in the first sample set to be labeled have been labeled; if so, the labeling is complete, that is, all sample data in the first sample set to be labeled is determined to be valid; otherwise, entering step (42);
(42) Executing a fourth labeling method: the samples in the first sample set to be labeled that have been labeled form a third labeled sample set, and the samples that have not been labeled form a fourth sample set to be labeled; the similarity between each sample of the fourth sample set to be labeled and the third labeled sample set is evaluated; if the similarity is greater than a preset second threshold, entering step (44); otherwise, entering step (43); in this embodiment, for example, when the samples in the first sample set to be labeled are images, image features may be extracted with the convolution layers of a network such as YOLO, GoogLeNet, or ResNet, or histogram features may be used, and the distance between the feature vectors of two images is then calculated to judge their similarity; a sample whose similarity is greater than the preset second threshold is judged to be valid, recognizable sample data;
(43) Forming a fifth sample set to be labeled from the samples of the fourth sample set to be labeled whose similarity is not greater than the preset second threshold, and judging the samples of the fifth sample set to be labeled to be invalid samples; for example, the image samples in the first sample set to be labeled should in principle be power transmission line insulator images from among the power equipment images, so an image that does not contain an insulator, or that is too blurred or damaged to be recognized, is judged to be invalid data;
(44) Forming a sixth sample set to be labeled from the samples of the fourth sample set to be labeled whose similarity is greater than the preset second threshold, and judging whether the number of samples in the sixth sample set to be labeled is greater than a preset value; if so, executing the third labeling method, so that data feature recognition and data labeling are carried out directly on the sample data that could not be labeled by the existing labeling method; otherwise, outputting the sixth sample set to be labeled.
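The histogram-feature option mentioned in step (42) can be sketched in plain Python. Cosine similarity between normalized gray-level histograms is used here as one possible feature-vector comparison; the bin count, the 0 to 255 value range, and the function names are assumptions, and a CNN feature extractor could stand in for the histogram.

```python
import math


def gray_histogram(pixels, bins=8, max_value=255):
    """Normalized gray-level histogram of a 2D list of pixel values."""
    hist = [0] * bins
    for row in pixels:
        for v in row:
            idx = min(int(v * bins / (max_value + 1)), bins - 1)
            hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_valid_sample(unlabeled, labeled, second_threshold):
    """Compare one unlabeled image with one labeled image against the threshold."""
    sim = cosine_similarity(gray_histogram(unlabeled), gray_histogram(labeled))
    return sim > second_threshold
```

In practice an unlabeled sample would be compared against many labeled samples and the scores aggregated; the single-pair comparison keeps the sketch short.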
In addition, in this embodiment, after the first, second, third or fourth labeling method has been executed, the output samples carrying labeling data are packaged and output only after passing an audit process, so as to ensure the quality of the data labeling.
The integrated data intelligent labeling platform built on the integrated data intelligent labeling method of this embodiment provides unified management of the preprocessing, feature extraction, data labeling and other methods for the various types of data in power application scenarios. The preprocessing of image data comprises denoising, image restoration and image enhancement processing, and the image enhancement processing comprises:
(61) Obtaining a corresponding first high-resolution image and a corresponding second high-resolution image from the image to be marked;
(62) Setting weight distribution of corresponding pixel points in the first high-resolution image and the second high-resolution image based on feature comparison of the corresponding pixel points in the first high-resolution image and the second high-resolution image;
(63) Fusing the same pixel point in the two high-resolution images based on the weight distribution result;
wherein (62) comprises the steps of:
acquiring a first weight distribution diagram and a second weight distribution diagram:
(621) Taking the first high resolution image as an input image;
(622) Taking the input image as the reference image for the first pass (t=1), and performing the first pixel denoising pass on the input image using the method of step (623);
(623) Sliding a preset window over the t-th reference image to obtain a plurality of window pixel regions; fitting the reference image within each window pixel region to the corresponding window pixel region of the input image through a linear function; and, for each pixel, superimposing and fusing the denoising results given by the linear functions of all the window pixel regions to which that pixel belongs, to obtain an output image;
in this step, the linear function formula is:
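$O_i = a_k I_i + b_k,\ \forall i \in w_k$, wherein the linear coefficients, in the standard guided-filter form consistent with the symbols defined here, are $a_k = \dfrac{\frac{1}{|w|}\sum_{i \in w_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \varepsilon}$ and $b_k = \bar{p}_k - a_k \mu_k$; $w_k$ is the window pixel region $w$ centered on pixel $k$, and $|w|$ is the number of pixels in it; $I_i$ is pixel $i$ of the reference image $I$; $O_i$ is pixel $i$ of the output image $O$; $p_i$ is pixel $i$ of the input image $p$; $\bar{p}_k$ is the pixel mean of the input image $p$ within the window $w_k$; $\mu_k$ and $\sigma_k^2$ are respectively the mean and variance of the pixels of the reference image $I$ within the window $w_k$; and $\varepsilon$ is a constant.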
(624) Taking the output image of the (t-1)-th (t > 1) pixel denoising pass as the t-th reference image, and performing the t-th pixel denoising pass on the input image using the method of step (623);
(625) Applying Gaussian filtering to the difference image between the T-th output image and the input image to obtain a first intermediate image;
(626) Processing the second high resolution image based on the methods of (621) - (625) to obtain a second intermediate image;
(627) Comparing the values of corresponding pixels of the first intermediate image and the second intermediate image, assigning weight a to whichever of the two corresponding pixels has the larger value and weight 1-a to the one with the smaller value, where a > 0.5; the weights so assigned to the pixels of the first intermediate image are recorded as the first weight distribution map, and those assigned to the pixels of the second intermediate image as the second weight distribution map;
acquiring third to sixth weight distribution maps:
(628) Taking the first high-resolution image as an input image;
(629) Taking the first weight distribution map as the reference image for the first pass (t=1), and performing the first pixel denoising pass on the input image using the method of step (623);
(6210) Taking the output image of the (t-1)-th (t > 1) pixel denoising pass as the t-th reference image, and performing the t-th pixel denoising pass on the input image using the method of step (623);
(6211) Acquiring an output image as a third weight distribution map;
taking the first weight distribution map as the input image and the first high-resolution image as the reference image for the first pass (t=1), and obtaining a fourth weight distribution map by the method of steps (628)-(6210);
taking the second high-resolution image as the input image and the second weight distribution map as the reference image for the first pass (t=1), and obtaining a fifth weight distribution map by the method of steps (628)-(6210);
taking the second weight distribution map as the input image and the second high-resolution image as the reference image for the first pass (t=1), and obtaining a sixth weight distribution map by the method of steps (628)-(6210);
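The weight-map steps (621) to (6211) can be illustrated with a short NumPy sketch. It is not the patented implementation: it assumes the linear fit of step (623) takes the standard guided-filter form, that images are floating-point arrays scaled to [0, 1], that the "difference image" is an absolute difference, and that ties in step (627) go to the first image. The function names, window radius, number of passes and `eps` are hypothetical, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 sliding window, with edge padding."""
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))

def guided_pass(reference, inp, radius=2, eps=1e-3):
    """One pass of step (623): per-window fit O = a_k * I + b_k, then averaging."""
    mean_i, mean_p = box_mean(reference, radius), box_mean(inp, radius)
    cov_ip = box_mean(reference * inp, radius) - mean_i * mean_p
    var_i = box_mean(reference * reference, radius) - mean_i ** 2
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    # Each pixel lies in several windows, so average their coefficients.
    return box_mean(a, radius) * reference + box_mean(b, radius)

def iterate_passes(inp, first_reference, passes=3):
    """Steps (622)/(624) and (629)/(6210): each output becomes the next reference."""
    reference = first_reference
    for _ in range(passes):
        reference = guided_pass(reference, inp)
    return reference  # the T-th output image

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian filter written with NumPy only."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def intermediate_image(hr, passes=3, sigma=1.0):
    """Steps (621)-(625): blurred difference between T-th output and input."""
    out = iterate_passes(hr, hr, passes)
    # Assumption: the difference image is taken as an absolute difference.
    return gaussian_blur(np.abs(out - hr), sigma)

def weight_maps(hr1, hr2, a=0.7, passes=3):
    """Steps (626)-(6211) and the fourth to sixth maps."""
    m1, m2 = intermediate_image(hr1, passes), intermediate_image(hr2, passes)
    w1 = np.where(m1 >= m2, a, 1.0 - a)  # ties go to image 1 (assumption)
    w2 = 1.0 - w1
    third = iterate_passes(hr1, w1, passes)   # input: image, reference: map
    fourth = iterate_passes(w1, hr1, passes)  # input: map, reference: image
    fifth = iterate_passes(hr2, w2, passes)
    sixth = iterate_passes(w2, hr2, passes)
    return w1, w2, third, fourth, fifth, sixth
```

The sliding-window mean keeps the sketch dependency-free; a production version would more likely use an integral-image box filter for speed.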
the step (63) specifically comprises:
weighting the first high-resolution image based on the third weight distribution map and/or the fourth weight distribution map, weighting the second high-resolution image based on the fifth weight distribution map and/or the sixth weight distribution map, and fusing the weighted first high-resolution image and the weighted second high-resolution image to obtain the image enhancement processing result of the image to be labeled.
The first high-resolution image and the second high-resolution image are respectively high-resolution images with different types of characteristics, which are acquired through different super-resolution reconstruction methods, and the high-resolution images are weighted and fused through the acquisition of the first weight distribution diagram to the sixth weight distribution diagram, so that the image edge information is enhanced, and the visual effect of the fused image is improved.
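The weighted fusion of step (63) might look like the sketch below. Two points are assumptions rather than statements of the embodiment: where two maps are supplied for one image (the "and/or" case) they are simply averaged, and the weights are normalized so that they sum to one at every pixel. The function name and the `tiny` guard against division by zero are hypothetical.

```python
import numpy as np

def fuse_high_resolution(hr1, hr2, maps1, maps2, tiny=1e-12):
    """Fuse two high-resolution images pixel by pixel.

    maps1 -- list holding the third and/or fourth weight distribution map
    maps2 -- list holding the fifth and/or sixth weight distribution map
    """
    # Assumption: several maps for the same image are combined by averaging.
    w1 = np.mean(np.stack(maps1), axis=0)
    w2 = np.mean(np.stack(maps2), axis=0)
    # Assumption: normalize so the two weights sum to 1 at each pixel.
    return (w1 * hr1 + w2 * hr2) / (w1 + w2 + tiny)
```

With complementary maps (for example 0.7 and 0.3 everywhere) the result is a plain convex combination of the two images; spatially varying maps let each reconstruction dominate where its detail is stronger.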
The embodiment also provides an integrated data intelligent labeling system, which comprises:
a task request processing unit, configured to receive a labeling task request from a user, and to select and start one labeling task module to execute the labeling task;
a to-be-labeled data acquisition unit, configured to receive the first sample set to be labeled, the labeling template setting data and the parameter setting data;
a to-be-labeled data preprocessing unit, configured to call, according to the type of the first sample set to be labeled, a matching preprocessing method from a preset preprocessing tool library to preprocess the sample set to be labeled;
an intelligent labeling unit, configured to evaluate, in priority order and according to the labeling template setting data and the parameter setting data, the plurality of labeling methods in the started labeling task module, so as to obtain the labeling method best suited to the labeling task;
a labeling result output unit, configured to audit the labeling data of the labeled samples and to output the samples carrying labeling data that passes the audit.
For specific limitations of the integrated data intelligent labeling system, reference may be made to the limitations of the integrated data intelligent labeling method above, which are not repeated here. Each unit in the integrated data intelligent labeling system can be implemented wholly or partly in software, hardware, or a combination thereof. The units can be embedded in, or be independent of, a processor of the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each unit.
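The priority-ordered choice made by the intelligent labeling unit (an existing method first, then a method for a similar task, then manual labeling) can be pictured with the following sketch. The registry structure, the `(data type, template)` key and the rule that a "similar" task is one of the same data type are hypothetical simplifications introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

TaskKey = Tuple[str, str]  # (data type, labeling template name): hypothetical key

@dataclass
class MethodRegistry:
    """Hypothetical store of the labeling methods known to one task module."""
    exact: Dict[TaskKey, str] = field(default_factory=dict)
    similar_types: Set[str] = field(default_factory=set)

def select_labeling_method(registry: MethodRegistry, data_type: str, template: str) -> str:
    """Return which labeling method applies, checked in priority order."""
    if (data_type, template) in registry.exact:
        return "first"   # call the existing labeling method directly
    if data_type in registry.similar_types:
        return "second"  # retrain a similar task's method on audited samples
    return "third"       # invite manual labeling participants

registry = MethodRegistry(
    exact={("image", "insulator"): "insulator-detector"},
    similar_types={"image"},
)
choice = select_labeling_method(registry, "image", "tower")
```

Here `choice` falls through the first check and lands on the second labeling method, because no exact match exists for the hypothetical "tower" template while a similar image method does.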
The embodiment also provides an electronic device, including:
a memory for storing executable instructions;
a processor for implementing the integrated data intelligent labeling method when running the executable instructions stored in the memory.
The embodiment also provides a computer readable storage medium, which stores executable instructions, wherein the executable instructions realize the integrated data intelligent labeling method when being executed by a processor.
Wherein the processor of the electronic device is configured to provide computing and control capabilities.
The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system, a computer program and a database, and the database is used for storing data samples to be labeled, labeled data samples and the like; the internal memory provides an environment for running the operating system and the computer program stored in the nonvolatile storage medium. The computer readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, etc.
The present invention is not limited to the above-described specific embodiments, and various modifications may be made by those skilled in the art without inventive effort from the above-described concepts, and are within the scope of the present invention.
Claims (7)
1. An integrated data intelligent labeling method, characterized by comprising the following steps:
receiving a labeling task request of a user, and starting a corresponding labeling task module to execute a labeling task, wherein the labeling task module comprises a first labeling task module, a second labeling task module, a third labeling task module and a fourth labeling task module which are respectively used for processing a sample set to be labeled of electric power voice, text, video and image types;
receiving a first sample set to be marked, marking template setting data and parameter setting data;
according to the type of the first sample set to be marked, invoking a matched preprocessing method in a preset preprocessing tool library to preprocess the sample set to be marked;
according to the marking template setting data and the parameter setting data, sequentially judging a plurality of marking methods in the started marking task module according to the priority order to obtain a marking method which is optimally matched with the marking task;
auditing the labeling data of the labeled sample and outputting the sample carrying the labeling data which passes the auditing;
the labeling method for obtaining the best fit with the labeling task comprises the following steps:
(31) Judging whether a labeling method corresponding to a labeling task exists or not according to the labeling template setting data and the parameter setting data, if yes, entering a step (32); otherwise, go to step (33);
(32) Executing a first labeling method, and calling the existing labeling method to execute a labeling task of a first sample set to be labeled;
(33) Judging whether a labeling method of a similar labeling task exists, if so, entering a step (34), otherwise, entering a step (35);
(34) Executing a second labeling method: obtaining a second labeled sample set from the first sample set to be labeled, the second labeled sample set consisting of samples carrying labeling data that has passed the audit; training and optimizing the labeling method of the existing similar labeling task based on the second labeled sample set; and labeling, with the optimized labeling method, the samples in the first sample set to be labeled that do not belong to the second labeled sample set;
(35) Executing a third labeling method, acquiring manual labeling participants selected by a user side, sending a data labeling task invitation notice to a participant client side, and receiving labeling data of a first sample set to be labeled sent by the participant client side;
the step (32) further includes:
(41) Judging whether all labeling in the first sample set to be labeled is completed, if so, completing the labeling, namely determining that the sample data in the first sample set to be labeled is all valid, otherwise, entering a step (42);
(42) Executing a fourth labeling method, wherein a third labeling sample set is formed by the samples which are labeled in the first sample set to be labeled, a fourth sample set to be labeled is formed by the samples which are not labeled, the sample similarity of the fourth sample set to be labeled and the third sample set to be labeled is judged, if the similarity is larger than a preset second threshold value, the step (44) is carried out, otherwise, the step (43) is carried out;
(43) Forming a fifth sample set to be marked by samples in a fourth sample set to be marked, wherein the similarity of the samples is not greater than a preset second threshold value, and judging the fifth sample set to be marked as an invalid sample;
(44) And forming samples in a fourth sample set to be marked, the similarity of which is greater than a preset second threshold value, into a sixth sample set to be marked, judging whether the number of the samples in the sixth sample set to be marked is greater than a preset value, if so, executing a third marking method, and otherwise, outputting the sixth sample set to be marked.
2. The method for intelligent labeling of integrated data according to claim 1, wherein the step (35) of receiving labeling data of the first sample set to be labeled sent by the participant client further comprises:
constructing a model to be trained based on a preset data annotation model, annotation template setting data and parameter setting data;
training the model to be trained through the received labeling data of the first sample set to be labeled, and obtaining a labeling method matched with the first sample set to be labeled, the labeling template setting data and the parameter setting data.
3. The integrated data intelligent labeling method according to claim 1, wherein the preprocessing step of the power image data comprises denoising, image restoration and image enhancement processing, and the image enhancement processing comprises:
(61) Obtaining a corresponding first high-resolution image and a corresponding second high-resolution image from the image to be marked;
(62) Setting weight distribution of corresponding pixel points in the first high-resolution image and the second high-resolution image based on feature comparison of the corresponding pixel points in the first high-resolution image and the second high-resolution image;
(63) And fusing the same pixel point in the two high-resolution images based on the weight distribution result.
4. The integrated data intelligent labeling method according to claim 3, wherein the setting of the weight distribution of the corresponding pixels in the first high-resolution image and the second high-resolution image includes obtaining a first weight distribution map and a second weight distribution map, and the obtaining method includes:
(621) Taking the first high resolution image as an input image;
(622) Taking the input image as the reference image for the first pass (t=1), and performing the first pixel denoising pass on the input image using the method of step (623);
(623) Sliding a preset window over the t-th reference image to obtain a plurality of window pixel regions; fitting the reference image within each window pixel region to the corresponding window pixel region of the input image through a linear function; and, for each pixel, superimposing and fusing the denoising results given by the linear functions of all the window pixel regions to which that pixel belongs, to obtain an output image;
(624) Taking the output image of the (t-1)-th (t > 1) pixel denoising pass as the t-th reference image, and performing the t-th pixel denoising pass on the input image using the method of step (623);
(625) Applying Gaussian filtering to the difference image between the T-th output image and the input image to obtain a first intermediate image;
(626) Processing the second high resolution image based on the methods of (621) - (625) to obtain a second intermediate image;
(627) And comparing the values of the pixel points corresponding to the first intermediate image and the second intermediate image, distributing weights a to the pixel points with larger pixel values in the corresponding pixel points, and distributing weights 1-a to the pixel points with smaller pixel values, wherein a is more than 0.5, so that the first weight of each pixel point of the first intermediate image is recorded as a first weight distribution map, and the second weight of each pixel point of the second intermediate image is recorded as a second weight distribution map.
5. An integrated data intelligent labeling system, characterized by comprising:
the task request processing unit is used for receiving a labeling task request of a user, selecting and starting one labeling task module to execute a labeling task, wherein the labeling task module comprises a first labeling task module, a second labeling task module, a third labeling task module and a fourth labeling task module which are respectively used for processing a sample set to be labeled of electric power voice, text, video and image types;
the to-be-marked data acquisition unit is used for receiving the first to-be-marked sample set, the marking template setting data and the parameter setting data;
the to-be-labeled data preprocessing unit is used for calling, according to the type of the first sample set to be labeled, a matching preprocessing method from a preset preprocessing tool library to preprocess the sample set to be labeled;
the intelligent labeling unit is used for sequentially judging a plurality of labeling methods in the started labeling task module according to the labeling template setting data and the parameter setting data and the priority order to obtain a labeling method which is optimally matched with the labeling task;
the labeling result output unit is used for auditing the labeling data of the labeled sample and outputting the sample carrying the labeling data which passes the auditing;
the intelligent labeling unit obtains a labeling method which is optimally adapted to a labeling task, and the intelligent labeling unit comprises the following steps:
(31) Judging whether a labeling method corresponding to a labeling task exists or not according to the labeling template setting data and the parameter setting data, if yes, entering a step (32); otherwise, go to step (33);
(32) Executing a first labeling method, and calling the existing labeling method to execute a labeling task of a first sample set to be labeled;
(33) Judging whether a labeling method of a similar labeling task exists, if so, entering a step (34), otherwise, entering a step (35);
(34) Executing a second labeling method: obtaining a second labeled sample set from the first sample set to be labeled, the second labeled sample set consisting of samples carrying labeling data that has passed the audit; training and optimizing the labeling method of the existing similar labeling task based on the second labeled sample set; and labeling, with the optimized labeling method, the samples in the first sample set to be labeled that do not belong to the second labeled sample set;
(35) Executing a third labeling method, acquiring manual labeling participants selected by a user side, sending a data labeling task invitation notice to a participant client side, and receiving labeling data of a first sample set to be labeled sent by the participant client side;
the step (32) further includes:
(41) Judging whether all labeling in the first sample set to be labeled is completed, if so, completing the labeling, namely determining that the sample data in the first sample set to be labeled is all valid, otherwise, entering a step (42);
(42) Executing a fourth labeling method, wherein a third labeling sample set is formed by the samples which are labeled in the first sample set to be labeled, a fourth sample set to be labeled is formed by the samples which are not labeled, the sample similarity of the fourth sample set to be labeled and the third sample set to be labeled is judged, if the similarity is larger than a preset second threshold value, the step (44) is carried out, otherwise, the step (43) is carried out;
(43) Forming a fifth sample set to be marked by samples in a fourth sample set to be marked, wherein the similarity of the samples is not greater than a preset second threshold value, and judging the fifth sample set to be marked as an invalid sample;
(44) And forming samples in a fourth sample set to be marked, the similarity of which is greater than a preset second threshold value, into a sixth sample set to be marked, judging whether the number of the samples in the sixth sample set to be marked is greater than a preset value, if so, executing a third marking method, and otherwise, outputting the sixth sample set to be marked.
6. An electronic device, the electronic device comprising:
a memory for storing executable instructions;
a processor for implementing the integrated data intelligent labeling method according to any one of claims 1 to 4 when running the executable instructions stored in the memory.
7. A computer readable storage medium storing executable instructions which when executed by a processor implement the integrated data intelligent labeling method of any of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110358429.8A CN113298112B (en) | 2021-04-01 | 2021-04-01 | Integrated data intelligent labeling method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113298112A | 2021-08-24 |
CN113298112B | 2023-05-16 |
Family
ID=77319427
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110358429.8A Active CN113298112B (en) | 2021-04-01 | 2021-04-01 | Integrated data intelligent labeling method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113298112B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035406B (en) * | 2022-06-08 | 2023-08-04 | 中国科学院空间应用工程与技术中心 | Remote sensing scene data set labeling method, remote sensing scene data set labeling system, storage medium and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033220A (en) * | 2018-06-29 | 2018-12-18 | 北京京东尚科信息技术有限公司 | Automatically selecting method, system, equipment and the storage medium of labeled data |
CN112100425A (en) * | 2020-09-17 | 2020-12-18 | 广州图普网络科技有限公司 | Label labeling method and device based on artificial intelligence, electronic equipment and medium |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101359351B (en) * | 2008-09-25 | 2010-11-10 | 中国人民解放军信息工程大学 | Multilayer semantic annotation and detection method against malignancy |
CN101782897A (en) * | 2010-03-17 | 2010-07-21 | 上海大学 | Chinese corpus labeling method based on events |
CN103581232B (en) * | 2012-07-26 | 2016-12-21 | 中国移动通信集团公司 | Web page transmission, web page display device and comprise the system of this device |
CN103024585B (en) * | 2012-12-28 | 2017-02-22 | Tcl集团股份有限公司 | Program recommendation system, program recommendation method and terminal equipment |
CN104462738B (en) * | 2013-09-24 | 2018-10-30 | 西门子公司 | A kind of methods, devices and systems of mark medical image |
EP3065357B1 (en) * | 2015-03-06 | 2019-02-20 | Juniper Networks, Inc. | Rsvp make-before-break label reuse |
CN106447028A (en) * | 2016-12-01 | 2017-02-22 | 江苏物联网研究发展中心 | Improved service robot task planning method |
CN109062950B (en) * | 2018-06-22 | 2021-11-05 | 北京奇艺世纪科技有限公司 | Text labeling method and device |
CN109087061A (en) * | 2018-07-17 | 2018-12-25 | 北京猎户星空科技有限公司 | A kind of data task distribution method, device, equipment and medium |
CN109739987B (en) * | 2018-12-29 | 2020-12-18 | 北京创鑫旅程网络技术有限公司 | Corpus labeling method, corpus construction method and apparatus |
CN110020201B (en) * | 2019-03-26 | 2021-05-25 | 中国科学院软件研究所 | User type automatic labeling system based on user portrait clustering |
CN110717317B (en) * | 2019-09-12 | 2021-06-08 | 中国科学院自动化研究所 | On-line artificial Chinese text marking system |
CN112035675A (en) * | 2020-08-31 | 2020-12-04 | 康键信息技术(深圳)有限公司 | Medical text labeling method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109741332B (en) | Man-machine cooperative image segmentation and annotation method | |
US11538286B2 (en) | Method and apparatus for vehicle damage assessment, electronic device, and computer storage medium | |
CN110189336B (en) | Image generation method, system, server and storage medium | |
CN112132197B (en) | Model training, image processing method, device, computer equipment and storage medium | |
CN112884758B (en) | Defect insulator sample generation method and system based on style migration method | |
CN112102323B (en) | Adhesion cell nucleus segmentation method based on generation of countermeasure network and Caps-Unet network | |
CN111738169B (en) | Handwriting formula recognition method based on end-to-end network model | |
CN111291759A (en) | Character detection method and device, electronic equipment and storage medium | |
CN110490959B (en) | Three-dimensional image processing method and device, virtual image generating method and electronic equipment | |
CN111932577B (en) | Text detection method, electronic device and computer readable medium | |
CN112949767A (en) | Sample image increment, image detection model training and image detection method | |
CN113515655A (en) | Fault identification method and device based on image classification | |
CN112668638A (en) | Image aesthetic quality evaluation and semantic recognition combined classification method and system | |
CN113064995A (en) | Text multi-label classification method and system based on deep learning of images | |
CN112329605B (en) | City appearance random pasting and random drawing behavior identification method, storage device and server | |
CN111898544B (en) | Text image matching method, device and equipment and computer storage medium | |
CN113762303A (en) | Image classification method and device, electronic equipment and storage medium | |
CN114241495B (en) | Data enhancement method for off-line handwritten text recognition | |
CN113298112B (en) | Integrated data intelligent labeling method and system | |
CN114581789A (en) | Hyperspectral image classification method and system | |
CN114328942A (en) | Relationship extraction method, apparatus, device, storage medium and computer program product | |
CN117635935A (en) | Lightweight unsupervised self-adaptive image semantic segmentation method and system | |
CN112560668A (en) | Human behavior identification method based on scene prior knowledge | |
CN114693554B (en) | Big data image processing method and system | |
CN115359468A (en) | Target website identification method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||