CN108960297A - Picture annotation method, annotation apparatus, device, and storage medium - Google Patents

Info

Publication number
CN108960297A
Authority
CN
China
Prior art keywords
picture
classification
mark
labeler
annotation results
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810618773.4A
Other languages
Chinese (zh)
Other versions
CN108960297B (en)
Inventor
刘世权
刘弘也
苏驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd, Beijing Kingsoft Cloud Technology Co Ltd
Priority to CN201810618773.4A
Publication of CN108960297A
Application granted
Publication of CN108960297B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide a picture annotation method, annotation apparatus, device, and storage medium. The method includes: obtaining a plurality of pictures whose categories are to be annotated, and dividing the pictures into several batches; assigning each batch to at least two labelers; for each batch, obtaining the annotation result data of the at least two labelers; for each picture in each batch, comparing whether the pre-annotation categories given to the picture in the annotation result data of the at least two labelers are the same, and determining the number of pictures in each batch whose pre-annotation categories are identical across the annotation result data of the at least two labelers; and, for each batch, determining the annotation categories of the pictures in the batch based on the ratio of the number of pictures with identical pre-annotation categories to the total number of pictures in the batch. The picture annotation method provided by the embodiments can improve the annotation quality of picture categories.

Description

Picture annotation method, annotation apparatus, device, and storage medium
Technical field
The present invention relates to the technical field of machine learning, and in particular to a picture annotation method, annotation apparatus, device, and storage medium.
Background art
With the popularity of live video streaming, a large amount of harmful content, such as vulgar and pornographic material, has appeared in live streams, so live video content needs to be supervised effectively. At present, live-streaming platforms generally supervise content manually: a platform may staff a supervision team of up to several hundred people who patrol live rooms and identify inappropriate content. This approach is costly and inefficient. With the continued development of artificial intelligence and machine learning, deep learning can be used to have machines classify video content automatically.
Deep learning requires a large number of training samples with high-quality annotations. Taking live-stream supervision as an example, a large number of live-room screenshots must be prepared, and each must be given an accurate content label, that is, annotated. For example, screenshots can be annotated with one of three categories (normal, vulgar, or pornographic), and the annotated screenshots are then used as training samples to train a content supervision model. When live content is supervised, a live-room screenshot is obtained and input into the content supervision model, and the model outputs the normal, vulgar, or pornographic label for that screenshot, thereby identifying the category of the live content. Compared with manual supervision, this can effectively reduce cost and improve supervision efficiency. To ensure the accuracy of the content supervision model, that is, the accuracy of its output, the annotation accuracy of the training samples must be ensured.
At present, the common way to annotate samples is to recruit a large number of employees, have them learn the judging criteria for each category of data, and have them perform simple, repetitive manual annotation, with administrators carrying out periodic spot checks to ensure annotation quality. However, manual annotation inevitably contains errors, and the skill level of annotators is uneven. With data volumes reaching the millions, administrators who manually screen out wrongly annotated pictures easily miss some, so the annotation quality of the sample pictures is low.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a picture annotation method, annotation apparatus, device, and storage medium, so as to effectively improve the annotation quality of sample pictures. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a picture annotation method, comprising:
obtaining a plurality of pictures whose categories are to be annotated, and dividing the pictures into several batches;
assigning each batch to at least two labelers;
for each batch, obtaining the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-annotation category for each picture in the batch;
for each picture in each batch, comparing whether the pre-annotation categories of the picture in the annotation result data of the at least two labelers are the same, and determining the number of pictures in each batch whose pre-annotation categories are identical across the annotation result data of the at least two labelers;
for each batch, determining the annotation categories of the pictures in the batch based on the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch.
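The comparison and counting steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each labeler's annotation result data is a dictionary mapping a picture identifier to a pre-annotation category, and the names `count_unanimous` and `agreement_ratio` are hypothetical.

```python
from typing import Dict, List


def count_unanimous(annotations: List[Dict[str, str]]) -> int:
    """Count pictures whose pre-annotation category is the same for every labeler.

    Assumption: each dict maps picture id -> category and all dicts
    cover the same pictures of one batch.
    """
    if len(annotations) < 2:
        raise ValueError("each batch needs at least two labelers")
    return sum(
        1
        for pid in annotations[0]
        if len({labels[pid] for labels in annotations}) == 1
    )


def agreement_ratio(annotations: List[Dict[str, str]]) -> float:
    """Ratio of unanimously labeled pictures to all pictures in the batch."""
    return count_unanimous(annotations) / len(annotations[0])


# Hypothetical batch of four pictures annotated by two labelers.
labeler_a = {"img1": "normal", "img2": "vulgar", "img3": "pornographic", "img4": "normal"}
labeler_b = {"img1": "normal", "img2": "normal", "img3": "pornographic", "img4": "normal"}
ratio = agreement_ratio([labeler_a, labeler_b])
```

Here the two labelers disagree only on `img2`, so three of the four pictures count as having identical pre-annotation categories.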
Optionally, determining, for each batch, the annotation categories of the pictures in the batch based on the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch comprises:
if, in one of the batches, the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch is greater than or equal to a first preset threshold, determining the pre-annotation category of each picture with identical pre-annotation categories as the annotation category of that picture.
Optionally, after the pre-annotation category of each picture with identical pre-annotation categories is determined as the annotation category of that picture, the method further comprises:
re-annotating the pictures in the batch whose pre-annotation categories differ.
Optionally, re-annotating the pictures in the batch whose pre-annotation categories differ comprises:
assigning the pictures in the batch whose pre-annotation categories differ to a second labeler;
obtaining the annotation result data of the second labeler, which carries the re-annotation categories that the second labeler determined for the pictures whose pre-annotation categories differ;
determining the re-annotation category that the second labeler determined for each such picture as the annotation category of that picture.
Optionally, determining, for each batch, the annotation categories of the pictures in the batch based on the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch comprises:
if, in one of the batches, the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch is less than the first preset threshold, deleting the annotation result data of the at least two labelers for the batch and re-annotating the pictures in the batch.
Optionally, there are at least three labelers;
determining, for each batch, the annotation categories of the pictures in the batch based on the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch comprises:
if, in one of the batches, the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch is less than the first preset threshold, obtaining the similarity between pairs of annotation result data among the annotation result data of the at least three labelers;
when, among the annotation result data of the at least three labelers, there are two annotation result data whose similarity is higher than a second preset threshold:
determining the two annotation result data with the highest similarity;
determining, for each picture that has the same pre-annotation category in the two annotation result data with the highest similarity, that pre-annotation category as the annotation category of the picture;
where the similarity of two annotation result data is the ratio of the number of pictures in the batch that have the same pre-annotation category in both annotation result data to the number of pictures in the batch.
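The pairwise similarity defined above, and the selection of the most similar pair of labelers, can be sketched as follows. The names `similarity` and `labels_from_most_similar_pair`, the dictionary layout, and the threshold values are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Dict, List, Optional


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Share of the batch's pictures that carry the same category in both results."""
    return sum(1 for pid in a if a[pid] == b[pid]) / len(a)


def labels_from_most_similar_pair(
    annotations: List[Dict[str, str]], second_threshold: float
) -> Optional[Dict[str, str]]:
    """Use the most similar pair of results if its similarity exceeds the threshold.

    Returns the categories of pictures on which that pair agrees,
    or None when no pair is similar enough.
    """
    if len(annotations) < 3:
        raise ValueError("this variant assumes at least three labelers")
    i, j = max(
        combinations(range(len(annotations)), 2),
        key=lambda pair: similarity(annotations[pair[0]], annotations[pair[1]]),
    )
    if similarity(annotations[i], annotations[j]) <= second_threshold:
        return None
    return {
        pid: annotations[i][pid]
        for pid in annotations[i]
        if annotations[i][pid] == annotations[j][pid]
    }


a = {"img1": "normal", "img2": "vulgar", "img3": "pornographic", "img4": "normal"}
b = {"img1": "normal", "img2": "vulgar", "img3": "pornographic", "img4": "vulgar"}
c = {"img1": "vulgar", "img2": "normal", "img3": "pornographic", "img4": "pornographic"}

labels = labels_from_most_similar_pair([a, b, c], second_threshold=0.6)
none_similar = labels_from_most_similar_pair([a, b, c], second_threshold=0.8)
```

In this hypothetical batch, labelers `a` and `b` agree on three of four pictures (similarity 0.75), so their shared labels are adopted when the second preset threshold is 0.6 and nothing is adopted when it is 0.8.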
Optionally, the pictures whose annotation categories have been determined are used as training samples for a machine learning model.
Optionally, the method further comprises:
obtaining class prediction results of the machine learning model for preset pictures, where the class prediction results carry the categories of the preset pictures predicted by the machine learning model;
for each predicted category, assigning the preset pictures of that category to at least one third labeler, and obtaining the verification results of the at least one third labeler for the predicted categories of the preset pictures;
determining, according to the verification results of the at least one third labeler for the predicted categories of the preset pictures, the preset pictures whose predicted categories are correct, and determining the correct category as the annotation category of each such preset picture.
Optionally, the preset pictures whose annotation categories are the correct categories are used as training samples for the machine learning model.
Optionally, there are at least two third labelers;
determining, according to the verification results of the at least one third labeler for the predicted categories of the preset pictures, the preset pictures whose predicted categories are correct comprises:
when the verification results of the at least two third labelers for the predicted category of a preset picture are all correct, determining that the predicted category is the correct category;
or,
when the proportion of verification results that are correct, among all the verification results of the at least two third labelers for the predicted category of a preset picture, reaches a third preset threshold, determining that the predicted category is the correct category.
In a second aspect, an embodiment of the present invention provides a picture annotation apparatus, comprising:
a first obtaining module, configured to obtain a plurality of pictures whose categories are to be annotated and divide the pictures into several batches;
an assignment module, configured to assign each batch to at least two labelers;
a second obtaining module, configured to obtain, for each batch, the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-annotation category for each picture in the batch;
a comparison module, configured to compare, for each picture in each batch, whether the pre-annotation categories of the picture in the annotation result data of the at least two labelers are the same, and to determine the number of pictures in each batch whose pre-annotation categories are identical across the annotation result data of the at least two labelers;
a determining module, configured to determine, for each batch, the annotation categories of the pictures in the batch based on the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch.
Optionally, the determining module is specifically configured to:
if, in one of the batches, the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch is greater than or equal to a first preset threshold, determine the pre-annotation category of each picture with identical pre-annotation categories as the annotation category of that picture.
Optionally, the apparatus further comprises:
an annotation module, configured to re-annotate the pictures in the batch whose pre-annotation categories differ.
Optionally, the annotation module comprises:
an assignment submodule, configured to assign the pictures in the batch whose pre-annotation categories differ to a second labeler;
an obtaining submodule, configured to obtain the annotation result data of the second labeler, which carries the re-annotation categories that the second labeler determined for the pictures whose pre-annotation categories differ;
a determining submodule, configured to determine the re-annotation category that the second labeler determined for each such picture as the annotation category of that picture.
Optionally, the determining module is specifically configured to:
if, in one of the batches, the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch is less than the first preset threshold, delete the annotation result data of the at least two labelers for the batch and re-annotate the pictures in the batch.
Optionally, there are at least three labelers;
the determining module is specifically configured to:
if, in one of the batches, the ratio of the determined number of pictures with identical pre-annotation categories to the total number of pictures in the batch is less than the first preset threshold, obtain the similarity between pairs of annotation result data among the annotation result data of the at least three labelers;
when, among the annotation result data of the at least three labelers, there are two annotation result data whose similarity is higher than a second preset threshold:
determine the two annotation result data with the highest similarity;
determine, for each picture that has the same pre-annotation category in the two annotation result data with the highest similarity, that pre-annotation category as the annotation category of the picture;
where the similarity of two annotation result data is the ratio of the number of pictures in the batch that have the same pre-annotation category in both annotation result data to the number of pictures in the batch.
Optionally, the pictures whose annotation categories have been determined are used as training samples for a machine learning model.
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain class prediction results of the machine learning model for preset pictures, where the class prediction results carry the categories of the preset pictures predicted by the machine learning model;
the assignment module is further configured to: for each predicted category, assign the preset pictures of that category to at least one third labeler, and obtain the verification results of the at least one third labeler for the predicted categories of the preset pictures;
the determining module is further configured to: determine, according to the verification results of the at least one third labeler for the predicted categories of the preset pictures, the preset pictures whose predicted categories are correct, and determine the correct category as the annotation category of each such preset picture.
Optionally, the preset pictures whose annotation categories are the correct categories are used as training samples for the machine learning model.
Optionally, there are at least two third labelers;
the determining module is specifically configured to: when the verification results of the at least two third labelers for the predicted category of a preset picture are all correct, determine that the predicted category is the correct category;
or,
when the proportion of verification results that are correct, among all the verification results of the at least two third labelers for the predicted category of a preset picture, reaches a third preset threshold, determine that the predicted category is the correct category.
In a third aspect, an embodiment of the present invention provides a picture annotation device, comprising a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the steps of the picture annotation method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the picture annotation method provided in the first aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, causes the computer to execute the steps of the picture annotation method provided in the first aspect.
In a sixth aspect, an embodiment of the present invention further provides a computer program that, when run on a computer, causes the computer to execute the steps of the picture annotation method provided in the first aspect.
With the picture annotation method, annotation apparatus, device, and storage medium provided by the embodiments of the present invention, the obtained pictures to be annotated are divided into multiple batches, and each batch is assigned to at least two labelers for manual annotation to obtain each labeler's annotation result data. Then, for each picture in each batch, the pre-annotation categories of the picture in the different labelers' annotation result data are compared, and the number of pictures with identical pre-annotation categories is determined. The annotation categories of the pictures in the batch are then determined based on the ratio of that number to the total number of pictures in the batch. Because each batch is annotated by multiple labelers and its annotation categories are derived from the combined annotation results of those labelers, the accuracy of the annotation categories is effectively improved. Moreover, because the final annotation category is determined by combining multiple labelers' results, a single labeler's flawed annotation result does not determine a picture's final annotation category, and flawed annotation results can be detected effectively. Labelers of uneven skill can thus work together while the accuracy of picture annotation categories is still ensured, which effectively improves annotation quality. Of course, any product or method implementing the present invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a picture annotation method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of re-annotating pictures whose pre-annotation categories differ;
Fig. 3 is another schematic flowchart of the picture annotation method provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of training a machine learning model to obtain the trained model;
Fig. 5 is a schematic diagram of subtask division in the picture annotation method provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a picture annotation apparatus provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the annotation module in an embodiment of the present invention;
Fig. 8 is another schematic structural diagram of the picture annotation apparatus provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a picture annotation device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In the past two years, as mobile live-streaming applications have rapidly become popular, a great deal of vulgar and pornographic content has appeared with them, so supervising live content has become especially urgent and important. For this purpose, live-streaming platforms have staffed supervision teams of hundreds of people who check each live room manually, but this manual supervision is costly and inefficient.
On the one hand, with advances in deep learning theory and server computing power, it is becoming feasible for machines to judge live content automatically using deep learning. A prerequisite for building a live-stream supervision system based on a deep learning model is a large amount (for example, millions or tens of millions) of high-quality annotated data, which is used as sample pictures to train the machine learning model.
Specifically, each live-room screenshot can be judged manually in advance and annotated with a category, for example normal, vulgar, or pornographic. After the sample pictures are obtained, they are input into the machine learning model for training.
However, with existing manual annotation methods, the skill levels of labelers differ, so the annotation quality of the resulting sample pictures is uneven. With picture counts in the millions, existing methods struggle to screen out sample pictures whose annotation category does not match their actual category. If wrongly annotated sample pictures are input into the machine learning model, they will inevitably affect the training result, reduce the model's accuracy in predicting picture categories, and in turn affect the outcome of live content supervision.
On the other hand, existing systems for manual picture annotation are generally developed with a B/S (Browser/Server) architecture in which the front end and back end are separated. The front end uses HTML (HyperText Markup Language), CSS (Cascading Style Sheets), JavaScript (an interpreted scripting language), and similar technologies to implement the login page, the picture annotation page, and the annotation history page, and communicates with the back-end server through Ajax (Asynchronous JavaScript and XML, where XML is Extensible Markup Language). The back-end server uses Java or Python (both programming languages) to implement the relevant RESTful interfaces, and may be configured with a database for persistent storage of the annotated category labels.
However, in such manually operated annotation systems, all annotation work is done by hand; the system itself is only responsible for displaying pictures and for transmitting and saving annotation result data, so working efficiency is low. For example, a daily workload of 5,000 pictures is already an excellent level for a skilled labeler, while a basic deep learning system of practical value needs at least 5,000,000 annotated pictures as training data. That is 1,000 person-days of work, or the workload of 10 people for 5 calendar months. In addition, labelers with different qualifications differ greatly in their ability to judge a picture's actual category, which leaves a large number of wrong category labels among the annotated pictures. Existing manually operated annotation systems perform poorly in both annotation efficiency and annotation quality; errors can only be reduced through extensive spot checks by administrators, which greatly increases management costs.
In view of this, an embodiment of the present invention provides a picture annotation method, described in detail below.
It should be emphasized that although the picture annotation method provided by the embodiment is explained using live-stream supervision as an example, this is only to better describe the invention and does not limit it. The method can be applied in any field where pictures need to be annotated, and the annotated pictures are not limited to serving as training samples for machine learning models; they can also be used for other purposes.
Embodiment of the method 1
As shown in Figure 1, specifically can be applied to server the embodiment of the invention provides a kind of mask method of picture, It can certainly be applied to the other kinds of equipment with data processing operation function, be said by taking server as an example below Bright, which may comprise steps of:
S101: obtain multiple pictures whose categories are to be labeled, and divide the pictures into several parts.
In this embodiment, the server can obtain multiple pictures. Take the case where the labeled pictures are used as training samples for a machine learning model that predicts the content category of live video: these pictures need to be screenshots of live video from live-broadcast rooms. Specifically, the pictures may include one screenshot of the live video of each live-broadcast room, or multiple screenshots of the live video of one live-broadcast room captured at multiple points in time. Since the obtained pictures have not yet been labeled with a category, they can be called pictures of a category to be labeled.
After the pictures are obtained, they are divided into multiple parts; for ease of processing, they can usually be divided into equal parts. For example, after the server obtains 100,000 pictures to be labeled, it can divide them into 100 equal parts of 1,000 pictures each. The 100,000 pictures can generally be divided into the 100 equal parts at random.
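The random division into equal parts described for S101 can be sketched as follows. This is only an illustrative sketch: the function name `split_into_parts`, the use of integer picture identifiers, and the `seed` parameter are assumptions made for the example and are not part of the described method.

```python
import random

def split_into_parts(picture_ids, num_parts, seed=None):
    """Randomly divide picture_ids into num_parts parts of (nearly) equal size."""
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    shuffled = list(picture_ids)
    random.Random(seed).shuffle(shuffled)  # random division, reproducible via seed
    base, extra = divmod(len(shuffled), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first parts
        parts.append(shuffled[start:start + size])
        start += size
    return parts

# 100,000 pictures to be labeled, divided into 100 parts of 1,000 pictures each.
parts = split_into_parts(range(100_000), 100, seed=0)
```

When the number of pictures is not a multiple of the number of parts, this sketch lets the first few parts hold one extra picture; the source only discusses the evenly divisible case.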
S102: distribute each part of the divided pictures to at least two labelers.
After the pictures are divided into several parts, each part can be distributed to two or more different labelers, so that each labeler labels every picture in the part. Because each part is distributed to at least two labelers, annotation results from at least two labelers are available for every picture in the same part.
It should be noted that multiple labelers can be configured for these pictures, and each part is distributed to two or more of them. For example, three labelers A, B, and C can be assigned to 100,000 pictures to be labeled. The 100,000 pictures are divided into 100 equal parts and each part is distributed to 2 labelers. Different parts may be distributed to the same labelers or to different labelers; for example, the first part is distributed to A and B, the second part to B and C, the third part to A and B, and so on.
After distributing a part to a labeler, the server can record that labeler's identifier for the part and then, based on the identifiers recorded for the part, distribute the part to another labeler whose identifier differs from those already recorded.
It can be understood that a labeler can log in to the labeling interface on a front-end device. The server distributes the pictures by sending each part to a front-end device, that is, to a labeler; after a labeler finishes labeling one part, the server sends another part to that labeler. The server can send the same part to at least two different labelers.
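One way to guarantee that every part goes to at least two different labelers is to cycle through the possible groups of labelers, as in the sketch below. The function name `assign_parts` and the round-robin order are assumptions for illustration; the source only requires that the labelers of a part differ from one another and that the server records their identifiers.

```python
from itertools import combinations, cycle

def assign_parts(num_parts, labelers, per_part=2):
    """Return {part index: tuple of distinct labeler ids} (the server-side record)."""
    unique = sorted(set(labelers))
    if per_part < 2:
        raise ValueError("each part must go to at least two labelers")
    if len(unique) < per_part:
        raise ValueError("not enough distinct labelers")
    # Every combination contains distinct labelers, so no part is sent
    # to the same labeler twice.
    groups = cycle(combinations(unique, per_part))
    return {part_index: next(groups) for part_index in range(num_parts)}

assignment = assign_parts(100, ["A", "B", "C"])
```

With labelers A, B, and C, this sketch assigns the parts to (A, B), (A, C), (B, C), and then repeats, so the workload is spread evenly across the three labelers.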
S103: for each part of the divided pictures, obtain the annotation result data of at least two labelers, where the annotation result data of each labeler carries a pre-label category for each picture in the part.
In this embodiment, the server can send each part to at least two front-end devices, so that at least two labelers can label the category of every picture in the part through the front-end devices and return the annotation result data to the server. The server thus obtains the annotation result data of at least two labelers for the part.
The annotation result data of each labeler can carry the pre-label category of each picture in the part, i.e. the category that the labeler determined for each picture. In other words, one piece of annotation result data, i.e. the annotation result data of one labeler, contains one pre-label category for every picture in the part.
Because the label category of each picture is later derived by jointly analyzing the annotation results of multiple labelers for the part, the category that each labeler assigns to a picture is called a pre-label category.
Each part is called a labeling subtask. In practice, one labeling subtask can be distributed to several labelers. Specifically, the server can send the labeling subtask corresponding to a part to multiple front-end devices; for example, one labeling subtask is sent to 3 front-end devices, each assigned to one labeler. After the 3 front-end devices receive the subtask, they display its pictures to be labeled in the display interface, and each labeler labels the category of the pictures in the subtask. When the labelers finish, the 3 front-end devices each transmit their annotation result data to the server, which can receive the 3 pieces of annotation result data for the part and store them in a database for subsequent analysis.
S104: for every picture in each part, compare whether its pre-label categories in the annotation result data of the at least two labelers are the same, and determine the number of pictures in the part whose pre-label categories are identical across the annotation result data of the at least two labelers.
Each piece of annotation result data carries the pre-label category of every picture in the part, and each piece is produced by a different labeler. Any picture in the part may therefore be labeled with different categories by different labelers, i.e. it may have different pre-label categories. For example, if a part is labeled by 3 labelers, the 3 pieces of annotation result data are annotation result data A, B, and C; picture a in the part has the pre-label category "normal" in annotation result data A, "vulgar" in annotation result data B, and "normal" in annotation result data C.
Based on the above, after receiving the annotation result data of at least two labelers for a part, the server can, for every picture in the part, compare the picture's pre-label categories in the annotation result data of the at least two labelers to confirm whether the categories assigned by the labelers are the same.
It is easy to understand that, for each part, after comparing whether the pre-label categories of each picture are the same, the pictures whose pre-label categories are all identical can be filtered out and their number determined.
For example, a part containing 1,000 pictures is labeled by 3 labelers. All 3 labelers label 800 of the pictures as normal, and for the other 200 the results of the 3 labelers are not all the same; the number of pictures in this part whose pre-label categories are identical is therefore 800.
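The comparison in S104 can be sketched as below. The data layout is an assumption made for the example: each piece of annotation result data is represented as a dictionary mapping a picture identifier to the pre-label category chosen by one labeler, and the function name `unanimous_pictures` is hypothetical.

```python
def unanimous_pictures(results):
    """Return {picture id: category} for pictures that all labelers labeled identically.

    results: list with one dict per labeler, mapping picture id -> pre-label category.
    """
    if len(results) < 2:
        raise ValueError("at least two pieces of annotation result data are required")
    first, *rest = results
    return {
        pid: category
        for pid, category in first.items()
        if all(other.get(pid) == category for other in rest)
    }

# Three labelers (A, B, C) label the same three pictures.
result_a = {"a": "normal", "b": "normal", "c": "vulgar"}
result_b = {"a": "vulgar", "b": "normal", "c": "vulgar"}
result_c = {"a": "normal", "b": "normal", "c": "vulgar"}

agreed = unanimous_pictures([result_a, result_b, result_c])
agreed_count = len(agreed)  # number of pictures with identical pre-label categories
```

Picture a is excluded because labeler B chose a different category, which mirrors the picture a example in the text.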
S105: for each part of the divided pictures, determine the label category of the pictures in the part based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in the part.
In this embodiment, the label category of each picture in each part needs to be determined. It is easy to understand that the higher the ratio of the number of pictures with identical pre-label categories to the total number of pictures in the part, the more consistent the labelers' annotation results for the part are, the more pictures in the part are correctly labeled, and the more pictures in the part have high-quality pre-label categories. The embodiment can therefore determine the label category of each picture in the part from the ratio between the number of pictures with identical pre-label categories and the total number of pictures in the part.
It should be noted that "each part of the divided pictures" above may be any one of the parts into which the pictures were divided. When determining label categories, a part can be chosen at random or the parts can be taken in order, as long as every part is processed; the present invention does not limit the specific way of choosing.
As the foregoing shows, the pre-label categories of a picture in a part are not necessarily all the same; and if a part contains too many pictures whose pre-label categories differ, the labelers of that part disagree widely.
As an optional implementation of this embodiment, to determine the label category of each picture in each part, a first preset threshold can be set, and the ratio of the number of pictures with identical pre-label categories to the total number of pictures in the part is compared with it. If the ratio is greater than or equal to the first preset threshold, the part contains enough pictures with identical pre-label categories, so the pre-label category of each such picture can be determined as that picture's label category.
For example, suppose the first preset threshold is set to 90% and a part contains 1,000 pictures, of which 950 have identical pre-label categories from the labelers of the part. The ratio of the number of pictures with identical pre-label categories to the total number of pictures in the part is 950:1000, or 95%, which is greater than the first preset threshold of 90%. The pre-label categories of the pictures with identical pre-label categories in this part can therefore be determined as the label categories of those pictures. If the pre-label category of the 950 pictures is "normal", the 950 pictures are determined to be normal pictures.
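The threshold test can be written as a small check. The sketch below assumes the 90% first preset threshold used in the example; the function name `meets_first_threshold` is hypothetical, and exact fractions are used only to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction

FIRST_PRESET_THRESHOLD = Fraction(90, 100)  # 90%, as in the example

def meets_first_threshold(agreed_count, total_count, threshold=FIRST_PRESET_THRESHOLD):
    """True if the share of pictures with identical pre-label categories
    is greater than or equal to the first preset threshold."""
    if total_count <= 0:
        raise ValueError("a part must contain at least one picture")
    return Fraction(agreed_count, total_count) >= threshold

accept_950 = meets_first_threshold(950, 1000)  # 95% >= 90%: accept the agreed categories
accept_850 = meets_first_threshold(850, 1000)  # 85% <  90%: the part needs relabeling
```

A part that passes the check keeps the agreed categories as label categories; a part that fails is handled by the relabeling paths described later in this embodiment.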
It is easy to understand that, for the part in the above example, the label category of most pictures can be determined, but a small number of pictures remain without a label category because their pre-label categories differ. For example, of the 1,000 pictures in the part, 950 have a determined category, while the other 50 have pre-label categories that are not all the same: some labelers labeled them normal, some pornographic, and some vulgar. As an optional implementation of this embodiment, these pictures with differing pre-label categories can therefore be labeled again. For example, in the same way as before, the pictures to be relabeled are divided into several parts and distributed again to multiple different labelers, and the label category of each picture is determined by analyzing the results of these labelers.
Specifically, as shown in Figure 2, the process of relabeling the pictures with differing pre-label categories may be as follows:
S201: distribute the pictures in the part whose pre-label categories differ to a second labeler.
In this embodiment, the server can distribute the pictures in the part whose pre-label categories differ to a second labeler, who labels them again.
Labelers can be divided into two classes: annotators and administrators. Annotators label pictures for the first time; administrators can view the labeling history of each annotator, spot-check the annotation results of each annotator, and check and finally confirm conflicting annotation results.
The second labeler may specifically be an administrator. The server can send the pictures with differing pre-label categories to the administrator, together with the annotation results of the labelers for those pictures; the administrator confirms the category of each picture and enters his or her own annotation result data. In this embodiment, the category that the second labeler assigns to a picture is called the relabeled category.
S202: obtain the annotation result data of the second labeler.
The server can obtain the second labeler's annotation result data for the pictures with differing pre-label categories. This data can carry the relabeled category that the second labeler determined for each such picture; that is, for the pictures with differing pre-label categories, the server obtains the category of each picture after relabeling.
S203: determine the relabeled category that the second labeler determined for each picture with differing pre-label categories as the label category of that picture.
After obtaining the relabeled categories of the pictures with differing pre-label categories, the server can determine those relabeled categories as the label categories of the pictures.
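Steps S201 to S203 amount to filling in the disputed pictures with the second labeler's decisions. A minimal sketch, assuming both inputs are dictionaries from picture identifier to category (the function name `merge_label_categories` and the sample identifiers are hypothetical):

```python
def merge_label_categories(agreed, relabeled):
    """Combine the unanimously pre-labeled pictures with the second labeler's
    relabeled categories into the final label categories of one part."""
    overlap = agreed.keys() & relabeled.keys()
    if overlap:
        # A picture is either unanimous or disputed, never both.
        raise ValueError(f"pictures appear in both inputs: {sorted(overlap)}")
    return {**agreed, **relabeled}

agreed = {"p1": "normal", "p2": "normal"}   # identical pre-label categories
relabeled = {"p3": "vulgar"}                # decided by the second labeler
final_labels = merge_label_categories(agreed, relabeled)
```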
As an optional implementation of this embodiment, there is another possibility: in one of the divided parts, the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in the part is less than the first preset threshold. This shows that the annotation results of the labelers of this part differ greatly and that the part does not contain enough pictures with identical pre-label categories, so the pictures in the part need to be labeled again. Before relabeling, the annotation result data of the previous at least two labelers for the part can be deleted to save storage space; the part is then distributed to at least two labelers for relabeling. These labelers may be different from, partly the same as, or entirely the same as the labelers who first labeled the part. After the annotation result data of at least two labelers for the part is reacquired, step S104 can be executed again, and the label category of the pictures in the part is redetermined based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in the part.
For example, suppose the first preset threshold is still 90% and a part contains 1,000 pictures, of which 850 have identical pre-label categories. The ratio of the number of pictures with identical pre-label categories to the total number of pictures in the part is 850:1000, or 85%, which is less than the first preset threshold of 90%. The pictures in the part can then be labeled again and the annotation result data of at least two labelers reacquired.
If, in the reacquired annotation result data, 960 pictures have identical pre-label categories, the ratio of the number of pictures with identical pre-label categories to the total number of pictures in the part is 960:1000, or 96%. The pre-label categories of the pictures with identical pre-label categories in this part can therefore be determined as the label categories of those pictures.
Optionally, if for the reacquired annotation result data the ratio of the number of pictures with identical pre-label categories to the total number of pictures in the part is still less than the first preset threshold, the part can be relabeled once more and at least two pieces of annotation result data obtained again; or the reacquired annotation result data of the part can be handed directly to the second labeler, i.e. the administrator, for checking; or the part can simply be discarded to reduce the amount of computation and thereby improve labeling efficiency.
As another optional implementation of this embodiment, there is a further possibility: a part is labeled by at least three labelers, and although the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in the part is less than the first preset threshold, some of the annotation result data, for example two of the pieces, are highly similar. This suggests that the labelers corresponding to those pieces of annotation result data have a high labeling level, so the two pieces may be treated as reliable annotation result data, and the pictures in the part that have the same pre-label category in the two most similar pieces of annotation result data are determined to be sample pictures.
Among the annotation result data of the at least three labelers for the part, the pieces can be compared pairwise to determine the similarity between every two pieces. The similarity of two pieces of annotation result data can be defined as the ratio of the number of pictures in the part that have the same pre-label category in both pieces to the total number of pictures in the part. A second preset threshold can also be set; when the similarity of some two pieces of annotation result data is higher than the second preset threshold, the two pieces with the highest similarity can be determined.
That is, among the annotation result data of the at least three labelers for a part, there may be pieces whose pairwise similarity is higher than the second preset threshold. For example, a part has 3 pieces of annotation result data, A, B, and C, and the second preset threshold is 90%. The similarity between annotation result data A and B is 91%, which is greater than the second preset threshold; the similarity between A and C is 92%, which is also greater than the second preset threshold; and the similarity between B and C is 88%, which is less than the second preset threshold. The pair with the highest similarity, annotation result data A and C, can then be chosen, and for each picture that has the same pre-label category in annotation result data A and C, that pre-label category is determined as the picture's label category.
There is also the possibility that, among the annotation result data of the at least three labelers, several pairs have a similarity above the second preset threshold and more than one pair shares the highest similarity. For example, a part has 3 pieces of annotation result data, D, E, and F, and the second preset threshold is 90%. The similarity between annotation result data D and E is 91%, which is greater than the second preset threshold; the similarity between D and F is 91%, which is also greater than the second preset threshold; and the similarity between E and F is 88%, which is less than the second preset threshold. The similarity between D and E and the similarity between D and F are then both greater than the second preset threshold and equal to each other. In this case, one of the pairs with the highest similarity can be chosen, and for each picture that has the same pre-label category in that pair, the pre-label category is determined as the picture's label category; alternatively, the annotation result data of at least three labelers can be reacquired for the part. After the annotation result data is reacquired, step S104 can be executed again, and the label category of the pictures in the part is redetermined based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in the part.
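The pairwise comparison can be sketched as follows. The similarity follows the definition given in the text (pictures with the same pre-label category in both pieces, divided by the number of pictures in the part). The function names, the dictionary layout, the 85% threshold, and the tie-breaking rule (the first pair found wins) are assumptions for the example; the source leaves the choice among tied pairs open.

```python
from fractions import Fraction
from itertools import combinations

def similarity(result_1, result_2):
    """Share of pictures given the same pre-label category in both pieces of data."""
    same = sum(1 for pid, cat in result_1.items() if result_2.get(pid) == cat)
    return Fraction(same, len(result_1))

def most_similar_pair(results, second_threshold):
    """results: {labeler id: {picture id: category}}, at least three labelers.

    Returns ((labeler_1, labeler_2), similarity) for the most similar pair,
    or None if no pair is above the second preset threshold.
    """
    scored = [
        ((a, b), similarity(results[a], results[b]))
        for a, b in combinations(results, 2)
    ]
    best_pair, best = max(scored, key=lambda item: item[1])  # ties: first pair wins
    return (best_pair, best) if best > second_threshold else None

pictures = [f"p{i}" for i in range(10)]
results = {
    "A": {p: "normal" for p in pictures},
    "B": {p: ("vulgar" if p in ("p0", "p1") else "normal") for p in pictures},
    "C": {p: ("vulgar" if p == "p9" else "normal") for p in pictures},
}
best = most_similar_pair(results, Fraction(85, 100))
```

In this toy data the similarities are 80% for A and B, 90% for A and C, and 70% for B and C, so A and C are selected and the pictures on which they agree receive their shared pre-label category.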
As another optional implementation of this embodiment, after the two pieces of annotation result data with the highest similarity are determined, they can also be stored in a database so that an administrator can retrieve them later when the labeling quality of the sample pictures needs to be checked.
In the picture labeling method provided by this embodiment, the obtained pictures to be labeled are divided into multiple parts, and each part is distributed to at least two labelers for manual labeling to obtain the annotation result data of each labeler. For each picture in each part, the method compares whether the picture's pre-label categories in the annotation result data of the different labelers are the same and determines the number of pictures with identical pre-label categories; the label category of the pictures in the part is then determined based on the ratio between that number and the total number of pictures in the part. Because each part is labeled by multiple labelers and the label categories of the part are derived from the annotation results of all of them, the accuracy of the label categories can be effectively improved. Moreover, because the final label category is determined by combining the annotation results of multiple labelers, the annotation results of a single labeler do not determine the final label category of a picture, and problematic annotation results can be effectively detected. Labelers of uneven skill can thus work together while the accuracy of the label categories is still ensured, so the labeling quality of the picture label categories can be effectively improved.
Method Embodiment 2
As shown in Figure 3, an embodiment of the present invention also provides a picture labeling method. The method can be applied to a server, and of course also to other types of equipment with data processing capability. The method may include the following steps:
S301: use the pictures whose label categories have been determined as training samples of a machine learning model, and obtain the machine learning model.
In this embodiment, the pictures whose label categories have been determined can be used as training samples to train a machine learning model, yielding a machine learning model for predicting picture categories. The pictures whose label categories have been determined may be pictures labeled by the method of Method Embodiment 1, and of course may also be other pictures labeled with categories.
S302: obtain the category prediction results of the machine learning model for predetermined pictures, where the category prediction results carry the categories that the machine learning model predicted for the predetermined pictures.
Once the machine learning model has been trained, it can predict the category of a picture, for example whether a picture is normal, vulgar, or pornographic. In this embodiment, multiple predetermined pictures can be input into the machine learning model to obtain the model's category prediction results for those pictures. It is easy to understand that the category prediction results can carry the categories that the machine learning model predicted for the predetermined pictures.
S303: for each predicted category, distribute the predetermined pictures of that category to at least one third labeler, and obtain the at least one third labeler's verification results for the predicted categories of the predetermined pictures.
After the predicted categories of the predetermined pictures are obtained, it is easy to understand that different predetermined pictures may have different categories. Predetermined pictures of the same category can therefore be distributed to at least one labeler, who verifies whether the category predicted by the machine learning model is accurate.
In this step, the pictures of each category, such as normal pictures, pornographic pictures, and vulgar pictures, can be filtered out according to the predicted category. Taking normal pictures as an example, after the normal pictures are filtered out, they are distributed to a third labeler for verification; the third labeler determines whether each picture is a normal picture and feeds back the verification result.
That is, the labeler sees a batch of pictures with the same predicted category and only needs to browse them quickly and reject the pictures that do not belong to that category. The original multi-class problem is thereby converted into a binary yes-or-no problem. For example, if the machine learning model sorts the predetermined pictures into 10 categories, the labeler can finish judging the pictures predicted as category 1, then judge the pictures predicted as category 2, and so on until all 10 categories have been browsed.
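Grouping the predetermined pictures by predicted category, so that each third labeler receives one homogeneous batch at a time, can be sketched as follows. The function name `group_by_predicted_category` and the sample data are hypothetical, and the prediction results are assumed to be a dictionary from picture identifier to predicted category.

```python
from collections import defaultdict

def group_by_predicted_category(predictions):
    """Turn {picture id: predicted category} into {category: [picture ids]}."""
    batches = defaultdict(list)
    for picture_id, category in predictions.items():
        batches[category].append(picture_id)
    return dict(batches)

predictions = {
    "p1": "normal",
    "p2": "vulgar",
    "p3": "normal",
    "p4": "pornographic",
}
batches = group_by_predicted_category(predictions)
# Each batch is sent for verification as a yes-or-no task:
# "does this picture really belong to the predicted category?"
```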
For example, suppose a batch of predetermined pictures needs to be labeled across 10 categories. With the existing approach of manual category labeling, the labeler must examine each picture, recall the judgment criteria of each of the 10 categories, and then choose one of the categories for the picture. With the labeling method of this embodiment, the labeler only needs to verify whether the category predicted by the machine learning model is the actual category of the picture, which reduces the labeler's workload.
Optionally, in the same way as in the labeling method of Method Embodiment 1, the pictures of each category can be grouped, the grouped pictures of each category can be handed to multiple third labelers for verification, and the verification results of the multiple third labelers can be combined to determine whether the predicted category is the correct category of a picture; details are not repeated here.
S304: according to the at least one third labeler's verification results for the predicted categories of the predetermined pictures, determine the predetermined pictures whose predicted category is the correct category, and determine the correct category as the label category of those predetermined pictures.
Because the categories of the predetermined pictures are predicted automatically by the machine learning model, prediction errors are inevitable. The accuracy of the predicted categories can therefore be verified manually by labelers. Based on the verification results, the server can determine the predetermined pictures whose predicted category is the correct category and determine the correct category as the label category of those pictures, thereby completing the labeling of the predetermined pictures.
As the above description shows, labelers no longer have to label every predetermined picture manually one by one; they only need to verify whether the category predicted by the machine learning model is accurate. Picture labeling is thus converted into a batch error-correction process, which can clearly improve the working efficiency of labelers greatly.
Specifically, when the verification results of at least two third labelers for the predicted category of a predetermined picture are all "correct", the predicted category can be determined to be the correct category. Alternatively, when the proportion of "correct" verification results among all verification results of the at least two third labelers for the predicted category reaches a third preset threshold, the predicted category is determined to be the correct category. The third preset threshold can be set according to the actual labeling quality requirement; for example, the higher the labeling quality requirement, the higher the third preset threshold.
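Both acceptance rules can be expressed in one small function. This is a sketch under assumptions: verification results are represented as booleans (True meaning the third labeler judged the predicted category correct), the function name `prediction_confirmed` is hypothetical, and "reaches" is read as greater than or equal to the third preset threshold.

```python
from fractions import Fraction

def prediction_confirmed(verifications, third_threshold=None):
    """Decide whether a predicted category counts as the correct category.

    verifications: one boolean per third labeler.
    third_threshold: None means every labeler must judge the prediction correct;
    otherwise the share of "correct" results must reach this threshold.
    """
    if len(verifications) < 2:
        raise ValueError("this rule is defined for at least two third labelers")
    if third_threshold is None:
        return all(verifications)
    return Fraction(sum(verifications), len(verifications)) >= third_threshold

unanimous_rule = prediction_confirmed([True, True, False])                  # one rejection
ratio_rule = prediction_confirmed([True, True, False], Fraction(2, 3))      # 2/3 reaches 2/3
```

A stricter quality requirement corresponds to a higher value of `third_threshold`, with the unanimous rule as the strictest case.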
In the picture labeling method provided by this embodiment, the pictures whose label categories have been determined are used as training samples to train a machine learning model, and the trained model predicts the categories of predetermined pictures, so labelers no longer need to label them manually and the labeling efficiency for predetermined pictures is improved. In addition, for the predicted categories of the predetermined pictures, the server can obtain the verification results of at least one third labeler, determine the predetermined pictures whose predicted category is the correct category, and determine the correct category as the label category of those pictures, which improves the labeling quality of the picture label categories.
Further, the labeled pictures whose correct categories have been determined can be used as training samples and fed back into the training of the machine learning model, which improves the accuracy of the model. When the optimized machine learning model is then used to predict the same pictures again, fewer and fewer categories are predicted wrongly. This can be repeated many times: the newly predicted pictures are verified, the labeled pictures of the correct category are determined and fed back as training samples into the training of the machine learning model, and the prediction accuracy of the model increases further. Through such repetition the machine learning model keeps evolving and becomes more intelligent.
Method embodiment 3
As shown in Fig. 4, an embodiment of the present invention further provides a method for obtaining a machine learning model. The method can be applied to a server, and of course also to other types of devices with data processing capability. In this embodiment, the process of training and obtaining the machine learning model may specifically include the following steps:
S401: obtain first sample pictures labeled with categories.
A first sample picture is a picture whose category has already been annotated, for example a picture that has been annotated by labelers and whose annotation category has been determined.
Specifically, the first sample pictures labeled with categories can be obtained by the annotation method of method embodiment 1.
S402: train with the first sample pictures labeled with categories as training samples to obtain a machine learning model.
In the embodiment of the present invention, since each first sample picture corresponds to one annotation category, the first sample pictures can be used as training samples to train the machine learning model. Specifically, a training method known in the art can be used to train the machine learning model with the first sample pictures. Once trained, the machine learning model can predict the category of a picture.
S403: input second sample pictures into the machine learning model to obtain the categories of the second sample pictures predicted by the machine learning model.
The second sample pictures may be pictures without an annotation category. For example, they may include one screenshot of the live video of each live video room, or multiple screenshots captured at multiple time points from the live video of one live video room. When the second sample pictures are input into the machine learning model, the model predicts their categories, so that the categories of the second sample pictures are obtained.
S404: determine the second sample pictures for which the machine learning model predicted the correct category.
After the categories of the second sample pictures predicted by the machine learning model are obtained, the predicted categories can be checked. Specifically, labelers can manually judge the categories of the second sample pictures that carry a predicted category, thereby determining the second sample pictures for which the machine learning model predicted the correct category.
Using the annotation method of method embodiment 2, the second sample pictures with predicted categories can be assigned to labelers for verification, the verification results of the labelers received, and, based on those verification results, the second sample pictures for which the machine learning model predicted the correct category determined.
S405: train with the second sample pictures whose category was predicted correctly as training samples to obtain a new machine learning model.
In the embodiment of the present invention, the second sample pictures whose category was predicted correctly can be input into the machine learning model as training samples to keep training it, thereby obtaining a new machine learning model; the model thus evolves continuously and becomes more intelligent. As the accuracy of the model improves and the new model is used to predict pictures again, fewer and fewer categories are predicted incorrectly.
This can be repeated many times. For example, the pictures re-predicted by the new machine learning model can be verified again to determine the annotated pictures with the correct category, and those pictures can then be fed back into the training of the machine learning model as training samples, further improving the prediction accuracy. Through such repetition the machine learning model evolves continuously and becomes more intelligent.
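The loop formed by steps S401 to S405 can be sketched as follows. This is only an illustration under stated assumptions: the 1-nearest-neighbour `train` function on a single numeric feature is a toy stand-in for whatever training method is actually used, and `verify` is a hypothetical callback standing in for the labelers' verification of a predicted category.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, str]  # (feature, category); real samples would be pictures


def train(samples: List[Sample]) -> Callable[[float], str]:
    """Toy stand-in for model training: 1-nearest-neighbour on one feature."""
    data = list(samples)  # snapshot, so later additions need a retrain

    def predict(x: float) -> str:
        return min(data, key=lambda s: abs(s[0] - x))[1]

    return predict


def refine(labeled: List[Sample], unlabeled: List[float],
           verify: Callable[[float, str], bool], rounds: int = 3):
    """Train, predict, keep verified predictions, retrain (S401 to S405)."""
    samples = list(labeled)                # S401: first sample pictures
    remaining = list(unlabeled)            # second sample pictures
    model = train(samples)                 # S402
    for _ in range(rounds):
        if not remaining:
            break
        predictions = [(x, model(x)) for x in remaining]              # S403
        confirmed = [(x, c) for x, c in predictions if verify(x, c)]  # S404
        confirmed_x = {x for x, _ in confirmed}
        remaining = [x for x in remaining if x not in confirmed_x]
        samples.extend(confirmed)
        model = train(samples)             # S405: new model
    return model, samples, remaining
```

Pictures whose prediction fails verification stay in `remaining` and are predicted again by the retrained model in the next round, which mirrors the repeated re-prediction described above.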
Method embodiment 4
An embodiment of the present invention further provides a picture annotation method, the flow of which may include the following:
An administrator can create a total annotation task of 100,000 pictures in the annotation system, specify the categories with which a picture can be annotated, such as normal, vulgar and pornographic, and input the pictures into the server. After receiving the annotation task, the server can divide the total annotation task into 100 subtasks, divide the 100,000 acquired pictures into 100 equal portions, make each portion correspond to one subtask, and generate the information of each subtask. The subtask information may include the subtask ID (identification number), the ID (identification number) of the total annotation task to which the subtask belongs, the number of pictures to be annotated in the subtask, the completion status of the subtask, the completion time of the subtask, and so on. The server can send each subtask and its pictures to be annotated to the front-end browser for the labelers to annotate.
By way of example, the above flow is described below for the case in which 100,000 pictures are distributed among 5 labelers and every picture is annotated by two labelers. Each subtask contains 1,000 pictures.
Step A: save the annotation task information to the task table and generate a unique total annotation task ID.
In this step, the annotation task information refers to the information of the entire annotation task, for example the annotation task of 100,000 pictures, that is, the information of the total annotation task. It may include the total number of pictures to be annotated, the number of subtasks into which the task is divided, and the completion status and completion time of the total annotation task. This information can be saved in the task table (mark_task) of the database of the existing annotation system, and a total annotation task ID uniquely corresponding to the total annotation task is generated.
Step B: insert 100 records into the subtask table and associate them with the total annotation task ID of step A. At this point the status of every subtask is unfinished, and the subtask IDs are 0, 1, 2, 3, ..., 99, 100 subtasks in total.
The subtask table (sub_mark_task) records the information related to the subtasks. After the subtasks are divided, the data corresponding to each subtask can be inserted into the subtask table, and the subtask table is associated with the total annotation task ID. It is easy to see that, since no picture has been annotated yet, the status of every subtask in the subtask table is unfinished.
In addition, an ID can be assigned to each subtask, such as 0, 1, 2, 3, ..., 99, 100 IDs in total. Each subtask contains 1,000 pictures, so the 100 subtasks cover exactly 100,000 pictures.
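The division described in steps A and B can be sketched in Python as follows. The `SubTask` class, its field names, and the function `split_into_subtasks` are hypothetical illustrations of the subtask information listed in the text, not the actual schema of the sub_mark_task table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubTask:
    sub_task_id: int                   # 0, 1, ..., 99 in the example
    task_id: int                       # total annotation task this belongs to
    picture_ids: List[int]             # pictures to be annotated
    finished: bool = False             # every subtask starts as unfinished
    finished_at: Optional[str] = None  # completion time, unknown until done


def split_into_subtasks(task_id: int, picture_ids: List[int],
                        count: int) -> Dict[int, SubTask]:
    """Split the picture IDs into `count` equal portions, one per subtask."""
    if count <= 0 or len(picture_ids) % count != 0:
        raise ValueError("picture count must be divisible by subtask count")
    size = len(picture_ids) // count
    return {
        sid: SubTask(sid, task_id, picture_ids[sid * size:(sid + 1) * size])
        for sid in range(count)
    }


# 100,000 pictures divided into 100 subtasks of 1,000 pictures each.
subtasks = split_into_subtasks(task_id=1, picture_ids=list(range(100_000)),
                               count=100)
```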
Step C: duplicate the subtask IDs of the previous step once, giving 0, 1, 2, 3, ..., 99 and again 0, 1, 2, ..., 99, that is, 200 subtask IDs, and record this information in the subtask assignment table, where the same subtask corresponds to copy_id1 and copy_id2 respectively.
The subtask assignment table (sub_mark_task_assign) records the assignment information of each subtask and may contain fields such as user_id (labeler identification number) and status (assignment status). As stated above, every picture must be annotated by two labelers, so each subtask has to be assigned to two labelers. The subtask IDs can therefore be duplicated once, giving 0, 1, 2, 3, ..., 99 as copy_id1 and 0, 1, 2, ..., 99 as copy_id2. The same subtask corresponds to both copy_id1 and copy_id2, which ensures that each subtask is annotated by two labelers. copy_id1 and copy_id2 together produce 200 assignment IDs, that is, 200 assign ids (assignment identification numbers).
It should be noted that, since the subtasks have not yet been assigned to labelers at this point, user_id is left blank pending assignment and status is 0, indicating the unassigned state. Once a labeler has been assigned, the labeler's identification number is filled into user_id and status is changed to 1, indicating that the assignment has been made.
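The following sketch shows one way the rows of the subtask assignment table could be generated. Only `user_id` and `status` are field names taken from the text; the `Assignment` class, the remaining field names and the `build_assignments` function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Assignment:
    assign_id: int                 # assignment identification number
    sub_task_id: int               # subtask this row belongs to
    copy_id: int                   # 1 for copy_id1, 2 for copy_id2
    user_id: Optional[str] = None  # blank until a labeler takes the row
    status: int = 0                # 0 = unassigned, 1 = assigned


def build_assignments(sub_task_ids: Iterable[int],
                      copies: int = 2) -> List[Assignment]:
    """Create one assignment row per (copy, subtask) pair."""
    ids = list(sub_task_ids)
    rows: List[Assignment] = []
    for copy_id in range(1, copies + 1):
        for sid in ids:
            rows.append(Assignment(len(rows), sid, copy_id))
    return rows


# 100 subtasks duplicated once give 200 assignment IDs.
rows = build_assignments(range(100))
```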
Step D: divide the IDs of the 100,000 pictures into 100 portions, then duplicate these 100 portions of picture IDs once to obtain 200 portions. Within these 200 portions, each pair of duplicated portions is associated with the assignment ID of copy_id1 and the assignment ID of copy_id2 of the corresponding subtask, so that the assigned pictures are associated with the assigned subtasks. In other words, after a picture is duplicated, its two copies are assigned to the assignment IDs corresponding to the two copy_ids under the same subtask, so the two copies of one picture are never assigned to two different subtasks. In the embodiment of the present invention, the 200 portions of picture IDs can be assigned to the subtasks in order.
The above task assignment information is recorded in the user task assignment table. The user task assignment table (user_sub_mark_mask) stores the task assignment information of the labelers, and each assignment ID can correspond to one labeler identification number (user_id).
Step E: when a labeler starts to annotate a subtask, take one assignment ID in the unassigned state (status=0) from the subtask assignment table, fill in the labeler's user_id, update the row to the assigned state (status=1), and at the same time update the corresponding user_id in the user task assignment table. A new labeler who joins during the annotation process is handled in the same way. Because the assignment status of an assignment ID changes when a labeler takes a subtask, a labeler does not receive the same subtask twice, which ensures that every picture is annotated by two people.
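Step E can be sketched as below, using plain dictionaries as hypothetical rows of the subtask assignment table. The text attributes the guarantee that a labeler never receives the same subtask twice to the status change; this sketch additionally checks the subtasks a labeler already holds, which is an assumption added here to make that guarantee explicit.

```python
from typing import Dict, List, Optional


def claim_assignment(rows: List[Dict], user_id: str) -> Optional[Dict]:
    """Give `user_id` the first unassigned row of a subtask they do not hold.

    Each row is a dict with keys assign_id, sub_task_id, user_id, status
    (hypothetical layout; status 0 = unassigned, 1 = assigned).
    Returns the claimed row, or None if nothing suitable is left.
    """
    already_held = {r["sub_task_id"] for r in rows if r["user_id"] == user_id}
    for row in rows:
        if row["status"] == 0 and row["sub_task_id"] not in already_held:
            row["user_id"] = user_id   # fill in the labeler
            row["status"] = 1          # mark as assigned
            return row
    return None
```

In a real database the lookup and the update would have to happen atomically (for example inside a transaction) so that two labelers cannot claim the same row; that concern is outside this sketch.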
Assuming that the total annotation task of 100,000 pictures is divided into 3 subtasks and that two labelers perform the annotation, the division of the subtasks is as shown in Fig. 5. In the figure, subtasks 1, 2 and 3 are each assigned to labeler 1 (USER1) and labeler 2 (USER2), and no labeler is assigned the same subtask twice.
In the picture annotation method provided by the embodiment of the present invention, an annotation task is created in the existing annotation system and the categories with which a picture can be annotated are specified. After receiving the annotation task, the server can divide it into multiple subtasks, divide the acquired pictures into multiple equal portions, make each portion correspond to one subtask, and generate the information of each subtask. The server can send each subtask and its pictures to be annotated to the front-end browser for the labelers to annotate. Labelers annotate pictures subtask by subtask, and the status of a subtask is changed once it has been taken, so a labeler does not receive the same subtask twice, which improves annotation efficiency.
It should be emphasized that, in the embodiment of the present invention, annotation and training can be two parts of one machine learning system, in which case the annotation method and the training of the machine learning model are both performed by internal components of that system. Of course, the annotation part can also be a separate component independent of the machine learning system, in which case the machine learning system uses the annotated pictures to train the model.
Corresponding to the above method embodiments, the embodiments of the present invention further provide corresponding apparatus embodiments.
Apparatus embodiment 1
As shown in Fig. 6, an embodiment of the present invention provides a picture annotation apparatus, comprising:
A first obtaining module 501, configured to obtain multiple pictures whose categories are to be annotated and divide the pictures into several portions.
An assignment module 502, configured to assign each portion of pictures after the division to at least two labelers.
A second obtaining module 503, configured to obtain, for each portion of pictures after the division, the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-annotated category for each picture in that portion.
A comparison module 504, configured to compare, for each picture in each portion, whether the pre-annotated categories of that picture in the annotation result data of the at least two labelers are identical, and to determine, for each portion, the number of pictures whose pre-annotated categories in the annotation result data of the at least two labelers are all identical.
A determining module 505, configured to determine, for each portion of pictures after the division, the annotation categories of the pictures in that portion based on the ratio between the determined number of pictures whose pre-annotated categories are all identical and the total number of pictures in that portion.
The determining module 505 is specifically configured to: if, in one of the portions after the division, the ratio of the determined number of pictures whose pre-annotated categories are all identical to the total number of pictures in that portion is greater than or equal to a first preset threshold, determine the pre-annotated category of each picture whose pre-annotated categories are all identical as the annotation category of that picture.
The apparatus further includes:
An annotation module 506, configured to re-annotate the pictures in that portion whose pre-annotated categories differ.
As shown in Fig. 7, the annotation module 506 includes:
An assignment submodule 5061, configured to assign the pictures in that portion whose pre-annotated categories differ to a second labeler.
An obtaining submodule 5062, configured to obtain the annotation result data of the second labeler, which carries the re-annotated categories determined by the second labeler for the pictures whose pre-annotated categories differ.
A determining submodule 5063, configured to determine the re-annotated categories determined by the second labeler for the pictures whose pre-annotated categories differ as the annotation categories of those pictures.
The determining module 505 is specifically configured to:
If, in one of the portions after the division, the ratio of the determined number of pictures whose pre-annotated categories are all identical to the total number of pictures in that portion is less than the first preset threshold, delete the annotation result data of the at least two labelers corresponding to that portion, and re-annotate the pictures in that portion.
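The decision made by the comparison module and the determining module for one portion can be sketched as follows. The function name `resolve_portion` and the representation of each labeler's annotation result data as a dictionary from picture ID to pre-annotated category are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]  # picture ID -> pre-annotated category of one labeler


def resolve_portion(annotations: List[Labels],
                    first_threshold: float) -> Tuple[Labels, List[str], float]:
    """Return (accepted categories, pictures to re-annotate, agreement ratio)."""
    if len(annotations) < 2 or not annotations[0]:
        raise ValueError("need at least two labelers and a non-empty portion")
    first = annotations[0]
    # Pictures on which every labeler chose the same pre-annotated category.
    agreed = {
        pid: cat for pid, cat in first.items()
        if all(other[pid] == cat for other in annotations[1:])
    }
    ratio = len(agreed) / len(first)
    if ratio >= first_threshold:
        # Accept the agreed pictures; only the disputed ones are re-annotated.
        disputed = [pid for pid in first if pid not in agreed]
        return agreed, disputed, ratio
    # Below the threshold: discard all results and re-annotate the portion.
    return {}, list(first), ratio
```

Returning the disputed pictures separately matches the role of the annotation module 506, which hands only those pictures to a second labeler.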
Where there are at least three labelers, the determining module 505 is specifically configured to:
If, in one of the portions after the division, the ratio of the determined number of pictures whose pre-annotated categories are all identical to the total number of pictures in that portion is less than the first preset threshold, obtain the similarity of every two annotation result data among the annotation result data of the at least three labelers;
When, among the annotation result data of the at least three labelers, there are two annotation result data whose similarity is higher than a second preset threshold:
Determine the two annotation result data with the highest similarity;
For the pictures that have the same pre-annotated category in the two annotation result data with the highest similarity, determine that same pre-annotated category as the annotation category of the picture. The similarity of two annotation result data is the ratio of the number of pictures in that portion that have the same pre-annotated category in the two annotation result data to the number of pictures in that portion.
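The pairwise-similarity rule above can be sketched in Python as follows. The function names and the dictionary representation of annotation result data (picture ID to pre-annotated category) are assumptions made for illustration; returning `None` when no pair exceeds the threshold is likewise a choice made here, since the text does not state what happens in that case.

```python
from itertools import combinations
from typing import Dict, List, Optional

Labels = Dict[str, str]  # picture ID -> pre-annotated category of one labeler


def similarity(a: Labels, b: Labels) -> float:
    """Share of pictures in the portion with the same category in a and b."""
    return sum(1 for pid in a if a[pid] == b[pid]) / len(a)


def resolve_by_most_similar_pair(annotations: List[Labels],
                                 second_threshold: float) -> Optional[Labels]:
    """Use the most similar pair of labelers if it exceeds the threshold."""
    if len(annotations) < 3:
        raise ValueError("this rule applies to at least three labelers")
    i, j = max(combinations(range(len(annotations)), 2),
               key=lambda p: similarity(annotations[p[0]], annotations[p[1]]))
    a, b = annotations[i], annotations[j]
    if similarity(a, b) <= second_threshold:
        return None  # no pair is similar enough
    # Keep only pictures on which the two most similar labelers agree.
    return {pid: a[pid] for pid in a if a[pid] == b[pid]}
```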
With the picture annotation apparatus provided by the embodiment of the present invention, the acquired pictures whose categories are to be annotated are divided into multiple portions, and each portion is then assigned to at least two labelers for manual annotation, yielding the annotation result data of each labeler. For each picture in each portion, the pre-annotated categories of that picture in the annotation result data of the different labelers are compared to see whether they are identical, and the number of pictures with identical pre-annotated categories is determined. The annotation categories of the pictures in that portion are then determined based on the ratio of that number to the total number of pictures in the portion. Because each portion is annotated by multiple labelers and its annotation categories are derived from the annotation results of those labelers, the accuracy of the annotation categories of each portion can be effectively improved. Moreover, because the final annotation category is determined by combining the annotation results of multiple labelers, the problematic annotation results of a single labeler do not decide the final annotation category of a picture, and problematic annotation results can be detected effectively. Labelers of uneven skill can thus work together while the accuracy of the annotation categories is still ensured, which effectively improves the annotation quality of the pictures.
Apparatus embodiment 2
As shown in Fig. 8, an embodiment of the present invention provides a picture annotation apparatus in which, on the basis of apparatus embodiment 1, the pictures whose annotation category has been determined are used as training samples of a machine learning model.
The apparatus further includes:
A third obtaining module 601, configured to obtain the category prediction results of the machine learning model for predetermined pictures, where the category prediction results carry the categories of the predetermined pictures predicted by the machine learning model.
The assignment module 502 is further configured to:
For each predicted category, assign the predetermined pictures of that category to at least one third labeler, and obtain the verification results of the at least one third labeler on the predicted categories of the predetermined pictures.
The determining module 505 is further configured to:
Determine, according to the verification results of the at least one third labeler on the predicted categories of the predetermined pictures, the predetermined pictures whose predicted category is the correct category, and determine the correct category as the annotation category of those predetermined pictures.
The predetermined pictures whose annotation category is the correct category can be used as training samples of the machine learning model.
Where there are at least two third labelers, the determining module 505 is specifically configured to:
When the verification results of the at least two third labelers on the predicted category of a predetermined picture are all "correct", determine that the predicted category is the correct category;
Alternatively,
When the proportion of "correct" verification results among the total verification results of the at least two third labelers on the predicted category of a predetermined picture reaches a third preset threshold, determine that the predicted category is the correct category.
With the picture annotation apparatus provided by the embodiment of the present invention, each portion of pictures is annotated by multiple labelers and its annotation categories are derived from the annotation results of those labelers, so the accuracy of the annotation categories of each portion can be effectively improved. Moreover, because the final annotation category is determined by combining the annotation results of multiple labelers, the problematic annotation results of a single labeler do not decide the final annotation category of a picture, and problematic annotation results can be detected effectively. Labelers of uneven skill can thus work together while the accuracy of the annotation categories is still ensured, which effectively improves the annotation quality of the pictures.
In addition, pictures whose annotation category has been determined are used as training samples to train a machine learning model, and the trained model is used to predict the categories of predetermined pictures, so that labelers no longer need to annotate them manually, which improves the annotation efficiency for the predetermined pictures. Furthermore, for the categories of the predetermined pictures predicted by the machine learning model, the server can obtain the verification results of at least one third labeler on the predicted category of each predetermined picture, determine the predetermined pictures whose predicted category is the correct category, and determine the correct category as the annotation category of those pictures, thereby improving the quality of the annotation categories.
An embodiment of the present invention further provides a picture annotation device, which may specifically be a server. As shown in Fig. 9, the device 700 includes a processor 701 and a machine-readable storage medium 702. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to perform the following steps:
Obtain multiple pictures whose categories are to be annotated, and divide the pictures into several portions;
Assign each portion of pictures after the division to at least two labelers;
For each portion of pictures after the division, obtain the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-annotated category for each picture in that portion;
For each picture in each portion, compare whether the pre-annotated categories of that picture in the annotation result data of the at least two labelers are identical, and determine, for each portion, the number of pictures whose pre-annotated categories in the annotation result data of the at least two labelers are all identical;
For each portion of pictures after the division, determine the annotation categories of the pictures in that portion based on the ratio between the determined number of pictures whose pre-annotated categories are all identical and the total number of pictures in that portion.
The machine-readable storage medium 702 may include random access memory (Random Access Memory, RAM) and may also include non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
With the picture annotation device provided by the embodiment of the present invention, the acquired pictures whose categories are to be annotated are divided into multiple portions, and each portion is then assigned to at least two labelers for manual annotation, yielding the annotation result data of each labeler. For each picture in each portion, the pre-annotated categories of that picture in the annotation result data of the different labelers are compared to see whether they are identical, and the number of pictures with identical pre-annotated categories is determined. The annotation categories of the pictures in that portion are then determined based on the ratio of that number to the total number of pictures in the portion. In the picture annotation method provided by the embodiment of the present invention, each portion is annotated by multiple labelers and its annotation categories are derived from the annotation results of those labelers, so the accuracy of the annotation categories of each portion can be effectively improved. Moreover, because the final annotation category is determined by combining the annotation results of multiple labelers, the problematic annotation results of a single labeler do not decide the final annotation category of a picture, and problematic annotation results can be detected effectively. Labelers of uneven skill can thus work together while the accuracy of the annotation categories is still ensured, which effectively improves the annotation quality of the pictures.
An embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored. When the computer program is executed by a processor, the following steps are performed:
Obtain multiple pictures whose categories are to be annotated, and divide the pictures into several portions;
Assign each portion of pictures after the division to at least two labelers;
For each portion of pictures after the division, obtain the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-annotated category for each picture in that portion;
For each picture in each portion, compare whether the pre-annotated categories of that picture in the annotation result data of the at least two labelers are identical, and determine, for each portion, the number of pictures whose pre-annotated categories in the annotation result data of the at least two labelers are all identical;
For each portion of pictures after the division, determine the annotation categories of the pictures in that portion based on the ratio between the determined number of pictures whose pre-annotated categories are all identical and the total number of pictures in that portion.
With the computer-readable storage medium provided by the embodiment of the present invention, the acquired pictures whose categories are to be annotated are divided into multiple portions, and each portion is then assigned to at least two labelers for manual annotation, yielding the annotation result data of each labeler. For each picture in each portion, the pre-annotated categories of that picture in the annotation result data of the different labelers are compared to see whether they are identical, and the number of pictures with identical pre-annotated categories is determined. The annotation categories of the pictures in that portion are then determined based on the ratio of that number to the total number of pictures in the portion. In the picture annotation method provided by the embodiment of the present invention, each portion is annotated by multiple labelers and its annotation categories are derived from the annotation results of those labelers, so the accuracy of the annotation categories of each portion can be effectively improved. Moreover, because the final annotation category is determined by combining the annotation results of multiple labelers, the problematic annotation results of a single labeler do not decide the final annotation category of a picture, and problematic annotation results can be detected effectively. Labelers of uneven skill can thus work together while the accuracy of the annotation categories is still ensured, which effectively improves the annotation quality of the pictures.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, causes the computer to perform the following steps:
Obtain multiple pictures whose categories are to be annotated, and divide the pictures into several portions;
Assign each portion of pictures after the division to at least two labelers;
For each portion of pictures after the division, obtain the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-annotated category for each picture in that portion;
For each picture in each portion, compare whether the pre-annotated categories of that picture in the annotation result data of the at least two labelers are identical, and determine, for each portion, the number of pictures whose pre-annotated categories in the annotation result data of the at least two labelers are all identical;
For each portion of pictures after the division, determine the annotation categories of the pictures in that portion based on the ratio between the determined number of pictures whose pre-annotated categories are all identical and the total number of pictures in that portion.
With the computer program product containing instructions provided by the embodiment of the present invention, the acquired pictures whose categories are to be annotated are divided into multiple portions, and each portion is then assigned to at least two labelers for manual annotation, yielding the annotation result data of each labeler. For each picture in each portion, the pre-annotated categories of that picture in the annotation result data of the different labelers are compared to see whether they are identical, and the number of pictures with identical pre-annotated categories is determined. The annotation categories of the pictures in that portion are then determined based on the ratio of that number to the total number of pictures in the portion. In the picture annotation method provided by the embodiment of the present invention, each portion is annotated by multiple labelers and its annotation categories are derived from the annotation results of those labelers, so the accuracy of the annotation categories of each portion can be effectively improved. Moreover, because the final annotation category is determined by combining the annotation results of multiple labelers, the problematic annotation results of a single labeler do not decide the final annotation category of a picture, and problematic annotation results can be detected effectively. Labelers of uneven skill can thus work together while the accuracy of the annotation categories is still ensured, which effectively improves the annotation quality of the pictures.
An embodiment of the present invention further provides a computer program which, when run on a computer, causes the computer to execute the following steps:
Obtaining multiple pictures whose categories are to be labeled, and dividing the pictures into several batches;
Assigning each batch after division to at least two labelers;
For each batch after division, obtaining the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-label category for each picture in that batch;
For each picture in each batch, comparing whether the pre-label categories of that picture in the annotation result data of the at least two labelers are identical, and determining, for each batch, the number of pictures whose pre-label categories are identical in the annotation result data of the at least two labelers;
For each batch after division, determining the label category of the pictures in that batch based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in that batch.
The computer program containing instructions provided by the embodiment of the present invention divides the acquired pictures whose categories are to be labeled into multiple batches, assigns each batch to at least two labelers for manual labeling, and obtains the annotation result data of each labeler. For each picture in each batch, it then compares whether the pre-label categories of that picture in the annotation result data of the different labelers are identical and determines the number of pictures with identical pre-label categories. Finally, it determines the label category of the pictures in the batch based on the ratio between that number and the total number of pictures in the batch. In the picture labeling method provided by the embodiment of the present invention, each batch is labeled by multiple labelers and its label categories are derived from the annotation results of those labelers, so the accuracy of the label categories of each batch can be effectively improved. Moreover, because the final label category is determined by combining the annotation results of multiple labelers, the erroneous annotation results of a single labeler do not determine the final label category of a picture, and such erroneous results can be effectively discovered. Labelers of uneven skill can therefore work at the same time while the accuracy of the label categories is still ensured, which effectively improves the labeling quality of the pictures.
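The comparison and ratio steps described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the data layout (a dictionary from labeler to a per-picture pre-label dictionary), the function names, and the default threshold value of 0.9 are assumptions. The branch on the first preset threshold follows the behavior stated in claims 2, 3 and 5.

```python
def agreed_pictures(batch, results):
    """Return {picture: label} for pictures all labelers pre-labeled identically.

    batch   -- list of picture ids in one batch
    results -- {labeler: {picture: pre-label category}} (assumed layout)
    """
    agreed = {}
    for picture in batch:
        labels = {annotation[picture] for annotation in results.values()}
        if len(labels) == 1:
            agreed[picture] = labels.pop()
    return agreed


def resolve_batch(batch, results, first_threshold=0.9):
    """Return (accepted labels, pictures that must be labeled again).

    first_threshold is an illustrative value for the first preset threshold.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    agreed = agreed_pictures(batch, results)
    ratio = len(agreed) / len(batch)
    if ratio >= first_threshold:
        # Accept the unanimous pre-labels; only disputed pictures are redone.
        disputed = [p for p in batch if p not in agreed]
        return agreed, disputed
    # Agreement too low: discard all results and relabel the whole batch.
    return {}, list(batch)


batch = ["p1", "p2", "p3", "p4"]
results = {
    "a": {"p1": "cat", "p2": "dog", "p3": "cat", "p4": "dog"},
    "b": {"p1": "cat", "p2": "dog", "p3": "dog", "p4": "dog"},
}
accepted, redo = resolve_batch(batch, results, first_threshold=0.7)
```

In this example three of the four pictures receive identical pre-labels, so the agreement ratio is 0.75; with a threshold of 0.7 the three unanimous labels are accepted and only `p3` is sent back for relabeling.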
Because the device, picture labeling equipment, and storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (22)

1. A picture labeling method, characterized in that the method comprises:
Obtaining multiple pictures whose categories are to be labeled, and dividing the pictures into several batches;
Assigning each batch after division to at least two labelers;
For each batch after division, obtaining the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-label category for each picture in that batch;
For each picture in each batch, comparing whether the pre-label categories of that picture in the annotation result data of the at least two labelers are identical, and determining, for each batch, the number of pictures whose pre-label categories are identical in the annotation result data of the at least two labelers;
For each batch after division, determining the label category of the pictures in that batch based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in that batch.
2. The labeling method according to claim 1, characterized in that determining, for each batch after division, the label category of the pictures in that batch based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in that batch comprises:
If, in one of the batches after division, the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in that batch is greater than or equal to a first preset threshold, determining the pre-label category of each picture with identical pre-label categories as the label category of that picture.
3. The labeling method according to claim 2, characterized in that, after the pre-label category of each picture with identical pre-label categories is determined as the label category of that picture, the method further comprises:
Relabeling the pictures in that batch whose pre-label categories differ.
4. The labeling method according to claim 3, characterized in that relabeling the pictures in that batch whose pre-label categories differ comprises:
Assigning the pictures in that batch whose pre-label categories differ to a second labeler;
Obtaining the annotation result data of the second labeler, where the annotation result data of the second labeler carries the relabeled category determined by the second labeler for each picture whose pre-label categories differ;
Determining the relabeled category determined by the second labeler for each picture whose pre-label categories differ as the label category of that picture.
5. The labeling method according to claim 1, characterized in that determining, for each batch after division, the label category of the pictures in that batch based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in that batch comprises:
If, in one of the batches after division, the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in that batch is less than a first preset threshold, deleting the annotation result data of the at least two labelers corresponding to that batch, and relabeling the pictures in that batch.
6. The labeling method according to claim 1, characterized in that
There are at least three labelers;
Determining, for each batch after division, the label category of the pictures in that batch based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in that batch comprises:
If, in one of the batches after division, the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in that batch is less than a first preset threshold, obtaining the pairwise similarity of the annotation result data of the at least three labelers;
When, among the annotation result data of the at least three labelers, there are two annotation result data whose similarity is higher than a second preset threshold:
Determining the two annotation result data with the highest similarity;
For each picture that has the same pre-label category in the two annotation result data with the highest similarity, determining that pre-label category as the label category of the picture;
Wherein the similarity of two annotation result data is the ratio of the number of pictures in that batch that have the same pre-label category in the two annotation result data to the number of pictures in that batch.
7. The labeling method according to claim 1, characterized in that the pictures whose label categories have been determined are used as training samples of a machine learning model.
8. The labeling method according to claim 7, characterized in that the method further comprises:
Obtaining class prediction results of the machine learning model for predetermined pictures, where the class prediction results carry the categories of the predetermined pictures predicted by the machine learning model;
For each predicted category, assigning the predetermined pictures of that category to at least one third labeler, and obtaining the verification result of the at least one third labeler for the predicted category of the predetermined pictures;
According to the verification result of the at least one third labeler for the predicted category of the predetermined pictures, determining the predetermined pictures whose predicted category is a correct category, and determining the correct category as the label category of those predetermined pictures.
9. The labeling method according to claim 8, characterized in that the predetermined pictures whose label category is the correct category are used as training samples of the machine learning model.
10. The labeling method according to claim 9, characterized in that
There are at least two third labelers;
Determining, according to the verification result of the at least one third labeler for the predicted category of the predetermined pictures, the predetermined pictures whose predicted category is a correct category comprises:
When the verification results of the at least two third labelers for the predicted category of the predetermined pictures are all correct, determining that the predicted category is a correct category;
Or,
When the proportion of correct verification results among all verification results of the at least two third labelers for the predicted category of the predetermined pictures reaches a third preset threshold, determining that the predicted category is a correct category.
11. A picture labeling device, characterized in that the device comprises:
A first obtaining module, configured to obtain multiple pictures whose categories are to be labeled and to divide the pictures into several batches;
An assignment module, configured to assign each batch after division to at least two labelers;
A second obtaining module, configured to obtain, for each batch after division, the annotation result data of the at least two labelers, where the annotation result data of each labeler carries a pre-label category for each picture in that batch;
A comparison module, configured to compare, for each picture in each batch, whether the pre-label categories of that picture in the annotation result data of the at least two labelers are identical, and to determine, for each batch, the number of pictures whose pre-label categories are identical in the annotation result data of the at least two labelers;
A determining module, configured to determine, for each batch after division, the label category of the pictures in that batch based on the ratio between the determined number of pictures with identical pre-label categories and the total number of pictures in that batch.
12. The labeling device according to claim 11, characterized in that the determining module is specifically configured to:
If, in one of the batches after division, the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in that batch is greater than or equal to a first preset threshold, determine the pre-label category of each picture with identical pre-label categories as the label category of that picture.
13. The labeling device according to claim 12, characterized in that the device further comprises:
A labeling module, configured to relabel the pictures in that batch whose pre-label categories differ.
14. The labeling device according to claim 13, characterized in that the labeling module comprises:
An assignment submodule, configured to assign the pictures in that batch whose pre-label categories differ to a second labeler;
An obtaining submodule, configured to obtain the annotation result data of the second labeler, where the annotation result data of the second labeler carries the relabeled category determined by the second labeler for each picture whose pre-label categories differ;
A determining submodule, configured to determine the relabeled category determined by the second labeler for each picture whose pre-label categories differ as the label category of that picture.
15. The labeling device according to claim 11, characterized in that the determining module is specifically configured to:
If, in one of the batches after division, the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in that batch is less than a first preset threshold, delete the annotation result data of the at least two labelers corresponding to that batch, and relabel the pictures in that batch.
16. The labeling device according to claim 11, characterized in that there are at least three labelers;
The determining module is specifically configured to:
If, in one of the batches after division, the ratio of the determined number of pictures with identical pre-label categories to the total number of pictures in that batch is less than a first preset threshold, obtain the pairwise similarity of the annotation result data of the at least three labelers;
When, among the annotation result data of the at least three labelers, there are two annotation result data whose similarity is higher than a second preset threshold:
Determine the two annotation result data with the highest similarity;
For each picture that has the same pre-label category in the two annotation result data with the highest similarity, determine that pre-label category as the label category of the picture;
Wherein the similarity of two annotation result data is the ratio of the number of pictures in that batch that have the same pre-label category in the two annotation result data to the number of pictures in that batch.
17. The labeling device according to claim 11, characterized in that the pictures whose label categories have been determined are used as training samples of a machine learning model.
18. The labeling device according to claim 17, characterized in that the device further comprises:
A third obtaining module, configured to obtain class prediction results of the machine learning model for predetermined pictures, where the class prediction results carry the categories of the predetermined pictures predicted by the machine learning model;
The assignment module is further configured to: for each predicted category, assign the predetermined pictures of that category to at least one third labeler, and obtain the verification result of the at least one third labeler for the predicted category of the predetermined pictures;
The determining module is further configured to: according to the verification result of the at least one third labeler for the predicted category of the predetermined pictures, determine the predetermined pictures whose predicted category is a correct category, and determine the correct category as the label category of those predetermined pictures.
19. The labeling device according to claim 18, characterized in that the predetermined pictures whose label category is the correct category are used as training samples of the machine learning model.
20. The labeling device according to claim 19, characterized in that there are at least two third labelers;
The determining module is specifically configured to: when the verification results of the at least two third labelers for the predicted category of the predetermined pictures are all correct, determine that the predicted category is a correct category;
Or,
When the proportion of correct verification results among all verification results of the at least two third labelers for the predicted category of the predetermined pictures reaches a third preset threshold, determine that the predicted category is a correct category.
21. Picture labeling equipment, characterized by comprising a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the method steps of any one of claims 1-10.
22. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-10.
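For readers who want a concrete picture of the similarity rule recited in claims 6 and 16, the following Python sketch computes the pairwise similarity of the annotation result data and keeps the labels shared by the most similar pair. It is only an illustration under stated assumptions: the data layout, the function names, and the example threshold values are hypothetical, and returning `None` when no pair exceeds the second preset threshold is a choice made for this sketch, since the claims do not state what happens in that case.

```python
from itertools import combinations


def pair_similarity(batch, first, second):
    """Share of pictures in the batch given the same pre-label by both results."""
    same = sum(1 for picture in batch if first[picture] == second[picture])
    return same / len(batch)


def labels_from_most_similar_pair(batch, results, second_threshold=0.8):
    """Return labels shared by the two most similar annotation results.

    results -- {labeler: {picture: pre-label category}} with >= 3 labelers
    second_threshold -- illustrative value for the second preset threshold
    """
    if len(results) < 3:
        raise ValueError("at least three labelers are required")
    if not batch:
        raise ValueError("batch must not be empty")
    best_pair, best_score = None, -1.0
    for a, b in combinations(results, 2):
        score = pair_similarity(batch, results[a], results[b])
        if score > best_score:
            best_pair, best_score = (a, b), score
    if best_score <= second_threshold:
        return None  # no pair is similar enough (handling assumed here)
    a, b = best_pair
    return {p: results[a][p] for p in batch if results[a][p] == results[b][p]}


batch = ["p1", "p2", "p3", "p4"]
results = {
    "a": {"p1": "cat", "p2": "dog", "p3": "cat", "p4": "dog"},
    "b": {"p1": "cat", "p2": "dog", "p3": "dog", "p4": "dog"},
    "c": {"p1": "dog", "p2": "cat", "p3": "dog", "p4": "cat"},
}
labels = labels_from_most_similar_pair(batch, results, second_threshold=0.7)
```

Here labelers `a` and `b` agree on three of four pictures (similarity 0.75), which is the highest of the three pairs, so their shared labels for `p1`, `p2` and `p4` are kept.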
CN201810618773.4A 2018-06-15 2018-06-15 Picture labeling method, labeling device, equipment and storage medium Active CN108960297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810618773.4A CN108960297B (en) 2018-06-15 2018-06-15 Picture labeling method, labeling device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810618773.4A CN108960297B (en) 2018-06-15 2018-06-15 Picture labeling method, labeling device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108960297A (en) 2018-12-07
CN108960297B (en) 2021-07-30

Family

ID=64489489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810618773.4A Active CN108960297B (en) 2018-06-15 2018-06-15 Picture labeling method, labeling device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108960297B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324620A (en) * 2012-03-20 2013-09-25 北京百度网讯科技有限公司 Method and device for rectifying marking results
US20140270550A1 (en) * 2013-03-15 2014-09-18 Dropbox, Inc. Presentation and organization of content
CN104462738A (en) * 2013-09-24 2015-03-25 西门子公司 Method, device and system for labeling medical images
CN104795077A (en) * 2015-03-17 2015-07-22 北京航空航天大学 Voice annotation quality consistency detection method
CN105404896A (en) * 2015-11-03 2016-03-16 北京旷视科技有限公司 Annotation data processing method and annotation data processing system
CN107908641A (en) * 2017-09-27 2018-04-13 百度在线网络技术(北京)有限公司 A kind of method and system for obtaining picture labeled data
CN108052555A (en) * 2017-11-29 2018-05-18 汉柏科技有限公司 A kind of photo classification method and system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109739644A (en) * 2018-12-18 2019-05-10 创新奇智(南京)科技有限公司 A kind of computer based data picture mask method, system and device
CN110110795B (en) * 2019-05-10 2021-04-20 厦门美图之家科技有限公司 Image classification method and device
CN110110795A (en) * 2019-05-10 2019-08-09 厦门美图之家科技有限公司 Image classification method and device
CN110413821A (en) * 2019-07-31 2019-11-05 四川长虹电器股份有限公司 Data mask method
CN110852166A (en) * 2019-10-10 2020-02-28 上海速益网络科技有限公司 Picture identification and marking method
CN111275097A (en) * 2020-01-17 2020-06-12 北京世纪好未来教育科技有限公司 Video processing method and system, picture processing method and system, equipment and medium
CN111507405A (en) * 2020-04-17 2020-08-07 北京百度网讯科技有限公司 Picture labeling method and device, electronic equipment and computer readable storage medium
CN113553144A (en) * 2020-04-24 2021-10-26 杭州海康威视数字技术股份有限公司 Data distribution method, device and system
CN113553144B (en) * 2020-04-24 2023-09-26 杭州海康威视数字技术股份有限公司 Data distribution method, device and system
CN112488160A (en) * 2020-11-16 2021-03-12 浙江新再灵科技股份有限公司 Model training method for image classification task
CN112488160B (en) * 2020-11-16 2023-02-07 浙江新再灵科技股份有限公司 Model training method for image classification task
CN112860416A (en) * 2021-04-25 2021-05-28 城云科技(中国)有限公司 Annotating task assignment strategy method and device
CN115248831A (en) * 2021-04-28 2022-10-28 马上消费金融股份有限公司 Labeling method, device, system, equipment and readable storage medium
CN115248831B (en) * 2021-04-28 2024-03-15 马上消费金融股份有限公司 Labeling method, labeling device, labeling system, labeling equipment and readable storage medium

Also Published As

Publication number Publication date
CN108960297B (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN108960297A (en) Mask method, annotation equipment, equipment and the storage medium of picture
CN109034188A (en) Acquisition methods, acquisition device, equipment and the storage medium of machine learning model
US20210318851A1 (en) Systems and Methods for Dataset Merging using Flow Structures
EP2778929B1 (en) Test script generation system
CN105718371B (en) A kind of regression testing method, apparatus and system
CN107256247A (en) Big data data administering method and device
CN105144080A (en) System for metadata management
CN106682096A (en) Method and device for log data management
CN107111625A (en) Realize the method and system of the efficient classification and exploration of data
CN104484558B (en) The analysis report automatic generation method and system of biological information project
CN109885624A (en) Data processing method, device, computer equipment and storage medium
CN103440199B (en) Test bootstrap technique and device
CN106570013A (en) Method and device for processing page access data
CN105426307A (en) Local area network product test resource sharing method and system
CN106484853A (en) document analysis method and device
CN106021114B (en) Towards the automated testing method and system of intelligent robot
CN108153754A (en) A kind of data processing method and its device
CN117519656A (en) Software development system based on intelligent manufacturing
CN110362767A (en) Bury a processing method, device, system and computer readable storage medium
CN108416151A (en) A kind of Satellite TT information flow intelligentized design system and fault message method for rapidly positioning based on model
CN110188258B (en) Method and device for acquiring external data by using crawler
CN104954407B (en) Information-pushing method and device
Gomez et al. Experimenting with a Machine Generated Annotations Pipeline
CN108873781A (en) A kind of Full-automatic digital equipment
CN105373043B (en) The method and system of monitor controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant