CN109657087A - A kind of batch data mask method, device and computer readable storage medium - Google Patents
- Publication number
- CN109657087A (application CN201811456459.7A, filed as CN201811456459A)
- Authority
- CN
- China
- Prior art keywords
- data
- image
- cluster
- neural network
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of artificial intelligence and provides a batch data labeling method, device, and storage medium. The method includes: performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors; clustering the low-dimensional vectors of the data set to divide the images into different categories; displaying the clustered data through a visualization tool, selecting data of the same category, and applying a unified batch label to that category. By clustering, the data in the data set are divided into different categories, so that data of the same category can be labeled in batches, reducing the labeling workload. The use of unsupervised clustering makes the method widely applicable. Furthermore, by applying neural network recognition after clustering, the features of the images within a category are further identified, so that the common characteristics of the data in that category can be determined and a unified batch label can be applied to the category according to the recognition result.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a batch data labeling method, device, and computer readable storage medium.
Background technique
With the rapid development of multimedia and Internet information technology, hundreds of millions of new images appear on the Internet every day. Compared with text, images can describe information more intuitively and more accurately; in today's era of information explosion, images therefore allow users to obtain the information they need more conveniently, faster, and more accurately, and image information has become one of the most important channels for disseminating information. In intelligent recognition technology in particular, a large number of labeled pictures are needed as a training data set to train a model and improve its recognition capability. At present, however, image data are usually labeled by manually inspecting the data to distinguish categories and then annotating each picture one by one with a tool. The disadvantages of this approach are that data cannot be labeled in batches, so labeling efficiency is low when the amount of data is large, and that much of the labeling work requires professionals to perform the category annotation, which makes labeling costly.
Summary of the invention
To solve the above technical problems, the present invention provides a batch data labeling method applied to an electronic device: performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors; clustering the low-dimensional vectors of the data set to divide the images into different categories; displaying the clustered data through a visualization tool, selecting data of the same category, and applying a unified batch label to that category.
Preferably, high-dimensional data is converted into low-dimensional data by means of nonlinear dimensionality reduction.
Preferably, the nonlinear dimensionality reduction uses the following formulas.

The high-dimensional space is represented as:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}$$

where $p_{j|i}$ denotes the conditional probability in the high-dimensional space; $x_i$ and $x_j$ denote points in the high-dimensional space; and $\sigma_i$ denotes the variance of the Gaussian distribution centered on $x_i$.

The low-dimensional space is represented as:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}$$

where $q_{ij}$ denotes the conditional probability in the low-dimensional space, and $y_i$ and $y_j$ denote the points of the high-dimensional space mapped into the low-dimensional space.

The cost function is:

$$C = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

where the KL divergence denotes the error between P and Q at a point; P denotes the high-dimensional conditional probability distribution, Q denotes the low-dimensional conditional probability distribution, and $p_{ij} = (p_{j|i} + p_{i|j})/2n$ is the symmetrized high-dimensional probability over $n$ points.

The gradient is:

$$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}$$
Preferably, for categories whose image features cannot be determined, numbers are used for category labeling.
Preferably, after clustering, a neural network is also used to identify at least one image in a given category so as to speed up labeling, comprising the following steps: collecting a training data set, where the training data set contains a large number of labeled pictures serving as training data; training a neural network model with the training data to improve its recognition capability; after clustering is complete, identifying one image in each category with the neural network model to obtain the features of that image; and, according to the features of that image, uniformly labeling all images in the corresponding category.
Preferably, after clustering, a neural network is also used to identify at least two images in each category so as to speed up labeling, comprising the following steps: collecting a training data set, where the training data set contains a large number of labeled pictures serving as training data; training a neural network model with the training data to improve its recognition capability; after clustering is complete, identifying at least two images in each category with the neural network model and extracting their features; if the extracted features have no common characteristic, identifying the next image and continuing to search for a common characteristic among the features of the identified images until one is found; and then using that common characteristic as the reference name of the category and labeling the entire category.
Preferably, the color histogram of each image is used as its feature vector to form the data set.
The present invention also provides an electronic device comprising a memory and a processor. A batch data labeling program is stored in the memory, and when executed by the processor it implements the following steps: performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors; clustering the low-dimensional vectors of the data set to divide the images into different categories; displaying the clustered data through a visualization tool, selecting data of the same category, and applying a unified batch label to that category.
Preferably, after clustering, a neural network is also used to identify at least one image in a given category so as to speed up labeling, comprising the following steps: collecting a training data set, where the training data set contains a large number of labeled pictures serving as training data; training a neural network model with the training data to improve its recognition capability; after clustering is complete, identifying one image in each category with the neural network model to obtain the features of that image; and, according to the features of that image, uniformly labeling all images in the corresponding category.
The present invention also provides a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, implement the batch data labeling method described above.
By clustering, the present invention divides the data in the data set into different categories, so that data of the same category can be labeled in batches, reducing the labeling workload. For data whose features cannot be determined, numbered labels can be used directly, without requiring professionals to identify them. The use of unsupervised clustering makes the method widely applicable. Furthermore, by applying neural network recognition after clustering, the features of the images within a category are further identified, so that the common characteristics of the data in that category can be determined and a unified batch label can be applied to the category according to the recognition result.
Description of the drawings
The above features and technical advantages of the present invention will become clearer and easier to understand from the following description of embodiments in conjunction with the accompanying drawings.
Fig. 1 is a schematic flowchart of the batch data labeling method of an embodiment of the present invention;

Fig. 2 is a schematic flowchart of batch data labeling using neural network recognition after clustering according to one embodiment of the present invention;

Fig. 3 is a schematic flowchart of batch data labeling using neural network recognition after clustering according to another embodiment of the present invention;

Fig. 4 is a schematic diagram of the hardware architecture of the electronic device of an embodiment of the present invention;

Fig. 5 is a block diagram of the modules of the batch data labeling program of an embodiment of the present invention.
Detailed description of embodiments
Embodiments of the batch data labeling method, device, and computer readable storage medium of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will recognize that the described embodiments can be modified in various ways, or combined, without departing from the spirit and scope of the present invention. Therefore, the drawings and description are illustrative in nature and are not intended to limit the scope of the claims. In addition, in the present specification, the drawings are not drawn to scale, and identical reference numerals indicate identical parts.
Fig. 1 is a schematic flowchart of the batch data labeling method provided by an embodiment of the present invention. The method includes the following steps:

Step S10: perform dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors. Here, the color histogram of each image can be used as its feature vector, so that for each image a low-dimensional representation vector of the high-dimensional data can be obtained. High-dimensional data reduced to two- or three-dimensional data can be used for clustering, and the clustering result can then be displayed.
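As an illustration of the feature-vector step, the sketch below computes a coarse RGB color histogram for an image represented as a list of pixels; the choice of 4 bins per channel and the `color_histogram` name are assumptions made for this example, not details specified by the patent.

```python
def color_histogram(pixels, bins=4):
    """Coarse RGB histogram: bins**3 counts, normalized to sum to 1.

    `pixels` is a list of (r, g, b) tuples with channel values in 0..255.
    The flattened, normalized histogram serves as the image's feature vector.
    """
    step = 256 // bins                      # width of one bin per channel
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = len(pixels) or 1
    return [h / total for h in hist]

# A tiny 2x2 "image": two red pixels, one green, one blue.
feat = color_histogram([(255, 0, 0), (250, 5, 3), (0, 255, 0), (0, 0, 255)])
```

In practice the resulting 64-dimensional vectors would then be passed to the dimensionality reduction step to obtain the 2-D or 3-D points used for clustering.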
Step S30: cluster the low-dimensional vectors of the data set, dividing the images into different categories. For example, if some images are of cars, some of mountains, some of cats, and some of elephants, a clustering algorithm groups images exhibiting the same features together: the car images are clustered together, the cat images are clustered together, and so on.
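The patent does not name a specific clustering algorithm; as a sketch, a few rounds of Lloyd's k-means over 2-D points (the kind of vectors the reduction step produces) could look like the following. The point values and the choice of two initial centers are invented for illustration.

```python
def kmeans(points, centers, iters=10):
    """Plain Lloyd's k-means on 2-D points; returns (centers, labels)."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(len(centers)),
                      key=lambda c: (p[0] - centers[c][0]) ** 2 +
                                    (p[1] - centers[c][1]) ** 2)
                  for p in points]
        # Recompute each center as the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, labels

# Two visually separated groups of low-dimensional image vectors.
pts = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centers, labels = kmeans(pts, centers=[(0.0, 0.0), (1.0, 1.0)])
```

Each resulting cluster corresponds to one candidate category to be shown in the visualization tool.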
Step S50: display the clustered data through a visualization tool (for example, a display), select the data of each category, and apply a unified batch label to each category. For example, if a region in the visualization tool contains nothing but "cat" data, that whole region is selected and labeled "cat"; the label of that batch of data is then "cat", achieving the purpose of rapid batch labeling.
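The select-and-label step amounts to assigning one name to every item in a chosen cluster; a minimal sketch follows, in which the `batch_label` helper and the sample file names are invented for illustration and are not part of the patent.

```python
def batch_label(cluster_labels, annotations, cluster_id, name):
    """Assign `name` to every item whose cluster id equals `cluster_id`.

    `cluster_labels` maps item -> cluster id (the clustering result);
    `annotations` accumulates item -> human-readable label.
    """
    for item, cid in cluster_labels.items():
        if cid == cluster_id:
            annotations[item] = name
    return annotations

# The user selects cluster 0 in the visualization tool and names it "cat".
clusters = {"img_001.jpg": 0, "img_002.jpg": 0, "img_003.jpg": 1}
labels = batch_label(clusters, {}, cluster_id=0, name="cat")
```

One call labels the entire selected region at once, which is the source of the efficiency gain over per-image annotation.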
In an alternative embodiment, high-dimensional data is converted into low-dimensional data by means of nonlinear dimensionality reduction.
Further, the high-dimensional data is regarded as points in a high-dimensional space, which are then mapped into a low-dimensional space with a manifold method while preserving their spatial distances: points that are close together in the high-dimensional space remain close after mapping into the low-dimensional space, and points that are far apart remain far apart. The nonlinear dimensionality reduction uses the following formulas.
The high-dimensional space is represented as:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}$$

where $p_{j|i}$ denotes the conditional probability in the high-dimensional space; $x_i$ and $x_j$ denote points in the high-dimensional space; and $\sigma_i$ denotes the variance of the Gaussian distribution centered on $x_i$.

The low-dimensional space is represented as:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}$$

where $q_{ij}$ denotes the conditional probability in the low-dimensional space, and $y_i$ and $y_j$ denote the points of the high-dimensional space mapped into the low-dimensional space.

The cost function is:

$$C = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

where the KL divergence denotes the error between P and Q at a point; P denotes the high-dimensional conditional probability distribution, Q denotes the low-dimensional conditional probability distribution, and $p_{ij} = (p_{j|i} + p_{i|j})/2n$ is the symmetrized high-dimensional probability over $n$ points.

The gradient is:

$$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}$$
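As a small numerical check of the high-dimensional conditional probability, the sketch below computes $p_{j|i}$ for three one-dimensional points with a fixed $\sigma_i = 1$; the point values are invented for illustration.

```python
import math

def p_cond(points, i, sigma=1.0):
    """Conditional probabilities p_{j|i} of the high-dimensional space."""
    num = {j: math.exp(-(points[i] - points[j]) ** 2 / (2 * sigma ** 2))
           for j in range(len(points)) if j != i}
    denom = sum(num.values())
    return {j: v / denom for j, v in num.items()}

# Point 1 is nearer to point 0 than point 2 is, so p_{1|0} > p_{2|0}.
p = p_cond([0.0, 1.0, 3.0], i=0)
```

The probabilities sum to one for each reference point, and nearer neighbors receive higher probability, which is the property the mapping into the low-dimensional space tries to preserve.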
In an alternative embodiment, numbers are used for category labeling of categories that cannot be determined from image features. Medical pictures, for example, require professionals to identify the category; numbers such as "1, 2, 3..." or "a, b, c..." can be used for category labeling instead.
In an alternative embodiment, manually observing each category after clustering in order to label it would still take up personnel time. Therefore, after clustering, a neural network can also be used to further identify at least one image in a given category. Since the pictures have already been grouped by certain features with the clustering algorithm, the subsequent neural network recognition of the pictures can be sped up. For example, if a category contains pictures of cats of different appearances, then after the neural network has recognized at least one of the images (for example, 3 images), the category is automatically confirmed to be "cat", and all pictures of the category are automatically labeled "cat" with the annotation tool. Thus, using a neural network after clustering to further identify at least one image in each cluster allows the images to be labeled more quickly.
Specifically, after clustering, the neural network is used to identify at least one image in a given category so as to speed up labeling, as shown in Fig. 2, comprising the following steps:

Step S100: collect a training data set; the training data set contains a large number of labeled pictures serving as training data;

Step S200: train a neural network model with the training data to improve its recognition capability;

Step S300: after clustering is complete, identify one image in each category with the neural network model to obtain the features of that image;

Step S400: according to the features of that image, uniformly label all images in the corresponding category.
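Steps S300 and S400 amount to classifying one representative image per cluster and propagating the predicted label to the whole cluster. In the sketch below the `predict` stub stands in for the trained neural network model of step S200, which the patent does not specify in code form; the file names are invented.

```python
def predict(image):
    """Stand-in for the trained neural network model of step S200."""
    return {"cat1.jpg": "cat", "car7.jpg": "car"}.get(image, "unknown")

def label_clusters(clusters):
    """Steps S300/S400: classify one image per cluster, label the cluster."""
    annotations = {}
    for images in clusters.values():
        name = predict(images[0])          # identify one representative image
        for img in images:                 # propagate its label to the cluster
            annotations[img] = name
    return annotations

out = label_clusters({0: ["cat1.jpg", "cat2.jpg"], 1: ["car7.jpg", "car8.jpg"]})
```

Because only one image per cluster is run through the model, the recognition cost grows with the number of clusters rather than the number of images.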
In an alternative embodiment, after clustering, the neural network is also used to identify at least two images in each category so as to speed up labeling, as shown in Fig. 3, comprising the following steps:

Step S100: collect a training data set; the training data set contains a large number of labeled pictures serving as training data;

Step S200: train a neural network model with the training data to improve its recognition capability;

Step S500: after clustering is complete, identify at least two images in each category with the neural network model and extract their features; if the extracted features have no common characteristic, identify the next image and continue to search for a common characteristic among the features of the identified images until one is found;

Step S600: use the common characteristic as the reference name of the category and label the entire category.
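Step S500's "identify until a common characteristic is found" loop can be sketched as below; the per-image feature sets and the set-intersection criterion for "common characteristic" are assumptions made for this example, not details specified by the patent.

```python
def common_feature(images, recognize, min_images=2):
    """Recognize images one by one until all recognized feature sets share
    at least one feature; return that shared feature (steps S500/S600)."""
    seen = []
    for img in images:
        seen.append(recognize(img))
        if len(seen) < min_images:
            continue
        shared = set.intersection(*seen)   # features common to all so far
        if shared:
            return sorted(shared)[0]       # deterministic reference name
    return None

# Stand-in recognizer: feature sets the neural network might extract.
features = {"a.jpg": {"furry", "cat"}, "b.jpg": {"striped", "cat"},
            "c.jpg": {"cat", "whiskers"}}
name = common_feature(["a.jpg", "b.jpg", "c.jpg"], lambda i: features[i])
```

The returned feature ("cat" here) would then serve as the reference name under which the entire category is labeled in step S600.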
Refer to Fig. 4, which is a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present invention. In the present embodiment, the electronic device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, tablet computer, laptop, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a cluster composed of multiple servers). As shown in Fig. 4, the electronic device 2 includes at least, but is not limited to, a memory 21, a processor 22, and a network interface 23 that can be communicatively connected to each other through a system bus. The memory 21 includes at least one type of computer-readable storage medium, which includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random-access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic storage, magnetic disk, optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as a hard disk or memory of the electronic device 2. In other embodiments, the memory 21 may also be an external storage device of the electronic device 2, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the electronic device 2. Of course, the memory 21 may also include both the internal storage unit and the external storage device of the electronic device 2. In the present embodiment, the memory 21 is commonly used to store the operating system and various types of application software installed on the electronic device 2, such as the program code of the batch data labeling program. In addition, the memory 21 can also be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is commonly used to control the overall operation of the electronic device 2, for example to perform control and processing related to data interaction or communication with the electronic device 2. In the present embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example to run the batch data labeling program.
The network interface 23 may include a wireless network interface or a wired network interface and is commonly used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 with a push platform through a network and to establish a data transmission channel and communication connection between the electronic device 2 and the push platform. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
Optionally, the electronic device 2 may also include a user interface, which may include an input unit such as a keyboard, a speech input device such as a microphone or other equipment with a speech recognition function, and a speech output device such as a speaker or earphones.

Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 2 may also include a display, which may also be called a display screen or display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (OLED) display, or the like. The display is used to show the information processed in the electronic device 2 and to show a visual user interface.

It should be pointed out that Fig. 4 only shows the electronic device 2 with components 21-23; it should be understood that not all of the shown components are required to be implemented, and more or fewer components may be implemented instead.
The memory 21, which comprises a readable storage medium, may include an operating system, a batch data labeling program 50, and the like. The steps implemented when the processor 22 executes the batch data labeling program 50 in the memory 21 correspond one-to-one with the steps of the batch data labeling method described above; to avoid repetition, they are not described in detail here. Each module is briefly described below.

In the present embodiment, the batch data labeling program stored in the memory 21 can be divided into one or more program modules, which are stored in the memory 21 and can be executed by one or more processors (the processor 22 in this embodiment) to complete the present invention. For example, Fig. 5 shows a diagram of the modules of the batch data labeling program; in this embodiment, the batch data labeling program 50 can be divided into a dimensionality reduction module 501, a clustering module 502, a category selection module 503, and a batch labeling module 504. The program modules referred to in the present invention are series of computer program instruction segments capable of completing specific functions, and are more suitable than a program for describing the execution process of the batch data labeling program 50 in the electronic device 2. The specific functions of these program modules are introduced in the description below.
The dimensionality reduction module 501 is used to perform dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors. Here, the color histogram of each image can be used as its feature vector, so that for each image a low-dimensional representation vector of the high-dimensional data can be obtained. High-dimensional data reduced to two- or three-dimensional data can be used for clustering, and the clustering result can then be displayed.
The clustering module 502 is used to cluster the low-dimensional vectors of the data set, dividing the images into different categories. For example, if some images are of cars, some of mountains, some of cats, and some of elephants, the clustering algorithm groups images exhibiting the same features together: the car images are clustered together, the cat images are clustered together, and so on.
The category selection module 503 is used to select the data of each category after clustering, and the batch labeling module 504 applies a unified batch label to each category. For example, if a region in the visualization tool contains nothing but "cat" data, that whole region is selected and labeled "cat"; the label of that batch of data is then "cat", achieving the purpose of rapid batch labeling.
In an alternative embodiment, the dimensionality reduction module 501 converts high-dimensional data into low-dimensional data by means of nonlinear dimensionality reduction.

Further, the dimensionality reduction module 501 regards the high-dimensional data as points in a high-dimensional space and then maps them into a low-dimensional space with a manifold method while preserving their spatial distances: points that are close together in the high-dimensional space remain close after mapping into the low-dimensional space, and points that are far apart remain far apart. The nonlinear dimensionality reduction uses the following formulas.
The high-dimensional space is represented as:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}$$

where $p_{j|i}$ denotes the conditional probability in the high-dimensional space; $x_i$ and $x_j$ denote points in the high-dimensional space; and $\sigma_i$ denotes the variance of the Gaussian distribution centered on $x_i$.

The low-dimensional space is represented as:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}$$

where $q_{ij}$ denotes the conditional probability in the low-dimensional space, and $y_i$ and $y_j$ denote the points of the high-dimensional space mapped into the low-dimensional space.

The cost function is:

$$C = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

where the KL divergence denotes the error between P and Q at a point; P denotes the high-dimensional conditional probability distribution, Q denotes the low-dimensional conditional probability distribution, and $p_{ij} = (p_{j|i} + p_{i|j})/2n$ is the symmetrized high-dimensional probability over $n$ points.

The gradient is:

$$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}$$
In an alternative embodiment, for categories that cannot be determined from image features, the batch labeling module 504 uses numbers for category labeling. Medical pictures, for example, require professionals to identify the category; numbers such as "1, 2, 3..." or "a, b, c..." can be used for category labeling instead.
In an alternative embodiment, a feature extraction module 505 is also included, since manually observing each category after clustering in order to label it would still take up personnel time. Therefore, after clustering, a neural network can also be used to further identify at least one image in a given category. Since the pictures have already been grouped by certain features with the clustering algorithm, the subsequent neural network recognition of the pictures can be sped up. For example, if a category contains pictures of cats of different appearances, then after the neural network has recognized at least one of the images (for example, 3 images), the category is automatically confirmed to be "cat", and all pictures of the category are automatically labeled "cat" with the annotation tool. Thus, after clustering, the feature extraction module 505 uses the neural network to further identify at least one image in each cluster, allowing the images to be labeled more quickly.
Specifically, after clustering, the feature extraction module 505 also uses the neural network to identify at least one image in a given category so as to speed up labeling, comprising the following steps:

Step S100: collect a training data set; the training data set contains a large number of labeled pictures serving as training data;

Step S200: train a neural network model with the training data to improve its recognition capability;

Step S300: after clustering is complete, the feature extraction module 505 identifies one image in each category with the neural network model to obtain the features of that image;

Step S400: according to the features of that image, uniformly label all images in the corresponding category.
In an alternative embodiment, after clustering, the feature extraction module 505 also uses the neural network to identify at least two images in each category so as to speed up labeling, comprising the following steps:

Step S100: collect a training data set; the training data set contains a large number of labeled pictures serving as training data;

Step S200: train a neural network model with the training data to improve its recognition capability;

Step S500: after clustering is complete, the feature extraction module 505 identifies at least two images in each category with the neural network model and extracts their features; if the extracted features have no common characteristic, it identifies the next image and continues to search for a common characteristic among the features of the identified images until one is found;

Step S600: use the common characteristic as the reference name of the category and label the entire category.
In addition, an embodiment of the present invention also proposes a computer readable storage medium, which may be any one or any combination of a hard disk, multimedia card, SD card, flash card, SMC, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, and the like. The computer readable storage medium includes the batch data labeling program and so on; when executed by the processor 22, the batch data labeling program 50 implements the following operations:
Step S10: perform dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors. For example, the color histogram of each image may be used as its feature vector; dimensionality reduction then yields a low-dimensional representation vector for each image. Reducing the high-dimensional data to two or three dimensions makes it usable for clustering and allows the clustering result to be displayed.
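A color-histogram feature of the kind described in step S10 can be sketched in pure Python as follows. The function name `color_histogram`, the bin count, and the toy pixel lists are illustrative assumptions; in practice the resulting vectors would then be reduced to two or three dimensions, e.g. with an off-the-shelf nonlinear dimensionality-reduction implementation:

```python
def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` ranges and count the pixels
    falling in each (r, g, b) cell, giving a bins**3-dimensional vector."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]  # normalize across image sizes

# Two toy "images" as flat pixel lists: one mostly red, one mostly blue.
red_image = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
blue_image = [(10, 10, 250)] * 95 + [(250, 10, 10)] * 5
vec_red = color_histogram(red_image)
vec_blue = color_histogram(blue_image)
```

With 4 bins per channel the feature vector has 64 dimensions, which is why a further reduction step is needed before the data can be displayed in two or three dimensions.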
Step S30: cluster the low-dimensional vectors of the data set, dividing the images into different categories. For example, if some images are cars, some are mountains, some are cats, and some are elephants, the clustering algorithm groups images with the same features together: the car images form one cluster and the cat images form another.
Step S50: display the clustered data with a visualization tool (such as a display), select data of different categories, and apply a unified batch label to the data of each category. For example, if some region in the visualization tool contains only "cat" data, the whole region is selected and labeled "cat"; the label of this batch of data is then "cat" throughout, achieving the goal of rapid batch labeling.
The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as those of the batch data labeling method and the electronic device 2 described above, and are not repeated here.
The above is only a preferred embodiment of the present invention and is not intended to limit it; for those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in its protection scope.
Claims (10)
1. A batch data labeling method applied to an electronic device, characterized by comprising:
performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors;
clustering the low-dimensional vectors of the data set, dividing the images into different categories;
displaying the clustered data with a visualization tool, selecting data of the same category, and applying a unified batch label to the data of that category.
2. The batch data labeling method according to claim 1, characterized in that high-dimensional data is converted into low-dimensional data by means of nonlinear dimensionality reduction.
3. The batch data labeling method according to claim 1, characterized in that the nonlinear dimensionality reduction uses the following formulas:

The high-dimensional space is represented as:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$

wherein $p_{j|i}$ denotes the conditional probability in the high-dimensional space; $x_i$ and $x_j$ denote points in the high-dimensional space; $\sigma_i$ denotes the variance of the Gaussian distribution centered on $x_i$.

The low-dimensional space is represented as:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$

wherein $q_{ij}$ denotes the conditional probability in the low-dimensional space; $y_i$ and $y_j$ denote the points to which the high-dimensional points are mapped in the low-dimensional space.

The cost function is:

$$C = \mathrm{KL}(P \,\Vert\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

wherein the KL divergence denotes the error between P and Q at a point; P denotes the conditional probability distribution of the high-dimensional space and Q denotes that of the low-dimensional space.

The gradient is:

$$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$
4. The batch data labeling method according to claim 1, characterized in that a category whose image features are uncertain is labeled with a number.
5. The batch data labeling method according to claim 1, characterized in that after clustering, at least one image in a certain category is further recognized using a neural network in order to speed up labeling, comprising the following steps:
collecting a training data set consisting of a large number of labeled pictures to serve as training data;
training the neural network model with the training data to improve its recognition capability;
after clustering is complete, recognizing one image in each category using the neural network model to obtain the features of that image;
uniformly labeling all images in each post-clustering category according to the features of that image.
6. The batch data labeling method according to claim 1, characterized in that after clustering, at least two images in each category are further recognized using a neural network in order to speed up labeling, comprising the following steps:
collecting a training data set consisting of a large number of labeled pictures to serve as training data;
training the neural network model with the training data to improve its recognition capability;
after clustering is complete, recognizing at least two images in each category using the neural network model and extracting their features; if the extracted features share no common characteristic, recognizing the next image and continuing to search for a common characteristic among the features of the recognized images, until such a common characteristic is found; then taking the common characteristic as the label name of the category and labeling the entire category with it.
7. The batch data labeling method according to claim 1, characterized in that the color histogram of each image is used as its feature vector to form the data set.
8. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory stores a batch data labeling program, and the batch data labeling program, when executed by the processor, implements the following steps:
performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors;
clustering the low-dimensional vectors of the data set, dividing the images into different categories;
displaying the clustered data with a visualization tool, selecting data of the same category, and applying a unified batch label to the data of that category.
9. The electronic device according to claim 8, characterized in that after clustering, at least one image in a certain category is further recognized using a neural network in order to speed up labeling, comprising the following steps:
collecting a training data set consisting of a large number of labeled pictures to serve as training data;
training the neural network model with the training data to improve its recognition capability;
after clustering is complete, recognizing one image in each category using the neural network model to obtain the features of that image;
uniformly labeling all images in each post-clustering category according to the features of that image.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, implement the batch data labeling method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811456459.7A CN109657087A (en) | 2018-11-30 | 2018-11-30 | A kind of batch data mask method, device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109657087A true CN109657087A (en) | 2019-04-19 |
Family
ID=66112260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811456459.7A Pending CN109657087A (en) | 2018-11-30 | 2018-11-30 | A kind of batch data mask method, device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109657087A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929894A (en) * | 2011-08-12 | 2013-02-13 | 中国人民解放军总参谋部第五十七研究所 | Online clustering visualization method of text |
CN105701502A (en) * | 2016-01-06 | 2016-06-22 | 福州大学 | Image automatic marking method based on Monte Carlo data balance |
CN107004141A (en) * | 2017-03-03 | 2017-08-01 | 香港应用科技研究院有限公司 | To the efficient mark of large sample group |
CN107622104A (en) * | 2017-09-11 | 2018-01-23 | 中央民族大学 | A kind of character image identification mask method and system |
CN107644235A (en) * | 2017-10-24 | 2018-01-30 | 广西师范大学 | Image automatic annotation method based on semi-supervised learning |
CN107944454A (en) * | 2017-11-08 | 2018-04-20 | 国网电力科学研究院武汉南瑞有限责任公司 | A kind of semanteme marking method based on machine learning for substation |
CN108182443A (en) * | 2016-12-08 | 2018-06-19 | 广东精点数据科技股份有限公司 | A kind of image automatic annotation method and device based on decision tree |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110795B (en) * | 2019-05-10 | 2021-04-20 | 厦门美图之家科技有限公司 | Image classification method and device |
CN110110795A (en) * | 2019-05-10 | 2019-08-09 | 厦门美图之家科技有限公司 | Image classification method and device |
CN110264443A (en) * | 2019-05-20 | 2019-09-20 | 平安科技(深圳)有限公司 | Eye fundus image lesion mask method, device and medium based on feature visualization |
CN110264443B (en) * | 2019-05-20 | 2024-04-16 | 平安科技(深圳)有限公司 | Fundus image lesion labeling method, device and medium based on feature visualization |
CN110516093A (en) * | 2019-08-28 | 2019-11-29 | 深圳力维智联技术有限公司 | Picture mask method, device and equipment |
CN110781920A (en) * | 2019-09-24 | 2020-02-11 | 同济大学 | Method for identifying semantic information of cloud components of indoor scenic spots |
CN113127668A (en) * | 2019-12-31 | 2021-07-16 | 深圳云天励飞技术有限公司 | Data annotation method and related product |
CN111639705B (en) * | 2020-05-29 | 2021-06-29 | 江苏云从曦和人工智能有限公司 | Batch picture marking method, system, machine readable medium and equipment |
CN111639705A (en) * | 2020-05-29 | 2020-09-08 | 江苏云从曦和人工智能有限公司 | Batch picture marking method, system, machine readable medium and equipment |
CN113793306A (en) * | 2021-08-23 | 2021-12-14 | 上海派影医疗科技有限公司 | Breast pathology image identification and detection method and system based on fragment processing |
CN113918747A (en) * | 2021-09-29 | 2022-01-11 | 北京三快在线科技有限公司 | Image data cleaning method, device, equipment and storage medium |
CN114116965A (en) * | 2021-11-08 | 2022-03-01 | 竹间智能科技(上海)有限公司 | Opinion extraction method for comment text and electronic equipment |
CN114220111A (en) * | 2021-12-22 | 2022-03-22 | 深圳市伊登软件有限公司 | Image-text batch identification method and system based on cloud platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190419 |