CN109657087A - Batch data annotation method, apparatus, and computer-readable storage medium - Google Patents

Batch data annotation method, apparatus, and computer-readable storage medium

Info

Publication number
CN109657087A
Authority
CN
China
Prior art keywords
data
image
cluster
neural network
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811456459.7A
Other languages
Chinese (zh)
Inventor
成冠举
高鹏
谢国彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811456459.7A
Publication of CN109657087A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of artificial intelligence and provides a batch data annotation method, apparatus, and storage medium. The method includes: performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors; clustering the low-dimensional vectors of the data set to divide the images into different categories; displaying the clustered data through a visualization tool, selecting data of the same category, and applying a unified batch label to the data of that category. By dividing the data in the data set into different categories through clustering, data of the same category can be labeled in batches, reducing the annotation workload. The use of unsupervised clustering makes the method widely applicable. Furthermore, after clustering, neural network recognition is used to further identify the features of the images within a category, so that the common feature of the data in that category can be determined and a unified batch label can be applied to the category according to the recognition result.

Description

Batch data annotation method, apparatus, and computer-readable storage medium
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a batch data annotation method, apparatus, and computer-readable storage medium.
Background technique
With the rapid development of multimedia and Internet information technology, hundreds of millions of new images appear on the Internet every day. Compared with text, images describe information more intuitively and more accurately, so in today's era of information explosion, images allow users to obtain the information they need more conveniently, quickly, and accurately. Images have gradually become one of the most important channels for disseminating information. In intelligent recognition technology in particular, a large number of labeled pictures are needed as a training data set to train a model and improve its recognition capability. At present, however, image data is usually annotated by manually observing the data, distinguishing the data categories, and labeling every picture one by one with a tool. The disadvantages of this approach are that data cannot be annotated in batches, so annotation efficiency is low when the amount of data is large, and that much of the annotation work requires professionals to perform the classification, which makes annotation costly.
Summary of the invention
To solve the above technical problems, the present invention provides a batch data annotation method applied to an electronic device: performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors; clustering the low-dimensional vectors of the data set to divide the images into different categories; displaying the clustered data through a visualization tool, selecting data of the same category, and applying a unified batch label to the data of that category.
Preferably, the high-dimensional data is converted into low-dimensional data by means of nonlinear dimensionality reduction.
Preferably, the nonlinear dimensionality reduction uses the following formulas.
The high-dimensional space is represented as:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
where $p_{j|i}$ denotes the conditional probability in the high-dimensional space, $x_i$ and $x_j$ denote points in the high-dimensional space, and $\sigma_i$ denotes the variance of the Gaussian distribution centered on $x_i$.
The low-dimensional space is represented as:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
where $q_{ij}$ denotes the probability in the low-dimensional space, and $y_i$ and $y_j$ denote the points of the high-dimensional space mapped into the low-dimensional space.
The cost function is:
$$C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
where the KL divergence represents the error between P and Q at each point, P denotes the conditional probability distribution in the high-dimensional space, and Q denotes the probability distribution in the low-dimensional space.
The gradient is:
$$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$
Preferably, for categories whose image features cannot be determined, classification labels are assigned using numbers.
Preferably, after clustering, a neural network is also used to identify at least one image in a category in order to speed up annotation, comprising the following steps: collecting a training data set, which contains a large number of already labeled pictures used as training data; training a neural network model with the training data to improve its recognition capability; after clustering is complete, using the neural network model to identify one image in each category so as to obtain the features of that image; and uniformly labeling all images in each clustered category according to the features of that image.
Preferably, after clustering, a neural network is also used to identify at least two images in each category in order to speed up annotation, comprising the following steps: collecting a training data set, which contains a large number of already labeled pictures used as training data; training a neural network model with the training data to improve its recognition capability; after clustering is complete, using the neural network model to identify at least two images in each category and extracting the features of the images; if the extracted features have no common feature, further identifying the next image and continuing to search for a common feature among the features of the identified images until the features of the identified images share a common feature; and then using the common feature as the reference name of the category and labeling the entire category.
Preferably, the color histograms of the images are used as the feature vectors that form the data set.
The present invention also provides an electronic device comprising a memory and a processor. A batch data annotation program is stored in the memory, and when the program is executed by the processor it implements the following steps: performing dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors; clustering the low-dimensional vectors of the data set to divide the images into different categories; displaying the clustered data through a visualization tool, selecting data of the same category, and applying a unified batch label to the data of that category.
Preferably, after clustering, a neural network is also used to identify at least one image in a category in order to speed up annotation, comprising the following steps: collecting a training data set, which contains a large number of already labeled pictures used as training data; training a neural network model with the training data to improve its recognition capability; after clustering is complete, using the neural network model to identify one image in each category so as to obtain the features of that image; and uniformly labeling all images in each clustered category according to the features of that image.
The present invention also provides a computer-readable storage medium storing a computer program that includes program instructions. When the program instructions are executed by a processor, they implement the batch data annotation method described above.
By dividing the data in the data set into different categories through clustering, the present invention can label data of the same category in batches, reducing the annotation workload. For data whose features cannot be determined, numbered labels can be used directly, so no professional is needed to identify them. The use of unsupervised clustering makes the method widely applicable. Furthermore, after clustering, neural network recognition is used to further identify the features of the images within a category, so that the common feature of the data in that category can be determined and a unified batch label can be applied to the category according to the recognition result.
Detailed description of the invention
The above features and technical advantages of the present invention will become clearer and easier to understand from the following description of embodiments in conjunction with the accompanying drawings.
Fig. 1 is a schematic flowchart of the batch data annotation method of an embodiment of the present invention;
Fig. 2 is a schematic flowchart of batch data annotation using neural network recognition after clustering according to one embodiment of the present invention;
Fig. 3 is a schematic flowchart of batch data annotation using neural network recognition after clustering according to another embodiment of the present invention;
Fig. 4 is a schematic diagram of the hardware architecture of the electronic device of an embodiment of the present invention;
Fig. 5 is a block diagram of the batch data annotation program of an embodiment of the present invention.
Specific embodiment
Embodiments of the batch data annotation method, apparatus, and computer-readable storage medium of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will recognize that the described embodiments can be modified in various ways, or combinations thereof, without departing from the spirit and scope of the present invention. Therefore, the drawings and description are illustrative in nature and are not intended to limit the scope of the claims. In addition, in this specification, the drawings are not drawn to scale, and identical reference numerals denote identical parts.
Fig. 1 is a schematic flowchart of the batch data annotation method provided by an embodiment of the present invention. The method includes the following steps:
Step S10: perform dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors. The color histogram of each image can be used as its feature vector, so that a low-dimensional representation vector of the high-dimensional data can be obtained for each image. Once the high-dimensional data has been reduced to two- or three-dimensional data it can be clustered, and the effect of the clustering can be displayed.
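As an illustration of the feature-extraction step, a color histogram can be computed directly from raw pixel values. This is a minimal pure-Python sketch under stated assumptions (the bin count and the pixel representation are choices made here for illustration; a real system would read images with an imaging library):

```python
def color_histogram(pixels, bins_per_channel=4):
    """Map a list of (r, g, b) pixels (0-255 each) to a normalized
    histogram vector of length bins_per_channel**3."""
    n_bins = bins_per_channel ** 3
    hist = [0.0] * n_bins
    step = 256 // bins_per_channel  # width of each quantization bin
    for r, g, b in pixels:
        # flatten the 3-D bin coordinates into one index
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1.0
    total = float(len(pixels)) or 1.0
    return [h / total for h in hist]  # normalize so entries sum to 1

# A tiny 2x2 "image": two reddish pixels, two bluish pixels.
vec = color_histogram([(250, 0, 0), (240, 10, 5), (0, 0, 250), (5, 5, 240)])
```

Each image thus becomes a fixed-length vector, regardless of its resolution, which is what the subsequent dimensionality-reduction and clustering steps operate on.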
Step S30: cluster the low-dimensional vectors of the data set to divide the images into different categories. For example, if some images show cars, some show mountains, some show cats, and some show elephants, the clustering algorithm groups the images showing the same features together, e.g. the car images into one cluster and the cat images into another.
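The patent does not mandate a particular clustering algorithm; as one hedged sketch, assuming k-means over the 2-D reduced vectors, the grouping step could look like this:

```python
import math

def kmeans(points, k, iters=50):
    """Minimal k-means on 2-D points; returns a cluster id per point."""
    centers = list(points[:k])  # naive seeding with the first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

# Two well-separated blobs of low-dimensional image vectors.
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = kmeans(pts, k=2)
```

The resulting label per vector is the "category" that the later batch-labeling step operates on.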
Step S50: display the clustered data through a visualization tool (such as a display), select the data of each category, and apply a unified batch label to the data of each category. For example, if a region in the visualization tool consists entirely of "cat" data, the whole region is selected and labeled "cat", so that every item in this batch of data is labeled "cat", achieving rapid batch annotation.
In an alternative embodiment, the high-dimensional data is converted into low-dimensional data by means of nonlinear dimensionality reduction.
Further, the high-dimensional data is regarded as points in a high-dimensional space, which are then mapped into a low-dimensional space with a manifold method that preserves spatial distances: points that are close together in the high-dimensional space remain close after being mapped into the low-dimensional space, and points that are far apart remain far apart. The nonlinear dimensionality reduction uses the following formulas.
The high-dimensional space is represented as:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
where $p_{j|i}$ denotes the conditional probability in the high-dimensional space, $x_i$ and $x_j$ denote points in the high-dimensional space, and $\sigma_i$ denotes the variance of the Gaussian distribution centered on $x_i$.
The low-dimensional space is represented as:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
where $q_{ij}$ denotes the probability in the low-dimensional space, and $y_i$ and $y_j$ denote the points of the high-dimensional space mapped into the low-dimensional space.
The cost function is:
$$C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
where the KL divergence represents the error between P and Q at each point, P denotes the conditional probability distribution in the high-dimensional space, and Q denotes the probability distribution in the low-dimensional space.
The gradient is:
$$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$
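These quantities can be sketched numerically. The following minimal pure-Python version (assuming the t-SNE formulation with a fixed, untuned $\sigma$, and omitting the gradient-descent loop that a production implementation would run) computes the high- and low-dimensional probabilities and the KL-divergence cost:

```python
import math

def p_conditional(xs, i, sigma=1.0):
    """Conditional probabilities p_{j|i} in the high-dimensional space."""
    w = [math.exp(-math.dist(xs[i], xs[j]) ** 2 / (2 * sigma ** 2))
         if j != i else 0.0 for j in range(len(xs))]
    s = sum(w)
    return [v / s for v in w]

def p_joint(xs, sigma=1.0):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / 2n."""
    n = len(xs)
    cond = [p_conditional(xs, i, sigma) for i in range(n)]
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]

def q_joint(ys):
    """Low-dimensional similarities q_ij, normalized over all pairs k != l."""
    n = len(ys)
    w = [[1.0 / (1.0 + math.dist(ys[i], ys[j]) ** 2) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    s = sum(sum(row) for row in w)
    return [[v / s for v in row] for row in w]

def kl_cost(P, Q):
    """KL(P || Q) over all point pairs -- the cost the gradient minimizes."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q)
               for p, q in zip(prow, qrow) if p > 0 and q > 0)

xs = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]  # high-dimensional points
ys = [(0.0, 0.0), (0.1, 0.0), (4.0, 4.0)]                  # candidate 2-D embedding
P, Q = p_joint(xs), q_joint(ys)
cost = kl_cost(P, Q)
```

Nearby high-dimensional points receive most of the probability mass, so an embedding that keeps them nearby keeps the KL cost low, which is exactly the distance-preservation property the method relies on.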
In an alternative embodiment, numbers are used to label categories whose image features cannot be determined. For example, medical pictures require professionals to identify their categories; such categories can instead be labeled with numbers such as "1, 2, 3..." or "a, b, c...".
In an alternative embodiment, note that manually observing each category after clustering in order to label it would still take staff time. Therefore, after clustering, a neural network can be used to further identify at least one image in a category. Since the pictures have already been grouped by certain features by the clustering algorithm, the subsequent neural network recognition of the pictures is faster. For example, if a category contains pictures of cats of different appearances, then after the neural network recognizes at least one of the images (for example, 3 images), the category is automatically confirmed to be "cat", and an annotation tool automatically labels all pictures in the category as "cat". Using a neural network after clustering to further identify at least one image in each clustered category therefore allows images to be labeled more quickly.
Specifically, after clustering, a neural network is also used to identify at least one image in a category to speed up annotation, as shown in Fig. 2, comprising the following steps:
Step S100: collect a training data set containing a large number of already labeled pictures to be used as training data;
Step S200: train a neural network model with the training data to improve its recognition capability;
Step S300: after clustering is complete, use the neural network model to identify one image in each category so as to obtain the features of that image;
Step S400: uniformly label all images in each clustered category according to the features of that image.
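Steps S300 and S400 can be sketched as a label-propagation loop. Here `fake_recognize` is a hypothetical stand-in for the trained neural network model (the patent does not specify a network architecture), and clusters are simple lists of image names:

```python
def propagate_labels(clusters, recognize):
    """For each cluster, recognize one representative image and apply
    its predicted label to every image in that cluster."""
    annotations = {}
    for cluster_id, images in clusters.items():
        label = recognize(images[0])   # identify one image per category
        for img in images:
            annotations[img] = label   # uniform batch label for the category
    return annotations

# Hypothetical stand-in for a trained classifier.
def fake_recognize(image_name):
    return "cat" if "cat" in image_name else "car"

clusters = {0: ["cat_01.jpg", "cat_02.jpg"],
            1: ["car_07.jpg", "car_09.jpg"]}
ann = propagate_labels(clusters, fake_recognize)
```

Only one model inference per cluster is needed, which is the source of the claimed speed-up over labeling every picture individually.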
In an alternative embodiment, after clustering, a neural network is also used to identify at least two images in each category to speed up annotation, as shown in Fig. 3, comprising the following steps:
Step S100: collect a training data set containing a large number of already labeled pictures to be used as training data;
Step S200: train a neural network model with the training data to improve its recognition capability;
Step S500: after clustering is complete, use the neural network model to identify at least two images in each category and extract the features of the images; if the extracted features have no common feature, further identify the next image and continue to search for a common feature among the features of the identified images until the features of the identified images share a common feature;
Step S600: use the common feature as the reference name of the category and label the entire category.
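Steps S500 and S600 can be sketched as a search for a shared feature across successively more images of a cluster. The per-image feature sets and the `features.get` extractor below are hypothetical stand-ins for whatever the neural network's feature extractor would emit:

```python
def find_common_feature(images, extract_features):
    """Identify images one by one until the features seen so far share a
    common element; return it as the category's reference name."""
    common = extract_features(images[0])
    for img in images[1:]:
        common &= extract_features(img)   # keep only shared features
        if len(common) == 1:              # an unambiguous common feature
            break
    return next(iter(common)) if common else None

# Hypothetical per-image feature sets a recognizer might emit.
features = {
    "a.jpg": {"cat", "indoor"},
    "b.jpg": {"cat", "outdoor"},
    "c.jpg": {"cat", "grass"},
}
name = find_common_feature(["a.jpg", "b.jpg", "c.jpg"], features.get)
labels = {img: name for img in features}  # label the entire category
```

The loop stops as soon as the intersection is unambiguous, so in the best case only two images per cluster need to pass through the network.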
Fig. 4 is a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present invention. In this embodiment, the electronic device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, tablet computer, laptop, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a cluster composed of multiple servers). As shown in Fig. 4, the electronic device 2 includes at least, but is not limited to, a memory 21, a processor 22, and a network interface 23 that can communicate with each other through a system bus. The memory 21 includes at least one type of computer-readable storage medium, which includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as the hard disk or memory of the electronic device 2.
In other embodiments, the memory 21 may also be an external storage device of the electronic device 2, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the electronic device 2. Of course, the memory 21 may also include both the internal storage unit of the electronic device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the batch data annotation program code. In addition, the memory 21 can also be used to temporarily store various data that has been output or will be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2, for example to perform control and processing related to data interaction or communication with the electronic device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the batch data annotation program.
The network interface 23 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 to a push platform through a network and to establish a data transmission channel and communication connection between the electronic device 2 and the push platform. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
Optionally, the electronic device 2 may also include a user interface, which may include an input unit such as a keyboard, a speech input device such as a microphone or other equipment with a speech recognition function, and a speech output device such as a speaker or earphones.
Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 2 may also include a display, which may also be called a display screen or display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (OLED) display, etc. The display is used to show the information processed in the electronic device 2 and to display a visual user interface.
It should be pointed out that Fig. 4 shows only the electronic device 2 with components 21-23; it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
The memory 21, which contains a readable storage medium, may include an operating system, a batch data annotation program 50, and the like. The steps implemented when the processor 22 executes the batch data annotation program 50 in the memory 21 correspond one-to-one with the steps of the batch data annotation method, and to avoid repetition they are not described in detail here again. Each module is briefly described below.
In this embodiment, the batch data annotation program stored in the memory 21 can be divided into one or more program modules, which are stored in the memory 21 and can be executed by one or more processors (in this embodiment, the processor 22) to carry out the present invention. For example, Fig. 5 shows a module diagram of the batch data annotation program. In this embodiment, the batch data annotation program 50 can be divided into a dimensionality reduction module 501, a clustering module 502, a category selection module 503, and a batch labeling module 504. A program module in the sense of the present invention refers to a series of computer program instruction segments capable of completing a specific function, and is more suitable than a whole program for describing the execution process of the batch data annotation program in the electronic device 2. The specific functions of the program modules are introduced in the following description.
The dimensionality reduction module 501 is used to perform dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors. The color histogram of each image can be used as its feature vector, so that a low-dimensional representation vector of the high-dimensional data can be obtained for each image. Once the high-dimensional data has been reduced to two- or three-dimensional data it can be clustered, and the effect of the clustering can be displayed.
The clustering module 502 is used to cluster the low-dimensional vectors of the data set to divide the images into different categories. For example, if some images show cars, some show mountains, some show cats, and some show elephants, the clustering algorithm groups the images showing the same features together, e.g. the car images into one cluster and the cat images into another.
The category selection module 503 is used to select the data of each category after clustering, and the batch labeling module 504 applies a unified batch label to the data of each category. For example, if a region in the visualization tool consists entirely of "cat" data, the whole region is selected and labeled "cat", so that every item in this batch of data is labeled "cat", achieving rapid batch annotation.
In an alternative embodiment, the dimensionality reduction module 501 converts the high-dimensional data into low-dimensional data by means of nonlinear dimensionality reduction.
Further, the dimensionality reduction module 501 regards the high-dimensional data as points in a high-dimensional space and then maps them into a low-dimensional space with a manifold method that preserves spatial distances: points that are close together in the high-dimensional space remain close after being mapped into the low-dimensional space, and points that are far apart remain far apart. The nonlinear dimensionality reduction uses the following formulas.
The high-dimensional space is represented as:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
where $p_{j|i}$ denotes the conditional probability in the high-dimensional space, $x_i$ and $x_j$ denote points in the high-dimensional space, and $\sigma_i$ denotes the variance of the Gaussian distribution centered on $x_i$.
The low-dimensional space is represented as:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
where $q_{ij}$ denotes the probability in the low-dimensional space, and $y_i$ and $y_j$ denote the points of the high-dimensional space mapped into the low-dimensional space.
The cost function is:
$$C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
where the KL divergence represents the error between P and Q at each point, P denotes the conditional probability distribution in the high-dimensional space, and Q denotes the probability distribution in the low-dimensional space.
The gradient is:
$$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$
In an alternative embodiment, the batch labeling module 504 uses numbers to label categories whose image features cannot be determined. For example, medical pictures require professionals to identify their categories; such categories can instead be labeled with numbers such as "1, 2, 3..." or "a, b, c...".
An alternative embodiment further includes a feature extraction module 505. Manually observing each category after clustering in order to label it would still take staff time. Therefore, after clustering, a neural network can be used to further identify at least one image in a category. Since the pictures have already been grouped by certain features by the clustering algorithm, the subsequent neural network recognition of the pictures is faster. For example, if a category contains pictures of cats of different appearances, then after the neural network recognizes at least one of the images (for example, 3 images), the category is automatically confirmed to be "cat", and an annotation tool automatically labels all pictures in the category as "cat". By using a neural network after clustering, the feature extraction module 505 further identifies at least one image in each clustered category, allowing images to be labeled more quickly.
Specifically, after clustering, the feature extraction module 505 also uses a neural network to identify at least one image in a category to speed up annotation, comprising the following steps:
Step S100: collect a training data set containing a large number of already labeled pictures to be used as training data;
Step S200: train a neural network model with the training data to improve its recognition capability;
Step S300: after clustering is complete, the feature extraction module 505 uses the neural network model to identify one image in each category so as to obtain the features of that image;
Step S400: uniformly label all images in each clustered category according to the features of that image.
In an alternative embodiment, after clustering, the feature extraction module 505 also uses a neural network to identify at least two images in each category to speed up annotation, comprising the following steps:
Step S100: collect a training data set containing a large number of already labeled pictures to be used as training data;
Step S200: train a neural network model with the training data to improve its recognition capability;
Step S500: after clustering is complete, the feature extraction module 505 uses the neural network model to identify at least two images in each category and extract the features of the images; if the extracted features have no common feature, it further identifies the next image and continues to search for a common feature among the features of the identified images until the features of the identified images share a common feature;
Step S600: use the common feature as the reference name of the category and label the entire category.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium, which may be any one of, or any combination of, a hard disk, multimedia card, SD card, flash card, SMC, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), or USB memory. The computer-readable storage medium contains a batch data annotation program, and the batch data annotation program 50 implements the following operations when executed by the processor 22:
Step S10: perform dimensionality reduction on a data set containing multiple images to obtain a data set composed of low-dimensional vectors. The color histogram of each image can be used as its feature vector, so that a low-dimensional representation vector of the high-dimensional data can be obtained for each image. Once the high-dimensional data has been reduced to two- or three-dimensional data it can be clustered, and the effect of the clustering can be displayed.
Step S30: cluster the low-dimensional vectors of the data set to divide the images into different categories. For example, if some images show cars, some show mountains, some show cats, and some show elephants, the clustering algorithm groups the images showing the same features together, e.g. the car images into one cluster and the cat images into another.
Step S50: display the clustered data through a visualization tool (such as a display), select the data of each category, and apply a unified batch label to the data of each category. For example, if a region in the visualization tool consists entirely of "cat" data, the whole region is selected and labeled "cat", so that every item in this batch of data is labeled "cat", achieving rapid batch annotation.
The specific embodiments of the computer readable storage medium of the present invention are substantially the same as those of the above batch data labeling method and electronic device 2, and details are not described herein.
The above is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A batch data labeling method applied to an electronic device, characterized in that the method comprises:
performing dimension-reduction processing on a data set including multiple images to obtain a data set composed of low-dimensional vectors;
clustering the low-dimensional vectors of the data set to divide the images into different classifications;
displaying the clustered data through a visualization tool, selecting data of the same category, and performing unified batch labeling on the data of the same category.
2. The batch data labeling method according to claim 1, characterized in that high-dimensional data are converted into low-dimensional data by means of nonlinear dimension reduction.
3. The batch data labeling method according to claim 1, characterized in that:
the nonlinear dimension reduction uses the following formulas:
the high-dimensional space is expressed as:
p_{j|i} = exp(-||x_i - x_j||^2 / (2σ_i^2)) / Σ_{k≠i} exp(-||x_i - x_k||^2 / (2σ_i^2))
wherein p_{j|i} denotes the conditional probability in the high-dimensional space;
x_i and x_j denote points in the high-dimensional space;
σ_i denotes the variance of the Gaussian distribution centered on x_i;
the low-dimensional space is expressed as:
q_{ij} = (1 + ||y_i - y_j||^2)^{-1} / Σ_{k≠l} (1 + ||y_k - y_l||^2)^{-1}
wherein q_{ij} denotes the conditional probability in the low-dimensional space;
y_i and y_j denote the points of the high-dimensional space mapped into the low-dimensional space;
the cost function is:
C = KL(P||Q) = Σ_i Σ_j p_{ij} log(p_{ij} / q_{ij})
wherein the KL divergence represents the error between P and Q at a point;
P denotes the conditional probability distribution in the high-dimensional space, and Q denotes the conditional probability distribution in the low-dimensional space;
the gradient is:
∂C/∂y_i = 4 Σ_j (p_{ij} - q_{ij}) (y_i - y_j) (1 + ||y_i - y_j||^2)^{-1}
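The quantities defined in this claim (the low-dimensional similarities q, the KL cost, and its gradient) can be transcribed directly into NumPy. This is an illustrative sketch, not code from the patent, and it verifies the analytic gradient against a finite-difference estimate.

```python
# NumPy transcription of the t-SNE-style quantities in claim 3, with a
# finite-difference check of the gradient formula.
import numpy as np

def q_matrix(Y):
    """Low-dimensional similarities q_ij, normalized over all pairs k != l."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + d2)
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum()

def kl_cost(P, Y):
    """C = sum_{i != j} p_ij * log(p_ij / q_ij)."""
    Q = np.maximum(q_matrix(Y), 1e-12)
    mask = ~np.eye(len(P), dtype=bool)
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1."""
    Q = q_matrix(Y)
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + d2)
    G = np.zeros_like(Y)
    for i in range(len(Y)):
        diff = Y[i] - Y
        G[i] = 4.0 * np.sum(((P[i] - Q[i]) * inv[i])[:, None] * diff, axis=0)
    return G

rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n)); np.fill_diagonal(P, 0.0)
P = P + P.T; P /= P.sum()          # symmetric joint probabilities
Y = rng.normal(size=(n, 2))

G = gradient(P, Y)
# Central-difference check of one coordinate of the analytic gradient.
eps = 1e-5
Yp = Y.copy(); Yp[0, 0] += eps
Ym = Y.copy(); Ym[0, 0] -= eps
numeric = (kl_cost(P, Yp) - kl_cost(P, Ym)) / (2 * eps)
print(abs(numeric - G[0, 0]) < 1e-5)
```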
4. The batch data labeling method according to claim 1, characterized in that for a classification whose image features cannot be determined, the classification is labeled with a number.
5. The batch data labeling method according to claim 1, characterized in that:
after clustering, at least one image in a certain classification is further identified using a neural network to accelerate labeling, comprising the following steps:
collecting a training data set, the training data set including a large number of labeled pictures as training data;
training a neural network model with the training data to improve the recognition capability of the neural network model;
after clustering is completed, identifying one image in each classification using the neural network model to obtain the features in this image;
uniformly labeling all images in each classification after clustering according to the features of this image.
6. The batch data labeling method according to claim 1, characterized in that:
after clustering, at least two images in each classification are further identified using a neural network to accelerate labeling, comprising the following steps:
collecting a training data set, the training data set including a large number of labeled pictures as training data;
training a neural network model with the training data to improve the recognition capability of the neural network model;
after clustering is completed, identifying at least two images in each classification using the neural network model and extracting the features in the images; if the acquired features have no common characteristic, further identifying the next image and continuing to search for a common characteristic among the features of the identified images until the features of the identified images have a common characteristic; then labeling the entire classification with the common characteristic as the label name of the classification.
7. The batch data labeling method according to claim 1, characterized in that the color histogram of each image is used as its feature vector to form the data set.
8. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory storing a batch data labeling program which, when executed by the processor, realizes the following steps:
performing dimension-reduction processing on a data set including multiple images to obtain a data set composed of low-dimensional vectors;
clustering the low-dimensional vectors of the data set to divide the images into different classifications;
displaying the clustered data through a visualization tool, selecting data of the same category, and performing unified batch labeling on the data of the same category.
9. The electronic device according to claim 8, characterized in that:
after clustering, at least one image in a certain classification is further identified using a neural network to accelerate labeling, comprising the following steps:
collecting a training data set, the training data set including a large number of labeled pictures as training data;
training a neural network model with the training data to improve the recognition capability of the neural network model;
after clustering is completed, identifying one image in each classification using the neural network model to obtain the features in this image;
uniformly labeling all images in each classification after clustering according to the features of this image.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, the computer program including program instructions which, when executed by a processor, implement the batch data labeling method according to any one of claims 1-7.
CN201811456459.7A 2018-11-30 2018-11-30 A kind of batch data mask method, device and computer readable storage medium Pending CN109657087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811456459.7A CN109657087A (en) 2018-11-30 2018-11-30 A kind of batch data mask method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811456459.7A CN109657087A (en) 2018-11-30 2018-11-30 A kind of batch data mask method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN109657087A true CN109657087A (en) 2019-04-19

Family

ID=66112260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811456459.7A Pending CN109657087A (en) 2018-11-30 2018-11-30 A kind of batch data mask method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109657087A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110795A (en) * 2019-05-10 2019-08-09 厦门美图之家科技有限公司 Image classification method and device
CN110264443A (en) * 2019-05-20 2019-09-20 平安科技(深圳)有限公司 Eye fundus image lesion mask method, device and medium based on feature visualization
CN110516093A (en) * 2019-08-28 2019-11-29 深圳力维智联技术有限公司 Picture mask method, device and equipment
CN110781920A (en) * 2019-09-24 2020-02-11 同济大学 Method for identifying semantic information of cloud components of indoor scenic spots
CN111639705A (en) * 2020-05-29 2020-09-08 江苏云从曦和人工智能有限公司 Batch picture marking method, system, machine readable medium and equipment
CN113127668A (en) * 2019-12-31 2021-07-16 深圳云天励飞技术有限公司 Data annotation method and related product
CN113793306A (en) * 2021-08-23 2021-12-14 上海派影医疗科技有限公司 Breast pathology image identification and detection method and system based on fragment processing
CN113918747A (en) * 2021-09-29 2022-01-11 北京三快在线科技有限公司 Image data cleaning method, device, equipment and storage medium
CN114116965A (en) * 2021-11-08 2022-03-01 竹间智能科技(上海)有限公司 Opinion extraction method for comment text and electronic equipment
CN114220111A (en) * 2021-12-22 2022-03-22 深圳市伊登软件有限公司 Image-text batch identification method and system based on cloud platform

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929894A (en) * 2011-08-12 2013-02-13 中国人民解放军总参谋部第五十七研究所 Online clustering visualization method of text
CN105701502A (en) * 2016-01-06 2016-06-22 福州大学 Image automatic marking method based on Monte Carlo data balance
CN107004141A (en) * 2017-03-03 2017-08-01 香港应用科技研究院有限公司 To the efficient mark of large sample group
CN107622104A (en) * 2017-09-11 2018-01-23 中央民族大学 A kind of character image identification mask method and system
CN107644235A (en) * 2017-10-24 2018-01-30 广西师范大学 Image automatic annotation method based on semi-supervised learning
CN107944454A (en) * 2017-11-08 2018-04-20 国网电力科学研究院武汉南瑞有限责任公司 A kind of semanteme marking method based on machine learning for substation
CN108182443A (en) * 2016-12-08 2018-06-19 广东精点数据科技股份有限公司 A kind of image automatic annotation method and device based on decision tree


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110795B (en) * 2019-05-10 2021-04-20 厦门美图之家科技有限公司 Image classification method and device
CN110110795A (en) * 2019-05-10 2019-08-09 厦门美图之家科技有限公司 Image classification method and device
CN110264443A (en) * 2019-05-20 2019-09-20 平安科技(深圳)有限公司 Eye fundus image lesion mask method, device and medium based on feature visualization
CN110264443B (en) * 2019-05-20 2024-04-16 平安科技(深圳)有限公司 Fundus image lesion labeling method, device and medium based on feature visualization
CN110516093A (en) * 2019-08-28 2019-11-29 深圳力维智联技术有限公司 Picture mask method, device and equipment
CN110781920A (en) * 2019-09-24 2020-02-11 同济大学 Method for identifying semantic information of cloud components of indoor scenic spots
CN113127668A (en) * 2019-12-31 2021-07-16 深圳云天励飞技术有限公司 Data annotation method and related product
CN111639705B (en) * 2020-05-29 2021-06-29 江苏云从曦和人工智能有限公司 Batch picture marking method, system, machine readable medium and equipment
CN111639705A (en) * 2020-05-29 2020-09-08 江苏云从曦和人工智能有限公司 Batch picture marking method, system, machine readable medium and equipment
CN113793306A (en) * 2021-08-23 2021-12-14 上海派影医疗科技有限公司 Breast pathology image identification and detection method and system based on fragment processing
CN113918747A (en) * 2021-09-29 2022-01-11 北京三快在线科技有限公司 Image data cleaning method, device, equipment and storage medium
CN114116965A (en) * 2021-11-08 2022-03-01 竹间智能科技(上海)有限公司 Opinion extraction method for comment text and electronic equipment
CN114220111A (en) * 2021-12-22 2022-03-22 深圳市伊登软件有限公司 Image-text batch identification method and system based on cloud platform

Similar Documents

Publication Publication Date Title
CN109657087A (en) A kind of batch data mask method, device and computer readable storage medium
Yi et al. Scene text recognition in mobile applications by character descriptor and structure configuration
WO2019218514A1 (en) Method for extracting webpage target information, device, and storage medium
CN112699775B (en) Certificate identification method, device, equipment and storage medium based on deep learning
CN108228757A (en) Image search method and device, electronic equipment, storage medium, program
WO2017088537A1 (en) Component classification method and apparatus
CN112528616B (en) Service form generation method and device, electronic equipment and computer storage medium
CN113626607B (en) Abnormal work order identification method and device, electronic equipment and readable storage medium
CN113221918B (en) Target detection method, training method and device of target detection model
CN111967437A (en) Text recognition method, device, equipment and storage medium
CN114416939A (en) Intelligent question and answer method, device, equipment and storage medium
CN109933502A (en) Electronic device, the processing method of user operation records and storage medium
CN116662839A (en) Associated big data cluster analysis method and device based on multidimensional intelligent acquisition
CN113704474B (en) Bank outlet equipment operation guide generation method, device, equipment and storage medium
CN113435308B (en) Text multi-label classification method, device, equipment and storage medium
CN118115293A (en) Identity document verification method, device, equipment and storage medium thereof
CN113344125A (en) Long text matching identification method and device, electronic equipment and storage medium
CN117390173A (en) Massive resume screening method for semantic similarity matching
CN116738044A (en) Book recommendation method, device and equipment for realizing college library based on individuation
CN114077682B (en) Intelligent recognition matching processing method and system for image retrieval and storage medium
CN116340537A (en) Character relation extraction method and device, electronic equipment and storage medium
CN114943306A (en) Intention classification method, device, equipment and storage medium
CN114996386A (en) Business role identification method, device, equipment and storage medium
CN113723114A (en) Semantic analysis method, device and equipment based on multi-intent recognition and storage medium
CN112580505A (en) Method and device for identifying opening and closing states of network points, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190419