CN110490250A - Method and device for acquiring an artificial intelligence training set - Google Patents

Method and device for acquiring an artificial intelligence training set

Info

Publication number
CN110490250A
Authority
CN
China
Prior art keywords
image
sample image
cryptographic hash
training set
artificial intelligence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910757684.2A
Other languages
Chinese (zh)
Inventor
洪旭东
唐诗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd
Priority to CN201910757684.2A
Publication of CN110490250A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a method and device for acquiring an artificial intelligence training set, relating to the field of artificial intelligence. The method includes: acquiring a sample image; computing a hash value of the sample image from the sample image; performing a coincidence judgment based on a preset database and the hash value, to obtain a judgment result indicating whether the database contains stored data corresponding to the hash value; and, when the judgment result is that the database does not contain stored data corresponding to the hash value, adding the sample image to a preset training set to obtain the artificial intelligence training set. Implementing this embodiment avoids manual participation, thereby saving cost and reducing resource consumption.

Description

Method and device for acquiring an artificial intelligence training set
Technical field
The present application relates to the field of artificial intelligence, and in particular to a method and device for acquiring an artificial intelligence training set.
Background
As society develops, artificial intelligence technology continues to advance, and image recognition technology based on artificial intelligence is likewise continually updated and iterated. In practice, however, it has been found that image recognition presupposes the selection of a suitable training set, so the choice of training set strongly affects recognition performance. Current training set screening methods mostly rely on manual observation, so high cost and high resource consumption are among the difficulties in obtaining a training set.
Summary of the invention
The embodiments of the present application aim to provide a method and device for acquiring an artificial intelligence training set that avoid manual participation, thereby saving cost and reducing resource consumption.
An embodiment of the present application provides a method for acquiring an artificial intelligence training set, the method comprising:
acquiring a sample image;
computing a hash value of the sample image from the sample image;
performing a coincidence judgment based on a preset database and the hash value, to obtain a judgment result indicating whether the database contains stored data corresponding to the hash value;
when the judgment result is that the database does not contain stored data corresponding to the hash value, adding the sample image to a preset training set to obtain the artificial intelligence training set.
In the above implementation, the method first acquires the sample images contained in a sample set. The sample set may be a video set or an image set; whatever its type, sample images can always be extracted from it, and there may be several of them. After a sample image is acquired, digital information is extracted from it according to a preset algorithm, and a hash value corresponding to the sample image is computed from that information. Once each sample image has a corresponding hash value, the method checks whether the preset database already contains the hash value of the newly acquired sample image. If it does not, the judgment result is that the database contains no image corresponding to that hash value, and the hash value and its corresponding image are stored in the database, so that the artificial intelligence training set resides in that database. The database contains a large number of sample images and their corresponding hash values. Implementing this embodiment thus lets computer equipment acquire sample images and decide whether to store them, which avoids manual participation, saves cost, and reduces resource consumption. Introducing hash values into the judgment also improves the precision with which sample images are acquired and, to some extent, the computational efficiency of the computer, because comparing hash values is simpler than traditional image comparison. In addition, the method can keep enlarging the artificial intelligence training set during use, so it applies equally to first-time use and to repeated use and is therefore broadly applicable.
Further, the step of acquiring a sample image comprises:
acquiring an initial set corresponding to a preset category;
performing image extraction on the initial set to obtain the sample image.
In the above implementation, sample images are acquired from an initial set, which is any collection of information in big data that carries image information; the initial set may include image sets, video sets, and the like. Initial sets come in many categories, and the initial sets of different categories differ from one another. Because this step fixes the category of the initial set, the method can be applied to any one of those categories, the extraction of sample images within the category is more targeted, and the resulting artificial intelligence training set is in turn more targeted and more accurate.
Further, the step of computing a hash value of the sample image from the sample image comprises:
acquiring image data of the sample image;
computing the hash value of the sample image from the image data.
In the above implementation, the method obtains image data from the sample image, that is, it digitizes the image so that computer equipment can carry out the corresponding calculation. This makes the method simpler to use and, for data processing that cannot practically be done by hand, more efficient. The method then computes the hash value of the sample image from the image data. It therefore does not merely use the image data but processes it further, a secondary processing of the sample image that yields an identifier for the sample image, namely its hash value. By constraining how the hash value is obtained, the method improves the way hash values are acquired and mapped, which can raise both the efficiency and the precision of hash acquisition. Because the identifier of the sample image is obtained through this secondary calculation, the precision of the identifier (the hash value) improves, and its usefulness after acquisition is ensured.
Further, the step of computing the hash value of the sample image from the image data comprises:
performing resolution scaling and grayscale processing on the image data to obtain preprocessed data;
computing an average gray value from the preprocessed data;
traversing each sub-datum in the preprocessed data and comparing it with the average gray value to obtain comparison results;
computing the hash value of the sample image from the comparison results.
In the above implementation, the method further specifies the preparation for hash acquisition, namely the resolution scaling and grayscale processing applied to the image data. These two steps give hash acquisition a more precise working space, so the precision of the hash value improves while excessive redundancy in it is avoided. The resolution scaling and grayscale processing may also be carried out differently depending on the category described above; the method therefore admits various reasonable variations, all of which fall within this description. Note that what resolution scaling and grayscale processing produce is preprocessed data, not the hash value; this stage only refines the data. Afterwards, computing the average gray value and applying the traversal comparison yields the component sub-values of the hash value, and a definite hash value can be obtained by arranging and combining those sub-values or by calculating further from them. The hash value is thus obtained through a fixed and highly targeted data processing procedure. Specifying this procedure improves the precision of hash acquisition and makes the hash value suited to a preset, appropriate application space, which improves the effectiveness of the artificial intelligence training set.
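The scaling, grayscale, averaging, and comparison steps above resemble what is commonly called an average hash. A minimal pure-Python sketch follows. The 8x8 target size, nearest-neighbor scaling, integer BT.601 luma weights, the bit ordering, and all function names are assumptions chosen for illustration; the text does not fix any of them.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255
RgbImage = List[List[Pixel]]  # rows of pixels


def to_gray(pixel: Pixel) -> int:
    """Grayscale value using integer BT.601 luma weights (an assumed choice)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def resize_nearest(img: RgbImage, width: int, height: int) -> RgbImage:
    """Resolution scaling by nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def average_hash(img: RgbImage, size: int = 8) -> int:
    """Scale, convert to gray, then emit one bit per pixel: 1 if gray >= mean."""
    small = resize_nearest(img, size, size)
    gray = [to_gray(p) for row in small for p in row]  # preprocessed data
    mean = sum(gray) / len(gray)                       # average gray value
    bits = 0
    for g in gray:                                     # traverse and compare
        bits = (bits << 1) | (1 if g >= mean else 0)
    return bits                                        # combined sub-values
```

Because the image is scaled down before hashing, two copies of the same picture at different resolutions tend to produce the same or a very close hash, which is what makes the value usable for duplicate detection.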
Further, after the step of performing a coincidence judgment based on the preset database and the hash value, to obtain a judgment result indicating whether the database contains stored data corresponding to the hash value, the method further comprises:
when the judgment result is that the database contains stored data corresponding to the hash value, deleting the sample image corresponding to the hash value.
In the above implementation, the method screens out and deletes duplicate sample images, all on the basis of their hash values. Using hash values in this way filters the sample images, which keeps the artificial intelligence training set clean and improves its effectiveness in use.
Further, the data structure of the database is a red-black tree.
In the above implementation, when the data structure of the database is a red-black tree, hash value lookups run faster, so using a red-black tree improves the efficiency of acquiring the artificial intelligence training set.
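A red-black tree keeps its keys ordered and supports lookup and insertion in O(log n) time. Python's standard library has no red-black tree, so the sketch below substitutes a sorted list with binary search purely to illustrate the ordered-lookup interface. It is not a red-black tree: lookup is O(log n), but insertion is O(n) because list elements must shift. The class name `SortedHashIndex` is hypothetical.

```python
import bisect
from typing import List


class SortedHashIndex:
    """Ordered hash index; a stand-in for a red-black tree, not an implementation of one."""

    def __init__(self) -> None:
        self._keys: List[int] = []  # kept in ascending order at all times

    def contains(self, h: int) -> bool:
        """Binary search for h."""
        i = bisect.bisect_left(self._keys, h)
        return i < len(self._keys) and self._keys[i] == h

    def add(self, h: int) -> bool:
        """Insert h if absent; return True when it was newly added."""
        i = bisect.bisect_left(self._keys, h)
        if i < len(self._keys) and self._keys[i] == h:
            return False
        self._keys.insert(i, h)  # O(n) here; a real red-black tree would be O(log n)
        return True
```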
Further, the step of adding the sample image to the preset training set to obtain the artificial intelligence training set, when the judgment result is that the database does not contain stored data corresponding to the hash value, comprises:
when the judgment result is that the database does not contain stored data corresponding to the hash value, performing a solid-color determination based on the sample image and a preset solid-color determination standard, to obtain a determination result indicating whether the sample image is a solid-color image;
when the determination result is that the sample image is not a solid-color image, adding the sample image to the preset training set to obtain the artificial intelligence training set.
In the above implementation, when the database contains no hash value corresponding to the sample image, the method goes on to perform a solid-color determination on the sample image. Implementing this embodiment therefore filters out images that are solid-colored or have obvious color problems, leaving non-solid-color images, which are stored to obtain the artificial intelligence training set. Because acquisition of the training set rests on two key steps, hash comparison and solid-color determination, its precision is greatly improved.
Further, the step of performing a solid-color determination based on the sample image and the preset solid-color determination standard, to obtain a determination result indicating whether the sample image is a solid-color image, comprises:
acquiring multiple block images of the sample image;
performing a solid-color determination on the multiple block images using the solid-color determination standard, to obtain multiple determination sub-results;
performing the solid-color determination of the sample image based on the multiple determination sub-results and the solid-color determination standard, to obtain the determination result indicating whether the sample image is a solid-color image.
In the above implementation, the method restricts the solid-color determination to a block-wise, multi-stage procedure. Determining the blocks individually first makes the determination result for the whole image more reliable, which improves the reliability of the artificial intelligence training set in use.
Further, the solid-color determination standard includes a solid-color standard interval, and the step of performing a solid-color determination on the multiple block images using the solid-color determination standard, to obtain multiple determination sub-results, comprises:
acquiring color mean values in one-to-one correspondence with the multiple block images;
determining, one by one, whether each color mean value falls within the solid-color standard interval, to obtain the multiple determination sub-results.
In the above implementation, the solid-color determination is based on the color mean value of each block image. After the color mean value is obtained, it is checked against the solid-color standard interval in the solid-color determination standard; when the color mean value falls within the interval, the determination sub-result is that the block image is a solid-color block. Using such sub-results in the determination improves the precision of the determination result, and the more sub-results there are, the more precise the solid-color determination becomes.
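The block-wise determination can be sketched as below. Several details are assumptions, since the text leaves them open: the image is taken to be a single-channel grayscale grid, the blocks form a 2x2 layout by default, and the whole image counts as solid-colored only when every block's mean falls inside the interval. The function names are hypothetical.

```python
from typing import List, Tuple

GrayImage = List[List[int]]  # rows of gray values, each in 0..255


def split_blocks(img: GrayImage, rows: int, cols: int) -> List[GrayImage]:
    """Cut the image into rows x cols blocks (image must be at least that large)."""
    h, w = len(img), len(img[0])
    blocks = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            blocks.append([row[x0:x1] for row in img[y0:y1]])
    return blocks


def block_mean(block: GrayImage) -> float:
    """Color mean value of one block."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def is_solid_color(
    img: GrayImage,
    interval: Tuple[float, float],  # assumed form of the solid-color standard interval
    rows: int = 2,
    cols: int = 2,
) -> bool:
    """Assumed aggregation rule: solid only if every block mean lies in the interval."""
    low, high = interval
    sub_results = [
        low <= block_mean(b) <= high for b in split_blocks(img, rows, cols)
    ]
    return all(sub_results)
```

Checking blocks rather than the whole image guards against pictures whose global mean happens to fall in the interval even though parts of the image carry real content.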
Further, after the step of performing a solid-color determination based on the sample image and the preset solid-color determination standard, to obtain a determination result indicating whether the sample image is a solid-color image, the method further comprises:
when the determination result is that the sample image is a solid-color image, deleting the sample image corresponding to the hash value.
In the above implementation, the method also filters out solid-color images while determining them, which keeps the sample images clean and improves their effectiveness.
A second aspect of the embodiments of the present application provides a device for acquiring an artificial intelligence training set, the device comprising:
an acquiring unit, configured to acquire a sample image;
a computing unit, configured to compute a hash value of the sample image from the sample image;
a judging unit, configured to perform a coincidence judgment based on a preset database and the hash value, to obtain a judgment result indicating whether the database contains stored data corresponding to the hash value;
an adding unit, configured to add the sample image to a preset training set to obtain the artificial intelligence training set when the judgment result is that the database does not contain stored data corresponding to the hash value.
In the above implementation, the device acquires the sample image through the acquiring unit, processes it through the computing unit to obtain its hash value, and then checks through the judging unit whether the hash value is a duplicate, so that the sample image is added to the preset training set only when its hash value is not duplicated and the content of the artificial intelligence training set does not repeat. A non-duplicated hash value ensures that the sample image does not duplicate any image stored in the database. The device can therefore run independently to acquire the artificial intelligence training set, avoiding manual participation and thereby saving cost and reducing resource consumption.
Further, the adding unit comprises:
a determining subunit, configured to perform, when the judgment result is that the database does not contain stored data corresponding to the hash value, a solid-color determination based on the sample image and a preset solid-color determination standard, to obtain a determination result indicating whether the sample image is a solid-color image;
an adding subunit, configured to add the sample image to the preset training set to obtain the artificial intelligence training set when the determination result is that the sample image is not a solid-color image.
In the above implementation, because the adding unit includes the determining subunit, it also has a solid-color determination function. This function performs a solid-color determination on the sample image, giving the sample image a second check and thereby ensuring both the precision with which the artificial intelligence training set is acquired and its effectiveness in later use.
A third aspect of the embodiments of the present application provides an electronic device comprising a memory and a processor, the memory being configured to store a computer program, and the processor running the computer program to cause the electronic device to execute the method for acquiring an artificial intelligence training set according to any one of the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing computer program instructions which, when read and run by a processor, execute the method for acquiring an artificial intelligence training set according to any one of the first aspect of the embodiments of the present application.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for acquiring an artificial intelligence training set provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of another method for acquiring an artificial intelligence training set provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for acquiring an artificial intelligence training set provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another device for acquiring an artificial intelligence training set provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be further defined or explained in subsequent drawings. In the description of the present application, the terms "first", "second", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Embodiment 1
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a method for acquiring an artificial intelligence training set provided by an embodiment of the present application. The method can be applied to the acquisition of a training set before any artificial intelligence training. This training set differs slightly from the training set used in the training process of an artificial intelligence model, because the latter should be a part of the former. The method comprises:
S101, acquire a sample image.
In this embodiment, a sample image is a part of big data; this embodiment places no limitation on the content of the sample image.
In this embodiment, the source of the sample image is likewise not limited; the sample image may be obtained through any of various channels or by any of various means. For example, the source may be video data from a pulled stream or stored image data, and this embodiment places no limitation on it.
In this embodiment, the means of acquisition is also not limited.
As an optional implementation, the step of acquiring a sample image may comprise:
acquiring several initial images;
performing deduplication on the several initial images to obtain multiple candidate images;
determining the sample image among the multiple candidate images.
Implementing this embodiment performs a preliminary, rough duplicate check while sample images are being acquired, which reduces the workload of subsequent steps. Because this rough check consumes far fewer resources than the subsequent steps, the method also saves resources and improves efficiency in use. In addition, the method confines sample image extraction to a defined range, which increases the volume of sample image data while preserving its precision and so widens the applicability of the sample images.
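The text does not say how the rough duplicate check is done. One cheap possibility, shown here only as an assumption, is to drop byte-identical files by comparing SHA-256 digests of the raw data before any image decoding takes place. The function name `rough_dedup` is hypothetical.

```python
import hashlib
from typing import Iterable, List


def rough_dedup(raw_images: Iterable[bytes]) -> List[bytes]:
    """Keep the first occurrence of each byte-identical image, in input order."""
    seen = set()
    unique: List[bytes] = []
    for data in raw_images:
        digest = hashlib.sha256(data).digest()  # exact-match fingerprint
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique
```

A check of this kind only catches exact copies; re-encoded or resized copies would pass through and be left to the later hash-based judgment.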
S102, compute a hash value of the sample image from the sample image.
In this embodiment, although the sample image is an image, the term refers to all the information the sample image contains, including its image data, and the calculation above is based on that image data.
In this embodiment, the image data includes various data such as the size, color, and proportions of the image; this embodiment places no limitation on it.
In this embodiment, the hash value corresponds to the sample image. Note that when there are multiple sample images, their hash values should differ, because sample images with the same hash value are similar images.
In this embodiment, the hash value may be calculated from any of the image data described above; preferably, it is calculated from the color data alone. Calculating the hash value from color data leaves ample room for distinguishing images, and it avoids the redundancy that fusing too many kinds of data would produce, so basing the calculation on color data is both accurate and concise.
S103, perform a coincidence judgment based on a preset database and the hash value, to obtain a judgment result indicating whether the database contains stored data corresponding to the hash value.
In this embodiment, the preset database contains at least multiple hash values; specifically, it contains the hash values corresponding to all images in the stored artificial intelligence training set.
In this embodiment, the preset database may also contain the images in historical artificial intelligence training sets.
In this embodiment, the coincidence judgment may take two forms. The first judges whether the hash value of the sample data exists in the preset database and, if so, yields a result of "exists". The second judges whether the tolerance interval of the sample data's hash value coincides with data in the preset database or, alternatively, starting from the sample data's hash value, whether the error tolerance interval of any data in the preset database covers the sample data.
In this embodiment, the coincidence judgment serves to deduplicate, and does so more accurately.
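The two forms of coincidence judgment can be illustrated with integer hash values. Measuring the "tolerance interval" as a maximum Hamming distance between bit patterns is an assumption made for this sketch; the text does not define how the interval is measured. The names `hamming_distance` and `coincides` are hypothetical.

```python
from typing import Collection


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def coincides(
    sample_hash: int,
    stored_hashes: Collection[int],
    max_distance: int = 0,  # 0 means exact match only
) -> bool:
    """True if the database holds an equal hash, or one within the tolerance."""
    if max_distance == 0:
        return sample_hash in stored_hashes  # first form: exact membership
    return any(                              # second form: tolerance check
        hamming_distance(sample_hash, h) <= max_distance for h in stored_hashes
    )
```

The exact form is a single lookup, while the tolerance form as written scans every stored hash, so it trades speed for the ability to catch near-duplicates.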
In this embodiment, the stored data corresponding to the hash value may be image data or hash value data; this embodiment places no limitation on it.
In this embodiment, the correspondence between the database and the hash value may be that a hash value database corresponds to the sample image hash value, or that an image database corresponds to the sample image hash value.
In this embodiment, the judgment result indicates whether the database contains stored data corresponding to the hash value.
In this embodiment, the judgment result may specifically indicate whether the database contains a hash value identical to the sample image's hash value. Implementing this embodiment makes the judgment of the sample image's hash value more accurate and avoids omissions.
S104, when determining result not include storing data corresponding with cryptographic Hash in database, sample image is added Default training set is added to, artificial intelligence training set is obtained.
In the present embodiment, above-mentioned storing data can be image or cryptographic Hash, to not making any limit in this present embodiment It is fixed.
In the present embodiment, when determine result show in database not with the same or similar number of sample image cryptographic Hash According to when, sample image is added to default training set, obtains artificial intelligence training set.
As an alternative embodiment, determining result not include storage number corresponding with cryptographic Hash in database According to when, the step of being added to default training set, obtain artificial intelligence training set sample image includes:
When determining result not include storing data corresponding with cryptographic Hash in database, cryptographic Hash is stored to data Sample image is added to default training set, obtains artificial intelligence training set by library.
Implement this embodiment, the cryptographic Hash of sample image can be stored, so that subsequent judgement is richer Richness, more accurately.
As an alternative embodiment, determining result not include storage number corresponding with cryptographic Hash in database According to when, after the step of being added to default training set, obtain artificial intelligence training set sample image, this method further include:
Artificial intelligence training set is determined as default training set.
Implement this embodiment, mode is extracted in the self-loopa that artificial intelligence training set may be implemented, so that Artificial intelligence training set uses opportunity more wide in range, and then improves the universality that artificial intelligence training set obtains.
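The optional implementations of S104 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function name add_if_new, the use of a Python set as the database, and the file names and hash strings are all hypothetical.

```python
def add_if_new(sample, sample_hash, hash_db, training_set):
    """Add sample to training_set unless its hash is already stored.

    Returns True when the sample was added, False when it was a duplicate.
    """
    if sample_hash in hash_db:       # judgment result: stored data exists
        return False
    hash_db.add(sample_hash)         # store the hash for later judgments
    training_set.append(sample)      # extend the preset training set
    return True


# The list produced by one round is reused as the preset set of the next round
# (the "self-cycling" mode); names and hashes below are made up.
hash_db = set()
training_set = []
for name, h in [("a.png", "ff00"), ("b.png", "0f0f"), ("a_copy.png", "ff00")]:
    add_if_new(name, h, hash_db, training_set)
```

Because the same list keeps growing across calls, the output of one round is directly the preset training set of the next.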
In the present embodiment, the method for obtaining the artificial intelligence training set is applied to a computer device that has computing capability; the present embodiment places no limitation on the physical form of the device. For example, the computer device may be a computer or a server device. The computer device serves as the execution subject of the method for obtaining the artificial intelligence training set.
It can be seen that, by implementing the method for obtaining an artificial intelligence training set described in Fig. 1, the sample images included in a sample set can first be obtained. The sample set may be a video set or an image set; whatever its type, sample images can always be extracted from it, and there may be several sample images. After the sample images are obtained, digital information is extracted from each sample image according to a preset algorithm, and a hash value is calculated and generated from the extracted digital information, yielding the hash value corresponding to the sample image. Once every sample image has a corresponding hash value, it can be judged whether the preset database includes the hash value of a newly obtained sample image. If the database does not include that hash value, the judgment result is that the database does not include an image corresponding to the hash value, and the hash value and its corresponding image are then stored in the database, so that the artificial intelligence training set resides in the database. The database includes a large number of sample images and their corresponding hash values. Implementing this embodiment therefore allows a computer device to obtain sample images and decide whether to store them, which avoids manual participation, saves cost, and reduces resource consumption. In addition, introducing the hash value improves the precision with which sample images are obtained and, to some extent, the operating efficiency of the computer, because comparing hash values is simpler than traditional image comparison. Finally, the method can keep adding to the content of the artificial intelligence training set during use, so it has high universality whether it is used for the first time or repeatedly.
Embodiment 2
Referring to Fig. 2, Fig. 2 is a schematic flowchart of another method for obtaining an artificial intelligence training set provided by an embodiment of the present application. The flowchart described in Fig. 2 is an improvement on the flowchart of the method described in Fig. 1. The pure color determination standard includes a pure color standard interval and a pure color determination threshold ratio, and the method for obtaining the artificial intelligence training set includes:
S201, obtain an initial set corresponding to a preset category.
In the present embodiment, a category can be understood as a type or any distinguishing designation.
In the present embodiment, a category may refer to an image category or an image source category.
In the present embodiment, the preset category is a category set in advance that corresponds to the artificial intelligence training set; for example, it is the category that the artificial intelligence training set requires, or in other words the current category of the artificial intelligence training set.
In the present embodiment, the preset category may be determined according to the preset training set; the present embodiment places no limitation on this.
In the present embodiment, the initial set is an initial image set, video set, or the like; the present embodiment places no limitation on this.
S202, perform image extraction processing on the initial set to obtain sample images.
In the present embodiment, the type of extraction processing corresponds to the type of the initial set; for example, if the initial set is video, the extraction processing is video image extraction.
In the present embodiment, the extraction processing may include a certain amount of calculation and verification. For example, the calculation may be noise reduction or compression to reduce storage space, and the verification may include blur verification; the present embodiment places no limitation on this.
For example, step S201 may prepare the initial set in advance: video data is obtained by pulling streams for different categories (such as games, celebrity shows, and so on), where the video data may be in FLASH VIDEO format, a streaming media format. After the video data is obtained, step S202 may use ffmpeg (Fast Forward Mpeg) to transcode the video data and obtain sample images corresponding to the category. In practice, repeating this operation yields sample images of different categories.
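The frame extraction in S202 might look like the sketch below, which calls the ffmpeg command line from Python. The source does not give the actual command; the sampling rate of one frame per second, the PNG output pattern, and the function names are assumptions made for illustration.

```python
import subprocess


def build_frame_extract_cmd(video_path, out_dir, fps=1):
    """Return an ffmpeg argument list that writes `fps` frames per second as PNG files."""
    return [
        "ffmpeg",
        "-i", video_path,               # input video, e.g. a pulled FLV stream
        "-vf", f"fps={fps}",            # sample frames at a fixed rate (assumed)
        f"{out_dir}/frame_%06d.png",    # numbered output images
    ]


def extract_frames(video_path, out_dir, fps=1):
    """Run ffmpeg; needs the ffmpeg binary on PATH and an existing out_dir."""
    subprocess.run(build_frame_extract_cmd(video_path, out_dir, fps), check=True)
```

Keeping the command construction separate from its execution makes it possible to inspect or log the command for each category before any video is processed.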
S203, obtain the image data of the sample image.
In the present embodiment, the image data includes color data, size data, ratio data, and the like of the sample image; the present embodiment places no limitation on this.
In the present embodiment, the image data may be three-primary-color data (that is, RGB data).
For example, step S203 may use ffmpeg to decode the sample image into RGB data. This can be done by invoking the ffmpeg command line directly (a command must be invoked for each item processed; this is the more precise approach), or by using the ffmpeg library to obtain the decoded RGB data (this is the faster approach).
S204, perform resolution scaling and grayscale processing on the image data to obtain preprocessed data.
In the present embodiment, resolution scaling of the image data can be understood as scaling the image according to a fixed scale; in practice, this processing operates on data.
In the present embodiment, grayscale processing of the image data can be understood as converting a color image into a grayscale image; both the processing and its result are in data form.
In the present embodiment, the preprocessed data is the result of processing the image data.
For example, after the RGB data of the sample image is obtained, the source RGB data is scaled, according to the size of the sample image, into a grayscale image of 8*8 resolution (that is, the preprocessed data).
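The preprocessing of S204 can be sketched in plain Python as follows. The source does not specify the grayscale formula or the scaling method; the integer luma weights (299, 587, 114) and the region-averaging downscale used here are assumptions, and the names to_gray and shrink are hypothetical.

```python
def to_gray(rgb_pixels):
    """Convert rows of (r, g, b) tuples to rows of integer gray values in [0, 255]."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb_pixels]


def shrink(gray, size=8):
    """Downscale a gray image to size x size by averaging each source region."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        y0 = i * h // size
        y1 = max((i + 1) * h // size, y0 + 1)   # at least one source row
        row = []
        for j in range(size):
            x0 = j * w // size
            x1 = max((j + 1) * w // size, x0 + 1)   # at least one source column
            region = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(region) // len(region))
        out.append(row)
    return out


# A 16x16 all-white image becomes an 8x8 gray image (the preprocessed data).
image = [[(255, 255, 255)] * 16 for _ in range(16)]
small = shrink(to_gray(image))
```

Converting to gray before shrinking means the averaging runs on one channel instead of three; doing the two steps in the other order would also fit the description.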
S205, perform a calculation based on the preprocessed data to obtain an average gray value.
In the present embodiment, the preprocessed data may be grayscale image data, and the calculation is performed on that data; specifically, the calculation may be a mean calculation.
In the present embodiment, the preprocessed data is grayscale image data, that is, it includes the gray level of each pixel in the processed image, and the average gray value is calculated from these gray levels.
In the present embodiment, the average gray value is the mean of the gray values in an image and represents the average gray level of the image corresponding to the preprocessed data.
In the present embodiment, the preprocessed data has undergone resolution scaling and grayscale processing, so the average gray value is likewise a result based on that processing. Therefore, although the average gray value corresponds to the sample image, it cannot be obtained directly from the sample image.
For example, after the grayscale image of 8*8 resolution is obtained, its average gray value is calculated.
S206, traverse each piece of sub-data in the preprocessed data and compare it with the average gray value to obtain comparison results.
In the present embodiment, the preprocessed data includes sub-data corresponding to pixels, and the number of pixels is the number that results from the resolution scaling.
In the present embodiment, the traversal comparison may be a traversal-style magnitude comparison; specifically, it may determine the magnitude relationship between the average gray value and the gray value of each pixel to obtain the comparison results.
For example, each pixel of the grayscale image of 8*8 resolution is traversed and compared with the average gray value: if the pixel value is greater than or equal to the mean, it is recorded as 1; otherwise it is recorded as 0. This yields 64 comparison results, that is, 64 digits.
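Steps S205 and S206 can be illustrated with the short sketch below. The function names and the example image are hypothetical; the comparison rule (1 when the pixel is greater than or equal to the mean, otherwise 0) follows the text above.

```python
def average_gray(gray8):
    """Mean gray value over all pixels of the preprocessed image."""
    pixels = [p for row in gray8 for p in row]
    return sum(pixels) / len(pixels)


def compare_bits(gray8):
    """Traverse pixels row by row: 1 if pixel >= mean, otherwise 0."""
    mean = average_gray(gray8)
    return [1 if p >= mean else 0 for row in gray8 for p in row]


# Example: left half dark, right half bright.
gray8 = [[0] * 4 + [200] * 4 for _ in range(8)]
bits = compare_bits(gray8)
```

Note that a perfectly uniform image yields 64 ones under this rule, since every pixel equals the mean.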
S207, perform a calculation based on the comparison results to obtain the hash value of the sample image.
In the present embodiment, each comparison result may be 0 or 1, indicating the magnitude relationship with the average gray value.
In the present embodiment, the comparison results may be further calculated to simplify or refine them; the present embodiment places no limitation on this.
For example, after the 64 comparison results are obtained, the hash value of the sample image (which may be called the fingerprint of the sample image) can be calculated from them. The 64 digits may be combined side by side to form the hash value, and the order of the 64 digits must be fixed. The 64 digits may also be further transcoded or calculated to obtain the hash value of the sample image; for example, the binary data 11111111 can be converted into the hexadecimal data ff.
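The transcoding mentioned in S207 can be sketched as below. The function name bits_to_hex is hypothetical, and treating the first comparison result as the most significant bit is an assumption; the source only requires that the order be fixed.

```python
def bits_to_hex(bits):
    """Pack a fixed-order list of 0/1 values into a hexadecimal string."""
    value = 0
    for bit in bits:                 # first bit becomes the most significant
        value = (value << 1) | bit
    width = (len(bits) + 3) // 4     # one hex digit per 4 bits, zero padded
    return format(value, f"0{width}x")


fingerprint = bits_to_hex([0, 0, 0, 0, 1, 1, 1, 1] * 8)   # 64 comparison results
```

Zero padding keeps every 64-bit fingerprint at 16 hexadecimal characters, so fingerprints can be compared as plain strings.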
S208, perform a coincidence judgment based on the preset database and the hash value to obtain a judgment result indicating whether the database includes stored data corresponding to the hash value, where the data structure of the database is a red-black tree.
In the present embodiment, the red-black tree data structure allows the corresponding lookup to complete in O(log n) time, which improves the speed of data lookup.
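The Python standard library has no red-black tree, so the sketch below substitutes a sorted list with binary search. It is not a red-black tree: lookup is O(log n) as in the text, but insertion into a Python list is O(n). The class name SortedHashIndex and the example keys are hypothetical.

```python
import bisect


class SortedHashIndex:
    """Ordered set of hash values; a stand-in for the red-black tree database."""

    def __init__(self):
        self._keys = []

    def contains(self, key):
        """Binary search for key: O(log n) comparisons."""
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def add(self, key):
        """Insert key in sorted position; return False if it was already stored."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return False
        self._keys.insert(i, key)
        return True


index = SortedHashIndex()
index.add("0f0f0f0f0f0f0f0f")
duplicate_found = index.contains("0f0f0f0f0f0f0f0f")
```

In languages whose standard ordered containers are commonly built on red-black trees, such a container could be used directly instead of this substitute.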
In the present embodiment, the preset database includes at least a plurality of hash values; specifically, it includes the hash values corresponding to all images in the stored artificial intelligence training set.
In the present embodiment, the preset database may also include images from historical artificial intelligence training sets.
In the present embodiment, the coincidence judgment may be performed in two ways. The first is to judge whether the hash value of the sample data exists in the preset database and, if so, to obtain a result indicating that it exists. The second is to judge whether the tolerance interval of the hash value of the sample data coincides with data in the preset database, or alternatively, based on the hash value of the sample data, to judge whether the error tolerance interval of any data in the preset database includes the sample data.
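The two ways of judging coincidence can be sketched as follows. The source does not say how the tolerance interval is measured; this sketch assumes it is a maximum Hamming distance between 64-bit fingerprints held as integers, and the names hamming_distance, find_match, and max_distance are hypothetical. With max_distance=0 the function reduces to the exact-match way.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def find_match(sample_hash, stored_hashes, max_distance=0):
    """Return the first stored hash within max_distance bits, or None."""
    for stored in stored_hashes:
        if hamming_distance(sample_hash, stored) <= max_distance:
            return stored
    return None


stored = [0x0F0F0F0F0F0F0F0F]
exact = find_match(0x0F0F0F0F0F0F0F0E, stored)                  # exact match only
near = find_match(0x0F0F0F0F0F0F0F0E, stored, max_distance=2)   # tolerant match
```

This linear scan is for clarity only; it does not use the ordered lookup that the red-black tree provides for exact matches.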
In the present embodiment, the coincidence judgment serves to remove duplicates, and does so more accurately.
In the present embodiment, the stored data corresponding to the hash value may be image data or hash value data; the present embodiment places no limitation on this.
In the present embodiment, the correspondence between the database and the hash value may include a hash value database corresponding to the hash value of the sample image, or an image database corresponding to the hash value of the sample image.
In the present embodiment, the judgment result indicates whether the database includes stored data corresponding to the hash value.
In the present embodiment, the judgment result may specifically indicate whether the database contains a hash value identical to the hash value of the sample image. Implementing this embodiment makes the judgment of the sample image's hash value more accurate and avoids omissions.
S209, when the judgment result is that the database includes stored data corresponding to the hash value, delete the sample image corresponding to the hash value.
In the present embodiment, deleting the sample image corresponding to the hash value means that an identical or similar image already exists.
S210, when the judgment result is that the database does not include stored data corresponding to the hash value, obtain a plurality of block images of the sample image.
In the present embodiment, when the judgment result is that the database does not include stored data corresponding to the hash value, the sample image is considered unrelated to the content of the database; block-based analysis is therefore used to determine whether the sample image is an all-black or all-white useless image.
For example, suppose the resolution of the original image is M*N, where M and N are the width and height of the video respectively (in pixels). The original image is decomposed into non-overlapping block images of size k*k.
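The block decomposition of S210 can be sketched as below. The source does not say what happens when M or N is not a multiple of k; this sketch assumes trailing partial rows and columns are dropped. The function name split_blocks is hypothetical.

```python
def split_blocks(pixels, k):
    """Split an image (rows of pixels) into non-overlapping k x k blocks.

    Trailing rows or columns that do not fill a whole block are ignored.
    """
    h, w = len(pixels), len(pixels[0])
    blocks = []
    for top in range(0, h - k + 1, k):
        for left in range(0, w - k + 1, k):
            blocks.append([row[left:left + k] for row in pixels[top:top + k]])
    return blocks


# A 6-wide, 4-high image (M=6, N=4) with k=2 gives 3 * 2 = 6 blocks.
image = [[(y, x) for x in range(6)] for y in range(4)]
blocks = split_blocks(image, 2)
```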
S211, obtain color means in one-to-one correspondence with the plurality of block images.
In the present embodiment, the color mean may be an image mean calculated from the three RGB primary colors, each of which takes values in [0, 255].
For example, the color mean is a single value that also lies in [0, 255].
S212, determine one by one whether each color mean falls within the pure color standard interval, obtaining a plurality of determination sub-results.
In the present embodiment, the pure color standard interval may be [0, 3] and [252, 255].
In the present embodiment, a determination sub-result may be 1 or 0, where 1 indicates that the color mean of the block image falls within the pure color standard interval, that is, the block is pure color, and 0 indicates that the color mean of the block image does not fall within the pure color standard interval, that is, the block is not pure color.
In the present embodiment, the number of determination sub-results is the same as the number of block images.
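Steps S211 and S212 can be sketched as follows. The source states only that the color mean is a single value in [0, 255] calculated from the RGB primaries; averaging every channel of every pixel in the block is an assumption, and the names PURE_INTERVALS, color_mean, and is_pure are hypothetical.

```python
PURE_INTERVALS = [(0, 3), (252, 255)]   # pure color standard interval from the text


def color_mean(block):
    """Mean over all R, G and B values of all pixels in the block."""
    channels = [c for row in block for pixel in row for c in pixel]
    return sum(channels) / len(channels)


def is_pure(mean, intervals=PURE_INTERVALS):
    """Return 1 if the mean lies in a pure color interval, otherwise 0."""
    return 1 if any(lo <= mean <= hi for lo, hi in intervals) else 0


black_block = [[(0, 0, 0)] * 2 for _ in range(2)]
gray_block = [[(128, 128, 128)] * 2 for _ in range(2)]
sub_results = [is_pure(color_mean(b)) for b in (black_block, gray_block)]
```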
S213, perform a pure color determination of the sample image based on the plurality of determination sub-results and the pure color determination standard, obtaining a determination result indicating whether the sample image is a pure color image.
As an optional implementation, the step of performing a pure color determination of the sample image based on the plurality of determination sub-results and the pure color determination standard, obtaining a determination result indicating whether the sample image is a pure color image, includes:
Based on the plurality of determination sub-results, obtain the number of block images and the numbers of the individual pure color categories;
Perform the pure color determination of the sample image based on the number of block images, the numbers of the individual pure color categories, and the pure color determination threshold ratio, obtaining a determination result indicating whether the sample image is a pure color image.
In the present embodiment, the pure color determination threshold ratio may be 3/4; specifically, when the ratio of the number of determination sub-results whose color mean falls within the pure color standard interval to the total number of determination sub-results is greater than 3/4, the sample image is determined to be a pure color image.
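The 3/4 threshold rule can be sketched as below. This illustrates only the overall ratio test described in the preceding paragraph; counting each pure color category (for example black and white) separately is not modeled. The function name is_pure_image is hypothetical.

```python
from fractions import Fraction


def is_pure_image(sub_results, threshold=Fraction(3, 4)):
    """True when the share of pure blocks is strictly greater than the threshold."""
    if not sub_results:
        raise ValueError("at least one determination sub-result is required")
    pure_blocks = sum(sub_results)                 # sub-results are 1 or 0
    return Fraction(pure_blocks, len(sub_results)) > threshold


on_threshold = is_pure_image([1, 1, 1, 0])         # exactly 3/4, not greater
above_threshold = is_pure_image([1, 1, 1, 1, 0])   # 4/5 is greater than 3/4
```

Exact fractions avoid floating point rounding when the ratio lands exactly on the threshold.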
In the present embodiment, comparisons for pure black, pure white, and half-black half-white may also be performed; these are not described further in the present embodiment.
S214, when the determination result is that the sample data is a pure color image, delete the sample image corresponding to the hash value.
In the present embodiment, deleting the sample image corresponding to the hash value means that the image is a pure black or pure white image, that is, the sample image does not qualify for inclusion in the training set and is therefore deleted.
S215, when the determination result is that the sample data is not a pure color image, add the sample image to the preset training set to obtain the artificial intelligence training set.
Implementing the embodiment of steps S201 to S215 improves the acquisition precision, accuracy, and operation speed of the artificial intelligence training set.
In the present embodiment, the method for obtaining the artificial intelligence training set is applied to a computer device that has computing capability; the present embodiment places no limitation on the physical form of the device. For example, the computer device may be a computer or a server device. The computer device serves as the execution subject of the method for obtaining the artificial intelligence training set.
It can be seen that, by implementing the method for obtaining an artificial intelligence training set described in Fig. 2, a computer device can obtain sample images and decide whether to store them, which avoids manual participation, saves cost, and reduces resource consumption. Introducing the hash value improves the precision with which sample images are obtained and, to some extent, the operating efficiency of the computer, because comparing hash values is simpler than traditional image comparison. The method can also keep adding to the content of the artificial intelligence training set during use, so it has high universality whether it is used for the first time or repeatedly. When used, the method can be applied to any of multiple categories, so that the extraction of sample images under a category is more targeted and the resulting artificial intelligence training set is more targeted and accurate. The method can also improve the precision with which the hash value is obtained, which safeguards the effectiveness of the hash value in later use. It can confine the hash value to a preset, suitable use space, which improves the validity of the artificial intelligence training set; this use space is determined by the requirements of the resolution scaling and grayscale processing. It can screen sample images by means of the hash value, which keeps the artificial intelligence training set pure and improves its validity in use. It can use the red-black tree data structure to improve the efficiency of obtaining the artificial intelligence training set. It bases the acquisition of the artificial intelligence training set on two key steps, hash value comparison and pure color determination, which greatly improves acquisition precision. It further improves the reliability of identifying pure color images on top of the pure color determination, which improves the reliability of the artificial intelligence training set in use. Finally, it can filter out images that are duplicates or that do not meet the pure color standard.
Embodiment 3
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an apparatus for obtaining an artificial intelligence training set provided by an embodiment of the present application. The apparatus for obtaining the artificial intelligence training set includes:
Acquiring unit 310, configured to obtain a sample image;
Computing unit 320, configured to perform a calculation based on the sample image to obtain the hash value of the sample image;
Judging unit 330, configured to perform a coincidence judgment based on the preset database and the hash value, obtaining a judgment result indicating whether the database includes stored data corresponding to the hash value;
Adding unit 340, configured to add the sample image to the preset training set to obtain the artificial intelligence training set when the judgment result is that the database does not include stored data corresponding to the hash value.
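The cooperation of units 310 to 340 can be pictured with the sketch below. It is only one hypothetical arrangement: the class name TrainingSetAcquirer, the callable parameters, and the in-memory set used as the database are assumptions, and storing each new hash value follows the optional implementation of Embodiment 1 so that later duplicates can be detected.

```python
class TrainingSetAcquirer:
    """Toy composition of the four units described above."""

    def __init__(self, acquire, compute_hash, preset_training_set=None):
        self.acquire = acquire                # acquiring unit 310
        self.compute_hash = compute_hash      # computing unit 320
        self.database = set()                 # preset database (assumed in memory)
        self.training_set = list(preset_training_set or [])

    def is_stored(self, hash_value):
        """Judging unit 330: coincidence judgment by exact match."""
        return hash_value in self.database

    def run(self):
        """Adding unit 340: add every sample whose hash is not yet stored."""
        for sample in self.acquire():
            hash_value = self.compute_hash(sample)
            if not self.is_stored(hash_value):
                self.database.add(hash_value)
                self.training_set.append(sample)
        return self.training_set


# Stand-in callables: file names as samples, lower-cased name as the "hash".
acquirer = TrainingSetAcquirer(lambda: ["A.png", "b.png", "a.png"], str.lower)
result = acquirer.run()
```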
In the present embodiment, what sample image referred to is a part in big data, for content included by sample image It is not limited in any way in the present embodiment.
In the present embodiment, the acquisition source of sample image is not limited in any way equally, it can be seen that, sample image can be with It is one of by all means or accessed by one of various means.For example, the acquisition of sample image Source can be the video data for drawing stream, and be also possible to the image data of storage, to not being limited in any way in this present embodiment.
In the present embodiment, obtaining means the present embodiment that acquiring unit 310 uses also is not limited in any way.
In the present embodiment, though sample image is image, it be that sample image includes that sample image, which specifically refers to, All information, wherein just including the image data of sample image, and above-mentioned calculating basis is the image data.
In the present embodiment, image data includes a variety of data of the size of image, color, ratio etc., to this this implementation It is not limited in any way in example.
In the present embodiment, cryptographic Hash is corresponding with sample image, wherein it should be noted that when sample image is in the presence of more When width, then corresponding cryptographic Hash should be also different, the reason for this is that the corresponding sample image of identical cryptographic Hash belongs to phase Like image.
In the present embodiment, the calculating basis of cryptographic Hash can be with any one of above-mentioned image data data, it is preferred that The calculating of cryptographic Hash is only based on the color data in image data.Wherein, using calculating Hash based on color data The method of value has biggish image identification space, while the redundancy condition generated in order to avoid excessively multidata fusion, breathes out The basis of the calculation method of uncommon value can play accurate and succinct double effects using color data.
In the present embodiment, multiple cryptographic Hash are included at least in preset database, specifically, preset database includes The corresponding cryptographic Hash of all images in stored artificial intelligence training set.
It in the present embodiment, can also be including the image in history artificial intelligence training set in preset database.
In the present embodiment, it is overlapped and determines to may include two ways, one is whether the cryptographic Hash of judgement sample data deposits With in preset database, if it is present obtaining existing result;Secondly the permission of the cryptographic Hash for judgement sample data Whether section is overlapped with the data in preset database, also or based on the cryptographic Hash of sample data, judges preset The error of data in database allows whether section includes sample data.
In the present embodiment, it is overlapped the effect for determining that this process can play duplicate removal, and effect is more accurate.
In the present embodiment, storing data corresponding with cryptographic Hash can be image data, or Hash Value Data, it is right It is not limited in any way in this present embodiment.
In the present embodiment, above-mentioned database and the corresponding relationship of cryptographic Hash may include that cryptographic Hash database corresponds to sample This image cryptographic Hash, image data base correspond to sample image cryptographic Hash.
In the present embodiment, determine result for indicating in database whether to include storing data corresponding with cryptographic Hash.
In the present embodiment, determine that result specifically can be used to indicate that in database with the presence or absence of in sample image cryptographic Hash Identical cryptographic Hash.And implement this embodiment, then the judgement of sample image cryptographic Hash can be made more accurate, avoid losing Leakage.
In the present embodiment, above-mentioned storing data can be image or cryptographic Hash, to not making any limit in this present embodiment It is fixed.
In the present embodiment, when determine result show in database not with the same or similar number of sample image cryptographic Hash According to when, sample image is added to default training set, obtains artificial intelligence training set.
For example, the acquisition device of this kind of artificial intelligence training set can be prepared in advance just by acquiring unit 310 Initial set specifically can obtain video data (wherein video according to drawing stream is carried out to different categories (such as game, star show etc.) The type of data can be FLASH VIDEO, i.e. stream media format);Then it is then utilized after video data is got again Ffmpeg (Fast Forward Mpeg) carries out transcoding to video data and obtains sample image corresponding with category, is practicing In, repeat the sample image of the available different categories of such operation.
In the present embodiment, the acquisition device of artificial intelligence training set can be quoted in embodiment 1 or embodiment 2 and retouched State it is any illustrate, to no longer adding to repeat in this present embodiment.
As it can be seen that the acquisition device of artificial intelligence training set described in implementing Fig. 3, can be completed by acquiring unit 310 The acquisition of sample image, then processing calculating is carried out to sample image by computing unit 320, obtain the corresponding Hash of sample image Value, is then judged by repeatability of the judging unit 330 to cryptographic Hash, to guarantee in the unduplicated situation of cryptographic Hash Under, the corresponding sample image of the cryptographic Hash is added to default training set, to guarantee that the content of artificial intelligent training collection does not repeat. Wherein, above-mentioned cryptographic Hash does not repeat to guarantee that sample image is not repeated with the image stored in database.It can be seen that using The acquisition device of the artificial intelligent training collection can be executed independently, complete the acquisition of artificial intelligence training set, to avoid artificial Participation, and then realize save the cost, reduce resource loss effect.
Embodiment 4
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of another apparatus for obtaining an artificial intelligence training set provided by an embodiment of the present application. The structural diagram described in Fig. 4 is an improvement on the structural diagram of the apparatus described in Fig. 3. The acquiring unit 310 may include:
First obtaining subunit 311, configured to obtain an initial set corresponding to a preset category;
Extracting subunit 312, configured to perform image extraction processing on the initial set to obtain sample images.
As an optional implementation, the computing unit 320 may include:
Second obtaining subunit 321, configured to obtain the image data of the sample image;
Computing subunit 322, configured to perform a calculation based on the image data to obtain the hash value of the sample image.
As an optional implementation, the computing subunit 322 may further include:
Processing module, configured to perform resolution scaling and grayscale processing on the image data to obtain preprocessed data;
Computing module, configured to perform a calculation based on the preprocessed data to obtain an average gray value;
Comparison module, configured to traverse each piece of sub-data in the preprocessed data and compare it with the average gray value to obtain comparison results;
The computing module is further configured to perform a calculation based on the comparison results to obtain the hash value of the sample image.
As an optional implementation, the apparatus for obtaining the artificial intelligence training set further includes:
Deleting unit 350, configured to delete the sample image corresponding to the hash value when the judgment result is that the database includes stored data corresponding to the hash value.
As an optional implementation, the data structure of the database is a red-black tree.
As an optional implementation, the adding unit 340 may include:
Determining subunit 341, configured to perform a pure color determination based on the sample image and the preset pure color determination standard when the judgment result is that the database does not include stored data corresponding to the hash value, obtaining a determination result indicating whether the sample image is a pure color image;
Adding subunit 342, configured to add the sample image to the preset training set to obtain the artificial intelligence training set when the determination result is that the sample data is not a pure color image.
As an optional implementation, the determining subunit 341 may include:
Obtaining module, configured to obtain a plurality of block images of the sample image;
Determining module, configured to perform a pure color determination on the plurality of block images according to the pure color determination standard, obtaining a plurality of determination sub-results;
The determining module is further configured to perform a pure color determination of the sample image based on the plurality of determination sub-results and the pure color determination standard, obtaining a determination result indicating whether the sample image is a pure color image.
In the present embodiment, the determining module may perform the operations of obtaining color means in one-to-one correspondence with the plurality of block images, and determining one by one whether each color mean falls within the pure color standard interval to obtain the plurality of determination sub-results.
As an optional implementation, the deleting unit 350 is further configured to delete the sample image corresponding to the hash value when the determination result is that the sample data is a pure color image.
Implementing this embodiment, the use of the apparatus for obtaining the artificial intelligence training set improves the acquisition precision, accuracy, and operation speed of the artificial intelligence training set.
In the present embodiment, the apparatus for obtaining the artificial intelligence training set may refer to any of the explanations given in Embodiment 1 or Embodiment 2, which are not repeated in the present embodiment.
It can be seen that, by implementing the apparatus for obtaining an artificial intelligence training set described in Fig. 4, the acquiring unit 310 obtains the sample image, the computing unit 320 processes the sample image and calculates its corresponding hash value, and the judging unit 330 then judges whether the hash value is a duplicate, so that the sample image corresponding to the hash value is added to the preset training set only when the hash value is not a duplicate, which ensures that the content of the artificial intelligence training set is not repeated. A non-duplicated hash value ensures that the sample image does not duplicate any image stored in the database. The apparatus can therefore run independently to complete the acquisition of the artificial intelligence training set, avoiding manual participation and thereby saving cost and reducing resource consumption; combining the other units can further improve the precision and efficiency of obtaining the artificial intelligence training set.
An embodiment of the present application provides an electronic device including a memory and a processor. The memory is configured to store a computer program, and the processor runs the computer program so that the electronic device performs the method for acquiring an artificial intelligence training set according to any one of Embodiment 1 or Embodiment 2 of the present application.
An embodiment of the present application provides a computer-readable storage medium storing computer program instructions. When the computer program instructions are read and run by a processor, the method for acquiring an artificial intelligence training set according to any one of Embodiment 1 or Embodiment 2 of the present application is performed.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of apparatuses, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated to form a single independent part, each module may exist separately, or two or more modules may be integrated to form a single independent part.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is merely an example of the present application and is not intended to limit its scope of protection. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection. It should also be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above are merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

Claims (14)

1. A method for acquiring an artificial intelligence training set, characterized in that the method comprises:
obtaining a sample image;
performing a calculation based on the sample image to obtain a hash value of the sample image;
performing a coincidence judgment based on a preset database and the hash value to obtain a judgment result indicating whether the database includes stored data corresponding to the hash value;
when the judgment result is that the database does not include stored data corresponding to the hash value, adding the sample image to a preset training set to obtain the artificial intelligence training set.
2. The method for acquiring an artificial intelligence training set according to claim 1, characterized in that the step of obtaining a sample image comprises:
obtaining an initial set corresponding to a preset category;
performing image extraction on the initial set to obtain the sample image.
3. The method for acquiring an artificial intelligence training set according to claim 1, characterized in that the step of performing a calculation based on the sample image to obtain a hash value of the sample image comprises:
obtaining image data of the sample image;
performing a calculation based on the image data to obtain the hash value of the sample image.
4. The method for acquiring an artificial intelligence training set according to claim 3, characterized in that the step of performing a calculation based on the image data to obtain the hash value of the sample image comprises:
performing resolution scaling and grayscale processing on the image data to obtain preprocessed data;
performing a calculation based on the preprocessed data to obtain an average gray value;
traversing each piece of sub-data in the preprocessed data based on the average gray value to obtain a comparison result;
performing a calculation based on the comparison result to obtain the hash value of the sample image.
5. The method for acquiring an artificial intelligence training set according to claim 1, characterized in that, after the step of performing a coincidence judgment based on a preset database and the hash value to obtain a judgment result indicating whether the database includes stored data corresponding to the hash value, the method further comprises:
when the judgment result is that the database includes stored data corresponding to the hash value, deleting the sample image corresponding to the hash value.
6. The method for acquiring an artificial intelligence training set according to claim 1, characterized in that the data structure of the database is a red-black tree.
7. The method for acquiring an artificial intelligence training set according to claim 1, characterized in that the step of adding the sample image to a preset training set to obtain the artificial intelligence training set, when the judgment result is that the database does not include stored data corresponding to the hash value, comprises:
when the judgment result is that the database does not include stored data corresponding to the hash value, performing a solid-color determination based on the sample image and a preset solid-color determination standard to obtain a determination result indicating whether the sample image is a solid-color image;
when the determination result is that the sample data is not a solid-color image, adding the sample image to the preset training set to obtain the artificial intelligence training set.
8. The method for acquiring an artificial intelligence training set according to claim 7, characterized in that the step of performing a solid-color determination based on the sample image and a preset solid-color determination standard to obtain a determination result indicating whether the sample image is a solid-color image comprises:
obtaining a plurality of block images of the sample image;
performing a solid-color determination on the plurality of block images according to the solid-color determination standard to obtain a plurality of determination sub-results;
performing the solid-color determination of the sample image based on the plurality of determination sub-results and the solid-color determination standard to obtain the determination result indicating whether the sample image is a solid-color image.
9. The method for acquiring an artificial intelligence training set according to claim 8, characterized in that the solid-color determination standard includes a solid-color standard interval, and the step of performing a solid-color determination on the plurality of block images according to the solid-color determination standard to obtain a plurality of determination sub-results comprises:
obtaining color mean values in one-to-one correspondence with the plurality of block images;
determining one by one whether each color mean value falls within the solid-color standard interval to obtain the plurality of determination sub-results.
10. The method for acquiring an artificial intelligence training set according to claim 7, characterized in that, after the step of performing a solid-color determination based on the sample image and a preset solid-color determination standard to obtain a determination result indicating whether the sample image is a solid-color image, the method further comprises:
when the determination result is that the sample data is a solid-color image, deleting the sample image corresponding to the hash value.
11. An apparatus for acquiring an artificial intelligence training set, characterized in that the apparatus comprises:
an acquiring unit, configured to obtain a sample image;
a computing unit, configured to perform a calculation based on the sample image to obtain a hash value of the sample image;
a judging unit, configured to perform a coincidence judgment based on a preset database and the hash value to obtain a judgment result indicating whether the database includes stored data corresponding to the hash value;
an adding unit, configured to add the sample image to a preset training set to obtain the artificial intelligence training set when the judgment result is that the database does not include stored data corresponding to the hash value.
12. The apparatus for acquiring an artificial intelligence training set according to claim 10, characterized in that the adding unit comprises:
a determining subunit, configured to, when the judgment result is that the database does not include stored data corresponding to the hash value, perform a solid-color determination based on the sample image and a preset solid-color determination standard to obtain a determination result indicating whether the sample image is a solid-color image;
an adding subunit, configured to add the sample image to the preset training set to obtain the artificial intelligence training set when the determination result is that the sample data is not a solid-color image.
13. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory being configured to store a computer program, and the processor running the computer program so that the electronic device performs the method for acquiring an artificial intelligence training set according to any one of claims 1 to 10.
14. A readable storage medium, characterized in that computer program instructions are stored in the readable storage medium, and when the computer program instructions are read and run by a processor, the method for acquiring an artificial intelligence training set according to any one of claims 1 to 10 is performed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910757684.2A CN110490250A (en) 2019-08-19 2019-08-19 A kind of acquisition methods and device of artificial intelligence training set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910757684.2A CN110490250A (en) 2019-08-19 2019-08-19 A kind of acquisition methods and device of artificial intelligence training set

Publications (1)

Publication Number Publication Date
CN110490250A true CN110490250A (en) 2019-11-22

Family

ID=68551375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910757684.2A Pending CN110490250A (en) 2019-08-19 2019-08-19 A kind of acquisition methods and device of artificial intelligence training set

Country Status (1)

Country Link
CN (1) CN110490250A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798435A (en) * 2020-07-08 2020-10-20 国网山东省电力公司东营供电公司 Image processing method, and method and system for monitoring invasion of engineering vehicle into power transmission line
CN112182277A (en) * 2020-11-09 2021-01-05 成都优查科技有限公司 Method for matching aluminum template by image processing technology
CN112990335A (en) * 2021-03-31 2021-06-18 江苏方天电力技术有限公司 Intelligent recognition self-learning training method and system for power grid unmanned aerial vehicle inspection image defects

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599122A (en) * 2009-07-02 2009-12-09 阿里巴巴集团控股有限公司 A kind of image-recognizing method and device
CN103617233A (en) * 2013-11-26 2014-03-05 烟台中科网络技术研究所 Method and device for detecting repeated video based on semantic content multilayer expression
US8699789B2 (en) * 2011-09-12 2014-04-15 Xerox Corporation Document classification using multiple views
CN104778481A (en) * 2014-12-19 2015-07-15 五邑大学 Method and device for creating sample library for large-scale face mode analysis
CN106570141A (en) * 2016-11-04 2017-04-19 中国科学院自动化研究所 Method for detecting approximately repeated image
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
CN107480203A (en) * 2017-07-23 2017-12-15 北京中科火眼科技有限公司 It is a kind of to be directed to identical and similar pictures duplicate removal view data cleaning method
CN107729935A (en) * 2017-10-12 2018-02-23 杭州贝购科技有限公司 The recognition methods of similar pictures and device, server, storage medium
CN107798389A (en) * 2017-11-06 2018-03-13 国网重庆市电力公司电力科学研究院 A kind of image data set construction method, system and computer readable storage devices
CN108073934A (en) * 2016-11-17 2018-05-25 北京京东尚科信息技术有限公司 Nearly multiimage detection method and device
CN108595710A (en) * 2018-05-11 2018-09-28 杨晓春 A kind of quick mass picture De-weight method
CN108830294A (en) * 2018-05-09 2018-11-16 四川斐讯信息技术有限公司 A kind of augmentation method of image data
CN109840556A (en) * 2019-01-24 2019-06-04 浙江大学 A kind of image classification recognition methods based on twin network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
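Zhang Ke et al.: "Deep face age classification under unconstrained conditions" (非受限条件下的深度人脸年龄分类), Journal of Computer Applications (《计算机应用》) *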

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798435A (en) * 2020-07-08 2020-10-20 国网山东省电力公司东营供电公司 Image processing method, and method and system for monitoring invasion of engineering vehicle into power transmission line
CN112182277A (en) * 2020-11-09 2021-01-05 成都优查科技有限公司 Method for matching aluminum template by image processing technology
CN112990335A (en) * 2021-03-31 2021-06-18 江苏方天电力技术有限公司 Intelligent recognition self-learning training method and system for power grid unmanned aerial vehicle inspection image defects
CN112990335B (en) * 2021-03-31 2021-10-15 江苏方天电力技术有限公司 Intelligent recognition self-learning training method and system for power grid unmanned aerial vehicle inspection image defects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191122