CN110287348A

CN110287348A - A GIF-format image search method based on machine learning

Info

Publication number: CN110287348A
Application number: CN201910298627.2A
Authority: CN
Inventors: 薛景; 张政; 廖芷瑄; 朱知萌
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2019-04-15
Filing date: 2019-04-15
Publication date: 2019-09-27

Abstract

A GIF-format image search method based on machine learning comprises the following steps. Step S1: obtain GIF images from the network and preprocess them. Step S2: split each preprocessed GIF image into frames, extract its key frames, compute the hash value of each key frame, and store the generated hash values together with the GIF image in a basic database. Step S3: build a convolutional neural network model and train it with the data in the basic database to obtain a kernel function for prediction. Step S4: split the target GIF image to be searched, extract its key frames, and feed the extracted key frames into the kernel function to obtain the best-matching category. Step S5: compute the hash values of the key frames of the target GIF image and use them to search for similar images within the predicted category; if none is found, create a new category under the current classification. The purpose of the invention is to add a search function for GIF images and to improve search accuracy, ultimately increasing the utilization of image search.

Description

A GIF-format image search method based on machine learning

Technical field

The invention belongs to the field of computer application technology, and specifically relates to a GIF-format image retrieval method based on machine learning.

Background art

With the steady progress of search technology in recent years, search is no longer limited to text and documents, and image search has gradually entered public view. A user submits the image to be searched, and the website or software returns information about the image, its source, and a large number of similar images. Because users often cannot give an accurate text description when using a search engine, image search is especially useful: when something cannot be described precisely in words, the user can upload an image, and the system scans it, extracts the relevant information, and returns it to the user. Developers can also extract users' interests from the images they browse in order to push related information. As a result, the demand for image recognition keeps growing.

Image search differs from traditional text search because a single image usually contains a large amount of information, both primary and secondary, and traditional image search cannot distinguish between the two. For example, a user wants to find photos of a certain celebrity, but the picture at hand shows the celebrity playing soccer. Besides the information the user wants, the picture contains secondary information such as the football, so many search results show footballs rather than the person the user was looking for. Traditional image search software cannot separate primary from secondary information and does not analyze the source of the image. Moreover, all current software searches only static images; none is designed specifically for dynamic formats such as GIF. Since animated images are used heavily in social-network interactions, the inability to search them is a major shortcoming of image search engines. Image search is not yet widely applied, but as information becomes more diverse, traditional text search alone can no longer meet users' needs.

With the development of artificial intelligence and machine learning, image recognition technology has advanced rapidly and continues to evolve. Recognition of both faces and general images has become more intelligent, providing a good foundation for image search.

Image search technology is receiving increasing attention, and many companies have carried out research and development on it. Several mature image search products and projects already exist: Google abroad and Baidu in China have both developed image recognition features, and besides these two search-engine companies, Taobao has developed a scan-to-recognize image feature. Li Wei describes Google's newly launched image search, which can display shopping information directly: when a user searches with an image, the system identifies the type of product in it and recommends similar products. Chen's thesis on Baidu's emoji search engine describes applications of image search in e-commerce and in social networks. With the rise of "emoji pack" (meme) culture, search engines dedicated to finding emoji packs have also emerged. Chen Shengnan and Li Jingyi discuss the competition among image search engines in different fields and point out that, although current image search technology can basically meet user needs, the diversity of data and users' rising demands for precision and coverage make it difficult for back-end databases to support the huge data volume, and high-precision, wide-range search remains hard to achieve with current technology. Beyond the research in these articles, Chinese search-engine companies such as Baidu, Sogou and Sohu have added image search to their engines, for example Baidu Shitu ("Baidu image recognition"). The social-platform developers Tencent and Sina have added image search to WeChat and Weibo, and the e-commerce platforms Taobao and Tmall (Alibaba) and NetEase Yanxuan have added search-by-image and search-by-photo features. At present, however, the accuracy of image search still needs improvement, and search for dynamic images remains an open research area. This project is therefore dedicated to developing dynamic image search and improving the accuracy of image search.

Bibliography:

Li Wei. Google releases an image search function that can directly display shopping information [J]. East China Science & Technology, 2017(5).

Chen. Product design and operation plan of the Baidu emoji search engine [D]. Zhejiang University, 2017.

Chen Shengnan, Li Jingyi. A brief discussion on the development of image search platforms in the Internet economy [J]. Science & Technology Economic Guide, 2017(31).

Summary of the invention

The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art by providing a GIF-format image search method based on machine learning. The method cuts a dynamic image into static frames. Many of the resulting static frames are similar but differ in clarity, so the key frames with the highest clarity and best recognition quality are extracted from these similar frames. The extracted key frames are then converted by a convolutional neural network into convolution-kernel representations, which are stored in a database and compared with the target image.

The present invention provides a GIF-format image search method based on machine learning, comprising the following steps:

Step S1: obtain GIF images from the network and preprocess them;

Step S2: split each preprocessed GIF image into frames, extract its key frames, compute the hash value of each key frame, and store the generated hash values together with the GIF image in the basic database;

Step S3: build a convolutional neural network model and train it with the data in the basic database to obtain the kernel function used for prediction;

Step S4: split the target GIF image to be searched, extract its key frames, and feed the extracted key frames into the kernel function to obtain the best-matching category;

Step S5: compute the hash values of the key frames of the target GIF image and use them to search for similar images within the predicted category; if no similar image is found, create a new category under the current classification.
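Steps S2 and S5 rely on a hash of each key frame, but the text does not say which hash is used. The sketch below assumes a perceptual average hash (aHash) computed on an 8 × 8 block-averaged grayscale image, compared by Hamming distance. The function names, the dictionary used as a stand-in database, the 10-bit match threshold and the naming of new categories are all hypothetical choices for illustration.

```python
import numpy as np


def average_hash(gray: np.ndarray, size: int = 8) -> int:
    """Average hash (aHash) of a 2-D grayscale array as a size*size-bit integer.

    The image is block-averaged down to size x size; each bit records whether
    that block is brighter than the mean of all blocks.
    """
    h, w = gray.shape
    # Trim so the image divides evenly into size x size blocks.
    gray = gray[: h - h % size, : w - w % size].astype(np.float64)
    blocks = gray.reshape(size, gray.shape[0] // size, size, gray.shape[1] // size)
    small = blocks.mean(axis=(1, 3))
    value = 0
    for bit in (small > small.mean()).flatten():
        value = (value << 1) | int(bit)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def search_or_create(db: dict, category: str, query_hash: int, max_distance: int = 10):
    """Step S5 sketch: search only inside the predicted category.

    db maps category -> list of (image_id, hash). Returns (category, matches),
    matches sorted by distance. If nothing is close enough, an empty
    sub-category is created (hypothetical naming scheme).
    """
    scored = [(hamming(query_hash, h), image_id) for image_id, h in db.get(category, [])]
    matches = sorted(s for s in scored if s[0] <= max_distance)
    if matches:
        return category, [image_id for _, image_id in matches]
    new_category = f"{category}/new_{len(db)}"
    db[new_category] = []
    return new_category, []
```

Restricting the Hamming comparison to the category predicted in step S4 is what keeps the search cheap: only hashes in one bucket are compared instead of the whole database.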

As a further technical solution of the present invention, in step S1 a crawler program fetches GIF image resources from the Internet, each image is cropped to 256 × 256, and the cropped images are stored in the database.
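The cropping part of step S1 can be sketched as follows (crawling is omitted). The text only says each image is cut to 256 × 256, so this sketch assumes a centre crop to a square followed by nearest-neighbour resizing, using NumPy alone; `crop_and_resize` is a hypothetical name, and a real pipeline would more likely call Pillow's `Image.resize`.

```python
import numpy as np


def crop_and_resize(frame: np.ndarray, size: int = 256) -> np.ndarray:
    """Centre-crop an H x W (x C) frame to a square, then resize to size x size.

    Nearest-neighbour sampling keeps the sketch dependency-free.
    """
    h, w = frame.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    square = frame[top : top + side, left : left + side]
    # Map each output row/column back to its nearest source row/column.
    idx = (np.arange(size) * side) // size
    return square[idx][:, idx]
```

Cropping before resizing keeps the aspect ratio of the content, at the cost of discarding the borders of wide or tall GIFs.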

Further, in step S2, a series of static frames is extracted from the cropped and preprocessed GIF image. Since the static frames contain a great deal of repetition and similarity, the dynamic details of the GIF image are first excluded so that only the theme and content contained in the static frames remain, and the frames with the highest clarity that best describe the shot content are extracted as key frames.

Further, color, texture and shape features are extracted from the key frames as the data source for the video summary and for indexing the basic database.
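The patent names color, texture and shape features but does not say how they are computed. As an assumption, the sketch below uses a normalised per-channel color histogram for color and a magnitude-weighted histogram of gradient orientations as a simple texture/shape cue, concatenated into one descriptor per key frame. The function names and bin counts are hypothetical.

```python
import numpy as np


def color_histogram(frame_rgb: np.ndarray, bins: int = 8) -> np.ndarray:
    """Colour feature: per-channel histogram, each channel normalised to sum to 1."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(frame_rgb[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


def gradient_orientation_histogram(gray: np.ndarray, bins: int = 8) -> np.ndarray:
    """Texture/shape feature: histogram of edge orientations weighted by edge strength."""
    gy, gx = np.gradient(np.asarray(gray, dtype=np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(np.float64)


def frame_descriptor(frame_rgb: np.ndarray) -> np.ndarray:
    """Concatenate colour and gradient features into one index vector."""
    gray = np.asarray(frame_rgb, dtype=np.float64).mean(axis=2)
    return np.concatenate([color_histogram(frame_rgb), gradient_orientation_histogram(gray)])
```

For an H × W × 3 frame this yields a 32-element vector (3 × 8 color bins plus 8 orientation bins) that can be stored alongside the frame hash.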

Further, the key frame extraction method is as follows:

In a Python 3.x environment, the GIF image is read frame by frame with the Image class of the PIL library. For each frame, determine whether it is only part of the previously read frame; if it is not, save the current frame and compress it to 256 × 256;

The pixels of the compressed current frame are compared with those of the previously saved frame: the two frames are subtracted pixel by pixel and the absolute values of the differences are summed. The difference values are saved and smoothed, and finally representative key frames are selected from the differences according to a threshold.
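The two steps above can be sketched in Python with NumPy. This is a sketch under stated assumptions, not the authors' code: the patent's check whether a frame is "only part of the previous frame" is approximated by dropping frames identical to the last kept one; the Hanning window length of 5; the default threshold of mean + one standard deviation of the smoothed curve; and the rule "keep the first frame plus every local peak above the threshold" are all assumptions. `load_gif_frames` uses Pillow's `Image.open` and `ImageSequence.Iterator` and is only needed for real files.

```python
import numpy as np


def load_gif_frames(path, size=256):
    """Read a GIF frame by frame with Pillow as 2-D grayscale arrays."""
    from PIL import Image, ImageSequence  # lazy import: only needed for real files

    frames = []
    with Image.open(path) as im:
        for frame in ImageSequence.Iterator(im):
            arr = np.asarray(frame.convert("L").resize((size, size)), dtype=np.float64)
            # Assumed stand-in for the "part of the previous frame" check.
            if not frames or not np.array_equal(arr, frames[-1]):
                frames.append(arr)
    return frames


def hanning_smooth(x, window_len=5):
    """Smooth a 1-D signal with a normalised Hanning window (same-length output)."""
    x = np.asarray(x, dtype=np.float64)
    if window_len % 2 == 0:
        window_len += 1  # odd length keeps the output aligned with the input
    if window_len < 3 or len(x) < window_len:
        return x
    w = np.hanning(window_len)
    half = window_len // 2
    padded = np.pad(x, half, mode="reflect")
    return np.convolve(padded, w / w.sum(), mode="valid")


def select_key_frames(frames, threshold=None, window_len=5):
    """Return indices of key frames chosen from smoothed frame differences."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    if not frames:
        return []
    diffs = np.zeros(len(frames))
    for i in range(1, len(frames)):
        # Sum of absolute pixel differences against the previous saved frame.
        diffs[i] = np.abs(frames[i] - frames[i - 1]).sum()
    smooth = hanning_smooth(diffs, window_len)
    if threshold is None:
        threshold = smooth.mean() + smooth.std()  # assumed default
    keys = [0]  # the first frame always represents the opening shot
    for i in range(1, len(smooth)):
        right = smooth[i + 1] if i + 1 < len(smooth) else -np.inf
        if smooth[i] > threshold and smooth[i] > smooth[i - 1] and smooth[i] >= right:
            keys.append(i)
    return keys
```

For a GIF with three flat scenes starting at frames 0, 10 and 20, the difference curve spikes at 10 and 20, and the smoothed peaks give key frames `[0, 10, 20]`.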

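Further, the smoothing function used in the smoothing process is the Hanning window function, given by:

$$w(t)=\frac{1}{2}+\frac{1}{2}\cos\frac{\pi t}{T},\qquad |t|\le T$$

$$W(\omega)=Q(\omega)+\frac{1}{2}\left[Q\left(\omega+\frac{\pi}{T}\right)+Q\left(\omega-\frac{\pi}{T}\right)\right],\qquad Q(\omega)=T\,\mathrm{sinc}(\omega T)=\frac{\sin(\omega T)}{\omega}$$

Here w(t) is zero for |t| > T. The spectrum W(ω) of the Hanning window is the sum of three sinc(t)-type functions: the two terms in the brackets are shifted left and right by π/T relative to the first spectral window, so that their side lobes largely cancel each other, suppressing high-frequency interference and energy leakage.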
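The side-lobe claim can be checked numerically. The sketch below is an illustration rather than part of the patent: it builds the discrete Hanning window w(n) = 0.5[1 - cos(2πn/(N - 1))], which is the definition NumPy's `np.hanning` uses, windows a cosine whose frequency falls between DFT bins, and measures how much energy leaks more than three bins away from the peak. The test frequency (10.37 cycles) and the 3-bin guard are arbitrary choices.

```python
import numpy as np


def hann(N: int) -> np.ndarray:
    """Discrete Hanning window w(n) = 0.5 * (1 - cos(2*pi*n / (N - 1))), n = 0..N-1."""
    n = np.arange(N)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (N - 1)))


def leakage_fraction(window: np.ndarray, cycles: float = 10.37, guard: int = 3) -> float:
    """Fraction of spectral energy lying more than `guard` bins from the peak
    when a cosine with a non-integer number of cycles is windowed."""
    N = len(window)
    n = np.arange(N)
    x = np.cos(2.0 * np.pi * cycles * n / N) * window
    power = np.abs(np.fft.rfft(x)) ** 2
    k = int(round(cycles))
    near = power[max(k - guard, 0) : k + guard + 1].sum()
    return float(1.0 - near / power.sum())


rect = leakage_fraction(np.ones(128))   # rectangular window (no smoothing)
hanning = leakage_fraction(hann(128))   # Hanning window
```

With the rectangular window a few percent of the energy leaks into distant bins; with the Hanning window the leakage is far smaller, which is the practical meaning of "side lobes cancel" in the text above.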

Further, in step S3, the convolutional neural network is divided by structure into six layers:

The first layer is the input layer. Each image is compressed to 256 × 256 during preprocessing; if the input is a black-and-white photo, the input is a two-dimensional 256 × 256 array of neurons; if the input is an RGB color image, the input is a three-dimensional 3 × 256 × 256 array of neurons;

The second layer is the convolutional layer, which computes the inner product of the image with a filter matrix;

The third layer is the excitation (activation) layer, which applies a nonlinear mapping to the output of the convolutional layer, generally the ReLU function f(x) = max(x, 0);

The fourth layer is the pooling layer, which reduces the dimensionality of the feature maps produced by the convolutional layer while keeping the output depth unchanged;

The fifth layer is the fully connected layer, which fits the features of the feature maps again to reduce the loss of feature information;

The sixth layer is the output layer, which outputs the kernel function used to compute image feature values.
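The six layers can be sketched as a NumPy forward pass for one grayscale image and one filter. This is an illustrative sketch, not the trained network: the weights are arbitrary, pooling is assumed to be 2 × 2 max pooling, and the output layer is assumed to be a softmax over category scores. The convolution is implemented as a sliding inner product (cross-correlation), which is what most CNN libraries compute.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Layer 2: 'valid' convolution as an inner product of each window with the filter."""
    windows = sliding_window_view(image, kernel.shape)  # (out_h, out_w, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def relu(x: np.ndarray) -> np.ndarray:
    """Layer 3: f(x) = max(x, 0)."""
    return np.maximum(x, 0.0)


def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Layer 4: non-overlapping max pooling of one feature map."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def forward(image, kernel, fc_weights, fc_bias):
    x = np.asarray(image, dtype=np.float64)  # layer 1: input (H x W grayscale)
    x = conv2d(x, kernel)                    # layer 2: convolution
    x = relu(x)                              # layer 3: excitation
    x = max_pool(x)                          # layer 4: pooling
    x = fc_weights @ x.ravel() + fc_bias     # layer 5: fully connected
    return softmax(x)                        # layer 6: output probabilities
```

For a 256 × 256 input and a 3 × 3 filter, the feature map after pooling is 127 × 127, so `fc_weights` would need 127 × 127 = 16129 columns.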

Further, in step S2, the image feature values obtained by convolving the extracted key frames are stored in the basic database.
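The layout of the basic database is not specified. The following is a minimal sketch using Python's built-in `sqlite3` module. It assumes one row per key frame holding the GIF identifier, frame index, category, 64-bit hash and feature vector; the table and column names are hypothetical.

```python
import sqlite3

import numpy as np

SCHEMA = """
CREATE TABLE IF NOT EXISTS key_frames (
    gif_id    TEXT    NOT NULL,
    frame_idx INTEGER NOT NULL,
    category  TEXT,
    ahash     TEXT    NOT NULL,   -- 64-bit hash stored as hex text
    features  BLOB    NOT NULL,   -- float32 feature vector
    PRIMARY KEY (gif_id, frame_idx)
)
"""


def store(conn, gif_id, frame_idx, category, ahash, features):
    """Insert or overwrite one key frame record."""
    conn.execute(
        "INSERT OR REPLACE INTO key_frames VALUES (?, ?, ?, ?, ?)",
        (gif_id, frame_idx, category, format(ahash, "016x"),
         np.asarray(features, dtype=np.float32).tobytes()),
    )


def load_category(conn, category):
    """Fetch all key frames of one category, decoding hash and features."""
    rows = conn.execute(
        "SELECT gif_id, frame_idx, ahash, features FROM key_frames WHERE category = ?",
        (category,),
    ).fetchall()
    return [(g, i, int(h, 16), np.frombuffer(f, dtype=np.float32)) for g, i, h, f in rows]
```

The hash is stored as hexadecimal text because SQLite integers are signed 64-bit values, so an unsigned 64-bit hash with its top bit set would not fit in an INTEGER column.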

Compared with the prior art, the present invention has the following advantages. The dynamic image is cut into static frames; since many of these frames are similar but differ in clarity, the key frames with the highest clarity and best recognition quality are extracted from them. The extracted key frames are then converted into convolution-kernel representations by a convolutional neural network, stored in the database, and compared with the target image. The method adds a search function for GIF images, improves the accuracy of image search, and ultimately increases the use of image search in users' everyday searching.

Description of the drawings

Fig. 1 is a schematic flowchart of the algorithm of the present invention;

Fig. 2 shows the raw data curve of a GIF image before smoothing according to the present invention;

Fig. 3 shows the data curve of the GIF image after smoothing according to the present invention;

Fig. 4 shows the neural network structure used in the present invention.

Detailed description of the embodiments

Referring to Fig. 1, the GIF-format image search method based on machine learning provided in this embodiment comprises the following steps:

In step S1, a crawler program fetches GIF image resources from the Internet, each image is cropped to 256 × 256, and the cropped images are stored in the database.

In step S2, a series of static frames is extracted from the cropped and preprocessed GIF image. The static frames contain a great deal of repetition and similarity; the dynamic details of the GIF image are excluded so that only the theme and content contained in the static frames remain, and the frames with the highest clarity that best describe the shot content are extracted as key frames.

Color, texture and shape features are extracted from the key frames as the data source for the video summary and for indexing the basic database.

Extracting key frames serves two purposes. First, it excludes the dynamic details of the animated image, leaving only the theme and content contained in the static frames.

When selecting key frames, the static images that contain the most content information and best summarize the remaining repeated frames are chosen.

Second, the information extracted from the key frames, including color, texture and shape features, serves as the data source for the video summary and the database index, so that every frame of the video does not have to be stored repeatedly.

To ensure the accuracy of subsequent comparisons, the information extracted from the key frames should not only include content and theme features but should also vary with the features themselves.

Usually, to retain as much of the information of the original shot as possible, as many key frames as possible would be extracted so that none of the information conveyed by the video is missed. However, this selection strategy often causes redundancy and repetition among key frames, because the extracted key frames tend to contain and repeat one another.

Therefore, frames with insufficient clarity are deleted, and frames that are repeated too many times are also removed. The primary consideration in choosing key frames is the dissimilarity between them, which serves as the selection criterion. Only the one or few frames that best express the shot content are kept as key frames and stored in the database, which reduces the storage space each animated image occupies while still summarizing most of its information.
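The selection rule above, which removes repeated frames, prefers clear frames and uses dissimilarity as the criterion, can be sketched as a greedy grouping. The specific measures are assumptions not stated in the patent: "clarity" is scored by the variance of a 4-neighbour Laplacian (a common sharpness proxy), "dissimilarity" is the mean absolute grey-level difference, and `min_diff = 10` is an arbitrary threshold.

```python
import numpy as np


def sharpness(frame) -> float:
    """Clarity score: variance of a 4-neighbour Laplacian (assumed measure)."""
    f = np.asarray(frame, dtype=np.float64)
    lap = (-4 * f[1:-1, 1:-1] + f[:-2, 1:-1] + f[2:, 1:-1]
           + f[1:-1, :-2] + f[1:-1, 2:])
    return float(lap.var())


def mean_abs_diff(a, b) -> float:
    """Dissimilarity: mean absolute grey-level difference."""
    return float(np.mean(np.abs(np.asarray(a, float) - np.asarray(b, float))))


def dissimilar_key_frames(frames, min_diff=10.0):
    """Group consecutive similar frames and keep the sharpest frame of each group.

    A new group starts whenever a frame differs from the first frame of the
    current group by more than min_diff.
    """
    groups = []
    for idx, frame in enumerate(frames):
        if groups and mean_abs_diff(frames[groups[-1][0]], frame) <= min_diff:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return [max(g, key=lambda i: sharpness(frames[i])) for g in groups]
```

Each group collapses a run of near-duplicate frames into one representative, so storage grows with the number of distinct scenes rather than with the number of frames.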

The key frame extraction method is as follows:

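The smoothing process is shown in Figs. 2 and 3. The smoothing function used is the Hanning window function:

$$w(t)=\frac{1}{2}+\frac{1}{2}\cos\frac{\pi t}{T},\qquad |t|\le T$$

$$W(\omega)=Q(\omega)+\frac{1}{2}\left[Q\left(\omega+\frac{\pi}{T}\right)+Q\left(\omega-\frac{\pi}{T}\right)\right],\qquad Q(\omega)=T\,\mathrm{sinc}(\omega T)=\frac{\sin(\omega T)}{\omega}$$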

In step S3, the convolutional neural network of this embodiment extends the six-layer structure described above, adding dropout layers and repeated convolution blocks, to 16 layers in total:

The first layer is the input layer. Unlike a general neural network, a convolutional neural network assumes by default that its input is an image; this property can be built into the network when the input data are sampled, which reduces the complexity of the function and saves a large number of parameters. In this method, each image is compressed to 256 × 256 during preprocessing; if the input is a black-and-white photo, the input is a two-dimensional 256 × 256 array of neurons; if it is an RGB color image, the input is a three-dimensional 3 × 256 × 256 array of neurons;

The second layer is convolutional layer, and the core operation of convolutional layer is exactly exactly to do inner product, such as Fig. 4 to image and filtering matrix It is shown；

Third layer is excitation layer, and the effect of excitation layer mainly carries out a Nonlinear Mapping to convolutional layer, because of convolution Layer calculates or one kind is linear to be calculated, and generally uses Rule function: f (x)=max (x, o)；

The fourth layer is the pooling layer. If the feature maps produced by the convolutional layer are still large, the pooling layer reduces the dimensionality of each feature map while keeping the output depth unchanged;

The fifth layer is a dropout layer. To prevent certain features from taking effect only in fixed combinations, this layer deliberately deactivates randomly chosen nodes, so that the network learns general, shared features rather than the peculiarities of individual training samples;

Layers 6 to 9 repeat the sequence of convolutional layer, excitation layer, pooling layer and dropout layer, obtaining more parameters and extracting more feature information;

Layers 10 to 13 repeat the same sequence again to extract further feature information;

Layer 14 is a fully connected layer, which fits the features again to reduce the loss of feature information;

Layer 15 is a dropout layer, with the same function as layer 5;

Layer 16 is the output layer (its output is the parameter information of the trained convolution kernels).
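The 16-layer ordering and the dropout behaviour described above can be written down concisely. In the sketch below, `layer_plan` is a hypothetical listing of the layers as strings (not a framework model). `dropout` implements inverted dropout, a common variant that is assumed here: it zeroes each unit with probability p during training and scales the survivors by 1/(1 - p), so the layer can be skipped unchanged at inference time.

```python
import numpy as np


def layer_plan():
    """Hypothetical listing of the 16 layers described above."""
    block = ["conv", "relu", "max_pool", "dropout"]  # layers 2-5, 6-9, 10-13
    return ["input"] + block * 3 + ["fully_connected", "dropout", "output"]


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: randomly silence units while training only."""
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)
```

Because a different random subset is silenced on every pass, no single combination of features can be relied on, which is the "universal features rather than training-sample peculiarities" effect the text describes.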

In step S2, the image feature values obtained by convolving the extracted key frames are stored in the basic database.

The training steps of the neural network are as follows:

1. Select the training group: randomly draw N samples from the sample set as the training group;

2. Set each weight and threshold to a small random value close to 0, and initialize the precision control parameter and the learning rate;

3. Choose VGG16 as the training model and provide its target output vector;

4. Compute the output vector of the intermediate layers and then the actual output of the network;

5. Compare the elements of the output vector with those of the target vector to compute the output error; the error must also be computed for the hidden units of the intermediate layers;

6. Compute, layer by layer, the adjustment of each weight and of each threshold;

7. Adjust the weights and thresholds;

8. After M iterations, check whether the performance index meets the precision requirement; if not, return to step 3 and continue iterating; if so, proceed to the next step;

9. Training ends, and the weights and thresholds are saved to a file. At this point the weights can be considered stable and the classifier has been formed. When training again, the weights and thresholds are loaded directly from the file and training continues without re-initialization.
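A 16-layer VGG-style network is too large for a short example, so the sketch below uses a toy stand-in that mirrors steps 2 and 4 to 9 above: a single sigmoid layer trained by gradient descent in NumPy. Step 1 (random sampling of the training group) and step 3 (the VGG16 architecture) are left out. The learning rate, precision target `eps`, check interval `check_every` (playing the role of M) and the `.npz` file format are assumptions.

```python
import numpy as np


def train(x, y, lr=0.5, eps=0.05, check_every=10, max_epochs=1000, seed=0, path=None):
    """Toy stand-in for the training steps: one sigmoid layer, gradient descent.

    x: (N, d) training group; y: (N,) target outputs in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    # Step 2: weights and threshold (bias) start as small random values near 0.
    w = rng.normal(0.0, 0.01, size=x.shape[1])
    b = float(rng.normal(0.0, 0.01))
    losses = []
    for epoch in range(1, max_epochs + 1):
        # Step 4: actual network output.
        out = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        # Step 5: output error against the target vector; MSE is the precision index.
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Step 6: adjustment amounts (gradient of the cross-entropy loss).
        grad_w = x.T @ err / len(y)
        grad_b = float(err.mean())
        # Step 7: adjust weights and threshold.
        w -= lr * grad_w
        b -= lr * grad_b
        # Step 8: every M = check_every epochs, stop once the precision target is met.
        if epoch % check_every == 0 and losses[-1] < eps:
            break
    # Step 9: save weights and threshold so later runs can skip initialisation.
    if path is not None:
        np.savez(path, weights=w, threshold=b)
    return w, b, losses


def load(path):
    """Reload saved weights and threshold to resume training or predict."""
    with np.load(path) as data:
        return data["weights"].copy(), float(data["threshold"])
```

Passing the reloaded values back in as the starting point, instead of calling the random initialisation of step 2, is what the text means by training again "without re-initialization".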

The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above specific embodiments; the embodiments and the description merely further illustrate its principles. Various changes and improvements may be made to the present invention without departing from its spirit and scope, and all such changes and improvements fall within the claimed scope of protection. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims

1. A GIF-format image search method based on machine learning, characterized in that it comprises the following steps:

Step S1: obtain GIF images from the network and preprocess them;

Step S2: split each preprocessed GIF image into frames, extract its key frames, compute the hash value of each key frame, and store the generated hash values together with the GIF image in a basic database;

Step S3: build a convolutional neural network model and train it with the data in the basic database to obtain the kernel function used for prediction;

Step S4: split the target GIF image to be searched, extract its key frames, and feed the extracted key frames into the kernel function to obtain the best-matching category;

Step S5: compute the hash values of the key frames of the target GIF image and use them to search for similar images within the predicted category; if no similar image is found, create a new category under the current classification.

2. The GIF-format image search method based on machine learning according to claim 1, characterized in that in step S1, a crawler program fetches GIF image resources from the Internet, each image is cropped to 256 × 256, and the cropped images are stored in the database.

3. The GIF-format image search method based on machine learning according to claim 1, characterized in that in step S2, a series of static frames is extracted from the cropped and preprocessed GIF image; since the static frames contain a great deal of repetition and similarity, the dynamic details of the GIF image are first excluded, leaving the theme and content contained in the static frames, and the frames with the highest clarity that best describe the shot content are extracted as key frames.

4. The GIF-format image search method based on machine learning according to claim 1 or 3, characterized in that color, texture and shape features are extracted from the key frames as the data source for the video summary and for indexing the basic database.

5. The GIF-format image search method based on machine learning according to claim 1, characterized in that the key frame extraction method is as follows:

In a Python 3.x environment, the GIF image is read frame by frame with the Image class of the PIL library; for each frame, determine whether it is only part of the previously read frame; if it is not, save the current frame and compress it to 256 × 256;

The pixels of the compressed current frame are compared with those of the previously saved frame: the two frames are subtracted pixel by pixel and the absolute values of the differences are summed; the difference values are saved and smoothed, and finally representative key frames are selected from the differences according to a threshold.

6. The GIF-format image search method based on machine learning according to claim 5, characterized in that the smoothing function used in the smoothing process is the Hanning window function, given by:

$$w(t)=\frac{1}{2}+\frac{1}{2}\cos\frac{\pi t}{T},\qquad |t|\le T$$

$$W(\omega)=Q(\omega)+\frac{1}{2}\left[Q\left(\omega+\frac{\pi}{T}\right)+Q\left(\omega-\frac{\pi}{T}\right)\right],\qquad Q(\omega)=T\,\mathrm{sinc}(\omega T)=\frac{\sin(\omega T)}{\omega}$$

wherein the spectrum of the Hanning window is the sum of three sinc(t)-type functions, and the two terms in the brackets are shifted left and right by π/T relative to the first spectral window, so that their side lobes largely cancel each other, suppressing high-frequency interference and energy leakage.

7. The GIF-format image search method based on machine learning according to claim 1, characterized in that in step S3, the convolutional neural network is divided by structure into six layers:

The first layer is the input layer: each image is compressed to 256 × 256 during preprocessing; if the input is a black-and-white photo, the input is a two-dimensional 256 × 256 array of neurons; if the input is an RGB color image, the input is a three-dimensional 3 × 256 × 256 array of neurons;

The second layer is the convolutional layer, which computes the inner product of the image with a filter matrix;

The third layer is the excitation (activation) layer, which applies a nonlinear mapping to the output of the convolutional layer, generally the ReLU function f(x) = max(x, 0);

The fourth layer is the pooling layer, which reduces the dimensionality of the feature maps produced by the convolutional layer while keeping the output depth unchanged;

The fifth layer is the fully connected layer, which fits the features of the feature maps again to reduce the loss of feature information;

The sixth layer is the output layer, which outputs the kernel function used to compute image feature values.

8. The GIF-format image search method based on machine learning according to claim 1, characterized in that in step S2, the image feature values obtained by convolving the extracted key frames are stored in the basic database.