CN108537283A - Image classification method and convolutional neural network generation method - Google Patents

Image classification method and convolutional neural network generation method

Info

Publication number
CN108537283A
CN108537283A
Authority
CN
China
Prior art keywords
image
convolutional neural networks
text
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810331479.5A
Other languages
Chinese (zh)
Inventor
林煜
余清洲
许清泉
苏晋展
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN201810331479.5A
Publication of CN108537283A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method, a convolutional neural network generation method for classifying images, a convolutional neural network generation method for recognizing text in images, a mobile terminal, and a computing device. The image classification method is suitable for execution on a mobile terminal that includes an image library in which multiple images are stored, and comprises the steps of: for each image in the image library, performing classification processing on the image to obtain its corresponding category; if the category is the text class, performing text recognition on the image to extract the text information it contains; and storing the text information in association with the image's storage path and image name.

Description

Image classification method and convolutional neural network generation method
Technical field
The present invention relates to the technical field of image processing, and more particularly to an image classification method, a convolutional neural network generation method for classifying images, a convolutional neural network generation method for recognizing text in images, a mobile terminal, and a computing device.
Background art
With the continuous development of hardware technology, more and more people use mobile terminals such as smartphones and tablet computers to take and store photographs, recording precious moments. As the number of photos saved on a mobile terminal grows, the photos become numerous and fall into many different categories, and users often cannot find a particular photo in time, which leads to a poor experience.
Existing image classification algorithms typically divide the images in a mobile terminal's album into categories and manage the images by category, but perform no further processing. Although this approach makes it convenient to look up images by category, it cannot quickly locate an image containing specific information. For example, a user may remember some text that appeared in part of an image but have forgotten its other content; in that case the image's category alone makes it difficult to quickly and accurately retrieve the required image and the information it contains. It is therefore desirable to provide a new image classification method that improves on this process.
Summary of the invention
To this end, the present invention provides an image classification scheme, together with a convolutional neural network generation scheme for classifying images and a convolutional neural network generation scheme for recognizing text in images, in an effort to solve, or at least alleviate, the problems described above.
According to one aspect of the present invention, an image classification method is provided, suitable for execution on a mobile terminal. The mobile terminal includes an image library in which multiple images are stored. The method comprises the following steps: first, for each image in the image library, performing classification processing on the image to obtain its corresponding category; if the category is the text class, performing text recognition on the image to extract the text information it contains; and storing the text information in association with the image's storage path and image name.
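As a rough illustration of the claimed flow, the sketch below chains the three steps (classify, recognize text, associate) over a toy in-memory image library. The helper functions `classify_image` and `recognize_text` are hypothetical stand-ins for the two trained convolutional neural networks described later, and the dictionary-based association store is likewise only an assumption for illustration.

```python
def classify_image(image):
    # Hypothetical stand-in for the trained first CNN: returns a category label.
    return "text" if image.get("has_text") else "landscape"

def recognize_text(image):
    # Hypothetical stand-in for the trained second CNN (per-character OCR).
    return image.get("text", "")

def process_library(image_library):
    """For each image: classify; if it is text-class, recognize its text and
    associate that text with the image's storage path and name."""
    index = {}
    for image in image_library:
        if classify_image(image) == "text":
            index[recognize_text(image)] = (image["path"], image["name"])
    return index

library = [
    {"name": "M1.jpg", "path": "/album/M1.jpg", "has_text": True,
     "text": "receipt total 42"},
    {"name": "M2.jpg", "path": "/album/M2.jpg", "has_text": False},
]
print(process_library(library))
# {'receipt total 42': ('/album/M1.jpg', 'M1.jpg')}
```

Only the text-class image M1 ends up in the association store; the landscape image M2 is classified but not indexed, matching the conditional step in the claim.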
Optionally, in the image classification method according to the present invention, when a search term entered by the user is received, the method further comprises: searching for identical or similar text information according to the search term; if such text information exists, obtaining the image storage path associated with it; and finding the corresponding image according to the image storage path and displaying the image and the text information to the user.
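The "same or similar" lookup in this clause can be approximated very simply. In the sketch below, similarity is reduced to an exact-or-substring test over the association store; a real implementation might use edit distance or another fuzzy string metric, and both the index layout and the `search` helper are assumptions for illustration only.

```python
def search(index, term):
    """Return (text, path) pairs whose stored text equals or contains the term."""
    # "Same or similar" is simplified here to an exact-or-substring test.
    return [(text, path) for text, (path, name) in sorted(index.items())
            if term == text or term in text]

index = {
    "receipt total 42": ("/album/M1.jpg", "M1.jpg"),
    "boarding pass LAX": ("/album/M7.jpg", "M7.jpg"),
}
print(search(index, "receipt"))  # hit via substring match
print(search(index, "hotel"))    # no associated text, so an empty result
```

A hit returns both the recognized text and the storage path, which is exactly the pair the method needs in order to load and display the image to the user.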
Optionally, in the image classification method according to the present invention, the step of performing text recognition on the image to extract the text information it contains comprises: obtaining the character image region corresponding to each individual character contained in the image; performing text recognition on each character image region to determine the character it contains; and generating the text information corresponding to the image based on the characters.
Optionally, in the image classification method according to the present invention, the step of generating the text information corresponding to the image based on the characters comprises: obtaining the positional relationships between the character image regions in the image; and combining the characters corresponding to the character image regions according to the positional relationships, to generate the text information corresponding to the image.
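One plausible reading of "combining according to positional relationships" is a reading-order sort: order the character regions top-to-bottom, then left-to-right, and concatenate. The coordinate convention below (each region as an `(x, y, char)` tuple) is an assumption, not something the text specifies.

```python
def combine_characters(regions):
    """regions: (x, y, char) tuples, one per recognized character image region.
    Order top-to-bottom by y, then left-to-right by x, and join the characters."""
    ordered = sorted(regions, key=lambda r: (r[1], r[0]))
    return "".join(ch for _x, _y, ch in ordered)

# Three characters on one line plus one on the next line.
regions = [(30, 0, "c"), (0, 0, "a"), (15, 0, "b"), (0, 20, "d")]
print(combine_characters(regions))  # "abcd"
```

Because Python's sort is stable and the key is a (row, column) tuple, characters on the same line stay in left-to-right order while lines are emitted top-to-bottom.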
Optionally, in the image classification method according to the present invention, a trained first convolutional neural network for classifying images is stored on the mobile terminal, and the step of performing classification processing on the image to obtain its corresponding category comprises: inputting the image into the trained first convolutional neural network for image classification; and determining the category of the image according to the output of the first convolutional neural network.
Optionally, in the image classification method according to the present invention, a trained second convolutional neural network for recognizing text in images is stored on the mobile terminal, and the step of performing text recognition on the image to extract the text information it contains comprises: obtaining the character image region corresponding to each individual character contained in the image; inputting each character image region into the trained second convolutional neural network for text recognition, and determining the character contained in each character image region according to the output of the second convolutional neural network; and generating the text information corresponding to the image based on the characters.
Optionally, in the image classification method according to the present invention, the trained first convolutional neural network is obtained as follows: building a process block, the process block including a convolutional layer; building a pooling layer, a fully connected layer, and a classifier; building the first convolutional neural network from multiple process blocks and the pooling layer, combined with the fully connected layer and the classifier, the first convolutional neural network taking a process block as its input and the classifier as its output; and training the first convolutional neural network on a pre-obtained image category data set so that the output of the classifier indicates the category of the input image. The image category data set includes multiple items of image category information, each item comprising a first image that meets a preset size and the category information corresponding to that first image.
Optionally, in the image classification method according to the present invention, the trained second convolutional neural network is obtained as follows: building a first process block, the first process block including a first convolutional layer; building a second processing block, the second processing block including a first fully connected layer; building a first pooling layer, a second fully connected layer, and a first classifier; building the second convolutional neural network from one or more first process blocks, the first pooling layer, and the second processing block, combined with the second fully connected layer and the first classifier, the second convolutional neural network taking a first process block as its input and the first classifier as its output; and training the second convolutional neural network on a pre-obtained text image data set so that the output of the first classifier indicates the character contained in the input image. The text image data set includes multiple items of character image information, each item comprising a character image that meets a first preset size and the text information contained in that character image.
According to another aspect of the present invention, a mobile terminal is provided, comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the image classification method according to the present invention.
According to another aspect of the present invention, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by a mobile terminal, cause the mobile terminal to execute the image classification method according to the present invention.
According to another aspect of the present invention, a convolutional neural network generation method for classifying images is provided, suitable for execution in a computing device. The method comprises the following steps: first, building a process block, the process block including a convolutional layer; building a pooling layer, a fully connected layer, and a classifier; building the convolutional neural network from multiple process blocks and the pooling layer, combined with the fully connected layer and the classifier, the convolutional neural network taking a process block as its input and the classifier as its output; and training the convolutional neural network on a pre-obtained image category data set so that the output of the classifier indicates the category of the input image. The image category data set includes multiple items of image category information, each item comprising a first image that meets a preset size and the category information corresponding to that first image.
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the step of building a process block further comprises: building an activation layer; and adding the activation layer after the convolutional layer to form the process block.
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the pooling layer is either a max-pooling layer or an average-pooling layer.
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the step of building the convolutional neural network from the multiple process blocks and the pooling layer combined with the fully connected layer and the classifier comprises: connecting the process blocks and the max-pooling layers according to a preset concatenation rule, then connecting the average-pooling layer; and adding the sequentially connected fully connected layer and classifier after the average-pooling layer, to build the convolutional neural network that takes a process block as its input and the classifier as its output.
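The concatenation rule in this clause can be written out symbolically. The sketch below emits the layer sequence for the counts stated later in the text (3 process blocks, 2 max-pooling layers, 1 average-pooling layer); the layers are plain strings, and placing the max-pooling layers between consecutive process blocks is an assumption consistent with those counts, not a detail the text fixes.

```python
def build_classifier_layers(num_blocks=3):
    """Emit the layer sequence implied by the concatenation rule: process
    blocks (conv + activation) joined by max-pooling, then an average-pooling
    layer, a fully connected layer, and a classifier."""
    layers = []
    for i in range(num_blocks):
        layers += ["conv", "relu"]      # one process block (activation assumed ReLU)
        if i < num_blocks - 1:
            layers.append("max_pool")   # assumed placement: between blocks
    layers += ["avg_pool", "fc", "softmax"]
    return layers

print(build_classifier_layers())
```

With the default of 3 process blocks this yields exactly 2 max-pooling layers and 1 average-pooling layer, matching the quantities given in the later optional clauses.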
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the step of training the convolutional neural network on the pre-obtained image category data set so that the output of the classifier indicates the category of the input image comprises: for each extracted item of image category information, taking the first image included in that item as the input of the first process block of the convolutional neural network, taking the category information included in that item as the output of the classifier, and training the convolutional neural network accordingly.
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the number of process blocks is 3.
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the number of max-pooling layers is 2 and the number of average-pooling layers is 1.
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the category information is any one of an animal class, a building class, a landscape class, a person class, a text class, and the like.
Optionally, in the convolutional neural network generation method for classifying images according to the present invention, the method further comprises the step of generating the image category data set in advance, which comprises: performing image processing on each pending picture to obtain the first image, meeting the preset size, that corresponds to that pending picture; for each first image that meets the preset size, obtaining the category information associated with its corresponding pending picture, and generating the corresponding image category information from the category information and the first image; and collecting the items of image category information to form the image category data set.
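This dataset-generation step amounts to a resize-and-label loop. In the sketch below, pictures are plain dictionaries, `resize` is a placeholder for the real image processing, and the 32x32 preset size is an arbitrary assumption, since the text only requires that each first image meet "a preset size".

```python
def make_category_dataset(pending_pictures, preset_size=(32, 32)):
    """Resize each pending picture to the preset size, pair the resulting
    first image with its associated category, and collect the pairs."""
    def resize(picture, size):
        # Placeholder for the real image processing (scale/crop to size).
        return {"name": picture["name"], "size": size}

    return [{"image": resize(p, preset_size), "category": p["category"]}
            for p in pending_pictures]

pictures = [{"name": "M1.jpg", "category": "text"},
            {"name": "M2.jpg", "category": "landscape"}]
for item in make_category_dataset(pictures):
    print(item["image"]["name"], item["category"])
```

Each output item carries both halves the clause requires: a size-normalized first image and the category information of its source picture.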
According to another aspect of the present invention, a convolutional neural network generation method for recognizing text in images is provided, suitable for execution in a computing device. The method comprises the following steps: first, building a first process block, the first process block including a first convolutional layer; building a second processing block, the second processing block including a first fully connected layer; building a first pooling layer, a second fully connected layer, and a first classifier; building the convolutional neural network from one or more first process blocks, the first pooling layer, and the second processing block, combined with the second fully connected layer and the first classifier, the convolutional neural network taking a first process block as its input and the first classifier as its output; and training the convolutional neural network on a pre-obtained text image data set so that the output of the first classifier indicates the character contained in the input image. The text image data set includes multiple items of character image information, each item comprising a character image that meets a first preset size and the text information contained in that character image.
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the step of building a first process block further comprises: building a first activation layer; and adding the first activation layer after the first convolutional layer to form the first process block.
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the step of building the second processing block further comprises: building a second activation layer; and adding the second activation layer after the first fully connected layer to form the second processing block.
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the first pooling layer is a max-pooling layer.
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the step of building the convolutional neural network from the one or more first process blocks, the first pooling layer, and the second processing block combined with the second fully connected layer and the first classifier comprises: connecting the first process blocks, the first pooling layers, and the second processing block according to a preset first concatenation rule, then connecting the second fully connected layer; and adding the first classifier after the second fully connected layer, to build the convolutional neural network that takes a first process block as its input and the first classifier as its output.
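The second network's structure can be sketched the same way as the first. The counts stated later (5 first process blocks, 3 first pooling layers, 1 second processing block) are reproduced below; the exact interleaving of the pooling layers among the process blocks is an assumption chosen to be consistent with those counts, since the text leaves the first concatenation rule unspecified.

```python
def build_ocr_layers():
    """Layer sequence matching the stated counts: five first process blocks
    (conv + activation), three max-pooling layers, one second processing block
    (fully connected + activation), then the second FC layer and classifier."""
    layers = []
    for i in range(5):
        layers += ["conv", "relu"]      # a first process block
        if i in (0, 2, 4):              # assumed placement of the 3 pooling layers
            layers.append("max_pool")
    layers += ["fc", "relu"]            # second processing block
    layers += ["fc", "softmax"]         # second FC layer + first classifier
    return layers

print(build_ocr_layers())
```

The sequence ends in two fully connected layers, the first of which (the second processing block) carries its own activation, followed by the classifier, mirroring the clause's connection order.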
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the step of training the convolutional neural network on the pre-obtained text image data set so that the output of the first classifier indicates the character contained in the input image comprises: for each extracted item of character image information, taking the character image included in that item as the input of the first first-process-block of the convolutional neural network, taking the text information included in that item as the output of the first classifier, and training the convolutional neural network accordingly.
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the number of first process blocks is 5, the number of second processing blocks is 1, and the number of first pooling layers is 3.
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the text information is a single character, and the single character is any one of a numeric character, an alphabetic character, and a Chinese character.
Optionally, in the convolutional neural network generation method for recognizing text in images according to the present invention, the method further comprises the step of generating the text image data set in advance, which comprises: performing image processing on each pending character picture to obtain the character image, meeting the first preset size, that corresponds to that pending character picture; for each character image, obtaining the text information associated with its corresponding pending character picture, and generating the corresponding character image information from the text information and the character image; and collecting the items of character image information to form the text image data set.
According to another aspect of the present invention, a computing device is provided, comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the convolutional neural network generation method for classifying images and/or the convolutional neural network generation method for recognizing text in images according to the present invention.
According to another aspect of the present invention, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by a computing device, cause the computing device to execute the convolutional neural network generation method for classifying images and/or the convolutional neural network generation method for recognizing text in images according to the present invention.
According to the image classification method of the present invention, for each image in the image library, the image is first classified to obtain its corresponding category; if the category is the text class, text recognition is performed on the image to extract the text information it contains, and the text information is stored in association with the image's storage path and image name. In the above scheme, when a search term entered by the user is received, identical or similar text information is searched for according to the search term; if such information exists, the image storage path associated with it is obtained, the corresponding image is found according to the image storage path, and the image and the text information are displayed to the user. This enables quick and accurate location of the image the user needs, greatly facilitates searching when the user only vaguely remembers the image's content, and improves the user experience. In addition, a trained first convolutional neural network is used to classify the images, and a trained second convolutional neural network is used to recognize the text in the images. Both networks have small network structures, so image classification and text recognition are realized by compact yet capable neural networks and can be performed on a mobile phone or other small edge device. No communication with a server is needed at run time and nothing is uploaded to the cloud, which avoids dependence on communication networks such as 4G, improves availability when there is no network or only a weak network, and, since no large-scale computing service is required, also reduces the corresponding operation and maintenance costs.
According to the convolutional neural network generation method for classifying images of the present invention, the convolutional neural network is a compact neural network whose structure is realized by stacking the process blocks and max-pooling layers according to a preset concatenation rule and then connecting, in order, an average-pooling layer, a fully connected layer, and a classifier. This ensures that the extracted features are substantially better than hand-designed features, markedly improving recognition accuracy and thereby greatly reducing the error rate. In addition to the convolutional layer, an activation layer may be added to each process block to alleviate overfitting. After training of the convolutional neural network is completed, the trained network can be transplanted to a mobile terminal and applied as an image classification model.
According to the convolutional neural network generation method for recognizing text in images of the present invention, the network's structure is realized by connecting the first process blocks, the first pooling layers, and the second processing block according to a preset first concatenation rule and then connecting the second fully connected layer and the first classifier. This likewise ensures that the extracted features carry rich image information, which helps to improve recognition accuracy. A first activation layer may be added to each first process block and a second activation layer to the second processing block, to alleviate overfitting. After training of the convolutional neural network is completed, the trained network can be transplanted to a mobile terminal and applied as a text recognition model.
Description of the drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and the drawings. These aspects are indicative of the various ways in which the principles disclosed herein may be practiced, and all such aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the present disclosure will become more apparent upon reading the following detailed description in conjunction with the accompanying drawings. Throughout the disclosure, like reference numerals generally refer to like parts or elements.
Fig. 1 shows a schematic diagram of a mobile terminal 100 according to an embodiment of the invention;
Fig. 2 shows a flow chart of an image classification method 200 according to an embodiment of the invention;
Fig. 3 shows a schematic structural diagram of a process block according to an embodiment of the invention;
Fig. 4 shows a schematic structural diagram of a first convolutional neural network according to an embodiment of the invention;
Fig. 5A shows a schematic structural diagram of a first process block according to an embodiment of the invention;
Fig. 5B shows a schematic structural diagram of a second processing block according to an embodiment of the invention;
Fig. 6 shows a schematic structural diagram of a second convolutional neural network according to an embodiment of the invention;
Fig. 7 shows a schematic diagram of a computing device 700 according to an embodiment of the invention;
Fig. 8 shows a flow chart of a convolutional neural network generation method 800 for classifying images according to an embodiment of the invention; and
Fig. 9 shows a flow chart of a convolutional neural network generation method 900 for recognizing text in images according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood and its scope fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of the mobile terminal 100. The mobile terminal 100 may include a memory interface 102, one or more data processors, image processors and/or central processing units 104, and a peripheral interface 106.
The memory interface 102, the one or more processors 104, and/or the peripheral interface 106 may be discrete components or may be integrated in one or more integrated circuits. In the mobile terminal 100, the various elements may be coupled by one or more communication buses or signal lines. Sensors, devices, and subsystems may be coupled to the peripheral interface 106 to help realize a variety of functions.
For example, a motion sensor 110, a light sensor 112, and a distance sensor 114 may be coupled to the peripheral interface 106 to facilitate functions such as orientation, illumination, and ranging. Other sensors 116 may likewise be connected to the peripheral interface 106, such as a positioning system (e.g., a GPS receiver), a temperature sensor, a biometric sensor, or other sensor devices, thereby helping to implement related functions.
A camera subsystem 120 and an optical sensor 122 may be used to facilitate camera functions such as recording photographs and video clips, where the optical sensor may be, for example, a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) optical sensor. Communication functions may be facilitated by one or more wireless communication subsystems 124, which may include radio-frequency receivers and transmitters and/or optical (e.g., infrared) receivers and transmitters. The particular design and implementation of the wireless communication subsystem 124 may depend on the one or more communication networks supported by the mobile terminal 100. For example, the mobile terminal 100 may include a communication subsystem 124 designed to support LTE, 3G, GSM, GPRS, EDGE, Wi-Fi or WiMax, and Bluetooth™ networks.
An audio subsystem 126 may be coupled with a speaker 128 and a microphone 130 to help implement voice-enabled functions such as speech recognition, speech reproduction, digital recording, and telephony. An I/O subsystem 140 may include a touch-screen controller 142 and/or one or more other input controllers 144. The touch-screen controller 142 may be coupled to a touch screen 146. For example, the touch screen 146 and the touch-screen controller 142 may detect contact and its movement or pause using any of a variety of touch-sensing technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies. The one or more other input controllers 144 may be coupled to other input/control devices 148, such as one or more buttons, rocker switches, thumb wheels, infrared ports, USB ports, and/or pointer devices such as a stylus. The one or more buttons (not shown) may include up/down buttons for controlling the volume of the speaker 128 and/or the microphone 130.
The memory interface 102 can be coupled with the memory 150. The memory 150 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g. NAND, NOR). The memory 150 can store an operating system 172, for example an operating system such as Android, iOS or Windows Phone. The operating system 172 may include instructions for handling basic system services and for performing hardware-dependent tasks. The memory 150 can also store programs 174. When the mobile device runs, the operating system 172 is loaded from the memory 150 and executed by the processor 104. The programs 174, when run, are likewise loaded from the memory 150 and executed by the processor 104. The programs 174 run on top of the operating system, and use the interfaces provided by the operating system and the underlying hardware to implement various functions desired by the user, such as instant messaging, web browsing and picture management. A program 174 may be provided independently of the operating system or may be bundled with it. In addition, when a program 174 is installed in the mobile terminal 100, a driver module may be added to the operating system. In some embodiments, the mobile terminal 100 is configured to execute the image classification method according to the present invention, and the one or more programs 174 of the mobile terminal 100 include instructions for executing the image classification method 200 according to the present invention.
Fig. 2 shows a flow chart of an image classification method 200 according to an embodiment of the invention. The image classification method 200 is suitable for execution in a mobile terminal (such as the mobile terminal 100 shown in Fig. 1), where the mobile terminal 100 includes an image library in which multiple images are stored. According to one embodiment of the present invention, the image library of the mobile terminal 100 can be understood as an album; the images stored in the album may be photos shot by the user through the camera of the mobile terminal 100, or pictures formed through other means, such as screenshots or saving images from the page currently displayed on the screen of the mobile terminal 100, and the invention is not limited in this respect. According to one embodiment of the present invention, ten images are stored in the image library of the mobile terminal 100, denoted M1, M2, ..., M10 respectively. For ease of description, the method 200 is described below taking the image M1 as an example.
The method 200 starts at step S210, in which each image in the image library is subjected to classification processing to obtain its corresponding category. According to one embodiment of the present invention, a trained first convolutional neural network for classifying images is stored in the mobile terminal 100, and an image can be classified in the following way to obtain its corresponding image type: first, the image is input into the trained first convolutional neural network for image classification, and the category of the image is then determined according to the output of the first convolutional neural network. For ease of understanding, the process of obtaining the trained first convolutional neural network is explained first.
Specifically, a process block is first constructed, where the process block includes a convolutional layer. In view of controlling over-fitting, according to one embodiment of the present invention an activation layer can also be constructed when building the process block, and the activation layer is added after the convolutional layer to form the process block. Fig. 3 shows a schematic structural diagram of a process block according to an embodiment of the invention. As shown in Fig. 3, the process block includes a convolutional layer and an activation layer connected in sequence. In this embodiment, the ReLU (Rectified Linear Unit) function is used as the activation function of the activation layer, so as to adjust the output of the convolutional layer and prevent each layer's output from being a mere linear combination of the previous layer's output, which would make it impossible to approximate arbitrary functions.
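The ReLU activation named above has a very compact definition; the following lines are a minimal illustration (not the patent's implementation), applied element-wise to a feature map:

```python
def relu(x):
    """Rectified Linear Unit: pass positive values through, zero out the rest."""
    return max(0.0, x)

# Applied element-wise, ReLU supplies the non-linearity that keeps each layer's
# output from collapsing into a linear combination of the previous layer.
feature_row = [-2.0, -0.5, 0.0, 1.5, 3.0]
activated = [relu(v) for v in feature_row]
print(activated)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```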
After the construction of the process block is completed, a pooling layer, a fully connected layer and a classifier are constructed respectively. According to one embodiment of the present invention, the pooling layer is either a max pooling layer or an average pooling layer. Pooling exploits the principle of local correlation in images and sub-samples the image, so as to reduce the amount of data to be processed while retaining useful information.
Next, the first convolutional neural network is built from multiple process blocks and pooling layers, combined with the fully connected layer and the classifier; the first convolutional neural network takes a process block as its input and the classifier as its output. According to one embodiment of the present invention, the first convolutional neural network can be built in the following way: first, according to a preset connection rule, the process blocks and the max pooling layers are connected, after which an average pooling layer is connected, and a fully connected layer and a classifier connected in sequence are then added after the average pooling layer, so as to build the first convolutional neural network with a process block as input and the classifier as output. Here, the number of process blocks is 3, the number of max pooling layers is 2, and the number of average pooling layers is 1.
In this embodiment, the 3 process blocks are connected with the 2 max pooling layers according to the preset connection rule, 1 average pooling layer is connected afterwards, and the fully connected layer and the classifier connected in sequence are added after the average pooling layer, so that a first convolutional neural network with 1 process block as input and the classifier as output is constructed. Fig. 4 shows a schematic structural diagram of the first convolutional neural network according to an embodiment of the invention. As shown in Fig. 4, in the first convolutional neural network, the process block A1 is the input, followed in sequence by max pooling layer B1, process block A2, process block A3, max pooling layer B2, average pooling layer C1, fully connected layer D1 and classifier E1, where the classifier E1 is the output. The connection order of the processing units shown in Fig. 4 is as arranged according to the preset connection rule. The preset connection rule may be adjusted appropriately according to the practical application scenario, the network training situation, the system configuration, performance requirements and so on; such adjustments are readily apparent to those skilled in the art who understand the present solution, also fall within the protection scope of the present invention, and are not repeated here.
After the first convolutional neural network is built, its training begins. The first convolutional neural network is trained according to a pre-obtained image category data set, so that the output of the classifier indicates the category corresponding to the input image. The image category data set includes multiple items of image category information, each of which includes a first image meeting a preset size and the category information corresponding to that first image. According to one embodiment of the present invention, the first convolutional neural network can be trained in the following way: for each extracted item of image category information, the network is trained with the first image included in the item as the input of the first process block of the first convolutional neural network, and the category information included in the item as the output of the classifier. Here, the preset size is preferably 220px × 220px, the first image is an RGB three-channel image, and the corresponding category information is any one of the animal, building, physical-object, landscape, person and text classes.
The training process of the first convolutional neural network is explained below taking one item of image category information X in the image category data set as an example. The image category information X includes a first image X1 and the category information X2 corresponding to that first image; the size of the first image X1 is 220px × 220px, and the category information X2 is the text class. During training, the first convolutional neural network is trained with the first image X1 as the input of process block A1 and the category information X2 as the output of classifier E1.
Table 1 shows an example of the parameter settings of process blocks A1–A3 according to an embodiment of the invention, and Table 2 shows an example of the parameter settings of max pooling layers B1–B2 and average pooling layer C1 according to an embodiment of the invention. For the value of the boundary zero-padding parameter in Table 1, "0" indicates that no boundary zero-padding is performed, while "1" indicates that each row and each column within 1 pixel unit of the edge of the image input to the convolutional layer is filled with 0. Unless otherwise specified, the content relating to boundary zero-padding below follows this description. The contents of Tables 1 and 2 are respectively as follows:
Processing unit  | Kernel size | Boundary zero-padding | Stride | Number of kernels
Process block A1 | 5×5         | 0                     | 4      | 45
Process block A2 | 1×1         | 0                     | 1      | 45
Process block A3 | 3×3         | 1                     | 1      | 100

Table 1
Processing unit          | Pooling block size | Stride
Max pooling layer B1     | 3×3                | 2
Max pooling layer B2     | 3×3                | 2
Average pooling layer C1 | 4×4                | 2

Table 2
The parameters of process blocks A1–A3 are set with reference to Table 1, the parameters of max pooling layers B1–B2 and average pooling layer C1 are set with reference to Table 2, and the first image X1 is processed based on these parameters. Specifically, the first image X1 is first input into process block A1; the first image X1 is an RGB three-channel image of size 220px × 220px. The convolutional layer in process block A1 has 45 kernels, each with 5 × 5 × 3 parameters, which is equivalent to 45 kernels of size 5 × 5 each convolving over the 3 channels with a stride of 4. After the convolution of this layer, it follows from ⌊(220 − 5)/4⌋ + 1 = 54 that the resulting image size is 54px × 54px, i.e. 45 feature maps of size 54px × 54px are obtained, where ⌊·⌋ denotes rounding down. Since the three channels are combined during the convolution in this layer, the input of the activation layer in process block A1 is the 45 single-channel images of 54px × 54px, and after processing by the activation layer, the output of process block A1 is 45 feature maps of 54px × 54px.
The data then enters max pooling layer B1. Max pooling layer B1 uses overlapping max pooling: the 54px × 54px feature map is divided into blocks of size 3 × 3 with a stride of 2, and the maximum value of each block is taken as the pixel value of the pooled image. From ⌊(54 − 3)/2⌋ + 1 = 26 it follows that the feature map size after pooling is 26px × 26px, so 45 feature maps of 26px × 26px are obtained after max pooling layer B1.
Next, the 45 feature maps of 26px × 26px output by max pooling layer B1 are input into process block A2. The convolutional layer in process block A2 has 45 kernels, each with 1 × 1 parameters, equivalent to 45 kernels of size 1 × 1 convolving with a stride of 1. From ⌊(26 − 1)/1⌋ + 1 = 26 it follows that the resulting image size is 26px × 26px, i.e. 45 feature maps of size 26px × 26px are obtained. After processing by the activation layer in process block A2, the output of process block A2 is 45 feature maps of 26px × 26px. These 45 feature maps of 26px × 26px are then input into process block A3, whose convolutional layer has 100 kernels, each with 3 × 3 parameters, equivalent to 100 kernels of size 3 × 3 convolving with a stride of 1. Each row and column within 1 pixel unit of the edge of the input feature maps is filled with 0, and after the convolution of this layer, it follows from ⌊(26 − 3 + 2)/1⌋ + 1 = 26 that the resulting image size is 26px × 26px, i.e. 100 feature maps of size 26px × 26px are obtained. After processing by the activation layer in process block A3, the output of process block A3 is 100 feature maps of 26px × 26px.
At this point, after the 100 feature maps of 26px × 26px output by process block A3 are processed by max pooling layer B2, it follows from ⌊(26 − 3)/2⌋ + 1 = 12 that the output of max pooling layer B2 is 100 feature maps of 12px × 12px. These 100 feature maps of 12px × 12px serve as the input of average pooling layer C1. Average pooling layer C1 uses overlapping average pooling: the 12px × 12px feature map is divided into blocks of size 4 × 4 with a stride of 2, and the average value of each block is taken as the pixel value of the pooled image. From ⌊(12 − 4)/2⌋ + 1 = 5 it follows that the feature map size after pooling is 5px × 5px, so 100 feature maps of 5px × 5px are obtained after average pooling layer C1. The data then enters fully connected layer D1. Since identifying the category of an image is a multi-class problem, and in this embodiment the image category is any one of the six classes of animal, building, physical object, landscape, person and text, the output of fully connected layer D1 also has 6 units, corresponding respectively to the probabilities of the 6 categories occurring. A softmax classifier is chosen as classifier E1, and its output is the category with the largest probability, which is the category information X2 corresponding to the first image X1. The softmax classifier is a mature technical means and is not elaborated here. In order to train the first convolutional neural network, the output of classifier E1 is adjusted according to the expected result that the category information X2 corresponding to the input first image X1 is the text class, and the error is back-propagated by the method of error minimization so as to adjust each parameter in the first convolutional neural network. After training on a large number of items of image category information in the image category data set, the trained first convolutional neural network is obtained.
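The feature-map sizes quoted in the walk-through above all follow from the standard convolution/pooling output-size formula ⌊(W − K + 2P)/S⌋ + 1. A short sketch (illustrative only; the layer parameters are those of Tables 1 and 2) traces a 220px input through the whole first network:

```python
def out_size(w, k, pad, stride):
    """Output side length of a convolution/pooling layer: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * pad) // stride + 1

# Trace a 220x220 input through the first convolutional neural network.
s = 220
s = out_size(s, 5, 0, 4)   # process block A1     -> 54
s = out_size(s, 3, 0, 2)   # max pooling B1       -> 26
s = out_size(s, 1, 0, 1)   # process block A2     -> 26
s = out_size(s, 3, 1, 1)   # process block A3     -> 26
s = out_size(s, 3, 0, 2)   # max pooling B2       -> 12
s = out_size(s, 4, 0, 2)   # average pooling C1   -> 5
print(s)  # 5
```

Each intermediate value matches the sizes derived in the text (54, 26, 26, 26, 12, 5).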
In addition, the image category data set used for training the first convolutional neural network needs to be generated in advance. According to another embodiment of the present invention, the image category data set can be generated in advance in the following way. First, image processing is performed on each pending picture to obtain the first image, meeting the preset size, corresponding to that picture. Here the preset size is 220px × 220px; when a pending picture is processed, the shortest side of the picture is typically taken as the reference and adjusted to 224px. For example, a pending picture of 112px × 200px is adjusted to a size of 224px × 400px, and a region is then cut from the middle of the adjusted picture to obtain the corresponding first image of size 220px × 220px. After the first image meeting the preset size is obtained for each pending picture, the category information associated with the corresponding pending picture is obtained for each first image, the corresponding image category information is generated from the category information and the first image, and the items of image category information are collected to form the image category data set.
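The resize-then-center-crop preprocessing described above can be sketched as follows. This is a simplified sketch that only computes the resized dimensions and crop offsets; the function name and return format are illustrative, not taken from the patent:

```python
def preprocess_dims(w, h, resize_short=224, crop=220):
    """Scale so the shortest side equals resize_short, then center-crop a crop x crop region.
    Returns the resized dimensions and the (left, top) offsets of the crop box."""
    scale = resize_short / min(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top)

# The example from the text: a 112x200 pending picture is first adjusted to
# 224x400, and a 220x220 region is then cut from the middle.
dims, offsets = preprocess_dims(112, 200)
print(dims, offsets)  # (224, 400) (2, 90)
```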
On this basis, according to one embodiment of the present invention, the image M1 is input into the trained first convolutional neural network for image classification. In the trained first convolutional neural network, the output of classifier E1 is 6 probability values; the largest probability value is 0.77 and is the 6th output of classifier E1, whose corresponding category is the text class, so it can be determined that the category corresponding to image M1 is the text class.
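The final classification step, selecting the class whose softmax probability is largest, can be illustrated as below. The class names follow the six categories listed earlier, but the logit values are invented for illustration, arranged so that the text class wins as in the M1 example:

```python
import math

def softmax(logits):
    """Numerically stable softmax: exponentiate shifted logits and normalize to probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["animal", "building", "object", "landscape", "person", "text"]
logits = [0.1, -1.2, 0.3, -0.5, 0.0, 2.1]  # hypothetical fully-connected-layer outputs
probs = softmax(logits)
predicted = classes[probs.index(max(probs))]
print(predicted)  # text
```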
In turn, step S220 is executed: if the category is the text class, character recognition is performed on the image to extract the text information that the image contains. According to one embodiment of the present invention, character recognition can be performed on the image in the following way to extract the text information it contains. First, the character image region corresponding to each single character contained in the image is obtained; character recognition is then performed on each character image region to determine the character each region contains; and finally, the text information corresponding to the image is generated based on the characters.
Thus, since it is known from step S210 that the category of image M1 is the text class, the character image region corresponding to each single character contained in image M1 is first obtained, and character recognition is then performed on each character image region to determine the character it contains. In this embodiment, a trained second convolutional neural network for recognizing characters in images is stored in the mobile terminal 100, and character recognition can be performed on image M1 in the following way to extract the text information it contains. First, the character image region corresponding to each single character contained in image M1 is obtained; each character image region is input into the trained second convolutional neural network for character recognition; the character contained in each character image region is then determined according to the output of the second convolutional neural network; and the text information corresponding to the image is then generated based on the characters.
Since image M1 contains 3 single characters, the character image regions corresponding to these 3 characters are denoted Q1, Q2 and Q3, and character recognition needs to be performed on character image regions Q1, Q2 and Q3 respectively to determine the characters they contain. The character recognition process is explained below taking character image region Q1 as an example. Of course, for ease of understanding, the process of obtaining the trained second convolutional neural network is explained first.
Specifically, a first process block is first constructed, where the first process block includes a first convolutional layer. In view of controlling over-fitting, according to one embodiment of the present invention a first activation layer can also be constructed when building the first process block, and the first activation layer is added after the first convolutional layer to form the first process block. Fig. 5A shows a schematic structural diagram of the first process block according to an embodiment of the invention. As shown in Fig. 5A, the first process block includes a first convolutional layer and a first activation layer connected in sequence. In this embodiment, the ReLU (Rectified Linear Unit) function is used as the activation function of the first activation layer, so as to adjust the output of the first convolutional layer and prevent each layer's output from being a mere linear combination of the previous layer's output, which would make it impossible to approximate arbitrary functions.
A second process block is then constructed, where the second process block includes a first fully connected layer. In view of controlling over-fitting, according to one embodiment of the present invention a second activation layer can also be constructed when building the second process block, and the second activation layer is added after the first fully connected layer to form the second process block. Fig. 5B shows a schematic structural diagram of the second process block according to an embodiment of the invention. As shown in Fig. 5B, the second process block includes a first fully connected layer and a second activation layer connected in sequence. In this embodiment, the ReLU (Rectified Linear Unit) function is used as the activation function of the second activation layer, so as to adjust the output of the first fully connected layer and prevent each layer's output from being a mere linear combination of the previous layer's output, which would make it impossible to approximate arbitrary functions.
After the construction of the first process block and the second process block is completed, a first pooling layer, a second fully connected layer and a first classifier are constructed respectively. According to one embodiment of the present invention, the first pooling layer is a max pooling layer.
Then, the second convolutional neural network is built from one or more first process blocks, first pooling layers and a second process block, combined with the second fully connected layer and the first classifier; the second convolutional neural network takes a first process block as its input and the first classifier as its output. According to one embodiment of the present invention, the second convolutional neural network can be built in the following way. First, according to a preset first connection rule, the first process blocks and the first pooling layers are connected with the second process block, the second fully connected layer is then connected, and the first classifier is added after the second fully connected layer, so as to build the second convolutional neural network with a first process block as input and the first classifier as output. Here, the number of first process blocks is 5, the number of second process blocks is 1, and the number of first pooling layers is 3.
In this embodiment, the 5 first process blocks and the 3 first pooling layers are connected with the 1 second process block according to the preset first connection rule, the second fully connected layer is connected afterwards, and the first classifier is added after the second fully connected layer, so that a second convolutional neural network with 1 first process block as input and the first classifier as output is constructed. Fig. 6 shows a schematic structural diagram of the second convolutional neural network according to an embodiment of the invention. As shown in Fig. 6, in the second convolutional neural network, the first process block F1 is the input, followed in sequence by first pooling layer G1, first process block F2, first pooling layer G2, first process blocks F3, F4 and F5, first pooling layer G3, second process block H1, second fully connected layer J1 and first classifier K1, where the first classifier K1 is the output. The connection order of the processing units shown in Fig. 6 is as arranged according to the preset first connection rule. The preset first connection rule may be adjusted appropriately according to the practical application scenario, the network training situation, the system configuration, performance requirements and so on; such adjustments are readily apparent to those skilled in the art who understand the present solution, also fall within the protection scope of the present invention, and are not repeated here.
After the second convolutional neural network is built, its training begins. The second convolutional neural network is trained according to a pre-obtained character image data set, so that the output of the first classifier indicates the character contained in the input image. The character image data set includes multiple items of character image information, each of which includes a character image meeting a first preset size and the character information contained in that character image. According to one embodiment of the present invention, the second convolutional neural network can be trained in the following way: for each extracted item of character image information, the network is trained with the character image included in the item as the input of the first of the first process blocks of the second convolutional neural network, and the character information included in the item as the output of the first classifier. Here, the first preset size is preferably 114px × 114px, the character image is a single-channel image, and the corresponding character information is a single character, which is any one of the numeric, alphabetic and Chinese-character classes. The numeric class includes the 10 digits 0–9; the alphabetic class includes the 26 lowercase English letters a–z and the 26 uppercase English letters A–Z; and the Chinese-character class includes the 3755 level-1 Chinese characters of the GB 2312 (Chinese Character Set Code for Information Interchange) standard. The character information is therefore any one of 10 + 26 × 2 + 3755 = 3817 single characters.
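The size of the classifier's label space follows directly from the three character sets just listed; a trivial check:

```python
digits = 10       # the digits 0-9
letters = 26 * 2  # lowercase a-z plus uppercase A-Z
hanzi = 3755      # level-1 Chinese characters of the GB 2312 standard
vocab_size = digits + letters + hanzi
print(vocab_size)  # 3817
```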
The training process of the second convolutional neural network is explained below taking one item of character image information Y in the character image data set as an example. The character image information Y includes a character image Y1 and the character information Y2 corresponding to that character image; the size of the character image Y1 is 114px × 114px, and the character information Y2 is the Chinese character "勺" ("spoon"). During training, the second convolutional neural network is trained with the character image Y1 as the input of first process block F1 and the character information Y2 as the output of first classifier K1.
Table 3 shows an example of the parameter settings of first process blocks F1–F5 according to an embodiment of the invention, and Table 4 shows an example of the parameter settings of first pooling layers G1–G3 according to an embodiment of the invention. The contents of Tables 3 and 4 are respectively as follows:
Processing unit        | Kernel size | Boundary zero-padding | Stride | Number of kernels
First process block F1 | 11×11       | 0                     | 4      | 96
First process block F2 | 5×5         | 1                     | 1      | 256
First process block F3 | 3×3         | 1                     | 1      | 384
First process block F4 | 3×3         | 1                     | 1      | 384
First process block F5 | 3×3         | 1                     | 1      | 256

Table 3
Processing unit        | Pooling block size | Stride
First pooling layer G1 | 3×3                | 2
First pooling layer G2 | 3×3                | 2
First pooling layer G3 | 3×3                | 2

Table 4
The parameters of first process blocks F1–F5 are set with reference to Table 3, the parameters of first pooling layers G1–G3 are set with reference to Table 4, and the character image Y1 is processed based on these parameters. After the character image Y1 is input into first process block F1 and goes through the relevant processing of the subsequent processing units, the output of first pooling layer G3 is 256 feature maps of 3px × 3px. It should be noted that the processing of images by first process blocks F1–F5 can refer to the processing of process blocks A2 and A3 described above, and the processing of images by first pooling layers G1–G3 can refer to the processing of max pooling layers B1 and B2 described above; they differ only in parameter settings, such as the number and size of kernels, the pooling block size, the stride, and whether boundary zero-padding is applied, so the details are not repeated here.
Next, the output of first pooling layer G3 is input into second process block H1, which includes a first fully connected layer and a second activation layer connected in sequence. After the above 256 feature maps of 3px × 3px enter the first fully connected layer of second process block H1, 4096 feature maps of 1px × 1px are obtained. Since a 1px × 1px feature map contains only a single pixel value, the output of the first fully connected layer can be regarded as a 1 × 4096 feature vector. These 4096 feature maps of 1px × 1px are input into the activation layer in second process block H1, and after processing by that activation layer, the output of second process block H1 is 4096 feature maps of 1px × 1px.
Finally, the data enters second fully connected layer J1; after the output of second process block H1 is processed by second fully connected layer J1, 4096 feature maps of 1px × 1px are obtained. Since recognizing characters is a multi-class problem, and in this embodiment the character information is any one of 3817 single characters, the output of first classifier K1 also has 3817 units, corresponding respectively to the probabilities of the 3817 single characters occurring. A softmax classifier is chosen, and its output is the single character with the largest probability, which is the character information Y2 corresponding to the character image Y1. In order to train the second convolutional neural network, the output of first classifier K1 is adjusted according to the expected result that the character information Y2 corresponding to the input character image Y1 is "勺" ("spoon"), and the error is back-propagated by the method of error minimization so as to adjust each parameter in the second convolutional neural network. After training on a large number of items of character image information in the character image data set, the trained second convolutional neural network is obtained.
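Training by back-propagating the minimized error admits a compact sketch at the output layer: with a softmax classifier and cross-entropy loss, the gradient at each output unit is simply (predicted probability − one-hot target), which is then propagated backwards to adjust earlier parameters. A toy three-class version of that output-layer step (an illustration of the standard technique, not the patent's actual code):

```python
import math

def softmax(logits):
    """Exponentiate shifted logits and normalize to a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy output layer with 3 classes; the true class is index 1.
logits = [1.0, 2.0, 0.5]
target = [0.0, 1.0, 0.0]   # one-hot encoding of the expected result
probs = softmax(logits)

# Gradient of the cross-entropy loss w.r.t. the logits: p - y.
grad = [p - y for p, y in zip(probs, target)]
loss = -math.log(probs[1])
print(grad, loss)
```

The negative gradient at the true class pushes its logit up, while the positive gradients elsewhere push competing logits down; repeating this over a large data set is what "adjusting each parameter by error minimization" amounts to.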
In addition, the character image data set used for training the second convolutional neural network needs to be generated in advance. According to another embodiment of the present invention, the character image data set can be generated in advance in the following way. First, image processing is performed on each pending character picture to obtain the character image, meeting the first preset size, corresponding to that picture. Here the first preset size is 114px × 114px; when a pending character picture is processed, it is typically scaled to the first preset size to form the corresponding character image. Afterwards, the character information associated with the corresponding pending character picture is obtained for each character image, the corresponding character image information is generated from the character information and the character image, and the items of character image information are collected to form the character image data set.
On this basis, according to one embodiment of the present invention, character image region Q1 is input into the trained second convolutional neural network for character recognition. Since the input of the second convolutional neural network is a single-channel image, grayscale processing is usually first performed on character image region Q1, converting the original RGB three-channel image into a grayscale image to generate the corresponding single-channel image, which is then input into the trained second convolutional neural network. Accordingly, after grayscale processing is performed on character image region Q1, its corresponding single-channel image, character image region R1, is obtained. After character image region R1 is processed by the trained second convolutional neural network, the output of first classifier K1 is 3817 probability values; the largest probability value is 0.63 and is the 965th output of first classifier K1, whose corresponding character is "小" ("small"), so it can be determined that the character contained in character image region Q1 is "小". Based on the same processing, it can be determined that the characters contained in character image regions Q2 and Q3 are "芝" and "麻" respectively (together "芝麻" means "sesame").
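The grayscale conversion applied before recognition is commonly a luminance-weighted average of the RGB channels. The patent does not specify the exact formula, so the ITU-R BT.601 weights below are an assumed, conventional choice:

```python
def to_gray(r, g, b):
    """Single luminance value from an RGB triple (assumed BT.601 weights 0.299/0.587/0.114)."""
    return int(0.299 * r + 0.587 * g + 0.114 * b)

# Pure-red, pure-green, and pure-blue pixels:
print(to_gray(255, 0, 0), to_gray(0, 255, 0), to_gray(0, 0, 255))  # 76 149 29
```

Applying this per pixel turns the three-channel region Q1 into the single-channel region R1 expected by the second network.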
After the character contained in each character image region of image M1 has been obtained, the text information corresponding to image M1 needs to be generated from those characters. According to one embodiment of the present invention, this may be done as follows. First, the positional relationships between the character image regions in the image are obtained; then the characters corresponding to the character image regions are combined according to those positional relationships to generate the text information corresponding to the image. In this embodiment, the positional relationships among character image regions Q1, Q2 and Q3 are first obtained. Positional relationships here are not limited to coordinate positions, and may include front-rear relationships, overlay relationships, and so on. Character image regions Q1, Q2 and Q3 are found to be in a side-by-side sequential relationship, and semantic association techniques are then used to combine "small", "sesame" and "seed" into the text information corresponding to image M1, namely "small sesame". It should be noted that dividing an image into character image regions, and generating text information according to position information and semantic association techniques, can both rely on existing mature techniques, which are not repeated here.
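One simple way to realize the combination step — order the character image regions row by row and left to right when they sit side by side, then concatenate the recognized characters — can be sketched like this. The coordinate-based ordering rule and all names are assumptions; the patent defers to mature positioning and semantic-association techniques:

```python
# Sketch: combine per-region characters into the image's text information by
# ordering the regions top-to-bottom, then left-to-right (a side-by-side
# sequential relationship within each row).

def combine_characters(regions):
    """regions: list of (left_x, top_y, character) for each character image
    region; returns the combined text information."""
    ordered = sorted(regions, key=lambda r: (r[1], r[0]))  # rows, then columns
    return "".join(ch for _, _, ch in ordered)

# Q1, Q2, Q3 side by side on one row, deliberately given out of order.
regions = [(40, 10, "B"), (10, 10, "A"), (70, 10, "C")]
print(combine_characters(regions))  # -> ABC
```

A production system would also need semantic checks (e.g. merging regions into words), which this sketch omits.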
Finally, in step S230, the text information is stored in association with the image storage path and image name of the image. According to one embodiment of the present invention, the image storage path of image M1 is known to be /storage/emulated/0/DCIM/Camera/IMG_20171213_185253.jpg and its image name is IMG_20171213_185253.jpg. The text information "small sesame" is stored in association with the image storage path and image name of image M1, for example in the memory 150 of the mobile terminal 100. It is worth noting that if the generated text information contains multiple different items of content, a symbol such as an underscore may be used to separate them, for example "small sesame_7.59 yuan/jin".
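The associated storage of step S230 could, for instance, be kept in a small on-device table. The sketch below uses Python's built-in sqlite3; the schema, the underscore-separator convention applied in code, and all names are illustrative assumptions rather than details prescribed by the patent:

```python
import sqlite3

# Sketch: store text information in association with the image storage path
# and image name; multiple content items are joined with an underscore.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_text (text_info TEXT, store_path TEXT, image_name TEXT)"
)

def store_association(conn, text_items, store_path, image_name):
    text_info = "_".join(text_items)  # e.g. "small sesame_7.59 yuan/jin"
    conn.execute(
        "INSERT INTO image_text VALUES (?, ?, ?)",
        (text_info, store_path, image_name),
    )
    return text_info

info = store_association(
    conn,
    ["small sesame", "7.59 yuan/jin"],
    "/storage/emulated/0/DCIM/Camera/IMG_20171213_185253.jpg",
    "IMG_20171213_185253.jpg",
)
print(info)  # -> small sesame_7.59 yuan/jin
```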
In practical applications, the image classification model based on the trained first convolutional neural network and the text recognition model based on the trained second convolutional neural network are typically packaged into mobile applications involving picture storage and query functions, such as camera applications and mobile phone albums. Before such a mobile application is downloaded and installed, or during the system configuration stage of mobile terminal manufacturing, the image classification model, text recognition model, category data, text data and so on are deployed directly on the mobile terminal 100. The occupied storage space is small, memory resource usage is low, and recognition precision, accuracy and response speed are high, providing users with a better experience.
After the text information has been stored in association with the image storage path and image name of the corresponding image, this association can be used to quickly and accurately show users the text information and images related to the search terms they enter. According to another embodiment of the present invention, when a search term entered by the user is received, a search is first made, according to the search term, for text information identical or similar to it; if such text information exists, the image storage path associated with the text information is obtained, the corresponding image is found according to that image storage path, and the image and the text information are shown to the user. In this embodiment, the search term entered by the user is "bank". Text information similar to it is found according to the search term, namely "China Merchants Bank_all-purpose card_622588120816xxxx_UnionPay", which contains the word "bank". Next, the image storage path associated with the text information is obtained as /storage/emulated/0/DCIM/Camera/IMG_20171210_185214.jpg, and the corresponding image is found according to the image storage path. That image is image M2, and image M2 and the text information are shown to the user.
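The retrieval step described above can be as simple as a containment match over the stored associations. This is a minimal sketch with an in-memory list standing in for the terminal's storage; the matching rule (plain substring, no similarity scoring) and all names are assumptions:

```python
# Sketch: given a user's search term, find stored text information that
# contains it and return the associated image storage path(s).

associations = [
    ("small sesame_7.59 yuan/jin",
     "/storage/emulated/0/DCIM/Camera/IMG_20171213_185253.jpg"),
    ("Bank_all-purpose card_UnionPay",
     "/storage/emulated/0/DCIM/Camera/IMG_20171210_185214.jpg"),
]

def search(term, associations):
    """Return (text_info, store_path) pairs whose text contains the term."""
    return [(t, p) for (t, p) in associations if term in t]

hits = search("Bank", associations)
print(hits[0][1])  # -> /storage/emulated/0/DCIM/Camera/IMG_20171210_185214.jpg
```

The embodiment also allows "similar" matches; fuzzy matching (e.g. edit distance) could replace the `in` test without changing the overall flow.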
Fig. 7 is a block diagram of an example computing device 700. In a basic configuration 702, computing device 700 typically comprises a system memory 706 and one or more processors 704. A memory bus 708 may be used for communication between the processors 704 and the system memory 706.
Depending on the desired configuration, the processor 704 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 704 may include one or more levels of cache, such as a level-one cache 710 and a level-two cache 712, a processor core 714 and registers 716. An example processor core 714 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 718 may be used together with the processor 704, or in some implementations the memory controller 718 may be an internal part of the processor 704.
Depending on the desired configuration, the system memory 706 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 706 may include an operating system 720, one or more programs 722 and program data 724. In some embodiments, the programs 722 may be arranged to be executed by the one or more processors 704 on the operating system using the program data 724.
Computing device 700 may also include an interface bus 740 that facilitates communication from various interface devices (for example, output devices 742, peripheral interfaces 744 and communication devices 746) to the basic configuration 702 via the bus/interface controller 730. Example output devices 742 include a graphics processing unit 748 and an audio processing unit 750, which may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 752. Example peripheral interfaces 744 may include a serial interface controller 754 and a parallel interface controller 756, which may be configured to facilitate communication via one or more I/O ports 758 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device or touch input device) or other peripherals (for example, a printer, scanner, etc.). An example communication device 746 may include a network controller 760, which may be arranged to facilitate communication with one or more other computing devices 762 over a network communication link via one or more communication ports 764.
The network communication link may be one example of a communication medium. A communication medium may typically be embodied as computer-readable instructions, data structures or program modules in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. As non-limiting examples, communication media may include wired media, such as a wired network or dedicated-line network, and various wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media.
Computing device 700 may be implemented as a server, such as a file server, database server, application server or web server, or as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, personal digital assistant (PDA), personal media player device, wireless web-browsing device, personal head-mounted device, application-specific device, or a hybrid device including any of the above functions. Computing device 700 may also be implemented as a personal computer, including both desktop and notebook computer configurations.
In some embodiments, computing device 700 is configured to execute the convolutional neural network generation method for performing classification processing on images and/or the convolutional neural network generation method for recognizing characters in images according to the present invention. The one or more programs 722 of computing device 700 include instructions for executing the convolutional neural network generation method 800 for performing classification processing on images and/or the convolutional neural network generation method 900 for recognizing characters in images according to the present invention.
Fig. 8 shows a flowchart of a convolutional neural network generation method 800 for performing classification processing on images according to an embodiment of the present invention. The convolutional neural network generation method 800 for performing classification processing on images is suitable for execution in a computing device (such as computing device 700 shown in Fig. 7).
As shown in Fig. 8, method 800 starts at step S810. In step S810, a processing block is built; the processing block includes a convolutional layer. According to one embodiment of the present invention, the processing block may be built as follows. First, an activation layer is built; then the activation layer is added after the convolutional layer to form the processing block.
Then, step S820 is entered, in which a pooling layer, a fully connected layer and a classifier are built respectively. The pooling layer is either a max pooling layer or an average pooling layer.
Next, in step S830, a convolutional neural network is built from multiple processing blocks and pooling layers combined with the fully connected layer and the classifier; the convolutional neural network takes a processing block as input and the classifier as output. According to one embodiment of the present invention, this may be done as follows. First, according to a preset concatenation rule, the processing blocks are connected with max pooling layers, after which an average pooling layer is connected; then the sequentially connected fully connected layer and classifier are added after the average pooling layer, to build a convolutional neural network with a processing block as input and the classifier as output. In this embodiment, the number of processing blocks is 3, the number of max pooling layers is 2, and the number of average pooling layers is 1.
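The resulting layout (3 processing blocks, 2 max pooling layers, 1 average pooling layer, then the fully connected layer and classifier) can be sanity-checked with a small spatial-size walk-through. Everything below beyond the block counts and ordering — kernel sizes, strides, the 224×224 input — is an assumption for illustration, since the patent does not fix those details:

```python
# Sketch: walk a feature map's spatial size through an assumed layout of the
# first convolutional neural network: [block, maxpool, block, maxpool, block,
# avgpool], followed by the fully connected layer and the classifier.
# Assumptions: 3x3 convolutions with padding 1 (size-preserving), 2x2 pools
# with stride 2, a 224x224 input.

LAYOUT = ["block", "maxpool", "block", "maxpool", "block", "avgpool"]

def spatial_size(size, layout):
    for layer in layout:
        if layer == "block":                   # conv 3x3 pad 1 + activation
            pass                               # spatial size unchanged
        elif layer in ("maxpool", "avgpool"):  # 2x2 pool, stride 2
            size //= 2
    return size

assert LAYOUT.count("block") == 3     # 3 processing blocks
assert LAYOUT.count("maxpool") == 2   # 2 max pooling layers
assert LAYOUT.count("avgpool") == 1   # 1 average pooling layer
print(spatial_size(224, LAYOUT))  # -> 28 (224 -> 112 -> 56 -> 28)
```

The same walk-through is a cheap way to check that the fully connected layer's input dimension matches whatever concatenation rule is chosen.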
Finally, step S840 is executed: the convolutional neural network is trained according to a pre-obtained image category data set, so that the output of the classifier indicates the category corresponding to the input image. The image category data set includes multiple items of image category information, each of which includes a first image meeting the preset size and the category information corresponding to that first image. According to one embodiment of the present invention, the convolutional neural network may be trained as follows. Specifically, for each item of image category information extracted, the first image included in the image category information is used as the input of the first processing block in the convolutional neural network, and the category information included in the image category information is used as the output of the classifier; the convolutional neural network is trained accordingly. The category information is one of an animal category, a building category, a physical object category, a landscape category, a figure category and a text category.
The image category data set used to train the convolutional neural network needs to be generated in advance. According to another embodiment of the present invention, the image category data set may be generated in advance as follows. Image processing is performed on each pending picture to obtain a corresponding first image meeting the preset size. For each first image meeting the preset size, the category information associated with its pending picture is obtained, the corresponding image category information is generated from that category information and the first image, and all the image category information items are collected to form the image category data set.
It should be noted that, for the process of generating the convolutional neural network for performing classification processing on images in steps S810–S840, and the process of generating in advance the image category data set for training that convolutional neural network, the processing details and embodiments can be found in the related content concerning the first convolutional neural network in step S210 of method 200, and are not repeated here.
Fig. 9 shows a flowchart of a convolutional neural network generation method 900 for recognizing characters in images according to an embodiment of the present invention. The convolutional neural network generation method 900 for recognizing characters in images is suitable for execution in a computing device (such as computing device 700 shown in Fig. 7).
As shown in Fig. 9, method 900 starts at step S910. In step S910, a first processing block is built; the first processing block includes a first convolutional layer. According to one embodiment of the present invention, the first processing block may be built as follows. First, a first activation layer is built; then the first activation layer is added after the first convolutional layer to form the first processing block.
In step S920, a second processing block is built; the second processing block includes a first fully connected layer. According to one embodiment of the present invention, the second processing block may be built as follows. First, a second activation layer is built; then the second activation layer is added after the first fully connected layer to form the second processing block.
Then, step S930 is entered, in which a first pooling layer, a second fully connected layer and a first classifier are built respectively. The first pooling layer is a max pooling layer.
Next, in step S940, a convolutional neural network is built from one or more first processing blocks, first pooling layers and the second processing block, combined with the second fully connected layer and the first classifier; the convolutional neural network takes a first processing block as input and the first classifier as output. According to one embodiment of the present invention, this may be done as follows. According to a preset first concatenation rule, the first processing blocks, first pooling layers and second processing block are connected, after which the second fully connected layer is connected; the first classifier is added after the second fully connected layer, to build a convolutional neural network with a first processing block as input and the first classifier as output. In this embodiment, the number of first processing blocks is 5, the number of second processing blocks is 1, and the number of first pooling layers is 3.
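An analogous walk-through works for the second convolutional neural network: 5 first processing blocks (convolution + activation), 3 max pooling layers, 1 second processing block (fully connected + activation), then the second fully connected layer and the first classifier over the 114×114 single-channel character image. The interleaving, kernel sizes and strides below are assumptions; only the part counts and the 114×114 input and 3817-way output come from the embodiment:

```python
# Sketch: an assumed interleaving of the second convolutional neural network's
# parts: 5 first processing blocks, 3 first pooling layers (max pooling) and
# 1 second processing block, followed by the second fully connected layer and
# the first classifier. Convolutions are assumed 3x3 with padding 1; pools
# are 2x2 with stride 2; the input is the 114x114 single-channel image.

LAYOUT = ["block1", "block1", "maxpool",
          "block1", "block1", "maxpool",
          "block1", "maxpool",
          "block2"]

def spatial_size(size, layout):
    for layer in layout:
        if layer == "maxpool":
            size //= 2  # a 2x2 stride-2 pool halves each side (floor division)
    return size

assert LAYOUT.count("block1") == 5    # 5 first processing blocks
assert LAYOUT.count("maxpool") == 3   # 3 first pooling layers
assert LAYOUT.count("block2") == 1    # 1 second processing block
print(spatial_size(114, LAYOUT))  # -> 14 (114 -> 57 -> 28 -> 14)

NUM_CLASSES = 3817  # one probability per recognizable character, per the embodiment
```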
Finally, step S950 is executed: the convolutional neural network is trained according to a pre-obtained text image data set, so that the output of the first classifier indicates the character contained in the input image. The text image data set includes multiple items of character image information, each of which includes a character image meeting the first preset size and the text information contained in that character image. According to one embodiment of the present invention, the convolutional neural network may be trained as follows. Specifically, for each item of character image information extracted, the character image included in the character image information is used as the input of the first of the first processing blocks in the convolutional neural network, and the text information included in the character image information is used as the output of the first classifier; the convolutional neural network is trained accordingly. The text information is a single character, the single character being one of a numeric character, an alphabetic character and a Chinese character.
The text image data set used to train the convolutional neural network needs to be generated in advance. According to another embodiment of the present invention, the text image data set may be generated in advance as follows. Image processing is performed on each pending text picture to obtain a corresponding character image meeting the first preset size. For each character image, the text information associated with its pending text picture is obtained, the corresponding character image information is generated from that text information and the character image, and all the character image information items are collected to form the text image data set.
It should be noted that, for the process of generating the convolutional neural network for recognizing characters in images in steps S910–S950, and the process of generating in advance the text image data set for training that convolutional neural network, the processing details and embodiments can be found in the related content concerning the second convolutional neural network in step S220 of method 200, and are not repeated here.
Existing image classification algorithms typically divide the images in a mobile terminal's album into categories and manage the images by category, but perform no further operation, so quickly locating an image containing specific information cannot be achieved. With the image classification method according to the embodiments of the present invention, each image in the image library is first classified to obtain its corresponding category; if the category is the text category, text recognition is performed on the image to extract the text information it contains, and the text information is stored in association with the image storage path and image name of the image. In the above scheme, when a search term entered by the user is received, a search is made according to the search term for identical or similar text information; if such text information exists, the image storage path associated with the text information is obtained, the corresponding image is found according to the image storage path, and the image and the text information are shown to the user. This achieves quick and accurate positioning of the image the user needs, greatly facilitates searching based on vague impressions of image content, and improves the usage experience. In addition, the trained first convolutional neural network is used to classify images, and the trained second convolutional neural network is used to recognize the characters in images. Both the first and the second convolutional neural network have small network structures, so image classification and text recognition are realized by lightweight small neural networks and can be processed on mobile phones, other mobile terminals or small edge devices. No communication with a server is needed during use and nothing is uploaded to the cloud, which avoids dependence on communication networks such as 4G networks, improves availability under no-network or weak-network conditions, and, since no large-scale computing service is required, also reduces the corresponding operation and maintenance costs.
A5. The method according to any one of A1–4, wherein the mobile terminal stores a trained first convolutional neural network for performing classification processing on images, and the step of performing classification processing on the image to obtain its corresponding image category comprises: inputting the image into the trained first convolutional neural network for image classification; and determining the category of the image according to the output of the first convolutional neural network.

A6. The method according to any one of A1–5, wherein the mobile terminal stores a trained second convolutional neural network for recognizing characters in images, and the step of performing text recognition on the image to extract the text information contained in the image comprises: obtaining the character image region corresponding to each single character contained in the image; inputting each character image region into the trained second convolutional neural network for text recognition, and determining the character contained in each character image region according to the output of the second convolutional neural network; and generating the text information corresponding to the image based on the characters.

A7. The method according to A5 or 6, wherein the trained first convolutional neural network is obtained as follows: building a processing block, the processing block including a convolutional layer; building a pooling layer, a fully connected layer and a classifier respectively; building a first convolutional neural network from multiple processing blocks and pooling layers combined with the fully connected layer and the classifier, the first convolutional neural network taking a processing block as input and the classifier as output; and training the first convolutional neural network according to a pre-obtained image category data set, so that the output of the classifier indicates the category corresponding to the input image, the image category data set including multiple items of image category information, each item of image category information including a first image meeting a preset size and the category information corresponding to that first image.

A8. The method according to any one of A5–7, wherein the trained second convolutional neural network is obtained as follows: building a first processing block, the first processing block including a first convolutional layer; building a second processing block, the second processing block including a first fully connected layer; building a first pooling layer, a second fully connected layer and a first classifier respectively; building a second convolutional neural network from one or more first processing blocks, first pooling layers and the second processing block, combined with the second fully connected layer and the first classifier, the second convolutional neural network taking a first processing block as input and the first classifier as output; and training the second convolutional neural network according to a pre-obtained text image data set, so that the output of the first classifier indicates the character contained in the input image, the text image data set including multiple items of character image information, each item of character image information including a character image meeting a first preset size and the text information contained in that character image.
B12. The method according to B11, wherein the step of building a processing block further comprises: building an activation layer; and adding the activation layer after the convolutional layer to form the processing block.

B13. The method according to B11 or 12, wherein the pooling layer is either a max pooling layer or an average pooling layer.

B14. The method according to B13, wherein the step of building a convolutional neural network from multiple processing blocks and pooling layers combined with the fully connected layer and the classifier comprises: according to a preset concatenation rule, connecting the processing blocks with max pooling layers, then connecting an average pooling layer; and adding the sequentially connected fully connected layer and classifier after the average pooling layer, to build a convolutional neural network with a processing block as input and the classifier as output.

B15. The method according to any one of B11–14, wherein the step of training the convolutional neural network according to the pre-obtained image category data set, so that the output of the classifier indicates the category corresponding to the input image, comprises: for each item of image category information extracted, using the first image included in the image category information as the input of the first processing block in the convolutional neural network and the category information included in the image category information as the output of the classifier, and training the convolutional neural network accordingly.

B16. The method according to any one of B11–15, wherein the number of processing blocks is 3.

B17. The method according to any one of B14–16, wherein the number of max pooling layers is 2 and the number of average pooling layers is 1.

B18. The method according to any one of B11–17, wherein the category information is one of an animal category, a building category, a physical object category, a landscape category, a figure category and a text category.

B19. The method according to any one of B11–18, further comprising generating an image category data set in advance, wherein the step of generating the image category data set in advance comprises: performing image processing on each pending picture to obtain a corresponding first image meeting the preset size; for each first image meeting the preset size, obtaining the category information associated with its pending picture, and generating the corresponding image category information from the category information and the first image; and collecting the image category information items to form the image category data set.
C21. The method according to C20, wherein the step of building the first processing block further comprises: building a first activation layer; and adding the first activation layer after the first convolutional layer to form the first processing block.

C22. The method according to C20 or 21, wherein the step of building the second processing block further comprises: building a second activation layer; and adding the second activation layer after the first fully connected layer to form the second processing block.

C23. The method according to any one of C20–22, wherein the first pooling layer is a max pooling layer.

C24. The method according to any one of C20–23, wherein the step of building a convolutional neural network from one or more first processing blocks, first pooling layers and the second processing block, combined with the second fully connected layer and the first classifier, comprises: according to a preset first concatenation rule, connecting the first processing blocks, first pooling layers and second processing block, then connecting the second fully connected layer; and adding the first classifier after the second fully connected layer, to build a convolutional neural network with a first processing block as input and the first classifier as output.

C25. The method according to any one of C20–24, wherein the step of training the convolutional neural network according to the pre-obtained text image data set, so that the output of the first classifier indicates the character contained in the input image, comprises: for each item of character image information extracted, using the character image included in the character image information as the input of the first of the first processing blocks in the convolutional neural network and the text information included in the character image information as the output of the first classifier, and training the convolutional neural network accordingly.

C26. The method according to any one of C20–25, wherein the number of first processing blocks is 5, the number of second processing blocks is 1, and the number of first pooling layers is 3.

C27. The method according to any one of C20–26, wherein the text information is a single character, the single character being one of a numeric character, an alphabetic character and a Chinese character.

C28. The method according to any one of C20–27, further comprising generating a text image data set in advance, wherein the step of generating the text image data set in advance comprises: performing image processing on each pending text picture to obtain a corresponding character image meeting the first preset size; for each character image, obtaining the text information associated with its pending text picture, and generating the corresponding character image information from the text information and the character image; and collecting the character image information items to form the text image data set.
Numerous specific details are set forth in the description provided here. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to simplify the disclosure and to aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention the features of the invention are sometimes grouped together into a single embodiment, figure or description thereof. However, the method of this disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art should understand that the modules, units or components of the devices in the examples disclosed herein may be arranged in the devices as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or divided into multiple submodules.
Those skilled in the art will appreciate that the modules in the devices in the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units or components in the embodiments may be combined into one module, unit or component, and may furthermore be divided into multiple submodules, subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the embodiments are described herein as methods, or as combinations of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor having the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
Various technologies described herein are realized together in combination with hardware or software or combination thereof.To the present invention Method and apparatus or the process and apparatus of the present invention some aspects or part can take embedded tangible media, such as it is soft The form of program code (instructing) in disk, CD-ROM, hard disk drive or other arbitrary machine readable storage mediums, Wherein when program is loaded into the machine of such as computer etc, and is executed by the machine, the machine becomes to put into practice this hair Bright equipment.
In the case of program code executing on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute, according to the instructions in the program code stored in the memory, the image classification method of the present invention, the convolutional neural network generation method for performing classification processing on images, and/or the convolutional neural network generation method for recognizing characters in images.
By way of example, and not limitation, computer-readable media comprise computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media generally embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the foregoing description, will appreciate that other embodiments can be devised within the scope of the invention thus described. It should additionally be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative and not restrictive, the scope of the invention being defined by the appended claims.

Claims (10)

1. An image classification method, suitable for execution on a mobile terminal, the mobile terminal comprising an image library in which multiple images are stored, the method comprising the steps of:
for each image in the image library, performing classification processing on the image to obtain its corresponding category;
if the category is the text category, performing text recognition on the image to extract the text information contained in the image;
storing the text information in association with the image storage path and image name of the image.
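Purely as a non-authoritative illustration (not part of the claims), the three steps of claim 1 might be sketched in Python as follows. The classifier and the OCR call are stand-in placeholders for the convolutional neural networks of claims 7 and 8, and every name, file-naming convention, and return value here is hypothetical:

```python
import os

def classify_image(image_path):
    """Placeholder classifier: returns a category label for the image.
    A real implementation would run a trained classification network
    (cf. claim 7) on the decoded image; here a naming convention stands in."""
    return "text" if image_path.endswith("_doc.png") else "photo"

def recognize_text(image_path):
    """Placeholder OCR: returns the text contained in a text-class image.
    A real implementation would run a character-recognition network
    (cf. claim 8) on each character region; here a constant stands in."""
    return "hello world"

def index_image_library(image_paths):
    """For each image: classify it; if it is a text-class image, extract
    its text and store the text in association with the image's storage
    path and file name (the three steps of claim 1)."""
    index = []
    for path in image_paths:
        category = classify_image(path)
        if category == "text":
            text = recognize_text(path)
            index.append({
                "text": text,
                "store_path": os.path.dirname(path),
                "image_name": os.path.basename(path),
            })
    return index
```

The association record (text, storage path, image name) is what makes the later text-based image search of claim 2 possible without re-running recognition.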
2. The method of claim 1, wherein, when a search term entered by a user is received, the method further comprises:
searching, according to the search term, for identical or similar text information;
if such text information exists, obtaining the image storage path associated with the text information;
finding the corresponding image according to the image storage path, and displaying the image and the text information to the user.
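The retrieval side of claim 2 might look like the following sketch (again illustrative only; the index entries assume the hypothetical record shape used above, and "similar" is reduced to simple substring matching, which is only one possible interpretation):

```python
def search_index(index, term):
    """Return (full_image_path, text) pairs for every index entry whose
    stored text is identical to or contains the search term, resolving
    each match back to its image via the stored path and name (claim 2)."""
    results = []
    for entry in index:
        # Identical or "similar" text; substring containment is a
        # deliberately simple stand-in for a similarity measure.
        if term == entry["text"] or term in entry["text"]:
            full_path = entry["store_path"] + "/" + entry["image_name"]
            results.append((full_path, entry["text"]))
    return results
```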
3. The method of claim 1 or 2, wherein the step of performing text recognition on the image to extract the text information contained in the image comprises:
obtaining the character image region corresponding to each individual character contained in the image;
performing text recognition on each character image region separately, to determine the character contained in each character image region;
generating the text information corresponding to the image based on the characters.
4. The method of claim 3, wherein the step of generating the text information corresponding to the image based on the characters comprises:
obtaining the positional relationships among the character image regions in the image;
combining, according to the positional relationships, the characters corresponding to the character image regions, to generate the text information corresponding to the image.
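The positional combination step of claim 4 can be sketched as a simple sort: read character regions top-to-bottom and left-to-right. This is one plausible interpretation, not the patent's definitive method; a real system would also cluster regions into rows with some vertical tolerance rather than relying on exact coordinates:

```python
def combine_characters(regions):
    """Combine per-region recognized characters into the image's text
    according to the positional relationship of the character regions
    (claim 4). Each region is a (x, y, char) triple; sorting by (y, x)
    orders rows top-to-bottom and characters left-to-right within a row."""
    ordered = sorted(regions, key=lambda r: (r[1], r[0]))
    return "".join(ch for _, _, ch in ordered)
```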
5. A mobile terminal, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method according to any one of claims 1-4.
6. A computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a mobile terminal, cause the mobile terminal to perform the method according to any one of claims 1-4.
7. A convolutional neural network generation method for performing classification processing on images, suitable for execution in a computing device, the method comprising the steps of:
building a processing block, the processing block including a convolutional layer;
building a pooling layer, a fully connected layer, and a classifier, respectively;
building a convolutional neural network from multiple processing blocks and pooling layers, in combination with the fully connected layer and the classifier, the convolutional neural network taking a processing block as its input and the classifier as its output;
training the convolutional neural network on a pre-obtained image category dataset, so that the output of the classifier indicates the category corresponding to an input image, the image category dataset including multiple image category entries, each image category entry including a first image meeting a preset size and the category information corresponding to the first image.
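To make the assembly order of claim 7 concrete, the following sketch builds the layer sequence as plain descriptors rather than with a deep-learning framework. The filter counts and unit sizes are invented for illustration; the patent does not specify them:

```python
def build_classification_network(num_blocks, num_categories):
    """Assemble the layer sequence of claim 7 as a list of descriptors:
    several processing blocks (each containing a convolutional layer)
    interleaved with pooling layers, followed by a fully connected layer
    and a classifier whose output indicates the input image's category."""
    layers = []
    for i in range(num_blocks):
        # Each processing block contains a convolutional layer; doubling
        # the filter count per stage is a common (assumed) convention.
        layers.append(("process_block", {"conv_filters": 32 * (2 ** i)}))
        layers.append(("pool", {"kind": "max", "size": 2}))
    layers.append(("fully_connected", {"units": 256}))
    layers.append(("classifier", {"categories": num_categories}))
    return layers
```

In a real implementation each descriptor would map to framework layers (e.g. convolution, max pooling, dense, softmax), and the network would then be trained on the fixed-size image category dataset the claim describes.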
8. A convolutional neural network generation method for recognizing characters in images, suitable for execution in a computing device, the method comprising the steps of:
building a first processing block, the first processing block including a first convolutional layer;
building a second processing block, the second processing block including a first fully connected layer;
building a first pooling layer, a second fully connected layer, and a first classifier, respectively;
building a convolutional neural network from one or more first processing blocks, first pooling layers, and the second processing block, in combination with the second fully connected layer and the first classifier, the convolutional neural network taking a first processing block as its input and the first classifier as its output;
training the convolutional neural network on a pre-obtained character image dataset, so that the output of the first classifier indicates the character contained in an input image, the character image dataset including multiple character image entries, each character image entry including a character image meeting a first preset size and the text information contained in the character image.
9. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method according to claim 7 and/or the method according to claim 8.
10. A computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform the method according to claim 7 and/or the method according to claim 8.
CN201810331479.5A 2018-04-13 2018-04-13 Image classification method and convolutional neural network generation method Pending CN108537283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810331479.5A CN108537283A (en) Image classification method and convolutional neural network generation method


Publications (1)

Publication Number Publication Date
CN108537283A true CN108537283A (en) 2018-09-14

Family

ID=63480410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810331479.5A Pending CN108537283A (en) Image classification method and convolutional neural network generation method

Country Status (1)

Country Link
CN (1) CN108537283A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193918A (en) * 2010-03-01 2011-09-21 汉王科技股份有限公司 Video retrieval method and device
CN106503732A (en) * 2016-10-13 2017-03-15 北京云江科技有限公司 Classification method and classification system for text images and non-text images
CN106776710A (en) * 2016-11-18 2017-05-31 广东技术师范学院 Image-and-text knowledge base construction method based on a vertical search engine
CN107239802A (en) * 2017-06-28 2017-10-10 广东工业大学 Image classification method and device
CN107562742A (en) * 2016-06-30 2018-01-09 苏宁云商集团股份有限公司 Image processing method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sima Haifeng et al.: "Intelligent Computing Methods in Remote Sensing Image Classification", 31 January 2018 *
Lu Yongxiang: "Popular Encyclopedia of Modern Science and Technology - Science Volume", 30 August 1999 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753580A (en) * 2018-12-21 2019-05-14 Oppo广东移动通信有限公司 Image classification method and apparatus, storage medium, and electronic device
CN111369489A (en) * 2018-12-24 2020-07-03 Tcl集团股份有限公司 Image identification method and device and terminal equipment
CN111369489B (en) * 2018-12-24 2024-04-16 Tcl科技集团股份有限公司 Image identification method and device and terminal equipment
CN111383302A (en) * 2018-12-29 2020-07-07 中兴通讯股份有限公司 Image collocation method and device, terminal and computer readable storage medium
CN110222168A (en) * 2019-05-20 2019-09-10 平安科技(深圳)有限公司 Data processing method and related device
CN110222168B (en) * 2019-05-20 2023-08-18 平安科技(深圳)有限公司 Data processing method and related device
CN110489578A (en) * 2019-08-12 2019-11-22 腾讯科技(深圳)有限公司 Image processing method, device and computer equipment
CN110489578B (en) * 2019-08-12 2024-04-05 腾讯科技(深圳)有限公司 Picture processing method and device and computer equipment
CN112634123A (en) * 2019-10-08 2021-04-09 北京京东尚科信息技术有限公司 Image processing method and device
CN111147891A (en) * 2019-12-31 2020-05-12 杭州威佩网络科技有限公司 Method, device and equipment for acquiring information of object in video picture
CN111209423B (en) * 2020-01-07 2023-04-07 腾讯科技(深圳)有限公司 Image management method and device based on electronic album and storage medium
CN111209423A (en) * 2020-01-07 2020-05-29 腾讯科技(深圳)有限公司 Image management method and device based on electronic album and storage medium
CN111783624A (en) * 2020-06-29 2020-10-16 厦门市美亚柏科信息股份有限公司 Pedestrian re-identification method and device with translation invariance reserved and storage medium
CN111783624B (en) * 2020-06-29 2023-03-24 厦门市美亚柏科信息股份有限公司 Pedestrian re-identification method and device with translation invariance reserved and storage medium
CN111783786A (en) * 2020-07-06 2020-10-16 上海摩勤智能技术有限公司 Picture identification method and system, electronic equipment and storage medium
CN111860672A (en) * 2020-07-28 2020-10-30 北京邮电大学 Fine-grained image classification method based on block convolutional neural network
CN112149653A (en) * 2020-09-16 2020-12-29 北京达佳互联信息技术有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN112149653B (en) * 2020-09-16 2024-03-29 北京达佳互联信息技术有限公司 Information processing method, information processing device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108537283A (en) Image classification method and convolutional neural network generation method
CN109299315B (en) Multimedia resource classification method and device, computer equipment and storage medium
KR101887558B1 (en) Training method and apparatus for convolutional neural network model
CA2792336C (en) Intuitive computing methods and systems
CN110149541A (en) Video recommendation method, device, computer equipment and storage medium
CN108197602A (en) Convolutional neural network generation method and expression recognition method
WO2018113512A1 (en) Image processing method and related device
CN110020140A (en) Recommendation display methods, apparatus and system
WO2020173115A1 (en) Network module, distribution method and apparatus, and electronic device and storage medium
CN110162604B (en) Statement generation method, device, equipment and storage medium
CN107909016A (en) Convolutional neural network generation method and car-series recognition method
CN110147533B (en) Encoding method, apparatus, device and storage medium
JP7324838B2 (en) Encoding method and its device, apparatus and computer program
CN108537193A (en) Method for recognizing the ethnicity attribute among face attributes, and mobile terminal
CN110162956B (en) Method and device for determining associated account
CN109963072B (en) Focusing method, focusing device, storage medium and electronic equipment
CN111428645A (en) Method and device for detecting key points of human body, electronic equipment and storage medium
CN111881813B (en) Data storage method and system of face recognition terminal
CN109241437A (en) Advertisement recognition model generation method, advertisement recognition method, and system
CN108351962A (en) Object detection with adaptive channel features
CN113505256A (en) Feature extraction network training method, image processing method and device
CN113763931B (en) Waveform feature extraction method, waveform feature extraction device, computer equipment and storage medium
CN117910478A (en) Training method of language model and text generation method
CN111694768B (en) Operation method, device and related product
WO2021016932A1 (en) Data processing method and apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180914