CN110232417A - Image recognition method and apparatus, computer device, and computer-readable storage medium

Image recognition method and apparatus, computer device, and computer-readable storage medium

Info

Publication number
CN110232417A
CN110232417A (application CN201910523751.4A)
Authority
CN
China
Prior art keywords
first feature map
sub-image
image
feature
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910523751.4A
Other languages
Chinese (zh)
Other versions
CN110232417B (en)
Inventor
胡益清
姜德强
刘银松
叶朝萍
任博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910523751.4A
Publication of CN110232417A
Application granted
Publication of CN110232417B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/13 Sensors therefor
    • G06V40/1318 Sensors therefor using electro-optical elements or layers, e.g. electroluminescent sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1347 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1365 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition method and apparatus, a computer device, and a computer-readable storage medium, belonging to the field of image technology. Feature extraction is performed on an image to be recognized to generate a first feature map, and the first feature map is decoded based on the importance of the feature points of each sub-image within the first feature map. During decoding, when the most important feature point of any sub-image occupies the same position in the first feature map as that of the previous sub-image, it can be determined that the valid information in the image, such as the characters it contains, has been fully decoded; the computer device then terminates decoding and outputs the valid information, such as the characters contained in the image, according to the decoding results already obtained. This recognition scheme can judge, during decoding, whether the valid information in the image has been completely decoded and end decoding early, which reduces the amount of computation in image recognition and improves recognition efficiency.

Description

Image recognition method and apparatus, computer device, and computer-readable storage medium
Technical field
The present invention relates to the field of image technology, and in particular to an image recognition method and apparatus, a computer device, and a computer-readable storage medium.
Background art
With the development of machine learning techniques, a computer device can recognize information such as the characters contained in an image on the basis of a deep neural network. At present, image recognition tasks generally use an image recognition model built on a deep neural network: feature extraction is performed on the image to be recognized to obtain a feature map of the image, and the feature map is then decoded to obtain the information, such as characters, contained in the image.
In an image, however, valid information such as characters usually occupies only part of the image, and there are often large blank regions besides the valid information. In the above recognition scheme the image recognition model has to decode every region of the image, including regions that contain no valid information such as characters, which increases the amount of computation, lengthens the recognition time, and lowers the recognition efficiency.
Summary of the invention
Embodiments of the present invention provide an image recognition method and apparatus, a computer device, and a computer-readable storage medium, which can solve the problem of low image recognition efficiency in the related art. The technical solution is as follows:
In one aspect, an image recognition method is provided, the method comprising:
obtaining an image to be recognized;
inputting the image into an image recognition model, performing feature extraction on the image through the image recognition model to obtain a first feature map, and decoding the first feature map based on the importance of the feature points in the first feature map; when it is detected during decoding that the most important feature point of any sub-image of the first feature map occupies the same position in the first feature map as that of the previous sub-image, terminating the decoding and outputting the feature vectors obtained by decoding;
decoding the feature vectors output by the image recognition model to obtain the character information contained in the image.
In one aspect, an image recognition apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain an image to be recognized;
an output module, configured to input the image into an image recognition model, perform feature extraction on the image through the image recognition model to obtain a first feature map, decode the first feature map based on the importance of the feature points in the first feature map, and, when it is detected during decoding that the most important feature point of any sub-image of the first feature map occupies the same position in the first feature map as that of the previous sub-image, terminate the decoding and output the feature vectors obtained by decoding;
a decoding module, configured to decode the feature vectors output by the image recognition model to obtain the character information contained in the image.
In one aspect, a computer device is provided, the computer device comprising one or more processors and one or more memories, wherein at least one piece of program code is stored in the one or more memories, and the at least one piece of program code is loaded and executed by the one or more processors to implement the operations performed by the image recognition method.
In one aspect, a computer-readable storage medium is provided, wherein at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code is loaded and executed by a processor to implement the operations performed by the image recognition method.
In the technical solution provided by the embodiments of the present invention, feature extraction is performed on the image to be recognized to generate a first feature map, and the first feature map is decoded based on the importance of the feature points of each sub-image within the first feature map. During decoding, when the most important feature point of any sub-image occupies the same position relative to the first feature map as that of the previous sub-image, it can be determined that the valid information contained in the image, such as characters, has been fully decoded; the computer device then terminates decoding and outputs the valid information according to the decoding results already obtained. In this way, whether the valid information in the image has been completely decoded can be judged during decoding so that decoding is ended early, which reduces the amount of computation in the recognition process and improves recognition efficiency.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Fig. 1 is a structural block diagram of an image recognition system provided by an embodiment of the present invention;
Fig. 2 is a flowchart of an image recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a long short-term memory network provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an encoder provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of an encoding and embedding method provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a decoder provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of truncating the decoding process provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of an image recognition result provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of constructed sample images provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of a server provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
To facilitate understanding of the technical process of the embodiments of the present invention, some terms involved in the embodiments are explained first:
Attention mechanism: a means of quickly screening high-value information out of a large amount of information with limited attention resources. The visual attention mechanism is a signal-processing mechanism of the brain specific to human vision: by quickly scanning the global image, human vision obtains the target region that needs attention, commonly called the focus of attention, devotes more attention resources to that region to obtain more detail about the target of interest, and suppresses other useless information. Attention mechanisms are widely used in deep learning tasks of various types, such as natural language processing, image recognition and speech recognition, and are among the core deep learning technologies that most deserve attention and in-depth understanding.
In summary, an attention mechanism has two main aspects: first, deciding which part of the input needs attention; second, allocating the limited processing resources to the important part. The attention mechanism in deep learning is similar in nature to human selective visual attention, and its core goal is likewise to select, from a large amount of information, the information that is most critical to the current task.
Feature map: a numeric matrix used to represent image features. During image feature extraction, the computer device may perform convolution operations on the image through at least one convolutional layer of a convolutional neural network; each convolutional layer outputs a convolution result, which serves as a feature map of the image. In the embodiments of the present invention, the feature map output by the last convolutional layer of the convolutional neural network is used as the first feature map of the image.
Sub-image: a group of feature points in the first feature map. During decoding of the first feature map, the computer device scans the regions of the first feature map one by one and treats the group of feature points contained in each region as one sub-image of the first feature map; according to the scanning order, the sub-images obtained by two adjacent scans are referred to as a sub-image and the previous sub-image of that sub-image.
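The scanning described above amounts to sliding a window over the first feature map and collecting the feature points it covers at each position. The following is a minimal sketch, not taken from the patent, of how such sub-images could be gathered; the window size, stride and feature-map shape are illustrative assumptions.

```python
# Illustrative sketch: extracting "sub-images" from a feature map with a sliding
# scanning window. Window size, stride and the feature-map shape are assumed.
import numpy as np

def extract_sub_images(feature_map: np.ndarray, window: int = 3, stride: int = 1):
    """Scan the feature map region by region; each group of feature points
    covered by one window position is returned as one sub-image."""
    h, w = feature_map.shape
    sub_images = []
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            sub_images.append(feature_map[top:top + window, left:left + window])
    return sub_images

first_feature_map = np.random.rand(8, 32)   # stand-in for the first feature map
subs = extract_sub_images(first_feature_map)
# Consecutive elements of `subs` correspond to a sub-image and its previous sub-image.
print(len(subs), subs[0].shape)
```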
Fig. 1 is a structural block diagram of an image recognition system provided by an embodiment of the present invention. The image recognition system 100 includes a terminal 110 and an image recognition platform 140.
The terminal 110 is connected to the image recognition platform 140 through a wireless or wired network. The terminal 110 may be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 player, an MP4 player and a laptop computer. An application that supports image recognition, for example a character recognition application, is installed and runs on the terminal 110. Illustratively, the terminal 110 is a terminal used by a user, and a user account is logged into the application running on the terminal 110.
The terminal 110 is connected to the image recognition platform 140 through a wireless or wired network.
The image recognition platform 140 includes at least one of a server, multiple servers, a cloud computing platform and a virtualization center. The image recognition platform 140 provides background services for the application that supports image recognition. Optionally, the image recognition platform 140 undertakes the primary recognition work and the terminal 110 undertakes the secondary recognition work; or the image recognition platform 140 undertakes the secondary recognition work and the terminal 110 undertakes the primary recognition work; or the image recognition platform 140 and the terminal 110 can each undertake the recognition work independently.
Optionally, the image recognition platform 140 includes an access server, an image recognition server and a database. The access server provides access services for the terminal 110. The image recognition server provides the background services related to image recognition; it may carry a graphics processing unit and support multi-threaded GPU computing. There may be one or more image recognition servers. When there are multiple image recognition servers, at least two of them provide different services and/or at least two of them provide the same service, for example in a load-balancing manner, which is not limited in the embodiments of the present application. An image recognition model may be deployed on the image recognition server; during model training and application, the server may carry a GPU (Graphics Processing Unit) and support concurrent GPU computation. In the embodiments of the present application, the image recognition model is a recognition model built on the attention mechanism.
The terminal 110 may generally refer to one of multiple terminals; this embodiment is illustrated only with the terminal 110.
Those skilled in the art will appreciate that the number of terminals may be larger or smaller. For example, there may be only one terminal, or there may be dozens, hundreds or more terminals, in which case the image recognition system further includes other terminals. The embodiments of the present application do not limit the number or device types of the terminals.
Fig. 2 is a flowchart of an image recognition method provided by an embodiment of the present invention. The method can be applied to the above terminal or server, and both the terminal and the server can be regarded as a computer device; therefore, the embodiments of the present invention are described with a computer device as the execution subject. Referring to Fig. 2, the embodiment may specifically include the following steps:
201. The computer device obtains an image to be recognized.
The image to be recognized may contain at least one character, for example a formula or text. The image to be recognized may be an image or a group of images stored in the computer device, an image captured by the computer device from a video, or an image acquired in real time by a computer device with an image acquisition function; the embodiments of the present invention do not limit which kind of image is used.
202. The computer device inputs the image into an image recognition model.
The image recognition model is used to recognize the character information contained in an image. The image recognition model may be a model designed on the basis of a deep neural network, for example an RNN (Recurrent Neural Network) or a CNN (Convolutional Neural Network).
The computer device may input an image of arbitrary size into the image recognition model, or may first adjust the image to a target size and then input it into the image recognition model. In a possible implementation, before inputting the image into the image recognition model, the computer device may scale the image as appropriate so as to adjust its size to the target size. The target size may be determined by the training process of the image recognition model: in a possible implementation, the widths and heights of the sample images are counted during training, and either the most frequent width and height values or the averages of all counted widths and heights are used as the target size of the model's input images.
After the computer device inputs the image into the image recognition model, the model may preprocess the input image and convert it into a numeric matrix composed of multiple pixel values, so that the computer device can carry out the subsequent computation.
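As an illustration of the preprocessing just described, the following sketch scales an input image to a hypothetical target size and converts it into a matrix of pixel values. The Pillow/NumPy calls, the grayscale conversion and the 48x160 target size are assumptions for demonstration, not part of the patent.

```python
# Minimal preprocessing sketch: scale to an assumed target size taken from
# training-set statistics, then convert to a matrix of pixel values.
import numpy as np
from PIL import Image

TARGET_H, TARGET_W = 48, 160   # hypothetical target size derived from training statistics

def preprocess(path: str) -> np.ndarray:
    img = Image.open(path).convert("L")                  # grayscale input
    img = img.resize((TARGET_W, TARGET_H))               # scale to the target size
    pixels = np.asarray(img, dtype=np.float32) / 255.0   # numeric matrix of pixel values
    return pixels
```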
203. The computer device performs feature extraction on the image through the image recognition model to obtain a first feature map.
In the embodiments of the present invention, the computer device may perform feature extraction on the image through one or more convolutional layers of the image recognition model to generate the first feature map. In a possible implementation, the model includes multiple convolutional layers: the computer device first performs a convolution operation between the numeric matrix corresponding to the image and the first convolutional layer to extract image features, takes the convolution result of that layer as a feature map of the image, and then inputs this feature map into the next convolutional layer to continue the convolution; finally, the computer device generates the first feature map based on the feature map output by the last convolutional layer.
Specifically, taking one of the convolutional layers as an example, a convolutional layer may include at least one convolution kernel, each kernel corresponding to a scanning window whose size equals the size of the kernel. During the convolution, the scanning window slides over the feature map by a target stride, scanning the regions of the feature map one by one; the target stride may be set by the developer. Taking one kernel as an example, when its scanning window slides to any region of the feature map, the computer device reads the value of each feature point in that region, multiplies the kernel values point-wise with the corresponding feature-point values, and accumulates the products, taking the accumulated result as one feature point. The scanning window of the kernel then slides to the next region of the feature map by the target stride and the convolution is performed again, outputting another feature point, until the whole feature map has been scanned; all output feature points form a new feature map, which serves as the input of the next convolutional layer.
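The scanning-window convolution described above can be sketched for a single kernel and a single channel as follows; the feature-map size, kernel values and stride are illustrative, not values taken from the patent.

```python
# Sketch of a scanning-window convolution for one kernel and one channel.
import numpy as np

def conv2d_single(feature_map: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    kh, kw = kernel.shape
    h, w = feature_map.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + kh, j * stride:j * stride + kw]
            # point-wise multiply the kernel with the region, then accumulate
            out[i, j] = float(np.sum(region * kernel))
    return out

feat = np.random.rand(6, 6).astype(np.float32)
k = np.random.rand(3, 3).astype(np.float32)
print(conv2d_single(feat, k, stride=1).shape)   # (4, 4)
```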
The number of convolutional layers in the image recognition model and the number of kernels in each layer can be set by the developer. In the embodiments of the present invention, to achieve the best recognition effect of the image recognition model, the number of convolutional layers may be set to 5, the numbers of kernels in the successive layers to 64, 128, 256, 256 and 512, and the size of each kernel to 3×3.
To make the input data of each convolutional layer more reasonably distributed, improve the generalization ability of the image recognition model and prevent over-fitting, the model may also optimize the feature map output by each convolutional layer. In a possible implementation, a batch normalization unit, a pooling unit and a linear rectification unit may be connected in sequence after each convolutional layer. The batch normalization unit normalizes the feature map output by the convolutional layer so that it has mean 0 and variance 1 in every dimension, thereby optimizing the distribution of the feature values. The pooling unit may include at least one pooling layer, which scans the regions of the feature map through a scanning window by a target stride and applies average pooling to the feature values contained in each region, so as to reduce the dimensionality of the feature map. The linear rectification unit may include a linear activation function used to apply a non-linear transformation to the feature map. The scanning window and target stride of each pooling layer can be set by the developer.
To obtain a finer-grained feature map and improve recognition accuracy, in a possible implementation different pooling layers may be applied to images with different aspect ratios. Specifically, the image recognition model may adjust the stride of the scanning window in the last pooling layer, reducing the stride along the longer side. In the embodiments of the present invention, when the number of convolutional layers is 5, the strides of the first and second pooling layers in both the row and column directions may be set to 2; the stride of the third pooling layer may be set to 1 in the row direction and 2 in the column direction; the stride of the fourth pooling layer may be set to 2 in the row direction and 1 in the column direction; and for the fifth pooling layer, when the length of the image is greater than its width, the stride is set to 1 in the row direction and 2 in the column direction, and when the length of the image is less than its width, the stride is set to 2 in the row direction and 1 in the column direction.
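Under the settings above (five 3×3 convolutional layers with 64, 128, 256, 256 and 512 kernels, each followed by batch normalization, average pooling and linear rectification), one possible backbone could be sketched in PyTorch as follows. The padding, the pooling window sizes, the conv→BN→pool→ReLU ordering and the mapping of "row/column direction" onto tensor height/width are assumptions where the text leaves them to the developer.

```python
# Hedged sketch of the convolutional backbone described above.
import torch
import torch.nn as nn

def make_backbone(image_longer_than_wide: bool = True) -> nn.Sequential:
    channels = [1, 64, 128, 256, 256, 512]
    # Pooling strides as (row, column); the fifth depends on the aspect ratio.
    pool_strides = [(2, 2), (2, 2), (1, 2), (2, 1),
                    (1, 2) if image_longer_than_wide else (2, 1)]
    layers = []
    for i in range(5):
        layers += [
            nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, padding=1),
            nn.BatchNorm2d(channels[i + 1]),                          # batch normalization
            nn.AvgPool2d(kernel_size=pool_strides[i], stride=pool_strides[i]),  # average pooling
            nn.ReLU(inplace=True),                                    # linear rectification unit
        ]
    return nn.Sequential(*layers)

backbone = make_backbone()
first_feature_map = backbone(torch.randn(1, 1, 48, 160))   # hypothetical input size
print(first_feature_map.shape)                              # e.g. torch.Size([1, 512, 6, 10])
```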
204. The computer device decodes the first feature map based on the importance of the feature points in the first feature map.
The computer device may decode the first feature map with an encoder and a decoder built on long short-term memory (LSTM) networks. An LSTM network can memorize the input information it receives each time, keeping it inside the network and applying it during the current operation. Referring to Fig. 3, which is a schematic structural diagram of an LSTM network provided by an embodiment of the present invention, part (a) of Fig. 3 shows one LSTM network and part (b) shows the network unfolded over time. The LSTM network may include an input unit 301, a hidden-layer unit 302 and an output unit 303, where the input sequence of the input unit may be denoted x_0, x_1, ..., x_{t-1}, x_t, the operation results of the hidden-layer unit may be denoted h_0, h_1, ..., h_{t-1}, h_t, and the outputs of the output unit may be denoted y_0, y_1, ..., y_{t-1}, y_t, with t being an integer greater than or equal to 0. As shown in part (b), one input unit, one output unit and at least one hidden-layer unit constitute a node 304; the operation result of the hidden-layer unit in node 304 can be passed to the hidden-layer unit of the next node 305, so that the hidden-layer unit of node 305 can operate on the basis of the preceding input sequence.
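The recurrence in Fig. 3, where each node combines the current input with the hidden state passed on by the previous node, can be illustrated with a single LSTM cell; the sizes below are arbitrary assumptions.

```python
# Small sketch of an LSTM unrolled over time: each node receives the current
# input and the previous node's hidden state.
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=16, hidden_size=32)
h = torch.zeros(1, 32)                              # hidden-layer output h_0
c = torch.zeros(1, 32)                              # internal cell state
inputs = [torch.randn(1, 16) for _ in range(4)]     # x_0 ... x_3

outputs = []
for x_t in inputs:
    h, c = cell(x_t, (h, c))    # the node operates on x_t and the previous hidden state
    outputs.append(h)           # y_t is derived from h_t
print(len(outputs), outputs[-1].shape)
```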
The computer device decodes the first feature map, which may specifically include the following steps one to three:
Step one: the computer device obtains multiple first sequences of the first feature map, where each first sequence represents the feature information of one sub-image in the first feature map together with the sub-images located before and after it in scanning order.
The computer device may scan the regions of the first feature map one by one and treat the group of feature points contained in each region as one sub-image of the first feature map. The computer device obtains the first sequence of each sub-image. In a possible implementation, the computer device inputs the sub-images of the first feature map into an encoder in sequence, the encoder including at least one first hidden-layer unit; for each first hidden-layer unit, the unit performs a weighted operation on the sub-image of the first feature map it receives and the first sequence output by the preceding first hidden-layer unit, obtaining one first sequence.
The encoder may include at least one bidirectional LSTM network; each LSTM network may include multiple nodes, and each node may include at least one first hidden-layer unit. The number of nodes of the bidirectional LSTM network can be set by the developer; in the embodiments of the present invention, the number of nodes equals the number of pixel values in the numeric matrix corresponding to the image to be recognized. Each bidirectional LSTM network can perform a forward pass and a backward pass at the same time. In the forward pass, a hidden-layer unit of the bidirectional LSTM network performs a weighted operation on the currently input sub-image and the first sequence output by the preceding hidden-layer unit to generate a first sequence, so that the front part of the first feature map is fully considered while the encoder encodes the first feature map; in the backward pass, a hidden-layer unit performs a weighted operation on the currently input sub-image and the first sequence output by the following hidden-layer unit to generate a first sequence, so that the rear part of the first feature map is fully considered during encoding.
Referring to Fig. 4, which is a schematic structural diagram of an encoder provided by an embodiment of the present invention, the encoder includes a bidirectional LSTM network. Taking nodes 401, 402 and 403 of the bidirectional LSTM network as an example, the computation is as follows. In the forward pass, the first hidden-layer unit of node 402 generates the first sequence h_t of node 402 based on the input sequence x_t and the first sequence h_{t-1} of the first hidden-layer unit of the preceding node 401, and inputs h_t into the first hidden-layer unit of the following node 403. In the backward pass, the first hidden-layer unit of node 402 generates the first sequence h_t of node 402 based on the input sequence x_t and the first sequence h_{t+1} of the first hidden-layer unit of the following node 403, and inputs h_t into the first hidden-layer unit of the preceding node 401.
To improve recognition accuracy, each first hidden-layer unit may further include multiple sub-hidden-layer units, which are used to perform the weighted operations on the content input to the first hidden-layer unit. The number of sub-hidden-layer units can be set by the developer; in the embodiments of the present invention, to achieve the best recognition effect of the image recognition model, the number of sub-hidden-layer units may be set to 18.
The above way of building the encoder on a bidirectional LSTM network allows the encoder to obtain the contextual information of the image to be recognized and to encode and embed the convolutional features output by the convolutional layers. For example, when recognizing a formula in an image, the encoder can obtain the left-to-right and top-to-bottom structural features of the formula and encode the image to be recognized based on these structural features. Referring to Fig. 5, which is a schematic diagram of an encoding and embedding method provided by an embodiment of the present invention, 501 is a schematic diagram of the encoder and 502 is a schematic diagram of the convolutional features.
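A hedged sketch of such an encoder is given below: the first feature map is split column by column into sub-images, and each column vector is fed to a bidirectional LSTM whose outputs play the role of the first sequences. The column-wise split and all dimensions are assumptions made for illustration and follow on from the backbone sketch above.

```python
# Sketch of a bidirectional-LSTM encoder over the columns of the first feature map.
import torch
import torch.nn as nn

channels, height, width = 512, 6, 10
first_feature_map = torch.randn(1, channels, height, width)

# one sub-image per column: sequence length = width, feature size = channels * height
sequence = first_feature_map.permute(0, 3, 1, 2).reshape(1, width, channels * height)

encoder = nn.LSTM(input_size=channels * height, hidden_size=256,
                  batch_first=True, bidirectional=True)
first_sequences, _ = encoder(sequence)      # shape: (1, width, 2 * 256)
print(first_sequences.shape)
```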
Step two: the computer device obtains multiple attention matrices based on the multiple first sequences, where one attention matrix represents the importance of the corresponding sub-image with respect to the first feature map.
In a possible implementation, the computer device may decode the multiple first sequences through a decoder to obtain the multiple attention matrices, which may specifically include the following steps:
First, the computer device inputs all the first sequences into a decoder, the decoder including at least one second hidden-layer unit. The decoder may include at least one unidirectional LSTM network; each unidirectional LSTM network may include multiple nodes, and each node may include at least one second hidden-layer unit. The number of nodes of the unidirectional LSTM network can be set by the developer.
Then, for each second hidden-layer unit, the unit performs a similarity comparison between the second sequence output by the preceding second hidden-layer unit and all the first sequences, obtaining a second sequence in which each group of elements indicates the similarity between the second sequence received by the unit and one first sequence; the larger the similarity, the larger the values of the group of elements representing it.
Finally, the computer device performs weighted operations between the multiple second sequences and all the first sequences respectively to generate the multiple attention matrices.
Specifically, the generation of the attention matrices is described with reference to Fig. 6, which is a schematic structural diagram of a decoder provided by an embodiment of the present invention; the decoder includes one unidirectional LSTM network. Taking one node 601 of the unidirectional LSTM network as an example, the decoder 600 obtains all the first sequences generated by the first hidden-layer units of the encoder 400 in step one, and performs a similarity comparison between the second sequence s_{i-1} generated by the second hidden-layer unit of the preceding node of node 601 and all the first sequences. In a possible implementation, the computer device may obtain the similarity matrix e_i between the second sequence s_{i-1} and all the first sequences through an alignment model, which can be expressed as the following formula (1):
e_{ij} = a(s_{i-1}, h_j)    (1)
where e_{ij} denotes the similarity matrix between the second sequence s_{i-1} and the first sequence h_j, the non-linear function a is the alignment model, s_{i-1} denotes the second sequence generated by the preceding node, h_j denotes a first sequence generated by the encoder, and i and j are integers greater than 0.
After obtaining the similarity matrix, the computer device may normalize it with the softmax (normalized exponential) function to generate the attention weight matrix α, which can be expressed as the following formula (2):
α_{ij} = exp(e_{ij}) / Σ_{k=1}^{K} exp(e_{ik})    (2)
where e_{ij} denotes the similarity matrix, exp(·) denotes the exponential operation, and K denotes the number of similarity matrices, K being an integer greater than 0.
The attention weight matrix α indicates the importance of each sub-image of the first feature map in the current decoding step. The computer device performs a weighted operation between the attention weight matrix α and all the first sequences to generate the attention matrix c, which can be expressed as the following formula (3):
c_i = Σ_{j=1}^{T} α_{ij} h_j    (3)
where α_{ij} is the attention weight matrix, h_j is a first sequence, T is the number of first sequences, i, j and T are integers greater than 0, and j is less than or equal to T.
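Formulas (1)–(3) correspond to a standard additive-attention computation, which could be sketched as follows. The alignment model a is assumed here to be a small feed-forward network (a tanh followed by a linear projection); that choice and all dimensions are illustrative assumptions, not taken from the patent.

```python
# Sketch of formulas (1)-(3): alignment scores, softmax weights, weighted sum.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_h, dim_s, dim_a, T = 512, 256, 128, 10
W_h, W_s = nn.Linear(dim_h, dim_a), nn.Linear(dim_s, dim_a)
v = nn.Linear(dim_a, 1)

h = torch.randn(T, dim_h)        # first sequences h_1..h_T from the encoder
s_prev = torch.randn(dim_s)      # second sequence s_{i-1} from the preceding node

e = v(torch.tanh(W_h(h) + W_s(s_prev))).squeeze(-1)   # formula (1): e_ij = a(s_{i-1}, h_j)
alpha = F.softmax(e, dim=0)                            # formula (2): attention weights
c = (alpha.unsqueeze(-1) * h).sum(dim=0)               # formula (3): c_i = sum_j alpha_ij * h_j
print(alpha.shape, c.shape)                            # torch.Size([10]) torch.Size([512])
```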
Step three: the computer device decodes the first feature map based on the multiple attention matrices.
In the decoding process, a scanning window may slide over the first feature map, and the region determined by each sliding position may be called a sub-image of the first feature map. For each sub-image, the decoder performs a weighted operation on each feature point in the sub-image and each element of the attention matrix corresponding to the sub-image, thereby obtaining a decoding result, i.e. a feature vector.
205. When the computer device detects during decoding that the most important feature point of any sub-image of the first feature map occupies the same position in the first feature map as that of the previous sub-image, the computer device terminates the decoding and outputs the feature vectors obtained by decoding.
The computer device can judge whether the valid information in the image has been fully recognized based on the position of the most important feature point of each sub-image in the first feature map: when the most important feature point of any sub-image and that of the previous sub-image occupy the same position in the first feature map, the computer device determines that the valid information in the image has been recognized and ends the decoding.
In a possible implementation, the computer device generates the second sequence s_i based on the attention matrix c_i obtained in step 204, the output sequence y_{i-1} of the decoder and the second sequence s_{i-1}, and generates the output sequence y_i based on the second sequence s_i. The computer device determines the position of the most important feature point in the sub-image based on the second sequence s_i, which may specifically include the following steps:
Step one: the computer device obtains the position, relative to the first feature map, of the maximum value in the attention matrix of the sub-image.
Step two: when the maximum value in the attention matrix of any sub-image and the maximum value in the attention matrix of the previous sub-image occupy the same position relative to the first feature map, the computer device determines that decoding of the image is complete and ends the decoding.
Referring to Fig. 7, which is a schematic diagram of truncating the decoding process provided by an embodiment of the present invention, when recognizing the formula 700 the computer device may successively recognize regions 701, 702 and 703 of the image, as shown in parts (a), (b) and (c) of Fig. 7. When region 703 is recognized, the computer device can determine, based on the features of the formula and through the attention mechanism, that the valid information in the image has been fully recognized, so that the decoding process is truncated in advance; decoding of invalid information is avoided and decoding efficiency is improved.
To improve the accuracy of the recognition result, in a possible implementation the computer device may obtain the position, relative to the first feature map, of the maximum value in the attention matrix of one sub-image as a first position, and the positions, relative to the first feature map, of the maximum values in the attention matrices of the previous N sub-images as a group of second positions; when the first position and every second position are identical, the computer device determines that decoding of the image is complete and ends the decoding. N is an integer greater than 1, and its specific value can be set by the developer.
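The early-termination rule of step 205 and its N-sub-image variant could be sketched as follows; decode_step, the value of N and the attention length are hypothetical stand-ins for the decoder described in the patent.

```python
# Sketch: stop decoding once the most important feature point stops moving.
import torch

def decode_with_early_stop(decode_step, max_steps: int = 100, n_stable: int = 2):
    outputs, recent_positions = [], []
    for _ in range(max_steps):
        feature_vector, attention_weights = decode_step()   # hypothetical single decoding step
        position = int(torch.argmax(attention_weights))     # most important feature point
        outputs.append(feature_vector)
        recent_positions.append(position)
        # terminate when the current and the previous n_stable positions are identical
        if len(recent_positions) > n_stable and len(set(recent_positions[-(n_stable + 1):])) == 1:
            break
    return outputs

# toy decode_step whose attention fixates on position 3 after a few steps
state = {"t": 0}
def toy_step():
    state["t"] += 1
    weights = torch.zeros(10)
    weights[min(state["t"], 3)] = 1.0
    return torch.randn(512), weights

print(len(decode_with_early_stop(toy_step)))   # stops early once the argmax stops moving
```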
206. The computer device decodes the feature vectors output by the image recognition model to obtain the character information contained in the image.
The computer device may obtain the output sequence y_i of the decoder based on a second sequence s_i and use the output sequence y_i as a feature vector of the image.
The computer device compares each of the multiple feature vectors with a standard vector set for similarity, determines the standard vector with the highest similarity to each feature vector, and uses the characters indicated by these standard vectors as the characters contained in the image. The standard vector set includes the feature vector corresponding to each character in a character list. In a possible implementation, the computer device may compute the distance between each of the feature vectors and the vectors in the standard vector set, take the character indicated by the vector in the standard vector set with the smallest distance to the feature vector as the decoding result of that feature vector, and take the decoding results of all the feature vectors as the character information contained in the image.
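A sketch of this nearest-standard-vector decoding is given below; the character list and the standard vectors are made-up stand-ins for the model's actual vocabulary.

```python
# Sketch: map each decoded feature vector to the character whose standard
# vector is closest (smallest distance wins).
import numpy as np

characters = ["0", "1", "+", "="]                         # hypothetical character list
standard_vectors = np.random.rand(len(characters), 512)   # one standard vector per character

def vectors_to_text(feature_vectors: np.ndarray) -> str:
    chars = []
    for v in feature_vectors:
        distances = np.linalg.norm(standard_vectors - v, axis=1)
        chars.append(characters[int(np.argmin(distances))])
    return "".join(chars)

decoded = np.random.rand(3, 512)                          # feature vectors output by the decoder
print(vectors_to_text(decoded))
```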
Referring to Fig. 8, which is a schematic diagram of an image recognition result provided by an embodiment of the present invention, part (a) of Fig. 8 is a schematic diagram of an image input by the user and part (b) is a schematic diagram of the recognition result output by the computer device. The user can input the image to be recognized shown in part (a) into the computer device, which recognizes it through the above image recognition process and outputs the recognition result shown in part (b).
In the technical solution provided by the embodiments of the present invention, feature extraction is performed on the image to be recognized to generate a first feature map, and the first feature map is decoded based on the importance of the feature points of each sub-image within the first feature map. During decoding, when the most important feature point of any sub-image and that of the previous sub-image occupy the same position relative to the first feature map, it can be determined that the valid information contained in the image, such as characters, has been fully decoded; the computer device then terminates decoding and outputs the valid information according to the decoding results already obtained. In this way, whether the valid information in the image has been completely decoded can be judged during decoding so that decoding is ended early, which reduces the amount of computation in the recognition process and improves recognition efficiency. When the above technical solution is applied to formula recognition, the learning ability of the deep learning network and the localization ability of the attention mechanism can improve the recognition capability and efficiency of the formula recognition network.
For example, when recognizing a formula in an image, whether the formula has been fully recognized can be judged based on the structural features of the formula; when the attention stays at the lower-right corner of the image for a long time, the computer device can determine that the formula has been recognized, truncate the decoding computation in advance, avoid decoding invalid information and improve the decoding efficiency of the decoder.
Of course, the above image recognition method can also be fused with other image recognition methods, for example with a single-character recognition method, and a voting technique can further be used to determine the decoding result among the different methods, thereby improving the recognition accuracy for the problem.
The above embodiment mainly describes the process by which the computer device performs image recognition. Before image recognition is performed, a training data set needs to be constructed to train the image recognition model; the training data set may include multiple annotated sample images. However, when building the training data set, sample images are difficult to acquire and costly to annotate, and the number of sample images that can be obtained is small and can hardly cover the common character distribution patterns comprehensively, so it cannot meet the training needs of the model. In a possible implementation, the computer device can construct sample images based on the image features learned by the image recognition model, which may specifically include the following steps:
Step one: the computer device constructs sample data based on the image features extracted by the image recognition model.
The computer device may train the image recognition model with a training data set containing real data; by adjusting the parameters of the image recognition model during training, the model can obtain the image features of each image in the training data set, and the computer device constructs sample data based on the multiple image features obtained. For example, when the training data set is a group of images containing formulas, after training the image recognition model with this data set, the structural features of the formulas can be obtained, which may include the probabilities of occurrence of digits or operators in a formula; the computer device can then construct sample data based on the obtained structural features.
Step two: the computer device renders the sample data to generate sample images.
To optimize the training effect, in a possible implementation, after the computer device generates sample images based on the constructed sample data, it may also transform the sample images, for example applying text distortion, adding background noise, rotation and font changes, so as to enhance the diversity of the images.
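The rendering and augmentation described in step two could be sketched as follows; the font path, rotation range and noise level are illustrative assumptions, not values from the patent.

```python
# Sketch: render a sample from constructed data, then rotate and add noise.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_sample(text: str, font_path: str = "DejaVuSans.ttf") -> Image.Image:
    # font_path is a hypothetical font file; any available TrueType font would do
    img = Image.new("L", (160, 48), color=255)
    draw = ImageDraw.Draw(img)
    draw.text((10, 10), text, fill=0, font=ImageFont.truetype(font_path, 24))
    return img

def augment(img: Image.Image) -> Image.Image:
    img = img.rotate(np.random.uniform(-5, 5), expand=False, fillcolor=255)  # small rotation
    pixels = np.asarray(img, dtype=np.float32)
    pixels += np.random.normal(0, 8, pixels.shape)                           # background noise
    return Image.fromarray(np.clip(pixels, 0, 255).astype(np.uint8))

sample = augment(render_sample("1+1=2"))
sample.save("sample_0.png")
```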
Referring to Fig. 9, which is a schematic diagram of constructed sample images provided by an embodiment of the present invention, part (a) of Fig. 9 shows font transformation applied to the constructed sample data, and part (b) shows deformation applied to the constructed sample data.
Fig. 10 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present invention. Referring to Fig. 10, the apparatus includes:
an obtaining module 1001, configured to obtain an image to be recognized;
an output module 1002, configured to input the image into an image recognition model, perform feature extraction on the image through the image recognition model to obtain a first feature map, decode the first feature map based on the importance of the feature points in the first feature map, and, when it is detected during decoding that the most important feature point of any sub-image of the first feature map occupies the same position in the first feature map as that of the previous sub-image, terminate the decoding and output the feature vectors obtained by decoding;
a decoding module 1003, configured to decode the feature vectors output by the image recognition model to obtain the character information contained in the image.
In a possible implementation, the output module 1002 is configured to:
obtain multiple first sequences of the first feature map, each first sequence representing the feature information of one sub-image in the first feature map together with the sub-images located before and after it in scanning order;
obtain multiple attention matrices based on the multiple first sequences, where one attention matrix represents the importance of the corresponding sub-image with respect to the first feature map.
In a possible implementation, the output module 1002 is configured to:
input the sub-images of the first feature map into an encoder in sequence, the encoder including at least one first hidden-layer unit;
for each first hidden-layer unit, perform a weighted operation on the received sub-image of the first feature map and the first sequence output by the preceding first hidden-layer unit, obtaining one first sequence.
In a possible implementation, the output module 1002 is configured to:
input all the first sequences into a decoder, the decoder including at least one second hidden-layer unit;
for each second hidden-layer unit, perform a similarity comparison between the second sequence output by the preceding second hidden-layer unit and all the first sequences, obtaining a second sequence in which each group of elements indicates the similarity between the second sequence received by the unit and one first sequence, the larger the similarity, the larger the values of the group of elements representing it;
perform weighted operations between the multiple second sequences and all the first sequences respectively to generate the multiple attention matrices.
In a possible implementation, the output module 1002 is configured to:
obtain the position, relative to the first feature map, of the maximum element in the attention matrix of the sub-image;
when the maximum element in the attention matrix of any sub-image and the maximum element in the attention matrix of the previous sub-image occupy the same position relative to the first feature map, determine that decoding of the image is complete and end the decoding.
In a possible implementation, the decoding module 1003 is configured to:
compare each of the multiple feature vectors with a standard vector set for similarity, determine the standard vector with the highest similarity to each feature vector, and use the characters indicated by these standard vectors as the characters contained in the image.
All the above optional technical solutions can be combined in any way to form optional embodiments of the present invention, which are not described in detail here one by one.
It should be noted that, when the image recognition apparatus provided by the above embodiment performs image recognition, the division into the above functional modules is only used as an example; in practical applications, the above functions can be assigned to different functional modules as required, i.e. the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the image recognition apparatus provided by the above embodiment belongs to the same concept as the image recognition method embodiment; see the method embodiment for its specific implementation, which is not repeated here.
The computer device provided by the above technical solution can be implemented as a terminal or a server. For example, Fig. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present invention. The terminal 1100 may be a smartphone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop or a desktop computer. The terminal 1100 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
Generally, the terminal 1100 includes one or more processors 1101 and one or more memories 1102.
The processor 1101 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include a high-speed random access memory and a non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, which is executed by the processor 1101 to implement the image recognition method provided by the method embodiments of the present invention.
In some embodiments, terminal 1100 is also optional includes: peripheral device interface 1103 and at least one periphery are set It is standby.It can be connected by bus or signal wire between processor 1101, memory 1102 and peripheral device interface 1103.It is each outer Peripheral equipment can be connected by bus, signal wire or circuit board with peripheral device interface 1103.Specifically, peripheral equipment includes: In radio circuit 1104, display screen 1105, camera 1106, voicefrequency circuit 1107, positioning component 1108 and power supply 1109 extremely Few one kind.
Peripheral device interface 1103 can be used for I/O (Input/Output, input/output) is relevant outside at least one Peripheral equipment is connected to processor 1101 and memory 1102.In some embodiments, processor 1101, memory 1102 and periphery Equipment interface 1103 is integrated on same chip or circuit board;In some other embodiments, processor 1101, memory 1102 and peripheral device interface 1103 in any one or two can be realized on individual chip or circuit board, this implementation Example is not limited this.
Radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. Radio frequency circuit 1104 communicates with communication networks and other communication devices through electromagnetic signals. Radio frequency circuit 1104 converts an electric signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electric signal. Optionally, radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card and the like. Radio frequency circuit 1104 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to, a metropolitan area network, mobile communication networks of various generations (2G, 3G, 4G and 5G), a wireless local area network and/or a WiFi (Wireless Fidelity) network. In some embodiments, radio frequency circuit 1104 may also include an NFC (Near Field Communication) related circuit, which is not limited in the present invention.
Display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination thereof. When display screen 1105 is a touch display screen, display screen 1105 also has the ability to acquire touch signals on or above the surface of display screen 1105. The touch signal may be input to processor 1101 as a control signal for processing. In this case, display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, arranged on the front panel of terminal 1100; in other embodiments, there may be at least two display screens 1105, arranged respectively on different surfaces of terminal 1100 or in a folded design; in still other embodiments, display screen 1105 may be a flexible display screen arranged on a curved surface or a folded surface of terminal 1100. Display screen 1105 may even be arranged as a non-rectangular irregular figure, namely an irregularly shaped screen. Display screen 1105 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
Camera assembly 1106 is used to acquire images or video. Optionally, camera assembly 1106 includes a front camera and a rear camera. In general, the front camera is arranged on the front panel of the terminal and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions, or other fused shooting functions are realized. In some embodiments, camera assembly 1106 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash refers to a combination of a warm light flash and a cold light flash, and may be used for light compensation under different color temperatures.
Audio circuit 1107 may include a microphone and a speaker. The microphone is used to acquire sound waves of the user and the environment, convert the sound waves into electric signals and input them to processor 1101 for processing, or input them to radio frequency circuit 1104 to realize voice communication. For stereo acquisition or noise reduction purposes, there may be multiple microphones, arranged at different parts of terminal 1100. The microphone may also be an array microphone or an omnidirectional acquisition microphone. The speaker is used to convert electric signals from processor 1101 or radio frequency circuit 1104 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electric signals into sound waves audible to humans, but also convert electric signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, audio circuit 1107 may also include a headphone jack.
Positioning component 1108 is used to locate the current geographic position of terminal 1100 to implement navigation or an LBS (Location Based Service). Positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia or the Galileo system of the European Union.
Power supply 1109 is used to supply power to the various components in terminal 1100. Power supply 1109 may be an alternating current, a direct current, a disposable battery or a rechargeable battery. When power supply 1109 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, terminal 1100 further includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to, an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115 and a proximity sensor 1116.
Acceleration sensor 1111 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with terminal 1100. For example, acceleration sensor 1111 may be used to detect the components of gravitational acceleration on the three coordinate axes. Processor 1101 may, according to the gravitational acceleration signal acquired by acceleration sensor 1111, control display screen 1105 to display the user interface in a landscape view or a portrait view. Acceleration sensor 1111 may also be used to acquire game or user motion data.
Gyroscope sensor 1112 can detect the body direction and rotation angle of terminal 1100, and gyroscope sensor 1112 may cooperate with acceleration sensor 1111 to acquire the user's 3D actions on terminal 1100. According to the data acquired by gyroscope sensor 1112, processor 1101 may implement the following functions: motion sensing (for example, changing the UI according to a tilt operation of the user), image stabilization during shooting, game control and inertial navigation.
Pressure sensor 1113 may be arranged on the side frame of terminal 1100 and/or the lower layer of display screen 1105. When pressure sensor 1113 is arranged on the side frame of terminal 1100, the user's grip signal on terminal 1100 can be detected, and processor 1101 performs left/right hand recognition or a shortcut operation according to the grip signal acquired by pressure sensor 1113. When pressure sensor 1113 is arranged on the lower layer of display screen 1105, processor 1101 controls the operability controls on the UI interface according to the user's pressure operation on display screen 1105. The operability controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
Fingerprint sensor 1114 is used to acquire the user's fingerprint, and processor 1101 identifies the user's identity according to the fingerprint acquired by fingerprint sensor 1114, or fingerprint sensor 1114 identifies the user's identity according to the acquired fingerprint. When the user's identity is identified as a trusted identity, processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, payment, changing settings and the like. Fingerprint sensor 1114 may be arranged on the front, back or side of terminal 1100. When a physical button or a manufacturer logo is arranged on terminal 1100, fingerprint sensor 1114 may be integrated with the physical button or the manufacturer logo.
Optical sensor 1115 is used to acquire the ambient light intensity. In one embodiment, processor 1101 may control the display brightness of display screen 1105 according to the ambient light intensity acquired by optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of display screen 1105 is turned up; when the ambient light intensity is low, the display brightness of display screen 1105 is turned down. In another embodiment, processor 1101 may also dynamically adjust the shooting parameters of camera assembly 1106 according to the ambient light intensity acquired by optical sensor 1115.
Proximity sensor 1116, also referred to as a distance sensor, is generally arranged on the front panel of terminal 1100. Proximity sensor 1116 is used to acquire the distance between the user and the front of terminal 1100. In one embodiment, when proximity sensor 1116 detects that the distance between the user and the front of terminal 1100 is gradually decreasing, processor 1101 controls display screen 1105 to switch from the screen-on state to the screen-off state; when proximity sensor 1116 detects that the distance between the user and the front of terminal 1100 is gradually increasing, processor 1101 controls display screen 1105 to switch from the screen-off state to the screen-on state.
Those skilled in the art can understand that the structure shown in Figure 11 does not constitute a limitation on terminal 1100, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
Figure 12 is a structural schematic diagram of a server provided in an embodiment of the present invention. The server 1200 may vary considerably due to differences in configuration or performance, and may include one or more processors (central processing units, CPU) 1201 and one or more memories 1202, wherein at least one program code is stored in the one or more memories 1202, and the at least one program code is loaded and executed by the one or more processors 1201 to implement the methods provided by the above method embodiments. Certainly, the server may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and the server may also include other components for realizing device functions, which will not be repeated here.
In an exemplary embodiment, a computer readable storage medium is also provided, for example a memory including instructions, where the above instructions may be executed by a processor to complete the image recognition method in the above embodiments. For example, the computer readable storage medium may be a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), a magnetic tape, a floppy disk, an optical data storage device and the like.
Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be completed by hardware, or may be completed by instructing relevant hardware through a program, and the program may be stored in a computer readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (14)

1. An image recognition method, characterized in that the method comprises:
obtaining an image to be recognized;
inputting the image into an image recognition model, performing feature extraction on the image by the image recognition model to obtain a first feature map, decoding the first feature map based on the importance of the feature points in the first feature map within the first feature map, and when it is detected during decoding that, for any sub-image in the first feature map, the position in the first feature map of the feature point with the greatest importance is the same as that of the previous sub-image, terminating the decoding and outputting the feature vectors obtained by the decoding;
decoding the feature vectors output by the image recognition model to obtain the character information contained in the image.
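Purely as an illustrative, non-limiting sketch of the control flow described in claim 1 (not the claimed implementation itself), the following Python/NumPy fragment mimics the early-stopping decode loop on a toy first feature map; the pooling-based feature extractor, the column-wise scan of sub-images and the simple importance map are assumptions made for illustration only.

```python
import numpy as np

def extract_first_feature_map(image, grid=(4, 8)):
    # Toy stand-in for the CNN feature extractor: block-average the image
    # into a grid-shaped "first feature map" (assumption, not the claimed model).
    h, w = image.shape
    gh, gw = grid
    cropped = image[: h - h % gh, : w - w % gw]
    return cropped.reshape(gh, cropped.shape[0] // gh,
                           gw, cropped.shape[1] // gw).mean(axis=(1, 3))

def decode_step(fmap, col):
    # Toy "importance" of each feature point while attending to sub-image `col`.
    weights = np.zeros(fmap.shape[1])
    weights[col] = 1.0
    importance = fmap * (0.5 + 0.5 * weights)         # one attention-like map
    feature_vector = (importance * fmap).sum(axis=1)  # one decoded vector
    return importance, feature_vector

def decode_with_early_stop(fmap):
    vectors, prev_pos = [], None
    for col in range(fmap.shape[1]):                  # scan sub-images in order
        importance, vec = decode_step(fmap, col)
        pos = np.unravel_index(importance.argmax(), importance.shape)
        if prev_pos is not None and pos == prev_pos:  # most important point did not move
            break                                     # -> terminate decoding (claim 1)
        vectors.append(vec)
        prev_pos = pos
    return np.stack(vectors) if vectors else np.empty((0, fmap.shape[0]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 64))
    feats = decode_with_early_stop(extract_first_feature_map(img))
    print("decoded feature vectors:", feats.shape)
```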
2. The method according to claim 1, wherein the decoding of the first feature map based on the importance of the feature points in the first feature map within the first feature map comprises:
obtaining multiple first sequences of the first feature map, each first sequence being used to represent the feature information of one sub-image in the first feature map and of the sub-images located before and after that sub-image in scanning order;
obtaining multiple attention matrices based on the multiple first sequences, one attention matrix being used to represent the importance of the corresponding sub-image to the first feature map;
decoding the first feature map based on the multiple attention matrices.
3. The method according to claim 2, wherein the obtaining of multiple first sequences of the first feature map comprises:
sequentially inputting each sub-image in the first feature map into an encoder, the encoder including at least one first hidden layer unit;
for each first hidden layer unit, the first hidden layer unit weighting a received sub-image of the first feature map and the first sequence output by the previous first hidden layer unit to obtain one first sequence.
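A minimal sketch of the kind of encoder recurrence claim 3 describes, assuming each "first hidden layer unit" is a simple affine recurrence over flattened sub-images; the weight shapes, the tanh nonlinearity and the column-per-sub-image split are illustrative assumptions rather than the claimed encoder.

```python
import numpy as np

def encode_first_sequences(sub_images, hidden_dim=16, seed=0):
    """Chain of first hidden layer units: each unit weights its sub-image
    together with the first sequence produced by the previous unit (claim 3)."""
    rng = np.random.default_rng(seed)
    in_dim = sub_images[0].size
    w_in = rng.standard_normal((hidden_dim, in_dim)) * 0.1       # weights for the sub-image
    w_rec = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # weights for the previous first sequence
    first_sequences = []
    prev = np.zeros(hidden_dim)
    for sub in sub_images:                        # sub-images in scanning order
        x = sub.reshape(-1)
        prev = np.tanh(w_in @ x + w_rec @ prev)   # weighted combination -> one first sequence
        first_sequences.append(prev)
    return first_sequences

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fmap = rng.random((4, 8))
    subs = [fmap[:, i] for i in range(fmap.shape[1])]  # one column per sub-image (assumption)
    seqs = encode_first_sequences(subs)
    print(len(seqs), seqs[0].shape)
```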
4. The method according to claim 2, wherein the obtaining of multiple attention matrices based on the multiple first sequences comprises:
inputting all the first sequences into a decoder, the decoder including at least one second hidden layer unit;
for each second hidden layer unit, the second hidden layer unit performing a similarity comparison between the received second sequence output by the previous second hidden layer unit and all the first sequences to obtain one second sequence, one group of elements in the second sequence being used to indicate the similarity between the second sequence of the previous second hidden layer unit and one first sequence, where the greater the similarity, the larger the values of the group of elements used to indicate that similarity;
weighting the multiple second sequences respectively with all the first sequences to generate multiple attention matrices.
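Following claim 4, each decoding step compares the second sequence from the previous second hidden layer unit against all first sequences and weights the first sequences with the result. The dot-product similarity, softmax normalization and the representation of each attention matrix as a weight vector over sub-images are illustrative assumptions; the claim itself does not fix a particular similarity measure.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_matrices(first_sequences, steps, seed=2):
    """Toy decoder: at each step, score all first sequences against the
    previous second sequence and produce one attention matrix (claim 4)."""
    rng = np.random.default_rng(seed)
    F = np.stack(first_sequences)          # (num_sub_images, hidden_dim)
    hidden_dim = F.shape[1]
    w = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
    second = np.zeros(hidden_dim)          # initial second sequence
    matrices = []
    for _ in range(steps):
        scores = F @ second                # similarity to each first sequence (larger = more similar)
        attn = softmax(scores)             # one attention matrix over the sub-images
        matrices.append(attn)
        context = attn @ F                 # first sequences weighted by the attention values
        second = np.tanh(w @ context)      # next second sequence fed to the next unit
    return matrices

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    seqs = [rng.standard_normal(16) for _ in range(8)]
    mats = attention_matrices(seqs, steps=5)
    print(len(mats), mats[0].shape, round(float(mats[0].sum()), 3))
```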
5. The method according to claim 2, wherein terminating the decoding when it is detected during decoding that, for any sub-image in the first feature map, the position in the first feature map of the feature point with the greatest importance is the same as that of the previous sub-image comprises:
obtaining the position of the maximum element value in the attention matrix of the sub-image relative to the first feature map;
when the positions of the maximum element values in the attention matrix of any sub-image and in the attention matrix of the previous sub-image relative to the first feature map are the same, determining that the decoding of the image is completed, and terminating the decoding.
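The termination test of claim 5 reduces to comparing the position of the largest element across consecutive attention matrices. A minimal sketch, assuming the attention matrices are NumPy arrays aligned with the first feature map:

```python
import numpy as np

def should_stop(prev_attention, curr_attention):
    """Claim 5: decoding finishes when the maximum element of the current
    sub-image's attention matrix falls at the same position (relative to the
    first feature map) as in the previous sub-image's attention matrix."""
    prev_pos = np.unravel_index(prev_attention.argmax(), prev_attention.shape)
    curr_pos = np.unravel_index(curr_attention.argmax(), curr_attention.shape)
    return prev_pos == curr_pos

if __name__ == "__main__":
    a = np.array([[0.1, 0.7], [0.1, 0.1]])
    b = np.array([[0.2, 0.6], [0.1, 0.1]])   # maximum at the same position -> stop
    c = np.array([[0.6, 0.2], [0.1, 0.1]])   # maximum moved -> keep decoding
    print(should_stop(a, b), should_stop(a, c))
```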
6. The method according to claim 1, wherein the decoding of the feature vectors output by the image recognition model to obtain the character information contained in the image comprises:
performing a similarity comparison between the multiple feature vectors and a set of standard vectors respectively, determining the standard vector with the greatest similarity to each feature vector, and taking the characters indicated by the determined standard vectors as the characters contained in the image.
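Claim 6 maps each decoded feature vector to the character whose standard vector is most similar. A sketch using cosine similarity; the similarity measure, the toy standard-vector table and the digit character set are assumptions for illustration only.

```python
import numpy as np

def decode_characters(feature_vectors, standard_vectors, charset):
    """For each feature vector, find the most similar standard vector and
    return the character it indicates (claim 6, cosine similarity assumed)."""
    S = standard_vectors / np.linalg.norm(standard_vectors, axis=1, keepdims=True)
    out = []
    for v in feature_vectors:
        sims = S @ (v / np.linalg.norm(v))    # similarity to every standard vector
        out.append(charset[int(np.argmax(sims))])
    return "".join(out)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    charset = list("0123456789")
    standard = rng.standard_normal((len(charset), 16))      # one standard vector per character
    noisy = standard[[3, 1, 4]] + 0.05 * rng.standard_normal((3, 16))
    print(decode_characters(noisy, standard, charset))      # prints the matched characters, e.g. "314"
```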
7. An image recognition device, characterized in that the device comprises:
an obtaining module, configured to obtain an image to be recognized;
an output module, configured to input the image into an image recognition model, perform feature extraction on the image by the image recognition model to obtain a first feature map, decode the first feature map based on the importance of the feature points in the first feature map within the first feature map, and when it is detected during decoding that, for any sub-image in the first feature map, the position in the first feature map of the feature point with the greatest importance is the same as that of the previous sub-image, terminate the decoding and output the feature vectors obtained by the decoding;
a decoding module, configured to decode the feature vectors output by the image recognition model to obtain the character information contained in the image.
8. The device according to claim 7, wherein the output module is configured to:
obtain multiple first sequences of the first feature map, each first sequence being used to represent the feature information of one sub-image in the first feature map and of the sub-images located before and after that sub-image in scanning order;
obtain multiple attention matrices based on the multiple first sequences, one attention matrix being used to represent the importance of the corresponding sub-image to the first feature map;
decode the first feature map based on the multiple attention matrices.
9. The device according to claim 8, wherein the output module is configured to:
sequentially input each sub-image in the first feature map into an encoder, the encoder including at least one first hidden layer unit;
for each first hidden layer unit, the first hidden layer unit weighting a received sub-image of the first feature map and the first sequence output by the previous first hidden layer unit to obtain one first sequence.
10. The device according to claim 8, wherein the output module is configured to:
input all the first sequences into a decoder, the decoder including at least one second hidden layer unit;
for each second hidden layer unit, the second hidden layer unit performing a similarity comparison between the received second sequence output by the previous second hidden layer unit and all the first sequences to obtain one second sequence, one group of elements in the second sequence being used to indicate the similarity between the second sequence of the previous second hidden layer unit and one first sequence, where the greater the similarity, the larger the values of the group of elements used to indicate that similarity;
weight the multiple second sequences respectively with all the first sequences to generate multiple attention matrices.
11. The device according to claim 8, wherein the output module is configured to:
obtain the position of the maximum element value in the attention matrix of the sub-image relative to the first feature map;
when the positions of the maximum element values in the attention matrix of any sub-image and in the attention matrix of the previous sub-image relative to the first feature map are the same, determine that the decoding of the image is completed, and terminate the decoding.
12. The device according to claim 7, wherein the decoding module is configured to:
perform a similarity comparison between the multiple feature vectors and a set of standard vectors respectively, determine the standard vector with the greatest similarity to each feature vector, and take the characters indicated by the determined standard vectors as the characters contained in the image.
13. A computer device, characterized in that the computer device comprises one or more processors and one or more memories, wherein at least one program code is stored in the one or more memories, and the at least one program code is loaded and executed by the one or more processors to implement the operations performed by the image recognition method according to any one of claims 1 to 6.
14. A computer readable storage medium, characterized in that at least one program code is stored in the computer readable storage medium, and the at least one program code is loaded and executed by a processor to implement the operations performed by the image recognition method according to any one of claims 1 to 6.
CN201910523751.4A 2019-06-17 2019-06-17 Image recognition method and device, computer equipment and computer readable storage medium Active CN110232417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910523751.4A CN110232417B (en) 2019-06-17 2019-06-17 Image recognition method and device, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910523751.4A CN110232417B (en) 2019-06-17 2019-06-17 Image recognition method and device, computer equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110232417A true CN110232417A (en) 2019-09-13
CN110232417B CN110232417B (en) 2022-10-25

Family

ID=67860001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910523751.4A Active CN110232417B (en) 2019-06-17 2019-06-17 Image recognition method and device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110232417B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027390A (en) * 2019-11-11 2020-04-17 北京三快在线科技有限公司 Object class detection method and device, electronic equipment and storage medium
CN113435530A (en) * 2021-07-07 2021-09-24 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and computer readable storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150039637A1 (en) * 2013-07-31 2015-02-05 The Nielsen Company (Us), Llc Systems Apparatus and Methods for Determining Computer Apparatus Usage Via Processed Visual Indicia
CN107527059A (en) * 2017-08-07 2017-12-29 北京小米移动软件有限公司 Character recognition method, device and terminal
US20180005082A1 (en) * 2016-04-11 2018-01-04 A2Ia S.A.S. Systems and methods for recognizing characters in digitized documents
CN108235058A (en) * 2018-01-12 2018-06-29 广州华多网络科技有限公司 Video quality processing method, storage medium and terminal
CN108615036A (en) * 2018-05-09 2018-10-02 中国科学技术大学 A kind of natural scene text recognition method based on convolution attention network
US10095977B1 (en) * 2017-10-04 2018-10-09 StradVision, Inc. Learning method and learning device for improving image segmentation and testing method and testing device using the same
CN108777794A (en) * 2018-06-25 2018-11-09 腾讯科技(深圳)有限公司 The coding method of image and device, storage medium, electronic device
WO2018207059A1 (en) * 2017-05-10 2018-11-15 Sisvel Technology S.R.L. Methods and apparatuses for encoding and decoding digital light field images
CN109117846A (en) * 2018-08-22 2019-01-01 北京旷视科技有限公司 A kind of image processing method, device, electronic equipment and computer-readable medium
WO2019002662A1 (en) * 2017-06-26 2019-01-03 Nokia Technologies Oy An apparatus, a method and a computer program for omnidirectional video
CN109543667A (en) * 2018-11-14 2019-03-29 北京工业大学 A kind of text recognition method based on attention mechanism
CN109684980A (en) * 2018-09-19 2019-04-26 腾讯科技(深圳)有限公司 Automatic marking method and device
US20190180154A1 (en) * 2017-12-13 2019-06-13 Abbyy Development Llc Text recognition using artificial intelligence

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150039637A1 (en) * 2013-07-31 2015-02-05 The Nielsen Company (Us), Llc Systems Apparatus and Methods for Determining Computer Apparatus Usage Via Processed Visual Indicia
US20180005082A1 (en) * 2016-04-11 2018-01-04 A2Ia S.A.S. Systems and methods for recognizing characters in digitized documents
WO2018207059A1 (en) * 2017-05-10 2018-11-15 Sisvel Technology S.R.L. Methods and apparatuses for encoding and decoding digital light field images
WO2019002662A1 (en) * 2017-06-26 2019-01-03 Nokia Technologies Oy An apparatus, a method and a computer program for omnidirectional video
CN107527059A (en) * 2017-08-07 2017-12-29 北京小米移动软件有限公司 Character recognition method, device and terminal
US10095977B1 (en) * 2017-10-04 2018-10-09 StradVision, Inc. Learning method and learning device for improving image segmentation and testing method and testing device using the same
US20190180154A1 (en) * 2017-12-13 2019-06-13 Abbyy Development Llc Text recognition using artificial intelligence
CN108235058A (en) * 2018-01-12 2018-06-29 广州华多网络科技有限公司 Video quality processing method, storage medium and terminal
CN108615036A (en) * 2018-05-09 2018-10-02 中国科学技术大学 A kind of natural scene text recognition method based on convolution attention network
CN108777794A (en) * 2018-06-25 2018-11-09 腾讯科技(深圳)有限公司 The coding method of image and device, storage medium, electronic device
CN109117846A (en) * 2018-08-22 2019-01-01 北京旷视科技有限公司 A kind of image processing method, device, electronic equipment and computer-readable medium
CN109684980A (en) * 2018-09-19 2019-04-26 腾讯科技(深圳)有限公司 Automatic marking method and device
CN109543667A (en) * 2018-11-14 2019-03-29 北京工业大学 A kind of text recognition method based on attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KORUS, P. et al.: "A new approach to high-capacity annotation watermarking based on digital fountain codes", MULTIMEDIA TOOLS AND APPLICATIONS *
WANG Delian: "Design and Implementation of an Image Recognition System Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027390A (en) * 2019-11-11 2020-04-17 北京三快在线科技有限公司 Object class detection method and device, electronic equipment and storage medium
CN111027390B (en) * 2019-11-11 2023-10-10 北京三快在线科技有限公司 Object class detection method and device, electronic equipment and storage medium
CN113435530A (en) * 2021-07-07 2021-09-24 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and computer readable storage medium
CN113435530B (en) * 2021-07-07 2023-10-10 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN110232417B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN110121118A (en) Video clip localization method, device, computer equipment and storage medium
CN111476306B (en) Object detection method, device, equipment and storage medium based on artificial intelligence
WO2020228519A1 (en) Character recognition method and apparatus, computer device and storage medium
CN110210571A (en) Image-recognizing method, device, computer equipment and computer readable storage medium
CN110059661A (en) Action identification method, man-machine interaction method, device and storage medium
CN109829456A (en) Image-recognizing method, device and terminal
CN110110145A Description document generation method and device
CN111091576A (en) Image segmentation method, device, equipment and storage medium
CN110110787A Target location acquiring method and device, computer equipment and storage medium
CN110147533B (en) Encoding method, apparatus, device and storage medium
JP7431977B2 (en) Dialogue model training method, device, computer equipment and program
CN110162604B (en) Statement generation method, device, equipment and storage medium
CN109284445A Network resource recommendation method and device, server and storage medium
CN110147532B (en) Encoding method, apparatus, device and storage medium
CN109840584B (en) Image data classification method and device based on convolutional neural network model
CN110263131A (en) Return information generation method, device and storage medium
CN111598896B (en) Image detection method, device, equipment and storage medium
CN110503160A (en) Image-recognizing method, device, electronic equipment and storage medium
CN111062248A (en) Image detection method, device, electronic equipment and medium
CN113269612A (en) Article recommendation method and device, electronic equipment and storage medium
CN109992685A Image retrieval method and device
CN110232417A (en) Image-recognizing method, device, computer equipment and computer readable storage medium
CN113763931B (en) Waveform feature extraction method, waveform feature extraction device, computer equipment and storage medium
CN112989198B (en) Push content determination method, device, equipment and computer-readable storage medium
CN110490389A Click-through rate prediction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant