CN109726712A - Character recognition method, device and storage medium, server - Google Patents


Info

Publication number
CN109726712A
CN109726712A
Authority
CN
China
Prior art keywords
text
screenshot
subgraph
recognized
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811347763.8A
Other languages
Chinese (zh)
Inventor
黄锦伦 (Huang Jinlun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811347763.8A priority Critical patent/CN109726712A/en
Publication of CN109726712A publication Critical patent/CN109726712A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)

Abstract

The present invention relates to the technical fields of image detection and image processing. A character recognition method provided by the embodiments of the present application comprises: in response to a screenshot request, obtaining the screenshot region corresponding to the request and generating a screenshot image according to the screenshot region; filtering the screenshot image to obtain an image to be recognized, and dividing the image to be recognized into several regions to obtain first sub-images; extracting the text in each first sub-image with a deep convolutional neural network algorithm based on an attention model, and sending the text to the user in editable form. By generating a screenshot image from the region selected by the user and recognizing the text in the user's screenshot image, the user can conveniently perform further operations on the text, such as pasting and copying. In text recognition scenarios, especially office work, this reduces the frequency of manual transcription, improves the speed and correctness of text conversion, and further improves the user's working efficiency.

Description

Character recognition method, device and storage medium, server
Technical field
The present invention relates to the technical fields of image detection and image processing, and in particular to a character recognition method, device, storage medium, and server.
Background technique
While browsing web pages, document pages, product interfaces, and videos on an electronic product, users often find that some of the text cannot be copied, or that text embedded in an image or video cannot be entered as text. For example, when recording or taking notes on the text content of an open-class video watched on the network, the user has to transcribe manually the text data displayed on the video page; data acquisition is therefore very inefficient, which leads to low text recognition efficiency and inaccurate recognition. If, however, the text appearing in web pages, document pages, product interfaces, and videos could be recognized, it would be convenient for users to quickly retrieve it or edit it. How to recognize the text in an image has therefore attracted wide attention from all sectors of society.
Summary of the invention
To overcome the above technical problems, in particular the low efficiency and inaccuracy of text recognition, the following technical solution is proposed:
A character recognition method provided in an embodiment of the present invention comprises:
in response to a screenshot request, obtaining the screenshot region corresponding to the request, and generating a screenshot image according to the screenshot region;
filtering the screenshot image to obtain an image to be recognized, and dividing the image to be recognized into several regions to obtain first sub-images;
extracting the text in each first sub-image with a deep convolutional neural network algorithm based on an attention model, and sending the text to the user in editable form.
Optionally, filtering the screenshot image to obtain the image to be recognized comprises:
performing grayscale processing on the screenshot image to obtain a grayscale image, the grayscale image being the image to be recognized.
Optionally, extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises:
extracting the text in the image to be recognized and in the first sub-images respectively according to the deep convolutional neural network algorithm;
obtaining the text from the text extracted from the image to be recognized and the first sub-images through an attention mechanism.
Optionally, after dividing the image to be recognized into several regions to obtain the first sub-images, the method comprises:
judging whether a first sub-image contains a gray value lying within a preset gray-value threshold range;
deleting the first sub-image when it contains no gray value lying within the preset gray-value threshold range.
Optionally, extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises:
piecing together two first sub-images from adjacent regions into one image to obtain a second sub-image;
extracting the text in the image to be recognized, the first sub-images, and the second sub-image respectively according to the deep convolutional neural network algorithm;
obtaining the text from the text extracted from the image to be recognized, the first sub-images, and the second sub-image through the attention mechanism.
Optionally, sending the text to the user in editable form comprises:
comparing the text with the characters in a font library to find the font-library characters closest to the text;
sending the font-library characters to the user in editable form.
Optionally, extracting the text in the image to be recognized based on the deep convolutional neural network algorithm and the attention mechanism comprises:
if an emoticon is recognized in the image to be recognized based on the deep convolutional neural network algorithm and the attention mechanism, obtaining the text associated with the emoticon according to a preset association between emoticons and text.
An embodiment of the present invention also provides a character recognition device, comprising:
a screen capture module, configured to, in response to a screenshot request, obtain the screenshot region corresponding to the request and generate a screenshot image according to the screenshot region;
a filtering module, configured to filter the screenshot image to obtain an image to be recognized, and divide the image to be recognized into several regions to obtain first sub-images;
a sending module, configured to extract the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model, and send the text to the user in editable form.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the character recognition method described in any of the technical solutions.
An embodiment of the present invention also provides a server, comprising:
one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to carry out the steps of the character recognition method described in any of the technical solutions.
Compared with the prior art, the present invention has the following beneficial effects:
1. A character recognition method provided by the embodiments of the present application comprises: in response to a screenshot request, obtaining the screenshot region corresponding to the request and generating a screenshot image according to the screenshot region; filtering the screenshot image to obtain an image to be recognized, and dividing the image to be recognized into several regions to obtain first sub-images; extracting the text in each first sub-image with a deep convolutional neural network algorithm based on an attention model, and sending the text to the user in editable form. This facilitates office work: the text in a user's screenshot of a mail or chat tool is converted into editable text, so that the user can edit it or perform other operations on it and can describe a problem clearly. Depending on the application, the screenshot region may contain images in a variety of formats; for example, it may cover a web page area, a video area, a PowerPoint (PPT) slide, a chat page, or a document that could otherwise only be obtained by downloading. The screenshot region may also take different forms as the application requires. A screenshot image is generated according to the screenshot region and then filtered to remove attribute information of the image, for example by denoising it, converting it to grayscale, or adjusting some or all of the pixel values, to obtain the image to be recognized. This makes the data corresponding to the image simpler and prevents noise or pixel differences from interfering with the recognition of the text; the image to be recognized is then uploaded to the server. After the image is processed, the text in it is recognized and converted into editable text, which the server sends to the user (that is, the recognized text is sent to the user in editable form), making it convenient for the user to perform further operations on that text.
2. A character recognition method provided by the embodiments of the present application recognizes text with a deep convolutional neural network algorithm. It can recognize the text in screenshots of various forms, making it convenient for the user to obtain the text content of many kinds of screenshots, and it improves not only the speed of text recognition but also its accuracy.
3. In a character recognition method provided in an embodiment of the present invention, extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises: extracting the text in the image to be recognized and in the first sub-images respectively according to the deep convolutional neural network algorithm, and obtaining the text from the text extracted from the image to be recognized and the first sub-images through the attention mechanism. The sub-images are fed into the deep convolutional neural network algorithm to obtain the local features of the text, and the image to be recognized is fed into the deep convolutional neural network algorithm to obtain the global features of the text; both local and global features describe the strokes of the characters. The deep convolutional neural network can extract features of different scales from the sub-images and thus characterize the text more adequately, so the extracted features are more accurate. The local and global features are then fused through the attention mechanism to obtain the global features again (the text in the image to be recognized). Extracting the local and global features of the text from the whole image to be recognized with the convolutional neural network algorithm and fusing them through the attention mechanism yields more accurate text and improves the accuracy of text recognition.
Additional aspects and advantages of the present invention will be set forth in part in the following description; they will become obvious from the description or be learned through practice of the invention.
Detailed description of the invention
The above and additional aspects and advantages of the invention will become obvious and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow diagram of one embodiment of the character recognition method of the present invention;
Fig. 2 is a flow diagram of another embodiment of the character recognition method of the present invention;
Fig. 3 is a structural schematic diagram of an exemplary embodiment of the character recognition device of the present invention;
Fig. 4 is a structural schematic diagram of an embodiment of the server of the present invention.
Specific embodiment
Embodiments of the present invention are described in detail below, examples of which are shown in the accompanying drawings, where identical or similar labels throughout indicate identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the", and "said" used herein may also include the plural. It should be further understood that the wording "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, and operations, but does not exclude the presence or addition of one or more other features, integers, steps, or operations.
Those skilled in the art will appreciate that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the field of the present invention. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted with idealized or overly formal meanings.
Those skilled in the art will appreciate that "application", "application program", "application software", and similarly expressed concepts in the present invention are the same concept well known to those skilled in the art, referring to computer software suitable for electronic execution, organically constructed from a series of computer instructions and related data resources. Unless otherwise specified, this naming is not limited by the type or level of programming language, nor by the operating system or platform on which it runs. Naturally, such concepts are not limited to any form of terminal either.
A character recognition method provided by the embodiments of the present application, as shown in Fig. 1, comprises: S100, S200, S300.
S100: in response to a screenshot request, obtaining the screenshot region corresponding to the request, and generating a screenshot image according to the screenshot region;
S200: filtering the screenshot image to obtain an image to be recognized, and dividing the image to be recognized into several regions to obtain first sub-images;
S300: extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model, and sending the text to the user in editable form.
To facilitate office work, the text in a user's screenshot of a mail or chat tool is converted into editable text, so that the user can edit it or perform other operations on it and can describe a problem clearly. In the embodiments of the present application, the screenshot request can therefore be sent by clicking an icon, after which the region to be recognized can be selected. The screenshot request can also be sent through a shortcut key operation by the user, for example by pressing one or more fixed keys of a mobile terminal simultaneously, or by a screenshot key on a computer. Depending on the application, the screenshot region may contain images in a variety of formats; for example, it may cover a web page area, a video area, a PowerPoint (PPT) slide, a chat page, or a document that could otherwise only be obtained by downloading. The screenshot region may also take different forms as the application requires; for example, it may cover the screen area, the visible page area, or the complete page area. It is also possible, after the screenshot request is sent, to form a selection box on the page whose size the user can adjust manually, so that the screenshot region meets the user's needs. In addition, in order to capture more content, when the user drags the border of the selection box into a specified region close to the edge of the terminal interface and keeps moving it toward that edge, the current page is zoomed so that the screenshot region can hold more content. To facilitate recognizing the text on the page, the zoom factor of the page can be specified by the development staff, determined from the resolution of the interface, or determined, when the move operation is cancelled, from the page scale at the moment of cancellation.
On the above basis, a screenshot image, such as a canvas image, a bmp image, a jpg image, or a png image, is generated according to the screenshot region. For example, a canvas image can be generated from the screenshot region using the chrome.tabs.captureVisibleTab method in the Chrome extension API. The generated canvas image is then converted to an image in base64 format through the canvas.toDataURL method. The image is then filtered to remove its attribute information, for example by denoising it, converting it to grayscale, or adjusting some or all of the pixel values, to obtain the image to be recognized, so that the data corresponding to the image is simpler and noise or pixel differences do not interfere with the recognition of the text; the image to be recognized is then uploaded to the server. The server receives the image to be recognized and extracts the text in it with the deep convolutional neural network algorithm. The text recognition algorithm can be optical character recognition (OCR) or a deep convolutional neural network algorithm, among others; in the embodiments of the present application, the text recognition algorithm is preferably a deep convolutional neural network algorithm. The text obtained after recognition is editable text, which the server sends to the user (that is, the recognized text is sent to the user in editable form), making it convenient for the user to perform further operations on that text. Recognizing text with a convolutional neural network algorithm makes it possible to recognize the text in screenshots of various forms, makes it convenient for the user to obtain the text content of many kinds of screenshots, and improves not only the speed of text recognition but also its accuracy.
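The browser-side conversion to base64 described above happens through canvas.toDataURL; as a minimal sketch of the reverse step on the server — recovering the raw image bytes from the uploaded data URL before filtering — the following can be assumed (the data-URL shape and sample bytes are illustrative, not taken from the patent):

```python
import base64

def decode_data_url(data_url: str) -> bytes:
    """Strip the data-URL header produced by canvas.toDataURL and
    decode the base64 payload back into raw image bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:image/") or "base64" not in header:
        raise ValueError("not a base64 image data URL")
    return base64.b64decode(payload)

# Simulate what the browser would upload: some image bytes wrapped
# in a PNG-style data URL, then recovered on the server.
raw = b"\x89PNG\r\n\x1a\n...fake image bytes..."
data_url = "data:image/png;base64," + base64.b64encode(raw).decode("ascii")
assert decode_data_url(data_url) == raw
```

The recovered bytes would then be decoded as an image and passed to the filtering step.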
Optionally, filtering the screenshot image to obtain the image to be recognized comprises:
performing grayscale processing on the screenshot image to obtain a grayscale image, the grayscale image being the image to be recognized.
In the embodiments of the present application, in order to prevent other data in the image from interfering with the recognition of the text and to improve recognition accuracy, grayscale processing is performed on the screenshot image to obtain a grayscale image, which is the aforementioned image to be recognized. Grayscale processing makes the pixels of the image simpler. Optionally, in the embodiments of the present application, because the pixels of the image background differ somewhat from those of the text — conventionally the pixel values of the background are lower than those of the text, although they may also be higher — the text can be extracted from the image by distinguishing pixel values, preventing the background from interfering with the extraction of text from the image. A grayscale image also prevents other color components of the image from interfering with text extraction, especially when the text in the image is colored: without color components, the calculation can be based on luminance information alone, avoiding calculations and matching over multiple color components, which greatly reduces the amount of computation during text entry and speeds up the text calculation.
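As a concrete illustration of the grayscale step, the sketch below converts a tiny RGB "image" (a list of pixel rows) to gray values using the common BT.601 luminance weights; the weights and the toy image are assumptions for illustration, not values given in the patent:

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to integer gray values
    using the BT.601 luma weights 0.299 / 0.587 / 0.114."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

# A 1x3 toy image: black background, a red text pixel, white background.
image = [[(0, 0, 0), (255, 0, 0), (255, 255, 255)]]
gray = to_grayscale(image)
assert gray == [[0, 76, 255]]
```

After this step, text and background are distinguished by a single luminance channel rather than three color components.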
Optionally, in one of the embodiments, as shown in Fig. 2, extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises: S310, S320.
S310: extracting the text in the image to be recognized and in the first sub-images respectively according to the deep convolutional neural network algorithm;
S320: obtaining the text from the text extracted from the image to be recognized and the first sub-images through the attention mechanism.
To improve the accuracy of text recognition, the local and global features of the text can be extracted from the whole image to be recognized based on the deep convolutional neural network algorithm; fusing the local features and global features of the text through the attention mechanism yields more accurate text and improves the accuracy of text recognition. To achieve this, in the present application, the image to be recognized is divided into several regions to obtain the first sub-images, one sub-image per region. The sub-images are fed into the convolutional neural network algorithm to obtain the local features of the text, and the image to be recognized is fed into the convolutional neural network algorithm to obtain the global features of the text; both local and global features describe the strokes of the characters. The local and global features are then fused through the attention mechanism to obtain the global features again (the text in the image to be recognized). It should be noted that the size of the first sub-images can be determined by dividing the original image arbitrarily, based on its size.
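The division of the image to be recognized into first sub-images can be sketched as simple non-overlapping tiling; the tile size and the toy 4x4 grid below are assumptions for illustration:

```python
def split_into_subimages(image, tile_h, tile_w):
    """Divide a 2-D grid of gray values into non-overlapping
    tile_h x tile_w tiles, row-major: one 'first sub-image' per region."""
    h, w = len(image), len(image[0])
    tiles = []
    for top in range(0, h, tile_h):
        for left in range(0, w, tile_w):
            tiles.append([row[left:left + tile_w]
                          for row in image[top:top + tile_h]])
    return tiles

# A 4x4 grayscale "image" split into four 2x2 first sub-images.
image = [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]]
tiles = split_into_subimages(image, 2, 2)
assert len(tiles) == 4
assert tiles[0] == [[0, 1], [4, 5]]
assert tiles[3] == [[10, 11], [14, 15]]
```

Each tile would then be fed to the network for local features, while the full grid supplies the global features.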
Optionally, after dividing the image to be recognized into several regions to obtain the first sub-images, the method comprises:
judging whether a first sub-image contains a gray value lying within a preset gray-value threshold range;
deleting the first sub-image when it contains no gray value lying within the preset gray-value threshold range.
To reduce the amount of computation in text recognition, the first sub-images that contain no character features can be deleted, avoiding computation on those images, increasing the computation rate, and shortening the computation time. The gray values of a first sub-image can be compared against the preset gray-value threshold range: when the sub-image contains no gray value lying within the preset range, there are no character features in that sub-image, and to reduce the amount of computation the sub-image can be deleted. Optionally, the preset gray threshold range can be determined from the current image. For example, after the image has been converted to grayscale, the gray values of the text are close to one another and contiguous, with the spacing between text gray values not exceeding 5; gray values that are contiguous with spacing less than 5 can then be judged to be gray values of the text, and that contiguous range of gray values can be taken as the preset gray threshold range. When the spacing between a gray value and the aforementioned contiguous values is greater than 5, that gray value can be judged to belong to the background.
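The blank-tile pruning described above can be sketched as follows; the text gray range standing in for the preset threshold and the toy tiles are assumptions for illustration:

```python
def keep_text_tiles(tiles, lo, hi):
    """Keep only tiles that contain at least one gray value inside the
    preset text range [lo, hi]; tiles with no such value are deleted."""
    return [t for t in tiles
            if any(lo <= v <= hi for row in t for v in row)]

# Assume text gray values fall in [70, 90]; background is near 0 or 255.
text_tile  = [[255, 82], [255, 255]]   # one pixel in the text range
blank_tile = [[255, 255], [0, 255]]    # background only
kept = keep_text_tiles([text_tile, blank_tile], 70, 90)
assert kept == [text_tile]
```

Only the surviving tiles would be passed on to the recognition network, which is what shortens the computation.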
Optionally, extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises:
piecing together two first sub-images from adjacent regions into one image to obtain a second sub-image;
extracting the text in the image to be recognized, the first sub-images, and the second sub-image respectively according to the deep convolutional neural network algorithm;
obtaining the text from the text extracted from the image to be recognized, the first sub-images, and the second sub-image through the attention mechanism.
As described above, in order to improve the accuracy of text entry and make it convenient for the user to operate on the text, two adjacent first sub-images can also be pieced together into one sub-image to obtain a second sub-image, thereby ensuring the continuity of character features when the convolutional neural network algorithm extracts them. The first sub-images and the second sub-image are therefore fed into the deep convolutional neural network algorithm to obtain the local features of the text, and the image to be recognized is fed into the deep convolutional neural network algorithm to obtain the global features of the text; both local and global features describe the strokes of the characters. The local and global features are then fused through the attention mechanism to obtain the global features again (the text in the image to be recognized).
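Piecing two adjacent first sub-images into a second sub-image — so that strokes cut by a tile boundary stay contiguous — can be sketched like this (horizontal adjacency and the toy tiles are assumptions for illustration):

```python
def stitch_horizontally(left_tile, right_tile):
    """Piece two horizontally adjacent tiles into one 'second sub-image'
    by concatenating their rows."""
    assert len(left_tile) == len(right_tile), "tiles must share a height"
    return [l + r for l, r in zip(left_tile, right_tile)]

left  = [[0, 1], [4, 5]]
right = [[2, 3], [6, 7]]
second = stitch_horizontally(left, right)
assert second == [[0, 1, 2, 3], [4, 5, 6, 7]]
```

A character straddling the boundary between `left` and `right` appears whole in `second`, which is the continuity the second sub-image is meant to preserve.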
Optionally, sending the text to the user in editable form comprises:
comparing the text with the characters in a font library to find the font-library characters closest to the text;
sending the font-library characters to the user in editable form.
On the above basis, after the text has been determined, in order to ensure its accuracy, the extracted text can also be compared with the characters in a font library to find the characters closest to it; if a 100% match is found, that character can be sent to the user, and if the match found is below 100%, the closest character is sent to the user. The global features can also be fed into a softmax classifier, and the character corresponding to the largest resulting probability value determined to be the character recognized from the image to be recognized. The softmax classifier is preset with a character pool, which may include 4500 commonly used Chinese characters, the 10 digits 0-9, the 26 lowercase letters a-z, and the 26 uppercase letters A-Z; of course, the number of commonly used Chinese characters can be larger. To reduce the number of character comparisons, i.e., the amount of computation in the comparison, attributes can first be removed from the text extracted from the image to be recognized. For example, when the extracted text is in a slanted Chinese running style, in order to improve the accuracy of the comparison and allow the text to be compared as far as possible against the characters of a single font library, the slant can be removed by replacing the character attribute with an upright Song typeface; when comparing the text, the comparison then only needs to be made against the characters of the Song-typeface font library, reducing the amount of computation in the comparison.
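The softmax classification step — mapping the fused global features to the most probable character in the preset pool — can be sketched with plain scores; the scores and the three-character pool below are assumptions for illustration, not the patent's 4500-character pool:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, char_pool):
    """Pick the pool character with the largest softmax probability."""
    probs = softmax(scores)
    return char_pool[probs.index(max(probs))], probs

# Toy pool of three candidate characters with illustrative scores.
char, probs = classify([0.5, 2.1, -1.0], ["A", "B", "C"])
assert char == "B"
assert abs(sum(probs) - 1.0) < 1e-9
```

In the method, the scores would come from the network's output over the full character pool; the argmax character is what is sent to the user in editable form.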
Optionally, in combination with the foregoing, in order to widen the range of text conversion when the picture contains emoticons, the conversion can be performed based on a preset association between emoticons and text, through which the text corresponding to an emoticon is found. Therefore, extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises: if an emoticon is recognized in the image to be recognized, obtaining the text associated with the emoticon according to the preset association between emoticons and text. Further, to make it convenient for the user to operate on the emoticon, the text to be sent can be replaced with an editable emoticon, completing the transformation from a screenshot emoticon to an editable emoticon. Emoticons and text are not in one-to-one correspondence; one text may correspond to several emoticons. A smiling face, for example, may have many expressions, so that the user can switch to another expression to send to the other party. Therefore, sending the text to the user in editable form comprises: if the text was obtained by recognizing an emoticon, obtaining an emoticon of the text in editable form according to the preset association between emoticons and text, and sending the editable emoticon to the user. The association between emoticons and text associates several emoticons with one text, and the emoticon corresponding to the text is determined by the user's operation at the terminal; obtaining the emoticon of the text in editable form comprises: displaying the several emoticons associated with the text and obtaining one of them according to the user's input.
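The one-to-many association between a text and its emoticons, with the user picking one, can be sketched as a plain mapping; the mapping entries are illustrative assumptions, not a table from the patent:

```python
# Preset association: one text may map to several emoticons.
EMOTICONS = {
    "smile": [":-)", ":)", "^_^"],
    "sad":   [":-(", ":("],
}

def editable_emoticons(text):
    """Return the editable emoticons associated with a recognized text."""
    return EMOTICONS.get(text, [])

def choose_emoticon(text, user_index):
    """Pick one of the associated emoticons according to user input."""
    options = editable_emoticons(text)
    return options[user_index]

assert editable_emoticons("smile") == [":-)", ":)", "^_^"]
assert choose_emoticon("smile", 2) == "^_^"
assert editable_emoticons("unknown") == []
```

The terminal would display all options for the recognized text and send back the one the user selects.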
An embodiment of the present invention also provides a character recognition device which, in one embodiment, as shown in Fig. 3, comprises: a screen capture module 100, a filtering module 200, and a sending module 300.
The screen capture module 100 is configured to, in response to a screenshot request, obtain the screenshot region corresponding to the request and generate a screenshot image according to the screenshot region;
the filtering module 200 is configured to filter the screenshot image to obtain an image to be recognized, and divide the image to be recognized into several regions to obtain first sub-images;
the sending module 300 is configured to extract the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model, and send the text to the user in editable form.
Further, as shown in figure 3, a kind of character recognition device provided in the embodiment of the present invention further include: grayscale image Obtaining unit 210 carries out gray proces to the screenshot image, obtains grayscale image, and the grayscale image is the images to be recognized. First Word Input unit 310, for according to the depth convolutional neural networks algorithm extract respectively the images to be recognized and Text in first subgraph;First attention mechanism computing unit 320, for will be from the images to be recognized and described The text extracted in first subgraph obtains the text by attention mechanism.Judging unit 220, for judging described With the presence or absence of the gray value being located in default gray value threshold value in one subgraph;Unit 230 is deleted, for working as first subgraph There is no when the gray value being located in default gray value threshold value as in, first subgraph is deleted.Second subgraph Obtaining unit 340, for piecing together the two of adjacent area first subgraphs for an image, the second subgraph of acquisition;The Two Word Input units 350, for extracting the images to be recognized, described respectively according to the depth convolutional neural networks algorithm Text in first subgraph, second subgraph;Second attention mechanism computing unit 360, for will be from described wait know Other image, first subgraph, the text extracted in second subgraph obtain the text by attention mechanism. 
A comparison unit 370, configured to compare the recognized text with the text in a typeface matrix library and find the typeface text closest to the recognized text; a text transmission unit 380, configured to send the typeface text to the user in editable form; and an emoticon recognition unit 400, configured to, when the deep convolutional neural network based on the attention model identifies that the image to be recognized contains an emoticon, obtain the text associated with the emoticon according to a preset association between emoticons and text.
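A minimal sketch of the two post-processing steps (units 370/380 and unit 400). The library contents, the emoticon mapping and the matching strategy are invented for illustration; the patent does not define them.

```python
import difflib

# Hypothetical "typeface matrix library": canonical editable strings the
# recognizer's raw output is matched against (unit 370).
TYPEFACE_LIBRARY = ["meeting at 10am", "please reply", "see attachment"]

def closest_typeface_text(recognized: str) -> str:
    """Find the library entry closest to the recognized text."""
    matches = difflib.get_close_matches(
        recognized, TYPEFACE_LIBRARY, n=1, cutoff=0.0)
    return matches[0]

# Hypothetical preset association between emoticons and text (unit 400).
EMOTICON_TEXT = {":)": "smile", ":(": "sad", ":D": "laugh"}

def emoticon_to_text(symbol: str) -> str:
    """Return the associated text, or the symbol itself if unknown."""
    return EMOTICON_TEXT.get(symbol, symbol)

best = closest_typeface_text("meeting at 1Oam")  # OCR confused 0 with O
label = emoticon_to_text(":)")
```

The fuzzy match recovers the intended library entry despite the character-level OCR error, which is the role the comparison unit plays before the editable text is sent to the user.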
The character recognition device provided in this embodiment of the present invention can implement the embodiments of the character recognition method described above; for the specific functions, refer to the explanations in the method embodiments, and details are not repeated here.
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the character recognition method of any of the above technical solutions is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards or optical cards. That is, a storage device includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), and may be a read-only memory, a magnetic disk, an optical disk, or the like.
The computer-readable storage medium provided in this embodiment of the present invention can implement the embodiments of the character recognition method described above. In this application, the text in a user's screenshot is recognized so that the user can conveniently perform further operations on it, such as pasting and copying. In application scenarios of text recognition, especially office work, this reduces the frequency of manual transcription, improves the speed and correctness of text conversion, and further improves the user's working efficiency. The character recognition method provided in the embodiments of this application comprises: in response to a screenshot request, obtaining the screenshot area corresponding to the screenshot request, and generating a screenshot image according to the screenshot area; filtering the screenshot image to obtain an image to be recognized, and dividing the image to be recognized into several regions to obtain first sub-images; and extracting the text in each first sub-image with a deep convolutional neural network algorithm based on an attention model, and sending the text to the user in editable form.

To facilitate office work, the text in a user's screenshot of an e-mail or chat tool is converted into editable text, so that the user can edit it or perform other operations on it and can describe problems clearly. Therefore, in the embodiments of this application, the screenshot request may be sent by clicking an icon, after which the user selects the screenshot area to be recognized; it may also be sent through a keyboard shortcut, for example by pressing one or more fixed keys of a mobile terminal at the same time, or by a screenshot key on a computer. Depending on the application, the screenshot area may contain images in a variety of formats: it may cover a web page area, a video area, a PowerPoint (PPT) presentation, a chat page, or a document that could otherwise only be obtained by downloading. The screenshot area may also take different forms as the application requires, for example a screen area, a visible page area or a complete page area. After the screenshot request is sent, a selection frame may be drawn on the page, and the user can manually adjust its size so that the screenshot area meets the user's needs. In addition, in order to capture more content, when the user drags an edge of the screenshot selection frame into a specified region close to the border of the terminal interface and keeps moving it toward the border, the current page is zoomed so that the screenshot area can hold more content. To make the text in the page easier to recognize, the zoom factor may be specified by the developers, determined from the resolution of the interface, or determined from the zoom level at the moment the move operation is cancelled.

On this basis, a screenshot image is generated from the screenshot area and then filtered to remove attribute information of the image: for example, the image is denoised, converted to grayscale, and some or all of its pixel values are adjusted, yielding the image to be recognized. This makes the data corresponding to the image simpler and prevents noise or pixel differences from interfering with text recognition. The image to be recognized is then uploaded to the server. The server receives the image to be recognized and extracts the text in it using a deep convolutional neural network algorithm, for example optical character recognition (OCR) or a deep convolutional neural algorithm. The text obtained by this recognition is editable text, which the server sends to the user (that is, the recognized text is sent to the user in editable form), so that the user can conveniently perform further operations on it. Recognizing text with a deep convolutional neural algorithm makes it possible to recognize text in screenshots of various forms and lets the user conveniently obtain the text content of many kinds of screenshots; this not only increases the speed of text recognition but also improves its accuracy.

In addition, the present invention also provides a server. As shown in Fig. 4, the server in another embodiment includes devices such as a processor 503, a memory 505, an input unit 507 and a display unit 509. Those skilled in the art will understand that the structure shown in Fig. 4 does not limit all servers, which may include more or fewer components than illustrated, or combine certain components. The memory 505 may be used to store the application program 501 and the functional modules; the processor 503 runs the application program 501 stored in the memory 505, thereby executing the various functional applications and data processing of the device. The memory 505 may be internal memory or external memory, or include both internal and external memory. Internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory or random access memory. External memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, and so on. The memory disclosed in the present invention includes, but is not limited to, these types; the memory 505 disclosed in the present invention serves only as an example and not as a limitation.
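The filtering step described above (denoising plus gray processing before upload) can be sketched as follows; the kernel size and the equal-weight grayscale conversion are illustrative assumptions rather than the patent's prescribed values.

```python
import numpy as np

def box_denoise(gray: np.ndarray) -> np.ndarray:
    """Simple 3x3 mean filter: each pixel becomes the average of its
    neighborhood, smoothing isolated noise before recognition."""
    padded = np.pad(gray, 1, mode="edge")
    out = np.zeros(gray.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return out / 9.0

def preprocess(screenshot_rgb: np.ndarray) -> np.ndarray:
    """Filter the screenshot image into the image to be recognized."""
    gray = screenshot_rgb.mean(axis=2)   # gray processing
    return box_denoise(gray)             # noise removal

flat = np.full((16, 16, 3), 128.0)       # a uniform screenshot
clean = preprocess(flat)
```

A uniform input passes through unchanged, while isolated noisy pixels are averaged toward their neighbors, which is the property the filtering step relies on to keep pixel differences from disturbing recognition.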
The input unit 507 is used to receive input signals and text entered by the user, and may include a touch panel and other input devices. The touch panel collects the client's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connected device according to a preset program; the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and power keys), a trackball, a mouse, a joystick, and the like. The display unit 509 may be used to display information entered by the client, information provided to the client, and the various menus of the computer device, and may take the form of a liquid crystal display, an organic light-emitting diode display, or the like. The processor 503 is the control center of the computer device: it connects the various parts of the entire computer through various interfaces and lines, and performs the various functions and processes data by running or executing the software programs and/or modules stored in the memory 505 and calling the data stored in the memory. The one or more processors 503 shown in Fig. 4 can execute and realize the functions of the screen capture module 100, the filtering module 200 and the sending module 300 shown in Fig. 3, as well as the functions of the grayscale image obtaining unit 210, the first text extraction unit 310, the first attention mechanism computing unit 320, the judging unit 220, the deleting unit 230, the second sub-image obtaining unit 340, the second text extraction unit 350, the second attention mechanism computing unit 360, the comparison unit 370, the text transmission unit 380 and the emoticon recognition unit 400.
In one embodiment, the server includes one or more processors 503, a memory 505, and one or more application programs 501, wherein the one or more application programs 501 are stored in the memory 505 and configured to be executed by the one or more processors 503, and the one or more application programs 501 are configured to carry out the character recognition method described in the above embodiments.
The server provided in this embodiment of the present invention can likewise implement the embodiments of the character recognition method described above. As with the storage-medium embodiment, recognizing the text in a user's screenshot allows the user to perform further operations on it, such as pasting and copying; in application scenarios of text recognition, especially office work, this reduces the frequency of manual transcription, improves the speed and correctness of text conversion, and further improves the user's working efficiency. The steps of the method as carried out by the server are the same as those described above: responding to the screenshot request and generating the screenshot image from the selected area (by icon click, keyboard shortcut or screenshot key, over web pages, video areas, PPT presentations, chat pages, download-only documents, screen areas, visible page areas or complete page areas, with an adjustable selection frame and page zooming to capture more content); filtering the screenshot image (denoising, gray processing and pixel-value adjustment) to obtain the image to be recognized and uploading it to the server; and extracting the text with the deep convolutional neural network algorithm, for example optical character recognition (OCR) or a deep convolutional neural algorithm, and sending the recognized text to the user in editable form. Recognizing text with a deep convolutional neural algorithm makes it possible to recognize text in screenshots of various forms, which not only increases the speed of text recognition but also improves its accuracy.
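The attention-mechanism merging of claims 3 and 5 (combining text hypotheses extracted from the full image, the first sub-images and the second sub-images) can be sketched as a softmax-weighted vote over per-source character scores. The scores, vocabulary and relevance logits below are invented for illustration; the patent does not disclose the network's internals.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_merge(source_scores: np.ndarray, relevance: np.ndarray) -> int:
    """source_scores: (num_sources, vocab_size) character scores from the
    full image and the sub-images; relevance: (num_sources,) attention
    logits. Returns the index of the character chosen after weighting."""
    weights = softmax(relevance)      # attention weights, sum to 1
    merged = weights @ source_scores  # weighted combination of sources
    return int(np.argmax(merged))

VOCAB = ["a", "b", "c"]
scores = np.array([[0.1, 0.7, 0.2],    # full image favors "b"
                   [0.2, 0.6, 0.2],    # first sub-image favors "b"
                   [0.8, 0.1, 0.1]])   # noisy sub-image favors "a"
relevance = np.array([2.0, 1.5, -1.0]) # attention trusts the first two
char = VOCAB[attention_merge(scores, relevance)]
```

Because the noisy sub-image receives a small attention weight, its vote for "a" is outweighed and the merged prediction is "b", which is the behavior the attention mechanism is meant to provide when some sub-images are unreliable.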
The server provided in this embodiment of the present invention can implement the embodiments of the character recognition method described above; for the specific functions, refer to the explanations in the method embodiments, and details are not repeated here.
The above are only some embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A character recognition method, comprising:
in response to a screenshot request, obtaining the screenshot area corresponding to the screenshot request, and generating a screenshot image according to the screenshot area;
filtering the screenshot image to obtain an image to be recognized, and dividing the image to be recognized into several regions to obtain first sub-images; and
extracting the text in each first sub-image with a deep convolutional neural network algorithm based on an attention model, and sending the text to the user in editable form.
2. The character recognition method according to claim 1, wherein filtering the screenshot image to obtain the image to be recognized comprises:
performing gray processing on the screenshot image to obtain a grayscale image, the grayscale image being the image to be recognized.
3. The character recognition method according to claim 1, wherein extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises:
extracting the text in the image to be recognized and in the first sub-images respectively according to the deep convolutional neural network algorithm; and
passing the text extracted from the image to be recognized and the first sub-images through an attention mechanism to obtain the text.
4. The character recognition method according to claim 3, wherein, after dividing the image to be recognized into several regions to obtain the first sub-images, the method comprises:
judging whether a first sub-image contains a gray value within a preset gray-value threshold range; and
when the first sub-image contains no gray value within the preset gray-value threshold range, deleting the first sub-image.
5. The character recognition method according to claim 4, wherein extracting the text in each first sub-image with the deep convolutional neural network algorithm based on the attention model comprises:
piecing two first sub-images of adjacent regions together into one image to obtain a second sub-image;
extracting the text in the image to be recognized, the first sub-images and the second sub-image respectively according to the deep convolutional neural network algorithm; and
passing the text extracted from the image to be recognized, the first sub-images and the second sub-image through the attention mechanism to obtain the text.
6. The character recognition method according to any one of claims 1 to 5, wherein sending the text to the user in editable form comprises:
comparing the text with the text in a typeface matrix library, and finding the typeface text closest to the text; and
sending the typeface text to the user in editable form.
7. The character recognition method according to any one of claims 1 to 5, wherein extracting the text in the image to be recognized based on the deep convolutional neural network algorithm and the attention mechanism comprises:
if it is identified, based on the deep convolutional neural network algorithm and the attention mechanism, that the image to be recognized contains an emoticon, obtaining the text associated with the emoticon according to a preset association between emoticons and text.
8. A character recognition device, comprising:
a screen capture module, configured to, in response to a screenshot request, obtain the screenshot area corresponding to the screenshot request and generate a screenshot image according to the screenshot area;
a filtering module, configured to filter the screenshot image to obtain an image to be recognized, and to divide the image to be recognized into several regions to obtain first sub-images; and
a sending module, configured to extract the text in each first sub-image with a deep convolutional neural network algorithm based on an attention model, and to send the text to the user in editable form.
9. A computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the character recognition method according to any one of claims 1 to 7 is implemented.
10. A server, comprising:
one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to carry out the steps of the character recognition method according to any one of claims 1 to 7.
CN201811347763.8A 2018-11-13 2018-11-13 Character recognition method, device and storage medium, server Pending CN109726712A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811347763.8A CN109726712A (en) 2018-11-13 2018-11-13 Character recognition method, device and storage medium, server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811347763.8A CN109726712A (en) 2018-11-13 2018-11-13 Character recognition method, device and storage medium, server

Publications (1)

Publication Number Publication Date
CN109726712A true CN109726712A (en) 2019-05-07

Family

ID=66295794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811347763.8A Pending CN109726712A (en) 2018-11-13 2018-11-13 Character recognition method, device and storage medium, server

Country Status (1)

Country Link
CN (1) CN109726712A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291644A (en) * 2020-01-20 2020-06-16 北京百度网讯科技有限公司 Method and apparatus for processing information
CN111506381A (en) * 2020-04-23 2020-08-07 广东小天才科技有限公司 Method and device for assisting user in information input in live broadcast teaching
CN111523292A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Method and device for acquiring image information
CN111597966A (en) * 2020-05-13 2020-08-28 北京达佳互联信息技术有限公司 Expression image recognition method, device and system
CN111666940A (en) * 2020-06-05 2020-09-15 厦门美图之家科技有限公司 Chat screenshot content processing method and device, electronic equipment and readable storage medium
WO2020232872A1 (en) * 2019-05-22 2020-11-26 平安科技(深圳)有限公司 Table recognition method and apparatus, computer device, and storage medium
CN112053410A (en) * 2020-08-24 2020-12-08 海南太美航空股份有限公司 Image processing method and system based on vector graphics drawing and electronic equipment
CN112101395A (en) * 2019-06-18 2020-12-18 上海高德威智能交通系统有限公司 Image identification method and device
CN112199143A (en) * 2020-09-30 2021-01-08 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
US20210117073A1 (en) * 2019-10-17 2021-04-22 Samsung Electronics Co., Ltd. Electronic device and method for operating screen capturing by electronic device
CN112825123A (en) * 2019-11-20 2021-05-21 北京沃东天骏信息技术有限公司 Character recognition method, system and storage medium
CN114202756A (en) * 2021-12-13 2022-03-18 广东魅视科技股份有限公司 Method, device and readable medium for cross-network-segment data transmission

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183294A (en) * 2007-12-17 2008-05-21 腾讯科技(深圳)有限公司 Expression input method and apparatus
CN102946592A (en) * 2012-11-21 2013-02-27 广东欧珀移动通信有限公司 Method and system for mobile terminal to send and receive messages
CN103809766A (en) * 2012-11-06 2014-05-21 夏普株式会社 Method and electronic device for converting characters into emotion icons
CN104598902A (en) * 2015-01-29 2015-05-06 百度在线网络技术(北京)有限公司 Method and device for identifying screenshot and browser
CN107368831A (en) * 2017-07-19 2017-11-21 中国人民解放军国防科学技术大学 English words and digit recognition method in a kind of natural scene image
CN108154145A (en) * 2018-01-24 2018-06-12 北京地平线机器人技术研发有限公司 The method and apparatus for detecting the position of the text in natural scene image
CN108229463A (en) * 2018-02-07 2018-06-29 众安信息技术服务有限公司 Character recognition method based on image
WO2018157862A1 (en) * 2017-03-02 2018-09-07 腾讯科技(深圳)有限公司 Vehicle type recognition method and device, storage medium and electronic device


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020232872A1 (en) * 2019-05-22 2020-11-26 平安科技(深圳)有限公司 Table recognition method and apparatus, computer device, and storage medium
CN112101395A (en) * 2019-06-18 2020-12-18 上海高德威智能交通系统有限公司 Image identification method and device
US20210117073A1 (en) * 2019-10-17 2021-04-22 Samsung Electronics Co., Ltd. Electronic device and method for operating screen capturing by electronic device
US11842039B2 (en) * 2019-10-17 2023-12-12 Samsung Electronics Co., Ltd. Electronic device and method for operating screen capturing by electronic device
CN112825123A (en) * 2019-11-20 2021-05-21 北京沃东天骏信息技术有限公司 Character recognition method, system and storage medium
CN111291644B (en) * 2020-01-20 2023-04-18 北京百度网讯科技有限公司 Method and apparatus for processing information
CN111291644A (en) * 2020-01-20 2020-06-16 北京百度网讯科技有限公司 Method and apparatus for processing information
CN111506381B (en) * 2020-04-23 2024-05-21 广东小天才科技有限公司 Method and device for assisting user in information input in live broadcast teaching
CN111523292B (en) * 2020-04-23 2023-09-15 北京百度网讯科技有限公司 Method and device for acquiring image information
CN111523292A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Method and device for acquiring image information
CN111506381A (en) * 2020-04-23 2020-08-07 广东小天才科技有限公司 Method and device for assisting user in information input in live broadcast teaching
CN111597966A (en) * 2020-05-13 2020-08-28 北京达佳互联信息技术有限公司 Expression image recognition method, device and system
CN111597966B (en) * 2020-05-13 2023-10-10 北京达佳互联信息技术有限公司 Expression image recognition method, device and system
CN111666940A (en) * 2020-06-05 2020-09-15 厦门美图之家科技有限公司 Chat screenshot content processing method and device, electronic equipment and readable storage medium
CN111666940B (en) * 2020-06-05 2024-01-16 厦门美图之家科技有限公司 Chat screenshot content processing method and device, electronic equipment and readable storage medium
CN112053410A (en) * 2020-08-24 2020-12-08 海南太美航空股份有限公司 Image processing method and system based on vector graphics drawing and electronic equipment
CN112199143A (en) * 2020-09-30 2021-01-08 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
CN114202756A (en) * 2021-12-13 2022-03-18 广东魅视科技股份有限公司 Method, device and readable medium for cross-network-segment data transmission

Similar Documents

Publication Publication Date Title
CN109726712A (en) Character recognition method, device and storage medium, server
EP3437019B1 (en) Optical character recognition in structured documents
Kavasidis et al. An innovative web-based collaborative platform for video annotation
JP6549806B1 (en) Segment the content displayed on the computing device into regions based on the pixel of the screenshot image that captures the content
TWI629644B (en) Non-transitory computer readable storage medium, methods and systems for detecting and recognizing text from images
CN110020411B (en) Image-text content generation method and equipment
CN111967302B (en) Video tag generation method and device and electronic equipment
CN106973244A (en) Using it is Weakly supervised for image match somebody with somebody captions
JP7059508B2 (en) Video time series motion detection methods, devices, electronic devices, programs and storage media
JPWO2014132349A1 (en) Image analysis apparatus, image analysis system, and image analysis method
CN111209897B (en) Video processing method, device and storage medium
CN109033261B (en) Image processing method, image processing apparatus, image processing device, and storage medium
CN112507090B (en) Method, apparatus, device and storage medium for outputting information
WO2022089170A1 (en) Caption area identification method and apparatus, and device and storage medium
CN111488732B (en) Method, system and related equipment for detecting deformed keywords
CN111027419B (en) Method, device, equipment and medium for detecting video irrelevant content
JP7242994B2 (en) Video event identification method, apparatus, electronic device and storage medium
CN106537387B (en) Retrieval/storage image associated with event
CN111753744A (en) Method, device and equipment for classifying bill images and readable storage medium
CN111582477A (en) Training method and device of neural network model
CN111309200A (en) Method, device, equipment and storage medium for determining extended reading content
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
CN114245232A (en) Video abstract generation method and device, storage medium and electronic equipment
EP4336379A1 (en) Tracking concepts within content in content management systems and adaptive learning systems
JP2009295097A (en) Information classification device, information classification method, information processing program, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination