CN112101356A - Method and device for positioning specific text in picture and storage medium

Info

Publication number
CN112101356A
CN112101356A (application CN202011035795.1A)
Authority
CN
China
Prior art keywords
picture
specific text
training sample
text
value vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011035795.1A
Other languages
Chinese (zh)
Inventor
熊博颖 (Xiong Boying)
郑邦东 (Zheng Bangdong)
张晓丹 (Zhang Xiaodan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202011035795.1A
Publication of CN112101356A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition

Abstract

The embodiments of this specification provide a method, an apparatus, and a storage medium for positioning specific text in a picture. The method comprises: acquiring a preset number of training samples, each training sample being a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector from each picture in the training samples; calculating classification conditions for specific text regions and non-specific text regions in the training samples based on the extracted feature value vector of each picture; and positioning the specific text in a picture to be positioned based on the classification conditions, thereby improving the accuracy with which specific text is positioned.

Description

Method and device for positioning specific text in picture and storage medium
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a method, an apparatus, and a storage medium for positioning specific text in a picture.
Background
In the daily business of banking there are large numbers of handwritten or machine-printed documents, such as credit card applications and foreign exchange transaction applications. During business processing these paper documents are scanned into images and transmitted to a back-office centralized operation center for data entry. Because the volume of business is large and many fields must be entered, OCR recognition is added to this process to replace manual entry.
Current mainstream OCR recognition technology generally proceeds in two steps. First, the text regions to be recognized are located, that is, the text lines to be recognized are found. Second, the text line images in those regions are cropped out and a recognition model recognizes the text content; the minimum unit of OCR recognition is a text line.
Text regions to be recognized are currently located mainly by template definition. In this approach, a set of template parameters is defined for each type of document voucher; the parameters include the name and coordinate values of each field to be recognized, and during recognition the field slices are cropped directly from the picture according to those coordinates.
Because the layouts of different bank document vouchers differ, and because of factors such as writing and printing offsets, the field positions cropped by template definition are often inaccurate: a slice may contain only part of the content or include other irrelevant information, which greatly reduces recognition accuracy.
Disclosure of Invention
An object of the embodiments of this specification is to provide a method, an apparatus, and a storage medium for positioning specific text in a picture, so as to improve the accuracy with which specific text is positioned.
In order to solve the above problem, an embodiment of this specification provides a method for positioning specific text in a picture, the method comprising: acquiring a preset number of training samples, each training sample being a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector of each picture in the training samples; calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture; and positioning the specific text in a picture to be positioned based on the classification condition.
In order to solve the above problem, an embodiment of this specification further provides an apparatus for positioning specific text in a picture, the apparatus comprising: an acquisition module, used for acquiring a preset number of training samples, each training sample being a picture in which the specific text regions have been labeled with coordinates; an extraction module, used for extracting the feature value vector of each picture in the training samples; a calculation module, used for calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture; and a positioning module, used for positioning the specific text in a picture to be positioned based on the classification condition.
In order to solve the above problem, an embodiment of this specification further provides an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement: acquiring a preset number of training samples, each training sample being a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector of each picture in the training samples; calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture; and positioning the specific text in a picture to be positioned based on the classification condition.
In order to solve the above problem, embodiments of this specification further provide a computer-readable storage medium having computer instructions stored thereon which, when executed, implement: acquiring a preset number of training samples, each training sample being a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector of each picture in the training samples; calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture; and positioning the specific text in a picture to be positioned based on the classification condition.
According to the technical solutions provided by the embodiments of this specification, a preset number of training samples can be acquired, each being a picture in which the specific text regions have been labeled with coordinates; a feature value vector is extracted from each picture in the training samples; a classification condition of specific text regions in the training samples is calculated from the extracted feature value vectors; and the specific text in a picture to be positioned is positioned based on the classification condition. The method can locate the coordinates of the specific text to be recognized directly, in a single pass of text detection, so it runs fast and efficiently; it does not need to deduce the position of the specific text by recognizing the fixed marks of a printing template, and it positions the specific text more accurately.
Drawings
In order to more clearly illustrate the embodiments of this specification or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in this specification, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic view of a bill according to an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a method for locating a specific text in a picture according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating a specific area in a picture according to an embodiment of the present disclosure;
fig. 4 is a functional structure diagram of an electronic device according to an embodiment of the present disclosure;
fig. 5 is a functional structure diagram of a device for locating a specific text in a picture according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of this specification will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the protection scope of this specification.
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text using character recognition methods. For printed characters, the characters on a paper document are converted optically into an image file of a black-and-white dot matrix, and recognition software converts the characters in the image into a text format for further editing by word-processing software.
Large enterprises, institutions, banks, hospitals, insurers, and similar organizations handle massive numbers of bills whose information must be collected, entered, and electronically archived. Recognizing these bills with OCR technology to complete information collection can greatly improve collection efficiency. However, not all information in a bill needs to be collected. A bill generally includes fixed printed content, which every copy of the bill contains, and handwritten or printed content, which is filled in by a user or generated from information the user provides. The handwritten or printed part is usually the information that needs to be collected. As shown in fig. 1, a deposit receipt of China Construction Bank typically includes fixed printed content such as "account", "account name", and "currency", together with handwritten or printed content, such as text handwritten by the user or printed from user-provided information, in the blank portions following "account", "account name", "currency", and so on. Therefore, when recognizing bills with OCR technology, the text regions to be recognized must first be located, that is, the text lines to be recognized must be found; then the text line images in those regions are cropped and a recognition model recognizes the text content, the minimum unit of OCR recognition being a text line.
Text regions to be recognized are currently located mainly by template definition: a set of template parameters is defined for each type of document voucher, including the name and coordinate values of each field to be recognized, and during recognition the field slices are cropped directly from the picture according to those coordinates. However, because the layouts of different bank document vouchers differ, and because of factors such as writing and printing offsets, the field positions cropped by template definition are often inaccurate (for example, a slice contains only part of the content or includes other irrelevant information), which greatly reduces recognition accuracy.
With the continuing development and application of machine learning, text detection based on deep learning has appeared for locating the text regions to be recognized. Deep-learning text detection mainly trains a model with a deep neural network: image text features are learned by a convolutional neural network, and text lines are located using candidate boxes or pixel-level segmentation. Deep learning is a research direction within machine learning that learns the intrinsic regularities and representation levels of sample data through neural networks, combining low-level features into more abstract high-level attribute categories or features so as to discover distributed feature representations of the data; its ultimate goal is to give machines human-like abilities to analyze and learn, and to recognize data such as text, images, and sound. A convolutional neural network is a feedforward neural network with a deep structure whose computation includes convolution; it is one of the representative algorithms of deep learning, has feature-learning ability, and can perform translation-invariant classification of input information according to its hierarchical structure.
Although mainstream deep-learning text detection models can detect all text in a picture, they cannot distinguish which parts of a document are fixed printed content and which are handwritten or printed field information. All located text must therefore be recognized by a general recognizer, and the fields to be extracted must then be deduced from the fixed content and positional relationships on the document. Recognition errors can thus cause deduction errors, and recognizing all text on the picture also harms overall efficiency.
The inventors found that if the coordinates of the specific text are labeled in a large number of pictures, a classification condition separating specific text regions from non-specific text regions can be obtained from the feature information of the labeled text and the unlabeled text in those pictures, and the specific text in a picture to be recognized can then be located according to that classification condition. This is expected to avoid the prior-art problems that slices contain other irrelevant information and that the fixed printed content of a document voucher cannot be distinguished from the handwritten or printed specific text, thereby improving the accuracy of positioning specific text. The embodiments of this specification therefore provide a method for positioning specific text in a picture.
Please refer to fig. 2. An embodiment of this specification provides a method for positioning specific text in a picture. The main body performing the method may be an electronic device having logical operation functions, for example a server. The server may be an electronic device with a certain computing capacity; it may have a network communication unit, a processor, a memory, and so on. The server is not limited to a physical device and may also be software running on an electronic device. It may further be a distributed server, that is, a system of multiple processors, memories, network communication modules, and other components operating in coordination, or a server cluster formed by several servers. The method may include the following steps.
S210: acquiring a preset number of training samples; the training sample is a picture with coordinates labeled on a specific text area in the picture.
In some embodiments, the picture may be an electronic file, such as a picture file in pdf, jpg, or png format. The picture may also be obtained by scanning a paper document with a scanner, a digital camera, or a similar device. The paper documents may be bills, newspapers, books, manuscripts, and other printed matter.
In some embodiments, the specific text may be all or part of the text in the picture. Suppose the picture is a bill-type document; not all the information in a bill needs to be collected. A bill generally includes fixed printed content, which every copy of the bill contains, and handwritten or printed content, which is filled in by a user or generated from information the user provides. The fixed printed content needs no special collection, whereas the handwritten or printed part is what must be collected, entered, and electronically archived. Therefore, for a bill-type picture, the specific text may be the text of the handwritten or printed part. As shown in fig. 1, a deposit receipt of China Construction Bank typically includes fixed printed content such as "account", "account name", and "currency", together with handwritten or printed content in the blank portions following them; the specific text may be the content of those blank portions.
For pictures of newspapers, books, and the like, the specific text may be the title content. Of course, the text of any part of the picture may be chosen as the specific text according to the picture type or the user's requirements; the embodiments of this specification do not limit this.
In some embodiments, a preset number of pictures may be obtained in advance, and the specific text regions in each picture may be labeled with coordinates, that is, the coordinates of the specific text regions in the picture are annotated. Specifically, as shown in fig. 3, each specific text region in a picture may be approximated by a rectangular box, so the coordinates of the four vertices of the box can represent the coordinates of the region. Of course, a specific text region is not limited to being represented by the vertex coordinates of a rectangle; for ease of labeling it may also be represented by the vertex coordinates of another polygon. The embodiments of this specification do not limit how the coordinates of specific text regions are labeled.
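By way of illustration, a coordinate-labeled training sample might be represented as in the following sketch. The class names, field names, and coordinates are assumptions made for illustration and are not prescribed by this specification.

    # Minimal sketch of one coordinate-labeled training sample: a picture path
    # plus its labeled specific text regions, each region given by the four
    # vertex coordinates of its bounding rectangle.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    Point = Tuple[int, int]  # (x, y) in pixel coordinates

    @dataclass
    class SpecificTextRegion:
        vertices: Tuple[Point, Point, Point, Point]  # four corners of the box
        category: str = ""  # optional category label, used later in the text

    @dataclass
    class TrainingSample:
        picture_path: str
        regions: List[SpecificTextRegion] = field(default_factory=list)

    # One deposit-receipt picture with two labeled regions (hypothetical values).
    sample = TrainingSample(
        picture_path="samples/slip_0001.jpg",
        regions=[
            SpecificTextRegion(vertices=((120, 40), (300, 40), (300, 70), (120, 70)),
                               category="account"),
            SpecificTextRegion(vertices=((120, 90), (260, 90), (260, 120), (120, 120)),
                               category="account name"),
        ],
    )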
In some embodiments, the acquired pictures are of the same type, for example all bills, all newspapers, or all books.
In some embodiments, a preset number of pictures whose specific text regions have been labeled with coordinates may be used as the training samples. The server may obtain them as follows. A user may import the training samples directly into the server; for example, the server may provide an interactive interface in which the user imports a preset number of training samples, which the server then receives. Alternatively, the user may import the training samples into a client, which receives them and sends them to the server; for example, the client may provide an interactive interface in which the user imports the samples, and the client then forwards them to the server. The client may be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, and may communicate with the server over a wired and/or wireless network. Of course, the server may also obtain the preset number of training samples in other ways; the embodiments of this specification do not limit how the server obtains them.
S220: and extracting the characteristic value vector of the picture in the training sample.
In some embodiments, the feature value vector comprises a vector of pixel values. A picture is generally composed of a number of pixels. For example, a picture shown on a display is produced by energizing light-emitting elements on the screen, each capable of showing a different color; the combination of many such elements reproduces the displayed picture. When the picture is displayed at native resolution, each display element corresponds to one pixel of the picture.
If the picture is a color picture, the color displayed by each light-emitting element depends on the RGB values of the corresponding pixel. The RGB color scheme is an industry color standard in which various colors are obtained by varying and superimposing the three color channels red (R), green (G), and blue (B); RGB stands for the three channels red, green, and blue, also known as the three primary colors. This scheme covers almost all colors that human vision can perceive and is one of the most widely used color systems. In one common RGB standard, the amount of each of R, G, and B is represented by a decimal number between 0 and 255 (corresponding to binary 00000000 to 11111111). In another common RGB standard for web pages, the RGB value of a pixel is written as a 6-digit hexadecimal number, such as #000000. Those skilled in the art will appreciate that the two representations correspond one to one: three 0-255 decimal channel values convert to one 6-digit hexadecimal number. In either standard, the red (R), green (G), and blue (B) components mix to give the final display color of the pixel.
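The one-to-one correspondence between the two representations can be checked with a short sketch (the function names are illustrative):

    # Sketch of the decimal <-> hexadecimal correspondence described above:
    # three 0-255 channel values map one-to-one to a 6-digit hex code #RRGGBB.
    def rgb_to_hex(r: int, g: int, b: int) -> str:
        return "#{:02X}{:02X}{:02X}".format(r, g, b)

    def hex_to_rgb(code: str):
        code = code.lstrip("#")
        return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

    assert rgb_to_hex(255, 128, 0) == "#FF8000"
    assert hex_to_rgb("#FF8000") == (255, 128, 0)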
In some embodiments, the RGB values of each pixel in the picture may be extracted and sorted in a fixed order, for example by pixel number, so that a feature value vector of the picture can be constructed (see the sketch after the next paragraph).
If the picture is a grayscale picture, the gray level displayed by each light-emitting element depends on the RGB values of the corresponding pixel, and the RGB values of such gray pixels follow a regular pattern: in one common standard, the R, G, and B values are equal. Under this standard the gray scale divides into 256 levels, representing the color depth of points in a black-and-white image. Gray values may also be represented in other ways, for example by one byte of data, in which case the byte value has a fixed correspondence with the RGB values. Thus, extracting the gray value of each pixel and sorting the extracted values by pixel number likewise yields a feature value vector of the picture.
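For both the color and the grayscale case, the extraction just described can be sketched as follows; the use of the Pillow and NumPy libraries is an assumption of this sketch, and this specification does not prescribe any particular library.

    # Sketch of feature value vector extraction: read the picture, walk its
    # pixels in row-major (pixel-number) order, and flatten the per-pixel RGB
    # or gray values into one vector.
    import numpy as np
    from PIL import Image

    def picture_to_feature_vector(path: str, grayscale: bool = False) -> np.ndarray:
        mode = "L" if grayscale else "RGB"  # "L" = 8-bit gray, "RGB" = color
        arr = np.asarray(Image.open(path).convert(mode), dtype=np.float32)
        return arr.reshape(-1)  # flatten in pixel order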
In some embodiments, the above processing may be performed for each picture in the training sample, that is, a feature value vector of each picture is extracted.
In some embodiments, different pictures may have different resolutions; for example, one picture may be 300 × 160 and another 320 × 150. Therefore, to improve the accuracy of feature vector extraction, the pictures in the training samples may be set to the same resolution. Specifically, the resolution of the first picture in the training samples may be taken as the reference, and all subsequent pictures set to match it; for example, if the first picture is 300 × 160, all the other pictures in the training samples may be set to 300 × 160. Alternatively, a resolution may be preset and all pictures in the training samples set to it; for example, with a preset resolution of 320 × 150, every picture in the training samples may be set to 320 × 150.
In some embodiments, after the pictures in the training samples have been set to the same resolution, their feature value vectors are extracted.
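The normalization step might look like the following sketch (the helper name is illustrative; 320 × 150 is the example preset from the text):

    # Sketch of setting every training picture to one preset resolution
    # before extracting its feature value vector.
    from PIL import Image

    PRESET_SIZE = (320, 150)  # (width, height), the example preset above

    def normalize_resolution(path: str, size=PRESET_SIZE) -> Image.Image:
        return Image.open(path).convert("RGB").resize(size)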
S230: and calculating the classification condition of a specific text region in the training sample based on the extracted feature value vector of each picture in the training sample.
Specific text regions occupy characteristic positions in the pictures: the feature vectors of specific text regions, whether in the same picture or in different pictures, are similar to one another, while the feature vectors of specific text regions and non-specific text regions differ. Therefore, a classification condition for specific text regions in the training samples can be calculated based on the extracted feature value vector of each picture.
In some embodiments, a deep learning algorithm may be employed to calculate the classification condition of specific text regions in the training samples. Specifically, a classification model may be constructed with such an algorithm, for example a linear regression algorithm, a logistic regression algorithm, a decision tree algorithm, or a support vector machine algorithm. Of course, any other algorithm may be used to construct the classification model; the embodiments of this specification do not limit this.
In some embodiments, the classification model may be trained with the feature value vectors of the pictures in the training samples as input, thereby calculating the classification condition of specific text regions in the training samples; one possible setup is sketched below.
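The sketch trains a support vector machine, one of the algorithms named above, with scikit-learn. The framing of the training pairs (feature vectors cut from inside labeled specific text rectangles as positives, vectors from elsewhere as negatives) is an assumption of the sketch; the text does not fix this detail.

    # Hedged sketch: learn the classification condition with an SVM.
    # How the positive/negative vectors are assembled is assumed, not
    # prescribed by this specification.
    import numpy as np
    from sklearn.svm import SVC

    def train_classification_condition(vectors: np.ndarray,
                                       labels: np.ndarray) -> SVC:
        """vectors: (n_samples, n_features); labels: 1 = specific text, 0 = not."""
        model = SVC(kernel="rbf")
        model.fit(vectors, labels)
        return model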
S240: and positioning the specific text in the picture to be positioned based on the classification condition.
In some embodiments, the classification condition determines which regions of the picture are specific text regions, and the position coordinates of those regions in the picture are then determined. Specifically, positioning the specific text in the picture to be positioned based on the classification condition may include the following steps.
S241: and extracting a feature value vector of the picture to be positioned.
Specifically, for extracting the feature value vector of the picture to be positioned, refer to S220.
S242: and calculating a classification value of the feature value vector of the picture to be positioned based on the classification condition.
In some embodiments, a classification value of the feature value vector of the picture to be positioned may be calculated according to the classification condition. The feature value vectors of different regions of the picture differ; in particular, specific text regions differ from other regions, so the classification condition yields a classification value that points to the region in which the specific text lies. The classification value may be a coordinate of a specific text region in the picture to be recognized; for example, it may take the form (a, b), where a is the abscissa and b the ordinate of the region. Since the picture may contain one or more specific text regions, the classification value may be one coordinate or several. The classification value may also be any value that characterizes the coordinates of a specific text region, for example a number or a letter. Specifically, the picture to be recognized may be divided into several regions, each with a corresponding number or letter; for example, it may be divided into 3 regions represented by the numbers 1, 2, and 3 respectively, and the classification value may then be one or more of 1, 2, and 3.
S243: and determining the coordinates of a specific text region in the picture to be positioned according to the classification value.
In some embodiments, whichever representation the classification value uses, the coordinates of the specific text regions in the picture to be positioned can be determined from it, for example as sketched below.
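Steps S241 to S243 might be strung together as in the following sketch. Feeding the classifier precomputed candidate boxes with their feature vectors is an assumption of the sketch; the text only requires that the classification value point at the regions containing the specific text.

    # Sketch of S241-S243: score candidate regions of the picture to be
    # positioned with the trained condition and keep those whose
    # classification value marks them as specific text.
    def locate_specific_text(region_proposals, model):
        """region_proposals: iterable of ((x0, y0, x1, y1), feature_vector)."""
        located = []
        for box, vec in region_proposals:
            classification_value = model.predict(vec.reshape(1, -1))[0]
            if classification_value == 1:  # value points at a specific text region
                located.append(box)
        return located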
In some embodiments, the method may further include performing OCR recognition on a specific text region in the picture to be positioned according to the coordinates of that region, and converting the specific text in the region into a preset text format for output.
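That follow-on step could look like the sketch below; pytesseract stands in here for whichever OCR recognition model is actually used, which is an assumption of this sketch.

    # Sketch: crop the located region by its coordinates and hand the slice
    # to an OCR engine, returning the recognized text.
    from PIL import Image
    import pytesseract

    def ocr_region(picture_path: str, box) -> str:
        """box: (left, upper, right, lower) coordinates of a specific text region."""
        region = Image.open(picture_path).crop(box)
        return pytesseract.image_to_string(region, lang="chi_sim").strip()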
In some embodiments, the picture may include several different specific text regions, each holding different content. For example, a deposit receipt of China Construction Bank typically includes fixed printed content such as "account", "account name", and "currency", together with handwritten or printed content in the blank portions following them. That handwritten or printed content is the specific text, and specific text of types such as account information, account name information, and currency information lies in different regions. In some scenarios, however, not all the specific text needs to be collected; only one or several kinds may be required. Therefore, to improve collection efficiency, the pictures in the training samples may also be labeled with the category information of the specific text. Correspondingly, the method further comprises: calculating a category classification condition of the specific text in the training samples based on the extracted feature value vector of each picture; and positioning the specific text in the picture to be positioned based on the classification condition and the category classification condition, and determining the category of the specific text.
In some embodiments, the category information may characterize the content information type of the specific text, for example name, account number, and amount. Of course, the category information may also include other types, such as currency or date; the embodiments of this specification do not limit it.
In some embodiments, the feature vectors of specific text regions of the same category are similar, so the category classification condition of the specific text in the training samples can be calculated based on the extracted feature value vector of each picture.
In some embodiments, a deep learning algorithm may be employed to calculate the category classification condition of the specific text in the training samples. Specifically, a classification model may be constructed with such an algorithm, for example a linear regression algorithm, a logistic regression algorithm, a decision tree algorithm, or a support vector machine algorithm; the embodiments of this specification do not limit the choice.
In some embodiments, this classification model may be trained with the feature value vectors of the pictures in the training samples as input, and it calculates the category classification condition of the specific text in the training samples from those vectors.
In some embodiments, the category of the specific text may be determined from the category classification value of the feature vector of the specific text region in the picture to be recognized, according to the category classification condition. The category classification value characterizes the category of the specific text; for example, it may be a number, with 1 representing the name category and 2 the account number category. Of course, it may also be a letter, a combination of letters and numbers, or any other representation; this specification does not limit it.
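For instance, the mapping from category classification values back to content information types might look like this; the concrete table is illustrative only.

    # Illustrative mapping from numeric category classification values to
    # the content information types they stand for.
    CATEGORY_BY_VALUE = {1: "name", 2: "account number", 3: "amount"}

    def category_of(classification_value: int) -> str:
        return CATEGORY_BY_VALUE.get(classification_value, "unknown")

    assert category_of(2) == "account number"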
In some embodiments, the method may further comprise: acquiring coordinates of a specific text area of a preset category in the picture to be positioned; and performing OCR recognition on the specific text area of the preset category according to the coordinates, and converting the specific text in the specific text area into a preset text format for output.
The method provided by the embodiments of this specification can acquire a preset number of training samples, each being a picture in which the specific text regions have been labeled with coordinates; extract a feature value vector from each picture in the training samples; calculate a classification condition of specific text regions in the training samples from the extracted feature value vectors; and position the specific text in a picture to be positioned based on the classification condition. The method locates the coordinates of the specific text to be recognized directly, in a single pass of text detection, so it runs fast and efficiently; it does not need to deduce the position of the specific text by recognizing the fixed marks of a printing template, and it positions the specific text more accurately.
Fig. 4 is a functional structure diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device may include a memory and a processor.
In some embodiments, the memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the method for positioning specific text in a picture by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the user terminal. In addition, the memory may include high-speed random access memory and non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on; a general-purpose processor may be a microprocessor or any conventional processor. The processor may execute the computer instructions to perform the steps of: acquiring a preset number of training samples, each training sample being a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector of each picture in the training samples; calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture; and positioning the specific text in the picture to be positioned based on the classification condition.
In the embodiments of the present description, the functions and effects specifically realized by the electronic device may be explained in comparison with other embodiments, and are not described herein again.
Fig. 5 is a functional structure diagram of a device for locating a specific text in a picture according to an embodiment of the present disclosure, and the device may specifically include the following structural modules.
An obtaining module 510, configured to obtain a preset number of training samples; the training sample is a picture in which a specific text area in the picture is subjected to coordinate labeling;
an extracting module 520, configured to extract a feature value vector of a picture in the training sample;
a calculating module 530, configured to calculate a classification condition of a specific text region in the training sample based on the extracted feature value vector of each picture in the training sample;
and the positioning module 540 is used for positioning the specific text in the picture to be positioned based on the classification condition.
The embodiments of this specification further provide a computer-readable storage medium for the method of positioning specific text in a picture. The storage medium stores computer program instructions that, when executed, implement: acquiring a preset number of training samples, each training sample being a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector of each picture in the training samples; calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture; and positioning the specific text in the picture to be positioned based on the classification condition.
In the embodiments of this specification, the storage medium includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), a cache, a Hard Disk Drive (HDD), or a memory card. The memory may be used to store the computer programs and/or modules, and may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the user terminal. The memory may also include high-speed random access memory and non-volatile memory. The functions and effects realized by the program instructions stored in the computer-readable storage medium may be explained by comparison with other embodiments and are not described here again.
It should be noted that the embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and device embodiments are substantially similar to the method embodiment, so their description is relatively brief; for relevant points, refer to the description of the method embodiment.
After reading this specification, those skilled in the art will appreciate that any combination of some or all of the embodiments set forth herein, achievable without inventive effort, falls within the scope of disclosure and protection of this specification.
In the 1990s, an improvement in a technology could be clearly distinguished as an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers nearly always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Hence it cannot be said that an improvement of a method flow cannot be realized with hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field-Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, this programming is now mostly done with "logic compiler" software rather than by making integrated circuit chips by hand; such software is similar to the software compilers used in program development, and the source code to be compiled must be written in a particular programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained simply by slightly programming the method flow into an integrated circuit using the above hardware description languages.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
From the above description of the embodiments, it is clear to those skilled in the art that this specification can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of this specification, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions causing a computer device (a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or parts of the embodiments, of this specification.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
While this specification has been described through examples, those skilled in the art will appreciate that there are numerous variations and permutations of it that do not depart from its spirit, and it is intended that the appended claims cover such variations and modifications.

Claims (15)

1. A method for locating a specific text in a picture, the method comprising:
acquiring a preset number of training samples; each training sample is a picture in which the specific text regions have been labeled with coordinates;
extracting a feature value vector of each picture in the training samples;
calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture in the training samples;
and positioning the specific text in a picture to be positioned based on the classification condition.
2. The method of claim 1, wherein the specific text comprises at least one of handwritten text filled in by a user and printed text generated according to information provided by the user.
3. The method of claim 1, wherein the vector of eigenvalues comprises a vector of pixel values.
4. The method according to claim 1, wherein, before the feature value vectors of the pictures in the training samples are extracted, the pixels of the pictures in the acquired training samples are set to preset pixels.
5. The method according to claim 1, wherein the calculating of the classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture in the training samples comprises:
calculating the classification condition of specific text regions in the training samples through a deep learning algorithm based on the extracted feature value vector of each picture in the training samples.
6. The method of claim 5, wherein the deep learning algorithm comprises at least one of a linear regression algorithm, a logistic regression algorithm, a decision tree algorithm, and a support vector machine algorithm.
7. The method according to claim 1, wherein the positioning the specific text in the picture to be positioned based on the classification condition comprises:
extracting a feature value vector of the picture to be positioned;
calculating a classification value of the feature value vector of the picture to be positioned based on the classification condition;
and determining the coordinates of a specific text area in the picture to be positioned according to the classification value.
8. The method of claim 1, further comprising:
and performing OCR recognition on the specific text region according to the coordinate of the specific text region in the picture to be positioned, and converting the specific text in the specific text region into a preset text format for output.
9. The method according to claim 1, wherein the pictures in the training samples are further labeled with category information of the specific text;
correspondingly, the method further comprises the following steps:
calculating a category classification condition of the specific text in the training samples based on the extracted feature value vector of each picture in the training samples;
and positioning the specific text in the picture to be positioned based on the classification condition and the category classification condition, and determining the category of the specific text.
10. The method of claim 9, wherein the category information characterizes a content information type of a particular text.
11. The method of claim 10, wherein the category information includes at least one of a name, an account number, and an amount.
12. The method of claim 9, further comprising:
acquiring coordinates of a specific text area of a preset category in the picture to be positioned;
and performing OCR recognition on the specific text area of the preset category according to the coordinates, and converting the specific text in the specific text area into a preset text format for output.
13. An apparatus for locating a specific text in a picture, the apparatus comprising:
the acquisition module is used for acquiring a preset number of training samples; each training sample is a picture in which the specific text regions have been labeled with coordinates;
the extraction module is used for extracting the feature value vector of each picture in the training samples;
the calculation module is used for calculating the classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture in the training samples;
and the positioning module is used for positioning the specific text in the picture to be positioned based on the classification condition.
14. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement: acquiring a preset number of training samples; each training sample is a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector of each picture in the training samples; calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture in the training samples; and positioning the specific text in a picture to be positioned based on the classification condition.
15. A computer-readable storage medium having computer instructions stored thereon which, when executed, implement: acquiring a preset number of training samples; each training sample is a picture in which the specific text regions have been labeled with coordinates; extracting a feature value vector of each picture in the training samples; calculating a classification condition of specific text regions in the training samples based on the extracted feature value vector of each picture in the training samples; and positioning the specific text in a picture to be positioned based on the classification condition.
CN202011035795.1A 2020-09-27 2020-09-27 Method and device for positioning specific text in picture and storage medium Pending CN112101356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011035795.1A CN112101356A (en) 2020-09-27 2020-09-27 Method and device for positioning specific text in picture and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011035795.1A CN112101356A (en) 2020-09-27 2020-09-27 Method and device for positioning specific text in picture and storage medium

Publications (1)

Publication Number Publication Date
CN112101356A 2020-12-18

Family

ID=73782711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011035795.1A Pending CN112101356A (en) 2020-09-27 2020-09-27 Method and device for positioning specific text in picture and storage medium

Country Status (1)

Country Link
CN (1) CN112101356A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868768A (en) * 2015-01-20 2016-08-17 阿里巴巴集团控股有限公司 Method and system for recognizing whether picture carries specific marker
KR20190095651A * 2018-02-07 2019-08-16 Samsung SDS Co., Ltd. Apparatus for generating training data for character learning and method thereof
CN109308476A (en) * 2018-09-06 2019-02-05 邬国锐 Billing information processing method, system and computer readable storage medium
CN110766014A (en) * 2018-09-06 2020-02-07 邬国锐 Bill information positioning method, system and computer readable storage medium
CN109635627A (en) * 2018-10-23 2019-04-16 中国平安财产保险股份有限公司 Pictorial information extracting method, device, computer equipment and storage medium
CN110443270A (en) * 2019-06-18 2019-11-12 平安科技(深圳)有限公司 Chart localization method, device, computer equipment and computer readable storage medium
CN111079632A (en) * 2019-12-12 2020-04-28 上海眼控科技股份有限公司 Training method and device of text detection model, computer equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569834A (en) * 2021-08-05 2021-10-29 五八同城信息技术有限公司 Business license identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10846553B2 (en) Recognizing typewritten and handwritten characters using end-to-end deep learning
US8732570B2 (en) Non-symbolic data system for the automated completion of forms
US10572725B1 (en) Form image field extraction
US9552516B2 (en) Document information extraction using geometric models
US8958644B2 (en) Creating tables with handwriting images, symbolic representations and media images from forms
US8442319B2 (en) System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
CN110866495A (en) Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
TW565803B (en) System and method for accurately recognizing text font in a document processing system
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
US20210064859A1 (en) Image processing system, image processing method, and storage medium
US11379690B2 (en) System to extract information from documents
US8792730B2 (en) Classification and standardization of field images associated with a field in a form
CN111797886A (en) Generating OCR training data for neural networks by parsing PDL files
US20220222284A1 (en) System and method for automated information extraction from scanned documents
US20190005325A1 (en) Identification of emphasized text in electronic documents
Nagy Disruptive developments in document recognition
CN112241727A (en) Multi-ticket identification method and system and readable storage medium
US10095677B1 (en) Detection of layouts in electronic documents
US10586133B2 (en) System and method for processing character images and transforming font within a document
CN112101356A (en) Method and device for positioning specific text in picture and storage medium
CN112036330A (en) Text recognition method, text recognition device and readable storage medium
CN108090728A (en) A kind of express delivery information input method and input system based on intelligent terminal
Rimas et al. Optical character recognition for Sinhala language
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
Lin et al. Multilingual corpus construction based on printed and handwritten character separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination