CN113205046A

CN113205046A - Method, system, device and medium for identifying question book

Info

Publication number: CN113205046A
Application number: CN202110485611.XA
Authority: CN
Inventors: 郭子滔; 匡柘溪; 王岩
Original assignee: Zuoyebang Education Technology Beijing Co Ltd
Current assignee: Beijing Baige Feichi Technology Co ltd
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2021-08-03
Anticipated expiration: 2041-04-30
Also published as: CN113205046B

Abstract

The invention relates to the technical field of image recognition, in particular to the recognition of incomplete images that contain content, and provides a method, a system, a device and a medium for identifying a question book. It addresses the technical problem of matching complete resources from an image of an incomplete page: existing question book image recognition depends on the conditions under which the image is acquired, so recognition efficiency and accuracy are unsatisfactory, and recognition may be wrong or impossible when the image containing the content is incomplete. To this end, the method of the invention analyzes the text information obtained from an input question book image that contains an incomplete page, performs retrieval and determines text distinguishing points, and then processes the retrieval result according to the text distinguishing points to obtain the recognition result. As a result, retrieval and recognition by content remain accurate and efficient whether or not the question book image is complete, and the incomplete page image is matched with the complete resource without depending on the image acquisition conditions.

Description

Method, system, device and medium for identifying question book

Technical Field

The invention belongs to the technical field of image recognition, and particularly relates to a method, a system, a device and a medium for identifying a question book, which are particularly suitable for recognizing images that combine content with a background image.

Background

In the prior art, identifying a question book depends on complete and comprehensive image acquisition (shooting, placement and the like) or on complete image information; only then can accurate information be extracted, the corresponding question book determined, and an identification result given. Once the question book image is defective, the accuracy of the identified page cannot be guaranteed, so the question book is identified inaccurately, and identification may even be wrong or fail altogether.

Therefore, in order to solve the above problems and improve the user experience, the present application is proposed. The problems to be solved include at least how to match complete resources according to incomplete page information.

Disclosure of Invention

(I) Technical problem to be solved

The invention aims to solve the technical problem of how to determine the corresponding question book content according to incomplete page information, and thereby match complete resources according to the incomplete page information; furthermore, through retrieval and analysis, which question book the user is using can be judged more accurately from an acquired image of a whole question book page or of a question book page with context.

(II) Technical solution

In order to solve the above technical problem, a first aspect of the present invention provides a method for identifying a question book, including: acquiring text information of a question book image; performing retrieval and determining text distinguishing points according to the text information; and processing the retrieval result based on the text distinguishing points to obtain an identification result corresponding to the question book image.

According to a preferred embodiment of the present invention, acquiring the text information of the question book image specifically includes: obtaining an input image of a question book to be identified; and performing OCR recognition on the question book image to obtain the text information of the question book image; wherein the image is an incomplete image or a complete image.

According to a preferred embodiment of the present invention, performing OCR recognition on the question book image specifically includes: locating each line of text and recognizing the content of each line of text based on a convolutional neural network; and concatenating the recognized content of each line of text according to the layout order of the text lines to obtain the text information recognized by OCR; wherein the content of the text information includes one or more of the following: text, characters, graphics, background.

According to a preferred embodiment of the present invention, before acquiring the text information of the question book image, the method further includes: determining in advance the approximate position of the questions in the question book image, so as to eliminate interference information in the question book image that does not belong to the questions.

According to a preferred embodiment of the present invention, performing retrieval and determining text distinguishing points according to the text information specifically includes: analyzing the text information to obtain keywords; retrieving according to the keywords to obtain the text information of each resource question book having the same keywords; analyzing the characters common to the text information and the text information of each resource question book and extracting the unique features of the common characters to determine one or more text distinguishing point pairs; and selecting, from the one or more text distinguishing point pairs, the most representative pair and using its text distinguishing points as the text distinguishing points for processing the retrieval result; wherein the unique features include one or more of the following: text content, image pixels around the text, and combined information of text content and/or image information around the text.

According to a preferred embodiment of the present invention, selecting from the one or more text distinguishing point pairs at least includes: selecting the text distinguishing point pair whose surrounding pixel gradient is the largest and/or whose text is in a special position.

According to a preferred embodiment of the present invention, processing the retrieval result based on the text distinguishing points to obtain an identification result corresponding to the question book image specifically includes: sorting the retrieval results based on feature information of the longest common substring and/or the text distinguishing points, and determining a final retrieval result candidate set; and outputting the final retrieval result candidate set to the user as the identification result of the question book.

According to a preferred embodiment of the present invention, the method further includes: before the retrieval result is processed, determining whether the retrieval result is correct at least through overall text information matching and based on indicators of text distinguishing point matching probability and/or image pixel matching probability; or, before the final retrieval result candidate set is output, determining whether the results in the candidate set are correct at least through overall text information matching and based on indicators of text distinguishing point matching probability and/or image pixel matching probability.

In order to solve the above technical problem, a second aspect of the present invention provides an electronic device, including a processor and a memory, where the memory is used for storing a computer-executable program, and when the computer-executable program is executed by the processor, the processor performs the method for identifying a question book according to the first aspect.

To solve the above technical problem, a third aspect of the present invention provides a computer-readable medium storing a computer-executable program which, when executed, implements the method for identifying a question book according to the first aspect.

To solve the above technical problem, a fourth aspect of the present invention provides a question book identification system, including: an input processing module for acquiring text information of a question book image; a retrieval determining module for performing retrieval and determining text distinguishing points according to the text information; and an output processing module for processing the retrieval result based on the text distinguishing points to obtain the identification result corresponding to the question book image.

(III) Advantageous effects

The invention performs OCR recognition on any input question book image, including an image of an incomplete page, to obtain the corresponding text information; analyzes the text information to determine text distinguishing points; performs retrieval using keywords extracted from the text information; and processes and sorts the retrieval results using the text distinguishing points. This avoids the problem that existing question book identification depends too heavily on the features of the image, on how the image is acquired, and on whether the image is occluded or covered, which leads to failed or wrong identification. The invention is particularly suitable for question book images with defective page information: correct identification and matching of the complete resource can be ensured without relying heavily on image quality and integrity. This improves the fault tolerance of page information identification for any question book image and accommodates incomplete page information such as inverted or covered images. For images that combine content with a background image, identification efficiency and convenience are improved, and identification accuracy can be further improved.

During conversion, the invention can locate and recognize text through a convolutional neural network and, in the locating and recognizing process, clean the image based on the approximate position/area/range of the question text so as to exclude (that is, not recognize) interfering content that is not the main content. The text information can thus be located and recognized more accurately, which in turn favors fast, accurate and efficient retrieval and more accurate results.

After the text information is analyzed and keywords are extracted to retrieve or index the corresponding question book results on the resource side, the character-related information shared between the input text information and the text information of each question book in the results is analyzed to obtain a plurality of text distinguishing point pairs. From these pairs, the one or more pairs that best reflect the distinguishing features can be further determined, and their text distinguishing points are used as the basis for processing the retrieved question book results. A text distinguishing point is the same text/character that appears at the same position in both the user input page and the resource page and that is strongly representative; the surrounding information, position information and the like of the text/character therefore assist in determining the text distinguishing point. By processing the retrieval results with the text distinguishing points to obtain a candidate set (for example, sorting the results and keeping a preset number of them), it can be determined more accurately which results, through their common text/characters (including their feature information), match the question book the user wants identified, and processing efficiency is further improved.

The invention can also optimize the retrieval result or the processed candidate set; for example, one or more indicators (similarity, various matching probabilities and the like) can be used to further eliminate wrong results and ensure the accuracy and correctness of the identification result.

Drawings

FIG. 1 is a main flow diagram of one embodiment of a question book identification method according to the present invention;

FIG. 2 is a block diagram of the main structure of one embodiment of a question book identification system according to the present invention;

FIG. 3 is a main block diagram of one embodiment of an electronic device according to the present invention;

FIG. 4 is a schematic diagram of the main structure of an embodiment of an electronic device according to the present invention;

FIG. 5 is a schematic diagram of the main structure of one embodiment of a computer-readable medium according to the present invention.

Detailed Description

In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that one skilled in the art may, in certain cases, practice the present invention in a manner that does not include the structures, properties, effects or other features described above.

The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, operations/steps in the flowcharts may be divided, operations/steps may be combined or partially combined, and the like, and the execution sequence shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.

The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different network and/or processing unit devices and/or microcontroller devices.

The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will also be understood that, although terms such as first, second and third may be used herein to describe various devices, elements, components or parts, these devices, elements, components or parts should not be limited by these terms; the terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Moreover, the term "and/or" is intended to include all combinations of any one or more of the listed items.

The term "question book" as used herein refers to various electronic or paper documents related to the content of test questions, including, but not limited to, online or offline exercise books, answer cards, answer sheets, and the like.

In order to solve the above technical problem, an embodiment of the method for identifying a question book provided by the present invention mainly includes: acquiring the text information of the question book from the input incomplete question book image; determining keywords from the analyzed text information for retrieval, and determining text distinguishing points; and processing the retrieval result on the basis of the text distinguishing points to identify the question book that actually corresponds to the question book image, so that an incomplete question book image can also be accurately matched with the complete resource.

[Example 1]

In order to make the objects, technical solutions and advantages of the present invention more apparent, an implementation of the method of the present invention is described in detail below with reference to the accompanying drawings.

The description follows the flow chart of the main steps of one embodiment of the method of the present invention shown in FIG. 1.

In step S110, text conversion is performed on an arbitrary question book image to obtain the text information of the question book.

In one embodiment, an image to be identified is first obtained, for example by capturing an image of the question book to be identified. The manner and source of image acquisition include, but are not limited to, the various known ways of acquiring images from a server or a terminal, over a remote network and/or locally; any of these can serve as the source of the user input image.

For example, the image may be shot live (pictures or video), may be a previously stored shot (e.g., retrieved from a gallery), may be delivered (e.g., downloaded) by a remote content server over the internet, may be uploaded offline, locally or remotely, and so forth. These images of the question book to be identified are referred to as question book images; if the input is a video (such as a recording), individual frames may be captured as images. A question book image may be in a complete or an incomplete state. The incomplete state includes, but is not limited to, defects such as the lower right corner of the page falling outside the photo or the page being photographed in the wrong orientation. In general, many users of question books are minors such as students, or users who urgently need an answer, so a captured question book image, for example a page shot in real time, may be incomplete in various ways: tilted, misplaced, inverted, missing some part of the page, partially occluded, blemished, and so on.

In one embodiment, text/character recognition may be performed on the image input by the user or to be identified; for example, each position/area/range and each portion of the image is converted into text that a computer can process. The conversion may use OCR recognition, for example, analyzing the image file to obtain the text and layout information.

The user may provide input through various terminals (mobile phone, tablet computer, mobile computer, desktop computer, other handheld terminal devices, etc.), for example by opening a client application (APP) installed on the terminal or a web page accessed through a link.

Specifically, image conversion includes, for example, text localization, i.e., finding the areas of the image where question text exists, and text recognition, i.e., recognizing the characters in those areas to obtain the content of each text area. Applicable techniques include, but are not limited to, end-to-end text recognition and scene text localization algorithms.

An example of a conversion is as follows: first, the position of each text line is located based on a convolutional neural network (CNN); then the content of each line of text is recognized based on a convolutional neural network (CNN); finally, the recognized lines are concatenated according to the layout order of the text lines. This completes the conversion of the question book image into characters, namely OCR conversion, and yields the text information, i.e., the characters corresponding to the question book image. The convolutional neural network and recurrent neural network may be any of several existing neural network models for image-to-text conversion, and models such as SVM may also be used. The specific application and conversion process are not described in detail here.
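The final concatenation step can be illustrated with a minimal sketch. It assumes that line localization and recognition have already produced text lines with bounding boxes; the `TextLine` structure, its fields and the half-line-height row tolerance are hypothetical choices made for illustration and are not specified by the invention.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """One recognized text line; all fields are illustrative assumptions."""
    text: str
    x: float       # left edge of the bounding box, in pixels
    y: float       # top edge of the bounding box, in pixels
    height: float  # height of the bounding box, in pixels


def join_lines_in_layout_order(lines):
    """Concatenate recognized lines top-to-bottom, then left-to-right."""
    if not lines:
        return ""
    # Assumed heuristic: lines whose tops differ by at most half the median
    # line height are treated as belonging to the same row.
    heights = sorted(line.height for line in lines)
    tolerance = heights[len(heights) // 2] / 2
    ordered = sorted(lines, key=lambda line: line.y)
    rows = [[ordered[0]]]
    for line in ordered[1:]:
        if abs(line.y - rows[-1][0].y) <= tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return "\n".join(
        " ".join(item.text for item in sorted(row, key=lambda item: item.x))
        for row in rows
    )
```

In practice the bounding boxes would come from whichever text detector is used; the sketch only shows how layout order can drive the concatenation.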

In one embodiment, the conversion may be optimized by first eliminating or cleaning interference information. This mainly includes: determining the relative position of the questions in the image, and removing the areas of the image other than the area corresponding to the questions according to that relative position, so as to eliminate interference information; and performing character recognition on the image after the interference information is eliminated to obtain the corresponding recognized text. For example, the approximate position of the question text in the question book image is determined in advance, so as to remove interference information that is not part of the questions (for example, marks made by the user, such as handwriting).

Taking a whole-page image of the question book to be identified as an example, the whole page image may be cleaned. Cleaning mainly consists of removing interference information from the content of a whole page image (or a frame image). Specifically, the approximate position of the question text within the whole page image (i.e., the image region corresponding to the converted text information) is roughly determined, and information outside this approximate position is not analyzed or recognized. That is, the approximate range or region in which the question text/characters to be converted are located is found in advance, by means such as image scanning, character features (background, size, stroke thickness, font, color and the like), or even preset conditions (for example, taking the upper-middle part of the acquired image as the position range), and conversion and recognition are performed only there. This improves processing efficiency and the accuracy of the recognized object.
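As a minimal sketch of the cleaning step, the following keeps only detected text boxes whose center falls inside a roughly determined question region and ignores everything else. The `Box` structure and the rectangular `region` tuple are hypothetical; the text above does not prescribe how the region is represented.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A detected text box; the fields are illustrative assumptions."""
    text: str
    x: float
    y: float
    width: float
    height: float


def keep_inside_region(boxes, region):
    """Keep boxes whose center lies inside region = (left, top, right, bottom)."""
    left, top, right, bottom = region
    kept = []
    for box in boxes:
        center_x = box.x + box.width / 2
        center_y = box.y + box.height / 2
        if left <= center_x <= right and top <= center_y <= bottom:
            kept.append(box)
    return kept
```

A preset condition such as "the upper-middle part of the image" would simply translate into fixed `region` coordinates relative to the image size.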

In one embodiment, obtaining the text information of the question book mainly includes, but is not limited to, obtaining the text information of the characters identified as question content in the image of the question book to be identified. The text information of the questions includes words, characters and the like; the text information obtained by OCR conversion includes, but is not limited to, the characters of the located area and various information related to those characters (image-related information such as pixels, characters, background, color, size, font and stroke thickness).

For example, the text information of a question (words and characters) may read: "At the beginning of October the price of beef rose 10% compared with the beginning of September, and at the beginning of November it fell 15% compared with the beginning of October. Compared with the beginning of September, did the price of beef rise or fall at the beginning of November? By how much?" The text information recognized by OCR includes, but is not limited to, the characters of the located area and various information related to those characters (image-related information such as pixels, characters, background, color, size, font and stroke thickness).

One application scenario is, for example: the text information is obtained by OCR recognition from the question book image input by the user.

Therefore, the text information in the image can be obtained directly, regardless of whether the image is complete or incomplete. That is, the present invention does not need to restrict how the question book image is captured or extracted, and in particular does not need to take image defects or flaws into account: identification can proceed as long as an image (picture, frame, etc.) with content and context information exists. Nor is it necessary to rely on page-specific conventions such as special structures, information, positions or symbols in order to obtain text information with the corresponding context support.

In step S120, the text information is analyzed, retrieval is performed, and text distinguishing points are determined.

In one embodiment, the obtained text information is analyzed to obtain the corresponding keywords and the characters serving as text distinguishing points. The question books (question book text information) stored in the resource library (database) are retrieved/indexed according to the keywords to obtain the corresponding question books, and the corresponding text distinguishing points in the input question book to be identified and in the retrieved question books are further analyzed, so that the accurate matching resource can subsequently be found using the specific text distinguishing points.

The keywords can be extracted from the text information by model prediction and/or probability statistics. Specifically, once the text information is obtained, the text is analyzed by model prediction and/or probability statistics, and the predicted or high-probability words are extracted as keywords. Retrieval with these keywords yields a series of question book resources containing the keywords. For example, usable keywords are selected by frequency statistics, model prediction (such as NER performed by a neural network model) and the like, and retrieval is then performed.
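The frequency-statistics option can be sketched as follows. This is only an illustration under stated assumptions: tokens are obtained with a simple regular expression suited to English text (Chinese text would need a word segmenter instead), and the `STOPWORDS` set and `top_k` parameter are hypothetical.

```python
import re
from collections import Counter

# Assumed stop word list, for illustration only.
STOPWORDS = {"the", "of", "in", "a", "is", "and", "or", "what"}


def extract_keywords(text, top_k=3):
    """Return the top_k most frequent non-stop-word tokens."""
    tokens = [
        token
        for token in re.findall(r"[a-z0-9%]+", text.lower())
        if token not in STOPWORDS
    ]
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:top_k]]
```

The returned keywords would then be used as the query against the resource library index.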

A text distinguishing point is a strongly representative character in a page.

A text distinguishing point pair is the same strongly representative character present in both the input page and the resource page.

For example, the same character appearing at the same position in the user input page (the image of the question book to be identified) and in a resource page (a stored image), and being strongly representative, can be used as a text distinguishing point; the text distinguishing point on the input page and the one on the resource page form a pair, called a text distinguishing point pair.

In one embodiment, a text distinguishing point pair can be determined by finding the characters that are common to the user input text information and the resource text and extracting their unique features. There may be one or more such pairs. For example, text distinguishing points are determined in the text information of the recognized question book page as follows: the characters common to the user input text (i.e., the recognized text information) and the text information of a resource question book page are found, and the unique features of those characters are extracted. The features include, but are not limited to, the character content, the image pixels around the character, and/or combinations of such information. Each common character found determines one text distinguishing point pair.
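A minimal sketch of pairing common characters is given below. It assumes each recognized character carries a page position normalized to [0, 1] and treats two identical characters as a pair when their positions differ by no more than a tolerance; the `CharPoint` structure, the normalization and the `tolerance` value are hypothetical and only illustrate the idea of "the same character at the same position".

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class CharPoint:
    """A recognized character with an assumed normalized page position."""
    char: str
    x: float  # horizontal position in [0, 1]
    y: float  # vertical position in [0, 1]


def find_point_pairs(input_chars, resource_chars, tolerance=0.02):
    """Pair identical characters that sit at nearly the same page position."""
    pairs = []
    for a in input_chars:
        for b in resource_chars:
            if a.char == b.char and hypot(a.x - b.x, a.y - b.y) <= tolerance:
                pairs.append((a, b))
                break  # keep at most one partner per input character
    return pairs
```

For an incomplete input page the positions would first have to be aligned with the resource page; that alignment is outside the scope of this sketch.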

Further, the text distinguishing points are analyzed to find the most representative ones. For example, the search or preliminary screening may produce hundreds of text distinguishing points, from which the characters that are most certain, most accurate and most representative, such as those with the most unique features, must be selected. Selection criteria include, but are not limited to, the pixel gradient around the character being the largest and/or the character's position being distinctive.

One application scenario is, for example: according to keywords extracted from the text information of a question book A input by the user, a batch of question books A1, A2, ..., An stored in the resource library (database) and having the same keywords is retrieved/indexed. The characters (including keywords) common to A and A1, A and A2, ..., A and An are found pairwise and their unique features are extracted, yielding hundreds of text distinguishing point pairs between A and A1, A2, ..., An. The most representative pair is then determined according to criteria such as the largest pixel gradient around the character and a distinctive position, and its text distinguishing point is used for processing the subsequent retrieval result.
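The "largest surrounding pixel gradient" criterion can be illustrated with a small sketch. It assumes each candidate comes with a grayscale patch of the pixels around the character, given as a list of rows, and measures the gradient as the mean absolute difference between adjacent pixels; this particular measure and the `(label, patch)` candidate format are assumptions, not the invention's stated formula.

```python
def mean_gradient(patch):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    total = 0
    count = 0
    rows = len(patch)
    cols = len(patch[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += abs(patch[r][c + 1] - patch[r][c])
                count += 1
            if r + 1 < rows:
                total += abs(patch[r + 1][c] - patch[r][c])
                count += 1
    return total / count if count else 0.0


def most_representative(candidates):
    """candidates: list of (label, patch); return the label with the largest gradient."""
    return max(candidates, key=lambda item: mean_gradient(item[1]))[0]
```

A position-based criterion (for example, favoring characters in a distinctive location on the page) could be combined with this score in the same way.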

Thus, the method analyzes the text information, extracts keywords to retrieve a series of resources containing them, and finds more accurate and representative text distinguishing points, which helps improve efficiency and accurately hit the corresponding resource (the identification result) in later steps. Whether or not the input image page is complete, related resources are found directly from the text information, and features for more precise identification, such as text distinguishing points, are then determined so that the complete resource can be matched.

In step S130, an identification result corresponding to the question book image is obtained according to the determined text distinguishing points and the retrieval result.

In one embodiment, after retrieval/indexing with the keywords from the text information yields the question books of the resource library (text information, each of which may correspond to a question book image), i.e., the retrieval result, the retrieval result may be processed based on the determined text distinguishing points to obtain an accurate identification result, i.e., the resource question book that matches the input question book image. In particular, for an incomplete input question book image, the matching complete resource question book can be determined from the text information alone, through retrieval and the text distinguishing points, without regard to the image defects, so that an incomplete image can still be accurately located in, or matched with, the complete resource.

An example is as follows: a machine learning model (such as a logistic regression model) sorts the indexed question books of the resource library based at least on feature information such as the longest common substring and/or the text distinguishing points. The ways of sorting with such feature information can follow common practice for the model and are not described here. The question books of the resource library corresponding to the one or more top-ranked text distinguishing point pairs are then selected as the final retrieval candidate set, which can be output to the user in order, so that multiple identification results are output in ranked order.
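The ranking step can be sketched as follows, using two features: the longest common substring length (normalized by the query length) and the number of text distinguishing point pairs. The logistic weights and bias are hypothetical placeholders; in a real system they would be learned from labeled data rather than fixed by hand.

```python
from math import exp


def longest_common_substring_length(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def logistic_score(lcs_ratio, pair_count, weights=(4.0, 0.5), bias=-3.0):
    """Logistic score from two features; weights and bias are assumed values."""
    z = weights[0] * lcs_ratio + weights[1] * pair_count + bias
    return 1.0 / (1.0 + exp(-z))


def rank_candidates(query, candidates):
    """candidates: list of (name, text, pair_count). Returns names, best first."""
    scored = []
    for name, text, pair_count in candidates:
        ratio = longest_common_substring_length(query, text) / max(len(query), 1)
        scored.append((logistic_score(ratio, pair_count), name))
    return [name for _, name in sorted(scored, key=lambda item: -item[0])]
```

Truncating the ranked list to a preset number of entries gives the final retrieval candidate set.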

In one embodiment, the correctness of the returned retrieval result may be checked before or after the output candidate set is determined, so that the final output candidate set is more accurate. Specifically, several indicators may be used to judge whether a returned result matches the question book input by the user. The indicators listed below may be used singly or in any combination of two or more, which is not limited here.

For example: 1. Matching of the overall text: the degree of matching between the retrieved question book and the input question book can be determined by similarity calculation, neural network prediction and the like. Common calculation and prediction methods may be used, such as Euclidean distance and Manhattan distance, or neural network models such as BP, RBF and PNN (probabilistic neural network), which are not described herein again. 2. Matching of the character distinguishing points: for example, if analysis shows that a plurality of mutually corresponding character distinguishing points, that is, character distinguishing point pairs, exist between a question book in the retrieval results or the candidate set and the input question book, the probability that the two are the same page increases. 3. Matching of the image pixels: if the two images can be matched in many places, the probability that they are the same page increases. In this way the retrieval results can be further screened to find one or more suitable, more accurately recognized question books. Here, the output of the judgment on whether a recognized question book is correct is a floating point number between 0 and 1, that is, a confidence level derived from the indexes; the closer it is to 1, the higher the confidence in the recognition result. Furthermore, the confidence level can be displayed directly to the user or output to the user by voice prompt and the like. Further, these pieces of information (the confidence level, the individual question books in the retrieval candidate set, etc.) may be integrated to obtain a final result.
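As a minimal sketch of how the three indexes might be folded into one confidence value between 0 and 1: the description does not specify a combination rule, so the weighted average, the weights, and the character-count representation used for the Manhattan distance below are all assumptions made for illustration.

```python
from collections import Counter


def manhattan_similarity(a: str, b: str) -> float:
    """Map the Manhattan distance between character-count vectors into [0, 1]."""
    ca, cb = Counter(a), Counter(b)
    distance = sum(abs(ca[ch] - cb[ch]) for ch in set(ca) | set(cb))
    total = len(a) + len(b)  # largest possible distance (no shared characters)
    return 1.0 - distance / total if total else 1.0


def confidence(text_sim: float, point_match: float, pixel_match: float,
               weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted average of the three indexes; the weights are illustrative only."""
    scores = (text_sim, point_match, pixel_match)
    value = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return min(1.0, max(0.0, value))  # clamp to the 0..1 range described above
```

An index that is not used in a given deployment can simply be given weight 0, which matches the statement that the indexes may be used singly or in combination.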

One application scenario is for example: according to keywords extracted from the text information of question book A input by the user, a batch of question books A1, A2, …, An in the resource library that share the keywords is retrieved/indexed, where 1, 2, …, n are natural-number labels greater than or equal to 1, and the final character distinguishing point a is determined. The confidence levels of A1 to An are analyzed using indexes 1 to 3 above and output as recognition results, and comprehensive analysis determines the results A3 to A33 among A1 to An. Then the question books A3 to A33 from the resource library are sorted by a logistic regression model or the like, based on feature information such as the longest common substring and/or the character distinguishing point a, to obtain a candidate set of recognition results for the question book to be recognized that the user input. The candidate set can then be output to the user in order, so that a plurality of recognition results are output in ranked order (optionally only the first 10). The first-ranked question book is the one stored in the resource library that best matches the question book to be recognized, that is, it identifies which question book the input is. Subsequent work such as tutoring and teaching based on that question book can thus be supported. Whether the correctness judgment using the indexes is applied to the retrieval results, as in this example, or after the candidate set has been screened out, does not limit the present invention.

Therefore, by retrieving and analyzing text, the invention can recognize the image more accurately and effectively without being limited by the particularities of the image. In particular, for specific question book content, the feature information related to the character distinguishing points can more accurately determine which question books in the retrieval results are closest to the one the user is actually using. Because recognition does not rely on image matching, the local and network computing burden is not increased, resource consumption is reduced, and recognition accuracy and speed are improved.

[ example 2 ]

In order to make the objects, technical solutions and advantages of the present invention more apparent, a system implementation of the present invention is described in detail below with reference to specific embodiments and accompanying drawings.

The following describes the main block diagram of an embodiment of the system of the present invention shown in fig. 2. In this embodiment, the system includes at least an input processing module 110, a retrieval determining module 120, and an output processing module 130.

The input processing module 110 is configured to perform text conversion on the question book image to obtain the text information of the question book.

In one embodiment, a picture to be recognized is first obtained. The input question book image may optionally be a complete or an incomplete page/frame image. For example, an image of the question book to be recognized is captured. Specifically, the manner and source of image acquisition are not limited: any known server-side or terminal-side acquisition, and any known remote network and/or local acquisition, can serve as the source of the user's input image.

Specifically, text conversion of the image includes, for example, text positioning, that is, finding the regions where characters exist, and text recognition, that is, recognizing the characters in those regions to obtain the meaning of each text region. Applicable techniques include but are not limited to end-to-end character recognition and scene text positioning algorithms.

An example of a conversion: first, the position of each text line is located based on a convolutional neural network (CNN); then the content of each text line is recognized based on a convolutional neural network (CNN); and the recognized line contents are concatenated according to the typesetting order of the text lines, which completes the conversion of the picture/image into characters. The convolutional neural network, or a recurrent neural network, may be any of several existing neural network models for image/picture conversion, and models such as an SVM may also be used. The specific application and conversion process are not described in detail herein.
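The final concatenation step can be illustrated with a short sketch. It assumes that some detector and recognizer (not shown, and not specified here) have already produced a bounding-box position and a text string for each line; the `TextLine` type, the `row_tolerance` parameter and the simple row bucketing are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    x: float      # left edge of the line's bounding box
    y: float      # top edge of the line's bounding box
    text: str     # content recognized for this line


def join_lines(lines, row_tolerance: float = 5.0) -> str:
    """Concatenate recognized lines top-to-bottom, then left-to-right.

    Lines whose y values fall into the same bucket of height row_tolerance
    are treated as one row; this is a crude stand-in for real layout analysis.
    """
    ordered = sorted(lines, key=lambda ln: (round(ln.y / row_tolerance), ln.x))
    return "\n".join(ln.text for ln in ordered)
```

Multi-column pages would need a more careful ordering rule than this single-column sketch.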

One application scenario is for example: text is obtained from the question book image input by the user through OCR recognition.

The retrieval determining module 120 is configured to perform retrieval and determine character distinguishing points according to the text information.

In one embodiment, the obtained text information is analyzed to obtain the corresponding keywords and the characters at character distinguishing points. The question books (question book text information) stored in the resource library are retrieved/indexed according to the keywords to obtain the corresponding question books, and the corresponding character distinguishing points in the input question book to be recognized and in the retrieved question books are further analyzed, so that the accurate matching resource can subsequently be found using specific character distinguishing points.
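Keyword-based retrieval of this kind is commonly built on an inverted index. The sketch below is one possible arrangement, not the indexing scheme of this description: the function names are hypothetical, and whitespace tokenization is assumed for simplicity (Chinese text would need a word segmenter instead).

```python
from collections import defaultdict


def build_index(library):
    """library: dict mapping book id -> text. Returns keyword -> set of book ids."""
    index = defaultdict(set)
    for book_id, text in library.items():
        for word in text.lower().split():
            index[word].add(book_id)
    return index


def retrieve(index, keywords):
    """Return ids of books containing at least one keyword, most shared keywords first."""
    hits = defaultdict(int)
    for word in keywords:
        for book_id in index.get(word.lower(), ()):
            hits[book_id] += 1
    # Ties are broken by book id so the order is deterministic.
    return sorted(hits, key=lambda book_id: (-hits[book_id], book_id))
```

The returned list plays the role of the batch A1, A2, …, An that is later narrowed down using the character distinguishing points.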

Further, the character distinguishing points are analyzed to find the most representative ones. For example, the retrieval or preliminary screening process may yield hundreds of character distinguishing points, from which the most definite (or as accurate as possible) and most representative characters must be selected, such as the characters with the most unique features (including but not limited to the character with the largest pixel change gradient around it and/or a character in a more special position).

One application scenario is for example: according to keywords extracted from the text information of question book A input by the user, a batch of question books A1, A2, …, An in the resource library that share the keywords is retrieved/indexed. The characters (including keywords) shared pairwise by A and A1, A and A2, …, A and An are found and their unique features extracted, which yields hundreds of character distinguishing point pairs between A and A1, A2, …, An. The most representative pair is then determined according to criteria such as the largest surrounding pixel change gradient or a special position, and its character distinguishing point is used as the character distinguishing point for processing the subsequent retrieval results.
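The "largest surrounding pixel change gradient" criterion could be approximated as follows. This is a sketch under assumptions: the image is taken to be a grayscale grid of integers, each candidate is a hypothetical `(character, row, col)` tuple, and the gradient is measured as a sum of absolute neighbor differences in a small window, which is only one of several reasonable definitions.

```python
def patch_gradient(image, row, col, radius=1):
    """Sum of absolute horizontal and vertical pixel differences around (row, col)."""
    height, width = len(image), len(image[0])
    total = 0
    for r in range(max(row - radius, 0), min(row + radius, height - 1) + 1):
        for c in range(max(col - radius, 0), min(col + radius, width - 1) + 1):
            if c + 1 < width:   # difference to the right-hand neighbor
                total += abs(image[r][c + 1] - image[r][c])
            if r + 1 < height:  # difference to the neighbor below
                total += abs(image[r + 1][c] - image[r][c])
    return total


def most_representative(image, candidates):
    """candidates: list of (character, row, col); return the one with the largest gradient."""
    return max(candidates, key=lambda cand: patch_gradient(image, cand[1], cand[2]))
```

A positional criterion (the "special position" mentioned above) could be added as a second term in the key function.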

The output processing module 130 is configured to obtain the recognition result corresponding to the image according to the determined character distinguishing points and the retrieval results.

An example is as follows: through a machine learning model (such as a logistic regression model), the question books indexed from the resource library are sorted at least based on feature information such as the longest common substring and/or the character distinguishing points. The ways in which such a model uses the feature information for sorting are conventional and are not repeated herein. Further, the question books in the resource library that correspond to the one or more top-ranked character distinguishing point pairs are selected as the final retrieval candidate set, which can then be output to the user in order, so that a plurality of recognition results are output in ranked order.

One application scenario is for example: according to keywords extracted from the text information of question book A input by the user, a batch of question books A1, A2, …, An in the resource library that share the keywords is retrieved/indexed, where 1, 2, …, n are natural-number labels greater than or equal to 1, and the final character distinguishing point a is determined. The confidence levels of A1 to An are analyzed using indexes 1 to 3 above and output as recognition results, and comprehensive analysis determines the results A3 to A33 among A1 to An. Then the question books A3 to A33 from the resource library are sorted by a logistic regression model or the like, based on feature information such as the longest common substring and/or the character distinguishing point a, to obtain a candidate set of recognition results for the question book to be recognized that the user input. The candidate set can then be output to the user in order, so that a plurality of recognition results are output in ranked order (optionally only the first 10). The first-ranked question book is the one stored in the resource library that best matches the question book to be recognized, that is, it identifies which question book the input is. Subsequent work such as tutoring and teaching based on that question book can thus be supported. Whether the correctness judgment using the indexes is applied to the retrieval results, as in this example, or after the candidate set has been screened out, does not limit the present invention.

[ example 3 ]

The following describes an overall application scenario and further illustrates the implementation of the present invention in conjunction with embodiments 1 and 2:

A database serves as the library that stores resources and holds many question books (such as books). When a user casually photographs an exercise page (image) of a question book, and in particular when the input is a damaged, incomplete page, the question book that the user is using can be identified through text information analysis and retrieval, that is, the page is accurately matched to the complete resource in the library.

Specifically: when a user photographs a whole page online or a whole page of a paper exercise book, the captured page may be incomplete. To tutor the user, the page image is retrieved through its text information, identifying which page of which question book the user is practicing on. OCR is performed on the image input by the user to obtain the character-related information, that is, the content (including features) on the image page. Images input by users contain many interferences, such as handwriting and occlusion, so the image is cleaned first. The content, that is, the text information, is then analyzed to obtain keywords and character distinguishing points; the keywords are used to retrieve initial results from the resource library, and the results are then sorted and optimized using the character distinguishing points to find the accurate result, that is, the complete resource matching the incomplete page. The method is not disturbed by the damaged or incomplete state of the image; it obtains an accurate result directly through analysis, retrieval and processing of the text information, and is more efficient and more accurate than directly recognizing the incomplete image.

Therefore, the user can find the book corresponding to his or her exercise book mainly through the text information. Using character distinctiveness (distinguishing points) as the basis for retrieving the question book page image gives better efficiency, accuracy and interference resistance than retrieval by image, and avoids the situation in which an image that is incomplete, because of page defects at acquisition time or defects in the image itself, is difficult to match to complete resources.

[ example 4 ]

Fig. 3 is a block diagram schematically illustrating the structure of an electronic device according to an embodiment of the present invention. The electronic device includes a processor and a memory for storing a computer-executable program; when the computer program is executed by the processor, the processor performs the question book recognition method of the foregoing embodiment 1.

As shown in fig. 3, the electronic apparatus is in the form of a general purpose computing device. There may be one or more processors, which may work together. The invention also does not exclude distributed processing, i.e. the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a combination of a plurality of physical devices.

The memory stores a computer executable program, typically machine readable code. The computer readable program may be executed by the processor to enable an electronic device to perform the method of the invention, or at least some of the steps of the method.

The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).

Optionally, in this embodiment, the electronic apparatus further includes an I/O interface, which is used for data exchange between the electronic apparatus and an external device. The I/O interface may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and/or a memory storage device using any of a variety of bus architectures.

More specifically, refer to a block diagram of a more specific example of the electronic apparatus according to the embodiment shown in fig. 4. The electronic apparatus 200 of the exemplary embodiment is embodied in the form of a general-purpose data processing device. The components of the electronic device 200 may include, but are not limited to: at least one processing unit 210, at least one memory unit 220, a bus 230 connecting different system components (including the memory unit 220 and the processing unit 210), a display unit 240, and the like.

The storage unit 220 stores a computer readable program, which may be a code of a source program or a read-only program. The program may be executed by the processing unit 210 such that the processing unit 210 performs the steps of various embodiments of the present invention. For example, the processing unit 210 may perform the steps of the method of the foregoing embodiment 1.

The storage unit 220 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)2201 and/or a cache memory unit 2202, and may further include a read only memory unit (ROM) 2203. The storage unit 220 may also include a program/utility 2204 having a set (at least one) of program modules 2205, such program modules 2205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 230 may be any representation of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic apparatus 200 may also communicate with one or more external devices 300 (e.g., a keyboard, a display, a network device, a Bluetooth device, etc.), enable a user to interact with the electronic apparatus 200 via the external devices 300, and/or enable the electronic apparatus 200 to communicate with one or more other data processing devices (e.g., a router, a modem, etc.). Such communication may occur via input/output (I/O) interfaces 250, and may also occur via network adapter 260 with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet). The network adapter 260 may communicate with other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown, other hardware and/or software modules may be used in the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

It should be understood that the electronic device shown in fig. 3 and 4 is only one example of the present invention, and elements or components not shown in the above examples may be further included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method or at least part of the steps of the method of the present invention.

[ example 5 ]

Fig. 5 is a schematic diagram of a computer-readable recording medium of an embodiment of the present invention. As shown in fig. 5, the computer-readable recording medium stores a computer-executable program which, when executed, implements the above-described question book recognition method of the present invention. A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

From the above description of the embodiments, those skilled in the art will readily appreciate that the present invention can be implemented by hardware capable of executing a specific computer program, such as the system of the present invention, and an electronic processing unit, a server, a client, a mobile phone, a control unit, a processor, etc. included in the system, and the present invention can also be implemented by a vehicle including at least a part of the above system or components. The invention can also be implemented by computer software for performing the method of the invention, for example, by control software executed by a microprocessor, an electronic control unit, a client, a server, etc. of the locomotive side. It should be noted that the computer software for executing the method of the present invention is not limited to be executed by one or a specific hardware entity, but may also be implemented in a distributed manner by hardware entities without specific details, for example, some method steps executed by a computer program may be executed at the locomotive end, and another part may be executed in a mobile terminal or an intelligent helmet, etc. For computer software, the software product may be stored in a computer readable storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or may be distributed over a network, as long as it enables the electronic device to perform the method according to the present invention.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein can be implemented by software, and can also be implemented by software in combination with necessary hardware.

While the foregoing detailed description of the embodiments has described the objects, aspects and advantages of the invention in further detail, it should be understood that the invention is not inherently related to any particular computer, virtual machine or electronic device, but rather can be implemented in various general-purpose devices. The invention is not limited to the specific embodiments, but rather should be construed as broadly within the spirit and scope of the invention as defined in the appended claims.

The main technical scheme of the invention is summarized as follows:

Scheme 1, a subject book recognition method, comprising: acquiring text information of a subject book image; searching according to the text information, and determining character distinguishing points; and processing the retrieval result based on the character distinguishing points to obtain a recognition result corresponding to the subject book image.

Scheme 2, the subject book recognition method according to scheme 1, wherein acquiring the text information of the subject book image specifically comprises: obtaining a subject book image to be recognized; performing OCR recognition on the subject book image to obtain the text information of the subject book image; wherein the image comprises an incomplete image or a complete image.

Scheme 3, the subject book recognition method according to scheme 2, wherein performing OCR recognition on the subject book image specifically comprises: locating each line of text and recognizing the content of each line of text based on a convolutional neural network; concatenating the recognized content of each line according to the typesetting order of the text lines to obtain the text information recognized by the OCR; wherein the content of the text information comprises at least one or more of: text, characters, graphics, background.

Scheme 4, the subject book recognition method according to any one of schemes 1 to 3, wherein:

before acquiring the text information of the subject book image, the method further comprises: preliminarily determining in advance the positions of the questions in the subject book image, so as to eliminate interference information in the subject book image that is not part of the questions;

and/or,

the retrieving and determining the character distinguishing points according to the text information specifically comprises: analyzing the text information to obtain keywords; searching according to the keywords to obtain text information of the resource subject books having the same keywords; analyzing the common characters between the text information and the text information of each resource subject book and extracting the unique features corresponding to the common characters to determine one or more character distinguishing point pairs; selecting among the one or more character distinguishing point pairs so as to determine the character distinguishing point in the one or more most representative pairs as the character distinguishing point for processing the retrieval result; the unique features include one or more of the following: text content, image pixels around text, text content and/or image information around text.

Scheme 5, the subject book recognition method according to scheme 4, wherein the selecting among the one or more character distinguishing point pairs at least includes: selecting, from the character distinguishing point pairs, the character with the largest pixel change gradient around it and/or the character in a special position.

Scheme 6, the method for recognizing a subject book according to any one of schemes 1 to 5, wherein the step of processing the retrieved result based on the text distinguishing points to obtain a recognition result corresponding to the subject book specifically includes: sorting the retrieval results at least based on the feature information of the longest common substring and/or the character distinguishing points to determine a final retrieval result candidate set; and outputting the final retrieval result candidate set to a user as the identification result of the question book.

Scheme 7, the subject book identification method according to scheme 6, further comprising: before the retrieval result is processed, determining whether the retrieval result is correct at least through integral text information matching and based on text distinguishing point matching probability and/or image pixel matching probability indexes; or, before the final search result candidate set is output, whether the result in the search result candidate set is correct is determined at least through overall text information matching and based on the text distinguishing point matching probability and/or the image pixel matching probability index.

Scheme 8, an electronic device comprising a processor and a memory, the memory being used for storing a computer executable program, wherein when the computer program is executed by the processor, the processor performs the subject book recognition method according to any one of schemes 1 to 7.

Scheme 9, a computer-readable medium storing a computer-executable program which, when executed, implements the subject book recognition method according to any one of schemes 1 to 7.

Scheme 10, a subject book recognition system, includes: the input processing module is used for acquiring text information of the image of the subject book; the retrieval determining module is used for retrieving and determining character distinguishing points according to the text information; and the output processing module is used for processing the retrieval result based on the character distinguishing point so as to obtain the identification result corresponding to the image.

Scheme 11, the subject book recognition system according to scheme 10, wherein the input processing module specifically executes: obtaining an input image of a subject book to be recognized; performing OCR recognition on the subject book image to obtain the text information of the subject book image; wherein the image comprises an incomplete image or a complete image.

Scheme 12, the subject book recognition system according to scheme 11, wherein the input processing module performing OCR recognition on the subject book image specifically includes: locating each line of text and recognizing the content of each line of text based on a convolutional neural network; concatenating the recognized content of each line according to the typesetting order of the text lines to obtain the text information recognized by the OCR; wherein the content of the text information comprises at least one or more of the following: text, characters, graphics, background.

Scenario 13, the subject matter book recognition system according to any one of scenarios 10 to 12,

the input processing module also executes the following steps before acquiring the text information of the subject book image: determining the rough position of the theme in the image of the theme book in advance so as to eliminate the interference information which is not the theme in the image of the theme book;

and/or the presence of a gas in the gas,

the retrieval determining module specifically executes: analyzing the text information to obtain a keyword; searching according to the keywords to obtain text information of the resource subject book with the same keywords; analyzing common characters between the text information and the text information of each resource topic book and extracting unique characteristics corresponding to the common characters to determine one or more character distinguishing point pairs; selecting one or more character distinguishing point pairs to determine the character distinguishing point in one or more most representative character distinguishing point pairs as the character distinguishing point for processing the retrieval result; the unique features include one or more of the following: text content, image pixels around text, text content and/or image information around text.

Scheme 14. The question book recognition system according to scheme 13, wherein the retrieval determining module selecting from the one or more character distinguishing point pairs at least includes: selecting according to the maximum pixel change gradient around the characters of the character distinguishing point pairs and/or the special position of the characters.
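The gradient criterion of scheme 14 can be sketched as follows. The sketch assumes that a small grayscale patch (a list of rows of pixel intensities) has been cropped around each candidate character, and it measures the pixel change gradient as the mean absolute difference between adjacent pixels. Both the patch representation and this particular gradient measure are assumptions for illustration; the disclosure does not specify how the gradient is computed. The names `gradient_strength` and `select_by_gradient` are hypothetical.

```python
def gradient_strength(patch):
    """Mean absolute intensity difference between adjacent pixels of a patch.

    `patch` is a non-empty list of equal-length rows of grayscale values.
    """
    rows, cols = len(patch), len(patch[0])
    total, count = 0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbor
                total += abs(patch[r][c + 1] - patch[r][c])
                count += 1
            if r + 1 < rows:  # vertical neighbor
                total += abs(patch[r + 1][c] - patch[r][c])
                count += 1
    return total / count if count else 0.0


def select_by_gradient(candidates):
    """Return the label of the candidate whose patch has the largest gradient.

    `candidates` is a list of (label, patch) tuples.
    """
    return max(candidates, key=lambda item: gradient_strength(item[1]))[0]
```

For example, a uniformly gray patch scores 0, while a patch containing a sharp vertical edge scores high and would be preferred as the more distinctive point.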

Scheme 15. The question book recognition system according to any one of schemes 10 to 14, wherein the output processing module specifically executes: sorting the retrieval results at least based on the longest common substring and/or the feature information of the character distinguishing points to determine a final retrieval result candidate set; and outputting the final retrieval result candidate set to the user as the recognition result of the question book.
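The sorting step of scheme 15 can be illustrated with a minimal Python sketch that ranks retrieval results by the length of the longest common substring shared with the recognized text. Only the longest-common-substring criterion is shown; the feature information of the character distinguishing points is left out. The function names, the dictionary of candidate texts, and the `top_k` parameter are hypothetical, and the standard-library `difflib.SequenceMatcher` is used here as a convenient way to find the substring, not as the method of this disclosure.

```python
from difflib import SequenceMatcher


def longest_common_substring_length(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def rank_candidates(query_text, candidates, top_k=3):
    """Return the names of the top_k candidates, longest shared substring first.

    `candidates` maps a resource name to its text.
    """
    scored = sorted(
        candidates.items(),
        key=lambda item: longest_common_substring_length(query_text, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]
```

Because the score depends only on the shared run of text, a photograph covering part of a page can still rank the complete resource page first.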

Scheme 16. The question book recognition system according to scheme 15, wherein the output processing module further executes: before the retrieval results are processed, determining whether the retrieval results are correct at least through overall text information matching and based on a character distinguishing point matching probability index and/or an image pixel matching probability index; or, before the final retrieval result candidate set is output, determining whether the results in the retrieval result candidate set are correct at least through overall text information matching and based on a character distinguishing point matching probability index and/or an image pixel matching probability index.
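The correctness check of scheme 16 might be sketched as below. The sketch combines an overall text match score with a character distinguishing point match ratio and accepts a result only when both pass a threshold. The coverage measure, the way the two scores are combined, and both threshold values are assumptions chosen for illustration; the disclosure names the indexes but does not define them. The image pixel matching probability index is not shown.

```python
from difflib import SequenceMatcher


def text_coverage(query_text, resource_text):
    """Fraction of the query characters found in matching runs of the resource."""
    if not query_text:
        return 0.0
    matcher = SequenceMatcher(None, query_text, resource_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(query_text)


def is_result_plausible(query_text, resource_text, point_hits, point_total,
                        text_threshold=0.6, point_threshold=0.5):
    """Accept a retrieval result only if both scores reach their thresholds.

    `point_hits` / `point_total` is the share of character distinguishing
    points that matched; the thresholds are hypothetical defaults.
    """
    point_score = point_hits / point_total if point_total else 0.0
    return (text_coverage(query_text, resource_text) >= text_threshold
            and point_score >= point_threshold)
```

Coverage is measured against the recognized text rather than the resource text, so an incomplete page is not penalized for the parts of the resource page it does not show.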

Claims

1. A method for recognizing a question book, characterized by comprising the following steps:

acquiring text information of a question book image;

performing retrieval according to the text information and determining character distinguishing points;

and processing the retrieval results based on the character distinguishing points to obtain a recognition result corresponding to the question book image.

2. The question book recognition method according to claim 1, wherein acquiring the text information of the question book image specifically comprises:

acquiring a question book image to be recognized;

performing OCR on the question book image to obtain the text information of the question book image;

wherein the image is an incomplete image or a complete image.

3. The question book recognition method according to claim 2, wherein performing OCR on the question book image specifically comprises: locating each line of text and recognizing the content of each line based on a convolutional neural network; concatenating the recognized content of each line in the typesetting order of the text lines to obtain the text information recognized by OCR; wherein the content of the text information comprises at least one or more of: text, characters, graphics, background.

4. The question book recognition method according to any one of claims 1 to 3, wherein

before acquiring the text information of the question book image, the method further comprises:

preliminarily determining the rough position of the questions in the question book image so as to eliminate interference information in the question book image that does not belong to the questions;

and/or,

performing retrieval according to the text information and determining the character distinguishing points specifically comprises:

analyzing the text information to obtain keywords;

performing retrieval according to the keywords to obtain text information of resource question books having the same keywords;

analyzing the common characters between the text information and the text information of each resource question book, and extracting the unique features corresponding to the common characters, to determine one or more character distinguishing point pairs;

selecting from the one or more character distinguishing point pairs so as to determine the character distinguishing points in the one or more most representative character distinguishing point pairs as the character distinguishing points for processing the retrieval results;

wherein the unique features include one or more of the following: text content, image pixels around the text, text content and/or image information around the text.

5. The question book recognition method according to claim 4, wherein selecting from the one or more character distinguishing point pairs at least comprises:

selecting according to the maximum pixel change gradient around the characters of the character distinguishing point pairs and/or the special position of the characters.

6. The question book recognition method according to any one of claims 1 to 5, wherein processing the retrieval results based on the character distinguishing points to obtain the recognition result corresponding to the question book image specifically comprises:

sorting the retrieval results at least based on the longest common substring and/or the feature information of the character distinguishing points to determine a final retrieval result candidate set;

and outputting the final retrieval result candidate set to the user as the recognition result of the question book.

7. The question book recognition method according to claim 6, further comprising:

before the retrieval results are processed, determining whether the retrieval results are correct at least through overall text information matching and based on a character distinguishing point matching probability index and/or an image pixel matching probability index;

or,

before the final retrieval result candidate set is output, determining whether the results in the retrieval result candidate set are correct at least through overall text information matching and based on a character distinguishing point matching probability index and/or an image pixel matching probability index.

8. An electronic device comprising a processor and a memory, the memory being used for storing a computer-executable program, characterized in that: the computer-executable program, when executed by the processor, performs the method of any one of claims 1 to 7.

9. A computer-readable medium storing a computer-executable program, wherein the computer-executable program, when executed, implements the method of any one of claims 1 to 7.

10. A question book recognition system, comprising:

an input processing module, used for acquiring text information of a question book image;

a retrieval determining module, used for performing retrieval according to the text information and determining character distinguishing points;

and an output processing module, used for processing the retrieval results based on the character distinguishing points to obtain the recognition result corresponding to the question book image.