CN111666474A

CN111666474A - Method and terminal for searching questions in whole page

Info

Publication number: CN111666474A
Application number: CN201910179093.1A
Authority: CN
Inventors: 袁景伟; 匡柘溪; 郭德强; 宋旸; 王岩; 田宝亮; 黄宇飞; 胡亚龙
Original assignee: Xiaochuanchuhai Education Technology Beijing Co ltd
Current assignee: Beijing Baige Feichi Technology Co ltd
Priority date: 2019-03-08
Filing date: 2019-03-08
Publication date: 2020-09-15
Anticipated expiration: 2039-03-08
Also published as: CN111666474B

Abstract

An embodiment of the invention relates to a method for whole-page question search, comprising: receiving a whole-page picture uploaded by a user; recognizing the text information of the whole-page picture and its position coordinate information; splitting the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; obtaining a feature vector for each single question from the text information of that question; and searching a question bank according to the feature vector of each single question, and returning the one or more questions closest to that single question together with their analyses.

Description

Method and terminal for searching questions in whole page

Technical Field

The invention relates to the field of education terminals, in particular to a method and a terminal for searching questions in whole pages.

Background

China is a large country in terms of education, but its educational resources are insufficient and unevenly distributed. How to let students in remote areas enjoy high-quality education and efficient tutoring is a matter of great concern to society and the state. With the rapid development of educational informatization worldwide, the online education industry has grown explosively in China, and Internet technology carries the high-quality educational resources of large cities to third- and fourth-tier cities and remote rural areas. Photo-based question search is an Internet product aimed at after-class tutoring: it recognizes a picture uploaded by the user and quickly returns the analysis and answer of the corresponding question, which helps students complete their after-class study and improves learning efficiency.

Photo-based question search has become a mainstream after-class learning aid and has helped thousands upon thousands of primary and middle school students. However, only one question can be photographed at a time, which seriously degrades the experience of some users.

Disclosure of Invention

An embodiment of the invention provides a method and a terminal for whole-page question search, which can return the answers to multiple questions in one picture at the same time.

In a first aspect, an embodiment of the present invention provides a method for whole-page question search, where the method includes: receiving a whole-page picture uploaded by a user; recognizing the text information of the whole-page picture and its position coordinate information; splitting the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; obtaining a feature vector for each single question from the text information of that question; and searching a question bank according to the feature vector of each single question, and returning the one or more questions closest to that single question together with their analyses.

Optionally, recognizing the text information and the position coordinate information of the whole-page picture specifically includes: analyzing the number of black-white pixel alternations in each row of the picture to determine the starting-row and ending-row coordinates of the text in the picture, and cropping out a picture of each line of text; then scanning each text-line picture vertically, column by column, recording the number of black-white pixel alternations in each column, and determining the starting ordinate and ending ordinate of each character in the picture; and obtaining the position coordinate information of the text information from the starting-row and ending-row coordinates together with the starting and ending ordinates.

Optionally, splitting the multiple questions in the whole-page picture into single questions specifically includes: dividing the multiple questions into regions by means of multi-column splitting of the text information, the position coordinate information of the text, the question numbers, and the contextual relationship.

Optionally, obtaining a feature vector for each single question from its text information specifically includes: inputting the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, wherein the question vectorization model is a neural-network-based model. The question vectorization model is obtained by training through the following steps: labeling each question sample in a first question sample training set so as to mark the text information of the question in each sample; and using a neural network model to perform two-dimensional feature vector extraction on the text content of the question in each sample, thereby training the question vectorization model.

Optionally, searching for and returning the one or more questions closest to the single question together with their analyses specifically includes: searching the question bank, by approximate vector search, for a feature vector matching the feature vector of the single question; and searching the question bank for the feature vector closest to the feature vector of the single question.

In a second aspect, an embodiment of the present invention provides a terminal for whole-page question search, where the terminal includes: a receiving unit, configured to receive a whole-page picture uploaded by a user; an identification unit, configured to recognize the text information and the position coordinate information of the whole-page picture; a splitting unit, configured to split the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; a feature unit, configured to obtain a feature vector for each single question from the text information of that question; and a search unit, configured to search a question bank according to the feature vector of each single question and to return the one or more questions closest to that single question together with their analyses.

Optionally, the identification unit is specifically configured to: analyze the number of black-white pixel alternations in each row of the picture to determine the starting-row and ending-row coordinates of the text in the picture, and crop out a picture of each line of text; then scan each text-line picture vertically, column by column, record the number of black-white pixel alternations in each column, and determine the starting ordinate and ending ordinate of each character in the picture; and obtain the position coordinate information of the text information from the starting-row and ending-row coordinates together with the starting and ending ordinates.

Optionally, the splitting unit is specifically configured to: divide the multiple questions into regions by means of multi-column splitting of the text information, the position coordinate information of the text, the question numbers, and the contextual relationship.

Optionally, the feature unit is specifically configured to: input the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, wherein the question vectorization model is a neural-network-based model. The question vectorization model is obtained by training through the following steps: labeling each question sample in a first question sample training set so as to mark the text information of the question in each sample; and using a neural network model to perform two-dimensional feature vector extraction on the text content of the question in each sample, thereby training the question vectorization model.

Optionally, the search unit is specifically configured to: search the question bank, by approximate vector search, for a feature vector matching the feature vector of the single question; and search the question bank for the feature vector closest to the feature vector of the single question.

The method and the terminal for whole-page question search provided by the embodiments of the invention recognize the text information and the position coordinate information of the whole-page picture; split the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; obtain a feature vector for each single question from the text information of that question; and search the question bank according to the feature vector of each single question, returning the one or more questions closest to that single question together with their analyses. The answers to multiple questions in one picture can thus be returned at the same time.

Drawings

Fig. 1 is a flowchart of a method for whole-page question search according to an embodiment of the present invention;

Fig. 2 is a detailed flowchart of a method for whole-page question search according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a terminal for whole-page question search according to an embodiment of the present invention;

Fig. 4 is a sample diagram of returned questions and analyses according to an embodiment of the present invention;

Fig. 5 is a human-computer interaction diagram according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the objects, technical solutions and advantages of the present invention clearer, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings; these embodiments do not limit the present invention.

Photo-based question search is a learning tool for finding the answers to questions by photographing them with a mobile phone: the tool recognizes the picture uploaded by the user and quickly returns the analysis and answer of the corresponding question. Whole-page photo search applies this technique to an entire page; compared with single-question photo search, it can return the answers to multiple questions in the picture at the same time.

By photographing a whole page, whole-page photo search obtains the answers to the multiple questions in the picture. Existing photo-based question search performs OCR on the picture uploaded by the user and retrieves the corresponding question from the recognized text. Many questions for lower-grade primary school pupils, however, are graphic questions, for which little or no text is recognized, and in that case the existing technique has difficulty finding the correct answer. Whole-page photo search can use the context information in the uploaded picture for retrieval, which solves the problem that a question cannot be found because it contains too few characters.

Fig. 1 is a flowchart of a method for whole-page question search according to an embodiment of the present invention, and Fig. 2 is a detailed flowchart of the same method. The method includes the following steps:

Step 101, receiving a whole-page picture uploaded by a user.

The whole-page picture may be a picture of a whole page of after-class exercises in a textbook, of a test paper, or of an exercise book.

Step 102, recognizing the text information and the position coordinate information of the whole-page picture.

The text information of the whole-page picture is recognized by OCR (Optical Character Recognition), a technology that, for printed characters, optically converts the text of a paper document into an image file and then converts the characters in the image into text format.

The number of black-white pixel alternations in each row of the picture is analyzed to determine the starting-row and ending-row coordinates of the text in the picture, and a picture of each line of text is cropped out. Each text-line picture is then scanned vertically, column by column, the number of black-white pixel alternations in each column is recorded, and the starting ordinate and ending ordinate of each character in the picture are determined. The position coordinate information of the text information is obtained from the starting-row and ending-row coordinates together with the starting and ending ordinates.
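The two scanning passes described above can be illustrated with a minimal sketch. It is not taken from the disclosure: it assumes the picture has already been binarized into a list of rows (1 = black, 0 = white), and the names `count_alternations`, `find_runs` and `locate_characters` are hypothetical. The column-by-column pass reports each character's extent along the text line, which the sketch names `left` and `right`.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized picture: 1 = black (ink), 0 = white


def count_alternations(pixels: List[int]) -> int:
    """Count black/white alternations along one scan line.

    The line is padded with white at both ends (an assumption of this sketch)
    so that a fully black scan line still counts as containing ink.
    """
    padded = [0] + list(pixels) + [0]
    return sum(1 for a, b in zip(padded, padded[1:]) if a != b)


def find_runs(counts: List[int]) -> List[Tuple[int, int]]:
    """Group consecutive scan lines that contain alternations into inclusive (start, end) runs."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, c in enumerate(counts):
        if c > 0 and start is None:
            start = i
        elif c == 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(counts) - 1))
    return runs


def locate_characters(img: Image) -> List[Tuple[int, int, int, int]]:
    """Return one (top, bottom, left, right) box per character, all bounds inclusive."""
    boxes = []
    # First pass: scan row by row to find the start and end row of each text line.
    row_counts = [count_alternations(row) for row in img]
    for top, bottom in find_runs(row_counts):
        line = img[top:bottom + 1]  # cropped text-line picture
        width = len(line[0])
        # Second pass: scan the cropped line column by column to separate characters.
        col_counts = [count_alternations([r[x] for r in line]) for x in range(width)]
        for left, right in find_runs(col_counts):
            boxes.append((top, bottom, left, right))
    return boxes
```

A real page would additionally need binarization, noise removal and skew correction before such a scan gives usable boxes; those steps are outside this sketch.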

Step 103, splitting the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information.

Optionally, splitting the multiple questions in the whole-page picture into single questions specifically includes: dividing the multiple questions into regions according to the position coordinate information of the text, the question numbers, and the contextual relationship.

Specifically, multi-column splitting normalizes a multi-column layout into a single-column layout according to the layout information of the questions, so that the questions are split within one column. Splitting by position coordinate information relies on the fact that the text fragments recognized from the same question are adjacent in position. Splitting by question number relies on the fact that each question generally has a number, so the boundary between adjacent questions can be found from the numbers. Splitting by context relies on the fact that different questions generally describe different subjects, so questions can be separated according to their context.
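As a rough illustration of the first three cues (column normalization, position, and question numbers), the sketch below orders OCR text lines column by column and starts a new question at every line that begins with a question number. It is only a sketch under stated assumptions: the `TextLine` structure, the equal-width column layout, and the number pattern are hypothetical, and context-based splitting is not covered.

```python
import re
from typing import List, NamedTuple


class TextLine(NamedTuple):
    text: str
    x: float  # left edge of the recognized line
    y: float  # top edge of the recognized line


# A leading number followed by ".", "、" or "．" and not by another digit
# (so that a line starting with "3.5 meters" is not taken as question 3).
QUESTION_NUMBER = re.compile(r"^\s*(\d+)\s*[.\u3001\uff0e](?!\d)")


def to_single_column(lines: List[TextLine], page_width: float, columns: int = 2) -> List[TextLine]:
    """Normalize a multi-column page to reading order: column by column, top to bottom."""
    col_width = page_width / columns

    def key(line: TextLine):
        column = min(int(line.x // col_width), columns - 1)
        return (column, line.y)

    return sorted(lines, key=key)


def split_questions(lines: List[TextLine], page_width: float, columns: int = 2) -> List[List[str]]:
    """Group text lines into questions; a question number marks the start of a new question."""
    questions: List[List[str]] = []
    for line in to_single_column(lines, page_width, columns):
        if QUESTION_NUMBER.match(line.text) or not questions:
            questions.append([line.text])
        else:
            questions[-1].append(line.text)  # continuation of the current question
    return questions
```

Assuming equal-width columns keeps the example short; a real layout analysis would estimate the column boundaries from the recognized text positions.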

For example, a mathematics word problem containing 3 sub-questions may be split into 4 single questions, and these 4 single questions need to be merged when they are returned to the user; merging is decided mainly by whether the retrieval results of the single questions are consistent. Because some questions are split incorrectly into several questions, merging and deduplication according to the retrieval results are required.
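A minimal sketch of this merging rule follows. It assumes each fragment has already been searched and carries the identifier of its top retrieved question; the function name `merge_by_result` and the identifiers used in the usage note are hypothetical.

```python
from typing import List, Tuple


def merge_by_result(fragments: List[Tuple[str, str]]) -> List[Tuple[List[str], str]]:
    """Merge adjacent fragments whose top retrieval result is the same question.

    fragments: (fragment_text, id_of_top_retrieved_question) pairs in page order.
    Returns (list_of_fragment_texts, retrieved_question_id) groups.
    """
    merged: List[Tuple[List[str], str]] = []
    for text, hit_id in fragments:
        if merged and merged[-1][1] == hit_id:
            merged[-1][0].append(text)  # same result as the previous fragment: merge
        else:
            merged.append(([text], hit_id))
    return merged
```

For a word problem split into a stem and three sub-questions that all retrieve the same bank entry, the four fragments collapse into one group, so the user sees a single result for that problem.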

Finding more accurate solutions through context information: many questions for lower-grade primary school pupils are graphic questions, for which little or no text is recognized, and in that case existing photo-based question search has difficulty recalling the correct answer. Whole-page photo search can use the context information in the uploaded picture for retrieval, which solves the problem that an answer cannot be found because the question contains too few characters.

Step 104, obtaining a feature vector for each single question from the text information of that question.

The text information of a single question is input into a pre-trained question vectorization model to obtain the feature vector of the single question, where the question vectorization model is a neural-network-based model.

For example, the text information of a single question is "4. Xiaoming has walked 100 meters, which is exactly half of the way. How many meters is his home from the school? (6 points)". This text is input into a pre-trained question vectorization model, such as a sent2vec model, to obtain the feature vector of the question, which can be represented as [x0, x1, x2, …, xn].
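To make the shape of such a feature vector concrete, the sketch below maps a question's text to a fixed-length, unit-length vector. It is explicitly not the trained model of the embodiment: hashed character-bigram counting is used purely as a stand-in technique, and the function name `embed` and the dimension of 64 are assumptions of this sketch.

```python
import hashlib
import math
from typing import List


def embed(text: str, dim: int = 64) -> List[float]:
    """Stand-in vectorizer: hashed character-bigram counts, L2-normalized.

    A trained neural model would replace this function; only the interface
    (text in, fixed-length vector out) is meant to carry over.
    """
    vec = [0.0] * dim
    chars = "".join(text.lower().split())  # drop whitespace, ignore case
    for i in range(len(chars) - 1):
        bigram = chars[i:i + 2]
        digest = hashlib.md5(bigram.encode("utf-8")).digest()
        # md5 is used only as a deterministic hash to choose a bucket.
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Unlike a learned embedding, this stand-in captures only surface overlap between texts; it is deterministic, which makes it convenient for testing the rest of a retrieval pipeline.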

The question vectorization model may be a neural-network-based model, such as a CNN model, and may be obtained by training through the following steps: labeling each question sample in a first question sample training set so as to mark the text information of the question in each sample; and using a neural network model to perform two-dimensional feature vector extraction on the text content of the question in each sample, thereby training the question vectorization model.

Step 105, searching a question bank according to the feature vector of each single question, and returning the one or more questions closest to that single question together with their analyses.

A feature vector matching the feature vector of the single question can be searched for in the question bank by approximate vector search; specifically, the feature vector in the question bank that is closest to the feature vector of the single question is found.
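The notion of the "closest" feature vector can be sketched as follows. Note the substitution: the sketch performs an exact brute-force search with cosine similarity, whereas the embodiment refers to approximate vector search, which would normally rely on a dedicated index for large question banks. The similarity measure, the function names and the bank layout (a dictionary from question id to vector) are assumptions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(query: Sequence[float], bank: Dict[str, Sequence[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k bank entries most similar to the query as (question_id, similarity)."""
    scored = ((cosine(query, vec), qid) for qid, vec in bank.items())
    return [(qid, score) for score, qid in heapq.nlargest(k, scored)]
```

Brute-force search compares the query with every entry, so its cost grows linearly with the size of the bank; approximate methods trade a small loss in accuracy for much lower latency.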

The picture uploaded by the user may contain regions that are not questions, such as a heading ("Unit 3, Lesson 1"), a section label ("Fill-in-the-blank questions"), or even unrelated famous quotations. These regions need to be filtered out when the results are returned, because they are not what the user wants. In addition, when an uploaded picture is heavily tilted, upside down, or otherwise distorted, adjacent question regions may overlap, which harms the user experience; deduplication is used to improve the result.

Optionally, a mapping relationship is established between the single question and the one or more closest questions and analyses returned for it.

When a user uploads a whole-page picture (of a textbook, test paper, or exercise book), the user expects to obtain the answers to all the questions on the page. The retrieval system performs high-speed sampled OCR on the uploaded picture, recalls data similar to the text of the picture according to the recognized characters, maps the recall results against the uploaded picture, and returns the best-mapped data together with the answer information of all corresponding questions. The user can then check the answers one by one by question number.

Optionally, high-speed sampled OCR may divide the whole-page picture evenly into 100 regions (5 × 20) and recognize only the 1st, 3rd and 5th regions of the odd-numbered rows, which reduces the recognition time by about 70%. Because fewer regions are recognized, less text is recognized as well, which further reduces the computational cost of retrieval.
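The 70% figure can be checked with a short sketch. It assumes that "5 × 20" means 20 rows of 5 regions each, and that recognition time is roughly proportional to the number of regions recognized; both are interpretations, and the function name `sampled_regions` is hypothetical.

```python
from typing import List, Tuple


def sampled_regions(rows: int = 20, cols: int = 5) -> List[Tuple[int, int]]:
    """1-based (row, column) indices of the regions sent to OCR:
    the 1st, 3rd and 5th regions of every odd-numbered row."""
    return [(r, c) for r in range(1, rows + 1, 2) for c in range(1, cols + 1, 2)]


total = 20 * 5                      # 100 regions on the page
selected = len(sampled_regions())   # 10 odd rows x 3 regions = 30
skipped = total - selected          # 70 regions are never recognized
```

With 30 of the 100 regions recognized, 70 regions are skipped, which matches the stated reduction if each region takes about the same time to recognize.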

To return data quickly, whole-page resources can be processed offline: each page is divided by question position and the corresponding answer is retrieved for each question in advance, so that online whole-page retrieval can directly return the answers for the whole page.

When the user uploads a whole-page picture, high-speed sampled OCR is performed on it, data similar to the text of the picture is recalled according to the recognized characters, the recall results are mapped against the uploaded picture, and the best-mapped data is returned together with the answer information of all corresponding questions, so that the user can check the answers one by one by question number. In this way, questions with few characters, such as graphic questions and calculation questions, can be recalled and analyzed better.

Fig. 3 is a schematic diagram of a terminal for whole-page question search according to an embodiment of the present invention. The terminal includes: a receiving unit 301, configured to receive a whole-page picture uploaded by a user; an identification unit 302, configured to recognize the text information and the position coordinate information of the whole-page picture; a splitting unit 303, configured to split the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; a feature unit 304, configured to obtain a feature vector for each single question from the text information of that question; and a search unit 305, configured to search the question bank according to the feature vector of each single question and to return the one or more questions closest to that single question together with their analyses, as illustrated by the sample of returned questions and analyses in Fig. 4.

Fig. 5 is a human-computer interaction diagram according to an embodiment of the present invention. The user photographs the questions to be answered and uploads the picture; the whole-page photo search system processes the uploaded picture, returns the answers to all questions in the picture, and marks the position of each question; the user then clicks a question to view its detailed answer, which includes, but is not limited to, the analysis, the solution process, the answer, the knowledge-point classification, and a video explanation.

The embodiments of the invention therefore provide a method and a terminal for whole-page question search that can return the answers to multiple questions in one picture at the same time.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The foregoing detailed description is provided to illustrate, explain and enable the best mode of the invention, and it should be understood that the above description is only exemplary of the invention, and is not intended to limit the scope of the invention, which is defined by the following claims.

Claims

1. A method for searching questions in a whole page, the method comprising:

receiving a whole-page picture uploaded by a user;

recognizing the text information and the position coordinate information of the whole-page picture;

splitting the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information;

obtaining a feature vector for each single question from the text information of that question;

and searching a question bank according to the feature vector of each single question, and returning the one or more questions closest to that single question together with their analyses.

2. The method according to claim 1, wherein recognizing the text information and the position coordinate information of the whole-page picture specifically comprises:

analyzing the number of black-white pixel alternations in each row of the picture to determine the starting-row and ending-row coordinates of the text in the picture, and cropping out a picture of each line of text;

then scanning each text-line picture vertically, column by column, recording the number of black-white pixel alternations in each column, and determining the starting ordinate and ending ordinate of each character in the picture;

and obtaining the position coordinate information of the text information from the starting-row and ending-row coordinates together with the starting and ending ordinates.

3. The method according to claim 1, wherein splitting the multiple questions in the whole-page picture into single questions specifically comprises: dividing the multiple questions into regions by means of multi-column splitting of the text information, the position coordinate information of the text, the question numbers, and the contextual relationship.

4. The method according to claim 1, wherein obtaining a feature vector for each single question from the text information of that question specifically comprises:

inputting the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, wherein the question vectorization model is a neural-network-based model;

the question vectorization model is obtained by training through the following steps: labeling each question sample in a first question sample training set so as to mark the text information of the question in each sample; and using a neural network model to perform two-dimensional feature vector extraction on the text content of the question in each sample, thereby training the question vectorization model.

5. The method according to claim 1, wherein searching for and returning the one or more questions closest to the single question together with their analyses specifically comprises:

searching the question bank, by approximate vector search, for a feature vector matching the feature vector of the single question; and searching the question bank for the feature vector closest to the feature vector of the single question.

6. A terminal for searching questions in a whole page, the terminal comprising:

a receiving unit, configured to receive a whole-page picture uploaded by a user;

an identification unit, configured to recognize the text information and the position coordinate information of the whole-page picture;

a splitting unit, configured to split the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information;

a feature unit, configured to obtain a feature vector for each single question from the text information of that question;

and a search unit, configured to search a question bank according to the feature vector of each single question and to return the one or more questions closest to that single question together with their analyses.

7. The terminal according to claim 6, wherein the identification unit is specifically configured to:

8. The terminal according to claim 6, wherein the splitting unit is specifically configured to: divide the multiple questions into regions by means of multi-column splitting of the text information, the position coordinate information of the text, the question numbers, and the contextual relationship.

9. The terminal according to claim 6, wherein the feature unit is specifically configured to:

10. The terminal according to claim 6, wherein the search unit is specifically configured to: