CN111666474A - Method and terminal for searching questions in whole page - Google Patents


Publication number
CN111666474A
Authority
CN
China
Prior art keywords
question
searching
feature vector
topic
whole page
Legal status
Granted
Application number
CN201910179093.1A
Other languages
Chinese (zh)
Other versions
CN111666474B (en)
Inventor
袁景伟
匡柘溪
郭德强
宋旸
王岩
田宝亮
黄宇飞
胡亚龙
Current Assignee
Beijing Baige Feichi Technology Co ltd
Original Assignee
Xiaochuanchuhai Education Technology Beijing Co ltd
Application filed by Xiaochuanchuhai Education Technology Beijing Co ltd
Priority to CN201910179093.1A
Publication of CN111666474A
Application granted
Publication of CN111666474B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G06V30/418 Document matching, e.g. of document images
    • G06V30/10 Character recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the invention relate to a method for searching questions in a whole page, comprising: receiving a whole-page picture uploaded by a user; recognizing the text information and position coordinate information of the whole-page picture; splitting the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; obtaining a feature vector for each single question from its text information; and searching a question bank according to the feature vector of each single question, and finding and returning the one or more questions closest to that single question together with their analyses.

Description

Method and terminal for searching questions in whole page
Technical Field
The invention relates to the field of educational terminals, and in particular to a method and a terminal for searching questions in a whole page.
Background
China is a large country in terms of education, yet its educational resources are insufficient and unevenly distributed, so enabling students in remote areas to access high-quality education and effective tutoring is a priority for both society and the state. With the rapid global development of information-based education, China's online education industry has grown explosively, and high-quality educational resources from large cities now reach third- and fourth-tier cities and remote rural areas over the internet. Photo-based question search is an internet product for after-class tutoring: it recognizes the picture uploaded by the user and quickly returns the analysis and answer of the corresponding question, helping students complete their after-class study and improving learning efficiency.
Photo-based question search has become a mainstream after-class learning aid that helps thousands of primary and secondary school students, but because only one question can be photographed at a time, the experience is poor for some users.
Disclosure of Invention
The embodiments of the invention provide a method and a terminal for searching questions in a whole page, which can return the answers to multiple questions in one picture at the same time.
In a first aspect, an embodiment of the present invention provides a method for searching questions in a whole page, the method comprising: receiving a whole-page picture uploaded by a user; recognizing the text information and position coordinate information of the whole-page picture; splitting the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; obtaining a feature vector for each single question from its text information; and searching a question bank according to the feature vector of each single question, and finding and returning the one or more questions closest to that single question together with their analyses.
Optionally, recognizing the text information and position coordinate information of the whole-page picture specifically includes: counting the black-white pixel transitions in each pixel row of the picture to determine the start-row and end-row coordinates of each line of characters, and cropping out each text-line image; scanning each text-line image again vertically, column by column, recording the number of black-white transitions in each column, to determine the start and end column coordinates of each character; and obtaining the position coordinates of the text from the start-row and end-row coordinates and the start and end column coordinates.
Optionally, splitting the multiple questions in the whole-page picture into single questions specifically includes: dividing the questions into regions by multi-column splitting of the text information and according to the position coordinates of the text, the question numbers, and the context.
Optionally, obtaining the feature vector of each single question from its text information specifically includes: inputting the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, where the question vectorization model is a neural-network-based model. The question vectorization model is trained as follows: annotating the question text in each question sample of a first training set of question samples; and using a neural network model to extract two-dimensional feature vectors from the question text in each sample, thereby training the question vectorization model.
Optionally, finding and returning the one or more questions closest to the single question and their analyses specifically includes: searching the question bank, by approximate vector search, for feature vectors that match the feature vector of the single question, that is, finding the feature vector in the question bank closest to the feature vector of the single question.
In a second aspect, an embodiment of the present invention provides a terminal for searching questions in a whole page, the terminal comprising: a receiving unit, configured to receive the whole-page picture uploaded by the user; an identification unit, configured to recognize the text information and position coordinate information of the whole-page picture; a splitting unit, configured to split the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; a feature unit, configured to obtain a feature vector for each single question from its text information; and a searching unit, configured to search the question bank according to the feature vector of each single question, and to find and return the one or more questions closest to that single question together with their analyses.
Optionally, the identification unit is specifically configured to: count the black-white pixel transitions in each pixel row of the picture to determine the start-row and end-row coordinates of each line of characters, and crop out each text-line image; scan each text-line image again vertically, column by column, recording the number of black-white transitions in each column, to determine the start and end column coordinates of each character; and obtain the position coordinates of the text from the start-row and end-row coordinates and the start and end column coordinates.
Optionally, the splitting unit is specifically configured to divide the questions into regions by multi-column splitting of the text information and according to the position coordinates of the text, the question numbers, and the context.
Optionally, the feature unit is specifically configured to input the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, where the question vectorization model is a neural-network-based model. The question vectorization model is trained as follows: annotating the question text in each question sample of a first training set of question samples; and using a neural network model to extract two-dimensional feature vectors from the question text in each sample, thereby training the question vectorization model.
Optionally, the searching unit is specifically configured to search the question bank, by approximate vector search, for feature vectors that match the feature vector of the single question, that is, to find the feature vector in the question bank closest to the feature vector of the single question.
The method and terminal for searching questions in a whole page provided by the embodiments of the invention recognize the text information and position coordinate information of a whole-page picture; split the multiple questions in the picture into single questions according to that information; obtain a feature vector for each single question from its text; and search the question bank with each feature vector to find and return the closest questions and their analyses, so that the answers to multiple questions in one picture are returned at the same time.
Drawings
FIG. 1 is a flowchart of a method for searching questions in a whole page according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of the method for searching questions in a whole page according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a terminal for searching questions in a whole page according to an embodiment of the present invention;
FIG. 4 is an example of a returned question and its analysis according to an embodiment of the present invention;
FIG. 5 is a human-computer interaction diagram according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Photo-based question search is a learning tool that finds answers by photographing questions with a mobile phone: the tool recognizes the picture uploaded by the user and quickly returns the analysis and answer of the corresponding question. Whole-page photo search applies this to an entire page; compared with single-question photo search, it can return the answers to multiple questions in the picture at once.
Whole-page photo search obtains the answers to multiple questions in a picture by photographing the whole page. Existing photo search performs OCR on the uploaded picture and looks up the corresponding question by the recognized text; however, many questions for lower-grade primary pupils are graphic questions, from which only a few characters or none at all are recognized, and existing photo search then struggles to find the correct answer. Whole-page photo search can instead use the context information in the uploaded picture, which solves the problem of questions that cannot be found because they contain too few characters.
FIG. 1 is a flowchart of a method for searching questions in a whole page according to an embodiment of the present invention, and FIG. 2 is a detailed flowchart of the method. The method includes the following steps:
Step 101: Receive the whole-page picture uploaded by the user.
The whole-page picture may be, for example, a page of after-class exercises in a textbook, a page of a test paper, or a page of an exercise book.
Step 102: Recognize the text information and position coordinate information of the whole-page picture.
The text information of the whole-page picture is recognized by OCR (Optical Character Recognition), a technology that optically scans the printed characters of a paper document into an image file and converts the characters in the image into text.
The position coordinates are obtained as follows: count the black-white pixel transitions in each pixel row of the picture to determine the start-row and end-row coordinates of each line of characters, and crop out each text-line image; scan each text-line image again vertically, column by column, recording the number of black-white transitions in each column, to determine the start and end column coordinates of each character; and obtain the position coordinates of the text from the start-row and end-row coordinates and the start and end column coordinates.
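The transition-counting segmentation described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the picture has already been binarized into a list of rows of 0 (white) and 1 (black), treats any pixel row or column with at least one transition as containing ink, and pads each scan line with white so that a stroke touching the edge of a crop still counts. The function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized picture: 1 = black (ink), 0 = white


def transitions(pixels: List[int]) -> int:
    """Count black/white alternations along one scan line, padded with white
    at both ends so a stroke touching the edge is still counted."""
    padded = [0] + list(pixels) + [0]
    return sum(1 for a, b in zip(padded, padded[1:]) if a != b)


def spans(counts: List[int]) -> List[Tuple[int, int]]:
    """Group consecutive indices with at least one transition into
    inclusive (start, end) spans."""
    result, start = [], None
    for i, c in enumerate(counts):
        if c > 0 and start is None:
            start = i
        elif c == 0 and start is not None:
            result.append((start, i - 1))
            start = None
    if start is not None:
        result.append((start, len(counts) - 1))
    return result


def segment(img: Image) -> List[Tuple[int, int, int, int]]:
    """Return one (top, bottom, left, right) box per character, inclusive."""
    boxes = []
    # Pass 1: scan every pixel row to find the start/end rows of each text line.
    for top, bottom in spans([transitions(row) for row in img]):
        line = img[top:bottom + 1]
        # Pass 2: scan the cropped line column by column to find each character.
        columns = [[row[x] for row in line] for x in range(len(line[0]))]
        for left, right in spans([transitions(col) for col in columns]):
            boxes.append((top, bottom, left, right))
    return boxes
```

Characters that touch each other come out as a single box in this sketch; a real system would need further splitting heuristics.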
Step 103: Split the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information.
Optionally, splitting the multiple questions in the whole-page picture into single questions specifically includes: dividing the questions into regions according to the position coordinates of the text, the question numbers, and the context.
Specifically, multi-column splitting normalizes a multi-column layout into a single column according to the page layout information, so that the questions are split within one column. Splitting by position coordinates relies on the fact that text recognized from the same question lies in adjacent positions. Splitting by question number relies on each question usually having a number, so the boundary between adjacent questions can be found from the number information. Splitting by context relies on different questions generally describing different topics, so the page can be divided according to the context information.
For example, a math word problem containing 3 sub-questions may be split into 4 single questions; these 4 pieces must be merged again before being returned to the user, mainly by checking whether their retrieval results are identical. Because some questions are split improperly into several pieces, the pieces must be merged and de-duplicated according to the retrieval results.
Using context information to find more accurate solutions: many questions for lower-grade primary pupils are graphic questions, from which only a few characters or none at all are recognized, so existing photo search struggles to recall the correct answer. Whole-page photo search can search with the context information in the uploaded picture, which solves the problem of answers that cannot be found because a question contains too few characters.
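A minimal sketch of two of the four splitting cues, multi-column normalization and question-number boundaries, assuming each OCR text line carries the coordinates of its top-left corner. The column boundary positions and the question-number pattern (a number followed by `.`, `、`, `．` or `)`) are illustrative assumptions; the position-adjacency and context cues are not shown.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    text: str
    x: float  # left edge of the recognized text line
    y: float  # top edge of the recognized text line


# Assumed numbering style, e.g. "4." or "4、" at the start of a line.
QUESTION_NUMBER = re.compile(r"^\s*\d+\s*[.、．)]")


def to_single_column(lines: List[Line], column_edges: List[float]) -> List[Line]:
    """Normalize a multi-column page into one reading-order column.
    column_edges are hypothetical x-boundaries between columns."""
    def column_of(line: Line) -> int:
        return sum(1 for edge in column_edges if line.x >= edge)
    # Read the left column top to bottom, then the next column, and so on.
    return sorted(lines, key=lambda l: (column_of(l), l.y))


def split_by_number(lines: List[Line]) -> List[List[Line]]:
    """Start a new question at every line that begins with a question number;
    other lines are appended to the current question."""
    questions: List[List[Line]] = []
    for line in lines:
        if QUESTION_NUMBER.match(line.text) or not questions:
            questions.append([line])
        else:
            questions[-1].append(line)
    return questions
```

Normalizing to one column first matters because question numbers only mark boundaries reliably once the lines are in reading order.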
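The merge step can be illustrated with a small sketch that assumes each split fragment has already been searched and labeled with the identifier of its top-ranked question-bank entry (the identifiers here are hypothetical). Consecutive fragments that retrieve the same bank question are treated as pieces of one question and joined back together.

```python
from itertools import groupby
from typing import List, Tuple


def merge_by_retrieval(fragments: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """fragments: (fragment_text, top_bank_question_id) in reading order.
    Consecutive fragments whose top retrieval result is the same bank question
    are merged into one question; returns (merged_text, bank_question_id)."""
    merged = []
    for bank_id, group in groupby(fragments, key=lambda frag: frag[1]):
        merged.append((" ".join(text for text, _ in group), bank_id))
    return merged
```

Only consecutive fragments are merged, so two separate questions on the page that happen to match the same bank entry are not collapsed into one.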
Step 104: Obtain the feature vector of each single question from its text information.
The text information of a single question is input into a pre-trained question vectorization model, a neural-network-based model, to obtain the feature vector of that question.
For example, the text information of a single question is "4. Xiaoming has walked 100 meters, which is exactly half of the way. How many meters is his home from the school? (6 points)". Inputting this text into a pre-trained question vectorization model, such as a sent2vec model, yields the feature vector of the question, which can be represented as [x0, x1, x2, …, xn].
The question vectorization model may be a neural-network-based model, such as a CNN model, and may be trained as follows: annotating the question text in each question sample of a first training set of question samples; and using a neural network model to extract two-dimensional feature vectors from the question text in each sample, thereby training the question vectorization model.
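The source names sent2vec as one possible vectorization model. The sketch below does not reproduce sent2vec or any neural network; it is a deliberately simple stand-in (hashed character bigrams, L2-normalized) that only shows the interface: question text in, a fixed-length vector [x0, x1, …, xn] out. The dimension of 64 and the function name are arbitrary assumptions.

```python
import math
import zlib
from typing import List

DIM = 64  # hypothetical vector length (n + 1)


def vectorize(text: str, dim: int = DIM) -> List[float]:
    """Stand-in for a trained question vectorization model: hash character
    bigrams into a fixed-length vector and L2-normalize it."""
    vec = [0.0] * dim
    chars = "".join(text.split()).lower()  # drop whitespace, fold case
    for i in range(len(chars) - 1):
        # crc32 gives a stable hash across runs, unlike Python's salted hash().
        bucket = zlib.crc32(chars[i:i + 2].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Normalizing to unit length means a plain dot product between two such vectors equals their cosine similarity, which keeps the later search step simple.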
Step 105: Search the question bank according to the feature vector of each single question, and find and return the one or more closest questions and their analyses.
Approximate vector search can be used to find feature vectors in the question bank that match the feature vector of the single question; specifically, the feature vector in the question bank closest to that of the single question is found.
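Question-bank lookup by feature vector can be sketched as a nearest-neighbour search. For clarity this sketch uses exact brute-force cosine similarity over an in-memory dictionary rather than an approximate index; at question-bank scale an approximate nearest-neighbour library would normally take its place. It assumes the vectors are L2-normalized, so the dot product equals cosine similarity; the function name is hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_k(query: Sequence[float], bank: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Exact nearest-neighbour search: return the k bank questions whose
    (unit-length) feature vectors have the highest dot product with query."""
    scores = ((qid, sum(a * b for a, b in zip(query, vec)))
              for qid, vec in bank.items())
    return heapq.nlargest(k, scores, key=lambda item: item[1])
```

Brute force scans every bank entry per query; approximate methods trade a small chance of missing the true nearest neighbour for much faster lookups.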
The uploaded picture may contain regions that are not questions, such as the heading "Unit 3, Lesson 1", a section description such as "Fill-in-the-blank questions", or even unrelated famous quotations. These must be filtered out before results are returned, because they are not what the user wants. In addition, when some uploaded pictures are badly tilted or upside down, adjacent question regions may overlap, which degrades the user experience; de-duplication is used to address this.
Optionally, a mapping is established between each single question and the one or more closest questions and analyses returned for it.
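One possible way to implement the filtering and de-duplication is sketched below under assumptions that the source does not state. Non-question regions (headings, section labels, quotations) are dropped because their best question-bank match scores below a threshold. Overlapping regions are resolved by keeping the higher-scoring region whenever two boxes overlap by more than a given intersection-over-union (IoU). Both thresholds and all names are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


class Region(NamedTuple):
    box: Box
    score: float  # similarity of this region's best question-bank match


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def clean_regions(regions: List[Region], min_score: float = 0.5,
                  max_iou: float = 0.5) -> List[Region]:
    """Drop low-scoring (likely non-question) regions, then keep only the
    higher-scoring region of any pair that overlaps too much."""
    candidates = sorted((r for r in regions if r.score >= min_score),
                        key=lambda r: r.score, reverse=True)
    kept: List[Region] = []
    for r in candidates:
        if all(iou(r.box, k.box) <= max_iou for k in kept):
            kept.append(r)
    return kept
```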
When a user uploads a picture of a whole page (textbook, test paper, or exercise book), the user expects answers to all the questions on the page. The retrieval system performs high-speed sampled OCR on the picture, recalls data whose text is similar to the recognized characters, maps the recalled results onto the uploaded picture, and returns the best-matching data together with the answer information for all corresponding questions, so that the user can check the answers one by one by question number.
Optionally, high-speed sampled OCR may divide the whole-page picture evenly into 100 regions (5 × 20) and recognize only the 1st, 3rd, and 5th regions of the odd-numbered rows, which reduces recognition time by about 70%. Because fewer regions are recognized, less text is produced, which further reduces the computational complexity of retrieval.
To return data quickly, whole-page resources may be processed offline: each page is divided by question position and the corresponding answer is retrieved for each question in advance, so that online whole-page retrieval can directly return the answers for the whole page.
With this approach, questions with few characters, such as graphic questions and calculation questions, can be recalled and analyzed more reliably.
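The sampling scheme can be made concrete. Assuming "5 × 20" means 5 columns by 20 rows, the odd-numbered rows are rows 1, 3, …, 19 (10 rows). Taking regions 1, 3 and 5 in each gives 10 × 3 = 30 of the 100 regions, or 30% of the page area. If recognition time scales roughly with the area recognized, that corresponds to the stated 70% reduction. The sketch below computes the pixel boxes of the sampled regions; the function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def sampled_regions(width: int, height: int, cols: int = 5, rows: int = 20,
                    pick_cols: Sequence[int] = (1, 3, 5)
                    ) -> List[Tuple[int, int, int, int]]:
    """Divide the page into rows x cols equal cells and return the
    (left, top, right, bottom) box of cells 1, 3, 5 (1-based) in every
    odd-numbered row (1-based)."""
    cell_w, cell_h = width / cols, height / rows
    boxes = []
    for r in range(1, rows + 1, 2):  # odd rows: 1, 3, ..., 19
        for c in pick_cols:
            boxes.append((round((c - 1) * cell_w), round((r - 1) * cell_h),
                          round(c * cell_w), round(r * cell_h)))
    return boxes
```

Only these boxes would be passed to OCR, so both recognition time and the amount of text fed to retrieval shrink together.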
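The online recall-and-map step, backed by an offline index in which each whole page has already been divided into questions with answers attached, might look like the sketch below. It is an illustration built on assumptions: the index layout and function names are hypothetical. Page similarity is measured as the fraction of the sampled OCR text's character bigrams found in the indexed page. This is containment rather than symmetric overlap, because sampled OCR covers only part of the page.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical offline index: page_id -> (page text, [(question number, answer), ...]),
# built in advance by dividing each whole page into questions and attaching answers.
PageIndex = Dict[str, Tuple[str, List[Tuple[str, str]]]]


def bigrams(text: str) -> Set[str]:
    """Character bigrams of the text with whitespace removed."""
    s = "".join(text.split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def best_page(ocr_text: str, index: PageIndex, min_score: float = 0.5) -> Optional[str]:
    """Return the indexed page containing the largest share of the OCR bigrams,
    or None if no page reaches min_score."""
    query = bigrams(ocr_text)
    best_id, best_score = None, 0.0
    for page_id, (page_text, _) in index.items():
        score = len(query & bigrams(page_text)) / len(query) if query else 0.0
        if score > best_score:
            best_id, best_score = page_id, score
    return best_id if best_score >= min_score else None


def whole_page_answers(ocr_text: str, index: PageIndex) -> List[Tuple[str, str]]:
    """Map the uploaded page to its best indexed page and return all its answers."""
    page_id = best_page(ocr_text, index)
    return index[page_id][1] if page_id is not None else []
```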
FIG. 3 is a schematic diagram of a terminal for searching questions in a whole page according to an embodiment of the present invention. The terminal includes: a receiving unit 301, configured to receive the whole-page picture uploaded by the user; an identification unit 302, configured to recognize the text information and position coordinate information of the whole-page picture; a splitting unit 303, configured to split the multiple questions in the whole-page picture into single questions according to the text information and the position coordinate information; a feature unit 304, configured to obtain a feature vector for each single question from its text information; and a searching unit 305, configured to search the question bank according to the feature vector of each single question, and to find and return the one or more closest questions and their analyses, as in the example of a returned question and analysis shown in FIG. 4.
Optionally, the identification unit is specifically configured to: count the black-white pixel transitions in each pixel row of the picture to determine the start-row and end-row coordinates of each line of characters, and crop out each text-line image; scan each text-line image again vertically, column by column, recording the number of black-white transitions in each column, to determine the start and end column coordinates of each character; and obtain the position coordinates of the text from the start-row and end-row coordinates and the start and end column coordinates.
Optionally, the splitting unit is specifically configured to divide the questions into regions by multi-column splitting of the text information and according to the position coordinates of the text, the question numbers, and the context.
Optionally, the feature unit is specifically configured to input the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, where the question vectorization model is a neural-network-based model. The question vectorization model is trained as follows: annotating the question text in each question sample of a first training set of question samples; and using a neural network model to extract two-dimensional feature vectors from the question text in each sample, thereby training the question vectorization model.
Optionally, the searching unit is specifically configured to search the question bank, by approximate vector search, for feature vectors that match the feature vector of the single question, that is, to find the feature vector in the question bank closest to the feature vector of the single question.
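The five units of the terminal can be sketched as a small composition in which each unit is an injected function. The unit numbers follow the description; the class name, field names, types and the stub functions used to exercise it are hypothetical. Returning each single question alongside its search results also reflects the optional mapping between single questions and returned questions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class WholePageSearchTerminal:
    """Hypothetical composition of the terminal's units as injected functions."""
    identify: Callable[[bytes], Any]            # identification unit 302: OCR text + coordinates
    split: Callable[[Any], List[str]]           # splitting unit 303: page -> single questions
    featurize: Callable[[str], List[float]]     # feature unit 304: question -> feature vector
    search: Callable[[List[float]], List[Any]]  # searching unit 305: vector -> closest questions

    def receive(self, picture: bytes) -> List[Tuple[str, List[Any]]]:
        """Receiving unit 301: accept an uploaded whole-page picture and return
        each single question paired with its search results."""
        layout = self.identify(picture)
        return [(q, self.search(self.featurize(q))) for q in self.split(layout)]
```

Injecting each unit as a function keeps the pipeline order fixed while letting the OCR, splitting, vectorization and search implementations be swapped independently.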
FIG. 5 is a human-computer interaction diagram according to an embodiment of the present invention. The user photographs the questions to be answered and uploads the picture; the whole-page photo search system processes the picture, returns answers to all the questions in it, and marks the position of each question; the user then taps a question to view its detailed answer, which includes, but is not limited to, an analysis, the solution process, the answer, the knowledge-point classification, and a video explanation.
Therefore, the embodiments of the invention provide a method and a terminal for searching questions in a whole page that can return the answers to multiple questions in one picture at the same time.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing detailed description is provided to illustrate, explain and enable the best mode of the invention, and it should be understood that the above description is only exemplary of the invention, and is not intended to limit the scope of the invention, which is defined by the following claims.

Claims (10)

1. A method for searching questions in a whole page, the method comprising:
receiving a whole-page picture uploaded by a user;
recognizing text information and position coordinate information of the whole-page picture;
splitting a plurality of questions in the whole-page picture into single questions according to the text information and the position coordinate information;
obtaining a feature vector for each single question according to the text information of the single question;
and searching a question bank according to the feature vector of the single question, and finding and returning one or more questions closest to the single question together with their analyses.
2. The method according to claim 1, wherein recognizing the text information and the position coordinate information of the whole-page picture specifically comprises:
counting the black-white pixel transitions in each pixel row of the picture to determine the start-row and end-row coordinates of each line of characters, and cropping out each text-line image;
scanning each text-line image again vertically, column by column, recording the number of black-white transitions in each column, and determining the start and end column coordinates of each character;
and obtaining the position coordinate information of the text from the start-row and end-row coordinates and the start and end column coordinates.
3. The method according to claim 1, wherein splitting the plurality of questions in the whole-page picture into single questions specifically comprises: dividing the plurality of questions into regions by multi-column splitting of the text information and according to the position coordinate information of the text, the question numbers, and the context.
4. The method according to claim 1, wherein obtaining the feature vector of each single question according to its text information specifically comprises:
inputting the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, wherein the question vectorization model is a neural-network-based model;
wherein the question vectorization model is obtained by training through the following steps: labeling each question sample in a first question sample training set with the text information of the question in that sample; and extracting two-dimensional feature vectors from the text content of the question in each question sample using a neural network model, thereby training the question vectorization model.
5. The method according to claim 1, wherein finding and returning the one or more questions closest to the single question together with their analyses specifically comprises:
searching the question bank, by approximate vector search, for feature vectors matching the feature vector of the single question; and finding the feature vector in the question bank that is closest to the feature vector of the single question.
6. A terminal for searching questions in a whole page, the terminal comprising:
a receiving unit, configured to receive a whole-page picture uploaded by a user;
an identification unit, configured to recognize text information and position coordinate information in the whole-page picture;
a splitting unit, configured to split the plurality of questions in the whole-page picture into single questions according to the text information and the position coordinate information;
a feature unit, configured to obtain a feature vector of each single question according to the text information of that single question; and
a search unit, configured to search a question bank according to the feature vector of each single question, and to find and return one or more questions closest to the single question together with their analyses.
7. The terminal according to claim 6, wherein the identification unit is specifically configured to:
analyze the number of black/white pixel alternations in each pixel row of the picture to determine the starting-row and ending-row coordinates of each text line, and cut out each text-line image;
scan each text-line image again, vertically and column by column, record the number of black/white pixel alternations in each column, and determine the starting and ending column coordinates of each character in the picture; and
obtain the position coordinate information of the text information from the starting-row and ending-row coordinates and the starting and ending column coordinates.
8. The terminal according to claim 6, wherein the splitting unit is specifically configured to divide the plurality of questions into separate regions based on multi-column splitting of the text information, the position coordinates of the text, the question numbers, and the contextual relationships.
9. The terminal according to claim 6, wherein the feature unit is specifically configured to:
input the text information of the single question into a pre-trained question vectorization model to obtain the feature vector of the single question, wherein the question vectorization model is a neural-network-based model;
wherein the question vectorization model is obtained by training through the following steps: labeling each question sample in a first question sample training set with the text information of the question in that sample; and extracting two-dimensional feature vectors from the text content of the question in each question sample using a neural network model, thereby training the question vectorization model.
10. The terminal according to claim 6, wherein the search unit is specifically configured to:
search the question bank, by approximate vector search, for feature vectors matching the feature vector of the single question, and find the feature vector in the question bank that is closest to the feature vector of the single question.
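Claims 1 and 6 describe a four-stage pipeline: recognize text with positions, split the page into single questions, vectorize each question, and search the question bank. The following is a minimal Python sketch of that wiring only. Each stage is passed in as a callable because the patent does not fix their implementations; the names `TextBox`, `SearchHit`, `search_whole_page` and the `analysis` field are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TextBox:
    """One recognized text fragment and its bounding box (x0, y0, x1, y1)."""
    text: str
    box: Tuple[float, float, float, float]


@dataclass
class SearchHit:
    """A question-bank entry returned for a query (hypothetical schema)."""
    question_id: str
    analysis: str   # stored worked solution / explanation for that question
    score: float    # similarity between the bank question and the query


Vector = List[float]


def search_whole_page(
    image: object,
    recognize: Callable[[object], List[TextBox]],
    split_questions: Callable[[List[TextBox]], List[str]],
    vectorize: Callable[[str], Vector],
    search_bank: Callable[[Vector, int], List[SearchHit]],
    top_k: int = 3,
) -> List[Tuple[str, List[SearchHit]]]:
    """Run the claimed steps in order; every stage is an injected callable."""
    boxes = recognize(image)              # text information + position coordinates
    questions = split_questions(boxes)    # whole page -> single questions
    results = []
    for question in questions:
        vec = vectorize(question)         # one feature vector per single question
        results.append((question, search_bank(vec, top_k)))  # closest bank entries
    return results
```

Injecting the stages keeps the orchestration testable with toy stubs and lets a real OCR engine, vectorization model, and vector index be swapped in independently, mirroring the separate units of claim 6.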
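Claims 2 and 7 describe a projection-style segmentation. Black/white alternations are counted along each pixel row to find text lines, then along each column of a cut-out line to find character boundaries. A minimal Python sketch under stated assumptions follows. The page is assumed to be already binarized into a list of rows (1 = ink, 0 = background), and each scan is padded with a white pixel so that ink touching the border still registers. The threshold `min_transitions` and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = black (ink), 0 = white; assumed already binarized


def transitions(pixels: List[int]) -> int:
    """Count black/white alternations, padding both ends with white so that
    ink touching the image border is still counted."""
    padded = [0] + list(pixels) + [0]
    return sum(1 for a, b in zip(padded, padded[1:]) if a != b)


def runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of consecutive True values."""
    spans, start = [], None
    for i, f in enumerate(flags):
        if f and start is None:
            start = i
        elif not f and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(flags) - 1))
    return spans


def segment(img: Image, min_transitions: int = 2):
    """Return [(row_start, row_end, [(col_start, col_end), ...]), ...]:
    one entry per text line, with the column span of each character."""
    # Horizontal scan: rows with enough alternations belong to a text line.
    row_flags = [transitions(row) >= min_transitions for row in img]
    result = []
    for r0, r1 in runs(row_flags):
        line = img[r0:r1 + 1]                      # cut out one text-line image
        # Vertical scan of the line image, one column at a time.
        col_flags = [
            transitions([row[c] for row in line]) >= min_transitions
            for c in range(len(line[0]))
        ]
        result.append((r0, r1, runs(col_flags)))
    return result
```

Because padded counts are always even, the default threshold of 2 accepts any row or column that contains ink. Raising it is one possible way to ignore isolated specks, at the cost of also dropping rows or columns crossed by a single stroke.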
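Claims 3 and 8 split the page using the column layout, text positions, question numbers, and context, but give no concrete rule. The Python sketch below shows one possible reading. Each recognized line is assigned to one of `n_cols` equal-width columns, and columns are read top to bottom. A new question starts only at a line whose leading number is the next expected question number, and every other line is attached to the current question as context. The equal-width column model, the numbering regex, and the names `Line` and `split_questions` are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """A recognized text line with its left edge, top edge and right edge."""
    text: str
    x0: float
    y0: float
    x1: float


# Hypothetical pattern: a question starts with "12.", "12、", "12．" or "12)".
NUMBER = re.compile(r"^\s*(\d+)\s*[.、．)]")


def split_questions(lines: List[Line], page_width: float, n_cols: int = 1) -> List[str]:
    col_w = page_width / n_cols

    def column(ln: Line) -> int:
        # Assign by horizontal centre; assumes equal-width columns.
        return min(int(((ln.x0 + ln.x1) / 2) // col_w), n_cols - 1)

    ordered = sorted(lines, key=lambda ln: (column(ln), ln.y0))  # reading order
    questions: List[List[str]] = []
    expected = None  # next question number we expect to see
    for ln in ordered:
        m = NUMBER.match(ln.text)
        n = int(m.group(1)) if m else None
        if n is not None and (expected is None or n == expected):
            questions.append([ln.text.strip()])   # start of a new question
            expected = n + 1
        elif questions:
            questions[-1].append(ln.text.strip())  # continuation of current question
        # Lines before the first numbered question (page titles etc.) are dropped.
    return [" ".join(q) for q in questions]
```

Checking the number against the expected sequence is what keeps a line such as "4.5 kg" from being mistaken for question 4. A real system would likely need a more robust column detector than fixed equal widths.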
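Claims 4 and 9 use a trained neural-network vectorization model whose architecture and training data are not specified. To show only the interface (text in, fixed-length unit vector out, compared by cosine similarity), the Python sketch below substitutes a hashed character-bigram vectorizer. It is explicitly not a neural network and not the patent's model. The dimension `dim`, the n-gram size `n`, the use of `zlib.crc32` as a stable hash, and the function names are all assumptions.

```python
import math
import zlib
from typing import List


def vectorize(text: str, dim: int = 256, n: int = 2) -> List[float]:
    """Map text to an L2-normalised vector of hashed character n-gram counts.
    Placeholder for a trained neural question-vectorization model."""
    s = "".join(text.split())  # drop whitespace so OCR spacing does not matter
    grams = [s[i:i + n] for i in range(max(len(s) - n + 1, 0))] or ([s] if s else [])
    vec = [0.0] * dim
    for g in grams:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        vec[zlib.crc32(g.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))
```

A trained model would be expected to place questions with the same meaning but different wording close together. This placeholder only captures surface character overlap.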
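Claims 5 and 10 only say "approximate vector search." One well-known technique for cosine similarity is random-hyperplane locality-sensitive hashing (LSH). Each vector gets a bit signature recording which side of several random hyperplanes it lies on, and the query is compared exactly only against bank vectors that share its signature in at least one hash table. The Python class below is a hypothetical stand-in for illustration, not the patent's index. The parameters `n_bits`, `n_tables`, and `seed` are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


class HyperplaneLSH:
    """Approximate cosine nearest-neighbour index using random-hyperplane LSH."""

    def __init__(self, dim: int, n_bits: int = 8, n_tables: int = 4, seed: int = 0):
        rng = random.Random(seed)
        # One set of n_bits random Gaussian hyperplanes per hash table.
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
                       for _ in range(n_tables)]
        self.tables: List[Dict[Tuple[int, ...], List[int]]] = [{} for _ in range(n_tables)]
        self.vectors: List[Sequence[float]] = []

    @staticmethod
    def _dot(a, b) -> float:
        return sum(x * y for x, y in zip(a, b))

    def _signature(self, t: int, v) -> Tuple[int, ...]:
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(1 if self._dot(p, v) >= 0 else 0 for p in self.planes[t])

    def add(self, v) -> int:
        """Store a bank vector and return its integer id."""
        idx = len(self.vectors)
        self.vectors.append(v)
        for t in range(len(self.tables)):
            self.tables[t].setdefault(self._signature(t, v), []).append(idx)
        return idx

    def query(self, q, k: int = 1) -> List[Tuple[int, float]]:
        """Return up to k (id, cosine) pairs, best first."""
        cand = set()
        for t in range(len(self.tables)):
            cand.update(self.tables[t].get(self._signature(t, q), []))

        def cos(i: int) -> float:
            v = self.vectors[i]
            nv, nq = self._dot(v, v) ** 0.5, self._dot(q, q) ** 0.5
            return self._dot(v, q) / (nv * nq) if nv and nq else 0.0

        # Exact cosine re-ranking of the (small) candidate set.
        return sorted(((i, cos(i)) for i in cand), key=lambda p: -p[1])[:k]
```

Because it is approximate, the index can miss a true neighbour whose signature differs in every table, so `query` may return fewer than `k` hits. Adding tables or using fewer bits per table raises recall at the cost of larger candidate sets. Production systems typically rely on a dedicated vector-search library instead.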

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910179093.1A CN111666474B (en) 2019-03-08 2019-03-08 Whole page question searching method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910179093.1A CN111666474B (en) 2019-03-08 2019-03-08 Whole page question searching method and terminal

Publications (2)

Publication Number Publication Date
CN111666474A true CN111666474A (en) 2020-09-15
CN111666474B CN111666474B (en) 2023-08-25

Family

ID=72382442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910179093.1A Active CN111666474B (en) 2019-03-08 2019-03-08 Whole page question searching method and terminal

Country Status (1)

Country Link
CN (1) CN111666474B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101308486A (en) * 2008-03-21 2008-11-19 北京印刷学院 Test question automatic generation system and method
US20110196922A1 (en) * 2010-02-08 2011-08-11 At&T Intellectual Property I, L.P. Providing an answer to a question from a social network site using a separate messaging site
CN104715253A (en) * 2015-04-02 2015-06-17 北京贞观雨科技有限公司 Method and server for obtaining test question analysis information
CN106294717A (en) * 2016-08-08 2017-01-04 广东小天才科技有限公司 Question searching method and device based on an intelligent terminal
CN106781784A (en) * 2017-01-04 2017-05-31 王骁乾 Intelligent correction system
CN107798321A (en) * 2017-12-04 2018-03-13 海南云江科技有限公司 Examination paper analysis method and computing device
CN108153915A (en) * 2018-01-29 2018-06-12 赵宇航 Internet-based method for rapidly acquiring educational information
CN108319703A (en) * 2018-02-05 2018-07-24 赵宇航 Internet-based device for rapidly acquiring educational information
CN109271401A (en) * 2018-09-26 2019-01-25 杭州大拿科技股份有限公司 Question searching and correcting method and apparatus, electronic device and storage medium
CN109344914A (en) * 2018-10-31 2019-02-15 焦点科技股份有限公司 End-to-end method and system for recognizing text of arbitrary length

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766269A (en) * 2021-03-04 2021-05-07 深圳康佳电子科技有限公司 Picture text retrieval method, intelligent terminal and storage medium
CN112766269B (en) * 2021-03-04 2024-03-12 深圳康佳电子科技有限公司 Picture text retrieval method, intelligent terminal and storage medium

Also Published As

Publication number Publication date
CN111666474B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
US11508251B2 (en) Method and system for intelligent identification and correction of questions
CN110348400B (en) Score obtaining method and device and electronic equipment
CN109271401B (en) Topic searching and correcting method and device, electronic equipment and storage medium
US11410407B2 (en) Method and device for generating collection of incorrectly-answered questions
CN109817046B (en) Learning auxiliary method based on family education equipment and family education equipment
CN109634961B (en) Test paper sample generation method and device, electronic equipment and storage medium
WO2023273583A1 (en) Exam-marking method and apparatus, electronic device, and storage medium
CN106846961A (en) Processing method and apparatus for electronic test papers
CN111460185A (en) Book searching method, device and system
CN111581367A (en) Method and system for inputting questions
CN113177435A (en) Test paper analysis method and device, storage medium and electronic equipment
CN111126486A (en) Test statistical method, device, equipment and storage medium
CN111610901A (en) AI vision-based English lesson auxiliary teaching method and system
CN112528799B (en) Teaching live broadcast method and device, computer equipment and storage medium
CN111666474B (en) Whole page question searching method and terminal
CN112396897A (en) Teaching system
CN117252259A (en) Deep learning-based natural language understanding method and AI teaching aid system
CN115759293A (en) Model training method, image retrieval device and electronic equipment
CN115393865A (en) Character retrieval method, character retrieval equipment and computer-readable storage medium
CN113569112A (en) Tutoring strategy providing method, system, device and medium based on question
Nguyen et al. Handwriting recognition and automatic scoring for descriptive answers in Japanese language tests
CN113486650A (en) Sentence scanning method and device and storage medium
CN112164262A (en) Intelligent paper reading tutoring system
CN210348859U (en) Examination paper modifying all-in-one machine
CN112307158A (en) Information retrieval method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230612

Address after: 6001, 6th Floor, No.1 Kaifeng Road, Shangdi Information Industry Base, Haidian District, Beijing, 100085

Applicant after: Beijing Baige Feichi Technology Co.,Ltd.

Address before: 100085 4th floor, Huiyuan development building, 1 Kaifa Road, Haidian District, Beijing

Applicant before: XIAOCHUANCHUHAI EDUCATION TECHNOLOGY (BEIJING) CO.,LTD.

GR01 Patent grant