CN110598217A - Identification method and device of point-to-read content, family education machine and storage medium
- Publication number
- CN110598217A (application number CN201910887010.4A)
- Authority
- CN
- China
- Prior art keywords
- page image
- read
- click
- content
- point
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/142—Image acquisition using hand-held instruments; Constructional details of the instruments
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Character Input (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention belongs to the field of home education machines and discloses a method and a device for identifying point-reading content, a home education machine, and a storage medium. The method comprises: acquiring a point-reading page image; identifying an occluded region in the point-reading page image; when the ratio of the occluded region to the point-reading page image is greater than a preset ratio, acquiring the content of the peripheral region of the occluded region; completing the occluded region in the point-reading page image according to the content of the peripheral region; and acquiring the point-reading content indicated by the pointer in the completed point-reading page image. Because an occluded region in the point-reading page image is completed using natural language processing, the invention improves the accuracy of point-reading content identification and addresses the low accuracy, or outright failure, of identification caused by occlusion in the prior art.
Description
Technical Field
The invention belongs to the technical field of home education machines, and in particular relates to a method and a device for identifying point-reading content, a home education machine, and a storage medium.
Background
Children need to read a large number of books as they study and grow up, and to protect their eyesight, parents generally have them read paper books. While reading paper books, children often run into difficulties such as unfamiliar characters or words they do not recognize. They then need help from their parents, but busy parents often cannot help in time; this dampens the children's interest in reading and hinders their learning. Home education machines address this problem well.
A home education machine provides a point-reading function. When this function is used to help a child read a book, the machine first captures an image of the page the user is reading, then recognizes the page image, and finally recognizes the content indicated by the pointer within that image. In practice, a user's non-standard gesture may cause fingers to cover the text of the book, so the machine cannot determine which content the user is pointing at, and recognition accuracy is low.
Disclosure of Invention
The object of the invention is to provide a method and a device for identifying point-reading content, a home education machine, and a storage medium that solve the problem of low point-reading recognition accuracy caused by finger occlusion.
The technical scheme provided by the invention is as follows:
In one aspect, a method for identifying point-reading content is provided, comprising:
acquiring a point-reading page image;
identifying an occluded region in the point-reading page image;
when the ratio of the occluded region to the point-reading page image is greater than a preset ratio, acquiring the content of the peripheral region of the occluded region;
completing the occluded region in the point-reading page image according to the content of the peripheral region;
and acquiring the point-reading content indicated by the pointer in the completed point-reading page image.
Further preferably, identifying the occluded region in the point-reading page image specifically comprises:
binarizing the point-reading page image according to the color difference between the occluder in the image and the point-reading page, to obtain a binarized point-reading page image;
and identifying the occluded region in the point-reading page image according to the binarized point-reading page image.
Further preferably, acquiring the content of the peripheral region of the occluded region when the ratio of the occluded region to the point-reading page image is greater than the preset ratio specifically comprises:
when the ratio is greater than the preset ratio, deleting the pixels of the occluded region from the point-reading page image, and filling the blank area in each line of text within the occluded region with preset characters;
obtaining the sentences containing the preset characters;
and completing the occluded region in the point-reading page image according to the content of the peripheral region specifically comprises:
performing semantic parsing on each such sentence with a natural language processing model to obtain its semantic parsing result;
and completing the occluded region in the point-reading page image according to the semantic parsing results of the sentences.
Further preferably, acquiring the point-reading content indicated by the pointer in the completed point-reading page image specifically comprises:
searching a database for a target stored page that matches the completed point-reading page image;
identifying and locating the pointer in the point-reading page image;
and acquiring, in the target stored page, the point-reading content corresponding to the pointer.
In another aspect, a device for identifying point-reading content is provided, comprising:
an image acquisition module, configured to acquire a point-reading page image;
an identification module, configured to identify an occluded region in the point-reading page image;
a content acquisition module, configured to acquire the content of the peripheral region of the occluded region when the ratio of the occluded region to the point-reading page image is greater than a preset ratio;
a completion module, configured to complete the occluded region in the point-reading page image according to the content of the peripheral region;
the content acquisition module being further configured to acquire the point-reading content indicated by the pointer in the completed point-reading page image.
Further preferably, the identification module comprises:
an image processing unit, configured to binarize the point-reading page image according to the color difference between the occluder in the image and the point-reading page, to obtain a binarized point-reading page image;
and an identification unit, configured to identify the occluded region in the point-reading page image according to the binarized point-reading page image.
Further preferably, the content acquisition module comprises:
a fill-in unit, configured to delete the pixels of the occluded region from the point-reading page image when the ratio of the occluded region to the point-reading page image is greater than the preset ratio, and to fill the blank area in each line of text within the occluded region with preset characters;
a sentence acquisition unit, configured to acquire the sentences containing the preset characters;
and the completion module comprises:
a semantic parsing unit, configured to perform semantic parsing on each sentence with a natural language processing model to obtain its semantic parsing result;
and a completion unit, configured to complete the occluded region in the point-reading page image according to the semantic parsing results of the sentences.
Further preferably, the content acquisition module comprises:
a search unit, configured to search a database for a target stored page that matches the completed point-reading page image;
an identification and locating unit, configured to identify and locate the pointer in the point-reading page image;
and a content acquisition unit, configured to acquire, in the target stored page, the point-reading content corresponding to the pointer.
In still another aspect, a home education machine is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for identifying point-reading content according to any of the above.
In yet another aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method for identifying point-reading content.
Compared with the prior art, the method and device for identifying point-reading content, the home education machine, and the storage medium provided by the invention have the following beneficial effect:
when an occluded region exists in the point-reading page image, it is completed using natural language processing, which improves the accuracy of point-reading content identification and addresses the low accuracy, or failure, of identification caused by occlusion in the prior art.
Drawings
The above features, technical features, advantages, and implementations of the method, device, home education machine, and storage medium for identifying point-reading content are further described below by way of preferred embodiments, with reference to the accompanying drawings.
FIG. 1 is a flowchart of an embodiment of the method for identifying point-reading content according to the present invention;
FIG. 2 is a flowchart of another embodiment of the method for identifying point-reading content according to the present invention;
FIG. 3 is a flowchart of another embodiment of the method for identifying point-reading content according to the present invention;
FIG. 4 is a flowchart of yet another embodiment of the method for identifying point-reading content according to the present invention;
FIG. 5 is a schematic block diagram of an embodiment of the device for identifying point-reading content according to the present invention;
FIG. 6 is a schematic block diagram of an embodiment of the home education machine of the present invention.
Description of the reference numerals
110. image acquisition module; 120. identification module; 121. image processing unit; 122. identification unit; 130. content acquisition module; 131. fill-in unit; 132. sentence acquisition unit; 133. search unit; 134. identification and locating unit; 135. content acquisition unit; 140. completion module; 141. semantic parsing unit; 142. completion unit; 200. home education machine; 210. memory; 220. processor.
Detailed Description
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the following description refers to the accompanying drawings. Obviously, the drawings described below are only some examples of the invention; a person skilled in the art can derive other drawings and embodiments from them without inventive effort.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
For simplicity, the drawings show only the parts relevant to the present invention schematically and do not represent the actual structure of a product. In addition, to keep the drawings concise and easy to understand, where several components in a drawing have the same structure or function, only one of them is drawn schematically or labeled. In this document, "one" means not only "exactly one" but also "more than one".
The present invention provides an embodiment of a method for identifying point-reading content. As shown in FIG. 1, the method comprises:
S100, acquiring a point-reading page image;
Specifically, when the user is studying, the front camera of the home education machine can be turned on to enter point-reading mode. When the user points at a book and the pointing finger has become stable, the camera photographs the page the finger is pointing at; this image is the point-reading page image.
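The patent does not say how the machine decides that the pointing finger is stable. A minimal sketch, assuming some upstream detector (not described here) reports one fingertip coordinate per camera frame, or None when no fingertip is found; `is_finger_stable`, `window`, and `max_shift_px` are hypothetical names and values chosen for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def is_finger_stable(tips: Sequence[Optional[Point]],
                     window: int = 5,
                     max_shift_px: float = 4.0) -> bool:
    """Return True when the last `window` fingertip detections all lie
    within `max_shift_px` pixels of the first detection in that window."""
    if len(tips) < window:
        return False  # not enough frames observed yet
    recent = tips[-window:]
    if any(p is None for p in recent):
        return False  # fingertip was lost in at least one recent frame
    x0, y0 = recent[0]
    return all(math.hypot(x - x0, y - y0) <= max_shift_px for x, y in recent)
```

The photo would be taken on the first frame for which this returns True; a larger `window` trades responsiveness for fewer blurred captures.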
S200, identifying an occluded region in the point-reading page image;
Specifically, when the user points at a book with a finger, the finger covers part of the page, so the captured point-reading page image contains an occluded region, which must be identified first.
The occluded region can be identified with a trained image recognition model: training samples are collected first, and the constructed image recognition model is then trained on them. The training samples include at least page images in which the page is occluded by objects such as fingers or pens.
S300, when the ratio of the occluded region to the point-reading page image is greater than a preset ratio, acquiring the content of the peripheral region of the occluded region;
Specifically, after the occluded region is identified, its ratio to the point-reading page image, that is, the percentage of the image it covers, is calculated. If the ratio is smaller than the preset ratio, the occlusion is caused only by the pointing finger or another pointer; it is small and does not affect identification of the point-reading content. In that case, a matching target stored page can be searched for in the database directly from the point-reading page image to determine which page of the book the image shows; the position of the point-reading region indicated by the pointer is then identified in the image, and finally the corresponding point-reading content is retrieved from the target stored page according to that position.
The matching target stored page can be searched for according to the text in the point-reading page image; for example, the database can be searched directly for a stored page whose character repetition rate with the point-reading page image exceeds a preset threshold. The preset threshold may be set according to the preset ratio: when the preset ratio is increased, the preset threshold should be lowered accordingly, but to keep the search accurate it must not be set too low.
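The text-based search can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the "character repetition rate" is read here as the share of the captured page's characters (counted with multiplicity) that also occur in a stored page, and `match_threshold` is a hypothetical linear rule that lowers the threshold as the preset ratio grows while keeping a floor, as the paragraph requires. Working on characters rather than words suits Chinese text, which has no spaces between words.

```python
from collections import Counter
from typing import Dict, Optional


def repetition_rate(page_text: str, stored_text: str) -> float:
    """Share of the captured page's non-space characters (with multiplicity)
    that also occur in the stored page."""
    captured = Counter(ch for ch in page_text if not ch.isspace())
    if not captured:
        return 0.0
    stored = Counter(ch for ch in stored_text if not ch.isspace())
    overlap = sum((captured & stored).values())  # & keeps the minimum counts
    return overlap / sum(captured.values())


def match_threshold(preset_ratio: float, base: float = 0.8,
                    slope: float = 0.5, floor: float = 0.5) -> float:
    """Hypothetical rule: the larger the preset ratio, the lower the
    threshold, but never below `floor`."""
    return max(floor, base - slope * preset_ratio)


def find_target_page(page_text: str, database: Dict[str, str],
                     preset_ratio: float) -> Optional[str]:
    """Return the id of the stored page with the highest repetition rate
    above the threshold, or None if no page qualifies."""
    best_id: Optional[str] = None
    best_rate = match_threshold(preset_ratio)
    for page_id, stored_text in database.items():
        rate = repetition_rate(page_text, stored_text)
        if rate > best_rate:
            best_id, best_rate = page_id, rate
    return best_id
```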
If the ratio of the occluded region to the point-reading page image is greater than the preset ratio, the image contains occluders other than the pointing finger. For example, the user may rest several fingers or a clenched fist on the page while pointing, so the captured image contains a large occluded area. If such an image were matched against the database directly, several stored pages might match and the result would be inaccurate, so the occluded region must be completed first. To complete it, the content of its peripheral region is acquired first. Note that if the occluded region is very large, it cannot be completed; in that case the user is prompted and the point-reading page image is captured again.
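The three-way decision described in S300 can be summarized in a few lines. The concrete values of `preset_ratio` and `max_ratio` (the size beyond which completion is abandoned and the user is asked to retake the photo) are hypothetical; the patent gives no numbers. A ratio exactly equal to the preset ratio is not covered by the text and is treated here as small occlusion.

```python
from enum import Enum


class Action(Enum):
    MATCH_DIRECTLY = "match the page image against the database directly"
    COMPLETE_THEN_MATCH = "complete the occluded region first"
    RECAPTURE = "prompt the user to capture the page again"


def decide(occluded_px: int, total_px: int,
           preset_ratio: float = 0.05, max_ratio: float = 0.4) -> Action:
    """Choose the processing path from the occluded share of the image.
    Both thresholds are illustrative assumptions."""
    if total_px <= 0:
        raise ValueError("total_px must be positive")
    ratio = occluded_px / total_px
    if ratio > max_ratio:
        return Action.RECAPTURE        # too much hidden to complete reliably
    if ratio > preset_ratio:
        return Action.COMPLETE_THEN_MATCH
    return Action.MATCH_DIRECTLY       # only the pointing finger is in view
```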
S400, completing the occluded region in the point-reading page image according to the content of the peripheral region;
Specifically, after the content of the peripheral region is acquired, it is processed by a natural language processing model, which uses the relationship between the preceding and following context to complete the occluded region.
The natural language processing model is obtained by training on samples from a corpus. The corpus can be built by digitizing paper texts, or by crawling data from the web. The corpus is then preprocessed, for example by data cleaning, word segmentation, part-of-speech tagging, and stop-word removal. Data cleaning removes unwanted noise, for example advertisements, tags, and comments in crawled web page content. Common cleaning methods include manual de-duplication, alignment, deletion, and labeling, as well as regular-expression matching, extraction by part of speech and named entity, and batch processing with scripts or code.
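A minimal data-cleaning pass over crawled pages might look like this. It is a sketch only: the regular expressions strip leftover HTML tags and URLs, `ad_markers` is a hypothetical list of strings that flag advertisement text, and de-duplication is exact-match only.

```python
import re
from typing import Iterable, List, Sequence

TAG_RE = re.compile(r"<[^>]+>")       # leftover HTML tags
URL_RE = re.compile(r"https?://\S+")  # links, often from ads or navigation
SPACE_RE = re.compile(r"\s+")


def clean_corpus(raw_docs: Iterable[str],
                 ad_markers: Sequence[str] = ("Advertisement", "Sponsored")) -> List[str]:
    """Strip tags and URLs, normalize whitespace, and drop empty,
    ad-like, or exactly duplicated documents."""
    seen = set()
    cleaned = []
    for doc in raw_docs:
        text = TAG_RE.sub(" ", doc)
        text = URL_RE.sub(" ", text)
        text = SPACE_RE.sub(" ", text).strip()
        if not text or any(marker in text for marker in ad_markers):
            continue  # empty after cleaning, or flagged as an advertisement
        if text in seen:
            continue  # exact duplicate of an earlier document
        seen.add(text)
        cleaned.append(text)
    return cleaned
```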
Word segmentation splits a sentence or paragraph into individual words or phrases. Part-of-speech tagging labels each word with its part of speech, such as adjective, verb, or noun. Occluded words are then marked in the preprocessed corpus to form training samples, the natural language processing model is trained on these samples, and the trained model completes the occluded region according to the content of the peripheral region.
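One common way to turn segmented sentences into "occluded word" training samples is to hide a contiguous span of words and keep the hidden words as the prediction target. The sketch below assumes the corpus is already segmented into word lists; the `[MASK]` token, the span length, and the function name are illustrative choices, not taken from the patent.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"  # hypothetical placeholder token for an occluded word


def make_masked_samples(sentences: List[List[str]], max_span: int = 2,
                        seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Mask one random contiguous span of words per sentence and return
    (masked sentence, hidden words) pairs for training."""
    rng = random.Random(seed)
    samples = []
    for words in sentences:
        if len(words) < 2:
            continue  # keep at least one visible word as context
        span = rng.randint(1, min(max_span, len(words) - 1))
        start = rng.randint(0, len(words) - span)
        masked = words[:start] + [MASK] * span + words[start + span:]
        samples.append((masked, words[start:start + span]))
    return samples
```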
S500, acquiring the point-reading content indicated by the pointer in the completed point-reading page image.
Specifically, once the occluded region has been completed, the user's point-reading content can be acquired accurately from the completed point-reading page image.
After the point-reading content is obtained, the corresponding content is retrieved from the database according to the point-reading content combined with the user's voice input, and is then played back by voice or displayed. For example, if the user points at a question and says "how to solve", the solution process for that question is retrieved from the database and shown to the user.
In this embodiment, when an occluded region exists in the point-reading page image, it is completed using natural language processing, which improves the accuracy of point-reading content identification and addresses the low accuracy, or failure, of identification caused by occlusion in the prior art.
The present invention provides another embodiment of the method for identifying point-reading content. As shown in FIG. 2, the method comprises:
S100, acquiring a point-reading page image;
S210, binarizing the point-reading page image according to the color difference between the occluder in the image and the point-reading page, to obtain a binarized point-reading page image;
Specifically, the occluded region can be identified from the color difference between the occluder and the page. In the point-reading page image, the page is treated as the background and the occluder as the target. Using the difference between target and background, a suitable threshold is chosen to decide whether each pixel belongs to the target or the background, so that every pixel is set to one of two levels; this binarization yields the binarized point-reading page image.
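As a sketch of the color-difference binarization, assume an RGB image given as rows of `(r, g, b)` tuples. White paper and black or grey ink have nearly equal red and blue components, whereas skin is distinctly redder, so `r - b` is used here as the color-difference score; both that score and the default `threshold` of 30 are assumptions, not values from the patent. Occluder pixels are marked 1 and page pixels 0.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def binarize_occluder(image: List[List[RGB]], threshold: int = 30) -> List[List[int]]:
    """Return a binary mask: 1 for likely occluder (skin) pixels, 0 for page
    pixels. Assumes paper and ink are roughly neutral (r close to b)."""
    return [[1 if r - b > threshold else 0 for (r, _g, b) in row] for row in image]
```

A fixed threshold keeps the example short; an automatically chosen threshold (for example Otsu's method applied to the score) would adapt better to lighting, at the cost of more code.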
S220, identifying the occluded region in the point-reading page image according to the binarized point-reading page image;
Specifically, a binarized image has a clear black-and-white appearance: typically white areas represent the background and black areas the target, so the two are easy to tell apart. After the binarized point-reading page image is obtained, the contour of the occluded region can therefore be extracted from it, and the area enclosed by that contour is the occluded region.
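The patent extracts the contour of the occluded region from the binarized image. A simpler stand-in, shown below, takes the largest 4-connected group of occluder pixels (mask value 1) and reports its pixel area and bounding box; this is a swapped-in technique, not the contour method itself. The area divided by the image size (rows times columns) gives the ratio compared against the preset ratio in S300.

```python
from collections import deque
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def largest_occluded_region(mask: List[List[int]]) -> Tuple[int, Optional[Box]]:
    """Return (area, bounding box) of the largest 4-connected group of 1s,
    or (0, None) if the mask contains no occluder pixels."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    best_area, best_box = 0, None
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            area, top, left, bottom, right = 0, sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area > best_area:
                best_area, best_box = area, (top, left, bottom, right)
    return best_area, best_box
```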
S300, when the ratio of the occluded region to the point-reading page image is greater than a preset ratio, acquiring the content of the peripheral region of the occluded region;
Specifically, after the occluded region is identified, its ratio to the point-reading page image, that is, the percentage of the image it covers, is calculated. If the ratio is smaller than the preset ratio, the occlusion is caused only by the pointing finger or another pointer; it is small and does not affect identification, so the user's point-reading content can be identified directly from the point-reading page image.
If the ratio of the occluded region to the point-reading page image is greater than the preset ratio, the image contains occluders other than the pointing finger. For example, the user may rest several fingers or a clenched fist on the page while pointing, so the captured image contains a large occluded area. Matching such an image against the database directly would be inaccurate, so the occluded region must be completed first, which in turn requires acquiring the content of its peripheral region. Note that if the occluded region is very large, it cannot be completed; in that case the user is prompted and the point-reading page image is captured again.
S400, completing the occluded region in the point-reading page image according to the content of the peripheral region;
Specifically, after the content of the peripheral region is acquired, it is processed by a natural language processing model, which uses the relationship between the preceding and following context to complete the occluded region.
S500, acquiring the point-reading content indicated by the pointer in the completed point-reading page image.
The present invention provides another embodiment of the method for identifying point-reading content. As shown in FIG. 3, the method comprises:
S100, acquiring a point-reading page image;
S200, identifying an occluded region in the point-reading page image;
S310, when the ratio of the occluded region to the point-reading page image is greater than a preset ratio, deleting the pixels of the occluded region from the point-reading page image and filling the blank area in each line of text within the occluded region with preset characters;
Specifically, once it is determined that the occluded region needs to be completed, all pixels of the occluded region are deleted from the point-reading page image, and the blank part of each text line within the occluded region is filled with preset characters. The blank space between text lines is not filled, which keeps the lines distinguishable and makes it easy to extract the sentences containing the preset characters later. The preset characters may be underscores, wavy lines, or other symbols.
For example, if a page contains 15 lines of text and the occluded region covers part of the fourth to eighth lines, then after the occluded region is deleted from the page image, the characters of the fourth to eighth lines that were covered by the occluder are replaced with underscores.
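The patent performs this fill in the image itself. The equivalent operation at the text level can be sketched as follows, assuming OCR has already turned each visible line into a string and the occluder's horizontal extent has been mapped to a `[start, end)` range of character positions on each affected line (both mappings are hypothetical here). Lines outside the occluded region, and the gaps between lines, are left unchanged.

```python
from typing import Dict, List, Tuple

PLACEHOLDER = "_"  # preset character; underscores as in the example above


def fill_occluded(lines: List[str], occluded: Dict[int, Tuple[int, int]]) -> List[str]:
    """Replace the occluded character span [start, end) of each affected
    line (keyed by line index) with placeholder characters."""
    filled = []
    for i, line in enumerate(lines):
        if i in occluded:
            start, end = occluded[i]
            end = min(end, len(line))  # clamp spans that overrun the line
            line = line[:start] + PLACEHOLDER * max(0, end - start) + line[end:]
        filled.append(line)
    return filled
```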
S320, obtaining the sentences containing the preset characters;
Specifically, after the blank area of each line corresponding to the occluded region has been filled with preset characters, each sentence containing a preset character is extracted from the point-reading page image. Sentences can be delimited by punctuation: generally the preceding full stop serves as the start point and the next full stop as the end point, and the text between them is one sentence. All single sentences containing the filled preset characters are extracted from the point-reading page image, so each extracted sentence contains at least one preset character.
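Sentence extraction can be sketched with a zero-width split after each sentence-ending mark. The set of punctuation marks and the `joiner` parameter (an empty string for Chinese, a space for English) are assumptions; the underscore placeholder matches the underscores used in the example in S310.

```python
import re
from typing import List

PLACEHOLDER = "_"
# Zero-width split point right after a sentence-ending mark (Chinese or English).
SENTENCE_END = re.compile(r"(?<=[。！？.!?])")


def sentences_with_placeholder(lines: List[str], joiner: str = "") -> List[str]:
    """Join the page's text lines, split the text into sentences at
    sentence-ending punctuation, and keep only sentences with a placeholder."""
    text = joiner.join(line.strip() for line in lines)
    sentences = (s.strip() for s in SENTENCE_END.split(text))
    return [s for s in sentences if PLACEHOLDER in s]
```

Splitting on an empty (lookbehind-only) match requires Python 3.7 or later.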
S410, performing semantic parsing on each sentence with a natural language processing model to obtain its semantic parsing result;
Specifically, each extracted sentence is fed into the trained natural language processing model, which performs syntactic analysis: it analyzes the sentence structure and the phrases in the sentence, finds the relationships among the words and phrases and their roles within the sentence, and infers the meaning of each sentence.
S420, completing the occluded region in the point-reading page image according to the semantic parsing results of the sentences;
Specifically, once the model has inferred the meaning of each sentence, the content of the occluded region can be inferred from it and filled in.
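The patent uses a trained natural language processing model for this step. To show the idea of inferring hidden words from their context, the sketch below substitutes a much simpler technique, bigram counts from a small corpus: each masked position is filled with the candidate word that most often follows its left neighbour and precedes its right neighbour. The `[MASK]` token, the candidate list, and the corpus are hypothetical inputs.

```python
from collections import Counter
from typing import Iterable, List, Sequence

MASK = "[MASK]"


def train_bigrams(corpus: Iterable[List[str]]) -> Counter:
    """Count adjacent word pairs in a segmented corpus."""
    counts: Counter = Counter()
    for words in corpus:
        counts.update(zip(words, words[1:]))
    return counts


def complete(tokens: List[str], candidates: Sequence[str], bigrams: Counter) -> List[str]:
    """Fill masks left to right with the candidate whose bigrams with the
    neighbouring words were seen most often (ties go to the earlier candidate)."""
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok != MASK:
            continue
        left = out[i - 1] if i > 0 else None
        right = out[i + 1] if i + 1 < len(out) else None
        out[i] = max(candidates, key=lambda c: bigrams[(left, c)] + bigrams[(c, right)])
    return out
```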
S500, acquiring the point-reading content indicated by the pointer in the completed point-reading page image.
The present invention provides still another embodiment of a method for identifying read-on-demand content, as shown in fig. 4, the method for identifying read-on-demand content includes:
s100, acquiring a point-reading page image;
s200, identifying a shielding area in the click-to-read page image;
s300, when the ratio of the shielding area to the page image is larger than a preset ratio, acquiring the content of the peripheral area of the shielding area;
s400, completing the shielding area in the point reading page image according to the content of the peripheral area;
s510, searching a target storage page matched with the completed click-to-read page image in a database;
s520, identifying and positioning an indicator in the click-to-read page image;
s530, according to the indication body, point reading content corresponding to the indication body is obtained in the target storage page.
Specifically, after the content of the occluded region has been completed, the database is searched for a target stored page that matches the completed click-to-read page image. Matching can be based on the characters in the completed image; for example, the database can be searched directly for a stored page whose character repetition rate with the completed click-to-read page image is greater than a preset threshold.
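The text does not define the "character repetition rate" precisely. One plausible reading, used in the sketch below as an assumption, is the fraction of non-space characters of the completed page text that also occur (counted with multiplicity) in a stored page; the page with the highest rate above the preset threshold is returned. The `pages` dictionary, the 0.8 threshold and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def repetition_rate(query: str, stored: str) -> float:
    """Share of the query's non-space characters also found in the stored page,
    counting each character at most as often as it appears in the stored page."""
    q = Counter(ch for ch in query if not ch.isspace())
    s = Counter(ch for ch in stored if not ch.isspace())
    total = sum(q.values())
    if total == 0:
        return 0.0
    return sum((q & s).values()) / total  # & keeps the per-character minimum


def find_target_page(query: str, pages: dict[str, str], threshold: float = 0.8) -> Optional[str]:
    """Return the id of the best stored page whose rate exceeds the threshold."""
    best_id, best_rate = None, threshold
    for page_id, text in pages.items():
        rate = repetition_rate(query, text)
        if rate > best_rate:  # strictly greater, as in the text
            best_id, best_rate = page_id, rate
    return best_id


pages = {"p12": "the cat sat on the mat", "p13": "dogs bark at night"}
print(find_target_page("the cat sat on the mat", pages))  # prints p12
```

Character counts ignore word order; comparing character n-grams instead would make pages with similar vocabularies easier to tell apart, at some extra cost.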
The indicator in the click-to-read page image is then identified and located. The indicator is the finger, pen or other tool the user points with, and the click-to-read content corresponding to the indicator is acquired from the target stored page according to the indicator's position in the click-to-read page image.
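Locating the indicator and reading the content it points to can be sketched as follows. This illustration rests on assumptions the text does not state: the indicator is available as a binary mask (1 = indicator pixel), it enters the page from the bottom so its tip lies in the topmost foreground row, and the stored page's word boxes have already been registered to the image coordinates. `Word`, `indicator_tip` and `word_at_indicator` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    x0: int  # left
    y0: int  # top (y grows downwards)
    x1: int  # right
    y1: int  # bottom


def indicator_tip(mask: list[list[int]]) -> Optional[tuple[int, int]]:
    """Return (x, y) at the middle of the topmost row containing indicator pixels."""
    for y, row in enumerate(mask):
        xs = [x for x, v in enumerate(row) if v]
        if xs:
            return sum(xs) // len(xs), y
    return None


def word_at_indicator(words: list[Word], tip: tuple[int, int]) -> Optional[Word]:
    """Pick the word nearest the tip among words that start at or above it."""
    x, y = tip
    candidates = [w for w in words if w.y0 <= y]
    if not candidates:
        return None

    def squared_distance(w: Word) -> int:
        # Distance from the tip to the closest point of the word's box.
        cx = min(max(x, w.x0), w.x1)
        cy = min(max(y, w.y0), w.y1)
        return (x - cx) ** 2 + (y - cy) ** 2

    return min(candidates, key=squared_distance)
```

For example, with a finger mask whose top row spans columns 3 to 4 at row 2, `indicator_tip` returns `(3, 2)`, and the word box nearest that point on or above row 2 is selected.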
After the content the user pointed to has been acquired, the corresponding content is retrieved from the database by combining that content with the user's voice input, and is then played back by voice or displayed. For example, if the user points at a question and says "how do I solve this", the solution process for that question is retrieved from the database and presented to the user.
It should be understood that the step numbers in the foregoing embodiments do not imply an execution order; the order of execution is determined by the functions and internal logic of the steps and places no limitation on the implementation of the embodiments of the present invention.
The present invention also provides an embodiment of an apparatus for identifying click-to-read content. As shown in Fig. 5, the apparatus includes:
an image acquisition module 110, configured to acquire a click-to-read page image;
Specifically, when the user is studying, the front camera of the family education machine can be turned on and the click-to-read mode entered. When the user points at a book and the pointing finger has become steady, the camera photographs the page the finger is pointing at; this image is the click-to-read page image.
an identification module 120, configured to identify an occluded region in the click-to-read page image;
Specifically, when the user points at a book with a finger, the finger itself hides part of the page, so the photographed click-to-read page image contains an occluded region. This occluded region therefore needs to be identified first.
The occluded region can be identified with a trained image recognition model: training samples are collected first, and the constructed image recognition model is then trained on them. The training samples include at least page images in which the page is partially covered by objects such as fingers and pens.
a content acquisition module 130, configured to acquire the content of the peripheral area of the occluded region when the ratio of the occluded region to the click-to-read page image is greater than a preset ratio;
Specifically, after the occluded region has been identified, its ratio to the click-to-read page image, that is, the percentage of the image it covers, is calculated. If the ratio is smaller than the preset ratio, the occlusion is caused only by the pointing finger or another indicator; the occluded region is small and does not affect identification of the click-to-read content. In that case, a matching target stored page can be searched for in the database directly from the click-to-read page image, to determine which page of the book the image corresponds to. The position of the click-to-read region pointed to by the indicator is then identified in the click-to-read page image, and finally the corresponding click-to-read content is acquired from the target stored page according to that position.
The database search for the matching target stored page can be based on the characters in the page image; for example, the database can be searched directly for a stored page whose text repetition rate with the click-to-read page image is greater than a preset threshold. The preset threshold can be set according to the preset ratio: when the preset ratio is increased, the preset threshold should be lowered accordingly, but not too far, or the accuracy of the search will suffer.
If the ratio of the occluded region to the click-to-read page image is greater than the preset ratio, the image contains occluding objects other than the pointing finger. For example, the user may rest several fingers or a clenched fist on the page while pointing, so the photographed click-to-read page image contains a larger occluded area. When the occluded region is this large, matching the click-to-read page image directly against the database may return several stored pages and make the match inaccurate, so the occluded region must be completed first. To complete it, the content of the peripheral area of the occluded region is acquired first. Note that if the occluded region is very large it cannot be completed; in that case the user is prompted and the click-to-read page image is captured again.
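The three cases described above (small occlusion: match directly; larger occlusion: complete first; very large occlusion: ask the user to capture again) can be summarized in a small decision function. The 5% preset ratio and the 40% upper limit below are placeholder values chosen for illustration, since the text gives no concrete numbers, and the function names are hypothetical. A ratio exactly equal to the preset ratio is treated as small here, because the text only triggers completion when the ratio is greater.

```python
def occlusion_ratio(mask: list[list[int]]) -> float:
    """Fraction of pixels in a binary mask that belong to the occluded region."""
    total = sum(len(row) for row in mask)
    occluded = sum(1 for row in mask for v in row if v)
    return occluded / total if total else 0.0


def choose_strategy(ratio: float, preset_ratio: float = 0.05, max_ratio: float = 0.40) -> str:
    """Map an occlusion ratio to the handling described for the apparatus.

    preset_ratio and max_ratio are assumed placeholder values.
    """
    if ratio > max_ratio:
        return "recapture"            # too large to complete: prompt the user
    if ratio > preset_ratio:
        return "complete_then_match"  # extra occluders: complete, then match
    return "match_directly"           # only the pointing indicator: match as-is
```

Keeping the thresholds as parameters makes it easy to tune them together with the page-matching threshold, which the text says should be lowered as the preset ratio rises.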
a completion module 140, configured to complete the occluded region in the click-to-read page image according to the content of the peripheral area;
Specifically, after the content of the peripheral area has been acquired, the natural language processing model processes it, using the relationship between the text before and after the occluded region, to complete the occluded region.
The natural language processing model is trained on corpus samples. The corpus can be obtained by digitizing paper texts or by crawling web data with a crawler. The corpus is then preprocessed, for example by data cleaning, word segmentation, part-of-speech tagging and stop-word removal. Data cleaning removes unwanted noise, such as advertisements, tags and annotations in crawled web pages. Common cleaning methods include manual de-duplication, alignment, deletion and labeling, as well as regular-expression matching, extraction by part of speech and named entity, and batch processing with scripts or code.
Word segmentation splits a sentence or paragraph into individual words or phrases. Part-of-speech tagging labels each word with its part of speech, such as adjective, verb or noun. Words in the preprocessed corpus are then masked to form training samples, the natural language processing model is trained on these samples, and the trained model completes the occluded region from the content of the peripheral area.
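A minimal Python sketch of the corpus preparation described above follows: a regular-expression cleaning pass, then masking one word per sentence to produce (masked sentence, target word) training pairs. Whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated segmentation tool), part-of-speech tagging and stop-word removal are omitted, and the noise patterns, mask token, random masking policy and function names are all assumptions.

```python
import random
import re

# Assumed noise patterns for crawled pages: HTML tags and URLs.
_NOISE = re.compile(r"<[^>]+>|https?://\S+")


def clean(text: str) -> str:
    """Remove the assumed noise patterns and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", _NOISE.sub(" ", text)).strip()


def make_masked_samples(
    sentences: list[str], mask_token: str = "___", seed: int = 0
) -> list[tuple[str, str]]:
    """Mask one randomly chosen word per sentence; return (masked, target) pairs."""
    rng = random.Random(seed)  # seeded so the samples are reproducible
    samples = []
    for sentence in sentences:
        words = clean(sentence).split()  # whitespace split stands in for segmentation
        if len(words) < 2:
            continue  # too little context to learn from
        i = rng.randrange(len(words))
        masked = words[:i] + [mask_token] + words[i + 1:]
        samples.append((" ".join(masked), words[i]))
    return samples
```

Each pair teaches the model to recover a hidden word from its surroundings, which is the same task it faces when a finger hides part of a line.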
The content acquisition module 130 is further configured to acquire the click-to-read content pointed to by the indicator in the completed click-to-read page image.
Specifically, once the occluded region in the click-to-read page image has been completed, the user's click-to-read content can be acquired accurately from the completed image.
After the content the user pointed to has been acquired, the corresponding content is retrieved from the database by combining that content with the user's voice input, and is then played back by voice or displayed. For example, if the user points at a question and says "how do I solve this", the solution process for that question is retrieved from the database and presented to the user.
In this embodiment, when the click-to-read page image contains an occluded region, the region is completed using natural language processing technology. This improves the accuracy of click-to-read content identification and solves the prior-art problem that occlusion makes identification inaccurate or impossible.
Preferably, the identification module 120 includes:
an image processing unit 121, configured to binarize the click-to-read page image according to the color difference between the occluding object in the click-to-read page image and the click-to-read page, to obtain a binarized click-to-read page image;
Specifically, the occluded region can be identified from the color difference between the occluding object and the page. In the click-to-read page image, the page is treated as the background and the occluding object as the target. Using the difference between target and background, the pixels are assigned to two different levels: a suitable threshold is chosen to decide whether each pixel belongs to the target or the background, and the click-to-read page image is binarized accordingly to obtain a binarized click-to-read page image.
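The text only says that a suitable threshold separates occluder from page by color difference. The sketch below makes two assumptions explicit: it uses a simple "redness" feature (red minus blue, clamped to 0 to 255), which tends to be high for skin and near zero for both white paper and black print, and it picks the threshold with Otsu's method, which maximizes the between-class variance of the feature histogram. Images are plain nested lists of (R, G, B) tuples so the example has no dependencies; the function names are hypothetical.

```python
def redness(rgb_image: list[list[tuple[int, int, int]]]) -> list[list[int]]:
    """Assumed occluder feature: red minus blue, clamped to 0..255.
    Skin scores high; white paper and black print both score near zero."""
    return [[max(0, min(255, r - b)) for (r, _g, b) in row] for row in rgb_image]


def otsu_threshold(feature: list[list[int]]) -> int:
    """Otsu's method: the threshold that maximizes between-class variance."""
    hist = [0] * 256
    for row in feature:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # everything is background from here on
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(rgb_image: list[list[tuple[int, int, int]]]) -> list[list[int]]:
    """Return a mask with 1 for occluder pixels (the text's black target area)."""
    feature = redness(rgb_image)
    t = otsu_threshold(feature)
    return [[1 if v > t else 0 for v in row] for row in feature]
```

With OpenCV available, `cv2.threshold` with the `cv2.THRESH_OTSU` flag performs the same threshold selection on a single-channel 8-bit image.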
and an identification unit 122, configured to identify the occluded region in the click-to-read page image according to the binarized click-to-read page image.
Specifically, a binary image shows a clear black-and-white contrast: white areas generally represent the background and black areas the target, so the two are easy to tell apart. Once the binarized click-to-read page image has been obtained, the contour of the occluded region can be extracted from it, and the area enclosed by that contour is the occluded region.
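The text extracts the contour of the occluded region from the binarized image. A dependency-free approximation, swapped in here for contour tracing, labels 4-connected foreground components with a breadth-first search and keeps the largest one as the occluded region, together with its bounding box. Treating the largest component as the occluder, and the function names, are assumptions.

```python
from collections import deque
from typing import Optional


def largest_region(mask: list[list[int]]) -> list[tuple[int, int]]:
    """Label 4-connected foreground components by BFS; return the largest one's pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best: list[tuple[int, int]] = []
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            region = []
            while queue:
                x, y = queue.popleft()
                region.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(region) > len(best):
                best = region
    return best


def bounding_box(region: list[tuple[int, int]]) -> Optional[tuple[int, int, int, int]]:
    """(x_min, y_min, x_max, y_max) of a pixel list, or None if it is empty."""
    if not region:
        return None
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), min(ys), max(xs), max(ys)
```

If OpenCV is available, `cv2.findContours` on the binary mask is the conventional way to obtain the actual region contour rather than a pixel set.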
Preferably, the content acquisition module 130 includes:
a filling unit 131, configured to, when the ratio of the occluded region to the click-to-read page image is greater than the preset ratio, delete the pixels of the occluded region from the click-to-read page image and fill the blank area in each line of characters within the occluded region with preset characters;
Specifically, after the occluded region has been identified and it has been determined that the region needs to be completed, all pixels of the occluded region are deleted from the click-to-read page image. The blank area within each line of characters in the occluded region is then filled with preset characters, while the blank space between character lines is left unfilled. This keeps the lines distinguishable and makes it easy to extract the sentences containing preset characters later. The preset characters may be underlines, wavy lines or other symbols.
For example, suppose the page contains 15 lines of characters and the occluded region covers part of the fourth to eighth lines. After the occluded region is deleted from the click-to-read page image, the characters hidden by the occluding object in the fourth to eighth lines are replaced with underlines.
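The deletion and filling steps can be sketched at two levels. `erase_occluded` sets the occluded pixels of a grayscale image to an assumed white background value, and `fill_occluded_lines` works on recognized text lines, replacing the character columns hidden in each affected line with the preset character while leaving other lines, and the space between lines, untouched. Both names are hypothetical, and the `(start, end)` column spans per line are assumed to come from mapping the occluded region onto the character grid, a step the text does not detail.

```python
def erase_occluded(gray: list[list[int]], mask: list[list[int]], background: int = 255) -> list[list[int]]:
    """Set every pixel covered by the occlusion mask to the page background value."""
    return [
        [background if m else v for v, m in zip(gray_row, mask_row)]
        for gray_row, mask_row in zip(gray, mask)
    ]


def fill_occluded_lines(
    lines: list[str], occluded_spans: dict[int, tuple[int, int]], placeholder: str = "_"
) -> list[str]:
    """Replace columns [start, end) of each listed line with the preset character.

    Lines not listed in occluded_spans are returned unchanged.
    """
    filled = list(lines)
    for index, (start, end) in occluded_spans.items():
        line = filled[index].ljust(end)  # pad short lines so the span exists
        filled[index] = line[:start] + placeholder * (end - start) + line[end:]
    return filled
```

For the 15-line example, the spans dictionary would contain entries only for line indices 3 to 7 (the fourth to eighth lines).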
a sentence acquisition unit 132, configured to acquire the sentences containing the preset characters;
Specifically, after the blank area corresponding to the occluded region in each line of the click-to-read page image has been filled with preset characters, every sentence containing a preset character is extracted from the image. Sentences can be delimited by punctuation: typically the preceding full stop marks the start of a sentence, the next adjacent full stop marks its end, and the characters between them form one sentence. Using the filled-in preset characters as markers, all single sentences containing a preset character are extracted from the click-to-read page image, so each extracted sentence contains at least one preset character.
The completion module 140 includes:
a semantic analysis unit 141, configured to perform semantic analysis on each sentence with a natural language processing model to obtain a semantic analysis result for the sentence;
Specifically, after the sentences containing preset characters have been extracted from the page image, each sentence is fed separately into a trained natural language processing model. The model performs syntactic analysis on each sentence: it analyzes the sentence structure and the phrases it contains, identifies how the words and phrases relate to one another within the sentence, and from this infers the semantics of the sentence.
and a completion unit 142, configured to complete the occluded region in the click-to-read page image according to the semantic analysis result of the sentence.
Specifically, once the natural language processing model has inferred the semantics of each sentence, the content of the occluded region can be inferred from those semantics and filled in, completing the occluded region.
Preferably, the content acquisition module 130 further includes:
a search unit 133, configured to search a database for a target stored page that matches the completed click-to-read page image;
an identification and positioning unit 134, configured to identify and locate the indicator in the click-to-read page image;
and a content acquisition unit 135, configured to acquire, in the target stored page, the click-to-read content corresponding to the indicator.
Specifically, after the content of the occluded region has been completed, the database is searched for a target stored page that matches the completed click-to-read page image. Matching can be based on the characters in the completed image; for example, the database can be searched directly for a stored page whose character repetition rate with the completed click-to-read page image is greater than a preset threshold.
The indicator in the click-to-read page image is then identified and located. The indicator is the finger, pen or other tool the user points with, and the click-to-read content corresponding to the indicator is acquired from the target stored page according to the indicator's position in the click-to-read page image.
After the content the user pointed to has been acquired, the corresponding content is retrieved from the database by combining that content with the user's voice input, and is then played back by voice or displayed. For example, if the user points at a question and says "how do I solve this", the solution process for that question is retrieved from the database and presented to the user.
Fig. 6 is a schematic structural diagram of a family education machine provided by an embodiment of the present invention. As shown in Fig. 6, the family education machine 200 includes a memory 210, a processor 220, and a computer program stored in the memory 210 and executable on the processor 220, for example a click-to-read content identification program. When executing the computer program, the processor 220 implements the steps of the method for identifying click-to-read content described above, or the functions of the modules of the apparatus for identifying click-to-read content described above.
The family education machine 200 includes, but is not limited to, the processor 220 and the memory 210. Those skilled in the art will appreciate that Fig. 6 is merely an example of the family education machine 200 and does not limit it; the machine may include more or fewer components than shown, combine certain components, or use different components. For example, the family education machine 200 may also include input/output devices, display devices, network access devices, buses and the like.
The processor 220 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 210 may be an internal storage unit of the family education machine 200, such as its hard disk or internal memory. The memory 210 may also be an external storage device of the family education machine 200, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card or flash card fitted to the family education machine 200. Further, the memory 210 may include both an internal storage unit and an external storage device of the family education machine 200. The memory 210 stores the computer program and other programs and data required by the family education machine 200, and may also temporarily store data that has been output or is about to be output.
Each of the above embodiments emphasizes different aspects; for parts not described in detail in one embodiment, refer to the related descriptions in the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/family education machine and method can be implemented in other ways. For example, the apparatus/family education machine embodiments described above are merely illustrative: the division into modules or units is only a logical division, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated modules/units are implemented as software functional units and sold or used as standalone products, they may be stored in a computer-readable storage medium. On this basis, an embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for identifying click-to-read content of the above embodiments.
All or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by the processor 220, implements the steps of each method embodiment. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and the like. Note that the content of the computer-readable storage medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
Note that the above embodiments can be freely combined as needed. The foregoing is only a preferred embodiment of the present invention. Those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the scope of protection of the present invention.
Claims (10)
1. A method for identifying click-to-read content, characterized by comprising the following steps:
acquiring a click-to-read page image;
identifying an occluded region in the click-to-read page image;
when the ratio of the occluded region to the click-to-read page image is greater than a preset ratio, acquiring the content of a peripheral area of the occluded region;
completing the occluded region in the click-to-read page image according to the content of the peripheral area;
and acquiring the click-to-read content pointed to by an indicator in the completed click-to-read page image.
2. The method for identifying click-to-read content according to claim 1, wherein identifying the occluded region in the click-to-read page image specifically comprises:
binarizing the click-to-read page image according to the color difference between an occluding object in the click-to-read page image and the click-to-read page, to obtain a binarized click-to-read page image;
and identifying the occluded region in the click-to-read page image according to the binarized click-to-read page image.
3. The method for identifying click-to-read content according to claim 1 or 2, wherein acquiring the content of the peripheral area of the occluded region when the ratio of the occluded region to the click-to-read page image is greater than the preset ratio specifically comprises:
when the ratio of the occluded region to the click-to-read page image is greater than the preset ratio, deleting the pixels of the occluded region from the click-to-read page image, and filling the blank area in each line of characters within the occluded region with preset characters;
acquiring a sentence containing the preset characters;
and completing the occluded region in the click-to-read page image according to the content of the peripheral area specifically comprises:
performing semantic analysis on the sentence with a natural language processing model to obtain a semantic analysis result of the sentence;
and completing the occluded region in the click-to-read page image according to the semantic analysis result of the sentence.
4. The method for identifying click-to-read content according to claim 1 or 2, wherein acquiring the click-to-read content pointed to by the indicator in the completed click-to-read page image specifically comprises:
searching a database for a target stored page matching the completed click-to-read page image;
identifying and locating the indicator in the click-to-read page image;
and acquiring, in the target stored page, the click-to-read content corresponding to the indicator.
5. An apparatus for identifying click-to-read content, characterized by comprising:
an image acquisition module, configured to acquire a click-to-read page image;
an identification module, configured to identify an occluded region in the click-to-read page image;
a content acquisition module, configured to acquire the content of a peripheral area of the occluded region when the ratio of the occluded region to the click-to-read page image is greater than a preset ratio;
a completion module, configured to complete the occluded region in the click-to-read page image according to the content of the peripheral area;
wherein the content acquisition module is further configured to acquire the click-to-read content pointed to by an indicator in the completed click-to-read page image.
6. The apparatus for identifying click-to-read content according to claim 5, wherein the identification module comprises:
an image processing unit, configured to binarize the click-to-read page image according to the color difference between an occluding object in the click-to-read page image and the click-to-read page, to obtain a binarized click-to-read page image;
and an identification unit, configured to identify the occluded region in the click-to-read page image according to the binarized click-to-read page image.
7. The apparatus for identifying click-to-read content according to claim 5 or 6, wherein the content acquisition module comprises:
a filling unit, configured to, when the ratio of the occluded region to the click-to-read page image is greater than the preset ratio, delete the pixels of the occluded region from the click-to-read page image and fill the blank area in each line of characters within the occluded region with preset characters;
a sentence acquisition unit, configured to acquire a sentence containing the preset characters;
and the completion module comprises:
a semantic analysis unit, configured to perform semantic analysis on the sentence with a natural language processing model to obtain a semantic analysis result of the sentence;
and a completion unit, configured to complete the occluded region in the click-to-read page image according to the semantic analysis result of the sentence.
8. The apparatus for identifying click-to-read content according to claim 5 or 6, wherein the content acquisition module further comprises:
a search unit, configured to search a database for a target stored page matching the completed click-to-read page image;
an identification and positioning unit, configured to identify and locate the indicator in the click-to-read page image;
and a content acquisition unit, configured to acquire, in the target stored page, the click-to-read content corresponding to the indicator.
9. A family education machine comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for identifying click-to-read content according to any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for identifying click-to-read content according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910887010.4A CN110598217B (en) | 2019-09-19 | 2019-09-19 | Click-to-read content identification method and device, home teaching machine and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110598217A true CN110598217A (en) | 2019-12-20 |
CN110598217B CN110598217B (en) | 2023-10-20 |
Family ID: 68861103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910887010.4A Active CN110598217B (en) | 2019-09-19 | 2019-09-19 | Click-to-read content identification method and device, home teaching machine and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110598217B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708902A (en) * | 2020-06-04 | 2020-09-25 | 南京晓庄学院 | Multimedia data acquisition method |
CN112163513A (en) * | 2020-09-26 | 2021-01-01 | 深圳市快易典教育科技有限公司 | Information selection method, system, device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108494996A (en) * | 2018-05-14 | 2018-09-04 | Oppo广东移动通信有限公司 | Image processing method, device, storage medium and mobile terminal |
CN108551552A (en) * | 2018-05-14 | 2018-09-18 | Oppo广东移动通信有限公司 | Image processing method, device, storage medium and mobile terminal |
CN109656465A (en) * | 2019-02-26 | 2019-04-19 | 广东小天才科技有限公司 | Content acquisition method applied to family education equipment and family education equipment |
CN109766412A (en) * | 2019-01-16 | 2019-05-17 | 广东小天才科技有限公司 | Learning content acquisition method based on image recognition and electronic equipment |
CN109947273A (en) * | 2019-03-25 | 2019-06-28 | 广东小天才科技有限公司 | Point reading positioning method and device |
-
2019
- 2019-09-19 CN CN201910887010.4A patent/CN110598217B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN110598217B (en) | 2023-10-20 |
Similar Documents
Publication | Title |
---|---|
CN110263248B (en) | Information pushing method, device, storage medium and server |
CN108108426B (en) | Understanding method and device for natural language question and electronic equipment |
Wilkinson et al. | Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections |
CN111753767A (en) | Method and device for automatically correcting operation, electronic equipment and storage medium |
CN110909122B (en) | Information processing method and related equipment |
CN110609998A (en) | Data extraction method of electronic document information, electronic equipment and storage medium |
CN110874534B (en) | Data processing method and data processing device |
CN109033282B (en) | Webpage text extraction method and device based on extraction template |
CN109490843B (en) | Normalized radar screen monitoring method and system |
CN111144210A (en) | Image structuring processing method and device, storage medium and electronic equipment |
CN111274239A (en) | Test paper structuralization processing method, device and equipment |
CN110741376A (en) | Automatic document analysis for different natural languages |
CN112434520A (en) | Named entity recognition method and device and readable storage medium |
CN110598217B (en) | Click-to-read content identification method and device, home teaching machine and storage medium |
CN112257462A (en) | Hypertext markup language translation method based on neural machine translation technology |
CN113128241A (en) | Text recognition method, device and equipment |
CN113326413A (en) | Webpage information extraction method, system, server and storage medium |
CA3140455A1 (en) | Information extraction method, apparatus, and system |
CN114970514A (en) | Artificial intelligence based Chinese word segmentation method, device, computer equipment and medium |
CN110413996B (en) | Method and device for constructing zero-index digestion corpus |
CN111814481A (en) | Shopping intention identification method and device, terminal equipment and storage medium |
CN110852105A (en) | Time data normalization method, device, medium and electronic equipment |
CN118097688A (en) | Universal certificate identification method based on large language model |
CN111027533B (en) | Click-to-read coordinate transformation method, system, terminal equipment and storage medium |
CN112818693A (en) | Automatic extraction method and system for electronic component model words |
Legal Events
Code | Title |
---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||