CN110427412A - Topic read method, device, topic input device and computer storage medium - Google Patents

Publication number
CN110427412A
Authority
CN
China
Prior art keywords
topic
identification
image
read method
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910569060.8A
Other languages
Chinese (zh)
Inventor
陈德刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE ICT Technologies Co Ltd
Original Assignee
ZTE ICT Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE ICT Technologies Co Ltd
Priority to CN201910569060.8A
Publication of CN110427412A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a topic read method, a topic reading device, a topic input device and a computer storage medium, where a topic is a question taken from an exam paper, training manual, exercise book or questionnaire. The topic read method includes: identifying the structure and content of an image of a topic; generating an item analysis experience library from the empirical model corresponding to the process of identifying the structure and content; performing identification with the item analysis experience library as experience; and generating structured data from the identified topics. In this technical solution, item analysis is carried out using artificial intelligence (machine learning) technology, and the topic identification process is guided by the established experience library. New experience can also be accumulated continuously during identification and used to train the experience library. This can significantly increase item analysis (identification) speed, reduce the time needed for topic identification and entry, and improve the efficiency and accuracy of the topic entry process.

Description

Topic read method, device, topic input device and computer storage medium
Technical field
The present invention relates to the technical field of data processing, and in particular to a topic read method, a topic reading device, a topic input device and a computer-readable storage medium.
Background art
The country is currently promoting the combination of education and the internet. School examinations and learning activities frequently require online examinations and online training, and online questionnaires and tests are also used elsewhere in society. In their early stages, these application scenarios often have only paper materials, such as exam papers, training manuals, exercise books and questionnaires, so the topics in these paper materials must be entered into a computer network system. In the related art, these topics are usually entered into the network system manually, a process that consumes a great deal of manpower and time.
In addition, any discussion of the background art anywhere in this specification does not mean that the background art is necessarily prior art known to those skilled in the art, and any discussion of the prior art in this specification does not mean that the prior art is necessarily widely known or forms part of the common general knowledge in the field.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art or the related art.
To this end, one object of the present invention is to provide a topic read method.
Another object of the present invention is to provide a topic reading device.
Another object of the present invention is to provide a topic input device.
Yet another object of the present invention is to provide a computer-readable storage medium.
In the technical solution of the first aspect of the present invention, a topic read method is proposed, comprising: identifying the structure and content of an image of a topic; generating an item analysis experience library from the empirical model corresponding to the process of identifying the structure and content; performing identification with the item analysis experience library as experience; and generating structured data from the identified topics.
In this technical solution, item analysis (identification) is carried out using image-and-text recognition means, and an item analysis experience library is built from the empirical models generated during data analysis. The experience library is built from the topic identification process and is in turn used to guide that process, so experience is accumulated and updated and recognition efficiency and accuracy improve step by step. Where needed, better recognition methods can be learned by means of machine learning; after a period of training, analysis (identification) speed can increase significantly, which reduces the time needed for topic identification and entry and improves the efficiency and accuracy of the topic entry process. In addition, the application identifies both the structure and the content of a topic, so topic carriers with any typesetting can be identified, which improves topic recognition efficiency and reduces manual effort.
Those skilled in the art will understand that terms used in this application such as experience, empirical model, training and learning belong to the field of data analysis or machine learning and are supported by operating methods or algorithms that can be implemented in those fields. The image of a topic can be obtained from various topic carriers, for example paper exam papers, training manuals, exercise books and questionnaires. The image-and-text recognition means proposed in this application can be implemented with a variety of algorithms, such as convolutional neural networks or other types of deep learning algorithm.
In addition, the topic read method according to the above embodiment of the present invention may also have the following additional technical features:
In any of the above technical solutions, optionally, identifying the structure of the image of the topic specifically includes: with the item analysis experience library as experience, obtaining the image of the topic and identifying the layout structure of the image according to its page layout information, topic type information and topic sequence information, wherein the layout structure includes one or more of the following: page layout, topic type and topic sequence.
In this technical solution, structural analysis is performed on the image corresponding to the topic carrier, that is, the image of the topic, to obtain the page layout, the topic types and the order in which the topics are arranged. Because topic carriers often follow fixed or common patterns, identifying the image with image-and-text recognition means to obtain its layout structure makes the subsequent identification of topic content more accurate and more efficient, and the layout structure information obtained makes it easier to build structured data. In addition, using the item analysis experience library as experience when identifying the layout structure assists identification, and the experience accumulated and updated in the process improves recognition efficiency and accuracy step by step.
In any of the above technical solutions, optionally, identifying the content of the image of the topic specifically includes: with the item analysis experience library as experience, identifying the target information of the image according to the layout structure and the figures and/or text in the image, wherein the target information includes one or more of the following: stem content, option content, question content, figures and blanks.
In this technical solution, the content of the image is analyzed and identified on the basis of the layout structure of the image, combined with image-and-text recognition means. Target information such as the stem, options, questions, figures and blanks of a topic can thus be obtained more accurately and more quickly. This content is usually the effective information that is wanted, and a more complete electronic topic (set) can be built from the target information and the layout structure. In addition, using the item analysis experience library as experience in this process assists identification, and the experience accumulated and updated improves recognition efficiency and accuracy step by step.
In any of the above technical solutions, optionally, identifying the target information of the image specifically includes: determining the start and end coordinates of a topic according to the topic sequence, cropping the image of the topic according to the start and end coordinates, and reading the target information of the image using image-and-text recognition means.
In this technical solution, once the topic sequence is known, the start and end coordinates of each topic can be located accurately, and the image of a single topic can be cropped from the image according to those coordinates. If a topic skips, for example by continuing in another column or on another page, it is cropped into two or more pictures, which are then merged into one picture. Image-and-text recognition means are then used to analyze the picture content and obtain target information such as the stem, options, questions, figures and blanks of the topic. Locating topics by topic sequence and then extracting the content of each single topic is efficient, and each topic obtained is an independent unit, which makes it easy to build structured data and load it into the database.
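The cropping and merging step can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the image is a plain list of pixel rows and that the start and end coordinates of each fragment are already known as boxes. The names `crop`, `merge_vertically` and `extract_topic` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values (stand-in for a real image type)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right are exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle described by the start and end coordinates out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def merge_vertically(parts: List[Image], fill: int = 0) -> Image:
    """Stack fragments top to bottom, padding narrower rows with a blank value."""
    parts = [p for p in parts if p]
    if not parts:
        return []
    width = max(len(row) for p in parts for row in p)
    return [row + [fill] * (width - len(row)) for p in parts for row in p]


def extract_topic(image: Image, boxes: List[Box]) -> Image:
    """Crop every fragment of one topic and merge the fragments into one picture."""
    return merge_vertically([crop(image, box) for box in boxes])


# A topic that starts at the bottom of the left column and continues at the
# top of the right column is described by two boxes.
page = [[r * 10 + c for c in range(6)] for r in range(4)]
topic_image = extract_topic(page, [(2, 0, 4, 3), (0, 3, 1, 6)])
```

The merged picture would then be passed to whatever recognition model reads the stem, options and other target information.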
In any of the above technical solutions, optionally, identifying the page layout of the image specifically includes: with the item analysis experience library as experience, using image-and-text recognition means to identify horizontal blank areas and/or vertical blank areas and/or titles in the image, so as to obtain the sub-partitions of the image.
In this technical solution, the image of the topic is first checked to judge whether it meets the conditions of an exam paper, topic sheet or test sheet, so as to determine the page layout information of the image. Image-and-text recognition means are used to analyze conditions such as continuous horizontal and vertical blank areas and any titles present, and to derive the sub-partitions of the image, for example whether it is divided into one, two, three or four columns, so as to obtain information such as the layout and titles of the topic carrier. In addition, using the item analysis experience library as experience in this process helps increase recognition efficiency, and the experience accumulated and updated improves recognition efficiency and accuracy step by step.
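One common way to find columns from continuous vertical blank areas is a projection profile. The sketch below is an assumption about how such an analysis could look, not the method claimed in the patent: it takes a binarized image (1 for ink, 0 for blank) and treats any run of at least `min_gap` blank pixel columns as a separator. The function name `column_spans` and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple


def column_spans(binary: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges of text columns; end is exclusive."""
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: amount of ink in each pixel column.
    ink = [sum(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None  # x where the current text column began
    gap = 0       # length of the current run of blank pixel columns
    for x in range(width):
        if ink[x] > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The blank run is wide enough to separate two columns.
                spans.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, width - gap))
    return spans


# Two text columns separated by a three-pixel blank strip; the one-pixel
# gap inside the second column is too narrow to count as a separator.
row = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1]
layout = column_spans([row, row, row])
```

The number of spans gives the column count (one, two, three or four columns), and the same idea applied to rows would locate horizontal blank areas.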
In any of the above technical solutions, optionally, identifying the topic type of the image specifically includes: on the basis of the sub-partitions, identifying the topic types contained in the image using image-and-text recognition means, wherein the topic types include subjective topics and objective topics, the subjective topics include at least question-and-answer questions, and the objective topics include at least one or more of the following: true/false questions, single-choice questions, multiple-choice questions and fill-in-the-blank questions.
In this technical solution, on the basis of the known layout and titles (the image regions have already been identified), image-and-text recognition means are used to identify the basic topic types on the topic carrier, for example true/false, single-choice, multiple-choice, fill-in-the-blank and question-and-answer questions. Collecting the text content and building structured data according to the topic type information can effectively increase data query speed and make topics easier to manage.
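As a rough illustration of topic type identification, the sketch below maps the recognized text of a section heading to a topic type and to the subjective or objective category described above. The keyword table is a hypothetical stand-in: the patent leaves the recognition technique open (for example a neural network), and the name `classify_heading` and the type identifiers are assumptions.

```python
from typing import Optional, Tuple

# Hypothetical keyword table; a trained classifier could replace it.
TOPIC_KEYWORDS = [
    ("multiple choice", "multiple_choice"),
    ("single choice", "single_choice"),
    ("true or false", "true_false"),
    ("fill in the blank", "fill_in_blank"),
    ("question and answer", "question_answer"),
]

# Objective topic types as listed in the text; everything else is subjective.
OBJECTIVE = {"true_false", "single_choice", "multiple_choice", "fill_in_blank"}


def classify_heading(heading: str) -> Optional[Tuple[str, str]]:
    """Return (topic_type, category) for a recognized heading, or None if unknown."""
    text = heading.lower()
    for keyword, topic_type in TOPIC_KEYWORDS:
        if keyword in text:
            category = "objective" if topic_type in OBJECTIVE else "subjective"
            return topic_type, category
    return None


result = classify_heading("II. Multiple Choice (2 points each)")
```

A heading that matches nothing returns `None`, which corresponds to the uncertain cases that would need verification before being recorded as experience.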
In any of the above technical solutions, optionally, identifying the topic sequence of the image specifically includes: with the item analysis experience library as experience, identifying the serial numbers of the topics in the image using image-and-text recognition means.
In this technical solution, identifying the topic sequence with the help of the experience library increases recognition speed. Topics on a topic carrier usually carry serial numbers, and the serial numbers are usually consecutive, so they can be recognized quickly with image-and-text recognition means. For example, big topics generally come first and their small topics follow. This identification process also produces regular information about the text and numbering, often tied to the topic type, and this information is saved as experience.
In any of the above technical solutions, optionally, the method further includes: determining the hierarchical relationship between topics according to the serial numbers.
In this technical solution, some topic carriers have a more complex structure. For example, an exam paper may contain several big topics, each with several small topics under it. In this case, the result of the topic sequence analysis can be used: serial numbers are usually consecutive among big topics and among small topics, so the hierarchical relationship between big topics and small topics can be derived.
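The hierarchy analysis can be sketched as follows, under a loudly stated assumption that is not in the patent: big topics are labelled with Roman numerals ("I.", "II.") and small topics with Arabic numerals ("1.", "2."). The function name `build_hierarchy` is hypothetical, and real papers may use other numbering styles.

```python
import re
from typing import Dict, List

BIG = re.compile(r"^[IVX]+\.$")  # assumed style of big-topic serial numbers
SMALL = re.compile(r"^\d+\.$")   # assumed style of small-topic serial numbers


def build_hierarchy(labels: List[str]) -> Dict[str, List[str]]:
    """Group small-topic serial numbers under the big topic that precedes them."""
    tree: Dict[str, List[str]] = {}
    current = None
    for label in labels:
        if BIG.match(label):
            current = label
            tree[current] = []
        elif SMALL.match(label):
            if current is None:
                # Small topics that appear before any big topic.
                current = ""
                tree[current] = []
            tree[current].append(label)
    return tree


# Serial numbers in reading order, as recognized from the page.
hierarchy = build_hierarchy(["I.", "1.", "2.", "II.", "3.", "4.", "5."])
```

Because dictionaries keep insertion order, the result also preserves the order of big topics on the page.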
In any of the above technical solutions, optionally, generating the item analysis experience library from the empirical model corresponding to the process of identifying the structure and content specifically includes: generating an empirical model from the process rules of identifying the structure and identifying the content, and generating, from the empirical model, one or more of a layout analysis experience library, a topic type analysis experience library and a topic sequence analysis experience library.
In this technical solution, the identification of topic carriers often follows fixed or common patterns. Identified data that meets the conditions of use and is usable is recorded to form the layout analysis experience library, the topic type analysis experience library, and the topic sequence (big topic and small topic) analysis experience library. These serve as experience modules that are used preferentially when the next identification task is executed, which increases recognition speed. An experience pattern that is already stored need not be stored again.
For example, during layout identification (analysis), cases such as one, two, three or four columns are analyzed. If a case is verified to meet the requirements, the number of columns and the width, length and position of the horizontal and vertical blank spaces of the layout are recorded as one type of experience module.
As another example, the basic topic types are single-choice, multiple-choice, true/false, fill-in-the-blank and question-and-answer questions, which serve as the basis for matching. Wider use brings many uncertain cases, however. Once such topic types have been identified and verified to meet the requirements, they are recorded as topic types, and the process rules for identifying them are also automatically recorded in the topic type analysis experience library.
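A minimal sketch of a layout analysis experience library is given below. It records only verified patterns, does not store a pattern twice, and offers stored patterns for preferential use in the next task. The class names, the fields of `LayoutExperience` and the idea of ranking patterns by how often they were confirmed are illustrative assumptions, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LayoutExperience:
    """One recorded layout pattern (fields are illustrative)."""
    columns: int                     # number of columns on the page
    gap_width: int                   # width of the blank strip between columns
    gap_positions: Tuple[int, ...]   # x positions of the blank strips


class ExperienceLibrary:
    def __init__(self) -> None:
        # Maps each stored pattern to the number of times it was confirmed.
        self._hits: Dict[LayoutExperience, int] = {}

    def record(self, pattern: LayoutExperience, verified: bool) -> bool:
        """Store a verified pattern; return True only if it was newly stored."""
        if not verified:
            return False
        if pattern in self._hits:
            self._hits[pattern] += 1  # already stored, so only count the confirmation
            return False
        self._hits[pattern] = 1
        return True

    def candidates(self) -> List[LayoutExperience]:
        """Patterns to try first in the next task, most often confirmed first."""
        return sorted(self._hits, key=self._hits.get, reverse=True)


library = ExperienceLibrary()
one_column = LayoutExperience(1, 0, ())
two_columns = LayoutExperience(2, 40, (600,))
library.record(one_column, verified=True)
library.record(two_columns, verified=True)
stored_again = library.record(two_columns, verified=True)
```

Making the pattern a frozen dataclass lets it serve as a dictionary key, which is what makes the duplicate check a simple lookup.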
In any of the above technical solutions, optionally, generating structured data from the identified topics specifically further includes: identifying the structure and the content of the topic from the image of the topic using image-and-text recognition means; and generating structured data according to the structure and content of the topic, and/or entering the structured data into a database.
In this technical solution, to improve the utilization of topics and make them easier to use, the identified topics are structured so that they can be queried and retrieved at any time. This allows the target information in paper topic carriers to be used effectively and brought into the computer network system.
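Structuring and database entry might look like the sketch below, which uses Python's standard `sqlite3` and `json` modules. The `Topic` fields, the table layout and the helper names `save_topics` and `load_by_type` are assumptions chosen for illustration; the patent does not specify a schema.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Topic:
    """Structured form of one identified topic (fields are illustrative)."""
    serial: str
    topic_type: str
    stem: str
    options: List[str] = field(default_factory=list)


def save_topics(conn: sqlite3.Connection, topics: List[Topic]) -> None:
    """Enter structured topics into the database, one row per topic."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topics (serial TEXT, topic_type TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO topics VALUES (?, ?, ?)",
        [(t.serial, t.topic_type, json.dumps(asdict(t), ensure_ascii=False)) for t in topics],
    )
    conn.commit()


def load_by_type(conn: sqlite3.Connection, topic_type: str) -> List[Topic]:
    """Query stored topics by topic type, in insertion order."""
    rows = conn.execute(
        "SELECT body FROM topics WHERE topic_type = ? ORDER BY rowid", (topic_type,)
    ).fetchall()
    return [Topic(**json.loads(body)) for (body,) in rows]


conn = sqlite3.connect(":memory:")
save_topics(conn, [
    Topic("1.", "single_choice", "2 + 2 = ?", ["A. 3", "B. 4"]),
    Topic("2.", "true_false", "Water boils at 100 degrees Celsius at sea level."),
])
found = load_by_type(conn, "single_choice")
```

Keeping the topic type in its own column is what allows queries by type without parsing every stored record.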
In the technical solution of the second aspect of the present invention, a topic reading device is proposed. The topic reading device includes a processor, and the processor implements the steps of the topic read method disclosed in any of the above technical solutions when executing a computer program. The topic reading device therefore has the beneficial technical effects of any of the above topic read methods, which are not repeated here.
In the technical solution of the third aspect of the present invention, a topic input device is proposed, comprising: a control module, configured to receive the image of a topic and/or enter the identified topics into a database; a memory, in which the database is configured, the database being used to store topics; and the topic reading device provided by the above technical solution, which includes a processor that implements the steps of the topic read method disclosed in any of the above technical solutions when executing a computer program. The topic input device therefore has the beneficial technical effects of any of the above topic read methods, which are not repeated here.
In addition, in this technical solution, the control module performs data exchange and flow control. For example, it receives picture files from a scanning device or from an external source, passes the picture files to the topic reading device, receives and verifies the analysis results of the topic reading device, and stores them in the database. The database records the text data of the identified topics, including the stem, options, questions, figures and answers of each topic.
In the technical solution of the fourth aspect of the present invention, a computer-readable storage medium is proposed, on which a computer program is stored. When the computer program is executed, the steps of the topic read method disclosed in any of the above technical solutions are implemented.
Additional aspects and advantages of the present invention will become apparent in the following description or be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 shows a schematic flow diagram of a topic read method according to an embodiment of the present invention;
Fig. 2 shows a schematic block diagram of a topic reading device according to an embodiment of the present invention;
Fig. 3 shows a schematic block diagram of a topic reading device according to another embodiment of the present invention;
Fig. 4 shows a schematic block diagram of a topic input device according to an embodiment of the present invention.
Detailed description of the embodiments
So that the above objects, features and advantages of the present invention can be understood more clearly, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, so the scope of protection of the present invention is not limited by the specific embodiments disclosed below.
The topic read method, topic reading device and topic input device according to embodiments of the present invention are described in detail below with reference to Fig. 1 to Fig. 4.
As shown in Fig. 1, a topic read method according to an embodiment of the present invention comprises: step S102, identifying the structure and content of an image of a topic; step S104, generating an item analysis experience library from the empirical model corresponding to the process of identifying the structure and content; step S106, performing identification with the item analysis experience library as experience; and step S108, generating structured data from the identified topics.
In this technical solution, item analysis (identification) is carried out using image-and-text recognition means, and an item analysis experience library is built from the empirical models generated during data analysis. The experience library is built from the topic identification process and is in turn used to guide that process, so experience is accumulated and updated and recognition efficiency and accuracy improve step by step. Where needed, better recognition methods can be learned by means of machine learning; after a period of training, analysis (identification) speed can increase significantly, which reduces the time needed for topic identification and entry and improves the efficiency and accuracy of the topic entry process. In addition, the application identifies both the structure and the content of a topic, so topic carriers with any typesetting can be identified, which improves topic recognition efficiency and reduces manual effort.
Those skilled in the art will understand that terms used in this application such as experience, empirical model, training and learning belong to the field of data analysis or machine learning and are supported by operating methods or algorithms that can be implemented in those fields. The image of a topic can be obtained from various topic carriers, for example paper exam papers, training manuals, exercise books and questionnaires. The image-and-text recognition means proposed in this application can be implemented with a variety of algorithms, such as convolutional neural networks or other types of deep learning algorithm.
In addition, the topic read method according to the above embodiment of the present invention may also have the following additional technical features:
In any of the above technical solutions, optionally, identifying the structure of the image of the topic in step S102 specifically includes: with the item analysis experience library as experience, obtaining the image of the topic and identifying the layout structure of the image according to its page layout information, topic type information and topic sequence information, wherein the layout structure includes one or more of the following: page layout, topic type and topic sequence.
In this technical solution, structural analysis is performed on the image corresponding to the topic carrier, that is, the image of the topic, to obtain the page layout, the topic types and the order in which the topics are arranged. Because topic carriers often follow fixed or common patterns, identifying the image with image-and-text recognition means to obtain its layout structure makes the subsequent identification of topic content more accurate and more efficient, and the layout structure information obtained makes it easier to build structured data. In addition, using the item analysis experience library as experience when identifying the layout structure assists identification, and the experience accumulated and updated in the process improves recognition efficiency and accuracy step by step.
In any of the above technical solutions, optionally, identifying the content of the image of the topic in step S102 specifically includes: with the item analysis experience library as experience, identifying the target information of the image according to the layout structure and the figures and/or text in the image, wherein the target information includes one or more of the following: stem content, option content, question content, figures and blanks.
In this technical solution, the content of the image is analyzed and identified on the basis of the layout structure of the image, combined with image-and-text recognition means. Target information such as the stem, options, questions, figures and blanks of a topic can thus be obtained more accurately and more quickly. This content is usually the effective information that is wanted, and a more complete electronic topic (set) can be built from the target information and the layout structure. In addition, using the item analysis experience library as experience in this process assists identification, and the experience accumulated and updated improves recognition efficiency and accuracy step by step.
In any of the above technical solutions, optionally, identifying the target information of the image specifically includes: determining the start and end coordinates of a topic according to the topic sequence, cropping the image of the topic according to the start and end coordinates, and reading the target information of the image using image-and-text recognition means.
In this technical solution, once the topic sequence is known, the start and end coordinates of each topic can be located accurately, and the image of a single topic can be cropped from the image according to those coordinates. If a topic skips, for example by continuing in another column or on another page, it is cropped into two or more pictures, which are then merged into one picture. Image-and-text recognition means are then used to analyze the picture content and obtain target information such as the stem, options, questions, figures and blanks of the topic. Locating topics by topic sequence and then extracting the content of each single topic is efficient, and each topic obtained is an independent unit, which makes it easy to build structured data and load it into the database.
In any of the above technical solutions, optionally, identifying the page layout of the image specifically includes: with the item analysis experience library as experience, using image-and-text recognition means to identify horizontal blank areas and/or vertical blank areas and/or titles in the image, so as to obtain the sub-partitions of the image.
In this technical solution, the image of the topic is first checked to judge whether it meets the conditions of an exam paper, topic sheet or test sheet, so as to determine the page layout information of the image. Image-and-text recognition means are used to analyze conditions such as continuous horizontal and vertical blank areas and any titles present, and to derive the sub-partitions of the image, for example whether it is divided into one, two, three or four columns, so as to obtain information such as the layout and titles of the topic carrier. In addition, using the item analysis experience library as experience in this process helps increase recognition efficiency, and the experience accumulated and updated improves recognition efficiency and accuracy step by step.
In any of the above-described technical solution, optionally, identifies the topic type of image, specifically include: on the basis of child partition On, the topic type for including in image is identified using pictograph means of identification, wherein topic type includes subjective item and objective item, master Sight topic includes at least question-and-answer problem, and objective item includes at least one or more of following items: True-False, single choice, multiple choice And gap-filling questions.
In the technical scheme, on the basis of according to known layout and title (image region has been identified), Using pictograph means of identification, identify the basic topic type of topic carrier, for example, True-False, single choice, multiple choice, gap-filling questions, Question-and-answer problem etc..Word content acquisition is carried out according to topic type information and establishes structural data, can effectively promote data query speed Degree reduces topic management difficulty.
In any of the above-described technical solution, optionally, identifies the topic sequence of image, specifically includes:
Empirically with item analysis experience library, the sequence of the topic in image is identified using pictograph means of identification Number.
In the technical scheme, topic sequence identification is carried out in conjunction with experience library and be able to ascend recognition speed, above topic carrier Usually there is topic serial number in topic, serial number is often continuous, can quickly recognize topic using pictograph means of identification Serial number, for example, big topic, the identification process of small topic are usually big topic preceding, small topic is followed rear, this identification process can also generate The routine information of some text numbers, and often combined with topic type, these information empirically save.
In any of the above technical solutions, optionally, the method further includes: determining the hierarchical relationship between topics according to the serial numbers.
In this technical solution, some topic carriers have a more complex structure. For example, a test paper may contain several major topics, each of which has several sub-topics beneath it. In this case the result of the topic sequence analysis can be used: because the serial numbers among major topics and among sub-topics are often continuous, the hierarchical relationship between major topics and sub-topics can be derived.
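One way to picture the derivation of the hierarchy is the sketch below. It assumes, purely for illustration, that major topics are labelled with Chinese numerals ("一、", "二、") and sub-topics with Arabic numbers ("1.", "2."); the text does not fix a particular numbering convention, and `build_hierarchy` is a hypothetical name.

```python
import re
from typing import Dict, List

# Assumed conventions: Chinese numerals label major topics, Arabic numbers label sub-topics.
MAJOR_RE = re.compile(r"^\s*([一二三四五六七八九十]+)\s*[、.．]")
MINOR_RE = re.compile(r"^\s*(\d+)\s*[.．、)]")


def build_hierarchy(lines: List[str]) -> List[Dict]:
    """Attach each sub-topic number to the most recent major topic."""
    tree: List[Dict] = []
    for line in lines:
        major = MAJOR_RE.match(line)
        minor = MINOR_RE.match(line)
        if major:
            tree.append({"label": major.group(1), "children": []})
        elif minor and tree:
            # A sub-topic belongs to the major topic that precedes it.
            tree[-1]["children"].append(int(minor.group(1)))
    return tree
```

Because a major topic comes first and its sub-topics follow, a single pass in reading order is enough to recover the two-level structure.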
In any of the above technical solutions, optionally, generating the item analysis experience library in step S104 from the empirical model corresponding to the process of identifying the structure and the content specifically includes: generating an empirical model according to the rule process of identifying the structure and identifying the content, and generating, according to the empirical model, one or more of a layout analysis experience library, a topic type analysis experience library and a topic sequence analysis experience library.
In this technical solution, the identification of the various topic carriers often follows fixed or common patterns. Identified data that meet the conditions of use and are usable are recorded to form the layout analysis experience library, the topic type analysis experience library, the topic sequence analysis experience library and the major/sub-topic analysis experience library. When the next identification task is executed, these records are used preferentially as experience modules to speed up identification; an experience pattern that has already been stored need not be stored again.
For example, during layout recognition (analysis), cases such as one, two, three or four columns are worked out. If such a case is verified as meeting the requirements, the number of columns, together with the width, length and position of the horizontal and vertical blank areas of the layout, is recorded as one type of experience module.
As another example, the basic topic types are single-choice, multiple-choice, true/false, gap-filling and question-and-answer topics, which serve as the basis for matching. Wider application, however, involves many cases that cannot be anticipated; once such topic types have been identified and meet the requirements, they are also recorded as topic types, and the rule process by which these topics were identified is automatically recorded in the topic type analysis experience library.
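The recording behaviour described above (store verified patterns, reuse them first, never store the same pattern twice) can be sketched as a small in-memory store. The class names, the fields of `LayoutExperience` and the string keys are all hypothetical; the text does not define the data model of the experience libraries.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LayoutExperience:
    """One verified layout pattern (fields are illustrative assumptions)."""
    columns: int                      # number of columns on the page
    gap_width: int                    # width of the blank area between columns
    gap_positions: Tuple[int, ...]    # x positions of the blank areas


class ExperienceLibrary:
    """Keeps verified patterns per analysis kind and skips repeated entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[object]] = {}

    def record(self, kind: str, entry: object) -> bool:
        """Store a verified pattern; return False if it was already stored."""
        bucket = self._entries.setdefault(kind, [])
        if entry in bucket:
            return False
        bucket.append(entry)
        return True

    def candidates(self, kind: str) -> List[object]:
        """Patterns to try first when the next identification task runs."""
        return list(self._entries.get(kind, []))
```

An analysis step would call `candidates("layout")` before falling back to a full analysis, and call `record` only after a result has been verified as meeting the requirements.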
In any of the above technical solutions, optionally, generating structured data according to the identified topics in step S108 specifically further includes: identifying the target structure of the topic and the content of the topic from the image of the topic using image-text recognition; and generating structured data according to the target structure and the content of the topic, and/or entering the structured data into a database.
In this technical solution, in order to improve the utilization of topics and reduce the difficulty of using them, the identified topics are structured so that they can be queried and retrieved at any time. This allows the target information in a paper topic carrier to be used efficiently and brought into a computer network system.
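A minimal sketch of turning identified topics into structured data and entering them into a database, using SQLite from the Python standard library. The `Topic` fields, the table layout and the function name `save_topics` are assumptions made for this example; the text does not name a schema or a database product.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """Structured form of one identified topic (fields are illustrative)."""
    number: int
    topic_type: str
    stem: str
    options: List[str] = field(default_factory=list)


def save_topics(conn: sqlite3.Connection, topics: List[Topic]) -> None:
    """Create the table if needed and store each topic as one row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topics ("
        "number INTEGER PRIMARY KEY, topic_type TEXT, stem TEXT, options TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO topics VALUES (?, ?, ?, ?)",
        [
            # Options are stored as a JSON array so their order is preserved.
            (t.number, t.topic_type, t.stem, json.dumps(t.options, ensure_ascii=False))
            for t in topics
        ],
    )
    conn.commit()
```

Once stored this way, topics can be queried by type or number with ordinary SQL, which is the kind of retrieval the structured form is meant to enable.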
As shown in Fig. 2, in a topic reading device 200 according to an embodiment of the invention, program modules are established according to the topic read method provided by the above technical solutions so that topics can be read. Specifically, the topic reading device 200 includes an AI analysis module 202 and an AI learning module 204, where AI stands for artificial intelligence. The AI analysis module 202 is responsible for layout analysis, topic type analysis, topic sequence analysis, major/sub-topic analysis and topic content analysis. The AI learning module 204 builds, from the analysis process, the corresponding layout analysis experience library, topic type analysis experience library, topic sequence analysis experience library and major/sub-topic analysis experience library. When the AI analysis module 202 executes a task, it can refer to the experience libraries formed by the AI learning module 204.
The method executed in AI analysis module 202 includes:
Layout analysis: the received picture is judged as to whether it meets the conditions of a test paper, topic or answer sheet. Image-text recognition is mainly applied to features of the page layout, such as continuous horizontal and vertical blank areas and existing titles, to work out cases such as one, two, three or four columns, thereby obtaining the general layout and the titles of the test paper.
Topic type analysis: on the basis of the known general layout and titles, image-text recognition is used to identify the basic topic types of the test paper, such as single-choice, multiple-choice, gap-filling and question-and-answer topics.
Topic sequence analysis: the topics on a test paper generally carry serial numbers, and the serial numbers of major topics and sub-topics are often continuous, so the serial numbers can be identified using image-text recognition.
Major/sub-topic analysis: some test papers contain major topics that in turn have sub-topics beneath them. On the basis of the topic sequence analysis, the hierarchical relationship between major topics and sub-topics can be worked out.
Topic content analysis: on the basis of the topic sequence analysis, the start and end coordinates of each topic can be identified accurately, and the picture of a single topic can be cropped out according to these coordinates. If a topic is split across a break, two or more pictures are cropped for it and then merged into one picture. Image-text recognition is then applied to the picture content to analyse the topic content and obtain specifics such as the stem, options, questions, figures and blanks of the topic.
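The cropping and merging step in the topic content analysis can be sketched as below, with images represented as plain lists of pixel rows. The box format, the helper names and the choice to stack fragments vertically with padding are assumptions for illustration; the text only says that the fragments are merged into one picture.

```python
from typing import List, Tuple

Image = List[List[int]]
# (top, bottom, left, right) in pixels; bottom and right are exclusive (assumed format).
Box = Tuple[int, int, int, int]


def crop(image: Image, box: Box) -> Image:
    """Cut the region given by the start and end coordinates out of the image."""
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]


def merge_vertically(parts: List[Image], blank: int = 0) -> Image:
    """Stack fragments of one topic top to bottom, padding narrower rows."""
    width = max(len(part[0]) for part in parts)
    merged: Image = []
    for part in parts:
        for row in part:
            merged.append(row + [blank] * (width - len(row)))
    return merged
```

A topic whose text continues after a break would be cropped once per fragment and the fragments passed to `merge_vertically` in reading order before recognition is run on the merged picture.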
In the AI learning module 204: the identification of test papers often follows fixed or common patterns. This module records the data identified by the AI analysis module that meet the conditions of use and are usable, forming the layout analysis experience library, the topic type analysis experience library, the topic sequence analysis experience library and the major/sub-topic analysis experience library. When the AI analysis module 202 executes the next identification task, these records are used preferentially as experience modules to speed up identification; an experience pattern that has already been stored need not be stored again.
The methods executed in the AI learning module 204 include:
Formation of the layout analysis experience library: the layout analysis works out cases such as one, two, three or four columns. If such a case is finally verified as meeting the requirements, the AI learning module records the number of columns, together with the width, length and position of the horizontal and vertical blank areas of the layout, as one type of experience module.
Formation of the topic type analysis experience library: the basic topic types are generally single-choice, multiple-choice, true/false, gap-filling and question-and-answer topics, which serve as the basis for matching. Wider application, however, involves many cases that cannot be anticipated; once such topic types have been identified and meet the requirements, they are also recorded as topic types, and the rule process by which these topics were identified is automatically recorded in the topic type analysis experience library.
Formation of the topic sequence analysis experience library: while the topic sequence is being identified, continuous topic sequences that meet the requirements are identified, for example 1, 2, 3, ...; One, Two, Three, ...; A, B, C, .... These topic sequences, including the rule process for recognizing the serial number text, can be recorded as modules in the topic sequence analysis experience library.
Formation of the major/sub-topic analysis experience library: when major topics and sub-topics are identified, a major topic usually comes first and its sub-topics follow. This identification process also produces rule information about the numbering text, often in combination with the topic type, and this information is saved as experience.
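As a rough illustration of recording a topic sequence style as experience, the sketch below names the numbering style shared by a run of serial labels. The style names, the patterns and the reading of "One, Two, Three" as Chinese numerals are assumptions made for this example only.

```python
import re
from typing import List, Optional

# Candidate numbering styles (names and patterns are illustrative assumptions).
STYLES = {
    "arabic": re.compile(r"^\d+$"),
    "chinese": re.compile(r"^[一二三四五六七八九十]+$"),
    "upper_letter": re.compile(r"^[A-Z]$"),
}


def detect_style(labels: List[str]) -> Optional[str]:
    """Return the name of the style that every label matches, or None."""
    for name, pattern in STYLES.items():
        if labels and all(pattern.match(label) for label in labels):
            return name
    return None
```

The returned style name is the kind of compact record that could be kept in a topic sequence analysis experience library and tried first on the next page.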
As shown in Fig. 3, a topic reading device 300 according to another embodiment of the invention includes a processor 302, and the processor 302, when executing a computer program, implements the steps of the topic read method disclosed in any of the above technical solutions. The topic reading device 300 therefore has the beneficial technical effects of any of the above topic read methods, which are not repeated here.
As shown in Fig. 3, a topic input device 400 according to an embodiment of the invention includes: a control module 402 for receiving the image of a topic and/or entering the identified topics into a database; a memory 404 in which the database is configured, the database being used to store topics; and the topic reading device 300 provided by the above technical solution, which includes a processor that, when executing a computer program, implements the steps of the topic read method disclosed in any of the above technical solutions. The topic input device 400 therefore has the beneficial technical effects of any of the above topic read methods, which are not repeated here.
In addition, in this technical solution the control module 402 handles data exchange and flow control: for example, it receives picture files from a scanning device or from an external source, passes the picture files to the topic reading device 300, receives and verifies the analysis results of the topic reading device 300, and stores them in the database in the memory 404. The database records the text data of the identified topics, including the stem, options, questions, figures and answers of each topic.
An embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored; when executed, the computer program implements the steps of the topic read method described in any of the above technical solutions. The computer-readable storage medium therefore has the beneficial technical effects of any of the above topic read methods, which are not repeated here.
According to the technical solution of the invention, topics in paper test papers, training handbooks, homework books and questionnaires are digitized and rapidly entered into a computer network system using image-text recognition, through layout analysis, topic type analysis, topic sequence analysis, major/sub-topic analysis and topic content analysis. During analysis, the layout analysis experience library, the topic type analysis experience library, the topic sequence analysis experience library and the major/sub-topic analysis experience library are formed, which saves time for subsequent analysis tasks and improves recognition efficiency step by step. Topic analysis is carried out using artificial intelligence (machine learning) technology: the experience libraries that have been built guide the topic identification process, and new experience is continually accumulated during identification to train the experience libraries. This can significantly increase the speed of topic analysis (identification), reduce the time needed to identify and enter topics, and improve the efficiency and accuracy of the topic entry process.
Those skilled in the art should understand that embodiments of the invention may be provided as a method, a system or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should be noted that, in the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of components or steps not listed in a claim. The word "a" or "an" preceding a component does not exclude the presence of a plurality of such components. The invention can be implemented by means of hardware comprising several distinct components and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
Although preferred embodiments of the invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they have grasped the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
The above are only preferred embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (13)

1. A topic read method, characterized by comprising:
identifying the structure and content of a target image;
generating an item analysis experience library according to the empirical model corresponding to the process of identifying the structure and the content;
executing the identification using the item analysis experience library as experience;
generating structured data according to the identified topics.
2. The topic read method according to claim 1, characterized in that identifying the structure of the target image specifically includes:
using the item analysis experience library as experience, obtaining the image of a topic, and identifying the layout structure of the image according to the layout information, topic type information and topic sequence information of the image, wherein the layout structure includes one or more of the following:
layout, topic type and topic sequence.
3. The topic read method according to claim 2, characterized in that identifying the content of the target image specifically includes:
using the item analysis experience library as experience, identifying the target information of the image according to the layout structure and the figures and/or text in the image, wherein the target information includes one or more of the following:
stem content, option content, question content, figures and blanks.
4. The topic read method according to claim 3, characterized in that identifying the target information of the image specifically includes:
determining the start and end coordinates of the topic according to the topic sequence, cropping the image of the topic according to the start and end coordinates, and reading the target information of the image using image-text recognition means.
5. The topic read method according to claim 2, characterized in that identifying the layout of the image specifically includes:
using the item analysis experience library as experience, identifying the horizontal blank areas and/or vertical blank areas and/or titles of the image using image-text recognition means, to obtain the sub-partitions of the image.
6. The topic read method according to claim 5, characterized in that identifying the topic type of the image specifically includes:
on the basis of the sub-partitions, identifying the topic types contained in the image using image-text recognition means, wherein the topic types include subjective topics and objective topics, the subjective topics include at least question-and-answer topics, and the objective topics include at least one or more of the following: true/false, single-choice, multiple-choice and gap-filling topics.
7. The topic read method according to claim 6, characterized in that identifying the topic sequence of the image specifically includes:
using the item analysis experience library as experience, identifying the serial numbers of the topics in the image using image-text recognition means.
8. The topic read method according to claim 7, characterized by further comprising:
determining the hierarchical relationship between the topics according to the serial numbers.
9. The topic read method according to any one of claims 1 to 8, characterized in that generating the item analysis experience library according to the empirical model corresponding to the process of identifying the structure and the content specifically includes:
generating an empirical model according to the rule process of identifying the structure and identifying the content, and generating, according to the empirical model, one or more of a layout analysis experience library, a topic type analysis experience library and a topic sequence analysis experience library.
10. The topic read method according to any one of claims 1 to 8, characterized in that generating structured data according to the identified topics specifically further includes:
identifying the target structure of the topic and the content of the topic from the image of the topic using image-text recognition means;
generating structured data according to the target structure and the content of the topic, and/or entering the structured data into a database.
11. A topic reading device, the topic reading device including a processor, characterized in that the processor, when executing a computer program, implements the steps of the topic read method according to any one of claims 1 to 10.
12. A topic input device, characterized by comprising:
a control module for receiving the image of a topic and/or entering the identified topics into a database;
a memory in which the database is configured, the database being used to store the topics;
the topic reading device according to claim 11.
13. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the topic read method according to any one of claims 1 to 10.
CN201910569060.8A 2019-06-27 2019-06-27 Topic read method, device, topic input device and computer storage medium Pending CN110427412A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910569060.8A CN110427412A (en) 2019-06-27 2019-06-27 Topic read method, device, topic input device and computer storage medium

Publications (1)

Publication Number Publication Date
CN110427412A true CN110427412A (en) 2019-11-08

Family

ID=68409792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910569060.8A Pending CN110427412A (en) 2019-06-27 2019-06-27 Topic read method, device, topic input device and computer storage medium

Country Status (1)

Country Link
CN (1) CN110427412A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553365A (en) * 2020-04-30 2020-08-18 广东小天才科技有限公司 Method and device for selecting questions, electronic equipment and storage medium
CN112861864A (en) * 2021-01-28 2021-05-28 广东国粒教育技术有限公司 Topic entry method, topic entry device, electronic device and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932508A (en) * 2018-08-13 2018-12-04 杭州大拿科技股份有限公司 A kind of topic intelligent recognition, the method and system corrected
CN109634961A (en) * 2018-12-05 2019-04-16 杭州大拿科技股份有限公司 A kind of paper sample generating method, device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553365A (en) * 2020-04-30 2020-08-18 广东小天才科技有限公司 Method and device for selecting questions, electronic equipment and storage medium
CN111553365B (en) * 2020-04-30 2023-11-24 广东小天才科技有限公司 Question selection method and device, electronic equipment and storage medium
CN112861864A (en) * 2021-01-28 2021-05-28 广东国粒教育技术有限公司 Topic entry method, topic entry device, electronic device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191108
