CN116049461B - Question conversion system based on big data cloud platform - Google Patents

Question conversion system based on big data cloud platform

Info

Publication number
CN116049461B
Authority
CN
China
Prior art keywords
text
outline
database
character
outlines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310321956.0A
Other languages
Chinese (zh)
Other versions
CN116049461A (en
Inventor
祁建春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ideological World Education Technology Co ltd
Original Assignee
Beijing Ideological World Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ideological World Education Technology Co ltd filed Critical Beijing Ideological World Education Technology Co ltd
Priority to CN202310321956.0A priority Critical patent/CN116049461B/en
Publication of CN116049461A publication Critical patent/CN116049461A/en
Application granted granted Critical
Publication of CN116049461B publication Critical patent/CN116049461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)

Abstract

The invention relates to the field of graphic data identification, in particular to a question conversion system based on a big data cloud platform.

Description

Question conversion system based on big data cloud platform
Technical Field
The invention relates to the field of graphic data identification, in particular to a question conversion system based on a big data cloud platform.
Background
With the development of computer technology, online education has become increasingly widespread and a variety of online education cloud platforms have been developed. Existing platforms bundle many functions, such as online teaching, online homework correction and big-data-based course customization, and are therefore popular with users. Online homework correction requires either entering answers manually or performing text recognition on paper homework, so the efficiency and accuracy of text recognition are very important.
Chinese patent publication No. CN111814798A discloses a question digitization method, comprising: collecting a question picture; performing text line detection on the question picture and recognizing the text in the detection result to obtain text lines; performing formula symbol detection on the question picture and processing the detection result to obtain formula blocks; performing figure and chart recognition on the text lines and formula blocks; and sorting and outputting the text lines, formula blocks and figure and chart recognition results according to their line relationships to obtain the digitized question. According to that publication, the method is robust to interference when recognizing question pictures, digitizes questions accurately and quickly, is convenient to use, improves learning efficiency, saves teachers a large amount of time and gives a good user experience.
It can be seen that the prior art has the following problem: text recognition in the prior art does not take into account differences in the font of the text to be recognized, and the database used for comparison during text recognition is not adjusted according to those font differences in order to improve the efficiency and accuracy of text recognition.
Disclosure of Invention
To solve the above problem, the present invention provides a question conversion system based on a big data cloud platform, comprising:
a database module comprising a plurality of databases and a plurality of exclusive databases, wherein each database corresponds to a different font type and stores the sample character outlines of that font type, and each exclusive database stores the sample character outlines exclusive to one user side;
a data interaction module for acquiring the question text pictures uploaded to the cloud platform by the user side;
a data processing module comprising an image analysis unit, a first comparison unit, a second comparison unit and a database construction unit which are connected with each other, wherein
the image analysis unit is connected with the data interaction module and is used for extracting the character outlines in a question text picture, randomly screening the character outlines, determining the font type to which each screened character outline belongs, determining the proportion of each font type, and determining an optimal database based on the font type with the highest proportion;
the first comparison unit is connected with the database module and is used for acquiring the character outlines extracted by the image analysis unit, comparing each character outline with the sample character outlines in the exclusive database of the user side and in the optimal database, and identifying the text content represented by each character outline according to the comparison result;
the second comparison unit is connected with the database module and is used for acquiring the character outlines whose text content the first comparison unit could not identify, sorting the databases in descending order of their font data similarity to the optimal database, selecting the databases one by one in that order, comparing each unidentified character outline with the sample character outlines in the selected database, and identifying the text content represented by each character outline according to the comparison result;
the database construction unit is connected with the database module and the data interaction module and is used for acquiring the character outlines that the second comparison unit could not identify, sending them to the user side through the data interaction module so that the text content they represent can be determined, and, after the user side confirms, storing the character outlines as sample character outlines in the exclusive database of the user side.
Further, a proportion interval [20%, 40%] is set in the image analysis unit, and the proportion of character outlines selected by random screening to the total number of character outlines shall fall within the interval [20%, 40%].
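The random screening step can be sketched as follows. This is only an illustration under stated assumptions: the function name `screen_outlines`, the integer-percent parameters and the fallback for very small inputs are hypothetical and are not taken from the patent text.

```python
import random


def screen_outlines(outlines, low_pct=20, high_pct=40, rng=None):
    """Randomly pick outlines so that their share of the total lies in [low_pct%, high_pct%].

    `outlines` is any list of extracted character outlines (representation not specified here).
    """
    rng = rng or random.Random()
    total = len(outlines)
    if total == 0:
        return []
    # Integer arithmetic avoids floating-point rounding at the interval bounds.
    k_min = -(-low_pct * total // 100)   # ceil(low_pct% of total)
    k_max = high_pct * total // 100      # floor(high_pct% of total)
    if k_min > k_max:
        # Assumption: for tiny inputs with no integer size inside the interval, use k_min.
        k_max = k_min
    k = rng.randint(k_min, k_max)
    return rng.sample(outlines, k)
```

For 50 extracted outlines, for example, the sketch selects between 10 and 20 of them without repetition.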
Further, the image analysis unit calculates the similarity between the screened text outline and the sample text outline in each database, determines the sample text outline with the highest similarity, determines the database to which the sample text outline belongs, and determines the font type corresponding to the database as the font type to which the screened text outline belongs.
Further, after determining the font type to which each screened character outline belongs, the image analysis unit calculates the proportion P of each font type according to formula (1),
$$P = \frac{n_i}{N_0} \qquad (1)$$
in formula (1), $n_i$ represents the number of screened character outlines belonging to the i-th font type, $N_0$ represents the total number of screened character outlines, and i is an integer greater than 0.
Further, the image analysis unit determines the font type with the highest proportion and takes the database corresponding to that font type as the optimal database.
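As a minimal sketch of the proportion calculation and the choice of the optimal database, assume that each screened character outline has already been assigned a font type label as described above. The function names and the use of font labels as database keys are illustrative assumptions.

```python
from collections import Counter


def font_proportions(font_labels):
    """Return P = n_i / N_0 for every font type among the screened outlines."""
    if not font_labels:
        raise ValueError("at least one screened outline is required")
    counts = Counter(font_labels)          # n_i per font type
    total = len(font_labels)               # N_0
    return {font: n / total for font, n in counts.items()}


def optimal_database(font_labels):
    """Pick the font type (and hence its database) with the highest proportion."""
    proportions = font_proportions(font_labels)
    return max(proportions, key=proportions.get)
```

With the labels `["kai", "song", "song", "hei", "song"]`, the font type "song" has proportion 3/5 and its database would be taken as the optimal database.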
Further, the first comparison unit compares each character outline with the sample character outlines in the exclusive database of the user side and in the optimal database, and identifies the text content represented by each character outline according to the comparison result, wherein
the first comparison unit compares the character outline with each sample character outline to calculate the coincidence degree between them and selects the sample character outline with the highest coincidence degree; if that highest coincidence degree is higher than a preset first coincidence degree comparison threshold, the first comparison unit determines that the text content represented by the character outline is the same as the text content represented by that sample character outline.
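The first comparison step can be illustrated as below. The text does not fix how the coincidence degree is computed, so this sketch assumes, purely for illustration, that an outline is a set of pixel coordinates and that the coincidence degree is the Jaccard overlap of two such sets. The names `coincidence` and `recognise` are hypothetical.

```python
def coincidence(a, b):
    """Assumed coincidence degree: Jaccard overlap of two pixel-coordinate sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recognise(outline, samples, threshold):
    """Return the text of the best-matching sample, or None if it does not beat the threshold.

    `samples` is a list of (text, sample_outline) pairs, e.g. the user's exclusive
    database concatenated with the optimal database.
    """
    best_text, best_score = None, 0.0
    for text, sample in samples:
        score = coincidence(outline, sample)
        if score > best_score:
            best_text, best_score = text, score
    # Accept only when the highest coincidence degree exceeds the first threshold.
    return best_text if best_score > threshold else None
```

An outline returned as `None` here is the kind of outline that would be passed on to the second comparison unit.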
Further, the second comparison unit pre-stores the font data similarity E0 of every pair of databases, calculated according to formula (2),
$$E_0 = \frac{1}{N}\sum_{i=1}^{N} e_i \qquad (2)$$
in formula (2), N represents the number of sample character outlines in each database, and $e_i$ represents the similarity between the i-th sample character outline in the first of the two databases and the i-th sample character outline in the second.
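A small sketch of the font data similarity follows, assuming that formula (2) averages the pairwise similarities $e_i$ over the N sample outlines, which is the reading suggested by the definitions of N and $e_i$. The function name is hypothetical.

```python
def font_data_similarity(pair_similarities):
    """Assumed form of formula (2): E0 is the mean of the pairwise similarities e_i.

    `pair_similarities[i]` is the similarity between the i-th sample outline of the
    first database and the i-th sample outline of the second database.
    """
    if not pair_similarities:
        raise ValueError("the databases must contain at least one sample outline")
    return sum(pair_similarities) / len(pair_similarities)
```

Because E0 depends only on the two databases, it can be computed once per pair and stored, which matches the statement that the second comparison unit pre-stores it.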
Further, once the second comparison unit has identified the text content represented by a character outline, it does not compare that character outline with the sample character outlines in the remaining databases.
Further, the second comparison unit compares each unidentified character outline with the sample character outlines in the selected database, and identifies the text content represented by each character outline according to the comparison result, wherein
the second comparison unit compares the character outline with each sample character outline to calculate the coincidence degree between them and selects the sample character outline with the highest coincidence degree; if that highest coincidence degree is higher than a preset second coincidence degree comparison threshold, the second comparison unit determines that the text content represented by the character outline is the same as the text content represented by that sample character outline.
Further, the second coincidence degree comparison threshold is smaller than the first coincidence degree comparison threshold.
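The fallback performed by the second comparison unit can be sketched as follows: databases are visited in descending order of their font data similarity to the optimal database, and the search stops at the first database that yields a match above the second threshold. The dictionary layout, the function names and the Jaccard-based coincidence degree are illustrative assumptions.

```python
def jaccard(a, b):
    """Assumed coincidence degree: Jaccard overlap of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def second_pass(outline, databases, similarity_to_optimal, threshold, score=jaccard):
    """Try the other databases one by one, most similar to the optimal database first.

    databases: {name: [(text, sample_outline), ...]}
    similarity_to_optimal: {name: pre-stored font data similarity E0 to the optimal database}
    Returns (text, database_name), or (None, None) if no database yields a match.
    """
    order = sorted(databases, key=lambda name: similarity_to_optimal[name], reverse=True)
    for name in order:
        best_text, best_score = None, float("-inf")
        for text, sample in databases[name]:
            s = score(outline, sample)
            if s > best_score:
                best_text, best_score = text, s
        if best_score > threshold:
            # Identified: the remaining databases are not consulted.
            return best_text, name
    return None, None
```

An outline for which the sketch returns `(None, None)` corresponds to one that would be handed to the database construction unit.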
Compared with the prior art, the invention provides a database module, a data interaction module and a data processing module. The database module comprises a plurality of databases storing sample character outlines of different font types, and the data interaction module receives the question text pictures uploaded by the user side. Within the data processing module, the image analysis unit determines an optimal database based on the font types of some of the character outlines in the question text picture; the first comparison unit identifies the text content represented by the character outlines based on the optimal database and the exclusive database of the user side; the second comparison unit switches databases to identify the text content that the first comparison unit could not identify; and the database construction unit, after the text content represented by character outlines that neither comparison unit could identify has been confirmed, stores those character outlines in the exclusive database of the user side. Through this process, the optimal database is determined from the fonts of the character outlines in the question text picture; when a character outline cannot be identified with the optimal database, the system switches to the database whose font data similarity to the optimal database is highest; and when a character outline still cannot be identified, it is confirmed by the user side and stored in the exclusive database, thereby improving the efficiency and accuracy of text recognition.
In particular, the image analysis unit randomly screens the character outlines and determines the optimal database based on the font types of the screened character outlines. Limiting the proportion of screened character outlines to the total allows the randomly screened data to represent the whole while avoiding the computational load caused by screening too much data.
In particular, before the character outline is identified, the image analysis unit determines the optimal database, in the actual situation, the fonts corresponding to the character outline in the question picture uploaded by each user side have differences, so that the method selects the database corresponding to the font type based on the font type with the highest proportion in part of the character outlines as the optimal database, further reduces the influence of the font differences on character outline identification, and compares the character outlines with the data in the optimal database through the first comparison unit, thereby improving the efficiency and accuracy of text outline identification.
In particular, for character outlines whose text content the first comparison unit could not identify, the second comparison unit switches databases for identification. The databases are switched according to their font data similarity to the current optimal database, and databases with high similarity to the optimal database are selected first as the basis for comparison, which improves the efficiency and accuracy of character outline identification.
In particular, the first comparison unit and the second comparison unit have different coincidence ratio comparison thresholds, and the second comparison unit recognizes the character outline which cannot be recognized by the first comparison unit, so that the lower coincidence ratio comparison threshold is selected, and the recognition probability of the text outline is improved on the premise of ensuring the reliability.
In particular, the invention also provides a database construction unit which is used for sending the character outlines which cannot be identified by the first comparison unit and the second comparison unit to the user side for confirmation, and storing the character outlines after confirmation into a special database of the user side for subsequent character outline identification, wherein in actual situations, due to writing differences, some character outlines with difficult identification exist, and the identification probability of the character outlines can be effectively improved through the process.
Drawings
FIG. 1 is a schematic diagram of a question conversion system based on a big data cloud platform according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data processing module according to an embodiment of the invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of the question conversion system based on a big data cloud platform according to this embodiment, and FIG. 2 is a schematic diagram of the data processing module according to this embodiment. The question conversion system based on a big data cloud platform of the present invention comprises:
a database module comprising a plurality of databases and a plurality of exclusive databases, wherein each database corresponds to a different font type and stores the sample character outlines of that font type, and each exclusive database stores the sample character outlines exclusive to one user side;
a data interaction module for acquiring the question text pictures uploaded to the cloud platform by the user side;
a data processing module comprising an image analysis unit, a first comparison unit, a second comparison unit and a database construction unit which are connected with each other, wherein
the image analysis unit is connected with the data interaction module and is used for extracting the character outlines in a question text picture, randomly screening the character outlines, determining the font type to which each screened character outline belongs, determining the proportion of each font type, and determining an optimal database based on the font type with the highest proportion;
the first comparison unit is connected with the database module and is used for acquiring the character outlines extracted by the image analysis unit, comparing each character outline with the sample character outlines in the exclusive database of the user side and in the optimal database, and identifying the text content represented by each character outline according to the comparison result;
the second comparison unit is connected with the database module and is used for acquiring the character outlines whose text content the first comparison unit could not identify, sorting the databases in descending order of their font data similarity to the optimal database, selecting the databases one by one in that order, comparing each unidentified character outline with the sample character outlines in the selected database, and identifying the text content represented by each character outline according to the comparison result;
the database construction unit is connected with the database module and the data interaction module and is used for acquiring the character outlines that the second comparison unit could not identify, sending them to the user side through the data interaction module so that the text content they represent can be determined, and, after the user side confirms, storing the character outlines as sample character outlines in the exclusive database of the user side.
Specifically, this embodiment does not limit the way in which the image analysis unit acquires character outlines; it can be implemented with an existing OCR engine, and the character outlines can be obtained by segmenting the image, as in prior-art segmentation methods, which will not be described here again.
Specifically, the specific structure of the data processing module is not limited in this embodiment, and each unit may be configured using a logic unit, and the logic unit may be a field programmable logic unit, a microprocessor, a processor used in a computer, or the like.
Specifically, the specific structure of the database module is not limited in this embodiment, and the database is in a common data storage form, which is a mature prior art and will not be described herein.
Specifically, the specific structure of the data interaction module is not limited in this embodiment, and only needs to be connected with the cloud platform, which is a mature prior art and will not be described herein.
Specifically, this embodiment does not limit the specific similarity algorithm. Algorithms commonly used in the character recognition field include the cosine similarity algorithm and the Euclidean distance similarity algorithm, and those skilled in the art can select a suitable one according to specific needs to calculate the similarity between a character outline and a sample character outline; this is prior art and will not be described in detail here.
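For illustration, the two similarity measures named above can be written as follows, assuming each outline has first been converted into a fixed-length numeric feature vector (a step not specified in this embodiment). Mapping the Euclidean distance d to a similarity as 1 / (1 + d) is one common convention and is an assumption here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_similarity(u, v):
    """Assumed mapping of Euclidean distance d to a similarity in (0, 1]: 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(u, v))
```

Either function could serve as the similarity used when assigning a font type to a screened outline or when computing the pairwise similarities between databases.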
Specifically, the specific form of the cloud platform is not limited, and only the data uploaded by each user side needs to be received, which is the prior art and is not described herein.
Specifically, a proportion interval [20%, 40%] is set in the image analysis unit, and the proportion of character outlines selected by random screening to the total number of character outlines shall fall within the interval [20%, 40%], so as to ensure that the screened data is representative of the whole while avoiding screening so much data that the computation speed is affected.
Specifically, the sample character outline data stored in each database of the database module may be obtained from an open source dictionary database, or may be constructed in advance by a person skilled in the art for the different font types; the sample character outlines of different font types so constructed may be obtained by crawling with a big data crawler program or by any other feasible method.
Specifically, the image analysis unit calculates the similarity between the screened text outline and the sample text outline in each database, determines the sample text outline with the highest similarity, determines the database to which the sample text outline belongs, and determines the font type corresponding to the database as the font type to which the screened text outline belongs.
Specifically, after determining the font type to which each screened character outline belongs, the image analysis unit calculates the proportion P of each font type according to formula (1),
$$P = \frac{n_i}{N_0} \qquad (1)$$
in formula (1), $n_i$ represents the number of screened character outlines belonging to the i-th font type, $N_0$ represents the total number of screened character outlines, and i is an integer greater than 0;
the image analysis unit determines the font type with the highest proportion and takes the database corresponding to that font type as the optimal database.
Specifically, the image analysis unit determines the optimal database before recognizing the text outline, and in actual conditions, fonts corresponding to the text outline in the question picture uploaded by each user side have differences, so that the method selects the database corresponding to the font type based on the font type with the highest proportion in part of the text outlines as the optimal database, further reduces the influence of the font differences on the text outline recognition, and compares the text outline with data in the optimal database through the first comparison unit, thereby improving the efficiency and accuracy of text outline recognition.
Specifically, the first comparison unit compares each text outline with each text outline in the exclusive database of the user side and the optimal database, and identifies text content represented by each text outline according to the comparison result,
the first comparison unit compares the character outline with various text character outlines to calculate the coincidence degree of the character outline and the sample character outline, screens out the sample character outline with the highest coincidence degree, and if the highest coincidence degree corresponding to the sample character outline is higher than a preset first coincidence degree comparison threshold value, the first comparison unit recognizes that the character content represented by the character outline is identical with the character content represented by the sample character outline.
Specifically, the second comparison unit pre-stores the font data similarity E0 of every pair of databases, calculated according to formula (2),
$$E_0 = \frac{1}{N}\sum_{i=1}^{N} e_i \qquad (2)$$
in formula (2), N represents the number of sample character outlines in each database, and $e_i$ represents the similarity between the i-th sample character outline in the first of the two databases and the i-th sample character outline in the second.
Specifically, for character outlines whose text content the first comparison unit could not identify, the second comparison unit switches databases for identification. The databases are switched according to their font data similarity to the current optimal database, and databases with high similarity to the optimal database are selected first as the basis for comparison, which improves the efficiency and accuracy of character outline identification.
Specifically, once the second comparison unit has identified the text content represented by a character outline, it does not compare that character outline with the sample character outlines in the remaining databases.
Specifically, the second comparison unit compares the character outline with each sample character outline to calculate the coincidence degree between them and selects the sample character outline with the highest coincidence degree; if that highest coincidence degree is higher than a preset second coincidence degree comparison threshold, the second comparison unit determines that the text content represented by the character outline is the same as the text content represented by that sample character outline.
Specifically, the second coincidence degree comparison threshold is smaller than the first coincidence degree comparison threshold.
Specifically, the first and second coincidence degree comparison thresholds are determined as follows. Several question text pictures with a character outline count of 10000 are selected and processed by the image analysis unit to obtain their character outlines. Each character outline is compared with the sample character outlines in the optimal database to calculate coincidence degrees, the sample character outline with the highest coincidence degree is selected, and the character outline is identified accordingly. Whether each identification result is accurate is then determined, the accurately identified character outlines are screened out, and the highest coincidence degree of each with the sample character outlines in the optimal database is recorded. Taking this highest coincidence degree as a random variable, its probability density function is computed as a normal distribution curve, the 95% confidence interval is calculated from the probability density function, and the first and second coincidence degree comparison thresholds are set on the basis of that confidence interval.
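The statistical part of this calibration can be sketched as follows, assuming that the "95% confidence interval" means the central 95% interval of a normal distribution fitted to the recorded highest coincidence degrees. How the two thresholds are then read off the interval is not spelled out, so the sketch stops at the interval itself. The function name is hypothetical.

```python
from statistics import NormalDist


def central_interval(scores, confidence=0.95):
    """Fit a normal distribution to the scores and return its central interval.

    `scores` are the highest coincidence degrees recorded for accurately identified
    outlines; at least two values are needed to estimate the standard deviation.
    """
    dist = NormalDist.from_samples(scores)
    tail = (1.0 - confidence) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)
```

For a normal fit, the returned bounds are approximately the mean minus and plus 1.96 standard deviations.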
Specifically, the first comparison unit and the second comparison unit have different coincidence ratio comparison thresholds, and the second comparison unit recognizes the character outline which cannot be recognized by the first comparison unit, so that the lower coincidence ratio comparison threshold is selected, and the recognition probability of the text outline is improved on the premise of ensuring the reliability.
Specifically, the invention also provides a database construction unit, which sends the text outlines that neither the first comparison unit nor the second comparison unit can identify to the user side for confirmation and, after confirmation, stores them in the exclusive database of the user side for use in subsequent text outline identification. In practice, differences in handwriting produce some text outlines that are difficult to identify, and this process effectively improves the probability of identifying them.
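The confirmation loop can be sketched as follows, as a rough illustration only. The callback `ask_user` is a hypothetical stand-in for the round trip to the user side, and the exclusive database is modelled as a plain list of (outline, content) pairs.

```python
def confirm_and_store(unrecognized, ask_user, exclusive_db):
    """Send each unrecognized outline to the user side and store confirmed ones.

    ask_user(outline) is assumed to return the confirmed text content,
    or None when the user does not confirm.
    """
    for outline in unrecognized:
        content = ask_user(outline)
        if content is not None:
            # Stored as a sample text outline for later comparisons.
            exclusive_db.append((outline, content))
    return exclusive_db


# Hypothetical user answers keyed by an outline identifier.
answers = {"outline-1": "A", "outline-2": None}
db = confirm_and_store(["outline-1", "outline-2"], answers.get, [])
print(db)
```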
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (10)

1. A topic conversion system based on a big data cloud platform, characterized by comprising:
the database module comprises a plurality of databases and a plurality of exclusive databases, wherein each database corresponds to different font types and is used for storing sample text outlines corresponding to the font types, and each exclusive database is used for storing sample text outlines exclusive to a user side;
the data interaction module is used for acquiring the topic text pictures uploaded to the cloud platform by the user side;
the data processing module comprises an image analysis unit, a first comparison unit, a second comparison unit and a database construction unit which are connected with each other,
the image analysis unit is connected with the data interaction module and is used for extracting text outlines in the topic text pictures, randomly screening the text outlines, determining the font types to which the screened text outlines belong, determining the proportion of each font type, and determining an optimal database based on the font type with the highest proportion;
the first comparison unit is connected with the database module and is used for acquiring the plurality of text outlines extracted by the image analysis unit, comparing each text outline with the sample text outlines in the exclusive database of the user side and in the optimal database, and identifying the text content represented by each text outline according to the comparison results;
the second comparison unit is connected with the database module and is used for acquiring the text outline of the text content which cannot be identified by the first comparison unit, sorting the similarity of the font data of each database and the optimal database in a descending order, selecting the databases one by one based on the sorting result, comparing the text outline of each text content which cannot be identified with the sample text outline in the selected databases, and identifying the text content represented by each text outline according to the comparison result;
the database construction unit is connected with the database module and the data interaction module, and is used for acquiring the text outlines which cannot be identified by the second comparison unit, sending the text outlines to the user side through the data interaction module so that the text content represented by each text outline is determined there, and, after the user side confirms, storing the text outlines as sample text outlines in the exclusive database of the user side.
2. The topic conversion system based on a big data cloud platform of claim 1, wherein a proportion interval [20%, 40%] is set in the image analysis unit, and the proportion of the text outlines selected by the image analysis unit during random screening to the total number of text outlines falls within the proportion interval [20%, 40%].
3. The topic conversion system based on a big data cloud platform of claim 1, wherein the image analysis unit calculates the similarity between each screened text outline and the sample text outlines in each database, determines the sample text outline with the highest similarity, determines the database to which that sample text outline belongs, and takes the font type corresponding to that database as the font type to which the screened text outline belongs.
4. The topic conversion system based on a big data cloud platform as claimed in claim 3, wherein the image analysis unit determines the font types to which the screened text outlines belong and calculates the proportion P of each font type according to formula (1),
$$P = \frac{n_i}{N_0} \tag{1}$$
in formula (1), $n_i$ represents the number of screened text outlines belonging to the i-th font type, $N_0$ represents the total number of screened text outlines, and i is an integer greater than 0.
5. The topic conversion system based on a big data cloud platform of claim 4, wherein the image analysis unit determines the font type with the highest proportion and takes the database corresponding to that font type as the optimal database.
6. The topic conversion system based on a big data cloud platform of claim 1, wherein the first comparison unit compares each text outline with the sample text outlines in the exclusive database of the user side and in the optimal database, and identifies the text content represented by each text outline according to the comparison result, wherein,
the first comparison unit compares the text outline with each sample text outline to calculate the coincidence degree between the text outline and the sample text outline, and selects the sample text outline with the highest coincidence degree; if that highest coincidence degree is higher than a preset first coincidence degree comparison threshold, the first comparison unit recognizes the text content represented by the text outline as identical to the text content represented by that sample text outline.
7. The topic conversion system based on a big data cloud platform of claim 6, wherein the font data similarity E0 of any two databases is prestored in the second comparison unit, the font data similarity E0 being calculated according to formula (2),
$$E_0 = \frac{1}{N}\sum_{i=1}^{N} e_i \tag{2}$$
in formula (2), N represents the number of sample text outlines in the databases, and $e_i$ represents the similarity between the i-th sample text outline in the first of the two databases and the i-th sample text outline in the second.
8. The topic conversion system based on a big data cloud platform of claim 7, wherein, after identifying the text content represented by a text outline, the second comparison unit does not compare that text outline with the sample text outlines in the remaining databases.
9. The topic conversion system based on a big data cloud platform of claim 8, wherein the second comparison unit compares each text outline whose text content could not be identified with the sample text outlines in the selected database, and identifies the text content represented by each text outline based on the comparison result, wherein,
the second comparison unit compares the text outline with each sample text outline to calculate the coincidence degree between the text outline and the sample text outline, and selects the sample text outline with the highest coincidence degree; if that highest coincidence degree is higher than a preset second coincidence degree comparison threshold, the second comparison unit recognizes the text content represented by the text outline as identical to the text content represented by that sample text outline.
10. The topic conversion system based on a big data cloud platform of claim 9, wherein the second coincidence degree comparison threshold is smaller than the first coincidence degree comparison threshold.
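The font type determination recited in claims 2 to 5 can be sketched in Python as below. This is a hedged illustration under simplifying assumptions: outlines are sets of pixel indices, the similarity measure is a Jaccard overlap chosen only for the example, the font labels are invented, and `pick_optimal_database` is a hypothetical name. For very small inputs, rounding may push the screened share slightly outside the 20% to 40% interval.

```python
import random
from collections import Counter


def jaccard(a, b):
    """Toy similarity between two outlines given as sets of pixels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_optimal_database(outlines, databases, ratio=0.3, rng=None):
    """databases maps a font type to its list of sample outlines.

    Returns (optimal font type, proportion of each font type among
    the randomly screened outlines).
    """
    if not 0.2 <= ratio <= 0.4:
        raise ValueError("ratio must lie in the interval [20%, 40%]")
    rng = rng or random.Random()
    count = max(1, round(len(outlines) * ratio))
    screened = rng.sample(outlines, count)

    votes = Counter()
    for outline in screened:
        # Font type of the database holding the most similar sample outline.
        font = max(
            databases,
            key=lambda f: max(jaccard(outline, s) for s in databases[f]),
        )
        votes[font] += 1

    proportions = {font: n / len(screened) for font, n in votes.items()}
    return max(proportions, key=proportions.get), proportions


databases = {"song": [frozenset({1, 2, 3})], "kai": [frozenset({7, 8, 9})]}
outlines = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})] * 4
print(pick_optimal_database(outlines, databases, rng=random.Random(0)))
```

Screening only a fraction of the outlines trades a small risk of picking the wrong font type for a large reduction in the number of similarity computations.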
CN202310321956.0A 2023-03-29 2023-03-29 Question conversion system based on big data cloud platform Active CN116049461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310321956.0A CN116049461B (en) 2023-03-29 2023-03-29 Question conversion system based on big data cloud platform

Publications (2)

Publication Number Publication Date
CN116049461A CN116049461A (en) 2023-05-02
CN116049461B CN116049461B (en) 2023-05-30

Family

ID=86125877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310321956.0A Active CN116049461B (en) 2023-03-29 2023-03-29 Question conversion system based on big data cloud platform

Country Status (1)

Country Link
CN (1) CN116049461B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758551A (en) * 2023-07-03 2023-09-15 读书郎教育科技有限公司 OCR character recognition method applied to dictionary pen

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5253307A (en) * 1991-07-30 1993-10-12 Xerox Corporation Image analysis to obtain typeface information
CN103400127A (en) * 2013-08-05 2013-11-20 苏州鼎富软件科技有限公司 Picture and text identifying method
WO2015183015A1 (en) * 2014-05-30 2015-12-03 삼성에스디에스 주식회사 Character recognition method and apparatus therefor
CN106570538A (en) * 2015-10-10 2017-04-19 北大方正集团有限公司 Character picture processing method and apparatus thereof
CN109784146A (en) * 2018-12-05 2019-05-21 广州企图腾科技有限公司 A kind of font type recognition methods, electronic equipment, storage medium
CN110197238A (en) * 2019-04-15 2019-09-03 广州企图腾科技有限公司 A kind of recognition methods, system and the terminal device of font classification
CN112784932A (en) * 2021-03-01 2021-05-11 北京百炼智能科技有限公司 Font identification method and device and storage medium

Also Published As

Publication number Publication date
CN116049461A (en) 2023-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant