CN116206319A - Data processing system for clinical trials - Google Patents

Info

Publication number
CN116206319A
Authority
CN
China
Prior art keywords
font
text
character
outline
outlines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310142830.7A
Other languages
Chinese (zh)
Other versions
CN116206319B (en)
Inventor
陈筱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongxing Zhengyuan Technology Co ltd
Original Assignee
Beijing Zhongxing Zhengyuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongxing Zhengyuan Technology Co ltd
Priority to CN202310142830.7A
Publication of CN116206319A
Application granted
Publication of CN116206319B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a data processing system for clinical trials comprising a data storage module, a data acquisition module and a data processing module. The data processing module determines the font similarity condition of a handwritten test record text from the proportion of character outlines in a character outline set that share the same font type. Under the first font similarity condition, all character outlines are compared with the preset character outlines stored in a randomly selected font database to determine the characters they represent. Under the second font similarity condition, the font database corresponding to the font type with the highest proportion in the character outline set is selected, and the character represented by each character outline is determined one by one from its coincidence degree with the preset character outlines stored in the selected font database. The efficiency and accuracy of recognizing the handwritten test record text are thereby improved while reliability is maintained.

Description

Data processing system for clinical trials
Technical Field
The invention relates to the technical field of data processing, in particular to a data processing system for clinical trials.
Background
During a clinical trial, clinical data related to the trial must be recorded while patients participate as subjects. How well the clinical data are recognized directly affects the reliability of the trial data and has an important bearing on determining the efficacy and safety of the trial drug, and the speed of recognition directly affects the efficiency of data entry.
Chinese patent publication No. CN109102844A discloses a method for automatically checking clinical trial source data, comprising the following steps: identifying the acquired source data image of the clinical trial by using a CTPN network model to determine the text regions, then cutting the text regions into individual lines of text; performing vertical-projection column cutting on each cut line to obtain the effective text region of each line; sequentially inputting the set of effective text regions into a trained CRNN network to obtain variable-length sequence recognition results, and extracting the text recognition results by using regular expressions; correcting the text recognition results to obtain error correction results; and extracting characteristic values one by one from the error correction results according to a characteristic value set, comparing them with standard characteristic values recorded in a database, and marking an alarm state for any extracted characteristic value that does not match the standard characteristic value to form an error reminder. That invention performs character recognition on clinical trial source data images with CTPN and CRNN as its core, thereby realizing automatic data verification.
However, the prior art has the following problem:
the prior art does not consider that differences between fonts in handwritten text affect the accuracy of text recognition, nor does it consider providing comparison databases for multiple fonts and determining the font of the handwritten text by comparison.
Disclosure of Invention
To solve the problem that the prior art neither accounts for the effect of font differences in handwritten text on recognition accuracy nor provides comparison databases for multiple fonts to determine the font of the handwritten text, the invention provides a data processing system for clinical trials, comprising:
a data storage module comprising a plurality of font databases for storing a plurality of preset character outlines corresponding to the respective font types;
a data acquisition module comprising an image acquisition unit for photographing a handwritten test record text to acquire an image;
a data processing module comprising an image analysis unit, a first operation unit and a second operation unit that are connected to one another, each of which is also connected to the data acquisition module and the data storage module, wherein
the image analysis unit is used for acquiring the image captured by the image acquisition unit, extracting the character outlines of preset lines from the image to obtain a character outline set, comparing each character outline in the set with the data in each font database, determining the font type of each character outline from the comparison result, and determining the font similarity condition of the handwritten test record text from the proportion of character outlines in the set that share the same font type;
the first operation unit is used for, when the image analysis unit determines the first font similarity condition, extracting all character outlines in the image, comparing them one by one with the preset character outlines stored in a randomly selected font database to calculate the coincidence degree, and determining the character represented by each character outline based on the coincidence degree;
the second operation unit is used for, when the image analysis unit determines the second font similarity condition, extracting all character outlines in the image, selecting the font database corresponding to the font type with the highest proportion in the character outline set, determining one by one the coincidence degree of each character outline with the preset character outlines stored in the selected font database, and determining the character represented by each character outline based on the coincidence degree.
Further, the image analysis unit compares each character outline in the character outline set with the preset character outlines in each font database to calculate the coincidence degree C between the character outline and each preset character outline, selects the maximum coincidence degree Cm, compares Cm with a preset maximum coincidence degree comparison threshold Cm0, and determines the font type of the character outline from the comparison result, wherein
the image analysis unit identifies the font database that yielded the maximum coincidence degree Cm;
under a first coincidence degree comparison result, the image analysis unit determines that the character outline belongs to the font type corresponding to that font database;
under a second coincidence degree comparison result, the image analysis unit determines that the character outline does not belong to the font type corresponding to that font database;
the first coincidence degree comparison result is Cm ≥ Cm0, and the second coincidence degree comparison result is Cm < Cm0.
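The font type decision above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_font`, the dictionary layout of the font databases and the `coincidence` callback are all assumptions, since the source leaves the coincidence degree calculation open.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple, TypeVar

Outline = TypeVar("Outline")  # hypothetical outline representation


def classify_font(
    outline: Outline,
    font_dbs: Dict[str, Sequence[Outline]],
    coincidence: Callable[[Outline, Outline], float],
    cm0: float,
) -> Tuple[Optional[str], float]:
    """Return (font type or None, Cm) for a single character outline."""
    best_font: Optional[str] = None
    cm = float("-inf")
    # Compare against every preset outline in every font database and
    # remember which database produced the maximum coincidence degree Cm.
    for font, presets in font_dbs.items():
        for preset in presets:
            c = coincidence(outline, preset)
            if c > cm:
                best_font, cm = font, c
    # First comparison result (Cm >= Cm0): the outline belongs to that font type.
    # Second comparison result (Cm < Cm0): it does not, so no font is assigned.
    return (best_font if cm >= cm0 else None), cm
```

With a toy one-dimensional "outline" and `coincidence = lambda a, b: 1 - abs(a - b)`, an outline of 0.85 matched against `{"kai": [0.0, 0.5], "song": [0.9]}` with Cm0 = 0.8 is assigned to the hypothetical "song" database.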
Further, the image analysis unit calculates the character outline quantity ratio P of each font type in the character outline set according to formula (1),
$$P = \frac{n}{N} \qquad (1)$$
in formula (1), n represents the number of character outlines belonging to the same font type, and N represents the number of character outlines in the character outline set.
Further, the image analysis unit screens the calculated character outline quantity ratios of the font types to select the maximum quantity ratio P_M, compares P_M with a preset ratio comparison threshold P0, and determines the font similarity condition of the handwritten test record text from the comparison result, wherein
if the comparison result meets a first ratio condition, the image analysis unit determines that the handwritten test record text is in the first font similarity condition;
if the comparison result meets a second ratio condition, the image analysis unit determines that the handwritten test record text is in the second font similarity condition;
the first ratio condition is P_M < P0, and the second ratio condition is P_M ≥ P0.
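A short sketch of the ratio calculation and the threshold test may make the two conditions concrete. The function names and the convention that `None` marks an outline with no assigned font type are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def font_ratios(font_labels: Sequence[Optional[str]]) -> Dict[str, float]:
    """Quantity ratio P = n / N for every font type found in the outline set."""
    total = len(font_labels)  # N: all outlines in the set, labelled or not
    counts = Counter(label for label in font_labels if label is not None)
    return {font: n / total for font, n in counts.items()}


def font_similarity_condition(
    font_labels: Sequence[Optional[str]], p0: float
) -> Tuple[str, Optional[str]]:
    """Return ("first" | "second", font type holding the maximum ratio)."""
    ratios = font_ratios(font_labels)
    if not ratios:
        # No outline could be assigned a font type: treat as the first condition.
        return "first", None
    dominant = max(ratios, key=ratios.get)
    # Largest ratio >= P0 gives the second condition, otherwise the first.
    condition = "second" if ratios[dominant] >= p0 else "first"
    return condition, dominant
```

For five outlines labelled `["kai", "kai", "kai", "song", None]`, the largest ratio is 3/5, so a threshold P0 of 0.5 gives the second font similarity condition and a threshold of 0.7 gives the first.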
Further, the first operation unit or the second operation unit compares the character outline one by one with the plurality of preset character outlines stored in the selected font database to calculate the coincidence degree C between the character outline and each preset character outline, selects the maximum coincidence degree Cm, compares Cm with a preset standard coincidence degree comparison threshold C0, where C0 > Cm0, and determines the character represented by the character outline from the comparison result, wherein
under a third coincidence degree comparison result, the first operation unit or the second operation unit determines that the character outline represents the character associated with the preset character outline;
under a fourth coincidence degree comparison result, the first operation unit or the second operation unit determines that the character represented by the character outline cannot be recognized;
the third coincidence degree comparison result is Cm > C0, and the fourth coincidence degree comparison result is Cm ≤ C0.
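The character decision can be sketched in the same style. The name `recognise_character` and the representation of a font database as (preset outline, character) pairs are assumptions; note that the comparison with C0 is strict, unlike the font type test against Cm0.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Outline = TypeVar("Outline")  # hypothetical outline representation


def recognise_character(
    outline: Outline,
    font_db: Sequence[Tuple[Outline, str]],
    coincidence: Callable[[Outline, Outline], float],
    c0: float,
) -> Optional[str]:
    """Return the recognised character, or None if it cannot be recognised."""
    best_char: Optional[str] = None
    cm = float("-inf")
    for preset, char in font_db:
        c = coincidence(outline, preset)
        if c > cm:
            best_char, cm = char, c
    # Third comparison result (Cm > C0): accept the best-matching character.
    # Fourth comparison result (Cm <= C0): the character is unrecognised.
    return best_char if cm > c0 else None
```

Because C0 > Cm0, an outline can be close enough to a database to count toward its font type yet still fail this stricter test and be passed on as unrecognised.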
Further, the data storage module further comprises a database analysis unit for determining the similarity between the font databases from the coincidence degree of the preset character outlines stored in them, wherein
the database analysis unit selects any two font databases, retrieves preset character outlines from the two font databases one by one for comparison to determine the coincidence degree of the retrieved preset character outlines, and calculates the similarity S between the selected font databases according to formula (2),
$$S = \frac{1}{N_z} \sum_{i=1}^{N_z} C_i \qquad (2)$$
in formula (2), C_i represents the coincidence degree between the two preset character outlines selected at the i-th comparison, and N_z represents the number of preset character outlines in the font database.
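As a rough sketch of the database similarity calculation, the code below assumes that S is taken as the mean of the pairwise coincidence degrees and that the two databases list their preset outlines in the same character order, so that the i-th outlines of each describe the same character. Both points are assumptions; the source only defines the symbols of formula (2).

```python
from typing import Callable, Sequence, TypeVar

Outline = TypeVar("Outline")  # hypothetical outline representation


def database_similarity(
    db_a: Sequence[Outline],
    db_b: Sequence[Outline],
    coincidence: Callable[[Outline, Outline], float],
) -> float:
    """Mean coincidence degree over index-aligned preset outlines (assumed form of S)."""
    if not db_a or len(db_a) != len(db_b):
        raise ValueError("databases must be non-empty and index-aligned")
    # The i-th comparison pairs the i-th preset outline of each database.
    total = sum(coincidence(a, b) for a, b in zip(db_a, db_b))
    return total / len(db_a)
```

Since the similarity depends only on the stored presets, it could be computed once per pair of databases and cached rather than recomputed for every image.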
Further, the second operation unit obtains the character outlines whose characters could not be recognized, reselects a font database based on the similarity between the font databases, determines one by one the coincidence degree of each such character outline with the preset character outlines stored in the reselected font database, and determines the character represented by each such character outline again based on the coincidence degree.
Further, the second operation unit reselects the font database based on the similarity between the font databases, wherein
the second operation unit identifies the font database that was used when the characters represented by the character outlines were determined, identifies from the similarity the font database most similar to it, and takes that font database as the one to be reselected.
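The reselection step amounts to a lookup in a table of pairwise similarities. In this sketch the table is assumed to be a dictionary keyed by unordered pairs of database names; that layout and the function name are illustrative only.

```python
from typing import Dict, Optional, Tuple


def most_similar_database(
    used: str, similarity: Dict[Tuple[str, str], float]
) -> Optional[str]:
    """Return the database most similar to `used`, or None if no pair involves it."""
    candidates: Dict[str, float] = {}
    for (a, b), s in similarity.items():
        # Each pair is stored once, so check both positions.
        if a == used and b != used:
            candidates[b] = s
        elif b == used and a != used:
            candidates[a] = s
    return max(candidates, key=candidates.get) if candidates else None
```

Only the outlines that remained unrecognised after the first pass would then be compared against the returned database.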
Further, the data processing module further comprises a record integration unit, which is connected with the first operation unit, the second operation unit and the data storage module, and is used for recording the characters determined for the character outlines one by one, in the order in which the character outlines appear in the image, to generate an integrated text of the handwritten test record text, and for storing the integrated text in the data storage module.
Further, the record integration unit determines, for each character outline, whether to record O in place of the character it represents, wherein
under a preset condition, the record integration unit records O in place of the character represented by the character outline;
the preset condition is that neither the first operation unit nor the second operation unit can determine the character represented by the character outline.
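The record integration step can be illustrated with a few lines of code. The input format, a collection of (position in image, character or `None`) pairs, is an assumption; the placeholder "O" is the one named in the text.

```python
from typing import Iterable, Optional, Tuple


def integrate_text(
    results: Iterable[Tuple[int, Optional[str]]], placeholder: str = "O"
) -> str:
    """Join recognised characters in image order, using a placeholder for failures."""
    # Sort by the outline's position so the text follows the order in the image.
    ordered = sorted(results, key=lambda item: item[0])
    # None means neither operation unit could determine the character.
    return "".join(placeholder if ch is None else ch for _, ch in ordered)
```

Keeping one placeholder per unrecognised outline preserves the length and alignment of the record, which makes it easy to locate the positions that need manual review.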
Compared with the prior art, the invention is advantageous in that a data storage module, a data acquisition module and a data processing module are provided. The data processing module determines the font similarity condition of the handwritten test record text from the proportion of character outlines in the character outline set that share the same font type. Under the first font similarity condition, all character outlines are compared one by one with the preset character outlines stored in a randomly selected font database to determine the characters they represent. Under the second font similarity condition, the font database corresponding to the font type with the highest proportion in the character outline set is selected, and the coincidence degree of each character outline with the preset character outlines stored in the selected font database is determined one by one to determine the character it represents. The efficiency and effect of recognizing handwritten test record texts in different fonts are thereby improved.
In particular, the image analysis unit compares each character outline in the character outline set with the data in each font database and determines the font type of each character outline from the comparison result. In practice, the coincidence degree characterizes how similar a character outline is to a preset character outline: the higher the similarity, the more likely the two are the same outline. Comparing the character outline with the plurality of preset character outlines in a font database gives its coincidence degree with each of them, and the preset character outline with the largest coincidence degree is the one in that font database most similar to the character outline. The font type of the character outline is thereby determined on a sound basis, which ensures the accuracy of the subsequent recognition of the handwritten test record text.
In particular, the image analysis unit determines the font similarity condition of the handwritten test record text from the proportion of character outlines in the character outline set that share the same font type. In practice, the ratio is the number of character outlines of a given font type divided by the number of character outlines in the set, and represents the share of each font type in the set. The larger the ratio, the more likely it is that the character outlines in the handwritten test record text from which the set was taken belong to that font type. If the ratios of all font types are low, the font type of the character outlines in the handwritten test record text cannot be confirmed.
In particular, under the first font similarity condition the first operation unit extracts all character outlines in the image, compares them one by one with the preset character outlines stored in a randomly selected font database to calculate the coincidence degree, and determines the characters represented by the character outlines based on the coincidence degree. Under the first font similarity condition the ratios of all font types are low, so the font type of the character outlines in the handwritten test record text cannot be confirmed; all character outlines in the image are therefore compared one by one with a randomly selected font database to determine the characters they represent, which ensures the reliability of character outline recognition and the recognition effect for the handwritten test record text.
In particular, under the second font similarity condition the second operation unit extracts all character outlines in the image, selects the font database corresponding to the font type with the highest proportion in the character outline set, and determines one by one the coincidence degree of each character outline with the preset character outlines stored in the selected font database. The font database closest to the font of the handwritten test record text is thus given priority in the comparison, which further improves the efficiency and precision of text recognition while maintaining reliability.
In particular, the second operation unit extracts the character outlines whose characters could not be recognized and reselects a font database for comparison, the reselected font database being the one most similar to the font database already used, which further improves the efficiency and precision of text recognition while maintaining reliability.
Drawings
FIG. 1 is a schematic diagram of a data processing system for clinical trials according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a data storage module according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a data processing module according to an embodiment of the invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or a communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to FIG. 1, FIG. 2 and FIG. 3, which are schematic diagrams of the data processing system for clinical trials, the data storage module and the data processing module according to an embodiment of the invention, the data processing system for clinical trials of the invention comprises:
a data storage module comprising a plurality of font databases for storing a plurality of preset character outlines corresponding to the respective font types;
a data acquisition module comprising an image acquisition unit for photographing a handwritten test record text to acquire an image;
a data processing module comprising an image analysis unit, a first operation unit and a second operation unit that are connected to one another, each of which is also connected to the data acquisition module and the data storage module, wherein
the image analysis unit is used for acquiring the image captured by the image acquisition unit, extracting the character outlines of preset lines from the image to obtain a character outline set, comparing each character outline in the set with the data in each font database, determining the font type of each character outline from the comparison result, and determining the font similarity condition of the handwritten test record text from the proportion of character outlines in the set that share the same font type;
the first operation unit is used for, when the image analysis unit determines the first font similarity condition, extracting all character outlines in the image, comparing them one by one with the preset character outlines stored in a randomly selected font database to calculate the coincidence degree, and determining the character represented by each character outline based on the coincidence degree;
the second operation unit is used for, when the image analysis unit determines the second font similarity condition, extracting all character outlines in the image, selecting the font database corresponding to the font type with the highest proportion in the character outline set, determining one by one the coincidence degree of each character outline with the preset character outlines stored in the selected font database, and determining the character represented by each character outline based on the coincidence degree.
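The division of work between the two operation units comes down to how the font database is chosen. A minimal sketch, assuming the font similarity condition is passed in as the string "first" or "second" and the databases are identified by name (both hypothetical conventions):

```python
import random
from typing import Optional, Sequence


def select_font_database(
    condition: str,
    dominant_font: Optional[str],
    font_names: Sequence[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the font database to compare against for the given condition."""
    rng = rng or random.Random()
    if condition == "second" and dominant_font in font_names:
        # Second operation unit: use the database of the most frequent font type.
        return dominant_font
    # First operation unit: no font type dominates, so pick a database at random.
    return rng.choice(list(font_names))
```

Passing a seeded `random.Random` makes the random choice reproducible, which can help when comparing recognition results across runs.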
Specifically, the form of the data storage module is not limited, provided that it can store data; this is prior art and is not described in detail here.
Specifically, the form of the data processing module is not limited; it may be an external computer in which each unit is a separate functional program, provided that it can perform data processing and data exchange; this is not described in detail here.
Specifically, the method of calculating the outline coincidence degree is not limited; it may be based on the similarity of the shapes or take other forms, and the relevant algorithm models are mature prior art and are not described again here.
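Since the calculation is left open, one common shape similarity measure that could serve as the coincidence degree is intersection over union (IoU) of two binary masks. The sketch below is only one possible choice, not the method of the invention, and it assumes both outlines have already been rasterised to masks of the same size.

```python
from typing import Sequence


def coincidence_degree(
    mask_a: Sequence[Sequence[int]], mask_b: Sequence[Sequence[int]]
) -> float:
    """Intersection over union of two same-sized binary masks, in [0, 1]."""
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have identical dimensions")
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks share no ink; report 0.0 rather than dividing by zero.
    return intersection / union if union else 0.0
```

A measure bounded in [0, 1] keeps the thresholds Cm0 and C0 easy to interpret, although handwritten outlines would normally need size and position normalisation before such a pixel-wise comparison is meaningful.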
Specifically, the image analysis unit compares each character outline in the character outline set with the preset character outlines in each font database to calculate the coincidence degree C between the character outline and each preset character outline, selects the maximum coincidence degree Cm, compares Cm with a preset maximum coincidence degree comparison threshold Cm0, where Cm0 > 0, and determines the font type of the character outline from the comparison result, wherein
the image analysis unit identifies the font database that yielded the maximum coincidence degree Cm;
under a first coincidence degree comparison result, the image analysis unit determines that the character outline belongs to the font type corresponding to that font database;
under a second coincidence degree comparison result, the image analysis unit determines that the character outline does not belong to the font type corresponding to that font database;
the first coincidence degree comparison result is Cm ≥ Cm0, and the second coincidence degree comparison result is Cm < Cm0.
Specifically, the image analysis unit compares each character outline in the character outline set with the data in each font database and determines the font type of each character outline from the comparison result. In practice, the coincidence degree characterizes how similar a character outline is to a preset character outline: the higher the similarity, the more likely the two are the same outline. Comparing the character outline with the plurality of preset character outlines in a font database gives its coincidence degree with each of them, and the preset character outline with the largest coincidence degree is the one in that font database most similar to the character outline. The similarity between the character outline and the font database is thus quantified reliably, and the font type corresponding to a font database whose similarity to the character outline reaches the preset value is taken as the font type of the character outline, which ensures the accuracy of the subsequent recognition of the handwritten test record text.
Specifically, the image analysis unit calculates the character outline quantity ratio P of each font type in the character outline set according to formula (1),
$$P = \frac{n}{N} \qquad (1)$$
in formula (1), n represents the number of character outlines belonging to the same font type, and N represents the number of character outlines in the character outline set.
Specifically, the image analysis unit screens the calculated character outline quantity ratios of the font types to select the maximum quantity ratio P_M, compares P_M with a preset ratio comparison threshold P0, where P0 > 0, and determines the font similarity condition of the handwritten test record text from the comparison result, wherein
if the comparison result meets a first ratio condition, the image analysis unit determines that the handwritten test record text is in the first font similarity condition;
if the comparison result meets a second ratio condition, the image analysis unit determines that the handwritten test record text is in the second font similarity condition;
the first ratio condition is P_M < P0, and the second ratio condition is P_M ≥ P0.
Specifically, the image analysis unit determines the font similarity condition of the handwritten test record text from the proportion of character outlines in the character outline set that share the same font type. In practice, the ratio is the number of character outlines of a given font type divided by the number of character outlines in the set, and represents the share of each font type in the set. The larger the ratio, the more likely it is that the character outlines in the handwritten test record text from which the set was taken belong to that font type. If the ratios of all font types are low, the font type of the character outlines in the handwritten test record text cannot be confirmed.
Specifically, the first operation unit or the second operation unit compares the character outline one by one with a plurality of preset character outlines stored in the selected font database to calculate the coincidence degree C between the character outline and each preset character outline, screens out the maximum coincidence degree Cm, compares Cm with a preset standard coincidence degree comparison threshold C0, and determines the character represented by the character outline according to the comparison result, wherein C0 > Cm0 > 0,
under a third coincidence degree comparison result, the first operation unit or the second operation unit determines that the character outline represents the character associated with the preset character outline having the maximum coincidence degree;
under a fourth coincidence degree comparison result, the first operation unit or the second operation unit determines that the character represented by the character outline cannot be identified;
wherein the third coincidence degree comparison result is Cm > C0, and the fourth coincidence degree comparison result is Cm ≤ C0.
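The threshold comparison above can be sketched in a few lines. The text does not define how the coincidence degree C is computed, so the intersection-over-union measure on pixel sets used here is an assumption, as are the function names and the representation of a font database as a mapping from character to preset outline.

```python
def coincidence(outline_a, outline_b):
    """Assumed measure of C: intersection over union of two pixel sets."""
    union = outline_a | outline_b
    return len(outline_a & outline_b) / len(union) if union else 0.0


def recognize(outline, font_db, c0):
    """Return the character whose preset outline gives Cm, if Cm > C0.

    font_db maps a character to its preset outline (a set of (x, y) pixels).
    Returns None when Cm <= C0, i.e. the character cannot be identified.
    """
    best_char, cm = None, 0.0
    for char, preset in font_db.items():
        c = coincidence(outline, preset)
        if c > cm:
            best_char, cm = char, c
    return best_char if cm > c0 else None
```

Note that the comparison is strict: an outline whose best match scores exactly C0 is treated as unidentified, matching the fourth comparison result.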
Specifically, in the invention, under the first font similarity state the first operation unit extracts all character outlines in the image, compares them one by one with the preset character outlines stored in a randomly selected font database, calculates the coincidence degree, and determines the characters represented by the character outlines based on the coincidence degree. In the first font similarity state, the ratios corresponding to all font types are low, so the font type of the character outlines in the handwriting test record text cannot be confirmed. All character outlines in the image are therefore compared one by one against a randomly selected font database to determine the characters they represent, which ensures the reliability of character outline identification and the recognition effect for the handwriting test record text.
Specifically, in the invention, under the second font similarity state the second operation unit extracts all character outlines in the image, selects the font database corresponding to the font type with the highest ratio in the character outline set, and determines one by one the coincidence degree between each character outline and the preset character outlines stored in the selected font database. Because the comparison preferentially uses the font database closest to the font of the handwriting test record text, the efficiency and precision of text recognition are further improved while reliability is ensured.
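The difference between the two operation units comes down to how the font database is chosen. A minimal sketch, assuming the state labels "first" and "second", a dictionary of ratios per font type, and a dictionary of databases keyed by font type (all hypothetical names):

```python
import random


def select_database(state, ratios, databases, rng=random):
    """Return the name of the font database to compare against.

    state     -- "first" or "second" font similarity state (assumed labels)
    ratios    -- mapping of font type to its ratio P in the outline set
    databases -- mapping of font type to its font database
    rng       -- source of randomness, injectable for repeatable runs
    """
    if state == "first":
        # No font type dominates, so any database is picked at random.
        return rng.choice(sorted(databases))
    # One font type dominates, so its database is used.
    return max(ratios, key=ratios.get)
```

Passing a seeded `random.Random` instance as `rng` makes the random choice reproducible, which is convenient when checking recognition results.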
Specifically, the data storage module further comprises a database analysis unit for determining the similarity between the font databases according to the coincidence degree of the preset character outlines stored in the font databases, wherein,
the database analysis unit selects any two font databases, calls preset character outlines from the two font databases one by one for comparison to determine the coincidence degree of the called preset character outlines, and calculates the similarity S between the selected font databases according to formula (2),
Figure BDA0004088205130000101
in the formula (2), Ci represents the coincidence degree between the two preset character outlines selected at the i-th time, and Nz represents the number of preset character outlines in the font database.
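Formula (2) is only available as an image, so its exact form cannot be confirmed from the text. One plausible reading of the symbol definitions is that S is the mean of the pairwise coincidence degrees, that is, the sum of the coincidence degrees divided by the number of preset character outlines. The sketch below implements that assumed reading. Pairing the outlines by shared character and measuring coincidence as intersection over union are further assumptions.

```python
def coincidence(outline_a, outline_b):
    """Assumed measure: intersection over union of two pixel sets."""
    union = outline_a | outline_b
    return len(outline_a & outline_b) / len(union) if union else 0.0


def database_similarity(db_a, db_b):
    """Assumed form of formula (2): mean coincidence over paired outlines.

    db_a and db_b map a character to its preset outline (a set of pixels).
    Outlines are paired by character; only shared characters are compared.
    """
    shared = sorted(set(db_a) & set(db_b))
    if not shared:
        return 0.0
    total = sum(coincidence(db_a[ch], db_b[ch]) for ch in shared)
    return total / len(shared)
```

Under this reading, two databases whose outlines coincide exactly for half of their characters and not at all for the rest would have a similarity of 0.5.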
Specifically, the second operation unit obtains the character outlines whose represented characters cannot be identified, reselects a font database based on the similarity between the font databases, determines one by one the coincidence degree between each of these character outlines and the preset character outlines stored in the reselected font database, and determines again the characters represented by these character outlines based on the coincidence degree.
Specifically, the second operation unit extracts the character outlines whose represented characters could not be identified and reselects a font database for comparison. The reselected font database is the one with the highest similarity to the font database already called, which further improves the efficiency and precision of text recognition while ensuring reliability.
Specifically, the second operation unit re-selects the font database based on the similarity between the font databases, wherein,
the second operation unit determines the font database that was called when the characters represented by the character outlines were judged, determines the font database most similar to the called font database according to the similarity, and takes that font database as the font database to be reselected.
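The reselection rule can be illustrated with a small lookup over precomputed similarities. The storage layout, a mapping from an unordered pair of font database names to their similarity S, and the function name are assumptions for this sketch.

```python
def reselect_database(used, similarity):
    """Return the font database most similar to the one already called.

    used       -- name of the font database called in the first pass
    similarity -- mapping of frozenset({name_a, name_b}) to similarity S
    Returns None when no similarity is recorded for the used database.
    """
    candidates = {
        next(iter(pair - {used})): s
        for pair, s in similarity.items()
        if used in pair and len(pair) == 2
    }
    return max(candidates, key=candidates.get) if candidates else None
```

Outlines that were not identified in the first pass would then be compared against the returned database using the same threshold test as before.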
Specifically, the data processing module further includes a record integration unit, which is connected with the first operation unit, the second operation unit and the data storage module, and is configured to record the characters represented by the determined character outline one by one according to the sequence of the character outline in the image to generate an integrated text of the handwriting test record text, and store the integrated text in the data storage module.
Specifically, when recording the character represented by a character outline, the record integration unit determines, according to the character outline, whether to record O in place of that character, wherein,
under the preset condition, the record integration unit determines that O is recorded in place of the character represented by the character outline;
the preset condition is that neither the first operation unit nor the second operation unit can determine the character to which the character outline belongs.
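The integration step, including the O placeholder, can be sketched as follows. The input format, a list of (position, character) pairs with `None` marking an outline that neither operation unit could identify, is an assumption for illustration.

```python
def integrate(results, placeholder="O"):
    """Build the integrated text in image order.

    results -- iterable of (position, character) pairs, where character is
               None when the outline could not be identified (assumed format)
    """
    ordered = sorted(results, key=lambda item: item[0])
    return "".join(
        char if char is not None else placeholder for _, char in ordered
    )
```

For example, `integrate([(1, "b"), (0, "a"), (2, None)])` yields `"abO"`, so an unidentified character keeps its position in the integrated text.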
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (10)

1. A data processing system for clinical trials, comprising:
the data storage module comprises a plurality of font databases and is used for storing a plurality of preset character outlines corresponding to font types;
the data acquisition module comprises an image acquisition unit for shooting a handwriting test record text to acquire an image;
the data processing module comprises an image analysis unit, a first operation unit and a second operation unit which are mutually connected, wherein the image analysis unit, the first operation unit and the second operation unit are all connected with the data acquisition module and the data storage module,
the image analysis unit is used for acquiring an image shot by the image acquisition unit, extracting character outlines of preset lines from the image to obtain a character outline set, comparing each character outline in the character outline set with data in each font database, judging the font type of each character outline according to a comparison result, and analyzing the font similarity condition of the handwriting test record text based on the proportional analysis of the character outlines with the same font type in the character outline set;
the first operation unit is used for extracting all character outlines in the image when the image analysis unit determines the first font similarity condition, comparing the character outlines one by one with preset character outlines stored in a randomly selected font database, calculating the coincidence degree, and judging the characters represented by the character outlines based on the coincidence degree;
the second operation unit is used for extracting all character outlines in the image when the image analysis unit determines the second font similarity condition, selecting the font database corresponding to the font type with the highest proportion in the character outline set, determining one by one the coincidence degree between each character outline and the preset character outlines stored in the selected font database, and judging the characters represented by each character outline based on the coincidence degree.
2. The data processing system for clinical trials according to claim 1, wherein the image analysis unit compares each of the text outlines in the text outline set with the preset text outlines in each of the font databases to calculate the coincidence degree C between the text outline and each preset text outline, screens out the maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset maximum coincidence degree comparison threshold Cm0, and determines the font type to which the text outline belongs based on the comparison result,
the image analysis unit determines the font database that was used when the maximum coincidence degree Cm was calculated,
under a first coincidence degree comparison result, the image analysis unit judges that the character outline belongs to the font type corresponding to the font database;
under the second coincidence degree comparison result, the image analysis unit judges that the character outline does not belong to the font type corresponding to the font database;
wherein the first coincidence degree comparison result is Cm ≥ Cm0, and the second coincidence degree comparison result is Cm < Cm0.
3. The data processing system for clinical trials according to claim 2, wherein the image analysis unit calculates, according to formula (1), the ratio P of the number of text outlines of each font type in the text outline set,
$$P = \frac{n}{N} \quad (1)$$
in the formula (1), n represents the number of character outlines belonging to the same font type, and N represents the number of character outlines in the character outline set.
4. The data processing system for clinical trials according to claim 3, wherein the image analysis unit screens the calculated ratios of the number of text outlines for each font type to find the maximum ratio P_M, compares the maximum ratio P_M with a preset duty ratio comparison threshold P0, and analyzes and judges the font similarity condition of the handwriting test record text according to the comparison result, wherein,
if the comparison result meets a first duty ratio condition, the image analysis unit judges that the handwriting test record text is in a first font similarity condition;
if the comparison result meets a second duty ratio condition, the image analysis unit judges that the handwriting test record text is in a second font similarity condition;
wherein the first duty ratio condition is P_M < P0, and the second duty ratio condition is P_M ≥ P0.
5. The data processing system for clinical trials according to claim 4, wherein the first operation unit or the second operation unit compares the character outline one by one with a plurality of preset character outlines stored in the selected font database to calculate the coincidence degree C between the character outline and each preset character outline, screens out the maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset standard coincidence degree comparison threshold C0, and determines the character represented by the character outline according to the comparison result, wherein C0 > Cm0,
under a third coincidence degree comparison result, the first operation unit or the second operation unit determines that the character outline represents the character associated with the preset character outline having the maximum coincidence degree;
under a fourth coincidence degree comparison result, the first operation unit or the second operation unit determines that the character represented by the character outline cannot be identified;
wherein the third coincidence degree comparison result is Cm > C0, and the fourth coincidence degree comparison result is Cm ≤ C0.
6. The data processing system for clinical trials according to claim 1, wherein the data storage module further comprises a database analysis unit for determining the similarity between the font databases based on the coincidence degree of the preset character outlines stored in the font databases, wherein,
the database analysis unit selects any two font databases, calls preset character outlines from the two font databases one by one for comparison to determine the coincidence degree of the called preset character outlines, and calculates the similarity S between the selected font databases according to formula (2),
Figure FDA0004088205120000031
in the formula (2), Ci represents the coincidence degree between the two preset character outlines selected at the i-th time, and Nz represents the number of preset character outlines in the font database.
7. The data processing system for clinical trials according to claim 6, wherein the second operation unit acquires the character outlines whose represented characters cannot be identified, reselects a font database based on the similarity between the font databases, determines one by one the coincidence degree between each of these character outlines and the preset character outlines stored in the reselected font database, and determines again the characters represented by these character outlines based on the coincidence degree.
8. The data processing system for clinical trials according to claim 7, wherein the second arithmetic unit re-selects the font database based on the similarity between the font databases, wherein,
the second operation unit determines the font database that was called when the characters represented by the character outlines were judged, determines the font database most similar to the called font database according to the similarity, and takes that font database as the font database to be reselected.
9. The data processing system for clinical trials according to claim 1, wherein the data processing module further comprises a record integration unit connected to the first operation unit, the second operation unit and the data storage module for recording the text represented by the determined text outline one by one in the order of the text outline in the image to generate an integrated text of the handwritten trial recorded text, and storing the integrated text in the data storage module.
10. The data processing system for clinical trials according to claim 9, wherein, when recording the character represented by a character outline, the record integration unit determines, according to the character outline, whether to record O in place of that character, wherein,
under the preset condition, the record integration unit determines that O is recorded in place of the character represented by the character outline;
the preset condition is that neither the first operation unit nor the second operation unit can determine the character to which the character outline belongs.
CN202310142830.7A 2023-02-17 2023-02-17 Data processing system for clinical trials Active CN116206319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310142830.7A CN116206319B (en) 2023-02-17 2023-02-17 Data processing system for clinical trials

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310142830.7A CN116206319B (en) 2023-02-17 2023-02-17 Data processing system for clinical trials

Publications (2)

Publication Number Publication Date
CN116206319A true CN116206319A (en) 2023-06-02
CN116206319B CN116206319B (en) 2023-09-29

Family

ID=86514297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310142830.7A Active CN116206319B (en) 2023-02-17 2023-02-17 Data processing system for clinical trials

Country Status (1)

Country Link
CN (1) CN116206319B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117037184A (en) * 2023-10-10 2023-11-10 深圳牛图科技有限公司 OCR fuzzy recognition system and method based on cloud matching

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299663A (en) * 2018-08-27 2019-02-01 刘梅英 Hand-written script recognition methods, system and terminal device
US20190130232A1 (en) * 2017-10-30 2019-05-02 Monotype Imaging Inc. Font identification from imagery
JP2019139500A (en) * 2018-02-09 2019-08-22 京セラドキュメントソリューションズ株式会社 Handwritten font generation device, image formation device, and handwritten font generation method
CN111144191A (en) * 2019-08-14 2020-05-12 广东小天才科技有限公司 Font identification method and device, electronic equipment and storage medium
CN111626383A (en) * 2020-05-29 2020-09-04 Oppo广东移动通信有限公司 Font identification method and device, electronic equipment and storage medium
CN114866271A (en) * 2022-03-15 2022-08-05 上海东普信息科技有限公司 Electronic certificate generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN116206319B (en) 2023-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant