CN116206319A

CN116206319A - Data processing system for clinical trials

Info

Publication number: CN116206319A
Application number: CN202310142830.7A
Authority: CN
Inventors: 陈筱
Original assignee: Beijing Zhongxing Zhengyuan Technology Co ltd
Current assignee: Beijing Zhongxing Zhengyuan Technology Co ltd
Priority date: 2023-02-17
Filing date: 2023-02-17
Publication date: 2023-06-02
Anticipated expiration: 2043-02-17
Also published as: CN116206319B

Abstract

The invention relates to the technical field of data processing, and in particular to a data processing system for clinical trials comprising a data storage module, a data acquisition module and a data processing module. The data processing module determines the font similarity condition of a handwritten test record text from the proportion of character outlines in a character outline set that share the same font type. Under the first font similarity condition, all character outlines are compared with the preset character outlines stored in a randomly selected font database to judge the characters they represent. Under the second font similarity condition, the font database corresponding to the font type with the highest proportion in the character outline set is selected, and the characters are judged one by one from the coincidence degree between each character outline and the preset character outlines stored in the selected font database. This improves the efficiency and accuracy of recognizing the handwritten test record text while ensuring reliability.

Description

Data processing system for clinical trials

Technical Field

The invention relates to the technical field of data processing, in particular to a data processing system for clinical trials.

Background

Clinical trials require clinical data related to the trial to be recorded while patients participate as subjects. The quality of recognition of that clinical data directly affects the reliability of the trial data and has an important impact on determining the efficacy and safety of the trial drug, while the speed of recognition directly determines the efficiency of data entry.

Chinese patent publication No. CN109102844A discloses a method for automatically checking clinical trial source data, comprising the following steps: recognizing the acquired source data image of the clinical trial with a CTPN network model to determine the text region, then cutting the text region into individual lines of text; performing vertical-projection column cutting on each cut line to obtain the effective text region of each line; sequentially inputting the set of effective text regions into a trained CRNN network to obtain a variable-length sequence recognition result, and extracting the text recognition result with a regular expression; correcting errors in the text recognition result to obtain an error correction result; extracting characteristic values from the error correction result one by one according to a characteristic value set, comparing them with standard characteristic values recorded in a database, and marking an alarm state for extracted characteristic values that do not match the standard characteristic values to form an error reminder. That invention uses CTPN and CRNN as the core for character recognition of clinical trial source data images, thereby realizing automatic data verification.

However, the prior art has the following problems:

The prior art does not consider that differences between fonts in handwritten text affect the accuracy of text recognition, and does not consider providing comparison databases for multiple fonts against which the font of the handwritten text can be determined.

Disclosure of Invention

To solve the problems that differences between fonts in handwritten text affect the accuracy of text recognition in the prior art, and that the prior art does not provide comparison databases for multiple fonts against which the font of the handwritten text can be determined, the invention provides a data processing system for clinical trials, which comprises:

the data storage module comprises a plurality of font databases and is used for storing a plurality of preset character outlines corresponding to font types;

the data acquisition module comprises an image acquisition unit for shooting a handwriting test record text to acquire an image;

the data processing module comprises an image analysis unit, a first operation unit and a second operation unit which are mutually connected, wherein the image analysis unit, the first operation unit and the second operation unit are all connected with the data acquisition module and the data storage module,

the image analysis unit is used for acquiring an image shot by the image acquisition unit, extracting character outlines of preset lines from the image to obtain a character outline set, comparing each character outline in the character outline set with data in each font database, judging the font type of each character outline according to a comparison result, and analyzing the font similarity condition of the handwriting test record text based on the proportional analysis of the character outlines with the same font type in the character outline set;

the first operation unit is used for extracting all character outlines in the image when the image analysis unit determines the first font similarity condition, comparing the character outlines one by one with the preset character outlines stored in a randomly selected font database, calculating the coincidence degree, and judging the characters represented by the character outlines based on the coincidence degree;

the second operation unit is used for extracting all character outlines in the image when the image analysis unit determines the second font similarity condition, selecting the font database corresponding to the font type with the highest proportion in the character outline set, determining one by one the coincidence degree between each character outline and the preset character outlines stored in the selected font database, and judging the characters represented by each character outline based on the coincidence degree.
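As a non-limiting illustration, the choice of font database made by the two operation units might be sketched in Python as follows. The function name `select_font_database`, the labels "first" and "second", and the representation of each font database by its name are assumptions made for this sketch and do not appear in the disclosure.

```python
import random
from typing import Optional, Sequence


def select_font_database(
    condition: str,
    dominant_type: Optional[str],
    database_names: Sequence[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the font database used for character-by-character comparison.

    condition      -- "first" or "second" font similarity condition (assumed labels)
    dominant_type  -- font type with the highest proportion, if any
    database_names -- names of the available font databases (hypothetical)
    """
    if not database_names:
        raise ValueError("at least one font database is required")
    # Second condition: use the database of the dominant font type.
    if condition == "second" and dominant_type in database_names:
        return dominant_type
    # First condition: no font type dominates, so pick a database at random.
    rng = rng or random.Random()
    return rng.choice(list(database_names))
```

Passing a seeded `random.Random` instance makes the random selection reproducible, which may be useful when the recognition result has to be audited.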

Further, the image analysis unit compares each text outline in the text outline set with a preset text outline in each font database to calculate the coincidence degree C of the text outline and the preset text outline, screens out a maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset maximum coincidence degree comparison threshold Cm0, and judges the font type of the text outline according to the comparison result,

the image analysis unit determines the font database that was used when the maximum coincidence degree Cm was calculated,

under a first coincidence degree comparison result, the image analysis unit judges that the character outline belongs to the font type corresponding to that font database;

under a second coincidence degree comparison result, the image analysis unit judges that the character outline does not belong to the font type corresponding to that font database;

wherein the first coincidence degree comparison result is Cm ≥ Cm0, and the second coincidence degree comparison result is Cm < Cm0.
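The font type judgment described above can be sketched as follows, purely as an example. The name `classify_font_type`, the dictionary layout of the font databases, and the `coincidence` callable are assumptions of this sketch; the disclosure leaves the coincidence calculation open.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

Outline = TypeVar("Outline")


def classify_font_type(
    outline: Outline,
    font_databases: Dict[str, List[Outline]],
    coincidence: Callable[[Outline, Outline], float],
    cm0: float,
) -> Tuple[Optional[str], float]:
    """Return (font type or None, Cm) for one character outline.

    font_databases maps a font type to its preset character outlines.
    The font type is accepted only when Cm >= Cm0.
    """
    best_type: Optional[str] = None
    cm = float("-inf")
    # Compare the outline with every preset outline in every font database
    # and remember which database produced the maximum coincidence degree.
    for font_type, presets in font_databases.items():
        for preset in presets:
            c = coincidence(outline, preset)
            if c > cm:
                best_type, cm = font_type, c
    if best_type is not None and cm >= cm0:
        return best_type, cm  # first comparison result: Cm >= Cm0
    return None, cm  # second comparison result: Cm < Cm0
```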

Further, the image analysis unit calculates the text outline quantity ratio P of each font type in the text outline set according to formula (1),

$$P = \frac{n}{N} \tag{1}$$

in formula (1), n represents the number of character outlines belonging to the same font type, and N represents the number of character outlines in the character outline set.
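By way of example, the quantity ratio P of each font type can be computed as the count of outlines of that font type divided by the size of the character outline set. The function name `font_type_ratios` and the use of `None` for outlines with no matching font type are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def font_type_ratios(font_types: Sequence[Optional[str]]) -> Dict[str, float]:
    """Compute the quantity ratio P for each font type.

    font_types holds one entry per character outline in the set: the judged
    font type, or None when no font type was matched (assumed convention).
    """
    total = len(font_types)  # number of outlines in the character outline set
    if total == 0:
        return {}
    counts = Counter(t for t in font_types if t is not None)
    return {font_type: count / total for font_type, count in counts.items()}
```

Outlines with no matched font type still count toward the size of the set, so the ratios need not sum to 1.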

Further, the image analysis unit screens the calculated text outline quantity ratios of the font types to find the maximum quantity ratio P_M, compares the maximum quantity ratio P_M with a preset duty ratio comparison threshold P0, and judges the font similarity condition of the handwriting test record text according to the comparison result, wherein,

if the comparison result meets a first duty ratio condition, the image analysis unit judges that the handwriting test record text is in a first font similarity condition;

if the comparison result meets a second duty ratio condition, the image analysis unit judges that the handwriting test record text is in a second font similarity condition;

wherein the first duty ratio condition is P_M < P0, and the second duty ratio condition is P_M ≥ P0.
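The comparison of the maximum quantity ratio with the threshold P0 might look like the following sketch. The function name, the string labels "first" and "second", and the treatment of an empty ratio table as the first condition are assumptions made here for illustration.

```python
from typing import Dict, Optional, Tuple


def font_similarity_condition(
    ratios: Dict[str, float], p0: float
) -> Tuple[str, Optional[str]]:
    """Return ("first", None) or ("second", dominant font type).

    ratios maps each font type to its quantity ratio P; p0 is the preset
    comparison threshold P0.
    """
    if not ratios:
        # Assumption: with no matched font type, no ratio can reach P0.
        return "first", None
    dominant = max(ratios, key=ratios.get)  # font type with the maximum ratio
    p_m = ratios[dominant]
    if p_m >= p0:
        return "second", dominant  # second condition: P_M >= P0
    return "first", None  # first condition: P_M < P0
```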

Further, the first operation unit or the second operation unit compares the character outline one by one with the plurality of preset character outlines stored in the selected font database to calculate the coincidence degree C between the character outline and each preset character outline, screens out the maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset standard coincidence degree comparison threshold C0, and judges the character represented by the character outline according to the comparison result, where C0 > Cm0,

under a third coincidence degree comparison result, the first operation unit or the second operation unit judges that the character outline represents the character associated with the preset character outline;

under a fourth coincidence degree comparison result, the first operation unit or the second operation unit judges that the character represented by the character outline cannot be recognized;

wherein the third coincidence degree comparison result is Cm > C0, and the fourth coincidence degree comparison result is Cm ≤ C0.
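For illustration only, the character judgment against the standard threshold C0 can be sketched as below. The function name `recognize_character`, the mapping from characters to preset outlines, and the `coincidence` callable are assumptions; note the strict comparison Cm > C0, which differs from the Cm ≥ Cm0 test used for font types.

```python
from typing import Callable, Dict, Optional, TypeVar

Outline = TypeVar("Outline")


def recognize_character(
    outline: Outline,
    presets: Dict[str, Outline],
    coincidence: Callable[[Outline, Outline], float],
    c0: float,
) -> Optional[str]:
    """Return the recognized character, or None if it cannot be recognized.

    presets maps each character to its preset outline in the selected
    font database (assumed layout).
    """
    best_char: Optional[str] = None
    cm = float("-inf")
    for char, preset in presets.items():
        c = coincidence(outline, preset)
        if c > cm:
            best_char, cm = char, c
    # Third comparison result (Cm > C0): accept the best-matching character.
    # Fourth comparison result (Cm <= C0): the character is unrecognized.
    return best_char if cm > c0 else None
```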

Further, the data storage module further comprises a database parsing unit for determining the similarity between the font databases according to the coincidence degree of the preset text outlines stored in the font databases, wherein,

the database parsing unit selects any two font databases, calls preset character outlines from the two font databases one by one for comparison to determine the coincidence degree of the called preset character outlines, and calculates the similarity S between the selected font databases according to formula (2),

in formula (2), ci represents the coincidence degree between the two preset character outlines selected the i-th time, and N_z represents the number of preset character outlines in the font database.
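The text of formula (2) is not reproduced here, so the following sketch rests on an assumption: that S is the mean of the pairwise coincidence degrees ci over the N_z compared outline pairs. It further assumes that the two outlines compared at each step are the presets of the same character in the two databases. The function name and data layout are hypothetical.

```python
from typing import Callable, Dict, TypeVar

Outline = TypeVar("Outline")


def database_similarity(
    db_a: Dict[str, Outline],
    db_b: Dict[str, Outline],
    coincidence: Callable[[Outline, Outline], float],
) -> float:
    """Assumed form of S: the mean coincidence degree over paired presets.

    db_a and db_b map each character to its preset outline. Only characters
    present in both databases are compared.
    """
    shared = sorted(db_a.keys() & db_b.keys())
    if not shared:
        return 0.0
    total = sum(coincidence(db_a[ch], db_b[ch]) for ch in shared)
    return total / len(shared)  # mean of c_i over the compared pairs
```

Because the similarity depends only on the stored presets, it can be computed once per pair of font databases and cached.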

Further, the second operation unit obtains the character outlines whose represented characters could not be recognized, reselects a font database based on the similarity between the font databases, determines one by one the coincidence degree between each of these character outlines and the preset character outlines stored in the reselected font database, and judges again the characters represented by each of these character outlines based on the coincidence degree.

Further, the second operation unit reselects the font database based on the similarity between the font databases, wherein,

the second operation unit determines the font database that was called when the characters represented by the character outlines were judged, determines from the similarity the font database most similar to the called font database, and takes it as the font database to be reselected.
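A minimal sketch of the reselection step is given below, assuming the pairwise similarities S have already been computed and stored in a table keyed by unordered pairs of database names. The function name `reselect_database` and the table layout are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, Optional


def reselect_database(
    called: str, similarity: Dict[FrozenSet[str], float]
) -> Optional[str]:
    """Return the font database most similar to the one already called.

    similarity maps frozenset({name_a, name_b}) to the similarity S of the
    two databases. Returns None when no other database is known.
    """
    candidates: Dict[str, float] = {}
    for pair, s in similarity.items():
        if called in pair and len(pair) == 2:
            (other,) = pair - {called}  # the partner database in this pair
            candidates[other] = s
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

The unrecognized character outlines would then be compared against the presets of the returned database using the same threshold test as before.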

Further, the data processing module further includes a record integration unit, which is connected with the first operation unit, the second operation unit and the data storage module, and is configured to record the characters represented by the determined character outline one by one according to the sequence of the character outline in the image to generate an integrated text of the handwriting test record text, and store the integrated text in the data storage module.

Further, the record integration unit determines, based on the character outline, whether to record O in place of the character represented by the character outline, wherein,

under a preset condition, the record integration unit records O in place of the character represented by the character outline;

the preset condition is that neither the first operation unit nor the second operation unit can judge the character represented by the character outline.
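As an example, the record integration step can be sketched as a single pass over the recognition results in image order. The function name `integrate_record` and the use of `None` to mark an unrecognized character are assumptions; the placeholder O is taken from the description above.

```python
from typing import Optional, Sequence

PLACEHOLDER = "O"  # substitute recorded for characters that could not be judged


def integrate_record(recognized: Sequence[Optional[str]]) -> str:
    """Build the integrated text from per-outline results in image order.

    Each entry is the judged character, or None when neither operation
    unit could judge the character represented by the outline.
    """
    return "".join(ch if ch is not None else PLACEHOLDER for ch in recognized)
```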

Compared with the prior art, the invention provides a data storage module, a data acquisition module and a data processing module. The data processing module determines the font similarity condition of the handwriting test record text from the proportion of character outlines in the character outline set that share the same font type. Under the first font similarity condition, all character outlines are compared one by one with the preset character outlines stored in a randomly selected font database to judge the characters they represent. Under the second font similarity condition, the font database corresponding to the font type with the highest proportion in the character outline set is selected, and the coincidence degree between each character outline and the preset character outlines stored in the selected font database is determined one by one to judge the character represented by each character outline. This improves the efficiency and quality of recognizing handwriting test record texts written in different fonts.

In particular, in the invention, the image analysis unit compares each text outline in the text outline set with the data in each font database and judges the font type to which each text outline belongs according to the comparison result. In practice, the coincidence degree characterizes how similar a text outline is to a preset text outline: the higher the coincidence degree, the more likely the two are the same outline. Comparing a text outline with the plurality of preset text outlines in a font database yields a coincidence degree for each preset text outline, and the preset text outline with the largest coincidence degree is the one in that font database most similar to the text outline. The font type of the text outline is thereby determined on a quantitative basis, which supports the accuracy of the subsequent recognition of the handwriting test record text.

In particular, in the invention, the image analysis unit determines the font similarity condition of the handwriting test record text from the proportion of text outlines in the text outline set that share the same font type. In practice, the ratio is the number of text outlines of a given font type divided by the number of text outlines in the text outline set, so it represents the share of each font type in the set. The larger the ratio, the more likely it is that the text outlines in the handwriting test record text from which the set was extracted belong to that font type. If the ratios of all font types are low, the font type of the text outlines in the handwriting test record text cannot be confirmed.

In particular, in the invention, under the first font similarity condition the first operation unit extracts all character outlines in the image, compares them one by one with the preset character outlines stored in a randomly selected font database, calculates the coincidence degree, and judges the characters represented by the character outlines based on the coincidence degree. Under the first font similarity condition the ratios of all font types are low, so the font type of the character outlines in the handwriting test record text cannot be confirmed. All character outlines in the image are therefore compared one by one with a randomly selected font database to determine the characters they represent, which ensures the reliability of character outline recognition and the recognition quality of the handwriting test record text.

In particular, in the invention, under the second font similarity condition the second operation unit extracts all character outlines in the image, selects the font database corresponding to the font type with the highest proportion in the character outline set, and determines one by one the coincidence degree between each character outline and the preset character outlines stored in the selected font database. Because the font database closest to the font of the handwriting test record text is selected first for comparison, the efficiency and accuracy of text recognition are further improved while ensuring reliability.

In particular, the second operation unit extracts the character outlines whose represented characters could not be recognized and reselects a font database for comparison, the reselected font database being the one with the highest similarity to the called font database, which further improves the efficiency and accuracy of text recognition while ensuring reliability.

Drawings

FIG. 1 is a schematic diagram of a data processing system for clinical trials according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a data storage module according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a data processing module according to an embodiment of the invention.

Detailed Description

In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.

It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.

Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.

Referring to FIGS. 1, 2 and 3, which are schematic diagrams of a data processing system for clinical trials, a data storage module structure and a data processing module structure according to an embodiment of the present invention, the data processing system for clinical trials of the present invention includes:

Specifically, the specific form of the data storage module is not limited, as long as it can store data; this is prior art and is not described in detail here.

Specifically, the specific form of the data processing module is not limited; it may be an external computer in which each unit is a separate functional program, as long as it can perform data processing and data exchange, and it is not described in detail here.

Specifically, the specific method of calculating the outline coincidence degree is not limited; it may be calculated from the similarity of the shapes or in other ways, and the relevant algorithm models are mature prior art and are not described again here.
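Since the calculation of the coincidence degree is left open, one possible choice, offered only as an assumption, is the intersection-over-union of the pixel sets covered by two outlines after they have been scaled to a common size. The function name `coincidence_degree` and the pixel-set representation are hypothetical.

```python
from typing import AbstractSet, Tuple

Pixel = Tuple[int, int]


def coincidence_degree(a: AbstractSet[Pixel], b: AbstractSet[Pixel]) -> float:
    """Intersection-over-union of two outlines given as sets of pixels.

    Both outlines are assumed to be normalized to the same grid beforehand.
    Returns a value in [0, 1]; 1 means the outlines coincide exactly.
    """
    union = a | b
    if not union:
        return 0.0  # two empty outlines carry no shape information
    return len(a & b) / len(union)
```

Any other shape similarity measure with a bounded range could be substituted, provided the thresholds Cm0 and C0 are chosen for that measure.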

Specifically, the image analysis unit compares each text outline in the text outline set with the preset text outlines in each font database to calculate the coincidence degree C between the text outline and each preset text outline, screens out the maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset maximum coincidence degree comparison threshold Cm0, where Cm0 > 0, and judges the font type of the text outline according to the comparison result,

Specifically, in the invention, the image analysis unit compares each text outline in the text outline set with the data in each font database and judges the font type of each text outline according to the comparison result. In practice, the coincidence degree characterizes how similar a text outline is to a preset text outline: the higher the coincidence degree, the more likely the two are the same outline. Comparing a text outline with the plurality of preset text outlines in a font database yields a coincidence degree for each preset text outline, and the preset text outline with the largest coincidence degree is the one in that font database most similar to the text outline. This quantifies the similarity between the text outline and the font database, and the font type of a font database whose similarity to the text outline reaches the preset value is taken as the font type of the text outline, which supports the accuracy of the subsequent recognition of the handwriting test record text.

Specifically, the image analysis unit calculates the text outline quantity ratio P of each font type in the text outline set according to the formula (1),

Specifically, the image analysis unit screens the calculated text outline quantity ratios of the font types to find the maximum quantity ratio P_M, compares the maximum quantity ratio P_M with a preset duty ratio comparison threshold P0, where P0 > 0, and judges the font similarity condition of the handwriting test record text according to the comparison result,

Specifically, in the invention, the image analysis unit determines the font similarity condition of the handwriting test record text from the proportion of text outlines in the text outline set that share the same font type. In practice, the ratio is the number of text outlines of a given font type divided by the number of text outlines in the text outline set, so it represents the share of each font type in the set. The larger the ratio, the more likely it is that the text outlines in the handwriting test record text from which the set was extracted belong to that font type. If the ratios of all font types are low, the font type of the text outlines in the handwriting test record text cannot be confirmed.

Specifically, the first operation unit or the second operation unit compares the character outline one by one with the plurality of preset character outlines stored in the selected font database to calculate the coincidence degree C between the character outline and each preset character outline, screens out the maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset standard coincidence degree comparison threshold C0, and judges the character represented by the character outline according to the comparison result, where C0 > Cm0 > 0,

Specifically, in the invention, under the first font similarity condition the first operation unit extracts all character outlines in the image, compares them one by one with the preset character outlines stored in a randomly selected font database, calculates the coincidence degree, and judges the characters represented by the character outlines based on the coincidence degree. Under the first font similarity condition the ratios of all font types are low, so the font type of the character outlines in the handwriting test record text cannot be confirmed. All character outlines in the image are therefore compared one by one with a randomly selected font database to determine the characters they represent, which ensures the reliability of character outline recognition and the recognition quality of the handwriting test record text.

Specifically, in the invention, under the second font similarity condition the second operation unit extracts all character outlines in the image, selects the font database corresponding to the font type with the highest proportion in the character outline set, and determines one by one the coincidence degree between each character outline and the preset character outlines stored in the selected font database. Because the font database closest to the font of the handwriting test record text is selected first for comparison, the efficiency and accuracy of text recognition are further improved while ensuring reliability.

Specifically, the data storage module further comprises a database parsing unit for determining the similarity between the font databases according to the coincidence degree of the preset text outlines stored in the font databases, wherein,

Specifically, the second operation unit obtains the character outlines whose represented characters could not be recognized, reselects a font database based on the similarity between the font databases, determines one by one the coincidence degree between each of these character outlines and the preset character outlines stored in the reselected font database, and judges again the characters represented by each of these character outlines based on the coincidence degree.

Specifically, the second operation unit extracts the character outlines whose represented characters could not be recognized and reselects a font database for comparison, the reselected font database being the one with the highest similarity to the called font database, which further improves the efficiency and accuracy of text recognition while ensuring reliability.

Specifically, the second operation unit re-selects the font database based on the similarity between the font databases, wherein,

Specifically, the data processing module further includes a record integration unit, which is connected with the first operation unit, the second operation unit and the data storage module, and is configured to record the characters represented by the determined character outline one by one according to the sequence of the character outline in the image to generate an integrated text of the handwriting test record text, and store the integrated text in the data storage module.

Specifically, the record integration unit determines, based on the text outline, whether to record O in place of the text represented by the text outline, wherein,

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims

1. A data processing system for clinical trials, comprising:

2. The data processing system for clinical trials according to claim 1, wherein the image analysis unit compares each text outline in the text outline set with the preset text outlines in each font database to calculate the coincidence degree C between the text outline and each preset text outline, screens out the maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset maximum coincidence degree comparison threshold Cm0, and judges the font type to which the text outline belongs based on the comparison result,

3. The data processing system for clinical trials according to claim 2, wherein the image analysis unit calculates the text outline quantity ratio P of each font type in the text outline set according to formula (1),

4. The data processing system for clinical trials according to claim 3, wherein the image analysis unit screens the calculated text outline quantity ratios of the font types to find the maximum quantity ratio P_M, compares the maximum quantity ratio P_M with a preset duty ratio comparison threshold P0, and judges the font similarity condition of the handwriting test record text according to the comparison result, wherein,

5. The data processing system for clinical trials according to claim 4, wherein the first operation unit or the second operation unit compares the character outline one by one with the plurality of preset character outlines stored in the selected font database to calculate the coincidence degree C between the character outline and each preset character outline, screens out the maximum coincidence degree Cm, compares the maximum coincidence degree Cm with a preset standard coincidence degree comparison threshold C0, and judges the character represented by the character outline according to the comparison result, where C0 > Cm0,

6. The data processing system for clinical trials according to claim 1, wherein the data storage module further comprises a database parsing unit for determining the similarity between the font databases based on the coincidence degree of the preset text outlines stored in the font databases, wherein,

7. The data processing system for clinical trials according to claim 6, wherein the second operation unit obtains the text outlines whose represented text could not be recognized, reselects a font database based on the similarity between the font databases, determines one by one the coincidence degree between each of these text outlines and the preset text outlines stored in the reselected font database, and judges again the text represented by each of these text outlines based on the coincidence degree.

8. The data processing system for clinical trials according to claim 7, wherein the second operation unit reselects the font database based on the similarity between the font databases, wherein,

9. The data processing system for clinical trials according to claim 1, wherein the data processing module further comprises a record integration unit connected to the first operation unit, the second operation unit and the data storage module for recording the text represented by the determined text outline one by one in the order of the text outline in the image to generate an integrated text of the handwritten trial recorded text, and storing the integrated text in the data storage module.

10. The data processing system for clinical trials according to claim 9, wherein the record integration unit determines, based on the character outline, whether to record O in place of the character represented by the character outline, wherein,