CN113705367A - Formula image area identification method and system based on feature and aspect ratio detection - Google Patents

Formula image area identification method and system based on feature and aspect ratio detection

Info

Publication number
CN113705367A
Authority
CN
China
Prior art keywords
image area
identified
image
reference formula
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110904935.2A
Other languages
Chinese (zh)
Inventor
崔波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huilang Times Technology Co Ltd
Original Assignee
Beijing Huilang Times Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huilang Times Technology Co Ltd filed Critical Beijing Huilang Times Technology Co Ltd
Priority to CN202110904935.2A priority Critical patent/CN113705367A/en
Publication of CN113705367A publication Critical patent/CN113705367A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

The invention discloses a formula image area identification method based on feature and aspect ratio detection, which comprises the following steps: acquiring a reference formula image area and an image area to be identified; performing SIFT feature extraction on the reference formula image area and the image area to be identified to obtain feature data; calculating the feature similarity of the reference formula image area and the image area to be identified; calculating the aspect ratio of the reference formula image area and the aspect ratio of the image area to be identified, and calculating the aspect ratio similarity of the image area to be identified and the reference formula image area by using the absolute difference; and identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified. The invention also discloses a formula image area identification system based on feature and aspect ratio detection. The invention combines the feature similarity check and the aspect ratio similarity check, and improves the identification precision of the formula image area.

Description

Formula image area identification method and system based on feature and aspect ratio detection
Technical Field
The invention relates to the technical field of image identification, in particular to a formula image area identification method and system based on feature and aspect ratio detection.
Background
As times progress, more and more papers, applications, reports and the like appear in our lives, and they are often displayed, exchanged or stored in the form of pictures. Formulas are often a very important part of a paper or an application, and when a formula is to be identified or edited, accurately judging which part of the picture is the formula image area is a very important step.
In view of the above problem, many experts and scholars have conducted intensive research. Traditional methods often do not fully consider the characteristics of the formula image area, so the designed identification methods are not targeted and cannot maintain high precision in identifying the formula image area. Meanwhile, many conventional methods adopt relatively complex algorithms or require many training samples, which greatly increases the consumption of computing resources. Therefore, finding a simple, well-targeted formula image region identification method that identifies the formula image region with high precision is a meaningful problem that urgently needs to be solved.
Disclosure of Invention
In order to overcome the above problems or at least partially solve the above problems, embodiments of the present invention provide a method and a system for identifying a formula image region based on feature and aspect ratio detection, which combine a feature similarity check and an aspect ratio similarity check to improve the identification accuracy of the formula image region.
The embodiment of the invention is realized by the following steps:
In a first aspect, an embodiment of the present invention provides a formula image region identification method based on feature and aspect ratio detection, including the following steps:
acquiring a reference formula image area and an image area to be identified;
performing SIFT feature extraction on the reference formula image area and the image area to be identified so as to obtain feature data of the reference formula image area and the image area to be identified;
calculating the feature similarity of the reference formula image area and the image area to be identified according to the feature data;
calculating the aspect ratio of the reference formula image area and the aspect ratio of the image area to be identified, and calculating the aspect ratio similarity of the image area to be identified and the reference formula image area by using the absolute difference;
and identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified.
In order to ensure accurate identification of the formula image area, the features and the aspect ratio are combined for judgment so that an accurate identification result can be obtained. When an image area needs to be identified, a plurality of manually selected, representative formula image areas are first acquired and used as reference formula image areas to provide reference data for subsequent image identification. After the reference formula image areas and the image area to be identified are acquired, SIFT feature extraction is performed on the plurality of reference formula image areas and on the image area to be identified to obtain their feature data. After the feature data are extracted, the feature similarity between the image area to be identified and the plurality of reference formula image areas is calculated using the Euclidean distance on the basis of the SIFT features. The aspect ratios of the plurality of reference formula image areas and the aspect ratio of the image area to be identified are calculated, and the aspect ratio similarity between the image area to be identified and the plurality of reference formula image areas is calculated using the absolute difference. The type of the image area to be identified is then determined according to the feature similarity and the aspect ratio similarity, and the image area to be identified is judged to be a formula image area or a non-formula image area. If the feature similarity between the image area to be identified and any one of the reference formula image areas is high, and the aspect ratio similarity between the image area to be identified and any one of the reference formula image areas is also high, the image area to be identified is judged to be a formula image area. If either of these two conditions is not met, the image area to be identified is judged to be a non-formula image area.
By combining feature similarity detection and aspect ratio similarity detection, the method is well targeted to formula image area identification and can maintain high identification precision on the formula identification task. The algorithm in the whole process is relatively simple; no complex calculation model and no large number of training samples are used, so computing resources are saved and resource consumption is low.
Based on the first aspect, in some embodiments of the present invention, the method for identifying a formula image region based on feature and aspect ratio detection further includes, before the step of performing SIFT feature extraction, the following steps:
and carrying out image enhancement processing on the reference formula image area and the image area to be identified so as to obtain the enhanced reference formula image area and the image area to be identified.
Based on the first aspect, in some embodiments of the present invention, the above method for performing image enhancement processing on the reference formula image area and the image area to be identified includes the following steps:
filtering the reference formula image area and the image area to be identified by using a weighted least squares (WLS) filter;
dividing the filtered reference formula image area and the filtered image area to be identified into a plurality of scales, and carrying out subtraction calculation on the reference formula image area and the image area to be identified between different scales to obtain image detail information of the plurality of reference formula image areas and image detail information of the plurality of image areas to be identified;
and weighting the image detail information of each reference formula image area into the original reference formula image area, and weighting the image detail information of each image area to be identified into the original image area to be identified so as to obtain the enhanced images of the reference formula image area and the image area to be identified.
Based on the first aspect, in some embodiments of the present invention, the method for identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified includes the following steps:
comparing the feature similarity of the reference formula image area and the image area to be identified with a preset feature similarity threshold value to generate a first comparison result;
comparing the aspect ratio similarity of the reference formula image area and the image area to be identified with a preset aspect ratio similarity threshold to generate a second comparison result;
and identifying and determining the type of the image area to be identified according to the first comparison result and the second comparison result.
Based on the first aspect, in some embodiments of the present invention, the method for determining the type of the image area to be identified according to the first comparison result and the second comparison result includes the following steps:
A1, judging whether the first comparison result contains information that the features of the image area to be identified and the reference formula image area are similar; if so, proceeding to step A2; if not, determining that the image area to be identified is a non-formula image area;
A2, judging whether the second comparison result contains information that the aspect ratio of the image area to be identified is similar to that of the reference formula image area; if so, determining that the image area to be identified is a formula image area; and if not, determining that the image area to be identified is a non-formula image area.
In a second aspect, an embodiment of the present invention provides a formula image area identification system based on feature and aspect ratio detection, including an image acquisition module, a feature extraction module, a feature calculation module, an aspect ratio calculation module, and an identification module, where:
the image acquisition module is used for acquiring a reference formula image area and an image area to be identified;
the feature extraction module is used for performing SIFT feature extraction on the reference formula image area and the image area to be identified so as to obtain feature data of the reference formula image area and the image area to be identified;
the feature calculation module is used for calculating the feature similarity of the reference formula image area and the image area to be identified according to the feature data;
the aspect ratio calculation module is used for calculating the aspect ratio of the reference formula image area and the aspect ratio of the image area to be identified, and calculating the aspect ratio similarity of the image area to be identified and the reference formula image area by using the absolute difference;
and the identification module is used for identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified.
In order to ensure accurate identification of the formula image area, the features and the aspect ratio are combined for judgment so that an accurate identification result can be obtained. When an image area needs to be identified, the image acquisition module first acquires a plurality of manually selected, representative formula image areas, which are used as reference formula image areas to provide reference data for subsequent image identification. After the reference formula image areas and the image area to be identified are acquired, the feature extraction module performs SIFT feature extraction on them to obtain their feature data. After the feature data are extracted, the feature calculation module calculates the feature similarity between the image area to be identified and the plurality of reference formula image areas using the Euclidean distance on the basis of the SIFT features. The aspect ratio calculation module calculates the aspect ratios of the plurality of reference formula image areas and the aspect ratio of the image area to be identified, and calculates the aspect ratio similarity between the image area to be identified and the plurality of reference formula image areas using the absolute difference. The identification module determines the type of the image area to be identified according to the feature similarity and the aspect ratio similarity, and judges the image area to be identified to be a formula image area or a non-formula image area. If the feature similarity between the image area to be identified and any one of the reference formula image areas is high, and the aspect ratio similarity between the image area to be identified and any one of the reference formula image areas is also high, the image area to be identified is judged to be a formula image area. If either of these two conditions is not met, the image area to be identified is judged to be a non-formula image area.
By combining feature similarity detection and aspect ratio similarity detection, the system is well targeted to formula image area identification and can maintain high identification precision on the formula identification task. The algorithm in the whole process is relatively simple; no complex calculation model and no large number of training samples are used, so computing resources are saved and resource consumption is low.
Based on the second aspect, in some embodiments of the present invention, the formula image area recognition system based on feature and aspect ratio detection further includes an image enhancement module, configured to perform image enhancement processing on the reference formula image area and the image area to be recognized, so as to obtain an enhanced reference formula image area and an enhanced image area to be recognized.
Based on the second aspect, in some embodiments of the present invention, the image enhancement module includes a filtering sub-module, a scale detail sub-module, and a weighting enhancement sub-module, wherein:
the filtering submodule is used for filtering the reference formula image area and the image area to be identified by using a weighted least squares (WLS) filter;
the scale detail submodule is used for dividing the filtered reference formula image area and the filtered image area to be identified into a plurality of scales, and carrying out subtraction calculation on the reference formula image area and the image area to be identified between different scales to obtain image detail information of the plurality of reference formula image areas and image detail information of the plurality of image areas to be identified;
and the weighting enhancement submodule is used for weighting the image detail information of each reference formula image area into the original reference formula image area and weighting the image detail information of each image area to be recognized into the original image area to be recognized, so as to obtain enhanced images of the reference formula image area and the image area to be recognized.
Based on the second aspect, in some embodiments of the invention, the identification module includes a first comparison sub-module, a second comparison sub-module, and an identification determination sub-module, wherein:
the first comparison submodule is used for comparing the feature similarity of the reference formula image area and the image area to be identified with a preset feature similarity threshold value to generate a first comparison result;
the second comparison submodule is used for comparing the aspect ratio similarity of the reference formula image area and the image area to be identified with a preset aspect ratio similarity threshold value to generate a second comparison result;
and the identification and determination submodule is used for identifying and determining the type of the image area to be identified according to the first comparison result and the second comparison result.
Based on the second aspect, in some embodiments of the present invention, the identification determination sub-module includes a first determination unit and a second determination unit, wherein:
the first judging unit is used for judging whether the first comparison result contains information that the features of the image area to be identified and the reference formula image area are similar; if so, the second judging unit works; if not, the image area to be identified is determined to be a non-formula image area;
the second judging unit is used for judging whether the second comparison result contains information that the aspect ratio of the image area to be identified is similar to the aspect ratio of the reference formula image area; if so, the image area to be identified is determined to be a formula image area; and if not, the image area to be identified is determined to be a non-formula image area.
The embodiment of the invention at least has the following advantages or beneficial effects:
The embodiments of the invention provide a formula image region identification method and system based on feature and aspect ratio detection. After a reference formula image region and an image region to be identified are obtained, SIFT feature extraction is performed on the plurality of reference formula image regions and on the image region to be identified. After the feature data are extracted, the feature similarity between the image region to be identified and the plurality of reference formula image regions is calculated using the Euclidean distance on the basis of the SIFT features. The aspect ratios of the plurality of reference formula image regions and the aspect ratio of the image region to be identified are calculated, the aspect ratio similarity between the image region to be identified and the plurality of reference formula image regions is calculated using the absolute difference, and the image region to be identified is judged to be a formula image region or a non-formula image region. By combining feature similarity detection and aspect ratio similarity detection, the method is well targeted to formula image area identification and can maintain high identification precision on the formula identification task. The algorithm in the whole process is relatively simple; no complex calculation model and no large number of training samples are used, so computing resources are saved and resource consumption is low.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flowchart of a method for identifying a formula image region based on feature and aspect ratio detection according to an embodiment of the present invention;
FIG. 2 is a flowchart of identifying and determining the image region type in a formula image region identification method based on feature and aspect ratio detection according to an embodiment of the present invention;
FIG. 3 is a flowchart of determining the image region type in a formula image region identification method based on feature and aspect ratio detection according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a formula image area identification system based on feature and aspect ratio detection according to an embodiment of the present invention.
Reference numerals: 100. an image acquisition module; 200. a feature extraction module; 300. a feature calculation module; 400. an aspect ratio calculation module; 500. an identification module; 510. a first comparison submodule; 520. a second comparison submodule; 530. an identification determination submodule; 531. a first judgment unit; 532. a second judgment unit; 600. an image enhancement module; 610. a filtering submodule; 620. a scale detail submodule; 630. a weighting enhancement submodule.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Examples
As shown in FIG. 1, in a first aspect, an embodiment of the present invention provides a formula image region identification method based on feature and aspect ratio detection, including the following steps:
S1, acquiring a reference formula image area and an image area to be identified;
In order to ensure accurate identification of the formula image area, the features and the aspect ratio are combined for judgment so that an accurate identification result can be obtained. When an image area needs to be identified, a plurality of manually selected, representative formula image areas are first obtained and used as reference formula image areas to provide reference data for subsequent image identification.
S2, SIFT feature extraction is carried out on the reference formula image area and the image area to be identified so as to obtain feature data of the reference formula image area and the image area to be identified;
After the reference formula image areas and the image area to be identified are obtained, SIFT feature extraction is performed on the plurality of reference formula image areas and on the image area to be identified to obtain their feature data. SIFT (scale-invariant feature transform) is a local feature detection algorithm. It obtains features by finding feature points (or corner points) in a picture together with descriptors related to their scale and orientation, and it achieves good results in image feature point matching. SIFT features are scale invariant, and a good detection result can still be obtained when the rotation angle, the image brightness, or the shooting angle of view changes.
S3, calculating the feature similarity of the reference formula image area and the image area to be identified according to the feature data;
After the feature data are extracted, the feature similarity between the image area to be identified and the plurality of reference formula image areas is calculated using the Euclidean distance on the basis of the SIFT features.
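A minimal sketch of one way step S3 could be computed, assuming each region has already been reduced to a list of fixed-length descriptor vectors (for example, the 128-dimensional descriptors that OpenCV's `cv2.SIFT_create().detectAndCompute` returns). The source only states that the Euclidean distance is used; the nearest-neighbour averaging, the mapping of distance to a score, and the names `euclidean` and `feature_similarity` are assumptions made here for illustration.

```python
import math
from typing import Sequence

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_similarity(candidate: Sequence[Descriptor],
                       reference: Sequence[Descriptor]) -> float:
    """Return a similarity score in [0, 1]; 1.0 means identical descriptors.

    Assumption: the source does not say how per-descriptor distances are
    aggregated, so this averages each candidate descriptor's distance to
    its nearest reference descriptor.
    """
    if not candidate or not reference:
        return 0.0  # no keypoints on one side: treat as dissimilar
    nearest = [min(euclidean(d, r) for r in reference) for d in candidate]
    mean_distance = sum(nearest) / len(nearest)
    # Hypothetical mapping from distance to similarity.
    return 1.0 / (1.0 + mean_distance)
```

With several reference regions, the function would be called once per reference and the resulting scores passed on to the threshold comparison.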
S4, calculating the length-width ratio of the reference formula image area and the length-width ratio of the image area to be identified, and calculating the length-width ratio similarity of the image area to be identified and the reference formula image area by using the absolute difference;
The aspect ratios of the plurality of reference formula image areas and the aspect ratio of the image area to be identified are calculated, and the aspect ratio similarity between the image area to be identified and the plurality of reference formula image areas is calculated using the absolute difference.
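Step S4 can be sketched as follows, assuming the aspect ratio is taken as width divided by height (the source says only "length-width ratio") and that a smaller absolute difference means a more similar shape. The function names are hypothetical.

```python
from typing import Tuple

Size = Tuple[float, float]  # (width, height) in pixels


def aspect_ratio(width: float, height: float) -> float:
    """Width divided by height; both must be positive."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width / height


def aspect_ratio_difference(candidate: Size, reference: Size) -> float:
    """Absolute difference of the two aspect ratios; 0.0 means same shape."""
    return abs(aspect_ratio(*candidate) - aspect_ratio(*reference))
```

Because only the ratio is compared, a region and a scaled copy of it give a difference of zero, which suits formula regions captured at different resolutions.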
S5, identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified.
The type of the image area to be identified is determined according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified, and the image area to be identified is judged to be a formula image area or a non-formula image area. If the feature similarity between the image area to be identified and any one of the reference formula image areas is high, and the aspect ratio similarity between the image area to be identified and any one of the reference formula image areas is also high, the image area to be identified is judged to be a formula image area. If either of these two conditions is not met, the image area to be identified is judged to be a non-formula image area.
By combining feature similarity detection and aspect ratio similarity detection, the method is well targeted to formula image area identification and can maintain high identification precision on the formula identification task. The algorithm in the whole process is relatively simple; no complex calculation model and no large number of training samples are used, so computing resources are saved and resource consumption is low.
Based on the first aspect, in some embodiments of the present invention, the method for identifying a formula image region based on feature and aspect ratio detection further includes, before the step of performing SIFT feature extraction, the following steps:
and carrying out image enhancement processing on the reference formula image area and the image area to be identified so as to obtain the enhanced reference formula image area and the image area to be identified.
In order to ensure accurate identification of the image subsequently, firstly, image enhancement processing is carried out on each reference formula image area and the image area to be identified before feature extraction, so as to obtain a clear and accurate image.
Based on the first aspect, in some embodiments of the present invention, the above method for performing image enhancement processing on the reference formula image area and the image area to be identified includes the following steps:
filtering the reference formula image area and the image area to be identified by using a weighted least squares (WLS) filter;
dividing the filtered reference formula image area and the filtered image area to be identified into a plurality of scales, and carrying out subtraction calculation on the reference formula image area and the image area to be identified between different scales to obtain image detail information of the plurality of reference formula image areas and image detail information of the plurality of image areas to be identified;
and weighting the image detail information of each reference formula image area into the original reference formula image area, and weighting the image detail information of each image area to be identified into the original image area to be identified so as to obtain the enhanced images of the reference formula image area and the image area to be identified.
The image is first filtered with a WLS (weighted least squares) filter, and the filtered image is divided into a plurality of scales. Subtraction between different scales yields detail information, and the detail information of the different levels is weighted back into the original image, producing an enhanced image that is rich in detail.
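The enhancement steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: a simple box blur stands in for the WLS filter (which is considerably more involved), and the function names `box_blur` and `enhance`, the blur radii, and the detail weights are all assumptions chosen for the example.

```python
import numpy as np

def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Stand-in smoothing filter (NOT WLS): mean over a (2r+1) x (2r+1) window."""
    if radius == 0:
        return img.astype(np.float64)
    k = 2 * radius + 1
    # Replicate border pixels so the output keeps the input shape.
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def enhance(img: np.ndarray, radii=(1, 2, 4), weights=(0.5, 0.5, 0.25)) -> np.ndarray:
    """Multi-scale detail enhancement of a grayscale image with values in [0, 255]."""
    base = img.astype(np.float64)
    # Level 0 is the input; each further level is smoothed at a coarser scale.
    levels = [base] + [box_blur(base, r) for r in radii]
    # Detail layers are the differences between adjacent scales.
    details = [levels[i] - levels[i + 1] for i in range(len(radii))]
    # Weight the detail layers back into the original image.
    out = base + sum(w * d for w, d in zip(weights, details))
    return np.clip(out, 0, 255)
```

On a flat image every detail layer is zero, so the output equals the input; around a step edge the bright side becomes brighter and the dark side darker, which is the sharpening effect the enhancement step is after.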
As shown in fig. 2, in some embodiments of the present invention, the method for identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity between the reference formula image area and the image area to be identified includes the following steps:
S51, comparing the feature similarity of the reference formula image area and the image area to be identified with a preset feature similarity threshold value to generate a first comparison result;
S52, comparing the aspect ratio similarity of the reference formula image area and the image area to be identified with a preset aspect ratio similarity threshold to generate a second comparison result;
S53, identifying and determining the type of the image area to be identified according to the first comparison result and the second comparison result.
The feature similarity and the aspect ratio similarity between the image area to be identified and the reference formula image area are each compared with the corresponding preset threshold. When both similarities satisfy their thresholds, the image area to be identified is determined to be a formula image area; if either similarity does not, it is determined to be a non-formula image area.
As shown in fig. 3, in some embodiments of the present invention according to the first aspect, the method for determining the type of the image region to be identified according to the first comparison result and the second comparison result includes the following steps:
A1, judging whether the first comparison result contains information that the features of the image area to be identified and the reference formula image area are similar; if yes, entering step A2; if not, determining that the image area to be identified is a non-formula image area;
A2, judging whether the second comparison result contains information that the aspect ratio of the image area to be identified is similar to that of the reference formula image area; if so, determining that the image area to be identified is a formula image area; if not, determining that the image area to be identified is a non-formula image area.
If the image area to be identified has both a high feature similarity with any one of the reference formula image areas and a high aspect ratio similarity with any one of the reference formula image areas, it is judged to be a formula image area; if either of these two conditions is not satisfied, it is judged to be a non-formula image area.
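The two comparisons and the A1/A2 cascade can be sketched as follows. The text does not state the direction of the threshold tests, so this sketch assumes both similarities are expressed as distances (smaller means more similar); the names `Thresholds` and `classify_region` and the threshold fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    # Hypothetical fields: largest distance still counted as "similar".
    feature_max_distance: float
    aspect_max_difference: float

def classify_region(feature_distance: float, aspect_difference: float, th: Thresholds) -> str:
    """Return "formula" or "non-formula" for one candidate/reference pair."""
    # S51: first comparison result (feature similarity against its threshold).
    features_similar = feature_distance <= th.feature_max_distance
    # S52: second comparison result (aspect ratio similarity against its threshold).
    aspect_similar = aspect_difference <= th.aspect_max_difference
    # A1: a failed feature check ends the decision immediately.
    if not features_similar:
        return "non-formula"
    # A2: the aspect ratio check decides the remaining cases.
    return "formula" if aspect_similar else "non-formula"
```

For example, with `Thresholds(0.5, 0.3)`, a pair with feature distance 0.2 and aspect ratio difference 0.1 is classified as `"formula"`, while raising either value above its threshold gives `"non-formula"`.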
As shown in fig. 4, in a second aspect, an embodiment of the present invention provides a formula image region identification system based on feature and aspect ratio detection, including an image acquisition module 100, a feature extraction module 200, a feature calculation module 300, an aspect ratio calculation module 400, and an identification module 500, where:
the image acquisition module 100 is configured to acquire a reference formula image area and an image area to be identified;
the feature extraction module 200 is configured to perform SIFT feature extraction on the reference formula image region and the image region to be identified to obtain feature data of the reference formula image region and the image region to be identified;
the feature calculation module 300 is configured to calculate feature similarity between a reference formula image region and an image region to be identified according to the feature data;
the aspect ratio calculation module 400 is configured to calculate an aspect ratio of the reference formula image area and an aspect ratio of the image area to be identified, and calculate an aspect ratio similarity between the image area to be identified and the reference formula image area by using the absolute difference;
the identifying module 500 is configured to identify and determine the type of the image area to be identified according to the feature similarity and the aspect ratio similarity between the reference formula image area and the image area to be identified.
To ensure accurate identification of the formula image area, the features and the aspect ratio are judged in combination. When an image area needs to be identified, the image acquisition module 100 first acquires a plurality of manually selected, representative formula image areas and uses them as reference formula image areas, which provide reference data for the subsequent identification. After the reference formula image areas and the image area to be identified have been acquired, the feature extraction module 200 performs SIFT feature extraction on them to obtain their feature data. The feature calculation module 300 then uses the Euclidean distance on the extracted SIFT features to calculate the feature similarity between the image area to be identified and each of the reference formula image areas. The aspect ratio calculation module 400 calculates the aspect ratio of each reference formula image area and of the image area to be identified, and uses the absolute difference to calculate the aspect ratio similarity between the image area to be identified and each reference formula image area. Finally, the identification module 500 determines, from the feature similarity and the aspect ratio similarity, whether the image area to be identified is a formula image area or a non-formula image area. If the image area to be identified has both a high feature similarity with any one of the reference formula image areas and a high aspect ratio similarity with any one of the reference formula image areas, it is judged to be a formula image area; if either of these two conditions is not satisfied, it is judged to be a non-formula image area.
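The two similarity measures named above can be sketched as follows. The text only says that the Euclidean distance and the absolute difference are used; how descriptor distances are aggregated and whether the ratio is width over height are assumptions of this sketch, and the function names are hypothetical. The descriptor arrays are assumed to have been extracted already (for example by a SIFT implementation yielding one 128-dimensional row per keypoint).

```python
import numpy as np

def aspect_ratio(width: float, height: float) -> float:
    """Aspect ratio, taken here as width divided by height (an assumption)."""
    return width / height

def aspect_ratio_difference(candidate_size, reference_size) -> float:
    """Absolute difference between two aspect ratios; 0 means identical shape."""
    return abs(aspect_ratio(*candidate_size) - aspect_ratio(*reference_size))

def feature_distance(desc_a, desc_b) -> float:
    """Mean Euclidean distance from each descriptor in desc_a to its nearest
    neighbour in desc_b (one possible aggregation; smaller means more similar)."""
    a = np.asarray(desc_a, dtype=np.float64)
    b = np.asarray(desc_b, dtype=np.float64)
    # Pairwise distances with shape (len(a), len(b)).
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float(pairwise.min(axis=1).mean())
```

Because the aspect ratio is scale-free, a 200 x 50 region and a 400 x 100 reference have a difference of 0, whereas a 200 x 50 region compared with a square 100 x 100 reference differs by 3.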
By combining feature similarity detection with aspect ratio similarity detection, the system is specifically targeted at formula image area identification and can maintain high identification accuracy on that task. The overall algorithm is relatively simple: it uses neither a complex computational model nor a large number of training samples, which saves computing resources and keeps the cost of the model low.
As shown in fig. 4, in some embodiments of the present invention based on the second aspect, the system for identifying a formula image region based on feature and aspect ratio detection further includes an image enhancement module 600, configured to perform image enhancement processing on the reference formula image region and the image region to be identified, so as to obtain an enhanced reference formula image region and an enhanced image region to be identified.
In order to ensure accurate subsequent image identification, the image enhancement module 600 performs image enhancement processing on each reference formula image area and the image area to be identified before feature extraction, so as to obtain a clear and accurate image.
As shown in fig. 4, according to the second aspect, in some embodiments of the present invention, the image enhancement module 600 includes a filtering sub-module 610, a scale detail sub-module 620, and a weighting enhancement sub-module 630, wherein:
a filtering submodule 610, configured to filter the reference formula image area and the image area to be identified by using a WLS least square filter;
the scale detail submodule 620 is configured to divide the filtered reference formula image area and the filtered image area to be identified into multiple scales, and perform subtraction calculation on the reference formula image area and the image area to be identified between different scales to obtain image detail information of the multiple reference formula image areas and image detail information of the multiple image areas to be identified;
the weighting enhancement sub-module 630 is configured to weight the image detail information of each reference formula image area into the original reference formula image area, and weight the image detail information of each image area to be identified into the original image area to be identified, so as to obtain enhanced images of the reference formula image area and the image area to be identified.
The filtering sub-module 610 filters the image with a WLS (weighted least squares) filter, and the scale detail sub-module 620 divides the filtered image into a plurality of scales and performs subtraction between different scales to obtain detail information. The weighting enhancement sub-module 630 then weights the detail information of the different levels back into the original image, producing an enhanced image that is rich in detail.
Based on the second aspect, in some embodiments of the invention, as shown in fig. 4, the recognition module 500 comprises a first comparison sub-module 510, a second comparison sub-module 520, and a recognition determination sub-module 530, wherein:
the first comparison submodule 510 is configured to compare the feature similarity between the reference formula image area and the image area to be identified with a preset feature similarity threshold, and generate a first comparison result;
the second comparison submodule 520 is configured to compare the aspect ratio similarity between the reference formula image area and the image area to be identified with a preset aspect ratio similarity threshold, and generate a second comparison result;
and the identification and determination submodule 530 is used for identifying and determining the type of the image area to be identified according to the first comparison result and the second comparison result.
The first comparison sub-module 510 and the second comparison sub-module 520 compare the feature similarity and the aspect ratio similarity between the image area to be identified and the reference formula image area with the corresponding preset thresholds. When both similarities satisfy their thresholds, the image area to be identified is determined to be a formula image area; if either does not, it is determined to be a non-formula image area.
As shown in fig. 4, according to the second aspect, in some embodiments of the present invention, the identification determination sub-module 530 includes a first judgment unit 531 and a second judgment unit 532, wherein:
the first judging unit 531 is used for judging whether the first comparison result contains information that the features of the image area to be identified and the reference formula image area are similar; if yes, the second judging unit 532 works; if not, determining that the image area to be identified is a non-formula image area;
the second judging unit 532 is used for judging whether the second comparison result contains information that the aspect ratio of the image area to be identified is similar to the aspect ratio of the reference formula image area; if so, determining that the image area to be identified is a formula image area; if not, determining that the image area to be identified is a non-formula image area.
The first judging unit 531 and the second judging unit 532 judge the type of the image area to be identified. If the image area to be identified has both a high feature similarity with any one of the reference formula image areas and a high aspect ratio similarity with any one of the reference formula image areas, it is judged to be a formula image area; if either of these two conditions is not satisfied, it is judged to be a non-formula image area.
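How the two judging units might be applied across several reference formula image areas can be sketched as follows. The text leaves open whether both conditions must be met by the same reference; this sketch assumes they must, and it again assumes distance-style scores where smaller means more similar. The function name and parameters are hypothetical.

```python
from typing import Iterable, Tuple

def is_formula_region(
    per_reference_scores: Iterable[Tuple[float, float]],
    feature_threshold: float,
    aspect_threshold: float,
) -> bool:
    """Return True if at least one reference passes both checks.

    Each item is (feature_distance, aspect_difference) for one reference.
    """
    for feature_distance, aspect_difference in per_reference_scores:
        # First judging unit 531: skip this reference if the features differ too much.
        if feature_distance > feature_threshold:
            continue
        # Second judging unit 532: accept once the aspect ratio also matches.
        if aspect_difference <= aspect_threshold:
            return True
    # No reference satisfied both conditions.
    return False
```

Under this assumption, a candidate that matches one reference only in features and another only in aspect ratio is still rejected; relaxing that would mean tracking the two conditions independently across references.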
The system further includes a memory, a processor, and a communication interface, which are electrically connected to each other, directly or indirectly, to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by executing the software programs and modules stored in the memory. The communication interface may be used for communicating signaling or data with other node devices.
The memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), etc.; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
It will be appreciated that the configuration shown in fig. 4 is merely illustrative; the system may include more or fewer components than shown in fig. 4, or have a configuration different from that shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the part thereof that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A formula image area identification method based on feature and aspect ratio detection is characterized by comprising the following steps:
acquiring a reference formula image area and an image area to be identified;
SIFT feature extraction is carried out on the reference formula image region and the image region to be identified so as to obtain feature data of the reference formula image region and the image region to be identified;
calculating the feature similarity of the reference formula image area and the image area to be identified according to the feature data;
calculating the length-width ratio of the reference formula image area and the length-width ratio of the image area to be identified, and calculating the length-width ratio similarity of the image area to be identified and the reference formula image area by using the absolute difference value;
and identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified.
2. The method as claimed in claim 1, wherein the method further comprises the following steps before the step of performing SIFT feature extraction:
and carrying out image enhancement processing on the reference formula image area and the image area to be identified so as to obtain the enhanced reference formula image area and the image area to be identified.
3. The method for recognizing the formula image area based on the feature and the aspect ratio detection as claimed in claim 2, wherein the method for performing the image enhancement processing on the reference formula image area and the image area to be recognized comprises the following steps:
filtering the reference formula image area and the image area to be identified by using a WLS least square filter;
dividing the filtered reference formula image area and the filtered image area to be identified into a plurality of scales, and carrying out subtraction calculation on the reference formula image area and the image area to be identified between different scales to obtain image detail information of the plurality of reference formula image areas and image detail information of the plurality of image areas to be identified;
and weighting the image detail information of each reference formula image area into the original reference formula image area, and weighting the image detail information of each image area to be identified into the original image area to be identified so as to obtain the enhanced images of the reference formula image area and the image area to be identified.
4. The formula image area identification method based on feature and aspect ratio detection as claimed in claim 1, wherein the method for identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified comprises the following steps:
comparing the feature similarity of the reference formula image area and the image area to be identified with a preset feature similarity threshold value to generate a first comparison result;
comparing the aspect ratio similarity of the reference formula image area and the image area to be identified with a preset aspect ratio similarity threshold to generate a second comparison result;
and identifying and determining the type of the image area to be identified according to the first comparison result and the second comparison result.
5. The formula image area identification method based on feature and aspect ratio detection as claimed in claim 4, wherein the method for identifying and determining the type of the image area to be identified according to the first comparison result and the second comparison result comprises the following steps:
A1, judging whether the first comparison result contains information that the features of the image area to be identified and the reference formula image area are similar; if yes, entering step A2; if not, determining that the image area to be identified is a non-formula image area;
A2, judging whether the second comparison result contains information that the aspect ratio of the image area to be identified is similar to that of the reference formula image area; if so, determining that the image area to be identified is a formula image area; if not, determining that the image area to be identified is a non-formula image area.
6. A formula image area identification system based on feature and aspect ratio detection, characterized by comprising an image acquisition module, a feature extraction module, a feature calculation module, an aspect ratio calculation module and an identification module, wherein:
the image acquisition module is used for acquiring a reference formula image area and an image area to be identified;
the characteristic extraction module is used for carrying out SIFT characteristic extraction on the reference formula image area and the image area to be identified so as to obtain characteristic data of the reference formula image area and the image area to be identified;
the characteristic calculation module is used for calculating the characteristic similarity of the reference formula image area and the image area to be identified according to the characteristic data;
the length-width ratio calculation module is used for calculating the length-width ratio of the reference formula image area and the length-width ratio of the image area to be identified, and calculating the length-width ratio similarity of the image area to be identified and the reference formula image area by using the absolute difference value;
and the identification module is used for identifying and determining the type of the image area to be identified according to the feature similarity and the aspect ratio similarity of the reference formula image area and the image area to be identified.
7. The system of claim 6, further comprising an image enhancement module for performing image enhancement processing on the reference formula image region and the image region to be identified to obtain enhanced reference formula image region and image region to be identified.
8. The system of claim 7, wherein the image enhancement module comprises a filter sub-module, a scale detail sub-module, and a weight enhancement sub-module, wherein:
the filtering submodule is used for filtering the reference formula image area and the image area to be identified by using a WLS least square filter;
the scale detail submodule is used for dividing the filtered reference formula image area and the filtered image area to be identified into a plurality of scales, and carrying out subtraction calculation on the reference formula image area and the image area to be identified between different scales to obtain image detail information of the plurality of reference formula image areas and image detail information of the plurality of image areas to be identified;
and the weighting enhancement submodule is used for weighting the image detail information of each reference formula image area into the original reference formula image area and weighting the image detail information of each image area to be identified into the original image area to be identified, so as to obtain enhanced images of the reference formula image area and the image area to be identified.
9. The system of claim 6, wherein the recognition module comprises a first comparison sub-module, a second comparison sub-module, and a recognition determination sub-module, wherein:
the first comparison submodule is used for comparing the feature similarity of the reference formula image area and the image area to be identified with a preset feature similarity threshold value to generate a first comparison result;
the second comparison submodule is used for comparing the aspect ratio similarity of the reference formula image area and the image area to be identified with a preset aspect ratio similarity threshold value to generate a second comparison result;
and the identification and determination submodule is used for identifying and determining the type of the image area to be identified according to the first comparison result and the second comparison result.
10. The system of claim 9, wherein the recognition determining sub-module comprises a first determining unit and a second determining unit, wherein:
the first judging unit is used for judging whether the first comparison result contains information that the features of the image area to be identified and the reference formula image area are similar; if so, the second judging unit works; if not, determining that the image area to be identified is a non-formula image area;
the second judging unit is used for judging whether the second comparison result contains information that the aspect ratio of the image area to be identified is similar to the aspect ratio of the reference formula image area; if so, determining that the image area to be identified is a formula image area; if not, determining that the image area to be identified is a non-formula image area.
CN202110904935.2A 2021-08-07 2021-08-07 Formula image area identification method and system based on feature and aspect ratio detection Pending CN113705367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110904935.2A CN113705367A (en) 2021-08-07 2021-08-07 Formula image area identification method and system based on feature and aspect ratio detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110904935.2A CN113705367A (en) 2021-08-07 2021-08-07 Formula image area identification method and system based on feature and aspect ratio detection

Publications (1)

Publication Number Publication Date
CN113705367A true CN113705367A (en) 2021-11-26

Family

ID=78651828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904935.2A Pending CN113705367A (en) 2021-08-07 2021-08-07 Formula image area identification method and system based on feature and aspect ratio detection

Country Status (1)

Country Link
CN (1) CN113705367A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080152225A1 (en) * 2004-03-03 2008-06-26 Nec Corporation Image Similarity Calculation System, Image Search System, Image Similarity Calculation Method, and Image Similarity Calculation Program
CN113191277A (en) * 2021-05-06 2021-07-30 北京惠朗时代科技有限公司 Table image region identification method and system based on entropy check
CN113221904A (en) * 2021-05-13 2021-08-06 北京惠朗时代科技有限公司 Semantic associated character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination