CN114202761B - Information batch extraction method based on picture information clustering - Google Patents

Information batch extraction method based on picture information clustering

Publication number
CN114202761B
Authority
CN
China
Prior art keywords
objects
points
combined
edge points
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210140562.0A
Other languages
Chinese (zh)
Other versions
CN114202761A (en)
Inventor
纪俊光
黎慧燕
陈学言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Shuyuan Zhihui Technology Co ltd
Original Assignee
Guangdong Shuyuan Zhihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Shuyuan Zhihui Technology Co ltd
Priority to CN202210140562.0A
Publication of CN114202761A
Application granted
Publication of CN114202761B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a system and a computer-readable storage medium for batch information extraction based on picture information clustering. The method comprises the following steps: extracting commodity objects and character objects from the image to be recognized, classifying and numbering them, and determining a coordinate system for each object; dotting the edges of the extracted objects and determining the coordinates of the resulting edge points; performing collision calculation on adjacent objects of different types by using the edge points, and taking the two objects as a combined object if the distance between their edge points is smaller than a preset value; and continuing to perform collision calculation between the combined object and other objects, where, if the distance between the edge points is greater than a preset multiple of the distance between the edge points of the current combined object, the other object is judged not to belong to the same combination and collision calculation continues with further objects of different types until all the objects are combined, after which the combined objects are output. The invention enables associated objects in a complex background to be recognized as a combination and their information to be extracted.

Description

Information batch extraction method based on picture information clustering
Technical Field
The invention relates to the technical field of intelligent processing of internet big data, in particular to a method and a system for extracting information in batch based on picture information clustering and a computer-readable storage medium.
Background
OCR is an important technology that Internet companies often use to recognize image and text information. It acquires the characters and pictures on paper through optical input devices such as scanners or cameras, analyzes the morphological structure of the characters with various pattern recognition algorithms to form the corresponding character feature descriptions, and converts the characters in the image into text through a suitable character matching method.
OCR is a practical and efficient way to analyze large numbers of pictures in big-data applications. However, conventional recognition usually scans information item by item: each recognized piece of text is treated as a single individual, combined content is not recognized, and the text is processed block by block. As a result, the recognized text alone often does not reveal what object it actually describes.
The prior art discloses a method and a device for identifying an object in an image, wherein the method comprises the following steps: preprocessing an image to be recognized to obtain a binary image of the image to be recognized; cutting the binary image into a plurality of sub-regions, and selecting a first sub-region from the plurality of sub-regions, wherein the first sub-region is a sub-region containing preset pixels; combining the first sub-regions to obtain at least one second sub-region based on the distances of different first sub-regions in the binary image; identifying a target object in the second sub-region. The scheme aims at object recognition in a complex background, and does not solve the problem of recognition of associated objects or combined objects.
Disclosure of Invention
The invention provides a method and a system for extracting information in batches based on picture information clustering and a computer-readable storage medium, aiming at overcoming the defect that the existing picture information extraction method does not solve the identification and extraction of associated objects or combined objects.
The primary objective of the present invention is to solve the above technical problems, and the technical solution of the present invention is as follows:
A first aspect of the invention provides a method for extracting information in batches based on picture information clustering, which comprises the following steps:
S1: extracting commodity objects and character objects from the images to be recognized by using an OCR recognition method, classifying and numbering the commodity objects and the character objects, taking the objects in each image as independent objects, and determining a coordinate system of each object;
S2: dotting the edges of all the commodity objects and the character objects in each image, marking the dotted points as edge points, and determining the coordinates of the edge points according to the coordinate system of each object;
S3: performing collision calculation on adjacent objects of different types by using the edge points, and taking the two objects as a combined object if the distance between the edge points of the two adjacent objects of different types is smaller than a preset value;
S4: continuing to perform collision calculation between each of the two combined objects and other objects of different types; if the distance between the edge points is greater than a preset multiple of the distance between the edge points of the currently combined objects, judging that the other object does not belong to the same combination, and continuing to search for other objects of different types for collision calculation until all the objects are combined; and outputting the combined objects.
Further, in step S1, the OCR recognition method is used to extract the commodity object and the character object from the image to be recognized from left to right and from top to bottom.
Further, the specific process of dotting the edges of all the commodity objects and the character objects in each image is as follows:
determining the object to be dotted; first taking 4 points, one at the outermost position of each of the upper left, upper right, lower left and lower right corners of the object, and connecting the four points to form an irregular rectangle;
then taking the midpoints between the upper left and upper right corners, between the lower left and lower right corners, between the upper left and lower left corners, and between the upper right and lower right corners, which gives 4 further points: the upper, lower, left and right points.
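The eight-point dotting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the four corner points are assumed to have been located already, and the names `Point`, `midpoint` and `eight_edge_points` are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in the object's coordinate system


def midpoint(a: Point, b: Point) -> Point:
    """Return the center between two points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def eight_edge_points(tl: Point, tr: Point, bl: Point, br: Point) -> Dict[str, Point]:
    """Build eight edge points: four corners plus four side midpoints.

    The corners may form an irregular rectangle; they are assumed to be
    given (hypothetical input, e.g. from a detector).
    """
    return {
        "top_left": tl,
        "top_right": tr,
        "bottom_left": bl,
        "bottom_right": br,
        "top": midpoint(tl, tr),
        "bottom": midpoint(bl, br),
        "left": midpoint(tl, bl),
        "right": midpoint(tr, br),
    }


points = eight_edge_points((0, 0), (10, 0), (0, 20), (10, 20))
print(points["bottom"])  # (5.0, 20.0)
```

Keeping the points in a dictionary keyed by position makes it easy to pick, for example, only the three lower points of one object and the three upper points of another.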
Further, the collision calculation process is as follows:
the adjacent edge points of the two objects are denoted P1 and P2, with coordinates (x1, y1) and (x2, y2) respectively; the distance between the two objects on the x axis is then |x2 - x1|.
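As a small worked example of this distance, the sketch below computes |x2 - x1| for two edge points. The function names are hypothetical, and the y-axis variant is an assumption added by analogy; the text only states the x-axis formula.

```python
from typing import Tuple

Point = Tuple[float, float]


def x_distance(p1: Point, p2: Point) -> float:
    """Distance on the x axis as stated in the text: |x2 - x1|."""
    return abs(p2[0] - p1[0])


def y_distance(p1: Point, p2: Point) -> float:
    """Assumed analogue on the y axis, |y2 - y1| (not stated in the text)."""
    return abs(p2[1] - p1[1])


p1, p2 = (12.0, 40.0), (15.0, 46.0)
print(x_distance(p1, p2))  # 3.0
print(y_distance(p1, p2))  # 6.0
```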
Further, the preset multiple of step S4 is greater than or equal to 2.
Further, in step S4, when the collision calculation is continued to search for other objects of different types, if no valid data is identified, the current process is ended and the combined object is output.
Further, the collision calculation is only performed between different types of objects.
A second aspect of the invention provides a system for extracting information in batches based on picture information clustering. The system comprises a memory and a processor, wherein the memory stores a program of the information batch extraction method based on picture information clustering, and when the program is executed by the processor, the following steps are implemented:
S1: extracting commodity objects and character objects from the images to be recognized by using an OCR recognition method, classifying and numbering the commodity objects and the character objects, taking the objects in each image as independent objects, and determining a coordinate system of each object;
S2: dotting the edges of all the commodity objects and the character objects in each image, marking the dotted points as edge points, and determining the coordinates of the edge points according to the coordinate system of each object;
S3: performing collision calculation on adjacent objects of different types by using the edge points, and taking the two objects as a combined object if the distance between the edge points of the two adjacent objects of different types is smaller than a preset value;
S4: continuing to perform collision calculation between each of the two combined objects and other objects of different types; if the distance between the edge points is greater than a preset multiple of the distance between the edge points of the currently combined objects, judging that the other object does not belong to the same combination, and continuing to search for other objects of different types for collision calculation until all the objects are combined; and outputting the combined objects.
Further, in step S1, the OCR recognition method is used to extract the commodity object and the character object from the image to be recognized from left to right and from top to bottom.
The third aspect of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a program of a batch information extraction method based on picture information clustering, and when the program of the batch information extraction method based on picture information clustering is executed by a processor, the steps of the batch information extraction method based on picture information clustering are implemented.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The method first identifies and classifies the different objects in a picture, then calculates the distances between the independent objects, and then combines them, so that associated objects are recognized as a combination and their information is extracted together.
Drawings
FIG. 1 is a flow chart of an information batch extraction method based on image information clustering according to the present invention.
Fig. 2 is a diagram of the recognition effect according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating neighboring points of different objects according to an embodiment of the present invention.
FIG. 4 is a schematic diagram illustrating matching between neighboring points of different objects according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the collision calculation of neighboring points of the combined object according to the embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in fig. 1, a first aspect of the present invention provides a batch information extraction method based on image information clustering, including the following steps:
S1: extracting commodity objects and character objects from the images to be recognized by using an OCR recognition method, classifying and numbering the commodity objects and the character objects, taking the objects in each image as independent objects, and determining a coordinate system of each object;
In a specific embodiment, consider a detail picture of a product promotion advertisement that contains several mobile phone images and the corresponding commodity prices, with images and text arranged from top to bottom and from left to right. As shown in fig. 2, the commodity name and price are placed below each mobile phone image. They need to be recognized as a combination, that is, each mobile phone image corresponds to its text name and price.
First, commodity objects and character objects are extracted from the image to be recognized. Using an OCR recognition method, the image can be scanned from left to right and from top to bottom, and the commodity objects and character objects are extracted and numbered separately, for example commodity object 001, character object 001, and so on. The objects in each image are taken as independent objects, and a coordinate system of each object is determined;
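The classification and numbering in S1 can be sketched as follows. This is a hedged illustration only: detection itself is left out, the `Detected` record, its bounding-box format `(left, top, right, bottom)` and the `row_height` tolerance are all assumptions made for the example and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detected:
    kind: str                               # "commodity" or "character"
    box: Tuple[float, float, float, float]  # (left, top, right, bottom), assumed format
    label: str = ""                         # filled in by number_objects


def number_objects(objs: List[Detected], row_height: float = 50.0) -> List[Detected]:
    """Number objects per class in top-to-bottom, left-to-right order.

    row_height is a hypothetical tolerance: objects whose top edges fall
    in the same horizontal band are treated as one row.
    """
    ordered = sorted(objs, key=lambda o: (int(o.box[1] // row_height), o.box[0]))
    counters: Dict[str, int] = {}
    for obj in ordered:
        counters[obj.kind] = counters.get(obj.kind, 0) + 1
        obj.label = f"{obj.kind} object {counters[obj.kind]:03d}"
    return ordered


detections = [
    Detected("character", (10, 120, 90, 140)),
    Detected("commodity", (110, 10, 190, 110)),
    Detected("commodity", (10, 10, 90, 110)),
]
for obj in number_objects(detections):
    print(obj.label)
# commodity object 001
# commodity object 002
# character object 001
```

Each class keeps its own counter, which matches the example labels "commodity object 001" and "character object 001" in the text.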
S2: dotting the edges of all the commodity objects and the character objects in each image, marking the dotted points as edge points, and determining the coordinates of the edge points according to the coordinate system of each object;
In a specific embodiment, after the classified objects are obtained and the coordinate system of each object is determined, points are marked on the edge of each object; for example, eight points may be marked, as follows: determine the object to be dotted; first take 4 points, one at the outermost position of each of the upper left, upper right, lower left and lower right corners of the object, and connect the four points to form an irregular rectangle;
then take the midpoints between the upper left and upper right corners, between the lower left and lower right corners, between the upper left and lower left corners, and between the upper right and lower right corners, which gives 4 further points: the upper, lower, left and right points.
It should be noted that the coordinate position of each marked point is determined according to the coordinate system of the object and is used in the subsequent calculation of the distance between objects.
S3: performing collision calculation on adjacent different types of objects by using edge points, and taking the current two objects as combined objects if the distance between the edge points of the two adjacent different types of objects is smaller than a preset value;
In a specific embodiment, the collision calculation process is as follows: the adjacent edge points of the two objects are denoted P1 and P2, with coordinates (x1, y1) and (x2, y2) respectively, and the distance between the two objects on the x axis is |x2 - x1|. It should be noted that collision calculation is performed only between objects of different types. For example, commodity object 1 and character object 1 are compared point to point, but no collision calculation is performed between commodity object 1 and commodity object 2. As shown in fig. 3, the three edge points of the commodity object (the commodity picture), namely lower left edge point 1, lower edge point 2 and lower right edge point 3, are compared with the three edge points of the character object, namely upper left edge point 4, upper edge point 5 and upper right edge point 6. When the distance between commodity picture 1 and the character object is smaller than the preset value, the two are set as a combined object.
It should be noted that, in the above processing, the collision calculation is performed only when at least 2 edge points are close to each other. There is no fixed pairing between edge points; the edge points serve only for calculation, and the edge points closest to each other are used. As shown in fig. 4, edge points 2 and 3 of the commodity object (the commodity picture) are matched in the calculation with edge points 6 and 7 of the character object below.
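The collision check in S3 can be sketched as below. Several points are assumptions rather than the patent's method: the text gives only the x-axis distance |x2 - x1|, whereas this sketch uses the Euclidean distance between points so that vertically stacked objects compare sensibly; the names `close_pairs` and `collide` and the parameter `min_pairs` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def close_pairs(points_a: List[Point], points_b: List[Point], preset: float) -> List[float]:
    """For each edge point of A, find its nearest edge point of B and keep
    the distance when it is below the preset value (no fixed pairing)."""
    kept = []
    for pa in points_a:
        nearest = min(math.dist(pa, pb) for pb in points_b)  # Euclidean, assumed
        if nearest < preset:
            kept.append(nearest)
    return kept


def collide(kind_a: str, points_a: List[Point],
            kind_b: str, points_b: List[Point],
            preset: float, min_pairs: int = 2) -> bool:
    """Return True when two objects of different types should be combined."""
    if kind_a == kind_b:
        return False  # collision calculation only between different types
    return len(close_pairs(points_a, points_b, preset)) >= min_pairs


phone_bottom = [(0, 100), (50, 100), (100, 100)]  # lower edge points of a commodity picture
text_top = [(0, 110), (50, 110), (100, 110)]      # upper edge points of the text below it
print(collide("commodity", phone_bottom, "character", text_top, preset=15))  # True
print(collide("commodity", phone_bottom, "commodity", text_top, preset=15))  # False
```

Requiring at least two close edge points follows the note above and guards against two objects that merely touch at a single corner.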
S4: continuing to perform collision calculation between each of the two combined objects and other objects of different types; if the distance between the edge points is greater than a preset multiple of the distance between the edge points of the currently combined objects, judging that the other object does not belong to the same combination, and continuing to search for other objects of different types for collision calculation until all the objects are combined; and outputting the combined objects.
In a specific embodiment, as shown in fig. 5, after a combined object is obtained, collision calculation continues between each of the two combined objects and other objects of different types. If, in this calculation, the distance between the edge points of the two objects is greater than a preset multiple (for example, 2 or more) of the distance between the edge points of the currently combined objects, the other object is judged not to belong to the same combination, and the search continues with other objects of different types until all the objects are combined. The combined objects are then output, that is, the recognized text information is output. It should be noted that if no valid data is identified while searching for other objects of different types, the current process ends and the combined objects are output.
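The preset-multiple test in S4 can be illustrated with the sketch below. It is one possible reading under stated assumptions: the reference distance is taken as the gap between the first two combined objects and kept fixed, distances are Euclidean, and the names `Obj`, `gap` and `extend_combination` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Obj:
    name: str
    kind: str            # "commodity" or "character"
    points: List[Point]  # edge points (only a few are used here for brevity)


def gap(a: Obj, b: Obj) -> float:
    """Smallest distance between any edge point of a and any of b (Euclidean, assumed)."""
    return min(math.dist(p, q) for p in a.points for q in b.points)


def extend_combination(first: Obj, second: Obj, others: List[Obj],
                       multiple: float = 2.0) -> List[Obj]:
    """Extend the combination (first, second) with further objects.

    A candidate is rejected when its gap to every member of a different
    type exceeds `multiple` times the gap of the initial combination.
    """
    combo = [first, second]
    limit = multiple * gap(first, second)
    for cand in others:
        gaps = [gap(cand, m) for m in combo if m.kind != cand.kind]
        if gaps and min(gaps) <= limit:
            combo.append(cand)
    return combo


phone = Obj("phone 1", "commodity", [(0, 100), (50, 100), (100, 100)])
name = Obj("name 1", "character", [(0, 110), (50, 110), (100, 110)])
price = Obj("price 1", "character", [(0, 118), (50, 118), (100, 118)])
far_text = Obj("name 2", "character", [(200, 110), (250, 110), (300, 110)])

combo = extend_combination(phone, name, [price, far_text])
print([o.name for o in combo])  # ['phone 1', 'name 1', 'price 1']
```

With the initial gap of 10 and a multiple of 2, the limit is 20: the price text at distance 18 joins the combination, while the text under the neighbouring phone, roughly 100 away, is left for another combination.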
Example 2
A second aspect of the invention provides a system for extracting information in batches based on picture information clustering. The system comprises a memory and a processor, wherein the memory stores a program of the information batch extraction method based on picture information clustering, and when the program is executed by the processor, the following steps are implemented:
S1: extracting commodity objects and character objects from the images to be recognized by using an OCR recognition method, classifying and numbering the commodity objects and the character objects, taking the objects in each image as independent objects, and determining a coordinate system of each object;
In a specific embodiment, consider a detail picture of a product promotion advertisement that contains several mobile phone images and the corresponding commodity prices, with each image placed above its text and the image and text pairs arranged from left to right. As shown in fig. 2, the commodity name and price are placed below each mobile phone image. They need to be recognized as a combination, that is, the text name and price correspond to the mobile phone image.
First, commodity objects and character objects are extracted from the image to be recognized. Using an OCR recognition method, the image can be scanned from left to right and from top to bottom, and the commodity objects and character objects are extracted and numbered separately, for example commodity object 001, character object 001, and so on. The objects in each image are taken as independent objects, and a coordinate system of each object is determined;
S2: dotting the edges of all the commodity objects and the character objects in each image, marking the dotted points as edge points, and determining the coordinates of the edge points according to the coordinate system of each object;
In a specific embodiment, after the classified objects are obtained and the coordinate system of each object is determined, points are marked on the edge of each object; for example, eight points may be marked, as follows: determine the object to be dotted; first take 4 points, one at the outermost position of each of the upper left, upper right, lower left and lower right corners of the object, and connect the four points to form an irregular rectangle;
then take the midpoints between the upper left and upper right corners, between the lower left and lower right corners, between the upper left and lower left corners, and between the upper right and lower right corners, which gives 4 further points: the upper, lower, left and right points.
It should be noted that the coordinate position of each marked point is determined according to the coordinate system of the object and is used in the subsequent calculation of the distance between objects.
S3: performing collision calculation on adjacent different types of objects by using edge points, and taking the current two objects as combined objects if the distance between the edge points of the two adjacent different types of objects is smaller than a preset value;
In a specific embodiment, the collision calculation process is as follows: the adjacent edge points of the two objects are denoted P1 and P2, with coordinates (x1, y1) and (x2, y2) respectively, and the distance between the two objects on the x axis is |x2 - x1|. It should be noted that collision calculation is performed only between objects of different types. For example, commodity object 1 and character object 1 are compared point to point, but no collision calculation is performed between commodity object 1 and commodity object 2. As shown in fig. 3, the three edge points of the commodity object (the commodity picture), namely lower left edge point 1, lower edge point 2 and lower right edge point 3, are compared with the three edge points of the character object, namely upper left edge point 4, upper edge point 5 and upper right edge point 6. When the distance between commodity picture 1 and the character object is smaller than the preset value, the two are set as a combined object.
It should be noted that, in the above processing, the collision calculation is performed only when at least 2 edge points are close to each other. There is no fixed pairing between edge points; the edge points serve only for calculation, and the edge points closest to each other are used. As shown in fig. 4, edge points 2 and 3 of the commodity object (the commodity picture) are matched in the calculation with edge points 6 and 7 of the character object below.
S4: continuing to perform collision calculation between each of the two combined objects and other objects of different types; if the distance between the edge points is greater than a preset multiple of the distance between the edge points of the currently combined objects, judging that the other object does not belong to the same combination, and continuing to search for other objects of different types for collision calculation until all the objects are combined; and outputting the combined objects.
In a specific embodiment, after a combined object is obtained, collision calculation continues between each of the two combined objects and other objects of different types. If, in this calculation, the distance between the edge points of the two objects is greater than a preset multiple (for example, 2 or more) of the distance between the edge points of the currently combined objects, the other object is judged not to belong to the same combination, and the search continues with other objects of different types until all the objects are combined. The combined objects are then output, that is, the recognized text information is output. It should be noted that if no valid data is identified while searching for other objects of different types, the current process ends and the combined objects are output.
Example 3
The third aspect of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a program of a batch information extraction method based on picture information clustering, and when the program of the batch information extraction method based on picture information clustering is executed by a processor, the steps of the batch information extraction method based on picture information clustering are implemented.
It should be understood that the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (8)

1. An information batch extraction method based on picture information clustering is characterized by comprising the following steps:
S1: extracting commodity objects and character objects from the images to be recognized by using an OCR recognition method, classifying and numbering the commodity objects and the character objects, taking the objects in each image as independent objects, and determining a coordinate system of each object;
S2: dotting the edges of all the commodity objects and the character objects in each image, marking the dotted points as edge points, and determining the coordinates of the edge points according to the coordinate system of each object;
the specific process of dotting the edges of all the commodity objects and the character objects in each image is as follows: determining the object to be dotted; first taking 4 points, one at the outermost position of each of the upper left, upper right, lower left and lower right corners of the object, and connecting the four points to form an irregular rectangle;
then taking the midpoints between the upper left and upper right corners, between the lower left and lower right corners, between the upper left and lower left corners, and between the upper right and lower right corners, thereby determining 4 further points, namely the upper, lower, left and right points;
S3: performing collision calculation on adjacent objects of different types by using the edge points, and taking the two objects as a combined object if the distance between the edge points of the two adjacent objects of different types is smaller than a preset value;
the collision calculation process comprises the following steps:
the adjacent edge points of the two objects are denoted P1 and P2, the coordinates of point P1 are denoted (x1, y1) and the coordinates of point P2 are denoted (x2, y2), and the distance between the two objects on the x axis is then |x2 - x1|;
S4: continuing to perform collision calculation between each of the two combined objects and other objects of different types; if the distance between the edge points is greater than a preset multiple of the distance between the edge points of the currently combined objects, judging that the other object of a different type subjected to the collision calculation does not belong to the same combination, and continuing to search for other objects of different types for collision calculation until all the objects are combined; and outputting the combined objects.
2. The information batch extraction method based on picture information clustering of claim 1, wherein in step S1, the OCR recognition method extracts the commodity objects and character objects from the image to be recognized in order from left to right and from top to bottom.
3. The information batch extraction method based on picture information clustering of claim 1, wherein the preset multiple in step S4 is greater than or equal to 2.
4. The information batch extraction method based on picture information clustering of claim 1, wherein in step S4, when continuing to search for other objects of different types for collision calculation, if no valid data is identified, the current process is likewise ended and the combined objects are output.
5. The information batch extraction method based on picture information clustering of claim 1, characterized in that the collision calculation is performed only between objects of different types.
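The edge-point construction of step S2 and the collision test of step S3 can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the patented implementation: the function names (`edge_points`, `adjacent_points`, `x_distance`, `collides`) are hypothetical, corner coordinates are assumed to be already known, and the "adjacent" pair P1, P2 is assumed to be the closest pair of edge points, a rule the claims do not spell out. Only the x-axis distance |x2 - x1| stated in the claim is compared with the preset value.

```python
from itertools import product
from math import hypot


def edge_points(tl, tr, bl, br):
    """Return the 8 edge points of an object: 4 corners plus 4 side midpoints."""
    def mid(a, b):
        return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

    return {
        "top_left": tl, "top_right": tr,
        "bottom_left": bl, "bottom_right": br,
        "top": mid(tl, tr), "bottom": mid(bl, br),
        "left": mid(tl, bl), "right": mid(tr, br),
    }


def adjacent_points(points_a, points_b):
    """Assumed rule: P1 and P2 are the closest pair of edge points."""
    return min(
        product(points_a.values(), points_b.values()),
        key=lambda pair: hypot(pair[1][0] - pair[0][0], pair[1][1] - pair[0][1]),
    )


def x_distance(p1, p2):
    """Distance along the x axis, |x2 - x1|, as stated in the claim."""
    return abs(p2[0] - p1[0])


def collides(kind_a, points_a, kind_b, points_b, preset):
    """Step S3: objects of different types collide if the gap is below preset."""
    if kind_a == kind_b:  # collision is only computed between different types
        return False
    p1, p2 = adjacent_points(points_a, points_b)
    return x_distance(p1, p2) < preset


# A commodity box with a text box 10 units to its right (image coordinates).
commodity = edge_points((0, 0), (100, 0), (0, 80), (100, 80))
text = edge_points((110, 30), (160, 30), (110, 50), (160, 50))
print(collides("commodity", commodity, "text", text, preset=20))  # True
```

Because only the x-axis gap is compared, this sketch suits objects laid out side by side; a y-axis analogue would be an additional assumption.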
6. An information batch extraction system based on picture information clustering, characterized in that the system comprises a memory and a processor, wherein the memory stores a program of the information batch extraction method based on picture information clustering, and when the program is executed by the processor, the following steps are implemented:
S1: extracting commodity objects and character objects from the images to be recognized by using an OCR recognition method, classifying and numbering the commodity objects and the character objects, taking the objects in each image as independent objects, and determining a coordinate system for each object;
S2: marking points on the edges of all the commodity objects and character objects in each image, recording the marked points as edge points, and determining the coordinates of the edge points according to the coordinate system of each object;
the specific process of marking points on the edges of all the commodity objects and character objects in each image is as follows: determining the object to be marked, first taking 4 points at the farthest extents of its upper-left, upper-right, lower-left and lower-right corners, and connecting the four points to form an irregular rectangle;
then taking the midpoint between each pair of adjacent corner points, namely between the upper-left and upper-right corners, between the lower-left and lower-right corners, between the upper-left and lower-left corners, and between the upper-right and lower-right corners, thereby determining 4 further points: the upper, lower, left and right points;
S3: performing collision calculation on adjacent objects of different types by using the edge points, and, if the distance between the edge points of two adjacent objects of different types is smaller than a preset value, taking the two objects as a combined object;
the collision calculation process is as follows:
the adjacent edge points of the two objects are denoted P1 and P2, the coordinates of point P1 are denoted (x1, y1) and the coordinates of point P2 are denoted (x2, y2), and the distance between the two objects along the x axis is then |x2 - x1|;
S4: continuing to perform collision calculation between each of the two combined objects and other objects of different types; if the distance between the edge points is greater than a preset multiple of the distance between the edge points of the current combined objects, judging that the other object subjected to the collision calculation does not belong to the same combination, and continuing to search for other objects of different types for collision calculation until all objects have been combined, and then outputting the combined objects.
7. The system of claim 6, wherein in step S1, the image to be recognized is scanned from top to bottom, and the commodity objects and character objects are extracted from it by using an OCR recognition method.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program of the information batch extraction method based on picture information clustering, and when the program is executed by a processor, the steps of the information batch extraction method based on picture information clustering according to any one of claims 1 to 5 are implemented.
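The combination logic of steps S3 and S4 can be sketched in Python as follows. This is one possible reading of the claims, not the patented implementation. The names `Obj`, `x_gap` and `group_objects` are hypothetical, each object is reduced to the x coordinates of its left and right edge points, and the seed object pairs with its nearest partner of a different type (a tie-breaking rule the claims do not state). The default multiple of 2 follows the lower bound given in claim 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    kind: str     # e.g. "commodity" or "text"
    left: float   # x coordinate of the left edge point
    right: float  # x coordinate of the right edge point


def x_gap(a, b):
    """|x2 - x1| between the facing edge points of a and b (0 if they overlap)."""
    return max(b.left - a.right, a.left - b.right, 0)


def group_objects(objects, gap, preset, multiple=2.0):
    """Combine objects of different types following steps S3 and S4."""
    remaining = list(objects)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        # S3: candidate partners are of a different type and closer than preset.
        partners = [o for o in remaining
                    if o.kind != seed.kind and gap(seed, o) < preset]
        if partners:
            partner = min(partners, key=lambda o: gap(seed, o))
            remaining.remove(partner)
            group.append(partner)
            base = gap(seed, partner)
            # S4: test every other object against both combined objects.
            for other in list(remaining):
                dists = [gap(m, other) for m in (seed, partner)
                         if m.kind != other.kind]
                if dists and min(dists) <= multiple * base:
                    remaining.remove(other)
                    group.append(other)
        # With no valid partner the object is output on its own.
        groups.append(group)
    return groups


objs = [
    Obj("A", "commodity", 60, 160), Obj("t1", "text", 170, 210),
    Obj("t2", "text", 0, 45), Obj("B", "commodity", 400, 500),
    Obj("t3", "text", 508, 560),
]
for g in group_objects(objs, x_gap, preset=20):
    print([o.name for o in g])  # ['A', 't1', 't2'] then ['B', 't3']
```

In this example A and t1 are combined with a gap of 10, t2 joins because its gap of 15 to A does not exceed 2 x 10, and B is rejected because its gap of 190 to t1 does.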
CN202210140562.0A 2022-02-16 2022-02-16 Information batch extraction method based on picture information clustering Active CN114202761B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210140562.0A CN114202761B (en) 2022-02-16 2022-02-16 Information batch extraction method based on picture information clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210140562.0A CN114202761B (en) 2022-02-16 2022-02-16 Information batch extraction method based on picture information clustering

Publications (2)

Publication Number Publication Date
CN114202761A CN114202761A (en) 2022-03-18
CN114202761B true CN114202761B (en) 2022-06-21

Family

ID=80659023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210140562.0A Active CN114202761B (en) 2022-02-16 2022-02-16 Information batch extraction method based on picture information clustering

Country Status (1)

Country Link
CN (1) CN114202761B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2364417B (en) * 2000-06-30 2004-10-06 Post Office Image processing
US20150310601A1 (en) * 2014-03-07 2015-10-29 Digimarc Corporation Methods and arrangements for identifying objects
CN108038426A (en) * 2017-11-29 2018-05-15 阿博茨德(北京)科技有限公司 The method and device of chart-information in a kind of extraction document
RU2673015C1 (en) * 2017-12-22 2018-11-21 Общество с ограниченной ответственностью "Аби Продакшн" Methods and systems of optical recognition of image series characters
CN108647553B (en) * 2018-05-10 2022-01-25 上海扩博智能技术有限公司 Method, system, device and storage medium for rapidly expanding images for model training
US20200004815A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Text entity detection and recognition from images
CN111310706B (en) * 2020-02-28 2022-10-21 创新奇智(上海)科技有限公司 Commodity price tag identification method and device, electronic equipment and storage medium
CN113705559B (en) * 2021-08-31 2024-05-10 平安银行股份有限公司 Character recognition method and device based on artificial intelligence and electronic equipment

Also Published As

Publication number Publication date
CN114202761A (en) 2022-03-18

Similar Documents

Publication Publication Date Title
TWI631514B (en) Method and system for marking recognition based on mobile terminal
JP5492205B2 (en) Segment print pages into articles
CN110363102A (en) A kind of identification of objects process method and device of pdf document
CN110807454B (en) Text positioning method, device, equipment and storage medium based on image segmentation
CN111461133B (en) Express delivery surface single item name identification method, device, equipment and storage medium
CN113963147B (en) Key information extraction method and system based on semantic segmentation
CN112861861B (en) Method and device for recognizing nixie tube text and electronic equipment
CN115761773A (en) Deep learning-based in-image table identification method and system
CN112883926A (en) Identification method and device for table medical images
CN115713775A (en) Method, system and computer equipment for extracting form from document
CN114581928A (en) Form identification method and system
Karanje et al. Survey on text detection, segmentation and recognition from a natural scene images
Xu et al. Tolerance Information Extraction for Mechanical Engineering Drawings–A Digital Image Processing and Deep Learning-based Model
CN114202761B (en) Information batch extraction method based on picture information clustering
CN110414497A (en) Method, device, server and storage medium for electronizing object
JP2023156991A (en) information processing system
JPH07168910A (en) Document layout analysis device and document format identification device
CN114387600A (en) Text feature recognition method and device, computer equipment and storage medium
CN112348022A (en) Free-form document identification method based on deep learning
Shekar Skeleton matching based approach for text localization in scene images
JP7343115B1 (en) information processing system
US11899704B2 (en) Systems and methods for extracting, digitizing, and using engineering drawing data
Kataria et al. Review on text detection and recognition in images
CN117711004A (en) Form document information extraction method based on image recognition
Aghav et al. Computer Assisted Printed Character recognition in document based images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 519031 room 1016, No. 3000, Huandao East Road, Hengqin New District, Zhuhai City, Guangdong Province

Applicant after: Guangdong Shuyuan Zhihui Technology Co.,Ltd.

Address before: 510520 unit 151, first floor, No. 136, Gaopu Road, Tianhe District, Guangzhou, Guangdong

Applicant before: Guangdong Shuyuan Zhihui Technology Co.,Ltd.

GR01 Patent grant