CN116403232B - A book information extraction method based on pixel value fluctuation
- Publication number
- CN116403232B (application CN202310394804.3A)
- Authority
- CN
- China
- Prior art keywords
- book
- region
- feature
- information
- fluctuation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/147—Determination of region of interest
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/18086—Extraction of features or characteristics of the image by performing operations within image blocks or by using histograms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/19007—Matching; Proximity measures
- G06V30/19093—Proximity measures, i.e. similarity or distance measures
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
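The invention discloses a book information extraction method based on pixel value fluctuation, belonging to the technical field of image processing. The method comprises: acquiring an image of a book cover; preprocessing it; performing ROI detection to obtain a book basic information array; checking the probability P that each region contains basic book information and retaining only the regions likely to contain a target; extracting the original features of the regions containing the title and the author; computing the feature changes and constructing fluctuation features; and recognizing the content of the target regions. This solves the technical problem of obtaining stable features during book information extraction. The invention yields more reliable fluctuation features and can remove abnormal feature data, so that basic information such as the title and author can be extracted even when the book cover image is noisy, partially occluded, distorted, or small.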
Description
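Technical Field
The invention belongs to the technical field of image processing, and in particular relates to a book information extraction method based on pixel value fluctuation.
Background
Book information recognition extracts the title and author information from an image based on image features (such as statistical or geometric features). The common approach at present is: first detect the position coordinates of the targets (such as the title and author information) in the image, then extract the image features of the targets, and finally find the most similar content in a database.
Book information recognition is a form of OCR. The target features of book information are generally divided into visual features, pixel statistical features, image transform coefficient features, image algebraic features, and so on. Feature extraction operates on certain of these features and is the process of building a feature model of the target. The feature extraction process comprises: acquiring the original image, target detection (locating the position and size of the book in the image), image preprocessing (image rectification, noise filtering, etc.), and feature extraction (identifying key points and generating feature vectors). At present, feature extraction methods fall into two broad classes: traditional feature extraction algorithms (SIFT, LBP, HOG, etc.) and deep-learning-based methods.
Traditional OCR algorithms have reached a high level of accuracy; in specific scenarios they can extract high-quality feature data and recognize the target content fairly accurately. Book information recognition, however, is more complex than traditional OCR, mainly in the following two respects:
1. Traditional OCR mainly recognizes English letters and digits, whereas book information also requires recognizing Chinese characters and special symbols.
2. Book covers vary widely and the images contain much interfering information, so it is difficult to extract high-quality feature data.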
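Summary of the Invention
The object of the invention is to provide a book information extraction method based on pixel value fluctuation, which solves the technical problem of obtaining stable features during book information extraction.
To achieve this object, the invention adopts the following technical solution:
A book information extraction method based on pixel value fluctuation, comprising the following steps:
Step 1: Set up an image processing server, and establish in it a preprocessing module, an ROI module, a region identification module, a feature extraction module, a feature construction module, and a feature recognition module; the image processing server acquires the image of the book cover over the Internet;
Step 2: The preprocessing module preprocesses the image of the book cover to obtain a preprocessed image;
Step 3: The ROI module performs ROI detection on the preprocessed image to obtain a book basic information array; the book basic information array is a two-dimensional array whose contents include title information and author information;
Step 4: The region identification module traverses the book basic information array and checks, for each region, the probability P that the region contains basic book information;
Step 5: Select any region A and evaluate its probability P: if P reaches the preset threshold, mark region A as a region to be processed, that is, a region containing title information and author information; otherwise, do nothing with region A;
Repeat step 5 until every region has been evaluated, then proceed to step 6;
Step 6: The feature extraction module crops the regions to be processed out of the preprocessed image and extracts feature vectors; each feature vector is represented as a feature vector array, which contains the HOG features of the given region of the preprocessed image, extracted according to the changes in image pixel values;
The position information and the feature vector of each region to be processed are represented as a position array and a feature vector array, respectively;
Step 7: The feature construction module computes the changes in the features of the regions to be processed and constructs fluctuation features, specifically as follows:
Step 7-1: For any region to be processed B, compute the pairwise distances between the elements of the region's feature vector to obtain a distance matrix;
Step 7-2: Convert the distance matrix into a vector by the following formula, and use that vector as the fluctuation feature of region B:
where d(xi, xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n;
Step 8: The feature recognition module takes the fluctuation features and performs recognition: it computes similarity using the Euclidean distance, finds the title information and author information with the highest similarity in the feature library, and outputs them as the final result.
Preferably, in step 2, the preprocessing applied to the book cover image comprises resizing to a uniform size, noise reduction, grayscale conversion, and binarization.
Preferably, in step 3, the book basic information array is represented as [X0, Y0, X1, Y1, P], where X0 and Y0 are the X and Y coordinates of the top-left corner of a region, X1 and Y1 are the X and Y coordinates of the bottom-right corner, and P is the probability that the region contains basic book information.
Preferably, in step 6, the feature vector array is represented as [F1, F2, F3, F4, F5, F6, ..., Fn], where Fi is one feature value in the feature vector array and i ranges from 1 to n; the position array is represented as [X0, Y0, X1, Y1], where X0, Y0, X1, and Y1 are the position coordinates of the region to be processed.
Preferably, in step 7-1: let the feature vector of region B be [x1, x2, x3, x4, x5, x6, ..., xn], with n dimensions in total; the pairwise distances between the elements of the region's feature vector are computed by the following formula:
where d(xi, xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n, and d denotes the Euclidean distance metric.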
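The distance formula itself is not reproduced in this text. As a hedged reconstruction that uses only what the surrounding definitions state (d is the Euclidean distance, and i and j each run from 1 to n), it would take the standard form below. The symbol m, the number of components in each element x_i, is an assumption and does not appear in the original; for scalar elements m = 1 and the distance reduces to |x_i - x_j|.

```latex
d(x_i, x_j) = \lVert x_i - x_j \rVert_2
            = \sqrt{\sum_{k=1}^{m} \bigl(x_{i,k} - x_{j,k}\bigr)^2},
\qquad i, j = 1, \dots, n
```

Under this form the distance matrix is symmetric with a zero diagonal, which agrees with the matrices listed in the embodiment.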
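Preferably, in step 8, the similarity is computed with the Euclidean distance using the following formula:
where Di is the similarity value, representing the similarity between the feature vector of the region to be processed and the i-th feature vector in the feature library; fnn denotes a fluctuation feature value of the region to be processed (the original feature vector has n feature values, so the region has n×n fluctuation feature values); and Fnn denotes a fluctuation feature value in the feature library (again n×n fluctuation feature values for an original feature vector of n feature values).
The book information extraction method based on pixel value fluctuation described in the invention solves the technical problem of obtaining stable features during book information extraction. The invention yields more reliable fluctuation features and can remove abnormal feature data, so that basic information such as the title and author can be extracted even when the book cover image is noisy, partially occluded, distorted, or small.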
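Brief Description of the Drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 is a schematic diagram of the original book cover;
Fig. 3 is a schematic diagram of the preprocessed book cover;
Fig. 4 is a schematic diagram after ROI processing;
Fig. 5 is a schematic diagram of the ROI regions after filtering.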
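Detailed Description
As shown in Figs. 1 to 5, a book information extraction method based on pixel value fluctuation comprises the following steps:
Step 1: Set up an image processing server, and establish in it a preprocessing module, an ROI module, a region identification module, a feature extraction module, a feature construction module, and a feature recognition module; the image processing server acquires the image of the book cover over the Internet;
Step 2: The preprocessing module preprocesses the image of the book cover to obtain a preprocessed image; the preprocessing comprises resizing to a uniform size, noise reduction, grayscale conversion, and binarization.
In this embodiment, the book cover image is first converted to a grayscale image, then denoised with a Gaussian blur (Gaussian kernel size 5), and then binarized (minimum gray value parameter 80, maximum gray value parameter 255): pixels with gray values between 80 and 255 are set to pure white and all others to pure black.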
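The preprocessing chain of this embodiment can be sketched in dependency-free Python as below. This is an illustration under stated assumptions, not the patented implementation: the luminance weights, the binomial 5-tap kernel standing in for the Gaussian kernel of size 5, the border handling, and all function names are assumptions, and the resizing step is omitted.

```python
# Simplified sketch of grayscale -> blur -> binarize on nested lists.
# Weights, kernel values, and names are illustrative assumptions.

def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def gaussian_blur_5(gray):
    """Separable 5-tap blur with binomial weights (approximates a Gaussian)."""
    kernel = [1, 4, 6, 4, 1]  # weights sum to 16

    def blur_line(line):
        n = len(line)
        out = []
        for i in range(n):
            acc = 0
            for k, w in enumerate(kernel):
                j = min(max(i + k - 2, 0), n - 1)  # replicate border pixels
                acc += w * line[j]
            out.append(acc / 16)
        return out

    rows = [blur_line(row) for row in gray]                 # horizontal pass
    cols = [blur_line(list(col)) for col in zip(*rows)]     # vertical pass
    return [list(row) for row in zip(*cols)]                # transpose back

def binarize(gray, low=80, high=255):
    """Gray values in [low, high] become white (255), all others black (0)."""
    return [[255 if low <= v <= high else 0 for v in row] for row in gray]

def preprocess(rgb_image):
    return binarize(gaussian_blur_5(to_gray(rgb_image)))
```

With the thresholds of the embodiment, a uniformly light patch maps to white and a uniformly dark patch maps to black.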
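Step 3: The ROI module performs ROI detection on the preprocessed image to obtain a book basic information array; the book basic information array is a two-dimensional array whose contents include title information and author information;
The book basic information array is represented as [X0, Y0, X1, Y1, P], where X0 and Y0 are the X and Y coordinates of the top-left corner of a region, X1 and Y1 are the X and Y coordinates of the bottom-right corner, and P is the probability that the region contains basic book information.
In this embodiment, ROI detection is performed based on changes in gray value to obtain the regions that may contain the title and author; the gray-value change threshold parameter is 0.05 and the minimum ROI pixel area parameter is 100.
Four ROI regions are obtained, each in the format (x, y, w, h, p), where x and y are the top-left corner coordinates of the rectangle, w is its width, h is its height, and p is the probability that the region contains a target. The four ROI regions are: [162, 549, 244, 61, 0.5], [249, 139, 81, 343, 0.9], [366, 91, 47, 109, 0.95], [11, 67, 92, 362, 0.45].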
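The embodiment lists regions as (x, y, w, h, p), whereas the array format given for step 3 is [X0, Y0, X1, Y1, P]. A minimal sketch of the conversion between the two, assuming X1 = x + w and Y1 = y + h (the function name is illustrative, not from the source):

```python
def to_corner_format(roi):
    """Convert (x, y, w, h, p) to [X0, Y0, X1, Y1, P]."""
    x, y, w, h, p = roi
    return [x, y, x + w, y + h, p]

# The four ROI regions listed in the embodiment
rois = [
    [162, 549, 244, 61, 0.5],
    [249, 139, 81, 343, 0.9],
    [366, 91, 47, 109, 0.95],
    [11, 67, 92, 362, 0.45],
]
corner_rois = [to_corner_format(roi) for roi in rois]
```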
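Step 4: The region identification module traverses the book basic information array and checks, for each region, the probability P that the region contains basic book information;
Step 5: Select any region A and evaluate its probability P: if P reaches the preset threshold, mark region A as a region to be processed, that is, a region containing title information and author information; otherwise, do nothing with region A;
Repeat step 5 until every region has been evaluated, then proceed to step 6;
In this embodiment, after step 5 only two regions that may contain a target remain: [[249, 139, 81, 343, 0.9], [366, 91, 47, 109, 0.95]].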
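Steps 4 and 5 amount to a threshold filter over the region list. The sketch below reproduces the outcome of this embodiment; the threshold value 0.8 is an assumption chosen only because it separates the retained regions (0.9, 0.95) from the discarded ones (0.5, 0.45), since the source does not state the preset threshold.

```python
# ROI regions from the embodiment, in (x, y, w, h, p) format
ROIS = [
    [162, 549, 244, 61, 0.5],
    [249, 139, 81, 343, 0.9],
    [366, 91, 47, 109, 0.95],
    [11, 67, 92, 362, 0.45],
]

def filter_regions(rois, threshold):
    """Keep the regions whose probability p reaches the threshold."""
    return [roi for roi in rois if roi[4] >= threshold]

# 0.8 is a hypothetical threshold, not a value given in the source
kept = filter_regions(ROIS, threshold=0.8)
```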
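Step 6: The feature extraction module crops the regions to be processed out of the preprocessed image and extracts feature vectors; each feature vector is represented as a feature vector array, which contains the HOG features of the given region of the preprocessed image, extracted according to the changes in image pixel values;
The position information and the feature vector of each region to be processed are represented as a position array and a feature vector array, respectively;
The feature vector array is represented as [F1, F2, F3, F4, F5, F6, ..., Fn], where Fi is one feature value in the feature vector array and i ranges from 1 to n; the position array is represented as [X0, Y0, X1, Y1], where X0, Y0, X1, and Y1 are the position coordinates of the region to be processed.
In this embodiment, the ROI regions obtained in the preceding steps are cropped from the binarized version of the original image, and the gradient features of the image (that is, HOG features) are extracted with the bin parameter set to 9, giving the features corresponding to each ROI region.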
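The core of a HOG descriptor is a histogram of gradient orientations weighted by gradient magnitude. The sketch below shows only that building block for a single cell with 9 unsigned orientation bins; a full HOG pipeline also tiles the image into cells, groups them into blocks, and normalizes, none of which is shown. The function name, the central-difference gradients, and the hard (non-interpolated) binning are assumptions, not details taken from the source.

```python
import math

def orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    height, width = len(cell), len(cell[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixel, no orientation to record
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A vertical black-to-white edge: all gradient energy lands in the first bin
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
hist = orientation_histogram(vertical_edge)
```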
The features of region [249, 139, 81, 343, 0.9] are as follows:
[0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,3.264466,20.734375,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,3.264466,20.734375,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,3.264466,20.734375,0.000000,0.000000,0.000000,
0.000000,5.703125,5.703125,35.395454,5.703125,5.703125,5.703125,5.703125,0.000000,5.156250,5.156250,25.890625,8.420716,5.156250,5.156250,5.156250,
0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,20.734375,3.264466,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,20.734375,3.264466,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,4.043978,25.421875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,4.043978,25.421875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,4.043978,25.421875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,
0.000000,6.781250,6.781250,32.203125,10.825228,6.781250,6.781250,6.781250,0.000000,6.140625,6.140625,29.976255,6.140625,6.140625,6.140625,6.140625,
0.000000,0.000000,0.000000,25.421875,4.043978,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,25.421875,4.043978,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000].
The features of region [366, 91, 47, 109, 0.95] are as follows:
[0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,2.921875,0.000000,45.877567,0.000000,0.000000,0.000000,0.000000,0.000000,4.009815,6.814570,40.516960,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,2.921875,45.877567,0.000000,3.960375,0.000000,0.000000,0.000000,0.000000,10.824385,40.516960,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,2.921875,49.837942,3.960375,0.000000,0.000000,0.000000,0.000000,0.000000,10.824385,40.516960,0.000000,0.000000,0.000000,
0.000000,18.532661,22.493037,68.370604,21.454536,18.532661,18.532661,18.532661,0.000000,18.534107,18.534107,59.051067,29.358492,18.534107,18.534107,18.534107,
0.000000,3.960375,0.000000,45.877567,2.921875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,40.516960,10.824385,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,45.877567,0.000000,2.921875,0.000000,0.000000,0.000000,0.000000,0.000000,40.516960,6.814570,4.009815,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,6.336285,38.282017,2.971216,0.000000,0.000000,0.000000,0.000000,0.000000,10.936128,36.392204,0.000000,0.000000,0.000000,
0.000000,3.858616,0.000000,6.336285,41.253233,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,10.936128,36.392204,0.000000,3.716517,0.000000,
0.000000,0.000000,3.858616,10.194901,41.253233,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,10.936128,40.108721,3.716517,0.000000,0.000000,
0.000000,16.073910,16.073910,57.327142,26.268810,19.932526,16.073910,16.073910,0.000000,16.751635,20.468152,56.860356,27.687762,16.751635,16.751635,16.751635,
0.000000,0.000000,0.000000,41.253233,6.336285,0.000000,3.858616,0.000000,0.000000,3.716517,0.000000,36.392204,10.936128,0.000000,0.000000,0.000000,
0.000000,0.000000,2.971216,38.282017,6.336285,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,36.392204,10.936128,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000].
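Step 7: The feature construction module computes the changes in the features of the regions to be processed and constructs fluctuation features, specifically as follows:
Step 7-1: For any region to be processed B, compute the pairwise distances between the elements of the region's feature vector to obtain a distance matrix;
Let the feature vector of region B be [x1, x2, x3, x4, x5, x6, ..., xn], with n dimensions in total; the pairwise distances between the elements of the region's feature vector are computed by the following formula:
where d(xi, xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n, and d denotes the Euclidean distance metric.
Step 7-2: Convert the distance matrix into a vector by the following formula, and use that vector as the fluctuation feature of region B:
where d(xi, xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n;
In this embodiment, the pairwise distances between feature vector elements are computed for each of the two regions remaining after step 5, giving one distance matrix per region; each distance matrix is then converted into a vector that serves as the fluctuation feature of that region.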
The fluctuation features of region [249, 139, 81, 343, 0.9] are as follows:
[[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]
[36.36214337 0. 0. 0. 52.79632687 48.72026569 48.72026569 36.36214337 36.36214337 7.40709554 7.40709554 7.40709554 52.32560685 46.44563228 46.44563228 36.36214337]
[36.36214337 0. 0. 0. 52.79632687 48.72026569 48.72026569 36.36214337 36.36214337 7.40709554 7.40709554 7.40709554 52.32560685 46.44563228 46.44563228 36.36214337]
[36.36214337 0. 0. 0. 52.79632687 48.72026569 48.72026569 36.36214337 36.36214337 7.40709554 7.40709554 7.40709554 52.32560685 46.44563228 46.44563228 36.36214337]
[48.18879012 52.79632687 52.79632687 52.79632687 0. 20.34177578 20.34177578 48.18879012 48.18879012 50.74693048 50.74693048 50.74693048 8.30510914 21.74670687 21.74670687 48.18879012]
[36.36214337 48.72026569 48.72026569 48.72026569 20.34177578 0. 0. 36.36214337 36.36214337 46.44563228 46.44563228 46.44563228 25.21185116 7.40709554 7.40709554 36.36214337]
[36.36214337 48.72026569 48.72026569 48.72026569 20.34177578 0. 0. 36.36214337 36.36214337 46.44563228 46.44563228 46.44563228 25.21185116 7.40709554 7.40709554 36.36214337]
[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]
[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]
[35.08222863 7.40709554 7.40709554 7.40709554 50.74693048 46.44563228 46.44563228 35.08222863 35.08222863 0. 0. 0. 51.34346974 45.28027713 45.28027713 35.08222863]
[35.08222863 7.40709554 7.40709554 7.40709554 50.74693048 46.44563228 46.44563228 35.08222863 35.08222863 0. 0. 0. 51.34346974 45.28027713 45.28027713 35.08222863]
[35.08222863 7.40709554 7.40709554 7.40709554 50.74693048 46.44563228 46.44563228 35.08222863 35.08222863 0. 0. 0. 51.34346974 45.28027713 45.28027713 35.08222863]
[50.08965087 52.32560685 52.32560685 52.32560685 8.30510914 25.21185116 25.21185116 50.08965087 50.08965087 51.34346974 51.34346974 51.34346974 0. 24.20430517 24.20430517 50.08965087]
[35.08222863 46.44563228 46.44563228 46.44563228 21.74670687 7.40709554 7.40709554 35.08222863 35.08222863 45.28027713 45.28027713 45.28027713 24.20430517 0. 0. 35.08222863]
[35.08222863 46.44563228 46.44563228 46.44563228 21.74670687 7.40709554 7.40709554 35.08222863 35.08222863 45.28027713 45.28027713 45.28027713 24.20430517 0. 0. 35.08222863]
[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]].
The fluctuation features of region [366, 91, 47, 109, 0.95] are as follows:
[[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]
[6.17853508e+01 9.53674316e-07 8.05707802e+00 8.97781025e+00 1.07079704e+02 7.75096340e+01 8.08111162e+01 6.17853508e+01 6.17853508e+01 1.28563538e+01 1.22004593e+01 1.31621792e+01 9.52828047e+01 7.11935512e+01 6.92457146e+01 6.17853508e+01]
[6.23519404e+01 8.05707802e+00 0.00000000e+00 6.85957084e+00 1.03906580e+02 7.40611249e+01 7.75096340e+01 6.23519404e+01 6.23519404e+01 1.05304023e+01 9.71878327e+00 1.09016743e+01 9.21352526e+01 6.76414660e+01 6.60840707e+01 6.23519404e+01]
[6.53210057e+01 8.97781025e+00 6.85957084e+00 0.00000000e+00 1.04908106e+02 7.64262788e+01 7.97726096e+01 6.53210057e+01 6.53210057e+01 1.27750594e+01 1.21147644e+01 1.30827853e+01 9.28942532e+01 7.02482188e+01 6.85272617e+01 6.53210057e+01]
[1.14372753e+02 1.07079704e+02 1.03906580e+02 1.04908106e+02 0.00000000e+00 7.07294492e+01 7.19930200e+01 1.14372753e+02 1.14372753e+02 9.99598966e+01 9.97497974e+01 9.75792446e+01 1.51814484e+01 7.18469022e+01 7.38863822e+01 1.14372753e+02]
[6.23519404e+01 7.75096340e+01 7.40611249e+01 7.64262788e+01 7.07294492e+01 0.00000000e+00 8.05707802e+00 6.23519404e+01 6.23519404e+01 6.60840707e+01 6.76414660e+01 6.72274476e+01 6.39392021e+01 9.71878327e+00 1.05304023e+01 6.23519404e+01]
[6.17853508e+01 8.08111162e+01 7.75096340e+01 7.97726096e+01 7.19930200e+01 8.05707802e+00 9.53674316e-07 6.17853508e+01 6.17853508e+01 6.92457146e+01 7.11935512e+01 7.05841369e+01 6.53447431e+01 1.22004593e+01 1.28563538e+01 6.17853508e+01]
[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]
[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]
[5.43919190e+01 1.28563538e+01 1.05304023e+01 1.27750594e+01 9.99598966e+01 6.60840707e+01 6.92457146e+01 5.43919190e+01 5.43919190e+01 0.00000000e+00 6.80864636e+00 8.66366446e+00 8.82709997e+01 5.97768875e+01 5.79201133e+01 5.43919190e+01]
[5.66980648e+01 1.22004593e+01 9.71878327e+00 1.21147644e+01 9.97497974e+01 6.76414660e+01 7.11935512e+01 5.66980648e+01 5.66980648e+01 6.80864636e+00 0.00000000e+00 9.27923757e+00 8.81007638e+01 6.15776992e+01 5.97768875e+01 5.66980648e+01]
[5.96889860e+01 1.31621792e+01 1.09016743e+01 1.30827853e+01 9.75792446e+01 6.72274476e+01 7.05841369e+01 5.96889860e+01 5.96889860e+01 8.66366446e+00 9.27923757e+00 0.00000000e+00 8.63807742e+01 6.11577318e+01 5.93441775e+01 5.96889860e+01]
[1.04634440e+02 9.52828047e+01 9.21352526e+01 9.28942532e+01 1.51814484e+01 6.39392021e+01 6.53447431e+01 1.04634440e+02 1.04634440e+02 8.82709997e+01 8.81007638e+01 8.63807742e+01 0.00000000e+00 6.40904101e+01 6.59173503e+01 1.04634440e+02]
[5.66980648e+01 7.11935512e+01 6.76414660e+01 7.02482188e+01 7.18469022e+01 9.71878327e+00 1.22004593e+01 5.66980648e+01 5.66980648e+01 5.97768875e+01 6.15776992e+01 6.11577318e+01 6.40904101e+01 0.00000000e+00 6.80864636e+00 5.66980648e+01]
[5.43919190e+01 6.92457146e+01 6.60840707e+01 6.85272617e+01 7.38863822e+01 1.05304023e+01 1.28563538e+01 5.43919190e+01 5.43919190e+01 5.79201133e+01 5.97768875e+01 5.93441775e+01 6.59173503e+01 6.80864636e+00 0.00000000e+00 5.43919190e+01]
[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]].
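The listed numbers are consistent with reading each 256-value feature vector as 16 consecutive blocks of 16 values and taking the Euclidean distance between every pair of blocks, which yields the 16×16 matrices above. That block split is inferred from the data rather than stated in the text, and the row-by-row flattening in `fluctuation_feature` is likewise an assumption, because the conversion formula is not reproduced. A minimal sketch under those assumptions (all function names are illustrative):

```python
import math

def split_blocks(features, block_size):
    """Cut a flat feature list into consecutive blocks of block_size values."""
    return [features[i:i + block_size] for i in range(0, len(features), block_size)]

def distance_matrix(blocks):
    """Pairwise Euclidean distances between blocks."""
    return [[math.dist(a, b) for b in blocks] for a in blocks]

def fluctuation_feature(features, block_size=16):
    """Distance matrix of the blocks, flattened row by row (assumed layout)."""
    matrix = distance_matrix(split_blocks(features, block_size))
    return [value for row in matrix for value in row]

# Blocks 0, 1 and 4 of the feature vector listed for region [249, 139, 81, 343, 0.9]
block0 = [0.0] * 16
block1 = [0.0, 0.0, 0.0, 0.0, 29.692329, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 3.264466, 20.734375, 0.0, 0.0, 0.0]
block4 = [0.0, 5.703125, 5.703125, 35.395454, 5.703125, 5.703125, 5.703125, 5.703125,
          0.0, 5.156250, 5.156250, 25.890625, 8.420716, 5.156250, 5.156250, 5.156250]

matrix = distance_matrix([block0, block1, block4])
```

Here `matrix[0][1]`, `matrix[0][2]`, and `matrix[1][2]` correspond to the entries 36.36214337, 48.18879012, and 52.79632687 in the first matrix above, up to the rounding of the printed feature values.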
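Step 8: The feature recognition module takes the fluctuation features and performs recognition: it computes similarity using the Euclidean distance, finds the title information and author information with the highest similarity in the feature library, and outputs them as the final result.
In this embodiment, a book cover may have multiple fluctuation feature vectors; assuming there are n fluctuation feature vectors, they are represented as:
[
[[f11,f12,f13,f14,f15,f16,...,f1n]],
[[f21,f22,f23,f24,f25,f26,...,f2n]],
[[f31,f32,f33,f34,f35,f36,...,f3n]],
...
[[fn1,fn2,fn3,fn4,fn5,fn6,...,fnn]],
]
The invention finds the most similar feature vector in the feature library, together with its label; the similarity is computed with the Euclidean distance using the following formula:
where Di is the similarity value, representing the similarity between the feature vector of the region to be processed and the i-th feature vector in the feature library; fnn denotes a fluctuation feature value of the region to be processed (the original feature vector has n feature values, so the region has n×n fluctuation feature values); and Fnn denotes a fluctuation feature value in the feature library (again n×n fluctuation feature values for an original feature vector of n feature values).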
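The similarity formula is not reproduced in this text. Assuming it is the ordinary Euclidean distance between the query's fluctuation values and those of library entry i, so that the smallest distance means the highest similarity, the lookup can be sketched as follows. The function name, the dictionary layout of the feature library, and the toy values are all hypothetical.

```python
import math

def best_match(query, library):
    """Return (label, distance) of the library entry closest to the query."""
    best_label, best_distance = None, math.inf
    for label, feature in library.items():
        distance = math.dist(query, feature)  # D_i for this library entry
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance

# Toy feature library with made-up 2x2 fluctuation features, flattened
library = {
    "title A": [0.0, 3.0, 3.0, 0.0],
    "title B": [0.0, 10.0, 10.0, 0.0],
}
label, distance = best_match([0.0, 4.0, 4.0, 0.0], library)
```

A linear scan is enough for illustration; a large feature library would normally call for an index, which the source does not discuss.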
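In this embodiment, the recognition result for region [249, 139, 81, 343, 0.9] is "图书书名测试" ("book title test"), and the recognition result for region [366, 91, 47, 109, 0.95] is "作者测试" ("author test").
The following compares the effect of the invention with that of the conventional technique:
Experimental method:
100 book cover images were selected at random, and the title and author information of each book was extracted with two methods (Method-A and Method-B).
Method-A: extracts the title and author information directly from the histogram of oriented gradients features of the image.
Method-B: extracts features with the method of the invention, then extracts the title and author information.
Experimental results:
Accuracy (acc), recall, and F1-Score are used as evaluation metrics. The results are shown in Table 1:
Method | Accuracy (acc) | Recall | F1-Score |
---|---|---|---|
Method-A | 0.89 | 0.76 | 0.56 |
Method-B | 0.95 | 0.87 | 0.73 |
Table 1
Conclusion:
The experiment shows that the proposed method improves on all three metrics: accuracy rises from 0.89 to 0.95, recall from 0.76 to 0.87, and F1-Score from 0.56 to 0.73. With the proposed method, title and author information can be extracted from book covers more accurately.
The book information extraction method based on pixel value fluctuation described in the invention solves the technical problem of obtaining stable features during book information extraction. The invention yields more reliable fluctuation features and can remove abnormal feature data, so that basic information such as the title and author can be extracted even when the book cover image is noisy, partially occluded, distorted, or small.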
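Claims (6)
1. A book information extraction method based on pixel value fluctuation, characterized in that it comprises the following steps:
Step 1: Set up an image processing server, and establish in it a preprocessing module, an ROI module, a region identification module, a feature extraction module, a feature construction module, and a feature recognition module; the image processing server acquires the image of the book cover over the Internet;
Step 2: The preprocessing module preprocesses the image of the book cover to obtain a preprocessed image;
Step 3: The ROI module performs ROI detection on the preprocessed image to obtain a book basic information array; the book basic information array is a two-dimensional array whose contents include title information and author information;
Step 4: The region identification module traverses the book basic information array and checks, for each region, the probability P that the region contains basic book information;
Step 5: Select any region A and evaluate its probability P: if P reaches the preset threshold, mark region A as a region to be processed, that is, a region containing title information and author information; otherwise, do nothing with region A;
Repeat step 5 until every region has been evaluated, then proceed to step 6;
Step 6: The feature extraction module crops the regions to be processed out of the preprocessed image and extracts feature vectors; each feature vector is represented as a feature vector array, which contains the HOG features of the given region of the preprocessed image, extracted according to the changes in image pixel values;
The position information and the feature vector of each region to be processed are represented as a position array and a feature vector array, respectively;
Step 7: The feature construction module computes the changes in the features of the regions to be processed and constructs fluctuation features, specifically as follows:
Step 7-1: For any region to be processed B, compute the pairwise distances between the elements of the region's feature vector to obtain a distance matrix;
Step 7-2: Convert the distance matrix into a vector by the following formula, and use that vector as the fluctuation feature of region B:
where d(xi, xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n;
Step 8: The feature recognition module takes the fluctuation features and performs recognition: it computes similarity using the Euclidean distance, finds the title information and author information with the highest similarity in the feature library, and outputs them as the final result.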
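2. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in step 2, the preprocessing applied to the book cover image comprises resizing to a uniform size, noise reduction, grayscale conversion, and binarization.
3. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in step 3, the book basic information array is represented as [X0, Y0, X1, Y1, P], where X0 and Y0 are the X and Y coordinates of the top-left corner of a region, X1 and Y1 are the X and Y coordinates of the bottom-right corner, and P is the probability that the region contains basic book information.
4. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in step 6, the feature vector array is represented as [F1, F2, F3, F4, F5, F6, ..., Fn], where Fi is one feature value in the feature vector array and i ranges from 1 to n; the position array is represented as [X0, Y0, X1, Y1], where X0, Y0, X1, and Y1 are the position coordinates of the region to be processed.
5. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in step 7-1, let the feature vector of region B be [x1, x2, x3, x4, x5, x6, ..., xn], with n dimensions in total; the pairwise distances between the elements of the region's feature vector are computed by the following formula:
where d(xi, xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n, and d denotes the Euclidean distance metric.
6. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in step 8, the similarity is computed with the Euclidean distance using the following formula:
where Di is the similarity value, representing the similarity between the feature vector of the region to be processed and the i-th feature vector in the feature library; fnn denotes a fluctuation feature value of the region to be processed (the original feature vector has n feature values, so the region has n×n fluctuation feature values); and Fnn denotes a fluctuation feature value in the feature library (again n×n fluctuation feature values for an original feature vector of n feature values).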
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310394804.3A CN116403232B (zh) | 2023-04-13 | 2023-04-13 | A book information extraction method based on pixel value fluctuation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310394804.3A CN116403232B (zh) | 2023-04-13 | 2023-04-13 | A book information extraction method based on pixel value fluctuation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116403232A (zh) | 2023-07-07 |
CN116403232B (zh) | 2024-03-08 |
Family
ID=87007202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310394804.3A Active CN116403232B (zh) | A book information extraction method based on pixel value fluctuation | 2023-04-13 | 2023-04-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116403232B (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
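CN103810488A (zh) * | 2012-11-09 | 2014-05-21 | Alibaba Group Holding Limited | Image feature extraction method, image search method, server, terminal, and system |
KR101878239B1 (ko) * | 2017-03-22 | 2018-07-13 | 경남대학교 산학협력단 (Kyungnam University Industry-Academic Cooperation Foundation) | Book management system based on a mobile robot |
CN110210546A (zh) * | 2019-05-24 | 2019-09-06 | Jiangxi University of Science and Technology | An automatic book classification method based on image processing |
KR102187053B1 (ko) * | 2019-12-02 | 2020-12-04 | (주)라온파트너스 | Book information providing server and book information providing method using the same |
CN114281982A (zh) * | 2021-12-29 | 2022-04-05 | Sun Yat-sen University | A method and system for generating book promotional summaries using multimodal fusion |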
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9977955B2 (en) * | 2014-06-19 | 2018-05-22 | Rakuten Kobo, Inc. | Method and system for identifying books on a bookshelf |
Non-Patent Citations (2)
Title |
---|
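Extraction and segmentation of call number images of books on library shelves; Fang Jianjun et al.; Journal of Beijing Union University (Natural Sciences) (No. 01); full text * |
Fang Jianjun et al. Extraction and segmentation of call number images of books on library shelves. Journal of Beijing Union University (Natural Sciences). 2015, (No. 01), full text. * |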
Also Published As
Publication number | Publication date |
---|---|
CN116403232A (zh) | 2023-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hassanin et al. | A real-time approach for automatic defect detection from PCBs based on SURF features and morphological operations | |
Pan et al. | A robust system to detect and localize texts in natural scene images | |
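CN104751142B (zh) | A natural scene text detection method based on stroke features | |
CN111860536B (zh) | An image recognition method, apparatus, and storage medium | |
CN101122953A (zh) | A method for segmenting text in pictures | |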
Susan et al. | Text area segmentation from document images by novel adaptive thresholding and template matching using texture cues | |
Akbani et al. | Character recognition in natural scene images | |
Chidiac et al. | A robust algorithm for text extraction from images | |
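CN108921006B (zh) | Method for building a handwritten signature image authenticity identification model, and authenticity identification method | |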
Anjomshoae et al. | Enhancement of template-based method for overlapping rubber tree leaf identification | |
Damayanti et al. | Indonesian license plate recognition based on area feature extraction | |
Karanje et al. | Survey on text detection, segmentation and recognition from a natural scene images | |
Rani et al. | Detection and removal of graphical components in pre-printed documents | |
Van Phan et al. | Collecting handwritten nom character patterns from historical document pages | |
CN116403232B (zh) | A book information extraction method based on pixel value fluctuation | |
Kavitha et al. | A robust script identification system for historical Indian document images | |
CN107480728B (zh) | A method for identifying printed documents based on Fourier residual values | |
Padma et al. | Script Identification from Trilingual Documents using Profile Based Features. | |
CN115731550A (zh) | A deep-learning-based method, system, and storage medium for automatic recognition of drug package inserts | |
Zhuge et al. | Robust video text detection with morphological filtering enhanced MSER | |
Liu et al. | A prototype system of courtesy amount recognition for Chinese Bank checks | |
Höhn | Detecting arbitrarily oriented text labels in early maps | |
Padma et al. | Entropy based texture features useful for automatic script identification | |
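CN115995080B (zh) | Intelligent archive management system based on OCR recognition | |
CN106408021B (zh) | A method for distinguishing handwritten from printed text based on stroke thickness |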
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||