CN116403232B - Book information extraction method based on pixel value fluctuation - Google Patents

Book information extraction method based on pixel value fluctuation

Info

Publication number
CN116403232B
Authority
CN
China
Prior art keywords
book
region
feature
information
fluctuation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310394804.3A
Other languages
English (en)
Other versions
CN116403232A (zh)
Inventor
谢文伟
孙贤军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Shaohao Network Technology Co ltd
Original Assignee
Nanjing Shaohao Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Shaohao Network Technology Co ltd
Priority to CN202310394804.3A
Publication of CN116403232A
Application granted
Publication of CN116403232B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/147 Determination of region of interest
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/18086 Extraction of features or characteristics of the image by performing operations within image blocks or by using histograms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19093 Proximity measures, i.e. similarity or distance measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

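The invention discloses a book information extraction method based on pixel value fluctuation, belonging to the technical field of image processing. The method comprises: acquiring an image of a book cover; preprocessing it; performing ROI detection to obtain a basic book information array; checking the probability P that each region contains basic book information and keeping only the regions likely to contain a target; extracting the original features of the regions containing the title and the author; computing the feature variation and constructing fluctuation features; and recognizing the content of the target regions. This addresses the technical problem of obtaining stable features during book information extraction. The invention obtains more reliable fluctuation features and can remove abnormal feature data, so that basic information such as the title and author can be extracted even when the cover image is noisy, partially occluded, distorted, or small.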

Description

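Book information extraction method based on pixel value fluctuation
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to a book information extraction method based on pixel value fluctuation.
Background
Book information recognition extracts title and author information from an image according to image features (such as statistical or geometric features). The common approach is to first detect the position coordinates of the targets in the image (such as the title and author information), then extract the image features of the targets, and finally find the most similar content in a database.
Book information recognition is a form of OCR. The target features of book information are usually divided into visual features, pixel statistical features, image transform coefficient features, image algebraic features, and so on. Feature extraction works on particular features of the target and is the process of building a feature model of it. The process includes: acquiring the original image, target detection (locating the position and size of the book in the image), image preprocessing (image rectification, noise filtering, etc.), and feature extraction (identifying key points and generating feature vectors). At present, feature extraction falls into two main classes: traditional feature extraction algorithms (SIFT, LBP, HOG, etc.) and deep learning-based methods.
The accuracy of traditional OCR algorithms is already very high; in specific scenarios they can extract high-quality feature data and recognize the target content fairly accurately. Book information recognition, however, is more complex than traditional OCR, mainly in the following two respects:
1. Traditional OCR mainly recognizes English letters and digits, whereas book information also requires recognizing Chinese characters and special symbols.
2. Book covers differ widely and the images contain a lot of interfering information, so high-quality feature data is hard to extract.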
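Summary of the Invention
The object of the invention is to provide a book information extraction method based on pixel value fluctuation that addresses the technical problem of obtaining stable features during book information extraction.
To achieve this object, the invention adopts the following technical solution:
A book information extraction method based on pixel value fluctuation, comprising the following steps:
Step 1: Set up an image processing server and create in it a preprocessing module, an ROI module, a region recognition module, a feature extraction module, a feature construction module, and a feature recognition module; the image processing server acquires the image of the book cover over the Internet;
Step 2: The preprocessing module preprocesses the book cover image to obtain a preprocessed image;
Step 3: The ROI module performs ROI detection on the preprocessed image to obtain a basic book information array; the array is two-dimensional and its content includes title information and author information;
Step 4: The region recognition module traverses the basic book information array and checks, for each region, the probability P that it contains basic book information;
Step 5: Select any region A and evaluate its probability P: if P reaches a preset threshold, mark region A as a region to be processed, i.e. a region containing title information and author information; otherwise, do nothing with region A;
Repeat Step 5 until all regions have been evaluated, then go to Step 6;
Step 6: The feature extraction module crops the regions to be processed out of the preprocessed image and extracts feature vectors. Each feature vector is represented as a feature vector array containing the HOG features of the given region of the preprocessed image, extracted according to the variation of image pixel values;
The position information and the feature vector of each region to be processed are represented as a position array and a feature vector array, respectively;
Step 7: The feature construction module computes the variation of the features of the region to be processed and constructs the fluctuation feature, specifically as follows:
Step 7-1: For any region to be processed B, compute the pairwise distances between the elements of the region's feature vector to obtain a distance matrix;
Step 7-2: Convert the distance matrix into a vector by the following formula, and use that vector as the fluctuation feature of region B:
where d(xi,xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n;
Step 8: The feature recognition module takes the fluctuation features and performs recognition: it computes similarity using the Euclidean distance and retrieves from the feature library the title information and author information with the highest similarity, which are output as the final result.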
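Preferably, in Step 2, the preprocessing applied to the book cover image includes resizing to a uniform size, denoising, grayscale conversion, and binarization.
Preferably, in Step 3, the basic book information array is represented as [X0,Y0,X1,Y1,P], where X0 and Y0 are the X and Y coordinates of the top-left corner of the region, X1 and Y1 are the X and Y coordinates of the bottom-right corner, and P is the probability that the region contains basic book information.
Preferably, in Step 6, the feature vector array is represented as [F1,F2,F3,F4,F5,F6,...,Fn], where Fi is one feature value in the array and i ranges from 1 to n; the position array is represented as [X0,Y0,X1,Y1], where X0, Y0, X1, Y1 are the position coordinates of the region to be processed.
Preferably, Step 7-1 is carried out as follows: let the feature vector of region B be [x1,x2,x3,x4,x5,x6,...,xn], with n dimensions in total, and compute the pairwise distances between the elements of the region's feature vector according to the following formula:
where d(xi,xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n, and d denotes the Euclidean distance metric.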
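The formula itself does not survive in this text. As a hedged reconstruction only, and assuming each x_i is a sub-vector with m components x_{i,k} (an assumption suggested by the 16×16 matrices printed in the embodiment, not something the source states), the standard Euclidean distance would read:

```latex
d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} \left( x_{i,k} - x_{j,k} \right)^2},
\qquad i, j = 1, \dots, n
```

If each x_i is instead a single scalar feature value, this reduces to d(x_i, x_j) = |x_i - x_j|.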
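Preferably, in Step 8, the similarity is computed by Euclidean distance according to the following formula:
where Di is the similarity value, i.e. the similarity between the feature vector of the region to be processed and the i-th feature vector in the feature library; fnn denotes a fluctuation feature value of the region to be processed (the original feature vector has n feature values, so the region has n×n fluctuation feature values); and Fnn denotes a fluctuation feature value in the feature library (likewise n×n fluctuation feature values in total).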
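The similarity formula is also missing from this text. A hedged sketch of what the definitions imply, assuming the fluctuation feature is flattened to n×n values f_1, ..., f_{n×n} and the i-th library entry to F_{i,1}, ..., F_{i,n×n} (this single-index notation is introduced here for illustration and is not the patent's):

```latex
D_i = \sqrt{\sum_{k=1}^{n \times n} \left( f_k - F_{i,k} \right)^2}
```

Because D_i is a distance, a smaller value means a closer match, so the entry with the "highest similarity" is the one with the smallest D_i.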
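The book information extraction method based on pixel value fluctuation described in the invention addresses the technical problem of obtaining stable features during book information extraction. The invention obtains more reliable fluctuation features and can remove abnormal feature data, so that basic information such as the title and author can be extracted even when the cover image is noisy, partially occluded, distorted, or small.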
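Brief Description of the Drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 is a schematic view of the original book cover;
Fig. 3 is a schematic view of the book cover after preprocessing;
Fig. 4 is a schematic view after ROI region processing;
Fig. 5 is a schematic view of the ROI regions after filtering.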
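Detailed Description
As shown in Figs. 1 to 5, a book information extraction method based on pixel value fluctuation comprises the following steps:
Step 1: Set up an image processing server and create in it a preprocessing module, an ROI module, a region recognition module, a feature extraction module, a feature construction module, and a feature recognition module; the image processing server acquires the image of the book cover over the Internet;
Step 2: The preprocessing module preprocesses the book cover image to obtain a preprocessed image; the preprocessing includes resizing to a uniform size, denoising, grayscale conversion, and binarization.
In this embodiment, the book cover image is first converted to grayscale, then denoised with a Gaussian blur (Gaussian kernel parameter 5), and then binarized (minimum gray value parameter 80, maximum gray value parameter 255): pixels with gray values between 80 and 255 are set to pure white and all others to pure black.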
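The grayscale and binarization steps of this embodiment can be sketched in plain Python as below. This is an illustration under assumptions, not the patented implementation: the function names are hypothetical, the BT.601 luma weights are a common choice the source does not specify, the range 80 to 255 is treated as inclusive, and the Gaussian denoising step is omitted for brevity.

```python
def to_gray(rgb_rows):
    """Convert rows of (R, G, B) pixels to 8-bit gray values (BT.601 weights, assumed)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_rows]


def binarize(gray_rows, low=80, high=255):
    """Gray values in [low, high] become white (255); all others become black (0)."""
    return [[255 if low <= v <= high else 0 for v in row] for row in gray_rows]


# Tiny 2x2 stand-in for a cover image (made-up pixel values).
cover = [[(10, 10, 10), (200, 200, 200)],
         [(79, 79, 79), (80, 80, 80)]]
binary = binarize(to_gray(cover))
```

With the inclusive reading, a gray value of exactly 80 maps to white and 79 to black; a library thresholding routine may treat the boundary differently.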
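Step 3: The ROI module performs ROI detection on the preprocessed image to obtain a basic book information array; the array is two-dimensional and its content includes title information and author information;
The basic book information array is represented as [X0,Y0,X1,Y1,P], where X0 and Y0 are the X and Y coordinates of the top-left corner of the region, X1 and Y1 are the X and Y coordinates of the bottom-right corner, and P is the probability that the region contains basic book information.
In this embodiment, ROI detection is based on changes in gray value and yields the regions that may contain the title or the author; the gray-value change threshold is 0.05 and the minimum ROI pixel area is 100.
Four ROI regions are obtained, each in the format (x,y,w,h,p), where x,y are the coordinates of the top-left corner of the rectangle, w is its width, h is its height, and p is the probability that the region contains a target. The four ROI regions are: [162,549,244,61,0.5], [249,139,81,343,0.9], [366,91,47,109,0.95], [11,67,92,362,0.45].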
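The text gives two region layouts: corner form [X0,Y0,X1,Y1,P] and the (x,y,w,h,p) form used for the four example regions. A minimal sketch of converting between them follows; the helper name is hypothetical, and the relation X1 = x + w, Y1 = y + h is the usual convention, assumed here rather than stated in the source.

```python
def xywh_to_corners(roi):
    """Convert (x, y, w, h, p) to [X0, Y0, X1, Y1, P], assuming X1 = x + w and Y1 = y + h."""
    x, y, w, h, p = roi
    return [x, y, x + w, y + h, p]


# The four example regions from the embodiment, in (x, y, w, h, p) form.
rois_xywh = [[162, 549, 244, 61, 0.5], [249, 139, 81, 343, 0.9],
             [366, 91, 47, 109, 0.95], [11, 67, 92, 362, 0.45]]
rois_corners = [xywh_to_corners(r) for r in rois_xywh]
```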
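Step 4: The region recognition module traverses the basic book information array and checks, for each region, the probability P that it contains basic book information;
Step 5: Select any region A and evaluate its probability P: if P reaches a preset threshold, mark region A as a region to be processed, i.e. a region containing title information and author information; otherwise, do nothing with region A;
Repeat Step 5 until all regions have been evaluated, then go to Step 6;
In this embodiment, after Step 5 only two regions that may contain a target remain: [[249,139,81,343,0.9],[366,91,47,109,0.95]].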
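Steps 4 and 5 amount to a threshold filter over the region list. The sketch below reproduces the embodiment's outcome; note that the source does not give the preset threshold, so the value 0.8 is a hypothetical choice that merely separates the kept regions (0.9, 0.95) from the discarded ones (0.5, 0.45).

```python
PROB_THRESHOLD = 0.8  # hypothetical: the patent does not state the preset threshold


def select_candidates(rois, threshold=PROB_THRESHOLD):
    """Keep regions whose probability P (last element) reaches the threshold."""
    return [roi for roi in rois if roi[4] >= threshold]


rois = [[162, 549, 244, 61, 0.5], [249, 139, 81, 343, 0.9],
        [366, 91, 47, 109, 0.95], [11, 67, 92, 362, 0.45]]
candidates = select_candidates(rois)
```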
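Step 6: The feature extraction module crops the regions to be processed out of the preprocessed image and extracts feature vectors. Each feature vector is represented as a feature vector array containing the HOG features of the given region of the preprocessed image, extracted according to the variation of image pixel values;
The position information and the feature vector of each region to be processed are represented as a position array and a feature vector array, respectively;
The feature vector array is represented as [F1,F2,F3,F4,F5,F6,...,Fn], where Fi is one feature value in the array and i ranges from 1 to n; the position array is represented as [X0,Y0,X1,Y1], where X0, Y0, X1, Y1 are the position coordinates of the region to be processed.
In this embodiment, the ROI regions obtained in the preceding steps are cropped from the binarized version of the original image, and their gradient features (i.e. HOG features) are extracted with the bin parameter set to 9, giving the features for each ROI region.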
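To illustrate what a 9-bin gradient histogram is, here is a deliberately simplified sketch of the per-cell step of HOG. It is an assumption-laden illustration, not the patent's extractor: the function name is hypothetical, gradients use central differences on interior pixels only, orientations are unsigned (0 to 180 degrees), each pixel votes into a single bin, and block normalization is omitted.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A 4x4 binary cell with a vertical edge: all gradient energy lands in bin 0.
cell = [[0, 0, 255, 255] for _ in range(4)]
hist = cell_histogram(cell)
```

A production HOG implementation would also interpolate votes between neighbouring bins and normalize over blocks of cells.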
区域[249,139,81,343,0.9]的特征如下:
[0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,3.264466,20.734375,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,3.264466,20.734375,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,3.264466,20.734375,0.000000,0.000000,0.000000,0.000000,5.703125,5.703125,35.395454,5.703125,5.703125,5.703125,5.703125,0.000000,5.156250,5.156250,25.890625,8.420716,5.156250,5.156250,5.156250,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,20.734375,3.264466,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,29.692329,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,20.734375,3.264466,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,4.043978,25.421875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,4.043978,25.421875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,4.043978,25.421875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,0.000000,6.781250,6.781250,32.203125,10.825228,6.781250,6.781250,6.781250,0.000000,6.140625,6.140625,29.976255,6.140625,6.140625,6.140625,6.140625,0.000000,0.000000,0.000000,25.421875,4.043978,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835
630,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,25.421875,4.043978,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,23.835630,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000]。
区域[366,91,47,109,0.95]的特征如下:
[0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,2.921875,0.000000,45.877567,0.000000,0.000000,0.000000,0.000000,0.000000,4.009815,6.814570,40.516960,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,2.921875,45.877567,0.000000,3.960375,0.000000,0.000000,0.000000,0.000000,10.824385,40.516960,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,2.921875,49.837942,3.960375,0.000000,0.000000,0.000000,0.000000,0.000000,10.824385,40.516960,0.000000,0.000000,0.000000,0.000000,18.532661,22.493037,68.370604,21.454536,18.532661,18.532661,18.532661,0.000000,18.534107,18.534107,59.051067,29.358492,18.534107,18.534107,18.534107,0.000000,3.960375,0.000000,45.877567,2.921875,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,40.516960,10.824385,0.000000,0.000000,0.000000,
0.000000,0.000000,0.000000,45.877567,0.000000,2.921875,0.000000,0.000000,0.000000,0.000000,0.000000,40.516960,6.814570,4.009815,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,6.336285,38.282017,2.971216,0.000000,0.000000,0.000000,0.000000,0.000000,10.936128,36.392204,0.000000,0.000000,0.000000,0.000000,3.858616,0.000000,6.336285,41.253233,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,10.936128,36.392204,0.000000,3.716517,0.000000,0.000000,0.000000,3.858616,10.194901,41.253233,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,10.936128,40.108721,3.716517,0.000000,0.000000,0.000000,16.073910,16.073910,57.327142,26.268810,19.932526,16.073910,16.073910,0.000000,16.751635,20.468152,56.860356,27.687762,16.751635,16.751635,16.751635,0.000000,0.000000,0.000000,41.253233,6.336285,0.000000,3.858616,0.000000,0.000000,3.716517,0.000000,36.392204,10.936128,0.000000,0.000000,0.000000,0.000000,0.000000,2.971216,38.282017,6.336285,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,36.392204,10.936128,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000]。
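Step 7: The feature construction module computes the variation of the features of the region to be processed and constructs the fluctuation feature, specifically as follows:
Step 7-1: For any region to be processed B, compute the pairwise distances between the elements of the region's feature vector to obtain a distance matrix;
Let the feature vector of region B be [x1,x2,x3,x4,x5,x6,...,xn], with n dimensions in total, and compute the pairwise distances between the elements of the region's feature vector according to the following formula:
where d(xi,xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n, and d denotes the Euclidean distance metric.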
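Step 7-2: Convert the distance matrix into a vector by the following formula, and use that vector as the fluctuation feature of region B:
where d(xi,xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n;
In this embodiment, the pairwise distances between the elements of each region's feature vector are computed to obtain a distance matrix for each of the two regions remaining after Step 5, and each distance matrix is converted into a vector that serves as the region's fluctuation feature.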
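Steps 7-1 and 7-2 can be sketched as follows. One point is inferred rather than stated: the published fluctuation features are 16×16 matrices, which is consistent with splitting each 256-value feature vector into 16 consecutive sub-vectors of 16 values and taking Euclidean distances between them. The helper names and the row-major flattening are assumptions for illustration.

```python
import math


def chunk(vector, size):
    """Split a flat feature vector into consecutive sub-vectors of `size` values."""
    return [vector[i:i + size] for i in range(0, len(vector), size)]


def distance_matrix(rows):
    """Pairwise Euclidean distances between sub-vectors (Step 7-1)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def flatten(matrix):
    """Row-major flattening of the distance matrix into one vector (Step 7-2, assumed)."""
    return [value for row in matrix for value in row]


# First 32 values of the published feature vector for region [249,139,81,343,0.9].
features = [0.0] * 16 + [0.0, 0.0, 0.0, 0.0, 29.692329, 0.0, 0.0, 0.0,
                         0.0, 0.0, 0.0, 3.264466, 20.734375, 0.0, 0.0, 0.0]
dist = distance_matrix(chunk(features, 16))
fluctuation = flatten(dist)
```

Under this reading, the distance between the first two sub-vectors comes out at about 36.3621, which agrees with the second entry of the first row of the published matrix for that region.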
The fluctuation feature of region [249,139,81,343,0.9] is as follows:
[[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]
[36.36214337 0. 0. 0. 52.79632687 48.72026569 48.72026569 36.36214337 36.36214337 7.40709554 7.40709554 7.40709554 52.32560685 46.44563228 46.44563228 36.36214337]
[36.36214337 0. 0. 0. 52.79632687 48.72026569 48.72026569 36.36214337 36.36214337 7.40709554 7.40709554 7.40709554 52.32560685 46.44563228 46.44563228 36.36214337]
[36.36214337 0. 0. 0. 52.79632687 48.72026569 48.72026569 36.36214337 36.36214337 7.40709554 7.40709554 7.40709554 52.32560685 46.44563228 46.44563228 36.36214337]
[48.18879012 52.79632687 52.79632687 52.79632687 0. 20.34177578 20.34177578 48.18879012 48.18879012 50.74693048 50.74693048 50.74693048 8.30510914 21.74670687 21.74670687 48.18879012]
[36.36214337 48.72026569 48.72026569 48.72026569 20.34177578 0. 0. 36.36214337 36.36214337 46.44563228 46.44563228 46.44563228 25.21185116 7.40709554 7.40709554 36.36214337]
[36.36214337 48.72026569 48.72026569 48.72026569 20.34177578 0. 0. 36.36214337 36.36214337 46.44563228 46.44563228 46.44563228 25.21185116 7.40709554 7.40709554 36.36214337]
[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]
[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]
[35.08222863 7.40709554 7.40709554 7.40709554 50.74693048 46.44563228 46.44563228 35.08222863 35.08222863 0. 0. 0. 51.34346974 45.28027713 45.28027713 35.08222863]
[35.08222863 7.40709554 7.40709554 7.40709554 50.74693048 46.44563228 46.44563228 35.08222863 35.08222863 0. 0. 0. 51.34346974 45.28027713 45.28027713 35.08222863]
[35.08222863 7.40709554 7.40709554 7.40709554 50.74693048 46.44563228 46.44563228 35.08222863 35.08222863 0. 0. 0. 51.34346974 45.28027713 45.28027713 35.08222863]
[50.08965087 52.32560685 52.32560685 52.32560685 8.30510914 25.21185116 25.21185116 50.08965087 50.08965087 51.34346974 51.34346974 51.34346974 0. 24.20430517 24.20430517 50.08965087]
[35.08222863 46.44563228 46.44563228 46.44563228 21.74670687 7.40709554 7.40709554 35.08222863 35.08222863 45.28027713 45.28027713 45.28027713 24.20430517 0. 0. 35.08222863]
[35.08222863 46.44563228 46.44563228 46.44563228 21.74670687 7.40709554 7.40709554 35.08222863 35.08222863 45.28027713 45.28027713 45.28027713 24.20430517 0. 0. 35.08222863]
[0. 36.36214337 36.36214337 36.36214337 48.18879012 36.36214337 36.36214337 0. 0. 35.08222863 35.08222863 35.08222863 50.08965087 35.08222863 35.08222863 0.]].
The fluctuation feature of region [366,91,47,109,0.95] is as follows:
[[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]
[6.17853508e+01 9.53674316e-07 8.05707802e+00 8.97781025e+00 1.07079704e+02 7.75096340e+01 8.08111162e+01 6.17853508e+01 6.17853508e+01 1.28563538e+01 1.22004593e+01 1.31621792e+01 9.52828047e+01 7.11935512e+01 6.92457146e+01 6.17853508e+01]
[6.23519404e+01 8.05707802e+00 0.00000000e+00 6.85957084e+00 1.03906580e+02 7.40611249e+01 7.75096340e+01 6.23519404e+01 6.23519404e+01 1.05304023e+01 9.71878327e+00 1.09016743e+01 9.21352526e+01 6.76414660e+01 6.60840707e+01 6.23519404e+01]
[6.53210057e+01 8.97781025e+00 6.85957084e+00 0.00000000e+00 1.04908106e+02 7.64262788e+01 7.97726096e+01 6.53210057e+01 6.53210057e+01 1.27750594e+01 1.21147644e+01 1.30827853e+01 9.28942532e+01 7.02482188e+01 6.85272617e+01 6.53210057e+01]
[1.14372753e+02 1.07079704e+02 1.03906580e+02 1.04908106e+02 0.00000000e+00 7.07294492e+01 7.19930200e+01 1.14372753e+02 1.14372753e+02 9.99598966e+01 9.97497974e+01 9.75792446e+01 1.51814484e+01 7.18469022e+01 7.38863822e+01 1.14372753e+02]
[6.23519404e+01 7.75096340e+01 7.40611249e+01 7.64262788e+01 7.07294492e+01 0.00000000e+00 8.05707802e+00 6.23519404e+01 6.23519404e+01 6.60840707e+01 6.76414660e+01 6.72274476e+01 6.39392021e+01 9.71878327e+00 1.05304023e+01 6.23519404e+01]
[6.17853508e+01 8.08111162e+01 7.75096340e+01 7.97726096e+01 7.19930200e+01 8.05707802e+00 9.53674316e-07 6.17853508e+01 6.17853508e+01 6.92457146e+01 7.11935512e+01 7.05841369e+01 6.53447431e+01 1.22004593e+01 1.28563538e+01 6.17853508e+01]
[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]
[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]
[5.43919190e+01 1.28563538e+01 1.05304023e+01 1.27750594e+01 9.99598966e+01 6.60840707e+01 6.92457146e+01 5.43919190e+01 5.43919190e+01 0.00000000e+00 6.80864636e+00 8.66366446e+00 8.82709997e+01 5.97768875e+01 5.79201133e+01 5.43919190e+01]
[5.66980648e+01 1.22004593e+01 9.71878327e+00 1.21147644e+01 9.97497974e+01 6.76414660e+01 7.11935512e+01 5.66980648e+01 5.66980648e+01 6.80864636e+00 0.00000000e+00 9.27923757e+00 8.81007638e+01 6.15776992e+01 5.97768875e+01 5.66980648e+01]
[5.96889860e+01 1.31621792e+01 1.09016743e+01 1.30827853e+01 9.75792446e+01 6.72274476e+01 7.05841369e+01 5.96889860e+01 5.96889860e+01 8.66366446e+00 9.27923757e+00 0.00000000e+00 8.63807742e+01 6.11577318e+01 5.93441775e+01 5.96889860e+01]
[1.04634440e+02 9.52828047e+01 9.21352526e+01 9.28942532e+01 1.51814484e+01 6.39392021e+01 6.53447431e+01 1.04634440e+02 1.04634440e+02 8.82709997e+01 8.81007638e+01 8.63807742e+01 0.00000000e+00 6.40904101e+01 6.59173503e+01 1.04634440e+02]
[5.66980648e+01 7.11935512e+01 6.76414660e+01 7.02482188e+01 7.18469022e+01 9.71878327e+00 1.22004593e+01 5.66980648e+01 5.66980648e+01 5.97768875e+01 6.15776992e+01 6.11577318e+01 6.40904101e+01 0.00000000e+00 6.80864636e+00 5.66980648e+01]
[5.43919190e+01 6.92457146e+01 6.60840707e+01 6.85272617e+01 7.38863822e+01 1.05304023e+01 1.28563538e+01 5.43919190e+01 5.43919190e+01 5.79201133e+01 5.97768875e+01 5.93441775e+01 6.59173503e+01 6.80864636e+00 0.00000000e+00 5.43919190e+01]
[0.00000000e+00 6.17853508e+01 6.23519404e+01 6.53210057e+01 1.14372753e+02 6.23519404e+01 6.17853508e+01 0.00000000e+00 0.00000000e+00 5.43919190e+01 5.66980648e+01 5.96889860e+01 1.04634440e+02 5.66980648e+01 5.43919190e+01 0.00000000e+00]].
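Step 8: The feature recognition module takes the fluctuation features and performs recognition: it computes similarity using the Euclidean distance and retrieves from the feature library the title information and author information with the highest similarity, which are output as the final result.
In this embodiment, a book cover may have several fluctuation feature vectors. Assuming there are n fluctuation feature vectors, they are represented as:
[
[[f11,f12,f13,f14,f15,f16,...,f1n]],
[[f21,f22,f23,f24,f25,f26,...,f2n]],
[[f31,f32,f33,f34,f35,f36,...,f3n]],
...
[[fn1,fn2,fn3,fn4,fn5,fn6,...,fnn]],
]
The invention searches the feature library for the most similar feature vector and its corresponding label. The similarity is computed by Euclidean distance according to the following formula:
where Di is the similarity value, i.e. the similarity between the feature vector of the region to be processed and the i-th feature vector in the feature library; fnn denotes a fluctuation feature value of the region to be processed (the original feature vector has n feature values, so the region has n×n fluctuation feature values); and Fnn denotes a fluctuation feature value in the feature library (likewise n×n fluctuation feature values in total).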
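The library lookup in Step 8 is a nearest-neighbour search under the Euclidean distance. The sketch below shows the idea with a toy feature library; the labels, vectors, and function name are made up for illustration, and the smallest distance is taken to mean the highest similarity.

```python
import math


def best_match(query, library):
    """Return (label, distance) for the library entry closest to `query`."""
    label, vector = min(library.items(), key=lambda item: math.dist(query, item[1]))
    return label, math.dist(query, vector)


# Toy feature library mapping labels to flattened fluctuation features (made-up data).
library = {
    "title-A": [0.0, 36.0, 36.0, 0.0],
    "author-B": [0.0, 60.0, 60.0, 0.0],
}
query = [0.0, 36.4, 36.4, 0.0]
label, distance = best_match(query, library)
```

A linear scan like this is adequate for a small library; a larger one would typically use an index structure, which the source does not discuss.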
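In this embodiment, the recognition result for region [249,139,81,343,0.9] is "图书书名测试" ("book title test"), and the recognition result for region [366,91,47,109,0.95] is "作者测试" ("author test").
The following compares the results of the invention with those of the conventional technique:
Experimental method:
100 book cover images were selected at random, and the title and author information was extracted from each with two methods (Method-A and Method-B).
Method-A: extract the title and author information using image gradient histogram features directly.
Method-B: extract the title and author information using features obtained with the method of the invention.
Experimental results:
Accuracy (acc), recall, and F1-Score were used as evaluation metrics. The results are shown in Table 1:
Table 1
Method      Accuracy (acc)   Recall   F1-Score
Method-A    0.89             0.76     0.56
Method-B    0.95             0.87     0.73
Experimental summary:
The experiment shows that the proposed method improves accuracy, recall, and F1-Score substantially, so the title and author information can be extracted from book covers more accurately with the proposed method.
The book information extraction method based on pixel value fluctuation described in the invention addresses the technical problem of obtaining stable features during book information extraction. The invention obtains more reliable fluctuation features and can remove abnormal feature data, so that basic information such as the title and author can be extracted even when the cover image is noisy, partially occluded, distorted, or small.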

Claims (6)

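1. A book information extraction method based on pixel value fluctuation, characterized by comprising the following steps:
Step 1: Set up an image processing server and create in it a preprocessing module, an ROI module, a region recognition module, a feature extraction module, a feature construction module, and a feature recognition module; the image processing server acquires the image of the book cover over the Internet;
Step 2: The preprocessing module preprocesses the book cover image to obtain a preprocessed image;
Step 3: The ROI module performs ROI detection on the preprocessed image to obtain a basic book information array; the array is two-dimensional and its content includes title information and author information;
Step 4: The region recognition module traverses the basic book information array and checks, for each region, the probability P that it contains basic book information;
Step 5: Select any region A and evaluate its probability P: if P reaches a preset threshold, mark region A as a region to be processed, i.e. a region containing title information and author information; otherwise, do nothing with region A;
Repeat Step 5 until all regions have been evaluated, then go to Step 6;
Step 6: The feature extraction module crops the regions to be processed out of the preprocessed image and extracts feature vectors. Each feature vector is represented as a feature vector array containing the HOG features of the given region of the preprocessed image, extracted according to the variation of image pixel values;
The position information and the feature vector of each region to be processed are represented as a position array and a feature vector array, respectively;
Step 7: The feature construction module computes the variation of the features of the region to be processed and constructs the fluctuation feature, specifically as follows:
Step 7-1: For any region to be processed B, compute the pairwise distances between the elements of the region's feature vector to obtain a distance matrix;
Step 7-2: Convert the distance matrix into a vector by the following formula, and use that vector as the fluctuation feature of region B:
where d(xi,xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n;
Step 8: The feature recognition module takes the fluctuation features and performs recognition: it computes similarity using the Euclidean distance and retrieves from the feature library the title information and author information with the highest similarity, which are output as the final result.
2. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in Step 2, the preprocessing applied to the book cover image includes resizing to a uniform size, denoising, grayscale conversion, and binarization.
3. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in Step 3, the basic book information array is represented as [X0,Y0,X1,Y1,P], where X0 and Y0 are the X and Y coordinates of the top-left corner of the region, X1 and Y1 are the X and Y coordinates of the bottom-right corner, and P is the probability that the region contains basic book information.
4. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in Step 6, the feature vector array is represented as [F1,F2,F3,F4,F5,F6,...,Fn], where Fi is one feature value in the array and i ranges from 1 to n; the position array is represented as [X0,Y0,X1,Y1], where X0, Y0, X1, Y1 are the position coordinates of the region to be processed.
5. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: Step 7-1 is carried out as follows: let the feature vector of region B be [x1,x2,x3,x4,x5,x6,...,xn], with n dimensions in total, and compute the pairwise distances between the elements of the region's feature vector according to the following formula:
where d(xi,xj) denotes an element of the distance matrix, namely the distance between the i-th and j-th feature values, with i and j each ranging from 1 to n, and d denotes the Euclidean distance metric.
6. The book information extraction method based on pixel value fluctuation according to claim 1, characterized in that: in Step 8, the similarity is computed by Euclidean distance according to the following formula:
where Di is the similarity value, i.e. the similarity between the feature vector of the region to be processed and the i-th feature vector in the feature library; fnn denotes a fluctuation feature value of the region to be processed (the original feature vector has n feature values, so the region has n×n fluctuation feature values); and Fnn denotes a fluctuation feature value in the feature library (likewise n×n fluctuation feature values in total).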
CN202310394804.3A 2023-04-13 2023-04-13 Book information extraction method based on pixel value fluctuation Active CN116403232B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310394804.3A CN116403232B (zh) 2023-04-13 2023-04-13 Book information extraction method based on pixel value fluctuation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310394804.3A CN116403232B (zh) 2023-04-13 2023-04-13 Book information extraction method based on pixel value fluctuation

Publications (2)

Publication Number Publication Date
CN116403232A (zh) 2023-07-07
CN116403232B (zh) 2024-03-08

Family

ID=87007202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310394804.3A Active CN116403232B (zh) 2023-04-13 2023-04-13 Book information extraction method based on pixel value fluctuation

Country Status (1)

Country Link
CN (1) CN116403232B (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
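CN103810488A (zh) * 2012-11-09 2014-05-21 阿里巴巴集团控股有限公司 Image feature extraction method, image search method, server, terminal and system
KR101878239B1 (ko) * 2017-03-22 2018-07-13 경남대학교 산학협력단 Mobile robot-based book management system
CN110210546A (zh) * 2019-05-24 2019-09-06 江西理工大学 Automatic book classification method based on image processing
KR102187053B1 (ko) * 2019-12-02 2020-12-04 (주)라온파트너스 Book information providing server and book information providing method using the same
CN114281982A (zh) * 2021-12-29 2022-04-05 中山大学 Method and system for generating book promotional summaries using multimodal fusion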

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015192246A1 (en) * 2014-06-19 2015-12-23 Bitlit Media Inc Method and system for identifying books on a bookshelf

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
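Image extraction and segmentation of call numbers of on-shelf library books; 方建军 et al.; Journal of Beijing Union University (Natural Sciences) (No. 01); full text *
方建军 et al. Image extraction and segmentation of call numbers of on-shelf library books. Journal of Beijing Union University (Natural Sciences). 2015, (No. 01), full text. *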

Also Published As

Publication number Publication date
CN116403232A (zh) 2023-07-07

Similar Documents

Publication Publication Date Title
Hassanin et al. A real-time approach for automatic defect detection from PCBs based on SURF features and morphological operations
Pan et al. A robust system to detect and localize texts in natural scene images
CN104751142B (zh) Natural scene text detection method based on stroke features
CN111860536B (zh) Image recognition method, device and storage medium
CN101122953A (zh) Method for segmenting text in pictures
Akbani et al. Character recognition in natural scene images
Chidiac et al. A robust algorithm for text extraction from images
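CN108921006B (zh) Method for building a handwritten signature image authenticity identification model, and authenticity identification method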
Anjomshoae et al. Enhancement of template-based method for overlapping rubber tree leaf identification
Damayanti et al. Indonesian license plate recognition based on area feature extraction
Karanje et al. Survey on text detection, segmentation and recognition from a natural scene images
Rani et al. Detection and removal of graphical components in pre-printed documents
CN116403232B (zh) Book information extraction method based on pixel value fluctuation
Van Phan et al. Collecting handwritten nom character patterns from historical document pages
Kavitha et al. A robust script identification system for historical Indian document images
Chatbri et al. An application-independent and segmentation-free approach for spotting queries in document images
Rajithkumar et al. Template matching method for recognition of stone inscripted Kannada characters of different time frames based on correlation analysis
CN115731550A (zh) Deep learning-based method, system and storage medium for automatic recognition of drug package inserts
Zhuge et al. Robust video text detection with morphological filtering enhanced MSER
Padma et al. Script Identification from Trilingual Documents using Profile Based Features.
Liu et al. A prototype system of courtesy amount recognition for Chinese Bank checks
Höhn Detecting arbitrarily oriented text labels in early maps
Padma et al. Entropy based texture features useful for automatic script identification
CN115995080B (zh) Intelligent archive management system based on OCR recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant