CN114187600A - An Auxiliary System for Intelligent Management of Measuring Assets - Google Patents

An Auxiliary System for Intelligent Management of Measuring Assets

Publication number
CN114187600A
Authority
CN
China
Prior art keywords
characters
classifier
recognition
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111465318.3A
Other languages
Chinese (zh)
Inventor
郭琰
王瑾
吕甜甜
任建民
王浩
甘源
穆萍
杨媛
星玥
李永国
周洪煜
潘沙沙
乔阳波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haidong Power Supply Company State Grid Qinghai Electric Power Co ltd
State Grid Corp of China SGCC
State Grid Qinghai Electric Power Co Ltd
Original Assignee
Haidong Power Supply Company State Grid Qinghai Electric Power Co ltd
State Grid Corp of China SGCC
State Grid Qinghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-03
Filing date
2021-12-03
Publication date
2022-03-15
Application filed by Haidong Power Supply Company State Grid Qinghai Electric Power Co ltd, State Grid Corp of China SGCC, State Grid Qinghai Electric Power Co Ltd
Priority to CN202111465318.3A
Publication of CN114187600A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

An auxiliary system for intelligent management of metering assets, relating to the technical field of OCR recognition. Its operating steps are: start → login interface → enter account and password → check information → if correct, enter the system → upload photo → OCR character recognition → recognition preprocessing → feature extraction and dimension reduction → classifier design, training and actual recognition → OCR recognition post-processing → output result. The beneficial effects of the invention are that OCR technology reduces equipment configuration, lowers labor costs, and improves work efficiency.

Description

Auxiliary system for intelligent management of metering assets
Technical Field
The invention relates to the technical field of OCR (optical character recognition), in particular to an auxiliary system for intelligent management of metering assets.
Background
The weight of an instrument transformer, the position of its nameplate, and the installation space all limit how the nameplate information can be read. In the existing working mode, when a customer's instrument transformer is received for inspection at the early stage, staff must fill in an inspection consignment order, handle the warehousing and ex-warehousing procedures, keep and maintain paper records, and manually enter the transformer information into an electronic record. At the later acceptance, meter installation, and power connection stage, the on-site transformer must be matched against the transformer that was submitted for inspection: the transformer information has to be checked and recorded, and the on-site worker has to telephone the staff who handled the receiving and dispatching procedures to confirm whether the transformer matches and whether it has a verification certificate. The work is cumbersome and depends on information support from office staff. In addition, when a customer's power instrument transformers are checked during anti-electricity-theft work, paper records from many years earlier, which are difficult to store and query, have to be consulted. A tool for intelligent management of metering assets addresses these problems: the difficulty of checking transformer information and records during work, and the long time customers wait while transformer information is checked during inspection, acceptance, and power connection.
Disclosure of Invention
The invention provides an auxiliary system for intelligent management of metering assets, characterized in that the operating steps are: start → login interface → enter account and password → check information → if correct, enter the system → upload photo → OCR character recognition → recognition preprocessing → feature extraction and dimension reduction → classifier design, training and actual recognition → OCR recognition post-processing → output result; if the account or password entered is incorrect, the system returns to the login interface.
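The control flow of these steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the account table, the function names `check_credentials` and `run_session`, and the `recognize` callback that stands in for the whole OCR chain are all assumptions introduced here, and the plaintext password comparison is for illustration only.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical account table; a real system would check hashed
# passwords against a user database.
ACCOUNTS = {"inspector01": "s3cret"}


def check_credentials(account: str, password: str) -> bool:
    """Return True only when the account exists and the password matches."""
    return account in ACCOUNTS and ACCOUNTS[account] == password


def run_session(
    login_attempts: Iterable[Tuple[str, str]],
    photo: bytes,
    recognize: Callable[[bytes], str],
) -> Optional[str]:
    """Walk the flow: login -> check -> upload photo -> OCR -> output.

    An incorrect account or password sends the user back to the login
    interface, modelled here as moving on to the next attempt.
    """
    for account, password in login_attempts:
        if not check_credentials(account, password):
            continue  # back to the login interface
        return recognize(photo)  # photo uploaded, OCR run, result output
    return None  # never logged in, so nothing is recognized
```

For example, `run_session([("inspector01", "wrong"), ("inspector01", "s3cret")], photo, ocr)` rejects the first attempt and runs `ocr(photo)` on the second.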
Recognition preprocessing: the color image is converted to grayscale and then passed through the sub-steps of noise reduction, binarization, character segmentation, and normalization. After binarization the image has only two colors, black and white, one being the image background and the other the characters to be recognized. The choice of noise-reduction algorithm affects feature extraction. Character segmentation divides the text in the image into single characters, because recognition proceeds character by character; if a text line is tilted, tilt correction is usually performed as well. Normalization scales each single-character image to the same size so that a uniform algorithm can be applied under the same specification.
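The preprocessing sub-steps can be sketched in plain Python on nested lists. The patent does not name specific algorithms, so every choice below is an assumption: luma-weighted graying, a 3x3 median filter for noise reduction, a fixed threshold with dark text on a light background, segmentation at ink-free columns, and nearest-neighbour resizing. Tilt correction is omitted.

```python
from statistics import median


def to_gray(rgb):
    """Convert rows of (r, g, b) pixels to 0-255 gray levels (luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def denoise(gray):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median(window))
        out.append(row)
    return out


def binarize(gray, threshold=128):
    """1 = character (dark) pixel, 0 = background (light) pixel."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def segment(binary):
    """Split one text line into characters at columns that contain no ink."""
    w = len(binary[0])
    ink = [any(row[x] for row in binary) for x in range(w)]
    chars, start = [], None
    for x in range(w + 1):  # one extra step flushes a character at the edge
        has_ink = x < w and ink[x]
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            chars.append([row[start:x] for row in binary])
            start = None
    return chars


def normalize(char, size=8):
    """Nearest-neighbour resize of one character image to size x size."""
    h, w = len(char), len(char[0])
    return [[char[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]
```

A typical call chain would be `[normalize(c) for c in segment(binarize(denoise(to_gray(image))))]`, which yields one fixed-size binary image per character.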
Feature extraction and dimension reduction: features are the information used to identify characters, and each character can be distinguished from the others by its features. Feature extraction is relatively easy for digits and English letters, because there are only 10 digits and only 52 English letters (upper and lower case), both small character sets. It is difficult for Chinese characters, which form a large character set: the national standard lists 3,755 characters in the most commonly used first level alone. Chinese characters also have complex structures and many similar-looking forms. Once the kind of feature has been chosen, feature dimension reduction is carried out. A feature is represented by a vector, and its dimension is the number of components of that vector. If the dimension is too high, the efficiency of the classifier suffers greatly, so dimension reduction must lower the dimension while ensuring that the reduced feature vector still retains enough information to distinguish different characters.
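One way to illustrate the idea of a feature vector and its reduction is shown below. The patent does not say which features or which reduction method are used; this sketch assumes zone-density features and a simple variance-based component selection (principal component analysis would be a common alternative). The function names are hypothetical, and the image side is assumed to be divisible by `zones`.

```python
from statistics import pvariance


def zone_features(char, zones=4):
    """Split a square binary character image into zones x zones blocks and
    return the fraction of ink pixels in each block as the feature vector."""
    size = len(char)
    step = size // zones  # assumes size is divisible by zones
    feats = []
    for zy in range(zones):
        for zx in range(zones):
            block = [char[y][x]
                     for y in range(zy * step, (zy + 1) * step)
                     for x in range(zx * step, (zx + 1) * step)]
            feats.append(sum(block) / len(block))
    return feats


def select_components(vectors, keep):
    """Return the indices of the `keep` components with the highest variance
    across the training vectors; components that barely vary carry little
    information for telling characters apart."""
    dims = range(len(vectors[0]))
    variance = [pvariance([v[d] for v in vectors]) for d in dims]
    ranked = sorted(dims, key=lambda d: variance[d], reverse=True)
    return sorted(ranked[:keep])


def project(vector, indices):
    """Keep only the selected components of a feature vector."""
    return [vector[i] for i in indices]
```

With an 8x8 character image and `zones=4`, the 64 pixels become a 16-component vector, and `project` then shortens it to whichever components `select_components` found most informative on the training set.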
Classifier design, training and actual recognition: the classifier performs the recognition. After feature extraction and dimension reduction, the features of a character image are passed to the classifier, which classifies them and determines which character the features should be recognized as. Before actual recognition the classifier must be trained; this is a case of supervised learning.
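The train-then-recognize cycle can be illustrated with a very small supervised classifier. The patent does not specify the classifier type; a nearest-centroid classifier is assumed here purely because it is short enough to show in full, and the class name is hypothetical.

```python
from collections import defaultdict
from math import dist


class NearestCentroidClassifier:
    """Supervised classifier: one mean feature vector per character label."""

    def __init__(self):
        self.centroids = {}

    def fit(self, vectors, labels):
        """Training: average the labelled feature vectors of each character."""
        grouped = defaultdict(list)
        for vec, label in zip(vectors, labels):
            grouped[label].append(vec)
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, vector):
        """Recognition: return the label whose centroid is closest."""
        return min(self.centroids,
                   key=lambda label: dist(vector, self.centroids[label]))
```

Training uses feature vectors whose correct characters are already known, which is what makes the learning supervised; `predict` is then applied to the features of each unseen character image.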
OCR recognition post-processing: post-processing optimizes the classification result. The classifier occasionally makes mistakes. In Chinese character recognition, for example, the existence of similar-looking characters means a character is easily recognized as one of its look-alikes. This can be handled in post-processing, for instance by correction with a language model: if the classifier recognizes "在哪里" ("where is it") as "存哪里" ("store where"), the language model finds that "存哪里" is wrong and corrects it. An image submitted for OCR often contains a large amount of text with complex layout, varying font sizes, and so on; post-processing also formats the recognition result and outputs it according to the layout in the image.
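The look-alike correction described above can be sketched as follows. Everything here is a toy stand-in: the `SIMILAR` table of confusable characters and the `PHRASE_COUNTS` table, which plays the role of the language model, are hypothetical, and a real system would use a far larger model.

```python
# Hypothetical table of visually similar characters.
SIMILAR = {"存": ["在"], "在": ["存"], "己": ["已"], "已": ["己"]}

# Hypothetical toy "language model": how often each phrase has been seen.
PHRASE_COUNTS = {"在哪里": 120, "已完成": 80}


def score(text):
    """Score a phrase by its count; unseen phrases score zero."""
    return PHRASE_COUNTS.get(text, 0)


def candidates(text):
    """Yield the text itself, then every variant in which exactly one
    character is replaced by one of its look-alikes."""
    yield text
    for i, ch in enumerate(text):
        for alt in SIMILAR.get(ch, []):
            yield text[:i] + alt + text[i + 1:]


def correct(text):
    """Return the best-scoring candidate; on a tie the original text wins
    because it is yielded first and max() keeps the first maximum."""
    return max(candidates(text), key=score)
```

Here `correct("存哪里")` yields "在哪里", while text that has no better-scoring variant is returned unchanged.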
The uploaded photo is a photo of an instrument transformer nameplate.
The core of the invention is the use of artificial-intelligence OCR (optical character recognition) technology, which refers to the process of analyzing and recognizing an image file of text material to obtain the characters and layout information; that is, the characters in the image are recognized and returned as text.
By using this technology to accurately recognize the nameplate information of instrument transformers of different models from different manufacturers, the photographed and scanned nameplate information is converted directly into editable text and recorded in the tool, and the selected data can be exported as an Excel spreadsheet. Photographing and scanning thus solves the problems of information being difficult to check in complex working environments and of recording and checking being time-consuming.
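Exporting selected fields of the recognized records might look like the sketch below. It writes CSV, which Excel can open, as a stand-in for a native Excel file, because producing .xlsx needs a third-party library. The record keys (`manufacturer`, `model`, `ratio`, and so on) and the function name are hypothetical; the source does not list the nameplate fields.

```python
import csv
import io


def export_selected(records, selected_fields):
    """Return CSV text containing only the selected fields of each record.

    Keys not in selected_fields are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=selected_fields,
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The returned text can be saved to a `.csv` file; writing it with the `utf-8-sig` encoding is a common way to help Excel display Chinese text correctly.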
The beneficial effects of the invention are as follows. The paper record and the subsequent manual electronic entry previously made when a customer's power instrument transformer was submitted for inspection are replaced by an electronic record created at the moment of scanning. On-site acceptance inspection, power-connection operation inspection, anti-electricity-theft inspection, and later maintenance of power instrument transformers are changed to a working mode in which the worker simply takes out the mobile phone they carry and obtains the user information by scanning. Moving from slow look-up to information available at any time, and from manual work to AI, improves work efficiency and work quality, changes the working method fundamentally, and saves customers the time spent handling procedures and waiting for acceptance and power connection, so that users are served faster and better.
OCR technology reduces equipment configuration, lowers labor costs, and improves work efficiency.
Description of the drawings:
FIG. 1 is a block diagram of the process of the present invention.
Detailed Description
Embodiment 1: an auxiliary system for intelligent management of metering assets, characterized in that the operating steps are: start → login interface → enter account and password → check information → if correct, enter the system → upload photo → OCR character recognition → recognition preprocessing → feature extraction and dimension reduction → classifier design, training and actual recognition → OCR recognition post-processing → output result; if the account or password entered is incorrect, the system returns to the login interface.
Embodiment 2: an auxiliary system for intelligent management of metering instrument transformers, characterized in that the operating steps are: start → login interface → enter account and password → check information → if correct, enter the system → upload a photo of the instrument transformer information card → OCR character recognition → recognition preprocessing → feature extraction and dimension reduction → classifier design, training and actual recognition → OCR recognition post-processing → output result; if the account or password entered is incorrect, the system returns to the login interface.

Claims (6)

1. An auxiliary system for intelligent management of metering assets, characterized in that the operating steps are: start → login interface → enter account and password → check information → if correct, enter the system → upload photo → OCR character recognition → recognition preprocessing → feature extraction and dimension reduction → classifier design, training and actual recognition → OCR recognition post-processing → output result; if the account or password entered is incorrect, the system returns to the login interface.
2. The auxiliary system for intelligent management of metering assets according to claim 1, characterized in that the recognition preprocessing consists of converting the color image to grayscale and passing it through the sub-steps of noise reduction, binarization, character segmentation, and normalization; after binarization the image has only two colors, black and white, one being the image background and the other the characters to be recognized; the noise-reduction algorithm affects feature extraction; character segmentation divides the text in the image into single characters, recognition proceeding character by character; if a text line is tilted, tilt correction is usually performed as well; normalization scales each single-character image to the same size so that a uniform algorithm is applied under the same specification.
3. The auxiliary system for intelligent management of metering assets according to claim 1, characterized in that, in the feature extraction and dimension reduction, features are the information used to identify characters, and each character can be distinguished from the others by its features; feature extraction is relatively easy for digits and English letters, because there are only 10 digits and only 52 English letters, both small character sets; feature extraction is difficult for Chinese characters, which form a large character set, the national standard listing 3,755 characters in the most commonly used first level alone; Chinese characters have complex structures and many similar-looking forms; once the kind of feature has been chosen, feature dimension reduction is carried out; a feature is represented by a vector whose dimension is the number of its components; if the dimension is too high, the efficiency of the classifier suffers greatly; dimension reduction must lower the dimension while ensuring that the reduced feature vector still retains enough information to distinguish different characters.
4. The auxiliary system for intelligent management of metering assets according to claim 1, characterized in that, in the classifier design, training and actual recognition, the classifier performs the recognition: after feature extraction and dimension reduction, the features of a character image are passed to the classifier, which classifies them and determines which character the features should be recognized as; before actual recognition the classifier must be trained, this being a case of supervised learning.
5. The auxiliary system for intelligent management of metering assets according to claim 1, characterized in that, in the OCR recognition post-processing, post-processing optimizes the classification result; the classifier occasionally makes mistakes, for example in Chinese character recognition, where the existence of similar-looking characters means a character is easily recognized as one of its look-alikes; this can be handled in post-processing, for instance by correction with a language model: if the classifier recognizes "在哪里" ("where is it") as "存哪里" ("store where"), the language model finds that "存哪里" is wrong and corrects it; an image submitted for OCR contains a large amount of text with complex layout, font sizes, and so on, and post-processing formats the recognition result and outputs it according to the layout in the image.
6. The auxiliary system for intelligent management of metering assets according to claim 1, characterized in that the uploaded photo is a photo of an instrument transformer nameplate.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111465318.3A CN114187600A (en) 2021-12-03 2021-12-03 An Auxiliary System for Intelligent Management of Measuring Assets

Publications (1)

Publication Number Publication Date
CN114187600A (en) 2022-03-15

Family

ID=80542101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111465318.3A Pending CN114187600A (en) 2021-12-03 2021-12-03 An Auxiliary System for Intelligent Management of Measuring Assets

Country Status (1)

Country Link
CN (1) CN114187600A (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination