CN117115289A - Method and system for converting license OFD format files into pictures based on MapReduce

Info

Publication number
CN117115289A
Authority
CN
China
Prior art keywords
data
license
image
file
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311090890.5A
Other languages
Chinese (zh)
Inventor
徐伟进
宁方刚
迟钰沛
陈兆亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Co Ltd
Priority to CN202311090890.5A
Publication of CN117115289A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and a system for converting license OFD format files into pictures based on MapReduce, belonging to the technical field of picture processing. The technical problem to be solved is how to quickly extract the photo information of OFD files, effectively increase the usage of electronic licenses, and raise the application service level of the electronic license system. The technical scheme is as follows: acquiring data; data conversion: setting a different DPI for each license type, so that the conversion results remain clear while the disk space occupied by the data is reduced as far as possible; integrating the data conversion into MapReduce multitasking to complete the data processing, where MapReduce consists of two stages, Map and Reduce; extracting the photo item data in the license image based on a texture filter image algorithm; warehousing the data conversion results and deleting temporary files: storing the conversion results of the OFD format files in the MongoDB library, saving the index data returned by the MongoDB library, and deleting the temporary files created before and after conversion.

Description

Method and system for converting license OFD format files into pictures based on MapReduce
Technical Field
The invention relates to the technical field of picture processing, and in particular to a method and a system for converting license OFD format files into pictures based on MapReduce.
Background
At present, license images are generally stored in the license library as Base64 strings. Because Base64 data occupies a relatively large amount of space, many license systems either do not store the license photo items or store them in non-standard form, so license photo data is missing. In addition, the license image extraction function built into a license system is constrained by the system's other functions and cannot extract license photo data in bulk. Meanwhile, license images (such as the portrait on an identity card) are widely needed for identity verification, professional qualification certification, shopping website authentication, and the like.
Establishing a sound license image library helps to build an information platform and fully integrate the data resources of each department. By collecting, integrating, and analyzing population data and by integrating the information and resources of all departments, such a platform can support government service applications for government departments and also provide public services for citizens. It breaks down information silos and enables the sharing and complementing of population information.
Some electronic license systems obtain license photo information through a third-party conversion interface. This cannot meet the demand for large-batch real-time conversion, and the definition and size of the converted output cannot be controlled. Because two systems must cooperate, troubleshooting is time-consuming and laborious, and the converted output is also large.
An OFD format file can be read on a computer only with the relevant plug-in installed, and it cannot be adapted to all browsers at the same time. Compatibility on mobile phone apps is even worse, so related information such as the license photo cannot be displayed in real time. Extracting the image information of OFD format files in a timely manner and building an image file library of license format files organized by type is therefore an urgent problem.
Therefore, how to quickly extract the photo information of OFD files, effectively increase the usage of electronic licenses, and raise the application service level of the electronic license system is a technical problem that urgently needs to be solved.
Disclosure of Invention
The technical task of the invention is to provide a method and a system for converting license OFD format files into pictures based on MapReduce, so as to solve the problem of how to quickly extract the photo information of OFD files, effectively increase the usage of electronic licenses, and raise the application service level of the electronic license system.
The technical task of the invention is achieved as follows. A method for converting license OFD format files into pictures based on MapReduce is specifically as follows:
acquiring data: according to the electronic license attachment index data table, reading the electronic license OFD format files from the MongoDB library and saving them on a server, and at the same time reading the electronic license type code corresponding to each electronic license OFD format file, which provides the basic data for MapReduce to construct key-value pairs;
data conversion: setting a different DPI for each license type, so that the conversion results remain clear while the disk space occupied by the data is reduced as far as possible;
integrating the data conversion into MapReduce multitasking to complete the data processing, where MapReduce consists of two stages, Map and Reduce;
extracting the photo item data in the license image based on a texture filter image algorithm;
warehousing the data conversion results and deleting temporary files: storing the conversion results of the OFD format files in the MongoDB library, saving the index data returned by the MongoDB library, and deleting the temporary files created before and after conversion.
Preferably, integrating the data conversion into MapReduce multitasking to complete the data processing is specifically as follows:
the Map stage takes the temporarily stored files and their corresponding license type codes, splits up all the acquired license OFD format file data, and sets the number of Map tasks according to the number of license types among the actual license format files; several electronic license OFD format files of different license types are treated together as one data block, and each Map task is responsible for processing one data split containing electronic license format files of different license types; after receiving a split, the Map task by default treats each electronic license format file in the split as one record, processes the records in order, and finally outputs a number of key-value pairs, each containing the input electronic license format file and the output lightweight picture data;
the Reduce stage recombines the results of the converted license format files; each Reduce task processes the key-value pairs that bind an electronic license format file to its output lightweight picture data, and for key-value pairs with the same key the Reduce stage finally outputs only one result.
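As an illustration only, the Map and Reduce stages described above can be sketched in plain Python without Hadoop. The function names, the type codes "T01" and "T02", and the placeholder convert_to_picture are assumptions made for this sketch and do not come from the patent; a real job would run as a Hadoop MapReduce job on a cluster.

```python
from collections import defaultdict

def convert_to_picture(ofd_name, type_code):
    # Hypothetical placeholder for the real OFD-to-image conversion.
    return ofd_name.rsplit(".", 1)[0] + ".png"

def map_task(split):
    """Handle one split: each (type_code, ofd_name) record is processed in order."""
    for type_code, ofd_name in split:
        # Key = license type code, value = (input file, lightweight picture).
        yield type_code, (ofd_name, convert_to_picture(ofd_name, type_code))

def shuffle(mapped_pairs):
    """Group key-value pairs so that equal keys reach the same Reduce task."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_task(type_code, values):
    """Emit exactly one result per license type."""
    return type_code, sorted(values)

def run_job(splits):
    mapped = [pair for split in splits for pair in map_task(split)]
    return dict(reduce_task(k, v) for k, v in shuffle(mapped).items())

# Two splits, each mixing files of different license types.
splits = [
    [("T01", "a.ofd"), ("T02", "b.ofd")],
    [("T01", "c.ofd")],
]
result = run_job(splits)
```

In this sketch the license type code is the key, so all files of one type end up in a single Reduce result regardless of which split they arrived in.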
More preferably, extracting the photo item data in the license image based on the texture filter image algorithm is specifically as follows:
reading the license image data produced by the Reduce stage;
creating a texture image of the result file by using the entropyfilt function;
converting the texture image into a binary image, and using the bwareaopen function to show the textures of the different parts of the license image data; these textures include the bottom texture of the image and the mask image of the texture;
separating the image with the texture filter algorithm and extracting the picture photo item; image segmentation with a texture filter divides the image into regions according to the different texture features the image presents; texture reflects the color or internal structure of an object in the image and allows object boundaries to be detected effectively, which completes the extraction of the target element;
completing the extraction of the picture photo item data.
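The idea behind an entropy-based texture filter can be shown with a small pure-Python sketch: compute the local Shannon entropy around each pixel, then threshold it to obtain a binary mask that separates flat background from textured regions. This is not the entropyfilt or bwareaopen implementation named above; the window radius, the threshold of 1.5 bits, and the toy image are assumptions chosen for illustration.

```python
import math
from collections import Counter

def local_entropy(img, radius=1):
    """Shannon entropy (in bits) of the gray levels around each pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighborhood is clipped at the image border.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            n = len(window)
            out[y][x] = -sum(
                c / n * math.log2(c / n) for c in Counter(window).values()
            )
    return out

def binarize(texture, threshold):
    """Turn the texture image into a 0/1 mask."""
    return [[1 if v > threshold else 0 for v in row] for row in texture]

# Toy gray image: flat background on the left, a "busy" region on the right.
image = [
    [10, 10, 10, 50, 200, 90],
    [10, 10, 10, 120, 30, 250],
    [10, 10, 10, 70, 160, 15],
    [10, 10, 10, 220, 45, 130],
]
texture = local_entropy(image)
mask = binarize(texture, threshold=1.5)
```

Pixels inside the flat area get zero entropy and fall outside the mask, while pixels in the varied area get high entropy and fall inside it; a production pipeline would additionally remove small connected components from the mask.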
More preferably, the data conversion is specifically as follows:
reading the license OFD format file stream by using the OFDReader class;
initializing the data conversion class by using the Ofd Img class;
traversing the OFD format files of each license type, converting each license OFD file into an image file by using the Ofd Img class, and setting a different DPI according to the license base image, which preserves definition, reduces disk space, and also improves the data conversion speed;
constructing the picture output class ImageIO, saving the output image file, and closing the related file streams.
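The effect of a per-type DPI setting on output size can be sketched as follows. The type codes, the DPI values, and the default are hypothetical; the patent does not list concrete values. The only fixed fact used is that one inch equals 25.4 mm, so a page dimension in millimetres maps to pixels as mm / 25.4 * DPI.

```python
# Hypothetical DPI table keyed by license type code (values are assumptions).
DPI_BY_LICENSE_TYPE = {"ID_CARD": 200, "BUSINESS_LICENSE": 96}
DEFAULT_DPI = 150  # assumed fallback for types without an explicit setting

def dpi_for(type_code):
    """Look up the DPI configured for a license type."""
    return DPI_BY_LICENSE_TYPE.get(type_code, DEFAULT_DPI)

def pixel_size(width_mm, height_mm, dpi):
    """Pixel dimensions of a page rendered at the given DPI."""
    return (round(width_mm / 25.4 * dpi), round(height_mm / 25.4 * dpi))

# An A4 page (210 mm x 297 mm) rendered for two different license types.
a4_low = pixel_size(210, 297, dpi_for("BUSINESS_LICENSE"))
a4_high = pixel_size(210, 297, dpi_for("ID_CARD"))
```

Roughly doubling the DPI roughly quadruples the pixel count, which is why choosing the lowest DPI that still looks clear for each license type saves disk space.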
A system for converting license OFD format files into pictures based on MapReduce comprises a data input module, a data processing module, and a data storage and evaluation module;
the data input module is used for parsing the electronic license OFD format files in MongoDB, completing conversion and image recognition, and then warehousing the data of the different license types;
the data processing module is used for converting the stored license format files into lightweight pictures, processing them in parallel with Hadoop MapReduce, and then recognizing the license image with an image processing algorithm to complete the extraction of the license picture photo item;
the data storage and evaluation module is used for storing the conversion results of the OFD format files in the MongoDB library, saving the index data returned by the MongoDB library, and deleting the temporary files created before and after conversion; then, for each license type of the generated result files, sampling inspection is used to check whether the data size and content conversion are normal and whether the pictures of the converted format files are clear.
Preferably, the working process of the data input module is as follows:
(1) Reading the license type code of the license type whose image data is to be extracted;
(2) Acquiring all license identifiers of that license type according to the license type code;
(3) Acquiring, according to each license identifier, the index information under which the license attachment is stored in MongoDB;
(4) Reading the file stream of the license format file data through the index field in the table structure;
(5) Converting the file stream into a license OFD format file and saving it on the server.
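The five steps above amount to a chain of lookups followed by a file write. The sketch below mimics that chain with in-memory dictionaries standing in for the index table and the MongoDB library; every name and value in it (LICENSES_BY_TYPE, ATTACHMENT_INDEX, FILE_STORE, the identifiers) is hypothetical, and no real database client is used.

```python
import os
import tempfile

# Hypothetical stand-ins for the database tables and the file store.
LICENSES_BY_TYPE = {"T01": ["lic-1", "lic-2"]}           # type code -> license ids
ATTACHMENT_INDEX = {"lic-1": "idx-9", "lic-2": "idx-7"}  # license id -> store index
FILE_STORE = {"idx-9": b"OFD-1", "idx-7": b"OFD-2"}      # store index -> file bytes

def fetch_ofd_files(type_code, out_dir):
    """Save every OFD file of one license type to out_dir and return the paths."""
    saved = []
    for lic_id in LICENSES_BY_TYPE.get(type_code, []):   # step (2)
        index = ATTACHMENT_INDEX[lic_id]                 # step (3)
        data = FILE_STORE[index]                         # step (4)
        path = os.path.join(out_dir, f"{type_code}_{lic_id}.ofd")
        with open(path, "wb") as fh:                     # step (5)
            fh.write(data)
        saved.append(path)
    return saved

out_dir = tempfile.mkdtemp()
saved = fetch_ofd_files("T01", out_dir)                  # step (1): chosen type code
```

Embedding the type code in the saved file name is one simple way to keep it available later as the MapReduce key.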
Preferably, the data processing module comprises a data conversion sub-module, a Hadoop MapReduce parallel processing sub-module, and a data extraction sub-module;
the data conversion sub-module is used for converting the stored license format files into lightweight pictures without distortion, which makes them easy to read and reduces the storage space they occupy; the working process of the data conversion sub-module is specifically as follows:
(1) Reading the license OFD format file stream by using the OFDReader class;
(2) Initializing the data conversion class by using the Ofd Img class;
(3) Traversing the OFD format files of each license type, converting each license OFD file into an image file by using the Ofd Img class, and setting a different DPI according to the license base image, which preserves definition, reduces disk space, and also improves the data conversion speed;
(4) Constructing the picture output class ImageIO, saving the output image file, and closing the related file streams;
the Hadoop MapReduce parallel processing sub-module is used for extracting image data quickly in a multi-task, multi-process manner;
the data extraction sub-module is used for recognizing the generated license picture with an image processing algorithm and then extracting the license image photo item.
Preferably, MapReduce in the Hadoop MapReduce parallel processing sub-module is divided into two stages, Map and Reduce;
the Map stage splits up the data task and comprises several Map tasks; its input is a number of data blocks (splits), and several electronic license format files can together form one data block; each Map task is responsible for processing one split containing electronic license format files of several license types; after receiving a split, the Map task by default treats each electronic license format file in the split as one record, processes the records in order, and finally outputs a number of key-value pairs, each containing the input electronic license format file and the output lightweight picture data;
the Reduce stage recombines the results and re-outputs the data as required by whoever configures the job; there may be one or more Reduce tasks, and in some special scenarios this stage can be omitted, depending entirely on project requirements; each Reduce task processes the key-value pairs that bind an electronic license format file to its output lightweight picture data, and for key-value pairs with the same key the Reduce stage finally outputs only one result; the data generated by the Map tasks passes through a buffer, where it is partitioned, sorted, and merged, and the results are written to the Hadoop Distributed File System (HDFS) for storage;
because each Map task splits OFD electronic license format file data of several license types into records, it outputs key-value pairs that pair OFD electronic license format file data of different license types with the corresponding lightweight electronic license pictures; the key-value pairs are assigned to electronic license types by key: pairs with the same key belong to the same electronic license format file type, and pairs with different keys belong to different types; since a finished Map task can produce key-value pairs for several electronic license types, the pairs are routed through partitions to the Reduce tasks: each electronic license format file type corresponds to one partition and each partition to one Reduce task, so different partitions hold different electronic license types, each Reduce task handles a different license type with its own license type code, and data of different license types enters different Reduce tasks; this guarantees that key-value pairs with the same key are sent to the same Reduce task, which completes the data processing;
the number of Map tasks is determined by the number of input splits of OFD electronic license format file data, and the number of Reduce tasks is specified according to the number of input license types; a split is a set of OFD electronic license format data files of several different license types.
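The one-type, one-partition, one-Reduce-task rule can be written as a tiny partition function. This is a plain-Python sketch of the routing rule only, not Hadoop's Partitioner API; build_partitioner and the sample type codes are hypothetical names for illustration.

```python
def build_partitioner(type_codes):
    """Assign each distinct license type code its own partition number.

    Returns the partition function and the number of Reduce tasks needed.
    """
    index = {code: i for i, code in enumerate(sorted(set(type_codes)))}

    def partition(key):
        # Equal keys always map to the same partition, hence the same Reduce task.
        return index[key]

    return partition, len(index)

# Type codes seen across all input records (duplicates are expected).
partition, num_reduce_tasks = build_partitioner(["T02", "T01", "T02", "T03"])
```

Unlike a hash-based partitioner, where two types could share one Reduce task, an explicit table gives exactly one Reduce task per license type, matching the rule that the Reduce task count follows the number of license types.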
Preferably, the image recognition extraction sub-module performs image recognition by classifying and extracting key features and excluding redundant information, as follows:
information acquisition: converting information such as light or sound into electrical signals through a sensor, that is, acquiring the basic information of the object under study and converting it into a form the machine can recognize;
preprocessing: applying transformations such as denoising and smoothing to the image to strengthen its key features;
feature extraction and selection: in pattern recognition, features must be extracted and then selected; the images under study vary widely, and to tell them apart they are identified by their own features, and obtaining those features is feature extraction; not every extracted feature is useful for the recognition at hand, and picking out the useful ones is feature selection; feature extraction and selection is one of the most critical techniques in image recognition; here the photo item on the license is extracted by means of image texture; a texture is formed by many elements that lie close together and have similar attributes; in image analysis, the spatial distribution of pixel gray levels within a region is called texture, and it reflects the color of the image surface and the variation of gray values; some texture features are also directional, so the gray distribution or the direction information of an image can likewise be called texture; in theory, textures are either artificial or natural: symbols on a natural background, such as points, lines, surfaces, letters, and numbers, are artificial textures, whereas natural textures are scenes repeated with some regularity, such as roof tiles, grassland, and forest; image texture feature extraction is the process of extracting texture feature parameters through image processing in order to obtain a quantitative or qualitative description of the texture, and is specifically as follows:
(1) Reading the license image data produced by the Reduce stage;
(2) Creating a texture image of the result file by using the entropyfilt function;
(3) Converting the texture image into a binary image, and using the bwareaopen function to show the textures of the different parts of the license image data; these textures include the bottom texture of the image and the mask image of the texture;
(4) Separating the image with the texture filter algorithm and extracting the picture photo item; image segmentation with a texture filter divides the image into regions according to the different texture features the image presents; texture reflects the color or internal structure of an object in the image and allows object boundaries to be detected effectively, which completes the extraction of the target element;
(5) Completing the extraction of the picture photo item data.
More preferably, the data storage and evaluation module comprises a data storage sub-module and a data evaluation sub-module;
the working process of the data storage sub-module is specifically as follows:
(1) Obtaining the result data of the picture photo item extraction;
(2) Acquiring the MongoDB configuration information of the picture library;
(3) Storing the result data in MongoDB in sequence, and recording the index of each item in the MongoDB picture library;
(4) Saving the indexes in the database, which makes the data easy to query;
(5) Deleting the generated temporary files to release space; the temporary files comprise the source license format files and the result files generated by the Reduce stage;
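The store, record index, delete sequence can be sketched with the standard library alone. InMemoryStore below is a hypothetical stand-in for the picture library and is not a MongoDB client; the file names and the index table layout are likewise assumptions for illustration.

```python
import os
import tempfile
import uuid

class InMemoryStore:
    """Stand-in for the picture library; NOT a MongoDB client."""

    def __init__(self):
        self.docs = {}

    def save(self, data):
        key = uuid.uuid4().hex  # plays the role of the index returned by the store
        self.docs[key] = data
        return key

def store_and_cleanup(result_paths, store, index_table):
    """Store each result file, record its index, then delete the temporary file."""
    for path in result_paths:
        with open(path, "rb") as fh:
            index_table[os.path.basename(path)] = store.save(fh.read())
        os.remove(path)  # release disk space once the data is safely stored

# Create two temporary result files to run the sketch end to end.
tmp_dir = tempfile.mkdtemp()
paths = []
for name in ("a.png", "b.png"):
    p = os.path.join(tmp_dir, name)
    with open(p, "wb") as fh:
        fh.write(name.encode())
    paths.append(p)

store, index_table = InMemoryStore(), {}
store_and_cleanup(paths, store, index_table)
os.rmdir(tmp_dir)  # succeeds only because all temporary files were removed
```

Deleting each file only after its index has been recorded means a failure partway through leaves the remaining files on disk for a retry.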
the data evaluation sub-module evaluates the data conversion results by pass rate, conversion efficiency, and response speed; the pass rate P (Percent of Pass) is an accuracy index for the picture photo items extracted from the OFD format file data, and is the ratio of qualified samples (proper data size, normal content conversion, clear picture in the converted format file, and so on) to the total number of sampled items:
P=TP/(TP+FP);
where TP is the number of qualified samples and FP is the number of unqualified samples; if P is low, the data size, the content conversion, and the definition of the extracted photo are checked, and the corresponding logic is adjusted for each problem found to raise the pass rate;
the conversion efficiency E (Efficiency) is an index of the speed at which picture photo items are extracted from OFD format file data, namely the ratio of the time this method takes to complete data conversion to the time the electronic license system's usual method takes to convert the electronic license format file data:
E=T1/T2;
where T1 is the time this method takes to complete data conversion and T2 is the conversion time of the electronic license system's usual method; the smaller the ratio of T1 to T2, the faster the conversion;
the response speed S (Speed of Response) is measured on the converted data by testing the interface response time with Postman; a faster response means the accessing system or app responds faster:
S=T3/T4;
where T3 is the time the interface takes to obtain the picture photo item data after conversion has been completed, and T4 is the access time of the electronic license system's real-time conversion interface; the smaller the ratio of T3 to T4, the more significant the benefit of conversion.
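The three indices are simple ratios, and a short sketch makes their direction explicit: P should be high, while E and S should be low. The function names and the sample figures below are illustrative assumptions, not measurements from the patent.

```python
def pass_rate(tp, fp):
    """P = TP / (TP + FP): share of sampled results that are qualified."""
    return tp / (tp + fp)

def conversion_efficiency(t1, t2):
    """E = T1 / T2: conversion time of this method relative to the usual method."""
    return t1 / t2

def response_ratio(t3, t4):
    """S = T3 / T4: interface time on converted data relative to real-time conversion."""
    return t3 / t4

# Made-up sample figures for illustration only.
p = pass_rate(tp=95, fp=5)                     # 95 of 100 samples qualified
e = conversion_efficiency(t1=30.0, t2=120.0)   # seconds
s = response_ratio(t3=0.2, t4=0.8)             # seconds
```

With these sample numbers, P is 0.95 and both E and S are 0.25, meaning the new path takes a quarter of the baseline time in each case.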
The method and system for converting license OFD format files into pictures based on MapReduce have the following advantages:
(1) The invention supports customizing the output definition of electronic license format files; the user can adjust the DPI value dynamically according to the output of each license type, which keeps the converted pictures clear while reducing storage space and cost as far as possible;
(2) The invention improves the response speed of other applications or apps and widens the range of use of electronic licenses; license photo items stored as pictures are smaller, and because conversion time is saved, interfaces respond much faster after adaptation, which improves the user experience;
(3) The invention can quickly extract license picture photo items from tens of millions or even hundreds of millions of records; for large numbers of electronic licenses, the parallel processing capability of MapReduce completes the data conversion tasks quickly;
(4) The texture filter image algorithm is simple to implement, fast, and extracts well;
(5) The photo information of OFD files is extracted in a short time, which effectively increases the usage of electronic licenses and in turn raises the application service level of the electronic license system;
(6) The method for quickly extracting image elements from license OFD format files is built on Hadoop MapReduce; data is processed across multiple servers, so target images are recognized and extracted quickly; the input is a set of license data of different license types from which images are to be extracted, and the target images at the output are stored by license type; because MapReduce provides a powerful and fast data processing mechanism, the images on license attachments can be extracted quickly provided server resources are allocated reasonably; Hadoop is a distributed system infrastructure developed by Apache, on which users can develop distributed programs and use a cluster for high-speed computation and storage; Hadoop implements a distributed file system, HDFS (Hadoop Distributed File System), which is highly fault-tolerant, can be deployed on inexpensive hardware, and provides high-throughput access to application data; the core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it;
(7) The MapReduce framework is used to convert license OFD format files in multiple parallel tasks, with settings customizable per license type; OFD format files are quickly converted into lightweight picture files, and the picture photo item data in the license image is extracted with the texture filter image algorithm;
(8) The invention establishes a mechanism for multi-task parallel conversion based on Hadoop MapReduce and achieves fast extraction of the license picture photo item data;
(9) The invention allows the conversion definition parameters to be customized per license type; the resulting picture photo data can be used for license data sharing and application, and the extraction of the target image is simple to implement, fast, and effective.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of the method for converting license OFD format files into pictures based on MapReduce;
FIG. 2 is a flow chart of extracting picture photo items using texture filters;
FIG. 3 is a schematic diagram of the data flow of a MapReduce job;
FIG. 4 is a schematic flow chart of data storage and data evaluation.
Detailed Description
The method and system for converting license OFD format files into pictures based on MapReduce are described in detail below with reference to the accompanying drawings and specific embodiments of the invention.
Example 1:
As shown in FIG. 1, this embodiment provides a method for converting license OFD format files into pictures based on MapReduce, which specifically includes the following steps:
S1, acquiring data: according to the electronic license attachment index data table, reading the electronic license OFD format files from the MongoDB library and saving them on a server, and at the same time reading the electronic license type code corresponding to each electronic license OFD format file, which provides the basic data for MapReduce to construct key-value pairs;
S2, data conversion: setting a different DPI for each license type, so that the conversion results remain clear while the disk space occupied by the data is reduced as far as possible;
S3, integrating the data conversion into MapReduce multitasking to complete the data processing, where MapReduce consists of two stages, Map and Reduce;
S4, extracting the photo item data in the license image based on a texture filter image algorithm;
S5, warehousing the data conversion results and deleting temporary files: storing the conversion results of the OFD format files in the MongoDB library, saving the index data returned by the MongoDB library, and deleting the temporary files created before and after conversion.
As shown in FIG. 3, the data conversion is integrated into MapReduce multitask processing, and the data processing is completed as follows:
In the Map stage, Map tasks are run on the temporarily stored files and their corresponding license type codes. All acquired license OFD format file data are split up, and the number of Map tasks is set according to the number of license types among the actual license format files. Several electronic license OFD format files of different license types are taken together as one data block, and each Map task is responsible for processing one data split containing electronic license format files of different license types. After receiving a split, the Map task by default treats each electronic license format data file in the split as one record, processes the records in sequence, and finally outputs a number of key-value pairs, each containing the input electronic license format file and the output lightweight picture data;
In the Reduce stage, the converted license format file results are recombined. Each Reduce task processes the key-value pairs that bind an electronic license format file to its output lightweight picture data, and for key-value pairs of the same type the Reduce stage finally outputs only one result.
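The Map and Reduce stages described above can be illustrated with a minimal single-process sketch. It is not Hadoop code: the type codes, file names, and the `convert_to_picture` helper are hypothetical stand-ins, and the shuffle step is simulated with a dictionary.

```python
from collections import defaultdict

def convert_to_picture(ofd_name):
    # Hypothetical stand-in: a real job would render the OFD file to an image.
    return ofd_name.rsplit(".", 1)[0] + ".png"

def map_task(split):
    """Process each (type_code, ofd_name) record of one split in sequence."""
    for type_code, ofd_name in split:
        # Key: license type code. Value: input file bound to its output picture.
        yield type_code, (ofd_name, convert_to_picture(ofd_name))

def shuffle(mapped_pairs):
    """Group map output by key so that each license type reaches one reducer."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_task(type_code, values):
    """Emit a single combined result for one license type."""
    return type_code, sorted(values)

# Two splits, each mixing license types (all names are made up).
splits = [
    [("T01", "a.ofd"), ("T02", "b.ofd")],
    [("T01", "c.ofd")],
]
mapped = [pair for split in splits for pair in map_task(split)]
results = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
```

Each split yields one key-value pair per record, and every license type ends up with exactly one entry in `results`, which mirrors the "one result per key type" behavior of the Reduce stage.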
As shown in FIG. 2, in this embodiment the license face item data is extracted from the license image based on the texture filter image algorithm as follows:
(1) Reading the license image data produced by the Reduce stage;
(2) Creating a texture image of the result file using the entropyfilt function;
(3) Converting the texture image into a grayscale binary image, and using the bwareaopen function to display the textures of the different parts of the license image data; these comprise the background texture of the image and the mask image of the texture;
(4) Separating the image using the texture filter algorithm and extracting the license face items; texture filter image segmentation divides the image into regions according to the different texture features it presents; texture reflects the color or internal composition of any object in the image and can effectively detect object boundaries, which makes extraction of the target elements possible;
(5) Completing the extraction of the license face item data.
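The texture-filter steps above can be sketched in pure Python on a tiny grayscale image. This is only an illustration under stated assumptions: `local_entropy` and `remove_small_regions` are simplified stand-ins for MATLAB's entropyfilt and bwareaopen (a 3x3 window truncated at the borders instead of the MATLAB defaults), and the threshold and test image are made up.

```python
import math
from collections import Counter, deque

def local_entropy(img, radius=1):
    """Shannon entropy of the gray levels in each pixel's neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            n = len(window)
            out[y][x] = -sum(c / n * math.log2(c / n)
                             for c in Counter(window).values())
    return out

def binarize(texture, threshold):
    """Turn the texture image into a binary image."""
    return [[value > threshold for value in row] for row in texture]

def remove_small_regions(mask, min_size):
    """Drop 8-connected foreground regions with fewer than min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    keep = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            region, queue = [], deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) >= min_size:
                for ry, rx in region:
                    keep[ry][rx] = True
    return keep

# Made-up 6x6 image: flat area on the left, checkerboard texture on the right.
image = [[200] * 3 + [255 * ((x + y) % 2) for x in range(3)] for y in range(6)]
texture = local_entropy(image)
mask = remove_small_regions(binarize(texture, 0.5), min_size=4)
# Keep only the pixels inside the textured region.
extracted = [[p if m else 0 for p, m in zip(prow, mrow)]
             for prow, mrow in zip(image, mask)]
```

Whether the high-entropy region or its complement holds the face items depends on the license layout, so a real pipeline would choose the mask polarity per license type.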
The data conversion in this embodiment is specifically as follows:
(1) Reading the license OFD format file stream by using OFDReader class;
(2) Initializing the data conversion class using the Ofd Img class;
(3) Traversing the OFD format files of each license type, converting each license OFD file into an image file using the Ofd Img class, and setting a different DPI according to each license's base image, which ensures sharpness, reduces disk space, and also improves the data conversion speed;
(4) Using the picture output class ImageIO to save the output image file, and closing the related file streams.
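The effect of choosing a DPI per license type can be sketched as follows. The type codes, DPI values, and default are hypothetical, and the sketch assumes page dimensions are given in millimetres; it only computes the pixel size of the rendered picture and does not call any OFD library.

```python
# Hypothetical mapping from license type code to rendering DPI.
DPI_BY_TYPE = {"T01": 150, "T02": 300}
DEFAULT_DPI = 200  # assumed fallback for types without an explicit setting

MM_PER_INCH = 25.4

def dpi_for(type_code):
    """Look up the DPI configured for a license type."""
    return DPI_BY_TYPE.get(type_code, DEFAULT_DPI)

def pixel_size(width_mm, height_mm, dpi):
    """Pixel dimensions of a page rendered at the given DPI."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))

# An A4-sized license page (210 mm x 297 mm) at two different settings.
low = pixel_size(210, 297, dpi_for("T01"))
high = pixel_size(210, 297, dpi_for("T02"))
```

Doubling the DPI roughly quadruples the pixel count, which is why a lower DPI for simple base images saves disk space while a higher one preserves sharpness for detailed ones.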
Example 2:
This embodiment provides a system for converting license OFD format files into pictures based on MapReduce, comprising a data input module, a data processing module, and a data storage and evaluation module;
the data input module is used for parsing the electronic license OFD format files in MongoDB, completing conversion and image recognition, and then warehousing the data of the different license types;
the data processing module is used for converting the stored license format files into lightweight pictures, processing them in parallel with Hadoop MapReduce, and then recognizing the license images with an image processing algorithm to extract the license face items;
the data storage and evaluation module is used for storing the converted OFD format file into the MongoDB database, saving the index data returned by MongoDB, and deleting the temporary files created before and after conversion; it then samples the generated result files for each license type to check whether the data size and content conversion are normal and whether the pictures of the converted format files are clear.
The data input module in this embodiment: an electronic license OFD format file input module is built according to the types of electronic licenses. A provincial system can hold hundreds of millions of electronic licenses, and a municipal electronic license system holds millions of electronic license OFD files, whose sizes differ by license type. Because resources are limited, an electronic license issuing department cannot request very large resources to store the license format files, so many electronic license systems store the data in MongoDB or on a network disk. MongoDB uses sharding to achieve scalability, high-performance real-time insertion, dynamic storage, and the like.
The working process of the data input module in this embodiment is specifically as follows:
(1) Reading a license type code of the license type of the image data to be extracted;
(2) Acquiring all license identifiers of the type of license according to the license type code;
(3) Acquiring, according to the license identifier, the index information under which the license attachment is stored in MongoDB;
(4) Reading the license format file data stream through the index field in the table structure;
(5) Converting the file stream into a license OFD format file and storing it on the server.
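The lookup chain in these five steps (type code, then license identifiers, then attachment index, then file stream, then a file on the server) can be sketched with in-memory stand-ins. The three dictionaries below are hypothetical substitutes for the license table, the attachment index table, and the MongoDB file store; no database driver is used.

```python
import tempfile
from pathlib import Path

# Hypothetical in-memory stand-ins for the real tables and the file store.
LICENSES_BY_TYPE = {"T01": ["lic-1", "lic-2"]}
ATTACHMENT_INDEX = {"lic-1": "idx-a", "lic-2": "idx-b"}
FILE_STORE = {"idx-a": b"ofd-bytes-1", "idx-b": b"ofd-bytes-2"}

def fetch_ofd_files(type_code, out_dir):
    """Save every OFD attachment of one license type under out_dir."""
    saved = []
    for license_id in LICENSES_BY_TYPE.get(type_code, []):
        index_key = ATTACHMENT_INDEX[license_id]      # step (3)
        data = FILE_STORE[index_key]                  # step (4)
        path = Path(out_dir) / f"{license_id}.ofd"    # step (5)
        path.write_bytes(data)
        saved.append(path)
    return saved

with tempfile.TemporaryDirectory() as tmp:
    names = [p.name for p in fetch_ofd_files("T01", tmp)]
```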
The data processing module in the embodiment comprises a data conversion sub-module, a Hadoop MapReduce parallel processing sub-module and a data extraction sub-module;
the data conversion sub-module is used for converting the stored license format files into lightweight pictures without distortion, which makes them easy to read and reduces the storage space they occupy; the working process of the data conversion sub-module is specifically as follows:
(1) Reading the license OFD format file stream by using OFDReader class;
(2) Initializing the data conversion class using the Ofd Img class;
(3) Traversing the OFD format files of each license type, converting each license OFD file into an image file using the Ofd Img class, and setting a different DPI according to each license's base image, which ensures sharpness, reduces disk space, and also improves the data conversion speed;
(4) Using the picture output class ImageIO to save the output image file, and closing the related file streams;
regarding the Hadoop MapReduce parallel processing submodule: the amount of provincial electronic license data currently stored can reach hundreds of millions of records, and even a municipal-level project holds more than 5 million; to extract the image data quickly, a powerful data processing tool must be chosen that completes the extraction in a multitask, multiprocess manner;
The data extraction sub-module is used for recognizing the generated license pictures with an image processing algorithm and then extracting the license face items from the license images.
To convert OFD files into license picture files quickly, the Hadoop MapReduce parallel processing submodule adopts MapReduce (MR for short), a distributed computing software framework used mainly for parallel computation over large data sets (larger than 1 TB). The main ideas of the framework are the concepts Map and Reduce, both borrowed from functional programming languages, with some features borrowed from vector programming languages. It lets programmers run their own programs on a distributed system without having to master distributed parallel programming. In essence, a Map function is specified that maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function is specified that ensures all mapped key-value pairs sharing the same key are grouped together. In addition, YARN is a general resource management system responsible for resource management and scheduling, and it provides basic support for upper-layer applications. Three components, the ResourceManager (RM), the NodeManager (NM), and the ApplicationMaster (AM), make up the resource management system. The main functions of YARN are cluster resource management and job scheduling/monitoring: the RM manages and schedules the resources of the whole cluster, the NM manages the resources of a single node, and the AM monitors tasks and schedules jobs; the three are closely related and inseparable.
In summary, MR is an application running on YARN. Starting a MapReduce job also starts the MR ApplicationMaster process; with resource allocation coordinated by the ResourceManager, each subtask is started, monitored, and returns its execution state to the ResourceManager in real time. Within a container, MR executes only a map task or a reduce task; the map tasks are executed first and the reduce tasks afterwards, and this order cannot be reversed.
The MapReduce in the Hadoop MapReduce parallel processing submodule is divided into two stages in total, namely Map and Reduce;
the Map stage splits up the data task and comprises several Map tasks. The Map stage takes several data blocks (splits) as input, and several electronic license format files can be taken together as one data block; each Map task is responsible for processing one data split containing electronic license format files of several license types. After receiving a split, the Map task by default treats each electronic license format data file in the split as one record, processes the records in sequence, and finally outputs a number of key-value pairs, each containing the input electronic license format file and the output lightweight picture data;
The Reduce stage recombines the results and re-outputs the data according to the configured requirements. There may be one or more Reduce tasks, and in some special scenarios the stage can be omitted; this is decided entirely by project requirements. Each Reduce task processes the key-value pairs that bind an electronic license format file to its output lightweight picture data, and for key-value pairs of the same type the Reduce stage finally outputs only one result. The data generated by the Map tasks are partitioned, sorted, and merged in the buffer, and the result is output to the Hadoop Distributed File System (HDFS) for storage. The MR job flow is shown in FIG. 3: the Map stage uses three data splits, each formed from several input OFD electronic license format files; each Map subtask converts the OFD electronic license format files in its own partition into lightweight picture data; after the partitioning, sorting, and merging of the buffer stage, the three Reduce tasks each handle the data of their own partition, and part1, part2, and part3 are the final output.
The Map task divides the OFD format files, which cover several license types, into records. Each Map task can therefore output key-value pairs that pair OFD electronic license format file data of several different license types with the corresponding lightweight electronic license pictures. The key-value pairs are assigned to electronic license types according to whether their keys are the same: pairs with the same key belong to the same electronic license format file type, and pairs with different keys belong to different types. Because a finished Map task can produce key-value pairs of several electronic license types, the pairs are routed through partitions to the matching Reduce tasks: one type of electronic license format file corresponds to one partition, and one partition corresponds to one Reduce task. The electronic license types of different partitions differ, so each Reduce task corresponds to a different license type and license type code, and data of different license types enter different Reduce tasks. This guarantees that key-value pairs of the same type are sent to the same Reduce task, which completes the data processing;
The number of Map tasks is determined by the number of input OFD electronic license format file data splits, and the number of Reduce tasks is specified according to the number of input license types; a split is a set of OFD electronic license format data files of several different license types.
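The one-type-per-partition routing can be sketched as a small partitioner. This is an illustrative Python model, not Hadoop's Partitioner interface, and the type codes are hypothetical. Hadoop's default hash partitioning could place two license types in the same partition, so an explicit mapping is assumed here to keep one license type per Reduce task.

```python
def build_partitioner(type_codes):
    """Assign each distinct license type code its own reducer index."""
    index = {code: i for i, code in enumerate(sorted(set(type_codes)))}

    def partition(type_code):
        # The same key always maps to the same Reduce task.
        return index[type_code]

    # The number of Reduce tasks equals the number of distinct license types.
    return partition, len(index)

partition, num_reducers = build_partitioner(["T02", "T01", "T02", "T03"])
assignments = [partition(code) for code in ["T01", "T02", "T03", "T02"]]
```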
The image recognition extraction submodule in this embodiment performs image recognition by classifying and extracting key features and excluding redundant information; the steps are as follows:
information acquisition: converting light or sound information into electrical information through a sensor, that is, acquiring the basic information of the object under study and converting it into information a machine can recognize;
preprocessing: denoising, smoothing, and transforming the image to strengthen its key features;
feature extraction and selection: in pattern recognition, features must be extracted and selected. Put simply, the images under study are of many kinds, and to distinguish them by some method they must be identified by their own features; the process of obtaining these features is feature extraction. Not all features obtained during extraction are useful for the recognition at hand, so the useful ones are picked out, which is feature selection. Feature extraction and selection is one of the most critical techniques in the image recognition process, so this step is the focus of image recognition. Here the face items on the license are extracted using image texture. A texture is formed by many elements that lie close to each other and have similar attributes; in image analysis, the spatial distribution of pixel gray levels within a region is called texture. It reflects the color of the image surface and the variation of gray values, and some texture features are also directional, so both the gray-level distribution and the directional information of an image can be called texture. In theory, textures are divided into artificial and natural textures: symbols on a natural background, such as points, lines, surfaces, letters, and numbers, are artificial textures, whereas natural textures are repeatedly arranged scenes with some regularity, such as roof tiles, grassland, and forest. Image texture feature extraction is a processing procedure that extracts texture feature parameters through image processing technology to obtain a quantitative or qualitative description of the texture. As shown in FIG. 2, the image texture extraction is specifically as follows:
(1) Reading the license image data produced by the Reduce stage;
(2) Creating a texture image of the result file using the entropyfilt function;
(3) Converting the texture image into a grayscale binary image, and using the bwareaopen function to display the textures of the different parts of the license image data; these comprise the background texture of the image and the mask image of the texture;
(4) Separating the image using the texture filter algorithm and extracting the license face items; texture filter image segmentation divides the image into regions according to the different texture features it presents; texture reflects the color or internal composition of any object in the image and can effectively detect object boundaries, which makes extraction of the target elements possible;
(5) Completing the extraction of the license face item data.
The data storage and evaluation module in the embodiment comprises a data storage sub-module and a data evaluation sub-module;
the working process of the data storage sub-module is specifically as follows:
(1) Obtaining the result data extracted from the license face item images;
(2) Acquiring the MongoDB configuration information for the license face item pictures;
(3) Storing the result data into MongoDB in sequence, and recording the index of each item in the MongoDB picture library;
(4) Storing the indexes in the database to facilitate later queries of the data;
(5) Deleting the generated temporary files to release space; the temporary files comprise the source license format files and the result files generated by the Reduce stage;
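The store, index, and clean-up sequence above can be sketched as follows. The `picture_store` and `index_table` dictionaries are hypothetical stand-ins for the MongoDB picture library and the index table, and the generated key stands in for the identifier MongoDB would return.

```python
import os
import tempfile
import uuid

def store_results(result_paths, picture_store, index_table):
    """Store each result file, record its index, then delete the temp file."""
    for path in result_paths:
        with open(path, "rb") as fh:
            data = fh.read()
        key = uuid.uuid4().hex            # stand-in for the id MongoDB returns
        picture_store[key] = data         # step (3): store the result data
        index_table[os.path.basename(path)] = key   # step (4): keep the index
        os.remove(path)                   # step (5): release the space

picture_store, index_table = {}, {}
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = os.path.join(tmp, "lic-1.png")
    with open(tmp_path, "wb") as fh:
        fh.write(b"png-bytes")
    store_results([tmp_path], picture_store, index_table)
    still_there = os.path.exists(tmp_path)
```

Deleting the temporary file only after the index has been recorded means a failed write leaves the source file in place for a retry.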
as shown in FIG. 4, the data evaluation sub-module evaluates the data conversion results using the qualification rate, the conversion efficiency, and the response speed. The qualification rate P (Percent of Pass) is an index of the accuracy of the license face item picture results extracted from the OFD format file data. It is the ratio of qualified samples (that is, samples with an appropriate data size, normal data content conversion, a clear picture of the converted format file, and so on) to the total number of sampled items, and the formula is as follows:
P=TP/(TP+FP);
TP is the number of qualified samples and FP is the number of unqualified samples. If the P value is low, the samples are checked for problems with data size, data content conversion, and the clarity of the extracted license face item pictures, and the corresponding logic is adjusted for each problem found in order to raise the qualification rate;
the conversion efficiency E (Efficiency) is an index of the speed at which the license face item picture results are extracted from the OFD format file data, that is, the ratio of the time taken by this data conversion to the time taken by the electronic license system's usual method to convert the electronic license format file data, and the formula is as follows:
E=T1/T2;
where T1 is the time taken to complete the data conversion and T2 is the conversion time of the electronic license system's usual method; the smaller the ratio of T1 to T2, the faster the conversion;
the response speed S (Speed of Response) is measured for the converted data by testing the response speed of the interface with Postman; a faster interface means a faster response for the systems or apps that access it, and the formula is as follows:
S=T3/T4;
where T3 is the time the interface takes to obtain the license face item data after the data conversion is finished, and T4 is the access time of the electronic license system's real-time conversion interface; the smaller the ratio of T3 to T4, the more significant the improvement brought by the conversion.
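The three evaluation indexes can be computed directly from their formulas. The function names and the sample numbers below are illustrative only and are not measurements from the embodiment.

```python
def pass_rate(tp, fp):
    """P = TP / (TP + FP): share of sampled results that are qualified."""
    return tp / (tp + fp)

def conversion_efficiency(t1, t2):
    """E = T1 / T2: conversion time relative to the usual method."""
    return t1 / t2

def response_speed(t3, t4):
    """S = T3 / T4: interface time relative to real-time conversion."""
    return t3 / t4

# Made-up sample: 95 qualified and 5 unqualified items out of 100 sampled.
p = pass_rate(95, 5)
# Made-up timings in seconds.
e = conversion_efficiency(30.0, 120.0)
s = response_speed(1.0, 4.0)
```

A value of E or S below 1 means the converted pipeline is faster than the baseline it is compared against.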
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method for converting license OFD format files into pictures based on MapReduce, characterized by comprising the following steps:
acquiring data: according to the electronic license attachment index data table, reading the electronic license OFD format file from the MongoDB database and storing it on a server, and at the same time reading the electronic license type code corresponding to that OFD format file, which gives MapReduce the basic data it needs to construct key-value pairs;
data conversion: setting a different DPI for each license type, so that the conversion result stays sharp while the disk space occupied by the data is kept as small as possible;
integrating the data conversion into MapReduce multitask processing to complete the data processing; wherein MapReduce comprises two stages, Map and Reduce;
extracting the license face item data from the license image based on a texture filter image algorithm;
warehousing of data conversion results and deletion of temporary files: storing the converted OFD format file into the MongoDB database, saving the index data returned by MongoDB, and deleting the temporary files created before and after conversion.
2. The method for converting license OFD format files into pictures based on MapReduce according to claim 1, wherein the data conversion is integrated into MapReduce multitask processing and the data processing is completed specifically as follows:
Map tasks are run on the temporarily stored files and their corresponding license type codes, all acquired license OFD format file data are split up, and the number of Map tasks is set according to the number of license types among the actual license format files; several electronic license OFD format files of different license types are taken together as one data block, and each Map task is responsible for processing one data split containing electronic license format files of different license types; after receiving a split, the Map task by default treats each electronic license format data file in the split as one record, processes the records in sequence, and finally outputs a number of key-value pairs, each containing the input electronic license format file and the output lightweight picture data;
the Reduce stage recombines the converted license format file results; each Reduce task processes the key-value pairs that bind an electronic license format file to its output lightweight picture data, and for key-value pairs of the same type the Reduce stage finally outputs only one result.
3. The method for converting license OFD format files into pictures based on MapReduce according to claim 1 or 2, wherein the extraction of the license face item data from the license image based on the texture filter image algorithm is specifically as follows:
reading the license image data produced by the Reduce stage;
creating a texture image of the result file using the entropyfilt function;
converting the texture image into a grayscale binary image, and using the bwareaopen function to display the textures of the different parts of the license image data; these comprise the background texture of the image and the mask image of the texture;
separating the image using the texture filter algorithm and extracting the license face items; texture filter image segmentation divides the image into regions according to the different texture features it presents; texture reflects the color or internal composition of any object in the image and can effectively detect object boundaries, which makes extraction of the target elements possible;
completing the extraction of the license face item data.
4. The method for converting license OFD format files into pictures based on MapReduce according to claim 3, wherein the data conversion is specifically as follows:
reading the license OFD format file stream using the OFDReader class;
initializing the data conversion class using the Ofd Img class;
traversing the OFD format files of each license type, converting each license OFD file into an image file using the Ofd Img class, and setting a different DPI according to each license's base image;
using the picture output class ImageIO to save the output image file, and closing the related file streams.
5. A system for converting license OFD format files into pictures based on MapReduce, characterized by comprising a data input module, a data processing module, and a data storage and evaluation module;
the data input module is used for parsing the electronic license OFD format files in MongoDB, completing conversion and image recognition, and then warehousing the data of the different license types;
the data processing module is used for converting the stored license format files into lightweight pictures, processing them in parallel with Hadoop MapReduce, and then recognizing the license images with an image processing algorithm to extract the license face items;
the data storage and evaluation module is used for storing the converted OFD format file into the MongoDB database, saving the index data returned by MongoDB, and deleting the temporary files created before and after conversion; it then samples the generated result files for each license type to check whether the data size and content conversion are normal and whether the pictures of the converted format files are clear.
6. The system for converting license OFD format files into pictures based on MapReduce according to claim 5, wherein the working process of the data input module is specifically as follows:
(1) Reading the license type code of the license type whose image data is to be extracted;
(2) Acquiring all license identifiers of that license type according to the license type code;
(3) Acquiring, according to the license identifier, the index information under which the license attachment is stored in MongoDB;
(4) Reading the license format file data stream through the index field in the table structure;
(5) Converting the file stream into a license OFD format file and storing it on the server.
7. The system for converting license OFD format files into pictures based on MapReduce according to claim 5, wherein the data processing module comprises a data conversion sub-module, a Hadoop MapReduce parallel processing sub-module, and a data extraction sub-module;
the data conversion sub-module is used for converting the stored license format files into lightweight pictures without distortion; the working process of the data conversion sub-module is specifically as follows:
(1) Reading the license OFD format file stream using the OFDReader class;
(2) Initializing the data conversion class using the Ofd Img class;
(3) Traversing the OFD format files of each license type, converting each license OFD file into an image file using the Ofd Img class, and setting a different DPI according to each license's base image;
(4) Using the picture output class ImageIO to save the output image file, and closing the related file streams;
the Hadoop MapReduce parallel processing sub-module is used for completing the rapid extraction of image data in a multitask, multiprocess manner;
the data extraction sub-module is used for recognizing the generated license pictures with an image processing algorithm and then extracting the license face items from the license images.
8. The system for converting license OFD format files into pictures based on MapReduce according to claim 5, wherein the MapReduce in the Hadoop MapReduce parallel processing submodule is divided into two stages in total, namely Map and Reduce;
the Map stage splits up the data task and comprises several Map tasks; the Map stage takes several data blocks as input, and several electronic license format files can be taken together as one data block; each Map task is responsible for processing one data split containing electronic license format files of several license types; after receiving a split, the Map task by default treats each electronic license format data file in the split as one record, processes the records in sequence, and finally outputs a number of key-value pairs, each containing the input electronic license format file and the output lightweight picture data;
the Reduce stage recombines the results and re-outputs the data according to the configured requirements; there are one or more Reduce tasks, each Reduce task processes the key-value pairs that bind an electronic license format file to its output lightweight picture data, and for key-value pairs of the same type the Reduce stage finally outputs only one result; the data generated by the Map tasks undergo partitioning, sorting, and merging, and after the Reduce stage is completed the result is output to a distributed file storage system for storage;
the Map task divides the OFD format files, which cover several license types, into records; each Map task outputs key-value pairs that pair OFD electronic license format file data of several different license types with the corresponding lightweight electronic license pictures, and the key-value pairs are assigned to electronic license types according to whether their keys are the same: pairs with the same key belong to the same electronic license format file type, and pairs with different keys belong to different types; the key-value pairs are routed through partitions to the matching Reduce tasks, one type of electronic license format file corresponds to one partition, and one partition corresponds to one Reduce task; the electronic license types of different partitions differ, each Reduce task corresponds to a different license type and license type code, and data of different license types enter different Reduce tasks, which guarantees that key-value pairs of the same type are sent to the same Reduce task and completes the data processing;
the number of Map tasks is determined by the number of input OFD electronic license format file data splits, and the number of Reduce tasks is specified according to the number of input license types; a split is a set of OFD electronic license format data files of several different license types.
9. The system for converting the MapReduce license OFG format file into the picture according to claim 5, wherein the image recognition extraction submodule performs image recognition by classifying and extracting key features to exclude redundant information; the method comprises the following steps:
information acquisition: converting light or sound information into electric information through a sensor, namely acquiring basic information of a study object and converting the basic information into information which can be recognized by a machine;
pretreatment: denoising and smoothing the image to transform so as to strengthen key features of the image;
feature extraction and selection: the method comprises the steps of carrying out the illumination item on the feature extraction license by using the image texture, wherein the image texture feature extraction is to extract texture feature parameters through an image processing technology, so as to obtain the processing procedure of quantitative or qualitative description of the texture, and the image texture extraction is specifically as follows:
(1) Reading license image data of a Reduce stage result;
(2) Creating a texture image of the result file by using the entropyfilt function;
(3) Converting the texture image into a gray level binary image, and displaying textures of different parts of the license image data by using the bwareaopen function; the textures of different parts of the license image data comprise the bottom texture of the image and a mask image of the texture;
(4) Image segmentation is carried out by using a texture filter algorithm, and the picture illumination items are extracted; the image segmentation of the texture filter divides the image into different areas according to the different texture features presented in the image; the textures can embody the color or internal composition mechanism of any object on the image; the texture can effectively detect the boundary of the object, so that the extraction of target elements is completed;
(5) Finishing the extraction of the picture illumination item data.
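Steps (2) and (3) above can be sketched in pure Python as a rough analogue of the named functions. This is an illustrative stand-in, not the MATLAB functions themselves: the window radius, the threshold, the clipped window at image borders and the 4-connectivity are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, deque


def local_entropy(img, radius=1):
    """Texture image: Shannon entropy (bits) of the grey levels around each pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Window is clipped at the image border (an assumption of this sketch).
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            n = len(window)
            out[y][x] = -sum(c / n * math.log2(c / n)
                             for c in Counter(window).values())
    return out


def binarize(texture, threshold):
    """Binary image: True where the local entropy exceeds the threshold."""
    return [[value > threshold for value in row] for row in texture]


def remove_small_regions(mask, min_size):
    """Keep only 4-connected True regions with at least min_size pixels."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_size:
                for cy, cx in region:
                    out[cy][cx] = True
    return out
```

The resulting mask separates high-texture regions from the smooth background; how the illumination items are then cut out of the license picture is left to the texture filter algorithm of step (4).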
10. The system for converting a MapReduce license OFG format file into a picture according to any one of claims 5-9, wherein the data storage and evaluation module comprises a data storage sub-module and a data evaluation sub-module;
the working process of the data storage sub-module is specifically as follows:
(1) Obtaining result data extracted from the picture illumination item image;
(2) Acquiring MongoDB configuration information of a picture photo frame;
(3) Sequentially storing the result data into MongoDB, and recording the indexes of the data in the MongoDB picture library;
(4) The index is stored in the database, so that the query operation of the data is facilitated;
(5) Deleting the generated temporary files to release space; the temporary files comprise the source license format file and the result file generated by the Reduce task;
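The storage workflow above can be sketched as follows. To keep the example self-contained, a small in-memory class stands in for the MongoDB picture library; it is hypothetical and is not a MongoDB client, and the names `InMemoryPictureStore`, `store_results` and the document fields are assumptions.

```python
import os


class InMemoryPictureStore:
    """Hypothetical stand-in for a MongoDB picture library (not a real client)."""

    def __init__(self):
        self._docs = {}
        self._next_id = 0

    def insert(self, doc):
        """Store one document and return the identifier it was stored under."""
        self._next_id += 1
        self._docs[self._next_id] = doc
        return self._next_id

    def get(self, doc_id):
        return self._docs[doc_id]


def store_results(results, store, index, temp_paths):
    """Store extracted pictures, record their indexes, then delete temporary files.

    results: iterable of (license_id, picture_bytes) pairs.
    index: dict standing in for the database table that holds the indexes.
    """
    for license_id, picture in results:
        # Steps (3) and (4): store sequentially and keep the index for later queries.
        index[license_id] = store.insert({"license_id": license_id, "picture": picture})
    for path in temp_paths:
        # Step (5): release space taken by source files and Reduce output files.
        if os.path.exists(path):
            os.remove(path)
    return index
```

Keeping the index separate from the picture data means a later query only needs the license identifier to locate the stored picture.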
the data evaluation sub-module evaluates the data conversion results by using the qualification rate, the conversion efficiency and the response speed; the qualification rate P is an evaluation index of the accuracy of the picture illumination item results extracted from the OFD format file data, and represents the ratio of qualified samples to the total number of sampled samples; the formula is as follows:
P=TP/(TP+FP);
wherein TP is the number of qualified samples and FP is the number of unqualified samples; if the P value is low, it is checked whether the data size is appropriate, whether the data content was converted normally and whether the extracted picture is clear, and the corresponding logic is adjusted according to the problem found so as to improve the qualification rate;
the conversion efficiency E is an evaluation index of the speed at which the picture illumination item results are extracted from the OFD format file data, namely, for 1 million electronic license format file data, the ratio of the time for completing the data conversion to the conversion time of the common method of the electronic license system; the formula is as follows:
E=T1/T2;
wherein T1 is the time for completing the data conversion; T2 is the conversion time of the common method of the electronic license system; the smaller the ratio of T1 to T2, the faster the conversion speed;
the response speed S represents the response speed measured by testing the interface with Postman on the converted data; the faster it is, the faster the response of the accessing system or APP; the formula is as follows:
S=T3/T4;
wherein T3 is the time taken by the interface to obtain the picture illumination item data after the data has been converted; T4 is the access time of the real-time conversion interface of the electronic license system; the smaller the ratio of T3 to T4, the more significant the effect of the conversion.
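The three evaluation indexes follow directly from the formulas P=TP/(TP+FP), E=T1/T2 and S=T3/T4. A minimal sketch, with function names and the zero-denominator checks added as assumptions:

```python
def qualification_rate(tp, fp):
    """P = TP / (TP + FP): share of qualified samples among all sampled results."""
    if tp + fp == 0:
        raise ValueError("at least one sample is required")
    return tp / (tp + fp)


def conversion_efficiency(t1, t2):
    """E = T1 / T2: conversion time relative to the common method (smaller is faster)."""
    if t2 <= 0:
        raise ValueError("T2 must be positive")
    return t1 / t2


def response_speed(t3, t4):
    """S = T3 / T4: interface time relative to real-time conversion (smaller is better)."""
    if t4 <= 0:
        raise ValueError("T4 must be positive")
    return t3 / t4
```

For instance, 95 qualified and 5 unqualified samples give P = 0.95, and a conversion that takes 30 time units against 120 for the common method gives E = 0.25.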
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311090890.5A CN117115289A (en) 2023-08-28 2023-08-28 Method and system for converting OFG format file into picture based on MapReduce license

Publications (1)

Publication Number Publication Date
CN117115289A true CN117115289A (en) 2023-11-24

Family

ID=88805240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311090890.5A Pending CN117115289A (en) 2023-08-28 2023-08-28 Method and system for converting OFG format file into picture based on MapReduce license

Country Status (1)

Country Link
CN (1) CN117115289A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination