CN117058699A - Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Info

Publication number
CN117058699A
Authority
CN
China
Prior art keywords
resume
title
layoutlmv3
layout
model
Prior art date
Legal status
Granted
Application number
CN202311087110.1A
Other languages
Chinese (zh)
Other versions
CN117058699B (en)
Inventor
李敬泉
徐雯
胡伟
徐伟招
郑德乐
Current Assignee
Shenzhen Kuakua Jingling Technology Co ltd
Original Assignee
Shenzhen Kuakua Jingling Technology Co ltd
Priority date 2023-08-28
Filing date 2023-08-28
Publication date 2023-11-14
Application filed by Shenzhen Kuakua Jingling Technology Co ltd
Priority to CN202311087110.1A
Publication of CN117058699A
Application granted
Publication of CN117058699B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G06Q10/1053 Employment or hiring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Multimedia (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a resume layout dividing method based on a LayoutLMv3 model, comprising the following steps: S1: fine-tuning a LayoutLMv3 object detection model on self-labeled resumes; S2: running inference on unlabeled resumes with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resumes; S3: dividing the unlabeled resumes into sections based on the title position information obtained in S2 and an OCR recognition algorithm, and extracting the text information in each section. The invention can improve the accuracy of resume layout division and more accurately reflect how information is organized in a resume.

Description

Resume layout dividing method, system and storage medium based on LayoutLMv3 model
Technical Field
The invention relates to the technical field of resume analysis, and in particular to a resume layout dividing method, system and storage medium based on a LayoutLMv3 model.
Background
In recruitment, a recruiter needs to read each job seeker's resume to judge whether the job seeker has the abilities and experience that match the position. Extracting resume content in a structured form according to its layout lets the recruiter quickly grasp the job seeker's personal information and improves the efficiency of resume screening.
At present, structured extraction of resume information is mainly based on text keywords. For example, the resume data information analysis and processing method proposed in patent CN108874928A applies keyword matching directly to the full text of the resume. However, that method does not account for keywords that appear in body text rather than in titles, which can interfere with title identification and lead to layout division errors.
Disclosure of Invention
The invention aims to provide a resume layout dividing method, system and storage medium based on a LayoutLMv3 model. The LayoutLMv3 model is applied to resume analysis: the resume is first divided into sections at a fine-grained level using its titles, which improves the accuracy of resume layout division and more accurately reflects how information is organized in the resume. The data is then structured on this basis, making the resume convenient to use and store in downstream tasks and easing the difficulty of locating and analyzing sections in resumes of diverse formats. Meanwhile, by combining image information with the text semantic information of titles, assisted by the visual features of the image, the section regions of different resumes can be accurately identified, which improves the accuracy and recall of resume analysis as a whole.
To achieve the above purpose, the following technical solution is adopted:
A resume layout dividing method based on a LayoutLMv3 model comprises the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-labeled resumes;
S2: running inference on unlabeled resumes with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resumes;
S3: dividing the unlabeled resumes into sections based on the title position information obtained in S2 and an OCR recognition algorithm, and extracting the text information in each section.
Further, S1 specifically includes the following steps:
S11: converting the resume into picture format, enclosing each title in the resume with a rectangular box, and representing the position of the box containing each title by a four-tuple (x, y, box_width, box_height), where x is the abscissa of the top-left vertex of the box, y is the ordinate of the top-left vertex, box_width is the width of the box, and box_height is the height of the box;
S12: annotating the positions of the resume titles as the four-tuples described in S11, writing the annotation information into a JSON file, and inputting the annotation information together with the corresponding resumes into the LayoutLMv3 model to fine-tune it and obtain the fine-tuned model parameters.
Further, S2 specifically includes the following steps:
S21: converting the unlabeled resume into picture format, obtaining the resume's name and its length and width information, storing this information in JSON format, and inputting it together with the resume picture into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing the title position information in JSON format.
Further, S3 specifically includes the following steps:
S31: acquiring the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into a plurality of sections according to the title positions, and taking the text content of each title together with the text content between that title and the next adjacent title as the text content of that title's section;
S32: extracting the text content in each title section based on an OCR (optical character recognition) algorithm, taking the first line of the extracted text in each section as the title of that section, and performing the final section division using the title as a keyword.
Further, the section division using the titles as keywords in S32 specifically includes the following steps:
S321: based on the layout and content of resumes, predefining the following 7 sections: basic information, work experience, educational background, project experience, self-assessment, awards and certificates, and skills, whose section labels are BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL, respectively;
S322: listing a keyword list for each section, matching the keywords in each list against the text of every detected resume title, and, whenever any keyword matches, assigning the title and its content to the section corresponding to that keyword.
Further, in S322, for the basic information section, if the actual resume does not contain a title matching this section's keywords, the content before the first title on the first page of the resume is taken as the section content.
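The keyword matching in S322 and the basic-information fallback can be sketched in Python as follows. The keyword lists are hypothetical placeholders, since the patent does not publish them; the case-insensitive substring test, the rule that the first matching section in dictionary order wins, and the dropping of unmatched titles are likewise assumptions of this sketch, as are the names `SECTION_KEYWORDS`, `match_section` and `assign_sections`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical keyword lists: the patent says each section has one, but does not publish them.
SECTION_KEYWORDS: Dict[str, List[str]] = {
    "BASIC_INFORMATION": ["基本信息", "个人信息", "basic information", "personal information"],
    "WORK_EXPERIENCE": ["工作经历", "工作经验", "work experience", "employment"],
    "EDUCATION_BACKGROUND": ["教育背景", "教育经历", "education"],
    "PROJECT_EXPERIENCE": ["项目经历", "项目经验", "project"],
    "SELF_ASSESSMENT": ["自我评价", "个人评价", "self-assessment", "self evaluation"],
    "REWARD_CERTIFICATES": ["获奖", "证书", "award", "certificate"],
    "SKILL": ["技能", "skill"],
}


def match_section(title_text: str) -> Optional[str]:
    """Return the label of the first section with a keyword contained in the title."""
    normalized = title_text.strip().lower()
    for label, keywords in SECTION_KEYWORDS.items():
        if any(keyword.lower() in normalized for keyword in keywords):
            return label
    return None  # no keyword matched


def assign_sections(
    blocks: Sequence[Tuple[str, str]], preamble_text: str = ""
) -> Dict[str, List[str]]:
    """Group (title_text, body_text) blocks under section labels.

    preamble_text is the content before the first title on page 1; it becomes
    BASIC_INFORMATION when no title matched that section (the S322 fallback).
    """
    sections: Dict[str, List[str]] = {}
    for title, body in blocks:
        label = match_section(title)
        if label is None:
            continue  # unmatched titles are dropped in this sketch
        sections.setdefault(label, []).append(f"{title}\n{body}".strip())
    if "BASIC_INFORMATION" not in sections and preamble_text.strip():
        sections["BASIC_INFORMATION"] = [preamble_text.strip()]
    return sections
```

For example, `assign_sections([("工作经历", "2019-2023 ACME")], "Zhang San")` files the block under WORK_EXPERIENCE and, because no basic-information title was found, uses "Zhang San" as the BASIC_INFORMATION content.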
Further, the resume layout dividing method based on the LayoutLMv3 model also comprises the following step:
S4: visually displaying the title detection results on the corresponding resume.
Further, S4 specifically includes: for the resume title position information obtained in S2, drawing the rectangular box containing each title at the corresponding position in the resume using the Python programming language.
A resume layout dividing system based on the LayoutLMv3 model is also provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above resume layout dividing method when executing the computer program.
A computer-readable storage medium is also provided, storing a computer program adapted to be loaded and executed by a processor so that a computer device having the processor performs the above method.
By adopting the above solution, the invention has the following beneficial effects:
1) The LayoutLMv3 object detection model is fine-tuned on labeled resumes, and the fine-tuned model is used to run inference on new resumes and locate the region of each section. Unlike conventional resume parsing, which converts the resume to plain text for analysis, this yields more accurate section division for resumes of diverse formats. In the subsequent parsing stage, named entity recognition is used to analyze the detailed information in each section; for example, "work experience" can be automatically presented in segments, ensuring that essentially no information is lost;
2) Resumes in various formats are converted to JPG picture format, turning a text task into a visual task, so the method is applicable to resume data of various formats and sizes;
3) The method has broad application prospects, particularly in the human resources industry, where it can replace the manual entry of information into systems and reduce the error rate of manual input.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of resume annotation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the result of dividing a resume into sections according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the specific embodiments.
Referring to FIGS. 1 to 3, the invention provides a resume layout dividing method based on a LayoutLMv3 model, comprising the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-labeled resumes.
The LayoutLMv3 object detection model is based on the Transformer architecture and was pre-trained on 11 million scanned document images. To better recognize and segment resume data, LayoutLMv3 is fine-tuned on self-labeled resume data to obtain a LayoutLMv3 object detection model adapted to resumes. In one embodiment, the fine-tuning proceeds as follows:
S11: converting the resume into picture format, enclosing each title in the resume with a rectangular box, and representing the position of the box containing each title by a four-tuple (x, y, box_width, box_height), where x is the abscissa of the top-left vertex of the box, y is the ordinate of the top-left vertex, box_width is the width of the box, and box_height is the height of the box;
S12: annotating the positions of the resume titles as the four-tuples described in S11, writing the annotation information into a JSON file, and inputting the annotation information together with the corresponding resumes into the LayoutLMv3 model to fine-tune it and obtain the fine-tuned model parameters.
This step mainly fine-tunes the LayoutLMv3 model. In this embodiment, the resume is first converted into JPG picture format (this turns a text task into a visual task, adapts to resume data of various formats and sizes, and storing images saves computer memory resources). The position in the picture of the title of each section is then represented by the four-tuple (x, y, box_width, box_height) defined in S11. Next, the title position information in the training data is matched one-to-one with the resumes in this four-tuple form, the annotation information is written into a JSON file, and the JSON file and the corresponding resumes are input together into the LayoutLMv3 model, which is fine-tuned on a GPU. FIG. 2 shows the annotation result for a single resume.
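A minimal sketch of writing the S11/S12 four-tuples to a JSON annotation file, assuming the COCO annotation layout. This layout is an assumption: the patent says only that the annotations go into a JSON file, but the object-detection fine-tuning example published with LayoutLMv3 is built on Detectron2, whose dataset loaders accept COCO-style JSON, and COCO's `bbox` field uses the same (x, y, width, height) convention as S11. The single `title` category, the file names and the function name `build_coco_annotations` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# S11 four-tuple: (x, y, box_width, box_height), with (x, y) the top-left corner.
Quad = Tuple[float, float, float, float]


def build_coco_annotations(pages: Sequence[Dict]) -> Dict:
    """Build a COCO-style dict from pages shaped like
    {"file_name": str, "width": int, "height": int, "titles": [Quad, ...]}."""
    images: List[Dict] = []
    annotations: List[Dict] = []
    ann_id = 1
    for image_id, page in enumerate(pages, start=1):
        images.append({"id": image_id, "file_name": page["file_name"],
                       "width": page["width"], "height": page["height"]})
        for x, y, w, h in page["titles"]:
            annotations.append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": 1,      # single hypothetical "title" class
                "bbox": [x, y, w, h],  # COCO bbox is also [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "title"}]}
```

The result can be written with `json.dump(coco, f, ensure_ascii=False)`; keeping one entry per page image lets a multi-page resume contribute several training images.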
S2: running inference on unlabeled resumes with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resumes.
In one embodiment, this step specifically includes:
S21: converting the unlabeled resume into picture format, obtaining the resume's name and its length and width information, storing this information in JSON format, and inputting it together with the resume picture into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing the title position information in JSON format.
In this embodiment, the title position information of an unlabeled resume is obtained with the LayoutLMv3 object detection model produced by fine-tuning. Once S1 is complete, the LayoutLMv3 model fine-tuned on resume images and its parameters are available. For a new, unlabeled resume, information such as the name, length and width of each resume picture is obtained and organized in JSON format, and this JSON is fed to the model together with the picture data. The model parameters are loaded into the neural network, and the model infers the coordinates of the resume titles, output in JSON format, yielding a coordinate four-tuple for each title of the new resume.
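The S21/S22 data flow can be sketched as below: an image record holding the picture name, width and height is built as a JSON-serializable dict, and raw detections are converted into the S11 four-tuple format. The sketch assumes the detector returns corner boxes (x1, y1, x2, y2) with confidence scores, as Detectron2-style detectors typically do; the 0.5 score threshold, the clipping to the picture bounds and the function names `image_record` and `detections_to_titles` are hypothetical choices, not details from the patent.

```python
from typing import Dict, List, Sequence


def image_record(file_name: str, width: int, height: int) -> Dict:
    """S21: the picture name plus its size, organized as a JSON-serializable dict."""
    return {"file_name": file_name, "width": width, "height": height}


def detections_to_titles(
    record: Dict,
    boxes_xyxy: Sequence[Sequence[float]],
    scores: Sequence[float],
    score_threshold: float = 0.5,  # hypothetical cut-off
) -> Dict:
    """S22: turn corner boxes (x1, y1, x2, y2) into S11 four-tuples (x, y, w, h)."""
    width, height = float(record["width"]), float(record["height"])
    titles: List[Dict] = []
    for (x1, y1, x2, y2), score in zip(boxes_xyxy, scores):
        if score < score_threshold:
            continue  # drop low-confidence detections
        # Clip to the picture so the four-tuple stays inside the page.
        x1, y1 = max(0.0, x1), max(0.0, y1)
        x2, y2 = min(width, x2), min(height, y2)
        titles.append({"bbox": [x1, y1, x2 - x1, y2 - y1], "score": score})
    return {**record, "titles": titles}
```

Passing the returned dict to `json.dumps` gives the JSON output described in S22, with one four-tuple per detected title.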
S3: dividing the unlabeled resumes into sections based on the title position information obtained in S2 and an OCR recognition algorithm, and extracting the text information in each section.
In one embodiment, this step specifically includes:
S31: acquiring the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into a plurality of sections according to the title positions, and taking the text content of each title together with the text content between that title and the next adjacent title as the text content of that title's section;
S32: extracting the text content in each title section based on an OCR (optical character recognition) algorithm, taking the first line of the extracted text in each section as the title of that section, and performing the final section division using the title as a keyword.
This step extracts the text information. In this embodiment, the modular content of the resume is obtained from the title positions and an OCR recognition algorithm. A resume typically spans multiple pictures, and its content continues across pages. Therefore, the title coordinates are first sorted by resume page and then by the ordinate of the title box. After sorting, OCR character recognition is used to obtain the text in the resume; OCR also returns the coordinates of each recognized piece of text, and by comparing these with the title ordinates, the content between two titles is assigned to the section of the preceding title. The first line of text in each section is then used as its title, and the resume is divided into 7 sections: basic information, work experience, educational background, project experience, self-assessment, awards and certificates, and skills, with section labels BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL, respectively. For each section, a keyword list is defined; the keywords are matched against the text of each detected resume title, and whenever any keyword matches, the title and its content are assigned to the corresponding section. FIG. 3 illustrates the extracted resume section text, where the content inside the same box belongs to one section.
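The cross-page ordering and assignment described above can be sketched as follows. Titles are sorted by (page, top ordinate); each OCR line is assigned to the last title that starts at or above the line's vertical center, so text at the top of a page continues the last section of the previous page, and lines before the first title are returned separately as the candidate basic-information content. Comparing the line's center rather than its top edge (so a title's own OCR line falls under that title), the input dictionary layout and the name `split_sections` are assumptions of this sketch; like the description above, it orders by ordinate only and does not handle multi-column layouts.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

SortKey = Tuple[int, float, float]  # (page, vertical center, x) for reading order


def split_sections(
    titles: Sequence[Dict], ocr_lines: Sequence[Dict]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """titles:    [{"page": int, "bbox": [x, y, w, h]}]  (detected title boxes)
    ocr_lines: [{"page": int, "bbox": [x, y, w, h], "text": str}]
    Returns (preamble lines, [(section title, section body), ...])."""
    # Order titles across pages: first by page, then by the top ordinate of the box.
    starts = sorted((t["page"], float(t["bbox"][1])) for t in titles)
    buckets: List[List[Tuple[SortKey, str]]] = [[] for _ in starts]
    preamble: List[Tuple[SortKey, str]] = []
    for line in ocr_lines:
        x, y, w, h = line["bbox"]
        center = y + h / 2
        # Index of the last title starting at or above this line (possibly on an earlier page).
        idx = bisect_right(starts, (line["page"], center)) - 1
        target = preamble if idx < 0 else buckets[idx]
        target.append(((line["page"], center, x), line["text"]))
    sections = []
    for bucket in buckets:
        texts = [text for _, text in sorted(bucket)]
        if texts:  # the first OCR line of a section is taken as its title (S32)
            sections.append((texts[0], "\n".join(texts[1:])))
    return [text for _, text in sorted(preamble)], sections
```

Each returned (title, body) pair can then be passed to keyword matching for the final section labels.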
In addition, because actual resumes usually contain no keyword for the BASIC_INFORMATION section, the invention proposes that, for a resume without a BASIC_INFORMATION keyword, the content before the first title on the first page of the resume is taken as the content of that section. The invention further includes step S4: visually displaying the title detection results on the corresponding resume. In one embodiment, this specifically means that, for the resume title position information obtained in S2, the rectangular box containing each title is drawn at the corresponding position in the resume using the Python programming language, so that the results can be read and reviewed.
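A minimal Python sketch of S4, assuming the Pillow library (the patent says only that Python is used): each (x, y, box_width, box_height) four-tuple is converted to corner coordinates and outlined on a copy of the page image. The function name `draw_title_boxes`, the color and the line width are hypothetical.

```python
from typing import Iterable, Tuple

from PIL import Image, ImageDraw


def draw_title_boxes(
    image: Image.Image,
    titles: Iterable[Tuple[float, float, float, float]],
    color: Tuple[int, int, int] = (255, 0, 0),
    width: int = 3,
) -> Image.Image:
    """Outline each (x, y, box_width, box_height) title box on an RGB copy of the page."""
    canvas = image.convert("RGB")  # convert() returns a new image; the input is untouched
    draw = ImageDraw.Draw(canvas)
    for x, y, w, h in titles:
        # Pillow expects corner coordinates [x0, y0, x1, y1].
        draw.rectangle([x, y, x + w, y + h], outline=color, width=width)
    return canvas
```

For instance, `draw_title_boxes(Image.open("resume_p1.jpg"), [(80, 300, 200, 40)]).save("resume_p1_titles.jpg")` would save an annotated copy for review; the file names are placeholders.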
In addition, a resume layout dividing system based on the LayoutLMv3 model is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above resume layout dividing method when executing the computer program. A computer-readable storage medium is also provided, storing a computer program adapted to be loaded and executed by a processor so that a computer device having the processor performs the method.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The memory may be a hard disk, the computer's built-in memory, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like.
The computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
The foregoing description of the preferred embodiments is not intended to limit the invention; rather, it is intended to cover all modifications, equivalents and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. A resume layout dividing method based on a LayoutLMv3 model, characterized by comprising the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-labeled resumes;
S2: running inference on unlabeled resumes with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resumes;
S3: dividing the unlabeled resumes into sections based on the title position information obtained in S2 and an OCR recognition algorithm, and extracting the text information in each section.
2. The resume layout dividing method based on the LayoutLMv3 model according to claim 1, wherein S1 specifically comprises the following steps:
S11: converting the resume into picture format, enclosing each title in the resume with a rectangular box, and representing the position of the box containing each title by a four-tuple (x, y, box_width, box_height), where x is the abscissa of the top-left vertex of the box, y is the ordinate of the top-left vertex, box_width is the width of the box, and box_height is the height of the box;
S12: annotating the positions of the resume titles as the four-tuples described in S11, writing the annotation information into a JSON file, and inputting the annotation information together with the corresponding resumes into the LayoutLMv3 model to fine-tune it and obtain the fine-tuned model parameters.
3. The resume layout dividing method based on the LayoutLMv3 model according to claim 2, wherein S2 specifically comprises the following steps:
S21: converting the unlabeled resume into picture format, obtaining the resume's name and its length and width information, storing this information in JSON format, and inputting it together with the resume picture into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing the title position information in JSON format.
4. The resume layout dividing method based on the LayoutLMv3 model according to claim 1, wherein S3 specifically comprises the following steps:
S31: acquiring the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into a plurality of sections according to the title positions, and taking the text content of each title together with the text content between that title and the next adjacent title as the text content of that title's section;
S32: extracting the text content in each title section based on an OCR (optical character recognition) algorithm, taking the first line of the extracted text in each section as the title of that section, and performing the final section division using the title as a keyword.
5. The resume layout dividing method based on the LayoutLMv3 model according to claim 4, wherein dividing the sections using the title as the keyword in S32 specifically comprises the following steps:
S321: based on the layout and content of resumes, predefining the following 7 sections: basic information, work experience, educational background, project experience, self-assessment, awards and certificates, and skills, whose section labels are BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL, respectively;
S322: listing a keyword list for each section, matching the keywords in each list against the text of every detected resume title, and, whenever any keyword matches, assigning the title and its content to the section corresponding to that keyword.
6. The resume layout dividing method based on the LayoutLMv3 model according to claim 5, wherein in S322, for the basic information section, if the actual resume does not contain text corresponding to this section, the content before the first title on the first page of the resume is taken as the section content.
7. The resume layout dividing method based on the LayoutLMv3 model according to claim 3, further comprising the following step:
S4: visually displaying the title detection results on the corresponding resume.
8. The resume layout dividing method based on the LayoutLMv3 model according to claim 7, wherein S4 specifically is: for the resume title position information obtained in S2, drawing the rectangular box containing each title at the corresponding position in the resume using the Python programming language.
9. A resume layout dividing system based on a LayoutLMv3 model, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the resume layout dividing method according to any one of claims 1 to 8 when executing the computer program.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311087110.1A CN117058699B (en) 2023-08-28 2023-08-28 Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311087110.1A CN117058699B (en) 2023-08-28 2023-08-28 Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Publications (2)

Publication Number Publication Date
CN117058699A 2023-11-14
CN117058699B 2024-04-19

Family

ID=88653361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311087110.1A Active CN117058699B (en) 2023-08-28 2023-08-28 Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Country Status (1)

Country Link
CN (1) CN117058699B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145584A (en) * 2017-05-10 2017-09-08 西南科技大学 Resume parsing method based on n-gram models
CN108874928A (en) * 2018-05-31 2018-11-23 平安科技(深圳)有限公司 Resume data information analyzing and processing method, device, equipment and storage medium
CN110020327A (en) * 2019-04-16 2019-07-16 上海大易云计算股份有限公司 A kind of resume resolution system based on vertical search engine
CN114708595A (en) * 2022-03-15 2022-07-05 灵犀量子(北京)医疗科技有限公司 Image document structured analysis method, system, electronic device, and storage medium
CN115661836A (en) * 2022-11-08 2023-01-31 太保科技有限公司 Automatic correction method, device and system and readable storage medium
US20230092559A1 (en) * 2021-09-17 2023-03-23 American Family Mutual Insurance Company, S.I. Systems and methods for unstructured data processing
CN115984886A (en) * 2022-12-19 2023-04-18 中国平安人寿保险股份有限公司 Table information extraction method, device, equipment and storage medium

Non-Patent Citations (2)

Title
HUANG, Yupan; LV, Tengchao; CUI, Lei; LU, Yutong: "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking", Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal: Association for Computing Machinery, pp. 4083-4091, 22 July 2022 (2022-07-22) *
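ZU, Shicheng; WANG, Xiulai; CAO, Yang; ZHANG, Yutao; LIANG, Shan: "Resume parsing based on a novel text block segmentation method" (基于新型文本块分割法的简历解析), Computer Science (计算机科学), no. 1, 15 June 2020 (2020-06-15) *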

Also Published As

Publication number Publication date
CN117058699B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN111476227B (en) Target field identification method and device based on OCR and storage medium
RU2699687C1 (en) Detecting text fields using neural networks
US20200175095A1 (en) Object recognition and tagging based on fusion deep learning models
US8494257B2 (en) Music score deconstruction
US9099007B1 (en) Computerized processing of pictorial responses in evaluations
CN111752557A (en) Display method and device
CN112036295B (en) Bill image processing method and device, storage medium and electronic equipment
CN110796145B (en) Multi-certificate segmentation association method and related equipment based on intelligent decision
US11017266B2 (en) Aggregated image annotation
WO2013039063A1 (en) Answer processing device, answer processing method, recording medium, and seal
CN112381087A (en) Image recognition method, apparatus, computer device and medium combining RPA and AI
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
CN107168635A (en) Information presentation method and device
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
CN113936187A (en) Text image synthesis method and device, storage medium and electronic equipment
CN108369647B (en) Image-based quality control
CN117058699B (en) Resume layout dividing method, system and storage medium based on LayoutLMv3 model
CN110363245B (en) Online classroom highlight screening method, device and system
CN110941947A (en) Document editing method and device, computer storage medium and terminal
US20180018893A1 (en) Method and system for identifying marked response data on a manually filled paper form
US20230214451A1 (en) System and method for finding data enrichments for datasets
CN103870793B (en) Method and device for monitoring print media advertisements
JP5846378B2 (en) Information management method and information management system
CN117437649B (en) File signing method, device, computer equipment and storage medium
CN114138214B (en) Method and device for automatically generating print file and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant