CN117058699B - Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Publication number
CN117058699B
Authority
CN
China
Prior art keywords
resume
title
layoutlmv
dividing
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311087110.1A
Other languages
Chinese (zh)
Other versions
CN117058699A (en)
Inventor
李敬泉
徐雯
胡伟
徐伟招
郑德乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kuakua Jingling Technology Co ltd
Original Assignee
Shenzhen Kuakua Jingling Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-08-28
Filing date
2023-08-28
Publication date
2024-04-19
Application filed by Shenzhen Kuakua Jingling Technology Co ltd
Priority to CN202311087110.1A
Publication of CN117058699A
Application granted
Publication of CN117058699B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G06Q10/1053 Employment or hiring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Multimedia (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a resume layout dividing method based on the LayoutLMv3 model, comprising the following steps. S1: fine-tuning a LayoutLMv3 object detection model on self-labeled resumes. S2: running inference on an unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume. S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text in each section. The invention improves the accuracy of resume section division and reflects more faithfully how information is organized in a resume.

Description

Resume layout dividing method, system and storage medium based on LayoutLMv3 model
Technical Field
The invention relates to the technical field of resume parsing, and in particular to a resume layout dividing method, system and storage medium based on the LayoutLMv3 model.
Background
In recruitment, a recruiter reads job seekers' resumes to judge whether each candidate has the ability and experience the position requires. Extracting resume content in a structured form, section by section, lets the recruiter grasp a job seeker's personal information quickly and makes resume screening more efficient.
At present, structured extraction of resume information relies mainly on text keywords. For example, the resume data information analysis and processing method proposed in patent CN108874928A applies keyword matching directly to the entire text of the resume. That method does not account for keywords in the body text being mistaken for titles, so it can divide sections incorrectly.
Disclosure of Invention
The invention aims to provide a resume layout dividing method, system and storage medium based on the LayoutLMv3 model. The LayoutLMv3 model is applied to resume parsing: the resume is first divided into sections at a fine-grained level by its titles, which improves the accuracy of section division and reflects more faithfully how information is organized in the resume. The data is then structured on that basis, which makes the resume easier to use and store in downstream tasks and eases the difficulty of locating and parsing sections across diverse resume layouts. In addition, because visual features of the image assist title detection alongside the semantic information of the text, the section regions of different types of resume can be identified accurately, which improves the precision and recall of overall resume parsing.
To achieve the above purpose, the following technical scheme is adopted:
A resume layout dividing method based on the LayoutLMv3 model comprises the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-labeled resumes;
S2: running inference on an unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume;
S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text in each section.
Further, step S1 specifically includes the following steps:
S11: converting the resume into an image format, enclosing each title in the resume in a rectangular box, and representing the position of each title's box in the resume by a quadruple (x, y, box_width, box_height), where x is the abscissa of the box's top-left vertex, y is the ordinate of the box's top-left vertex, box_width is the width of the box, and box_height is the height of the box;
S12: labeling the position of each resume title with the quadruple of S11, writing the labels into a JSON file, inputting the JSON file into the LayoutLMv3 model together with the resume titles to fine-tune the model, and obtaining the fine-tuned model parameters.
Further, step S2 specifically includes the following steps:
S21: converting the unlabeled resume into an image format, obtaining the name, length and width of the unlabeled resume, storing them in JSON format, and inputting them together with the resume image into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing it in JSON format.
Further, step S3 specifically includes the following steps:
S31: obtaining the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into several sections according to that information, and taking the title text of each title section, together with the text between it and the next adjacent title section, as the text content of that title section;
S32: extracting the text content of each title section with an OCR (optical character recognition) algorithm, taking the first line of the extracted text of each section as the section's title, and performing the final section division using the title as the keyword.
Further, dividing the sections in S32 using the titles as keywords specifically includes the following steps:
S321: based on the layout and content of resumes, predefining the following 7 sections: basic information, work experience, educational background, project experience, self-assessment, awards and certificates, and skills, whose section labels are respectively BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL;
S322: listing a keyword list for each section, matching the text of each detected resume title against the keywords in the lists, and, when any keyword matches, assigning the title and its content to the section corresponding to that keyword.
Further, in S322, for the basic information section, if the actual resume contains no title text corresponding to that section, the content before the first title on the first page of the resume is taken as the content of the section.
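The keyword matching of S321 and S322, including the fallback for the basic information section, can be sketched as follows. This is a minimal illustration, not the patented implementation: the English keyword lists, the case-insensitive substring test, and the function names `classify_title` and `label_sections` are all assumptions made for the example. Only the seven section labels come from the text.

```python
# Hypothetical keyword lists; the patent does not disclose its actual keywords.
SECTION_KEYWORDS = {
    "BASIC_INFORMATION": ["basic information", "personal information"],
    "WORK_EXPERIENCE": ["work experience", "employment"],
    "EDUCATION_BACKGROUND": ["education"],
    "PROJECT_EXPERIENCE": ["project"],
    "SELF_ASSESSMENT": ["self-assessment", "self-evaluation", "about me"],
    "REWARD_CERTIFICATES": ["award", "certificate", "honor"],
    "SKILL": ["skill"],
}


def classify_title(title_text):
    """Return the label of the first section with a matching keyword, else None."""
    normalized = title_text.strip().lower()
    for label, keywords in SECTION_KEYWORDS.items():
        # Assumption: a keyword "matches" when it occurs as a substring.
        if any(keyword in normalized for keyword in keywords):
            return label
    return None


def label_sections(preamble, sections):
    """Group section contents by label.

    preamble: text found before the first title on the first page.
    sections: list of (title_text, content) pairs in reading order.
    """
    labeled = {}
    for title_text, content in sections:
        label = classify_title(title_text)
        if label is not None:
            labeled.setdefault(label, []).append(content)
    # Fallback: no basic-information title was found, so use the preamble.
    if "BASIC_INFORMATION" not in labeled and preamble:
        labeled["BASIC_INFORMATION"] = [preamble]
    return labeled


result = label_sections(
    "Zhang San, Shenzhen",
    [("Education", "BSc Computer Science"), ("Project Experience", "Resume parser")],
)
```

Titles that match no keyword are dropped in this sketch; a real system would need a policy for them, which the text does not specify.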
Further, the resume layout dividing method based on the LayoutLMv3 model also comprises the following step:
S4: visually displaying the title detection result on the corresponding resume.
Further, step S4 is specifically: for the resume title position information obtained in S2, drawing the rectangular box of each title at the corresponding position in the resume using the Python programming language.
A resume layout dividing system based on the LayoutLMv3 model comprises a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the above resume layout dividing method when executing the computer program.
There is also provided a computer-readable storage medium storing a computer program adapted to be loaded and executed by a processor so that a computer device having the processor performs the above method.
With the above scheme, the invention has the following beneficial effects:
1) A LayoutLMv3 object detection model is fine-tuned on labeled resumes, and the fine-tuned model runs inference on new resumes to find the region of each section. Unlike conventional resume parsing, which only converts the resume to plain text before analysis, this gives higher section-division accuracy on diverse resumes. In the subsequent parsing stage, named entity recognition is used to parse the detailed information in each section; for example, "work experience" can be presented automatically in segments, with essentially no information lost;
2) Resumes in various formats are converted into JPG images, turning a text task into a vision task, so the method suits resume data of various formats and sizes;
3) The method has broad application prospects, particularly in the human resources industry, where it removes the need to enter information into a system by hand and reduces manual entry errors.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of resume labeling according to an embodiment of the present invention;
FIG. 3 is a diagram showing the result of dividing the content of a resume into sections according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments.
Referring to FIGS. 1 to 3, the invention provides a resume layout dividing method based on the LayoutLMv3 model, comprising the following steps:
S1: fine-tuning the LayoutLMv3 object detection model on self-labeled resumes.
The LayoutLMv3 object detection model is based on the Transformer architecture, and the model itself was trained on 11 million scanned document images. To recognize and segment resume data better, LayoutLMv3 is first fine-tuned on self-labeled resume data to obtain an object detection model adapted to resumes. In one embodiment, the LayoutLMv3 object detection model is fine-tuned as follows:
S11: converting the resume into an image format, enclosing each title in the resume in a rectangular box, and representing the position of each title's box in the resume by a quadruple (x, y, box_width, box_height), where x is the abscissa of the box's top-left vertex, y is the ordinate of the box's top-left vertex, box_width is the width of the box, and box_height is the height of the box;
S12: labeling the position of each resume title with the quadruple of S11, writing the labels into a JSON file, inputting the JSON file into the LayoutLMv3 model together with the resume titles to fine-tune the model, and obtaining the fine-tuned model parameters.
This step mainly fine-tunes the LayoutLMv3 model. In this embodiment, the resume is first converted into JPG image format, which turns a text task into a vision task, accommodates resume data of various formats and sizes, and, by storing resumes as images, saves computer memory. The position in the image of each section title is then represented by a quadruple (x, y, box_width, box_height), where x is the abscissa of the box's top-left vertex, y is the ordinate of the box's top-left vertex, box_width is the width of the box, and box_height is the height of the box. Next, the title position information in the training data is put in one-to-one correspondence with the resumes using this quadruple form, the title labels are written into a JSON file, and the JSON file is input into the LayoutLMv3 model together with the resume titles. The model is fine-tuned on a GPU. FIG. 2 shows the labeling result for a single resume.
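The labeling format of S11 and S12 can be illustrated with a short sketch that writes title boxes for one resume image to JSON. The record layout (keys `image`, `width`, `height`, `titles`), the file name, and the bounds check are assumptions for illustration; the text specifies only the quadruple (x, y, box_width, box_height) and the use of a JSON file. The quadruple follows the same top-left-plus-size convention as COCO-style bounding boxes, though whether the patent uses that exact format is not stated.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TitleBox:
    """One labeled title: top-left vertex (x, y) plus box size, in pixels."""
    x: float
    y: float
    box_width: float
    box_height: float


def build_annotation(image_name, image_width, image_height, boxes):
    """Return a JSON-serializable record tying title boxes to one resume image."""
    for box in boxes:
        inside = (
            box.x >= 0
            and box.y >= 0
            and box.x + box.box_width <= image_width
            and box.y + box.box_height <= image_height
        )
        if not inside:
            # Reject labels that fall outside the page image.
            raise ValueError(f"{box} lies outside {image_name}")
    return {
        "image": image_name,  # hypothetical key names
        "width": image_width,
        "height": image_height,
        "titles": [asdict(box) for box in boxes],
    }


record = build_annotation(
    "resume_001.jpg", 1240, 1754,
    [TitleBox(80, 120, 200, 36), TitleBox(80, 540, 220, 36)],
)
annotation_json = json.dumps(record, ensure_ascii=False, indent=2)
```

Validating boxes at labeling time is a design choice of this sketch; it catches coordinate mistakes before they reach fine-tuning.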
S2: running inference on the unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume.
In one embodiment, this step specifically includes:
S21: converting the unlabeled resume into an image format, obtaining the name, length and width of the unlabeled resume, storing them in JSON format, and inputting them together with the resume image into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing it in JSON format.
In this embodiment, the fine-tuned LayoutLMv3 object detection model is used to obtain the title position information of the unlabeled resume. Once S1 is complete, the LayoutLMv3 model fine-tuned on resume images and its parameters are available. For a new, unlabeled resume, the name, length, width and other information of the resume image are obtained first and organized into JSON format, then supplied as model input together with the image data. The model parameters are loaded into the neural network, and the title coordinates inferred by the model are returned in JSON format, which yields the coordinate quadruples of the new resume's titles.
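The JSON handling around inference in S21 and S22 might look like the sketch below. It deliberately leaves out the model call. The key names (`file_name`, `width`, `height`, `bbox`, `score`), the confidence threshold, and both function names are hypothetical: the text says only that image name and size go in as JSON and that title positions come back as JSON.

```python
import json


def build_inference_input(image_name, width, height):
    """Metadata record for one unlabeled resume page image (S21)."""
    return {"file_name": image_name, "width": width, "height": height}


def parse_predictions(predictions_json, min_score=0.5):
    """Convert a predictions JSON string into (x, y, box_width, box_height) tuples.

    Assumes each prediction is an object with a four-number "bbox" and an
    optional "score"; this output layout is an illustration, not the model's
    documented format.
    """
    quadruples = []
    for prediction in json.loads(predictions_json):
        # Skip low-confidence detections (the threshold is an assumption).
        if prediction.get("score", 1.0) < min_score:
            continue
        x, y, box_width, box_height = prediction["bbox"]
        quadruples.append((x, y, box_width, box_height))
    return quadruples


metadata = build_inference_input("resume_002.jpg", 1240, 1754)
sample_output = json.dumps([
    {"bbox": [80, 120, 200, 36], "score": 0.93},
    {"bbox": [10, 10, 5, 5], "score": 0.20},
])
title_quadruples = parse_predictions(sample_output)
```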
S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text in each section.
In one embodiment, this step specifically includes:
S31: obtaining the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into several sections according to that information, and taking the title text of each title section, together with the text between it and the next adjacent title section, as the text content of that title section;
S32: extracting the text content of each title section with an OCR (optical character recognition) algorithm, taking the first line of the extracted text of each section as the section's title, and performing the final section division using the title as the keyword.
This step mainly extracts the text: in this embodiment, the title positions and an OCR algorithm together yield the resume content section by section. Typically, a resume spans several page images, and its content continues across pages. The title coordinates are therefore sorted by resume page and by the ordinate of the title box. After sorting, OCR is used to obtain the text of the resume; OCR detection also returns the coordinates of the recognized text, and by comparing these with the title ordinates, the content between two titles is assigned to the section of the preceding title. The first line of text in each section is then taken as its title, and the sections are classified into 7 categories: basic information, work experience, educational background, project experience, self-assessment, awards and certificates, and skills, with the corresponding section labels BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL. A keyword list is drawn up for each section, and the text of each detected resume title is matched against the keywords in the lists; when any keyword matches, the title and its content are assigned to the section corresponding to that keyword. FIG. 3 illustrates the extracted resume section text, where the content inside one box is one section.
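The ordering and assignment logic described above can be sketched as follows, assuming titles and OCR lines are each given with a page number and a top ordinate. The class and function names are hypothetical, and the OCR step itself is not shown: `OcrLine` stands in for whatever an OCR engine returns.

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    page: int
    y: float  # ordinate of the title box's top-left vertex


@dataclass
class OcrLine:
    page: int
    y: float  # ordinate of the recognized text line
    text: str


@dataclass
class Section:
    title: Title
    lines: list = field(default_factory=list)


def split_into_sections(titles, ocr_lines):
    """Assign each OCR line to the closest title at or above it in reading order."""
    # Sort titles by page first, then by ordinate, so sections follow pages.
    sections = [Section(t) for t in sorted(titles, key=lambda t: (t.page, t.y))]
    preamble = []  # text that precedes the first title
    for line in sorted(ocr_lines, key=lambda l: (l.page, l.y)):
        owner = None
        for section in sections:
            if (section.title.page, section.title.y) <= (line.page, line.y):
                owner = section  # latest title that starts before this line
            else:
                break
        if owner is None:
            preamble.append(line.text)
        else:
            owner.lines.append(line.text)
    return preamble, sections


preamble, sections = split_into_sections(
    [Title(1, 500), Title(1, 100), Title(2, 50)],
    [
        OcrLine(1, 20, "Zhang San"),
        OcrLine(1, 100, "Education"),
        OcrLine(1, 150, "BSc Computer Science"),
        OcrLine(1, 500, "Work Experience"),
        OcrLine(2, 10, "Engineer, 2020 to 2023"),
        OcrLine(2, 50, "Skills"),
        OcrLine(2, 90, "Python"),
    ],
)
```

Comparing (page, y) pairs lets a section that starts near the bottom of one page keep collecting text at the top of the next. The sketch compares exact ordinates; real OCR output may place a title's text line slightly above its detected box, so a tolerance would likely be needed in practice.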
In addition, for the BASIC_INFORMATION section, actual resumes often contain no matching keyword. For a resume that contains no BASIC_INFORMATION keyword, the invention therefore takes the content before the first title on the first page of the resume as the content of that section. The invention further includes step S4: visually displaying the title detection result on the corresponding resume. In one embodiment, this is specifically: for the resume title position information obtained in S2, drawing the rectangular box of each title at the corresponding position in the resume using the Python programming language, for ease of reading and inspection.
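Step S4 names Python but no particular library. One possible sketch uses Pillow, which is an assumption of this example; the function names, colour and line width are likewise illustrative. The quadruple is first converted to the corner form that `ImageDraw.rectangle` expects.

```python
def quad_to_corners(quad):
    """Convert (x, y, box_width, box_height) to (left, top, right, bottom)."""
    x, y, box_width, box_height = quad
    return (x, y, x + box_width, y + box_height)


def draw_title_boxes(image_path, quads, out_path, color="red", line_width=3):
    """Draw one rectangle per detected title and save the annotated image."""
    # Pillow is imported lazily so the coordinate helper works without it.
    from PIL import Image, ImageDraw

    with Image.open(image_path) as image:
        canvas = image.convert("RGB")  # ensure a colour canvas for the outline
    draw = ImageDraw.Draw(canvas)
    for quad in quads:
        draw.rectangle(quad_to_corners(quad), outline=color, width=line_width)
    canvas.save(out_path)


corners = quad_to_corners((80, 120, 200, 36))
```

A call such as `draw_title_boxes("resume_002.jpg", [(80, 120, 200, 36)], "resume_002_boxes.jpg")` would write a copy of the page with the title outlined; the file names here are placeholders.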
In addition, a resume layout dividing system based on the LayoutLMv3 model is provided, comprising a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the above resume layout dividing method when executing the computer program. A computer-readable storage medium is also provided, storing a computer program adapted to be loaded and executed by a processor so that a computer device having the processor performs the above method.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The memory may be a hard disk, the computer's built-in memory, a smart media card (SMC), a Secure Digital (SD) card, a flash card, and so on.
The computer-readable storage medium may be an internal storage unit of a computer device, such as the hard disk or memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both internal storage units and external storage devices of the computer device. The computer-readable storage medium stores the computer program and the other programs and data the computer device requires. It may also be used to temporarily store data that has been output or is to be output.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalents, and improvements made within the spirit and principles of the invention fall within its scope of protection.

Claims (8)

1. A resume layout dividing method based on the LayoutLMv3 model, characterized by comprising the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-labeled resumes;
S2: running inference on an unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume;
S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text in each section;
wherein step S1 specifically comprises the following steps:
S11: converting the resume into an image format, enclosing each title in the resume in a rectangular box, and representing the position of each title's box in the resume by a quadruple (x, y, box_width, box_height), where x is the abscissa of the box's top-left vertex, y is the ordinate of the box's top-left vertex, box_width is the width of the box, and box_height is the height of the box;
S12: labeling the position of each resume title with the quadruple of S11, writing the labels into a JSON file, inputting the JSON file into the LayoutLMv3 model together with the resume titles to fine-tune the model, and obtaining the fine-tuned model parameters;
and step S2 specifically comprises the following steps:
S21: converting the unlabeled resume into an image format, obtaining the name, length and width of the unlabeled resume, storing them in JSON format, and inputting them together with the resume image into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing it in JSON format.
2. The resume layout dividing method based on the LayoutLMv3 model according to claim 1, wherein step S3 specifically comprises the following steps:
S31: obtaining the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into several sections according to that information, and taking the title text of each title section, together with the text between it and the next adjacent title section, as the text content of that title section;
S32: extracting the text content of each title section with an OCR (optical character recognition) algorithm, taking the first line of the extracted text of each section as the section's title, and performing the final section division using the title as the keyword.
3. The resume layout dividing method based on the LayoutLMv3 model according to claim 2, wherein dividing the sections in S32 using the titles as keywords specifically comprises the following steps:
S321: based on the layout and content of resumes, predefining the following 7 sections: basic information, work experience, educational background, project experience, self-assessment, awards and certificates, and skills, whose section labels are respectively BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL;
S322: listing a keyword list for each section, matching the text of each detected resume title against the keywords in the lists, and, when any keyword matches, assigning the title and its content to the section corresponding to that keyword.
4. The resume layout dividing method based on the LayoutLMv3 model according to claim 3, wherein in S322, for the basic information section, if the actual resume contains no title text corresponding to that section, the content before the first title on the first page of the resume is taken as the content of the section.
5. The resume layout dividing method based on the LayoutLMv3 model according to claim 1, further comprising the following step:
S4: visually displaying the title detection result on the corresponding resume.
6. The resume layout dividing method based on the LayoutLMv3 model according to claim 5, wherein S4 is specifically: for the resume title position information obtained in S2, drawing the rectangular box of each title at the corresponding position in the resume using the Python programming language.
7. A resume layout dividing system based on the LayoutLMv3 model, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the resume layout dividing method according to any one of claims 1 to 6 when executing the computer program.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded and executed by a processor so that a computer device having the processor performs the method of any one of claims 1 to 6.
CN202311087110.1A | 2023-08-28 | 2023-08-28 | Resume layout dividing method, system and storage medium based on LayoutLMv3 model | Active | CN117058699B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311087110.1A | 2023-08-28 | 2023-08-28 | Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202311087110.1A | 2023-08-28 | 2023-08-28 | Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Publications (2)

Publication Number | Publication Date
CN117058699A | 2023-11-14
CN117058699B | 2024-04-19

Family

ID=88653361

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202311087110.1A | Active | CN117058699B (en) | 2023-08-28 | 2023-08-28 | Resume layout dividing method, system and storage medium based on LayoutLMv3 model

Country Status (1)

Country | Link
CN (1) | CN117058699B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107145584A (en) * | 2017-05-10 | 2017-09-08 | 西南科技大学 | A kind of resume analytic method based on n gram models
CN108874928A (en) * | 2018-05-31 | 2018-11-23 | 平安科技(深圳)有限公司 | Resume data information analyzing and processing method, device, equipment and storage medium
CN110020327A (en) * | 2019-04-16 | 2019-07-16 | 上海大易云计算股份有限公司 | A kind of resume resolution system based on vertical search engine
CN114708595A (en) * | 2022-03-15 | 2022-07-05 | 灵犀量子(北京)医疗科技有限公司 | Image document structured analysis method, system, electronic device, and storage medium
CN115661836A (en) * | 2022-11-08 | 2023-01-31 | 太保科技有限公司 | Automatic correction method, device and system and readable storage medium
CN115984886A (en) * | 2022-12-19 | 2023-04-18 | 中国平安人寿保险股份有限公司 | Table information extraction method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20230092559A1 (en) * | 2021-09-17 | 2023-03-23 | American Family Mutual Insurance Company, S.I. | Systems and methods for unstructured data processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking; Huang, Yupan, Lv, Tengchao, Cui, Lei, Lu, Yutong; Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: Association for Computing Machinery: 4083-4091; 20220722; full text *
Resume parsing based on a novel text block segmentation method; 祖石诚; 王修来; 曹阳; 张玉韬; 梁珊; 计算机科学 (Computer Science); 20200615 (Issue S1); full text *

Also Published As

Publication number Publication date
CN117058699A (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant