CN117058699A - Resume layout dividing method, system and storage medium based on LayoutLMv3 model - Google Patents
- Publication number
- CN117058699A
- Authority
- CN
- China
- Prior art keywords
- resume
- title
- layoutlmv3
- layout
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
- G06Q10/1053—Employment or hiring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a resume layout dividing method based on a LayoutLMv3 model, comprising the following steps. S1: fine-tuning a LayoutLMv3 object detection model on self-annotated resumes. S2: running inference on an unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume. S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text information in each section. The invention improves the accuracy of resume section division and more accurately reflects how information is organized in the resume.
Description
Technical Field
The invention relates to the technical field of resume parsing, and in particular to a resume layout dividing method, system and storage medium based on a LayoutLMv3 model.
Background
In recruitment, a recruiter reads a job seeker's resume to judge whether the candidate has the ability and experience that match the position. Extracting the resume content in a structured form, section by section, lets the recruiter quickly grasp the job seeker's personal information and improves the efficiency of resume screening.
At present, structured extraction of resume information relies mainly on text keywords. For example, the resume data information analysis and processing method proposed in patent CN108874928A applies keyword matching directly to the entire text of the resume. That method does not consider how keywords appearing in the body text interfere with title recognition, so it may divide sections incorrectly.
Disclosure of Invention
The invention aims to provide a resume layout dividing method, system and storage medium based on a LayoutLMv3 model, in which the LayoutLMv3 model is applied to resume parsing. The resume is first divided into sections at a fine-grained level by means of its titles, which improves the accuracy of section division and more accurately reflects how information is organized in the resume. The data is then structured on that basis so that the resume is convenient to use and store in downstream tasks, which reduces the difficulty of locating and parsing sections across diverse resumes. In addition, by combining the image information of the titles, obtained through visual features, with their textual semantic information, the section areas of different resumes can be identified accurately, improving the accuracy and recall of resume parsing as a whole.
To achieve the above purpose, the following technical scheme is adopted:
A resume layout dividing method based on a LayoutLMv3 model comprises the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-annotated resumes;
S2: running inference on an unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume;
S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text information in each section.
Further, step S1 specifically includes the following steps:
S11: converting the resume into an image format, enclosing each title in the resume in a rectangular box, and representing the position of each title's box in the resume by a quadruple (x, y, box_width, box_height), wherein x is the abscissa of the top-left vertex of the box, y is the ordinate of the top-left vertex of the box, box_width is the width of the box, and box_height is the height of the box;
S12: annotating the position information of the resume titles in the quadruple form of S11, writing the annotations into a JSON file, and inputting the annotations together with the resume titles into the LayoutLMv3 model to fine-tune it and obtain the fine-tuned model parameters.
Further, step S2 specifically includes the following steps:
S21: converting the unlabeled resume into an image format, obtaining the name, length and width of the resume image, storing this information in JSON format, and inputting it together with the resume image data into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing it in JSON format.
Further, step S3 specifically includes the following steps:
S31: obtaining the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into several sections according to the title positions, and taking the title text of each section, together with the text between that title and the next adjacent title, as the text content of the section;
S32: extracting the text content of each section with an OCR (optical character recognition) algorithm, taking the first line of the extracted text in each section as the title of the section, and performing the final section division using the title as the keyword.
Further, dividing the sections in S32 using the titles as keywords specifically includes the following steps:
S321: based on the layout and content of resumes, defining the following 7 sections in advance: basic information, work experience, educational background, project experience, self-evaluation, awards and certificates, and skills, wherein the section labels corresponding to the 7 sections are BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL respectively;
S322: for each section, defining a keyword list, and for each detected resume title text, matching the keywords in the list against it; when any keyword matches, the title and its content are assigned to the section corresponding to that keyword.
Further, in S322, for the basic information section, if the actual resume does not contain title text corresponding to this section, the content before the first title on the first page of the resume is used as the content of the section.
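The keyword matching of S321 and S322, including the basic information fallback, can be sketched as follows. This is a minimal illustration only: the patent does not disclose the keyword lists, so the English keywords in SECTION_KEYWORDS are assumptions, as are the function names classify_title and label_sections and the input shape (a list of text lines per detected title, with the title as the first line).

```python
# Hypothetical keyword lists; the patent names the 7 labels but not the keywords.
SECTION_KEYWORDS = {
    "BASIC_INFORMATION": ["basic information", "personal information"],
    "WORK_EXPERIENCE": ["work experience", "employment"],
    "EDUCATION_BACKGROUND": ["education"],
    "PROJECT_EXPERIENCE": ["project"],
    "SELF_ASSESSMENT": ["self-evaluation", "self-assessment"],
    "REWARD_CERTIFICATES": ["award", "certificate"],
    "SKILL": ["skill"],
}


def classify_title(title_text):
    """Return the first section label whose keyword occurs in the title, else None."""
    text = title_text.lower()
    for label, keywords in SECTION_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return None


def label_sections(preamble, sections):
    """Assign each preliminary section to a label using its first line as the title.

    preamble: text lines found before the first title on the first page.
    sections: list of line lists; the first line of each list is the title.
    """
    labeled = {}
    for lines in sections:
        if not lines:
            continue
        label = classify_title(lines[0])
        if label is not None:
            labeled.setdefault(label, []).extend(lines[1:])
    # Fallback: with no basic information title, use the content before the first title.
    if "BASIC_INFORMATION" not in labeled and preamble:
        labeled["BASIC_INFORMATION"] = list(preamble)
    return labeled


labeled = label_sections(
    ["Zhang San", "Phone: 000-0000"],
    [["Education", "BSc, 2015-2019"], ["Work Experience", "Engineer at A"]],
)
```

Titles that match no keyword are dropped in this sketch; a real system would need a policy for them, which the source does not specify.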
Further, the resume layout dividing method based on the LayoutLMv3 model further comprises the following step:
S4: visually displaying the title detection result on the corresponding resume.
Further, step S4 specifically includes: for the resume title position information obtained in S2, drawing the rectangular box of each title at the corresponding position in the resume using the Python programming language.
A resume layout dividing system based on the LayoutLMv3 model comprises a memory, a processor and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the above resume layout dividing method when executing the computer program.
There is also provided a computer-readable storage medium storing a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor performs the above method.
By adopting the above scheme, the invention has the following beneficial effects:
1) Annotated resumes are used to fine-tune the LayoutLMv3 object detection model, and the fine-tuned model runs inference on new resumes to find the area of each section, instead of converting the resume to plain text for parsing as conventional resume parsing does. This gives higher section division accuracy on diverse resumes. In the subsequent parsing step, named entity recognition is used to parse the detailed information of each section; for example, the "work experience" can be presented automatically in segments, so that essentially no information is lost;
2) Resumes in various formats are converted into the JPG image format, turning a text task into a visual task, so the method is suitable for resume data of various formats and sizes;
3) The method has broad application prospects, particularly in the human resources industry, where it avoids manually entering information into a system and reduces the error rate of manual entry.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of resume labels according to an embodiment of the present invention;
FIG. 3 is a diagram showing the result of dividing the content of a resume layout according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the specific embodiments.
Referring to FIGS. 1 to 3, the invention provides a resume layout dividing method based on a LayoutLMv3 model, which comprises the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-annotated resumes.
The LayoutLMv3 object detection model is based on the Transformer architecture and is pre-trained on 11 million scanned document images. To better recognize and segment resume data, LayoutLMv3 is fine-tuned on self-annotated resume data to obtain a LayoutLMv3 object detection model adapted to resumes. In one embodiment, the fine-tuning steps are as follows:
S11: converting the resume into an image format, enclosing each title in the resume in a rectangular box, and representing the position of each title's box in the resume by a quadruple (x, y, box_width, box_height), wherein x is the abscissa of the top-left vertex of the box, y is the ordinate of the top-left vertex of the box, box_width is the width of the box, and box_height is the height of the box;
S12: annotating the position information of the resume titles in the quadruple form of S11, writing the annotations into a JSON file, and inputting the annotations together with the resume titles into the LayoutLMv3 model to fine-tune it and obtain the fine-tuned model parameters.
This step mainly fine-tunes the LayoutLMv3 model. In this embodiment, the resume is first converted into the JPG image format, which turns the text task into a visual task, accommodates resume data of various formats and sizes, and saves computer memory resources when storing images. The position in the image of each section title in the resume is then represented by a quadruple (x, y, box_width, box_height), where x is the abscissa of the top-left vertex of the rectangular box, y is the ordinate of the top-left vertex, box_width is the width of the box, and box_height is the height of the box. Next, the title position information in the training data is put into one-to-one correspondence with the resumes using this quadruple form, the annotations are written into a JSON file, and the JSON file and the resume titles are input together into the LayoutLMv3 model. The fine-tuning computation runs on a GPU. FIG. 2 shows the annotation result for a single resume.
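The annotation step can be illustrated with a short sketch that writes title quadruples to JSON. The source specifies only the quadruple (x, y, box_width, box_height) and the use of a JSON file; the field names "resume", "titles" and "bbox", the function name make_annotation, and the sample coordinates below are assumptions for illustration.

```python
import json


def make_annotation(resume_name, title_boxes):
    """Build one annotation record for a resume image.

    title_boxes: iterable of (x, y, box_width, box_height) quadruples, where
    (x, y) is the top-left vertex of a title's rectangular box in pixels.
    """
    titles = []
    for x, y, box_width, box_height in title_boxes:
        if box_width <= 0 or box_height <= 0:
            raise ValueError("box_width and box_height must be positive")
        titles.append({"bbox": [x, y, box_width, box_height]})
    # Hypothetical schema: one record per resume image, one entry per title.
    return {"resume": resume_name, "titles": titles}


record = make_annotation(
    "resume_001.jpg",
    [(60, 120, 200, 32), (60, 480, 220, 32)],
)
annotation_json = json.dumps([record], ensure_ascii=False, indent=2)
```

Keeping one record per resume preserves the one-to-one correspondence between title positions and resumes that the embodiment describes.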
S2: running inference on an unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume.
In one embodiment, this step specifically includes:
S21: converting the unlabeled resume into an image format, obtaining the name, length and width of the resume image, storing this information in JSON format, and inputting it together with the resume image data into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing it in JSON format.
In this embodiment, the title position information of an unlabeled resume is obtained with the fine-tuned LayoutLMv3 object detection model. Once S1 is complete, the LayoutLMv3 model fine-tuned on resume images and its parameters are available. For a new unlabeled resume, information such as the name, length and width of the resume image is obtained and organized into JSON format, then supplied to the model together with the image data. The model parameters are loaded into the neural network, and model inference yields the coordinate positions of the resume titles in JSON format, from which the coordinate quadruples of the new resume's titles are obtained.
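The data handling around inference can be sketched as below. The model call itself is omitted; the sketch covers only building the image metadata record and reading title quadruples back from the stored output. The JSON field names, both function names and the sample prediction string are hypothetical, since the source does not give the schema.

```python
import json


def build_inference_input(image_name, width, height):
    """Serialize the image metadata described in S21 (field names are assumed)."""
    return json.dumps({"file_name": image_name, "width": width, "height": height})


def parse_title_boxes(prediction_json):
    """Read (x, y, box_width, box_height) quadruples from a stored prediction."""
    prediction = json.loads(prediction_json)
    return [tuple(item["bbox"]) for item in prediction["titles"]]


metadata_json = build_inference_input("resume_002.jpg", 1240, 1754)

# Hypothetical example of what a stored inference result might look like.
sample_prediction = '{"file_name": "resume_002.jpg", "titles": [{"bbox": [60, 118, 210, 30]}]}'
boxes = parse_title_boxes(sample_prediction)
```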
S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text information in each section.
In one embodiment, this step specifically includes:
S31: obtaining the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into several sections according to the title positions, and taking the title text of each section, together with the text between that title and the next adjacent title, as the text content of the section;
S32: extracting the text content of each section with an OCR (optical character recognition) algorithm, taking the first line of the extracted text in each section as the title of the section, and performing the final section division using the title as the keyword.
This step mainly extracts text information. In this embodiment, the section-level content of the resume is obtained from the title positions and an OCR algorithm. A resume typically spans several page images, and content may continue across pages. The title coordinates are therefore sorted by resume page and by the ordinate of the title box. After sorting, OCR is used to obtain the text in the resume; OCR also returns the coordinates of the recognized text, and by comparing these with the title ordinates, the content lying between two titles is assigned to the section of the preceding title. The first line of text in each section is then taken as its title, and the resume is divided into 7 sections: basic information, work experience, educational background, project experience, self-evaluation, awards and certificates, and skills. The section labels corresponding to the 7 sections are BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL respectively. A keyword list is defined for each section, and the keywords are matched against the text of each detected resume title; when any keyword matches, the title and its content are assigned to the section corresponding to that keyword. FIG. 3 is a schematic of the extracted section text, in which the content within one frame constitutes one section.
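The sorting and assignment described above can be sketched as follows, assuming each title is given as (page, x, y, box_width, box_height) and each OCR line as (page, y, text). These tuple shapes, the function name split_into_sections and the sample data are assumptions; the source states only that titles are sorted by page and ordinate and that text between two titles belongs to the preceding title.

```python
from bisect import bisect_right


def split_into_sections(title_boxes, ocr_lines):
    """Group OCR lines under the nearest preceding title.

    title_boxes: list of (page, x, y, box_width, box_height).
    ocr_lines: list of (page, y, text), y being the top of the text line.
    Returns (preamble, sections): lines before the first title, then one
    list of lines per title in reading order.
    """
    titles = sorted(title_boxes, key=lambda box: (box[0], box[2]))
    keys = [(page, y) for page, _x, y, _w, _h in titles]
    preamble = []
    sections = [[] for _ in titles]
    for page, y, text in sorted(ocr_lines, key=lambda line: (line[0], line[1])):
        # Index of the last title that starts at or above this line.
        index = bisect_right(keys, (page, y)) - 1
        if index < 0:
            preamble.append(text)
        else:
            sections[index].append(text)
    return preamble, sections


titles = [(1, 60, 480, 220, 32), (1, 60, 120, 200, 32), (2, 60, 300, 200, 32)]
lines = [
    (1, 40, "Zhang San"),
    (1, 122, "Education"),
    (1, 160, "BSc, 2015-2019"),
    (1, 482, "Work Experience"),
    (1, 520, "Engineer at A"),
    (2, 50, "Engineer at B"),
    (2, 302, "Skills"),
    (2, 340, "Python"),
]
preamble, sections = split_into_sections(titles, lines)
```

Comparing (page, y) pairs rather than y alone lets a section that starts on one page absorb the lines at the top of the next page, which matches the cross-page behaviour the embodiment calls for. The sketch assumes the OCR line for a title does not start above the detected title box.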
In addition, for the BASIC_INFORMATION section, actual resumes usually do not contain a matching keyword. The invention therefore proposes that, for a resume without a BASIC_INFORMATION keyword, the content before the first title on the first page of the resume is taken as the content of this section.
The invention further includes step S4: visually displaying the title detection result on the corresponding resume. In one embodiment, this is specifically: for the resume title position information obtained in S2, drawing the rectangular box of each title at the corresponding position in the resume using the Python programming language, for easy reading and review.
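The drawing in S4 amounts to converting each quadruple to corner coordinates and marking the box outline. The source does not name a drawing library, so the sketch below substitutes a plain 2D list of pixels as a stand-in for an image; a real pipeline would use an imaging library instead. The function names to_corners and draw_outline are hypothetical.

```python
def to_corners(box, image_width, image_height):
    """Convert (x, y, box_width, box_height) to inclusive corner coordinates,
    clamped to the image bounds."""
    x, y, box_width, box_height = box
    left = max(0, x)
    top = max(0, y)
    right = min(image_width - 1, x + box_width - 1)
    bottom = min(image_height - 1, y + box_height - 1)
    return left, top, right, bottom


def draw_outline(canvas, box):
    """Mark the outline of a title box on a 2D pixel grid (rows of 0/1 values)."""
    height = len(canvas)
    width = len(canvas[0])
    left, top, right, bottom = to_corners(box, width, height)
    if left > right or top > bottom:
        return canvas  # the box lies entirely outside the image
    for col in range(left, right + 1):
        canvas[top][col] = 1
        canvas[bottom][col] = 1
    for row in range(top, bottom + 1):
        canvas[row][left] = 1
        canvas[row][right] = 1
    return canvas


canvas = [[0] * 8 for _ in range(5)]
draw_outline(canvas, (1, 1, 4, 3))
```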
In addition, a resume layout dividing system based on a LayoutLMv3 model is provided, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the above resume layout dividing method when executing the computer program. A computer-readable storage medium is also provided, which stores a computer program adapted to be loaded and executed by a processor so that a computer device having the processor performs the above method.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The memory may be a hard disk, the computer's internal memory, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like.
The computer readable storage medium may be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the computer device. Further, the computer-readable storage medium may also include both internal storage units and external storage devices of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; all modifications, equivalents and alternatives falling within the spirit and principles of the invention are intended to be covered.
Claims (10)
1. A resume layout dividing method based on a LayoutLMv3 model, characterized by comprising the following steps:
S1: fine-tuning a LayoutLMv3 object detection model on self-annotated resumes;
S2: running inference on an unlabeled resume with the fine-tuned LayoutLMv3 object detection model to obtain the title position information of the unlabeled resume;
S3: dividing the unlabeled resume into sections based on the title position information obtained in S2 and an OCR algorithm, and extracting the text information in each section.
2. The resume layout dividing method based on the LayoutLMv3 model according to claim 1, wherein S1 specifically comprises the following steps:
S11: converting the resume into an image format, enclosing each title in the resume in a rectangular box, and representing the position of each title's box in the resume by a quadruple (x, y, box_width, box_height), wherein x is the abscissa of the top-left vertex of the box, y is the ordinate of the top-left vertex of the box, box_width is the width of the box, and box_height is the height of the box;
S12: annotating the position information of the resume titles in the quadruple form of S11, writing the annotations into a JSON file, and inputting the annotations together with the resume titles into the LayoutLMv3 model to fine-tune it and obtain the fine-tuned model parameters.
3. The resume layout dividing method based on the LayoutLMv3 model according to claim 2, wherein S2 specifically comprises the following steps:
S21: converting the unlabeled resume into an image format, obtaining the name, length and width of the resume image, storing this information in JSON format, and inputting it together with the resume image data into the fine-tuned LayoutLMv3 object detection model;
S22: loading the model parameters obtained in S12, obtaining the title position information of the unlabeled resume through model inference, and storing it in JSON format.
4. The resume layout dividing method based on the LayoutLMv3 model according to claim 1, wherein S3 specifically comprises the following steps:
S31: obtaining the title position information in each resume in top-to-bottom order, preliminarily dividing the resume into several sections according to the title positions, and taking the title text of each section, together with the text between that title and the next adjacent title, as the text content of the section;
S32: extracting the text content of each section with an OCR (optical character recognition) algorithm, taking the first line of the extracted text in each section as the title of the section, and performing the final section division using the title as the keyword.
5. The resume layout dividing method based on the LayoutLMv3 model according to claim 4, wherein dividing the sections in S32 using the titles as keywords specifically comprises the following steps:
S321: based on the layout and content of resumes, defining the following 7 sections in advance: basic information, work experience, educational background, project experience, self-evaluation, awards and certificates, and skills, wherein the section labels corresponding to the 7 sections are BASIC_INFORMATION, WORK_EXPERIENCE, EDUCATION_BACKGROUND, PROJECT_EXPERIENCE, SELF_ASSESSMENT, REWARD_CERTIFICATES and SKILL respectively;
S322: for each section, defining a keyword list, and for each detected resume title text, matching the keywords in the list against it; when any keyword matches, the title and its content are assigned to the section corresponding to that keyword.
6. The resume layout dividing method based on the LayoutLMv3 model according to claim 5, wherein in S322, for the basic information section, if the actual resume does not contain title text corresponding to this section, the content before the first title on the first page of the resume is taken as the content of the section.
7. The resume layout dividing method based on the LayoutLMv3 model according to claim 3, further comprising the following step:
S4: visually displaying the title detection result on the corresponding resume.
8. The resume layout dividing method based on the LayoutLMv3 model according to claim 7, wherein S4 specifically is: for the resume title position information obtained in S2, drawing the rectangular box of each title at the corresponding position in the resume using the Python programming language.
9. A resume layout dividing system based on a LayoutLMv3 model, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the resume layout dividing method according to any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311087110.1A CN117058699B (en) | 2023-08-28 | 2023-08-28 | Resume layout dividing method, system and storage medium based on LayoutLMv3 model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117058699A (en) | 2023-11-14 |
CN117058699B (en) | 2024-04-19 |
Family
ID=88653361
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311087110.1A (Active, CN117058699B) | Resume layout dividing method, system and storage medium based on LayoutLMv3 model | 2023-08-28 | 2023-08-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117058699B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145584A (en) * | 2017-05-10 | 2017-09-08 | 西南科技大学 | A kind of resume analytic method based on n gram models |
CN108874928A (en) * | 2018-05-31 | 2018-11-23 | 平安科技(深圳)有限公司 | Resume data information analyzing and processing method, device, equipment and storage medium |
CN110020327A (en) * | 2019-04-16 | 2019-07-16 | 上海大易云计算股份有限公司 | A kind of resume resolution system based on vertical search engine |
CN114708595A (en) * | 2022-03-15 | 2022-07-05 | 灵犀量子(北京)医疗科技有限公司 | Image document structured analysis method, system, electronic device, and storage medium |
CN115661836A (en) * | 2022-11-08 | 2023-01-31 | 太保科技有限公司 | Automatic correction method, device and system and readable storage medium |
US20230092559A1 (en) * | 2021-09-17 | 2023-03-23 | American Family Mutual Insurance Company, S.I. | Systems and methods for unstructured data processing |
CN115984886A (en) * | 2022-12-19 | 2023-04-18 | 中国平安人寿保险股份有限公司 | Table information extraction method, device, equipment and storage medium |
- 2023-08-28: CN application CN202311087110.1A, patent CN117058699B, status Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145584A (en) * | 2017-05-10 | 2017-09-08 | 西南科技大学 | A kind of resume analytic method based on n gram models |
CN108874928A (en) * | 2018-05-31 | 2018-11-23 | 平安科技(深圳)有限公司 | Resume data information analyzing and processing method, device, equipment and storage medium |
CN110020327A (en) * | 2019-04-16 | 2019-07-16 | 上海大易云计算股份有限公司 | A kind of resume resolution system based on vertical search engine |
US20230092559A1 (en) * | 2021-09-17 | 2023-03-23 | American Family Mutual Insurance Company, S.I. | Systems and methods for unstructured data processing |
CN114708595A (en) * | 2022-03-15 | 2022-07-05 | 灵犀量子(北京)医疗科技有限公司 | Image document structured analysis method, system, electronic device, and storage medium |
CN115661836A (en) * | 2022-11-08 | 2023-01-31 | 太保科技有限公司 | Automatic correction method, device and system and readable storage medium |
CN115984886A (en) * | 2022-12-19 | 2023-04-18 | 中国平安人寿保险股份有限公司 | Table information extraction method, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
Huang, Yupan; Lv, Tengchao; Cui, Lei; Lu, Yutong: "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking", Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal: Association for Computing Machinery, pp. 4083-4091, 22 July 2022 (2022-07-22) * |
Zu Shicheng; Wang Xiulai; Cao Yang; Zhang Yutao; Liang Shan: "Resume parsing based on a novel text block segmentation method", Computer Science (计算机科学), no. 1, 15 June 2020 (2020-06-15) * |
Also Published As
Publication number | Publication date |
---|---|
CN117058699B (en) | 2024-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111476227B (en) | Target field identification method and device based on OCR and storage medium | |
RU2699687C1 (en) | Detecting text fields using neural networks | |
US20200175095A1 (en) | Object recognition and tagging based on fusion deep learning models | |
US8494257B2 (en) | Music score deconstruction | |
US9099007B1 (en) | Computerized processing of pictorial responses in evaluations | |
CN111752557A (en) | Display method and device | |
CN112036295B (en) | Bill image processing method and device, storage medium and electronic equipment | |
CN110796145B (en) | Multi-certificate segmentation association method and related equipment based on intelligent decision | |
US11017266B2 (en) | Aggregated image annotation | |
WO2013039063A1 (en) | Answer processing device, answer processing method, recording medium, and seal | |
CN112381087A (en) | Image recognition method, apparatus, computer device and medium combining RPA and AI | |
CN115937887A (en) | Method and device for extracting document structured information, electronic equipment and storage medium | |
CN107168635A (en) | Information demonstrating method and device | |
CN113780116A (en) | Invoice classification method and device, computer equipment and storage medium | |
CN113936187A (en) | Text image synthesis method and device, storage medium and electronic equipment | |
CN108369647B (en) | Image-based quality control | |
CN117058699B (en) | Resume layout dividing method, system and storage medium based on LayoutLMv3 model | |
CN110363245B (en) | Online classroom highlight screening method, device and system | |
CN110941947A (en) | Document editing method and device, computer storage medium and terminal | |
US20180018893A1 (en) | Method and system for identifying marked response data on a manually filled paper form | |
US20230214451A1 (en) | System and method for finding data enrichments for datasets | |
CN103870793B (en) | The monitoring method and device of paper media's advertisement | |
JP5846378B2 (en) | Information management method and information management system | |
CN117437649B (en) | File signing method, device, computer equipment and storage medium | |
CN114138214B (en) | Method and device for automatically generating print file and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |