WO2020252919A1 - Method and apparatus for resume recognition, computer device and storage medium - Google Patents

Method and apparatus for resume recognition, computer device and storage medium

Info

Publication number
WO2020252919A1
WO2020252919A1 (PCT/CN2019/103268, CN2019103268W)
Authority
WO
WIPO (PCT)
Prior art keywords
resume
lstm
dnlp
text block
text
Prior art date
Application number
PCT/CN2019/103268
Other languages
English (en)
Chinese (zh)
Inventor
石明川
姚飞
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020252919A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the field of computers, in particular to a method and device for identifying resumes, computer equipment, and storage media.
  • Resume recognition is a form of semi-structured text recognition. Because semi-structured text lacks the natural word-order cues of traditional unstructured text, it is difficult to recognize.
  • Resume recognition systems in the prior art are keyword-based, relying on keywords such as "person's name", "mobile phone number", and "work history". If these keywords do not appear in the semi-structured text, a traditional resume recognition system cannot recognize the corresponding corpus.
  • To match such keywords, regular expressions are usually used.
  • In addition, the variety of resume formats brings recognition difficulties. For example, the person-name keyword in a resume is followed by the person's name, but the name itself raises a series of problems such as varying word counts, mixed Chinese and English, and embedded spaces.
  • Moreover, a resume may include multiple names, multiple time periods, and so on, and work experience is often confused with project experience during recognition because this part of a resume has no unified format. This leads to a very low resume recognition rate, so manual screening is required.
  • embodiments of the present application provide a method and device for identifying resumes, computer equipment, and storage media.
  • An embodiment of the present application provides a method for recognizing a resume, the method comprising: receiving a target resume to be recognized; inputting the target resume into a deep neural linguistic programming (DNLP) system, where the DNLP system is obtained by training a bidirectional long short-term memory recurrent neural network (BI-LSTM-CRF) model; using the DNLP system to determine the resume template used by the target resume; and extracting feature information in the target resume according to the resume template.
  • In an embodiment, before the target resume is input into the DNLP system, the method further includes: determining a plurality of resume samples; and using the plurality of resume samples to train the initial neural network of the BI-LSTM-CRF model to obtain the DNLP system.
  • Using the multiple resume samples to train the initial neural network of the BI-LSTM-CRF model includes: using a supervised classification method to segment the resume text of each resume sample to obtain multiple text blocks that correspond to manual labels, where each text block corresponds to a category attribute in the resume; performing word segmentation on each text block and extracting its feature words; and training the initial neural network of the BI-LSTM-CRF model using the text blocks and the corresponding feature words.
  • Dividing the resume text of each resume sample by means of supervised classification includes: dividing each resume sample into the following resume texts: self-introduction, education experience, work experience, learning experience, and project experience; and marking the resume text with label information.
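  • As a purely hypothetical illustration of such labeled text blocks (the sample text and label strings below are invented, not taken from the original disclosure):

```python
# Hypothetical training pairs: (text block, manual label), where each
# label is one of the five resume-text categories listed above.
labeled_blocks = [
    ("自我介绍：五年后端开发经验，熟悉分布式系统", "self-introduction"),
    ("2012-2016 北京大学 计算机科学与技术",       "education experience"),
    ("2016-2019 某科技公司 后端工程师",            "work experience"),
]
```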
  • Feature words can be extracted with TF-IDF. The term frequency of word t_i in text block d_j is tf_{i,j} = n_{i,j} / Σ_k n_{k,j}, where n is a positive integer greater than 1, n_{i,j} is the number of occurrences of the current word in the text block d_j, the denominator is the sum of the numbers of occurrences of all words in d_j, and k ranges over all words. The inverse document frequency is idf_i = log(|D| / |{j : t_i ∈ d_j}|), where |D| is the total number of files in the resume sample and |{j : t_i ∈ d_j}| is the number of files containing the word t_i. The TF-IDF weight is tf_{i,j} · idf_i.
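  • As an illustrative sketch only (not part of the original disclosure; the function and variable names are assumptions), this TF-IDF computation can be written in plain Python:

```python
import math
from collections import Counter

def tf_idf(text_blocks):
    """Compute TF-IDF weights for segmented text blocks.

    text_blocks: list of word lists, one list per text block d_j.
    Returns one {word: weight} dict per block.
    """
    D = len(text_blocks)                        # |D|: total number of files
    # |{j : t_i in d_j}|: number of blocks containing each word t_i
    df = Counter(w for block in text_blocks for w in set(block))
    result = []
    for block in text_blocks:
        counts = Counter(block)                 # n_{i,j}
        total = sum(counts.values())            # sum over k of n_{k,j}
        result.append({
            w: (n / total) * math.log(D / df[w])  # tf_{i,j} * idf_i
            for w, n in counts.items()
        })
    return result
```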
  • Training the initial neural network of the BI-LSTM-CRF model using the text blocks and the corresponding feature words includes: in the BI layer of the BI-LSTM-CRF model, using a pre-trained or randomly initialized embedding matrix to map each word in a sentence of the text block from a one-hot vector to a low-dimensional dense word vector.
  • The following maximum log-likelihood function is used to process the sample data: log P(y_x | x) = score(x, y_x) − log(Σ_{y′} exp(score(x, y′))), where (x, y_x) is a training sample.
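  • For illustration only, the log-likelihood above is a log-sum-exp over sequence scores. The sketch below assumes the scores are already available; a real CRF computes the normalizer with the forward algorithm rather than by enumerating every sequence:

```python
import numpy as np

def crf_log_likelihood(score_gold, scores_all):
    """log P(y_x | x) = score(x, y_x) - log(sum over y' of exp(score(x, y'))).

    score_gold: score of the gold tag sequence y_x for input x.
    scores_all: 1-D array of scores for every candidate sequence y'
                (a toy enumeration; a real CRF uses the forward algorithm).
    """
    scores_all = np.asarray(scores_all, dtype=float)
    m = scores_all.max()                           # stabilize the log-sum-exp
    log_z = m + np.log(np.exp(scores_all - m).sum())
    return score_gold - log_z
```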
  • an embodiment of the present application provides a device for recognizing resumes.
  • The device includes: a receiving module for receiving a target resume to be recognized; an input module for inputting the target resume into a deep neural linguistic programming (DNLP) system, where the DNLP system is obtained by training a bidirectional long short-term memory recurrent neural network (BI-LSTM-CRF) model; a determining module for determining, by means of the DNLP system, the resume template used by the target resume; and an extraction module for extracting feature information in the target resume according to the resume template.
  • The device further includes: a determination module, configured to determine a plurality of resume samples before the input module inputs the target resume into the DNLP system; and a training module, configured to use the multiple resume samples to train the initial neural network of the BI-LSTM-CRF model to obtain the DNLP system.
  • The training module includes: a segmentation unit for segmenting the resume text of each resume sample in a supervised classification manner to obtain multiple text blocks that correspond to manual labels, where each text block corresponds to a category attribute in the resume; an extraction unit for performing word segmentation on each text block and extracting its feature words; and a training unit for training the initial neural network of the BI-LSTM-CRF model using the text blocks and the corresponding feature words.
  • The segmentation unit includes a segmentation subunit for dividing each resume sample into the following resume texts: self-introduction, education experience, work experience, learning experience, and project experience, and for marking the resume text with label information.
  • Feature words are extracted with TF-IDF as defined above, where |D| is the total number of files in the resume sample.
  • The training module further includes: a first processing unit configured to use a pre-trained or randomly initialized embedding matrix in the BI layer of the BI-LSTM-CRF model to map each word in a sentence of the text block from a one-hot vector to a low-dimensional dense word vector, with dropout applied before the next layer to alleviate overfitting; and a second processing unit used in the LSTM layer of the BI-LSTM-CRF model to extract sentence features, taking each feature-word sequence of a sentence as the input of each time step of the bidirectional LSTM and then splicing, position by position, the hidden state sequence output by the forward LSTM with the hidden states output by the reverse LSTM.
  • The third processing unit further includes a processing subunit for processing sample data using the following maximum log-likelihood function: log P(y_x | x) = score(x, y_x) − log(Σ_{y′} exp(score(x, y′))), where (x, y_x) is a training sample.
  • a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
  • an electronic device including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
  • FIG. 1 is a hardware structure block diagram of a computer terminal for identifying resumes according to an embodiment of the present application;
  • Figure 2 is a flowchart of a method for identifying resumes according to an embodiment of the present application
  • FIG. 3 is a flowchart of training a BI-LSTM-CRF model in an embodiment of the application
  • Fig. 4 is a structural block diagram of a device for identifying resumes according to an embodiment of the present application.
  • FIG. 1 is a hardware structural block diagram of a computer terminal for identifying resumes according to an embodiment of the present application.
  • The computer terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data.
  • the aforementioned computer terminal may also include a transmission device 106 and an input/output device 108 for communication functions.
  • FIG. 1 is only for illustration, and does not limit the structure of the foregoing computer terminal.
  • The computer terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
  • the memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the method for identifying resumes in the embodiment of the present application.
  • The processor 102 runs the computer programs stored in the memory 104 to execute various functional applications and data processing, that is, to implement the above method.
  • the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory remotely provided with respect to the processor 102, and these remote memories may be connected to the computer terminal 10 via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is used to receive or send data via a network.
  • the above-mentioned specific examples of the network may include a wireless network provided by the communication provider of the computer terminal 10.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is used to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of the method for identifying a resume according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
  • Step S202: Receive a target resume to be identified;
  • Step S204: Input the target resume into a deep neural linguistic programming (DNLP) system, where the DNLP system is obtained by training a bidirectional long short-term memory recurrent neural network (BI-LSTM-CRF) model;
  • Step S206: Use the DNLP system to determine the resume template used by the target resume; the resume template includes multiple physical sections;
  • the resume template of this embodiment refers to the resume style or resume layout adopted by the target resume.
  • In different resume templates, the content of the same physical section (such as work experience) is distributed in different positions of the text. By determining the resume template used by the target resume, the position of each piece of text content to be recognized in the target resume can therefore be determined;
  • Step S208: Extract feature information in the target resume according to the resume template.
  • Through the above steps, the target resume is input into the deep neural linguistic programming (DNLP) system, the DNLP system is used to determine the resume template used by the target resume, and the feature information in the target resume is finally extracted according to that template, which solves the technical problem of the low resume recognition rate in the related art.
  • After the feature information in the target resume has been extracted according to the resume template, it can be re-typeset according to a template specified by the user to facilitate centralized collection. Alternatively, only the feature information the user cares about (for example, the graduating school) is extracted, bound to the resume identifier or other key information, and then formatted and displayed, reducing the time users spend locating key information in a complicated resume.
  • In this embodiment, before the target resume is input into the deep neural linguistic programming (DNLP) system, the method further includes: determining a plurality of resume samples; and using the plurality of resume samples to train the initial neural network of the BI-LSTM-CRF model to obtain the DNLP system.
  • Fig. 3 is a flowchart of training the BI-LSTM-CRF model according to an embodiment of the present application.
  • As shown in Fig. 3, training the initial neural network of the BI-LSTM-CRF model using the multiple resume samples includes:
  • Step S302: Use a supervised classification method to segment the resume text of each resume sample to obtain multiple text blocks that correspond to manual labels, where each text block corresponds to a category attribute in the resume;
  • Dividing the resume text of each resume sample by means of supervised classification includes: dividing each resume sample into the following resume texts (physical sections): self-introduction, education experience, work experience, learning experience, and project experience; and marking the resume text with label information.
  • In a resume sample, a complete resume is composed of multiple resume texts, but for resumes with different templates the same resume text may be distributed in different positions; this step is the process of learning each entity section of the resume.
  • Step S304: Perform word segmentation on each text block and extract its feature words; key feature words can be extracted by performing word segmentation and synonym matching on the marked text blocks, as in the sketch below.
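  • A minimal sketch of this step, assuming the jieba tokenizer (the original does not name a specific segmenter) and a hypothetical synonym-normalization table:

```python
import jieba  # one possible Chinese word-segmentation library

SYNONYMS = {"工作经验": "工作经历"}  # hypothetical synonym table

def segment(block_text):
    """Segment a marked text block into words and normalize synonyms."""
    return [SYNONYMS.get(w, w) for w in jieba.lcut(block_text)]
```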
  • Here |D| is the total number of files in the resume sample (see the TF-IDF definition above). TF-IDF can filter out common words, keep important words, and thereby extract feature words.
  • The BI-LSTM-CRF model is trained on the text blocks of each category to obtain a recognition model for each category: a character-based Bi-LSTM-CRF can be used, in which, for example, B-PER and I-PER represent the first and non-first characters of a person's name, and B-SCH and I-SCH represent the first and non-first characters of a school name, to train the recognition model of each entity module.
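  • A hypothetical tagging example (the sentence is invented) showing the character-level scheme described above:

```python
# Character-level BIO tags: B-PER/I-PER for a person's name,
# B-SCH/I-SCH for a school name, O for everything else.
chars = ["张", "三", "毕", "业", "于", "北", "京", "大", "学"]
tags  = ["B-PER", "I-PER", "O", "O", "O", "B-SCH", "I-SCH", "I-SCH", "I-SCH"]
assert len(chars) == len(tags)
```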
  • the neural network of the BI-LSTM-CRF model includes a three-layer logical structure. Training the initial neural network of the BI-LSTM-CRF model using the text block and the corresponding feature words includes:
  • In the first layer, each word in a sentence of the text block is mapped from a one-hot vector to a low-dimensional dense word vector using a pre-trained or randomly initialized embedding matrix, and dropout is applied before entering the next layer to alleviate overfitting;
  • In the LSTM layer of the BI-LSTM-CRF model, sentence features are extracted: each feature-word sequence of a sentence is used as the input of each time step of the bidirectional LSTM, and the hidden state sequence output by the forward LSTM and the hidden states output by the reverse LSTM at each position are then spliced position by position to obtain a complete hidden state sequence, from which p_i is output, where p_i is the probability of belonging to tag i;
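  • A minimal PyTorch-style sketch of these first two layers (the class name, layer sizes, and dropout rate are assumptions, not taken from the original):

```python
import torch.nn as nn

class BiLstmEncoder(nn.Module):
    """Embedding layer with dropout, followed by a bidirectional LSTM."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_tags=9):
        super().__init__()
        # Maps each one-hot word index to a low-dimensional dense vector;
        # the matrix may be randomly initialized or pre-trained.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.dropout = nn.Dropout(0.5)   # alleviates overfitting
        # Forward and reverse hidden states are concatenated per position,
        # hence the 2 * hidden_dim input to the emission layer.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True,
                            batch_first=True)
        self.emit = nn.Linear(2 * hidden_dim, num_tags)   # p_i per tag

    def forward(self, token_ids):        # token_ids: (batch, seq_len)
        x = self.dropout(self.embed(token_ids))
        h, _ = self.lstm(x)              # (batch, seq_len, 2 * hidden_dim)
        return self.emit(h)              # per-position emission scores
```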
  • A softmax over p_i alone makes only local decisions in this embodiment; that is, the tag of the current word is not influenced by the tags of the other words.
  • The following maximum log-likelihood function is used to process the sample data: log P(y_x | x) = score(x, y_x) − log(Σ_{y′} exp(score(x, y′))), where (x, y_x) is a training sample.
  • The score of the entire sequence in this embodiment equals the sum of the scores of the individual positions, and the score of each position comes from two parts: one part is determined by the p_i output by the LSTM, and the other part is determined by the transition matrix A of the CRF.
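  • A sketch of this two-part scoring (emission plus transition); the array shapes and names are assumptions:

```python
import numpy as np

def sequence_score(emissions, transitions, tags):
    """score(x, y): sum of per-position scores.

    emissions:   (seq_len, num_tags) array of the p_i output by the LSTM.
    transitions: (num_tags, num_tags) CRF transition matrix A, where
                 A[a, b] scores a move from tag a to tag b.
    tags:        gold tag index for each position.
    """
    score = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score
```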
  • The method according to the above embodiment can be implemented by software plus the necessary general hardware platform; it can, of course, also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to execute the method described in each embodiment of the present application.
  • a device for recognizing resumes is also provided.
  • The device is used to implement the above-mentioned embodiments and preferred implementations; what has already been explained will not be repeated.
  • As used below, the term "module" may refer to a combination of software and/or hardware that implements predetermined functions.
  • Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceived.
  • Fig. 4 is a structural block diagram of a device for identifying resumes according to an embodiment of the present application. As shown in Fig. 4, the device includes:
  • the receiving module 40 is used to receive the target resume to be identified
  • the input module 42 is configured to input the target resume into a deep neural linguistic programming (DNLP) system, where the DNLP system is obtained by training a bidirectional long short-term memory recurrent neural network (BI-LSTM-CRF) model;
  • the determining module 44 is configured to use the DNLP system to determine the resume template used by the target resume;
  • the extraction module 46 is configured to extract feature information in the target resume according to the resume template.
  • The device further includes: a determination module, configured to determine a plurality of resume samples before the input module inputs the target resume into the DNLP system; and a training module, configured to use the multiple resume samples to train the initial neural network of the BI-LSTM-CRF model to obtain the DNLP system.
  • The training module includes: a segmentation unit for segmenting the resume text of each resume sample in a supervised classification manner to obtain multiple text blocks that correspond to manual labels, where each text block corresponds to a category attribute in the resume; an extraction unit for performing word segmentation on each text block and extracting its feature words; and a training unit for training the initial neural network of the BI-LSTM-CRF model using the text blocks and the corresponding feature words.
  • The segmentation unit includes a segmentation subunit for dividing each resume sample into the following resume texts: self-introduction, education experience, work experience, learning experience, and project experience, and for marking the resume text with label information.
  • Feature words are extracted with TF-IDF as defined above, where |D| is the total number of files in the resume sample.
  • The training module further includes: a first processing unit configured to use a pre-trained or randomly initialized embedding matrix in the BI layer of the BI-LSTM-CRF model to map each word in a sentence of the text block from a one-hot vector to a low-dimensional dense word vector, with dropout applied before the next layer to alleviate overfitting; and a second processing unit used in the LSTM layer of the BI-LSTM-CRF model to extract sentence features, taking each feature-word sequence of a sentence as the input of each time step of the bidirectional LSTM and then splicing, position by position, the hidden state sequence output by the forward LSTM with the hidden states output by the reverse LSTM.
  • The third processing unit further includes a processing subunit for processing sample data using the following maximum log-likelihood function: log P(y_x | x) = score(x, y_x) − log(Σ_{y′} exp(score(x, y′))), where (x, y_x) is a training sample.
  • each of the above modules can be implemented by software or hardware.
  • It can be implemented in the following manner, but is not limited to this: the above modules are all located in the same processor; or the above modules are located, in any combination, in different processors.
  • the disclosed system, device, and method may be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • The above-mentioned software functional unit is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute part of the steps of the method described in each embodiment of the present application.
  • The aforementioned storage media include: USB flash drives, mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
  • the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is set to execute the steps in any one of the foregoing method embodiments when running.
  • the foregoing storage medium may be configured to store a computer program for executing the following steps:
  • The foregoing storage medium may include, but is not limited to: USB flash drives, read-only memory (ROM), random access memory (RAM), mobile hard disks, magnetic disks, optical disks, and various other media that can store computer programs.
  • the embodiment of the present application also provides an electronic device, including a memory and a processor, the memory is stored with a computer program, and the processor is configured to run the computer program to execute the steps in any of the foregoing method embodiments.
  • the aforementioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the aforementioned processor, and the input-output device is connected to the aforementioned processor.
  • the foregoing processor may be configured to execute the following steps through a computer program:

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a method and apparatus for resume recognition, a computer device, and a storage medium. The method comprises the steps of: receiving a target resume to be recognized (S202); inputting the target resume into a deep neural linguistic programming (DNLP) system, the DNLP system being obtained by training a bidirectional long short-term memory recurrent neural network (BI-LSTM-CRF) model (S204); determining, by means of the DNLP system, the resume template used by the target resume (S206); and extracting feature information from the target resume according to the resume template (S208). The method solves the technical problem of the low resume recognition rate in the prior art.
PCT/CN2019/103268 2019-06-20 2019-08-29 Method and apparatus for resume recognition, computer device and storage medium WO2020252919A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910534813.1 2019-06-20
CN201910534813.1A CN110442841B (zh) Method and apparatus for resume recognition, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020252919A1 (fr)

Family

ID=68428319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103268 WO2020252919A1 (fr) 2019-06-20 2019-08-29 Method and apparatus for resume recognition, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110442841B (fr)
WO (1) WO2020252919A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541125A (zh) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 Sequence labeling model training method and apparatus, and electronic device
CN112733550A (zh) * 2020-12-31 2021-04-30 科大讯飞股份有限公司 Knowledge-distillation-based language model training method, and text classification method and apparatus
CN112767106A (zh) * 2021-01-14 2021-05-07 中国科学院上海高等研究院 Automated auditing method and system, computer-readable storage medium, and auditing device
CN113076245A (zh) * 2021-03-30 2021-07-06 山东英信计算机技术有限公司 Risk assessment method, apparatus and device for open-source licenses, and storage medium
CN113361253A (zh) * 2021-05-28 2021-09-07 北京金山数字娱乐科技有限公司 Recognition model training method and apparatus
CN113627139A (zh) * 2021-08-11 2021-11-09 平安国际智慧城市科技股份有限公司 Enterprise declaration form generation method, apparatus, device and storage medium
CN114821603A (zh) * 2022-03-03 2022-07-29 北京百度网讯科技有限公司 Bill recognition method and apparatus, electronic device and storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143517B (zh) * 2019-12-30 2023-09-05 浙江阿尔法人力资源有限公司 Candidate label prediction method, apparatus, device and storage medium
CN111144373B (zh) * 2019-12-31 2020-12-04 广州市昊链信息科技股份有限公司 Information recognition method and apparatus, computer device and storage medium
CN111428480B (zh) * 2020-03-06 2023-11-21 广州视源电子科技股份有限公司 Resume recognition method, apparatus, device and storage medium
CN111460084A (zh) * 2020-04-03 2020-07-28 中国建设银行股份有限公司 Resume structured-extraction model training method and system
CN111598462B (zh) * 2020-05-19 2022-07-12 厦门大学 Resume screening method for campus recruitment
CN111966785B (zh) * 2020-07-31 2023-06-20 中国电子科技集团公司第二十八研究所 Resume information extraction method based on stacked sequence labeling
CN113297845B (zh) * 2021-06-21 2022-07-26 南京航空航天大学 Resume block classification method based on a multi-level bidirectional recurrent neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159962A (zh) * 2015-08-21 2015-12-16 北京全聘致远科技有限公司 Job recommendation method and apparatus, resume recommendation method and apparatus, and recruitment platform
US20170300565A1 (en) * 2016-04-14 2017-10-19 Xerox Corporation System and method for entity extraction from semi-structured text documents
CN107943911A (zh) * 2017-11-20 2018-04-20 北京大学深圳研究院 Data extraction method and apparatus, computer device, and readable storage medium
CN108664474A (zh) * 2018-05-21 2018-10-16 众安信息技术服务有限公司 Deep-learning-based resume parsing method
CN109710930A (zh) * 2018-12-20 2019-05-03 重庆邮电大学 Deep-neural-network-based Chinese resume parsing method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6874002B1 (en) * 2000-07-03 2005-03-29 Magnaware, Inc. System and method for normalizing a resume
US20070005549A1 (en) * 2005-06-10 2007-01-04 Microsoft Corporation Document information extraction with cascaded hybrid model
CN107862303B (zh) * 2017-11-30 2019-04-26 平安科技(深圳)有限公司 Information recognition method for table images, electronic device and readable storage medium
CN108897726B (zh) * 2018-05-03 2021-11-16 平安科技(深圳)有限公司 Electronic resume creation method, storage medium and server
CN109214382A (zh) * 2018-07-16 2019-01-15 顺丰科技有限公司 CRNN-based bill information recognition algorithm, device and storage medium
CN109214385B (zh) * 2018-08-15 2021-06-08 腾讯科技(深圳)有限公司 Data collection method, data collection apparatus and storage medium
CN109635288B (zh) * 2018-11-29 2023-05-23 东莞理工学院 Deep-neural-network-based resume extraction method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159962A (zh) * 2015-08-21 2015-12-16 北京全聘致远科技有限公司 Job recommendation method and apparatus, resume recommendation method and apparatus, and recruitment platform
US20170300565A1 (en) * 2016-04-14 2017-10-19 Xerox Corporation System and method for entity extraction from semi-structured text documents
CN107943911A (zh) * 2017-11-20 2018-04-20 北京大学深圳研究院 Data extraction method and apparatus, computer device, and readable storage medium
CN108664474A (zh) * 2018-05-21 2018-10-16 众安信息技术服务有限公司 Deep-learning-based resume parsing method
CN109710930A (zh) * 2018-12-20 2019-05-03 重庆邮电大学 Deep-neural-network-based Chinese resume parsing method

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541125A (zh) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 Sequence labeling model training method and apparatus, and electronic device
CN112541125B (zh) * 2020-12-25 2024-01-12 北京百度网讯科技有限公司 Sequence labeling model training method and apparatus, and electronic device
CN112733550A (zh) * 2020-12-31 2021-04-30 科大讯飞股份有限公司 Knowledge-distillation-based language model training method, and text classification method and apparatus
CN112733550B (zh) * 2020-12-31 2023-07-25 科大讯飞股份有限公司 Knowledge-distillation-based language model training method, and text classification method and apparatus
CN112767106A (zh) * 2021-01-14 2021-05-07 中国科学院上海高等研究院 Automated auditing method and system, computer-readable storage medium, and auditing device
CN112767106B (zh) * 2021-01-14 2023-11-07 中国科学院上海高等研究院 Automated auditing method and system, computer-readable storage medium, and auditing device
CN113076245A (zh) * 2021-03-30 2021-07-06 山东英信计算机技术有限公司 Risk assessment method, apparatus and device for open-source licenses, and storage medium
CN113361253A (zh) * 2021-05-28 2021-09-07 北京金山数字娱乐科技有限公司 Recognition model training method and apparatus
CN113361253B (zh) * 2021-05-28 2024-04-09 北京金山数字娱乐科技有限公司 Recognition model training method and apparatus
CN113627139A (zh) * 2021-08-11 2021-11-09 平安国际智慧城市科技股份有限公司 Enterprise declaration form generation method, apparatus, device and storage medium
CN114821603A (zh) * 2022-03-03 2022-07-29 北京百度网讯科技有限公司 Bill recognition method and apparatus, electronic device and storage medium
CN114821603B (zh) * 2022-03-03 2023-09-01 北京百度网讯科技有限公司 Bill recognition method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN110442841A (zh) 2019-11-12
CN110442841B (zh) 2024-02-02

Similar Documents

Publication Publication Date Title
WO2020252919A1 (fr) Method and apparatus for resume recognition, computer device and storage medium
CN110569366B (zh) Entity relation extraction method and apparatus for text, and storage medium
CN109145153B (zh) Intention category recognition method and apparatus
CN107729309B (zh) Deep-learning-based Chinese semantic analysis method and apparatus
WO2021068339A1 (fr) Text classification method and device, and computer-readable storage medium
WO2021068329A1 (fr) Chinese named-entity recognition method, device and computer-readable storage medium
CN110502621A (zh) Question answering method and apparatus, computer device and storage medium
CN108304373B (zh) Semantic dictionary construction method and apparatus, storage medium and electronic apparatus
CN110909549B (zh) Method, apparatus and storage medium for sentence segmentation of classical Chinese
WO2021135469A1 (fr) Machine-learning-based information extraction method, apparatus, computer device and medium
CN110851599B (zh) Automatic scoring method for Chinese compositions and teaching-assistance system
CN110276023B (zh) POI change event discovery method, apparatus, computing device and medium
CN112395395B (zh) Text keyword extraction method, apparatus, device and storage medium
CN112101041B (zh) Semantic-similarity-based entity relation extraction method, apparatus, device and medium
CN108804423B (zh) Medical text feature extraction and automatic matching method and system
CN112287069B (zh) Speech-semantics-based information retrieval method and apparatus, and computer device
WO2021051574A1 (fr) English text sequence labeling method and system, and computer device
WO2022222300A1 (fr) Open relation extraction method and apparatus, electronic device and storage medium
CN105760363B (zh) Word sense disambiguation method and apparatus for text files
CN110852106A (zh) Artificial-intelligence-based named entity processing method and apparatus, and electronic device
CN112215008A (zh) Semantic-understanding-based entity recognition method and apparatus, computer device and medium
CN108550065A (zh) Comment data processing method, apparatus and device
Panda Developing an efficient text pre-processing method with sparse generative Naive Bayes for text mining
CN112131881B (zh) Information extraction method and apparatus, electronic device and storage medium
WO2021189920A1 (fr) Medical text group object determination method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933488

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933488

Country of ref document: EP

Kind code of ref document: A1