CN110489739A - Named-entity extraction method and device for public security case texts and confession texts based on a CRF algorithm - Google Patents


Info

Publication number: CN110489739A
Authority: CN (China)
Prior art keywords: case, text, confession, public security, data
Legal status: Granted
Application number: CN201910593309.9A
Other languages: Chinese (zh)
Other versions: CN110489739B (en)
Inventors: 麦家健, 莫毅宇, 朱凌峰
Current assignee: Dongguan Shuihuida Data Co Ltd
Original assignee: Dongguan Shuihuida Data Co Ltd
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2019-11-22
Events:
Application filed by Dongguan Shuihuida Data Co Ltd
Priority to CN201910593309.9A
Publication of CN110489739A
Application granted
Publication of CN110489739B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00 Adapting or protecting infrastructure or their operation
    • Y02A30/60 Planning or developing urban green infrastructure

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to the field of natural language processing and discloses a CRF-based named-entity extraction method and device for public security case texts and confession texts. The method comprises: obtaining public security case texts and the corresponding case confessions, merging each case text with its confession into a single text record, and storing the records in a data table for labeling; annotating entity words in the merged text records; performing part-of-speech tagging and extracting features from the annotations to build a basic feature template; training a CRF model on the basic feature template and the corpus of case texts and confessions to obtain a named-entity extraction model; building an information data table of city street conditions within the public security jurisdiction; and using the extraction model to recognize newly added case texts and confessions, mapping the results onto the city street information table for information extraction, which improves work efficiency.

Description

Named-entity extraction method and device for public security case texts and confession texts based on a CRF algorithm
Technical field
The present invention relates to the field of natural language processing, and in particular discloses a CRF-based named-entity extraction method and device for public security case texts and confession texts.
Background art
With the rapid development of natural language processing, the technology has been widely applied in industries such as search engines. Public security organs have accumulated a large volume of case text data through long-term information processing, and public security departments must devote ever more manpower to analyzing and classifying case texts and confession texts.
At present, because numerous cases and confessions are described and recorded by different police officers, the terminology is subjective and unstandardized. To find relevant information accurately, public security personnel must spend more time and effort, which greatly increases their workload and labor costs during lookup and significantly reduces work efficiency. Moreover, when personnel need to extract particular case information, they must retrieve the case and browse its entire contents; the key information of a case cannot be grasped at a glance, which makes case analysis inefficient.
There is therefore a need in the industry for a method and device that solve the above problems.
Summary of the invention
To overcome the shortcomings of the prior art, the object of the present invention is to provide a CRF-based named-entity extraction method and device for public security case texts and confession texts, enabling public security personnel to obtain relevant case information quickly and accurately in their daily work.
To achieve the above object, the present invention adopts the following solution.
A CRF-based named-entity extraction method for public security case texts and confession texts, comprising:
Obtaining the data of public security case texts and case confessions, merging each case text with its corresponding confession into one text record, and storing the records in a data table for labeling;
Annotating entity words in the text records formed from the case texts and case confessions;
Performing part-of-speech tagging, and extracting features from the annotations to build a basic feature template;
Inputting the basic feature template and the corpus of public security case texts and case confessions into a CRF model for training, to obtain a named-entity extraction model;
Building an information data table of city street conditions within the public security jurisdiction;
Recognizing newly added case texts and confessions with the named-entity extraction model, and mapping the results onto the city street information table for information extraction.
Further, performing part-of-speech tagging and extracting features from the annotations to build the basic feature template comprises:
Segmenting the corpus with the jieba word segmenter, and performing part-of-speech tagging with jieba.posseg;
Based on the segmentation and part-of-speech tags, labeling each token with the BIEOS scheme to obtain its label, where B marks the beginning of an entity, I the inside, E the end, O an unrelated (non-entity) token, and S a single-token entity;
Extracting features from the corpus to build the basic feature template, where the features include part-of-speech features, entity-word features and labels.
Further, the basic feature template is a user-defined U-gram (unigram) feature template, built as follows:
The feature template:
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-2,1]
U06:%x[-1,1]
U07:%x[0,1]
U08:%x[1,1]
U09:%x[2,1]
U10:%x[-2,0]/%x[-1,0]/%x[0,0]
U11:%x[-1,0]/%x[0,0]/%x[1,0]
U12:%x[0,0]/%x[1,0]/%x[2,0]
U13:%x[-2,0]/%x[-1,1]
U14:%x[0,0]/%x[1,0]
U15:%x[-1,0]/%x[0,0]
U16:%x[1,1]/%x[2,1]
U17:%x[-1,1]/%x[0,1]
U18:%x[0,1]/%x[1,1]
Here, U00 to U09 each denote the token feature at the corresponding position, and U10 to U18 denote combinations of token features at several positions;
Part-of-speech features, entity-word features and labels are filled into the token positions specified by the user-defined feature template to form the training corpus.
Further, the entity words include crime location, lost items, tools involved in the case, and modus operandi; the parts of speech include nouns, verbs, adjectives, pronouns and prepositions.
Further, the method comprises preprocessing the data before it is input into the CRF model for training, specifically:
Using public security system data, building a crime-location data table, a lost-item type data table and a case-involved tool data table, respectively;
Converting the corpus of public security case texts and case confessions into the input format of the CRF model, where each corpus row has the format <word, part-of-speech feature, lost-item feature, tool feature, location feature, label>;
Traversing each word in the corpus, and marking the lost-item, tool and location features as 1 if the word appears in the corresponding data table and 0 otherwise.
Further, the city street information includes street addresses and the corresponding houses, units, venues and personnel information.
A mobile device, comprising:
a case-text and confession-text integration module, configured to obtain the data of public security case texts and case confessions and merge each case text with its corresponding confession into one text record;
a database module, configured to record the city street information;
a processor, adapted to execute program instructions;
a storage device, adapted to store program instructions, the program instructions being adapted to be loaded and executed by the processor to implement the above CRF-based named-entity extraction method for public security case texts and confession texts.
A computer-readable storage device storing a computer program, the computer program being executed by a processor to implement the above CRF-based named-entity extraction method for public security case texts and confession texts.
A CRF-based named-entity extraction system for public security case texts and confession texts, comprising a server;
the server comprises a processor and a storage device;
the processor is adapted to execute program instructions;
the storage device is adapted to store program instructions, the program instructions being adapted to be loaded and executed by the processor to implement the above CRF-based named-entity extraction method for public security case texts and confession texts.
Beneficial effects of the present invention: a CRF-based extraction method and device for public security case texts and confession texts is provided. Public security case texts and case confessions are obtained, each case text is merged with its confession into one text record, and the records are stored in a data table for entity-word annotation and part-of-speech tagging. After annotation, features are extracted from the labels to build a basic feature template, and the template together with the case texts and confession information is used to train a CRF model, yielding a general named-entity extraction model. An information table of city street conditions within the public security jurisdiction is also built. When new case texts and confessions arrive, they are passed to the extraction model, which identifies their key information and makes case information easier for public security personnel to query; the results are also mapped onto the city street information table and fed back to personnel, making the extracted case information more complete and accurate. Because the general extraction model is learned from training samples, it can adapt to differences in the terminology that different police officers use in descriptions and records, find relevant information accurately, and greatly improve case-handling efficiency.
Brief description of the drawings
Fig. 1 is the flow diagram of the embodiment of the present invention.
Fig. 2 is the schematic device of the embodiment of the present invention.
Fig. 3 is the schematic diagram of the corpus training format of the embodiment of the present invention.
Fig. 4 is a schematic diagram of BIEOS labeling in the embodiment of the present invention.
Fig. 5 is a schematic diagram of address information extraction in the embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art understand the invention, it is further described below with reference to embodiments and drawings; the content of the embodiments does not limit the invention.
The present invention provides a CRF-based named-entity extraction method for public security case texts and confession texts. As shown in Fig. 1, to build a model suited to public security case texts and case confessions, a certain amount of sample training must first be performed on existing case texts and confessions in the public security system, so that the model adapts to differences in how different police officers describe and record cases and can retrieve the corresponding information accurately, improving work efficiency. Therefore, the data of public security case texts and case confessions are first obtained from the public security system, and each case text is merged with its corresponding confession into one text record, so that case text and confession are unified; to facilitate subsequent labeling, the records are stored in a data table.
Entity words are annotated in the text records formed from the case texts and case confessions. The entity words mainly include crime location, lost items, tools involved in the case, and modus operandi. These are the key items of information commonly needed in daily police work; extracting them, rather than the full case text or long passages, means public security staff no longer have to pick them out of long passages by hand, which increases efficiency. These entity types are only one embodiment: other entity types may be added as required by different public security offices. Entity words may be annotated manually, automatically by the system, or by the system with manual verification; no restriction is imposed here.
Part-of-speech tagging mainly serves to distinguish parts of speech including but not limited to nouns, verbs, adjectives, pronouns and prepositions, for example when the same word can act either as a noun or as a verb, so that such ambiguity does not cause confusion when the resulting named-entity extraction model is applied later.
As shown in Figs. 3 and 4, for part-of-speech tagging the corpus is first segmented with the jieba segmenter, that is, a long sentence is divided into multiple tokens. For example, the sentence "preparing to go home by bus at the Dongkeng crossing junction in Liaobu Town, Dongguan City" is segmented into "Dongguan City / Liao / bu / Town / Dongkeng / crossing / junction / at / prepare / by bus / go home", and part-of-speech tagging is then performed with jieba.posseg. Next, based on the segmentation and part-of-speech tags, each token is labeled with the BIEOS scheme, where B marks the beginning of an entity, I the inside, E the end, O an unrelated token, and S a single-token entity. For example, in Fig. 3 the token "Dongguan City" is labeled B-PLACE. Labeling in this way facilitates subsequent feature extraction and speeds up construction of the basic feature template.
Features are then extracted from the corpus to build the basic feature template. This template is equivalent to an empty template that only records which features are to be trained; these features include the part-of-speech features, entity-word features and labels mentioned above.
In this embodiment, the basic feature template is a user-defined U-gram (unigram) feature template. Each template macro has the form %x[row,col]; because U-gram templates are used, each template line begins with the letter U. Here row is the offset relative to the current token and col is the column index in the training data (column 0 holds the word and column 1 its part of speech). Each line below is one template:
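The merging-and-storage step can be sketched in Python with the standard-library sqlite3 module. This is a minimal sketch under stated assumptions: the patent gives no schema, so the pairing key case_id, the table name to_label and its columns are hypothetical, and a case text and its confession(s) are simply joined with a newline.

```python
import sqlite3


def merge_case_records(cases, confessions):
    """Merge each case text with the confession(s) sharing its case_id.

    cases:       iterable of (case_id, case_text)
    confessions: iterable of (case_id, confession_text)
    Confessions whose case_id has no matching case are ignored in this sketch.
    """
    by_case = {}
    for case_id, text in confessions:
        by_case.setdefault(case_id, []).append(text)
    return {case_id: "\n".join([text] + by_case.get(case_id, []))
            for case_id, text in cases}


def store_for_labeling(conn, merged):
    """Write merged records into a (hypothetical) table awaiting annotation."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS to_label ("
        "case_id TEXT PRIMARY KEY, merged_text TEXT NOT NULL, "
        "labeled INTEGER NOT NULL DEFAULT 0)"  # 0 = not yet annotated
    )
    conn.executemany(
        "INSERT OR REPLACE INTO to_label (case_id, merged_text) VALUES (?, ?)",
        sorted(merged.items()),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    merged = merge_case_records(
        [("A001", "case text A"), ("A002", "case text B")],
        [("A001", "confession A"), ("A999", "orphan confession")],
    )
    store_for_labeling(conn, merged)
    print(conn.execute("SELECT * FROM to_label ORDER BY case_id").fetchall())
    # [('A001', 'case text A\nconfession A', 0), ('A002', 'case text B', 0)]
```

The labeled flag is one possible way to track which records annotators have finished; the patent only requires that the merged text be stored in a table for labeling.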
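The BIEOS labeling step can be sketched as a small pure-Python function. It assumes the annotator supplies entity spans as token-index ranges (a hypothetical input format; the patent does not say how annotations are stored) and produces type-suffixed labels such as B-PLACE, as in Fig. 3. The span chosen in the demo is illustrative only, and overlapping spans are not checked.

```python
def bieos_tags(tokens, spans):
    """Label segmented tokens with the BIEOS scheme.

    tokens: list of tokens from word segmentation
    spans:  list of (start, end, entity_type) token ranges, end exclusive
    Returns one label per token: B-/I-/E- for multi-token entities,
    S- for single-token entities, and O for everything else.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span {(start, end)}")
        if end - start == 1:
            tags[start] = f"S-{etype}"
            continue
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{etype}"
        tags[end - 1] = f"E-{etype}"
    return tags


if __name__ == "__main__":
    tokens = ["Dongguan City", "Liao", "bu", "Town", "Dongkeng", "crossing",
              "junction", "at", "prepare", "by bus", "go home"]
    # Hypothetical annotation: tokens 0-7 form one PLACE entity.
    for tok, tag in zip(tokens, bieos_tags(tokens, [(0, 8, "PLACE")])):
        print(f"{tok}\t{tag}")
```

In a full pipeline the tokens would come from jieba.lcut(text), and jieba.posseg.cut(text) yields pairs whose .word and .flag attributes give each token and its part-of-speech tag; jieba is left out here so the sketch runs with the standard library alone.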
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-2,1]
U06:%x[-1,1]
U07:%x[0,1]
U08:%x[1,1]
U09:%x[2,1]
U10:%x[-2,0]/%x[-1,0]/%x[0,0]
U11:%x[-1,0]/%x[0,0]/%x[1,0]
U12:%x[0,0]/%x[1,0]/%x[2,0]
U13:%x[-2,0]/%x[-1,1]
U14:%x[0,0]/%x[1,0]
U15:%x[-1,0]/%x[0,0]
U16:%x[1,1]/%x[2,1]
U17:%x[-1,1]/%x[0,1]
U18:%x[0,1]/%x[1,1]
Wherein, U00 to U09 respectively indicates the feature participle of respective position;U10 to U18 then indicates to segment the language formed by feature Material, and above-mentioned number is to refer in a generation, is not actual position coordinates, should not be a limitation of the present invention;By part of speech Feature, entity word feature and label substitute into position and the corpus composition of self-defined template assigned characteristics participle.
In order to illustrate more clearly of, in conjunction with BIEOS mark and segment illustrate but and as limitation of the invention, such as
The crossing Dong Keng intersection preparation in the town Dongguan City LiaoPo goes home to be robbed with knife by bus
B I I I I I I I E O O O O B E
It is mentioned before the meaning of B I E O S therein, does not make tired chat herein.
Assuming that current word is " Dong Keng ", corresponding " Dong Keng " word of U02:%x [0,0], then U00:%x [- 2,0] indicates " control " word, U01:%x [1,0] indicates that " crossing " word, U05:%x [- 1,0]/%x [0,0]/%x [1,0] indicate " town/Dong Keng/crossing ", such as such It pushes away.
The basic feature template and the corpus of public security case texts and case confessions are then input into the CRF model. This process resembles filling in a template: the content of the case texts and confessions is filled in as prescribed by the basic feature template, and sample training on the result yields the named-entity extraction model.
In this embodiment, to better fit the CRF model, the data are preprocessed before being input into the CRF model for training, specifically:
Using public security system data, a crime-location data table place_data, a lost-item type data table hings_data and a case-involved tool data table tools_data are built respectively;
As shown in Fig. 3, the corpus of case texts and case confessions is converted into the input format of the CRF model, where each corpus row has the format <word, part-of-speech feature, lost-item feature, tool feature, location feature, label>. Each word in the corpus is traversed: a lost-item, tool or location feature is marked 1 if the word appears in the corresponding data table and 0 otherwise, which makes the information more intuitive.
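The preprocessing step can be sketched as follows. The three sets are hypothetical stand-ins for place_data, hings_data and tools_data (the patent builds them from public security system data), the label names other than PLACE are illustrative, and the output follows the CRF++ training-file convention: one token per line, whitespace-separated columns with the label last, and a blank line between sentences. Because CRF++ splits columns on whitespace, the sketch rejects tokens that contain spaces.

```python
# Hypothetical stand-ins for the place_data, hings_data and tools_data tables.
PLACE_DATA = {"Dongguan", "Dongkeng"}
HINGS_DATA = {"phone", "wallet"}
TOOLS_DATA = {"knife"}


def flag(word, table):
    """'1' if the word appears in the given data table, else '0'."""
    return "1" if word in table else "0"


def to_crf_rows(sentence):
    """sentence: list of (word, pos, label) triples.

    Returns rows ordered <word, POS, lost-item, tool, place, label>.
    """
    rows = []
    for word, pos, label in sentence:
        if any(ch.isspace() for ch in word):
            raise ValueError(f"token contains whitespace: {word!r}")
        rows.append((word, pos, flag(word, HINGS_DATA),
                     flag(word, TOOLS_DATA), flag(word, PLACE_DATA), label))
    return rows


def write_crf_file(sentences, path):
    """Write sentences in a CRF++-style training layout (tab-separated)."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for row in to_crf_rows(sentence):
                f.write("\t".join(row) + "\n")
            f.write("\n")  # blank line marks the end of a sentence


if __name__ == "__main__":
    import os
    import tempfile
    sent = [("Dongkeng", "ns", "S-PLACE"), ("robbed", "v", "O"),
            ("knife", "n", "S-TOOL"), ("phone", "n", "S-ITEM")]
    path = os.path.join(tempfile.mkdtemp(), "train.data")
    write_crf_file([sent], path)
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

If CRF++ is the toolkit used (the patent does not name one, although its template syntax matches), training and tagging would then run as crf_learn template train.data model and crf_test -m model test.data, where the template file holds the U00 to U18 lines.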
Once the named-entity extraction model has been built, newly added cases can be processed and queried directly. However, to ensure that the extracted information is more accurate and meets the rigorous requirements of police work, an information data table of city street conditions within the public security jurisdiction is also built. The table is based on public security system data, and the street information includes street addresses and the corresponding houses, units, venues and personnel. It mainly holds the nationally promoted "Two Standards, Four Actuals" information: the two standards are the standard address database and the standard operational map; the four actuals are actual population, actual housing, actual units and actual facilities.
That is, after the key information identified by the extraction model is obtained, it can be mapped onto the city street table and the system performs a cross-check. For example, suppose the model extracts the crime location, lost items, tools and modus operandi of an incident that is a bank robbery, while the street table records the location as a residential dwelling: the system flags the inconsistency and re-extracts the case information, which greatly improves accuracy. More specifically, as shown in Fig. 5, the extracted address "Huancheng East Road ***, Tangxia Town, Dongguan City, Guangdong Province" (desensitized because the data are sensitive) is mapped to the public security department's "Two Standards, Four Actuals" table, and the information in the table shows that the property at this address is actually a rental house.
In addition, as shown in Fig. 2, the present invention further provides a mobile device, comprising:
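The post-processing described above, decoding predicted BIEOS labels into entities and cross-checking them against the street table, can be sketched as below. The table layout (standard address mapped to a premises type), the incident-to-premises rule and all addresses are hypothetical; the patent describes the check only through the bank-robbery example.

```python
def bieos_to_entities(tokens, tags, sep=""):
    """Group BIEOS-tagged tokens into (entity_type, text) pairs.

    sep="" joins tokens without a separator, as suits Chinese text.
    """
    entities, buf, cur = [], [], None
    for tok, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            entities.append((etype, tok))
            buf, cur = [], None
        elif prefix == "B":
            buf, cur = [tok], etype
        elif prefix in ("I", "E") and cur == etype:
            buf.append(tok)
            if prefix == "E":
                entities.append((etype, sep.join(buf)))
                buf, cur = [], None
        else:
            # "O", or I/E without a matching B: discard any open entity.
            buf, cur = [], None
    return entities


# Hypothetical street table: standard address -> premises type.
STREET_TABLE = {"Example Road 1": "residential", "Example Road 2": "bank"}

# Hypothetical plausibility rule: premises types where an incident type can occur.
PLAUSIBLE_PREMISES = {"bank robbery": {"bank"}}


def cross_check(address, incident_type):
    """Map an extracted address onto the street table and test consistency."""
    premises = STREET_TABLE.get(address)
    if premises is None:
        return "address not found", None
    allowed = PLAUSIBLE_PREMISES.get(incident_type)
    if allowed is not None and premises not in allowed:
        return "inconsistent, re-extract", premises
    return "consistent", premises


if __name__ == "__main__":
    tokens = ["Example", "Road", "1", "robbed", "knife"]
    tags = ["B-PLACE", "I-PLACE", "E-PLACE", "O", "S-TOOL"]
    entities = bieos_to_entities(tokens, tags, sep=" ")
    print(entities)  # [('PLACE', 'Example Road 1'), ('TOOL', 'knife')]
    print(cross_check(dict(entities)["PLACE"], "bank robbery"))
    # ('inconsistent, re-extract', 'residential')
```

Returning a status string rather than raising keeps the check usable in a batch job, where flagged cases can be queued for re-extraction as the description suggests.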
a case-text and confession-text integration module, configured to obtain the data of public security case texts and case confessions and merge each case text with its corresponding confession into one text record;
a database module, configured to record the city street information;
a processor, adapted to execute program instructions;
a storage device, adapted to store program instructions, the program instructions being adapted to be loaded and executed by the processor to implement the above CRF-based named-entity extraction method for public security case texts and confession texts.
The present invention further provides a computer-readable storage device storing a computer program, characterized in that the computer program is executed by a processor to implement the above CRF-based named-entity extraction method for public security case texts and confession texts.
The present invention also provides a CRF-based named-entity extraction system for public security case texts and confession texts, characterized by comprising a server;
the server comprises a processor and a storage device;
the processor is adapted to execute program instructions;
the storage device is adapted to store program instructions, the program instructions being adapted to be loaded and executed by the processor to implement the above CRF-based named-entity extraction method for public security case texts and confession texts.
The above is only a preferred embodiment of the present invention. Those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and scope of application, and the content of this specification should not be construed as limiting the present invention.

Claims (9)

1. A CRF-based named-entity extraction method for public security case texts and confession texts, characterized by comprising:
obtaining the data of public security case texts and case confessions, merging each case text with its corresponding confession into one text record, and storing the records in a data table for labeling;
annotating entity words in the text records formed from the case texts and case confessions;
performing part-of-speech tagging, and extracting features from the annotations to build a basic feature template;
inputting the basic feature template and the corpus of public security case texts and case confessions into a CRF model for training, to obtain a named-entity extraction model;
building an information data table of city street conditions within the public security jurisdiction;
recognizing newly added case texts and confessions with the named-entity extraction model, and mapping the results onto the city street information table for information extraction.
2. The CRF-based named-entity extraction method for public security case texts and confession texts according to claim 1, characterized in that performing part-of-speech tagging and extracting features from the annotations to build the basic feature template comprises:
segmenting the corpus with the jieba word segmenter, and performing part-of-speech tagging with jieba.posseg;
based on the segmentation and part-of-speech tags, labeling each token with the BIEOS scheme to obtain its label, where B marks the beginning of an entity, I the inside, E the end, O an unrelated (non-entity) token, and S a single-token entity;
extracting features from the corpus to build the basic feature template, where the features include part-of-speech features, entity-word features and labels.
3. The CRF-based named-entity extraction method for public security case texts and confession texts according to claim 2, characterized in that the basic feature template is a user-defined U-gram (unigram) feature template, built as follows:
building the user-defined feature template:
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-2,1]
U06:%x[-1,1]
U07:%x[0,1]
U08:%x[1,1]
U09:%x[2,1]
U10:%x[-2,0]/%x[-1,0]/%x[0,0]
U11:%x[-1,0]/%x[0,0]/%x[1,0]
U12:%x[0,0]/%x[1,0]/%x[2,0]
U13:%x[-2,0]/%x[-1,1]
U14:%x[0,0]/%x[1,0]
U15:%x[-1,0]/%x[0,0]
U16:%x[1,1]/%x[2,1]
U17:%x[-1,1]/%x[0,1]
U18:%x[0,1]/%x[1,1]
Wherein, U00 to U09 respectively indicates the feature participle of respective position;U10 to U18 then indicates to segment the language formed by feature Material;
Part of speech feature, entity word feature and label are substituted into position and the corpus group of user-defined feature template assigned characteristics participle At.
4. a kind of name of described in any item public security cases and confession text based on CRF algorithm is extracted according to claim 1 Method, which is characterized in that the entity word includes crime place place, loss article, case-involving tool, case-involving means;Institute's predicate Property includes noun, verb, adjective, pronoun, preposition.
5. a kind of name extracting method of public security case and confession text based on CRF algorithm according to claim 4, It is characterized in that, further includes being trained to be pre-processed in input CRF algorithm model, specifically:
Using public security system data, crime place locality data table, loss type of goods tables of data and case-involving tool are constructed respectively Tables of data;
The corpus of public security case text and case confession is converted to the input format of CRF algorithm model, each of them corpus Format is expressed as<word, part of speech feature, loses article characteristics, case-involving tool characteristics, Site characterization, and label>;
Each of corpus word is traversed, if loss article characteristics, case-involving tool characteristics, Site characterization appear in its corresponding number According to being then labeled as 1 in table, 0 is labeled as if not occurring.
6. a kind of name of public security case and confession text based on CRF algorithm according to claim 1-5 mentions Take method, which is characterized in that the information of the avenue situation includes avenue address information and its corresponding house, list Position, place, personal information.
7. a kind of mobile device characterized by comprising
Case text and confession text data module are integrated, for obtaining the data information of public security case text and case confession, Case text and case confession correspondence are integrally formed a text data;
Database module, for recording the information of avenue situation;
Processor is adapted for carrying out program instruction;
Storage device, is suitable for storage program instruction, and described program instruction is suitable for having processor to load and executing to realize that right is wanted Seek the name extracting method of the public security case and confession text described in 1-6 any one based on CRF algorithm.
8. a kind of computer readable storage devices, are stored with computer program, which is characterized in that the computer program is processed Device execution is mentioned with the name for realizing the public security case as claimed in any one of claims 1 to 6 based on CRF algorithm and confession text Take method.
9. a kind of name extraction system of public security case and confession text based on CRF algorithm, which is characterized in that server;
Server includes processor and storage equipment;
Processor is adapted for carrying out program instruction;
Equipment is stored, storage program instruction is suitable for, described program instruction is suitable for being loaded by processor and being executed to realize that right is wanted Seek the name extracting method of the public security case and confession text described in 1 to 6 any one based on CRF algorithm.
CN201910593309.9A 2019-07-03 2019-07-03 Naming extraction method and device for public security cases and oral text based on CRF algorithm Active CN110489739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910593309.9A CN110489739B (en) 2019-07-03 2019-07-03 Naming extraction method and device for public security cases and oral text based on CRF algorithm

Publications (2)

Publication Number Publication Date
CN110489739A true CN110489739A (en) 2019-11-22
CN110489739B CN110489739B (en) 2023-06-20

Family

ID=68546041

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925919A (en) * 2021-03-03 2021-06-08 曲阜师范大学 Knowledge graph driven personalized job layout method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070046982A1 (en) * 2005-08-23 2007-03-01 Hull Jonathan J Triggering actions with captured input in a mixed media environment
US20120330971A1 (en) * 2011-06-26 2012-12-27 Itemize Llc Itemized receipt extraction using machine learning
US20150186361A1 (en) * 2013-12-25 2015-07-02 Kabushiki Kaisha Toshiba Method and apparatus for improving a bilingual corpus, machine translation method and apparatus
CN109190110A (en) * 2018-08-02 2019-01-11 厦门快商通信息技术有限公司 A kind of training method of Named Entity Extraction Model, system and electronic equipment
CN109710925A (en) * 2018-12-12 2019-05-03 新华三大数据技术有限公司 Name entity recognition method and device

Similar Documents

Publication Publication Date Title
CN111708773B (en) Multi-source scientific and creative resource data fusion method
CN111680490B (en) Cross-modal document processing method and device and electronic equipment
WO2021208696A1 (en) User intention analysis method, apparatus, electronic device, and computer storage medium
CN107193796B (en) Public opinion event detection method and device
CN108959566B (en) A kind of medical text based on Stacking integrated study goes privacy methods and system
CN107357765B (en) Word document flaking method and device
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
CN111241230A (en) Method and system for identifying string mark risk based on text mining
CN111222330B (en) Chinese event detection method and system
CN109299469A (en) A method of identifying complicated address in long text
CN115130613B (en) False news identification model construction method, false news identification method and device
CN110489739A (en) A kind of the name extracting method and its device of public security case and confession text based on CRF algorithm
CN114416939A (en) Intelligent question and answer method, device, equipment and storage medium
CN112416992B (en) Industry type identification method, system and equipment based on big data and keywords
CN111898528B (en) Data processing method, device, computer readable medium and electronic equipment
Panenghat et al. Towards the necessity for debiasing natural language inference datasets
CN106649875B (en) Public opinion big data visualization system
CN109271479A (en) A kind of resume structuring processing method
CN112330501A (en) Document processing method and device, electronic equipment and storage medium
CN111427977B (en) Electronic eye data processing method and device
CN116976321A (en) Text processing method, apparatus, computer device, storage medium, and program product
CN110866394A (en) Company name identification method and device, computer equipment and readable storage medium
CN106598983A (en) Information display method and device
US20220075950A1 (en) Data labeling method and device, and storage medium
CN112989811A (en) BilSTM-CRF-based historical book reading auxiliary system and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant