CN104765858A - Construction method for public security synonym library and obtained public security synonym library - Google Patents
Construction method for public security synonym library and obtained public security synonym library
- Publication number
- CN104765858A (application number CN201510190990.4A)
- Authority
- CN
- China
- Prior art keywords
- participle
- data element
- class
- neologisms
- storehouse
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a construction method for a public security synonym library and relates to the technical field of data processing. The method addresses the technical problem of shortening computation time and increasing computation speed. A synonym library is first constructed, a data element library is built from known data elements, and every data element is assigned to one of three classes: object class, feature class, or expression class. When a new word needs to be inserted, the new word is split into three segments corresponding to the object class, the feature class and the expression class; the object class data element, feature class data element and expression class data element in which the respective segments appear most frequently are found in the data element library; the matching degree of the new word is calculated; whether the new word is a synonym is judged according to that matching degree; and, if it is, the synonym relationship between the new word and the data elements is stored in the synonym library. The construction method is suitable for processing public security data.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a construction method for a public security synonym library and the public security synonym library obtained thereby.
Background art
Existing methods for constructing a synonym library mainly fall into the following two kinds:
1) Building the synonym library manually.
In this method, the relationships between words in the synonym library are defined manually by language experts. Its advantage is that it is simple and effective, but the results are strongly affected by the experts' subjective judgment, and the library is difficult to revise dynamically as the language changes and develops.
2) Constructing a vector space of words by learning from a large-scale internet corpus.
This method performs machine learning on a large-scale internet corpus. Its final result is more objective and is easily updated as the corpus changes, but the computation is complex, which makes it slow and time-consuming. Moreover, it is mainly used on the internet and cannot be applied to a relatively closed public security database.
Summary of the invention
In view of the defects in the prior art described above, the technical problem to be solved by the present invention is to provide a construction method for a public security synonym library that has a short computation time, a fast computation speed, and objective results.
To solve the above technical problem, the construction method for a public security synonym library provided by the present invention comprises the following steps:
First, a synonym library is built, a data element library is built from known data elements, a matching degree threshold is set, and each data element is assigned to one of three types: object class, feature class, or expression class;
When a new word needs to be inserted, the following steps are performed:
1) the name, type and length of the new word are obtained, and the new word is split according to the three types (object class, feature class, expression class) into three word segments: a first segment, a second segment and a third segment, where the first segment is the object class segment, the second segment is the feature class segment, and the third segment is the expression class segment;
2) the data element library is searched for the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the expression class data element in which the third segment occurs most often;
3) the matching degree of the new word is calculated using the following formulas:
P = A×Q1 + B×Q2 + C×Q3;
A = X1/Xn, B = Y1/Yn, C = Z1/Zn;
where P is the matching degree of the new word, A is the similarity between the first segment and the data element library, Q1 is the object class weight, B is the similarity between the second segment and the data element library, Q2 is the feature class weight, C is the similarity between the third segment and the data element library, Q3 is the expression class weight, and Q1, Q2 and Q3 are preset constants;
and where X1 is the largest number of times the first segment occurs in any single object class data element in the data element library, Xn is the total number of times the first segment occurs across all object class data elements in the data element library, Y1 is the largest number of times the second segment occurs in any single feature class data element in the data element library, Yn is the total number of times the second segment occurs across all feature class data elements in the data element library, Z1 is the largest number of times the third segment occurs in any single expression class data element in the data element library, and Zn is the total number of times the third segment occurs across all expression class data elements in the data element library;
4) the new word is judged according to its matching degree: if the matching degree exceeds the matching degree threshold, the new word is judged to be a synonym, and a synonym relationship is established between the new word and three data elements in the data element library, namely the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the expression class data element in which the third segment occurs most often;
5) the synonym relationships between the new word and the three data elements are stored in the synonym library.
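As an illustrative sketch of the calculation in steps 3) and 4), the following Python computes the matching degree from occurrence counts and compares it with the threshold. The class and function names, the weight values (0.5, 0.3, 0.2), the threshold 0.6, and the choice to treat a segment that never occurs as having similarity 0 are all assumptions made for illustration; the text above only states that Q1, Q2, Q3 and the threshold are preset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentCounts:
    """Occurrence counts of one word segment within one data element class."""
    max_single: int  # X1, Y1 or Z1: largest count in any single data element
    total: int       # Xn, Yn or Zn: total count over all data elements of the class

    def similarity(self) -> float:
        # A, B or C in the formulas. Returning 0.0 when the segment never
        # occurs is an assumption; the source does not cover this case.
        return self.max_single / self.total if self.total else 0.0


def matching_degree(obj, feat, expr, weights=(0.5, 0.3, 0.2)):
    """P = A*Q1 + B*Q2 + C*Q3, with hypothetical default weights."""
    q1, q2, q3 = weights
    return obj.similarity() * q1 + feat.similarity() * q2 + expr.similarity() * q3


def is_synonym(p, threshold=0.6):
    """Step 4): the new word is a synonym if P exceeds the (hypothetical) threshold."""
    return p > threshold


# Hypothetical counts: the first segment occurs 6 times in its best object
# class data element and 10 times over all object class data elements, etc.
p = matching_degree(SegmentCounts(6, 10), SegmentCounts(3, 4), SegmentCounts(5, 5))
```

With these made-up counts the similarities are 0.6, 0.75 and 1.0, so P is about 0.725 and the new word would be accepted under the assumed threshold.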
In a second aspect, the present invention provides a public security synonym library obtained by the above method.
The construction method for a public security synonym library provided by the present invention works on the basis of the existing data elements and synonyms: it calculates the matching degree between a new word and the existing data elements, treats the new word as a synonym when the matching degree exceeds the set value, and stores the synonym relationship in the synonym library. The calculation is simple, so computation time is short, computation speed is fast, and the result is objective.
Embodiment
The technical solution of the present invention is described in further detail below with reference to a specific embodiment. The embodiment does not limit the present invention; any use of structures similar to those of the present invention, and similar variations thereof, shall fall within its scope of protection. Enumeration commas in the present invention all denote an "and" relationship.
The construction method for a public security synonym library provided by the embodiment of the present invention comprises the following steps:
First, a synonym library is built, a data element library is built from known data elements (for example, the Shanghai public security department currently has about 800 data elements), a matching degree threshold is set, and each data element is assigned to one of three types: object class, feature class, or expression class;
When a new word needs to be inserted, the following steps are performed:
1) the name, type and length of the new word are obtained, and the new word is split according to the three types (object class, feature class, expression class) into three word segments: a first segment, a second segment and a third segment, where the first segment is the object class segment, the second segment is the feature class segment, and the third segment is the expression class segment;
2) the data element library is searched for the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the expression class data element in which the third segment occurs most often;
3) the matching degree of the new word is calculated using the following formulas:
P = A×Q1 + B×Q2 + C×Q3;
A = X1/Xn, B = Y1/Yn, C = Z1/Zn;
where P is the matching degree of the new word, A is the similarity between the first segment and the data element library, Q1 is the object class weight, B is the similarity between the second segment and the data element library, Q2 is the feature class weight, C is the similarity between the third segment and the data element library, Q3 is the expression class weight, and Q1, Q2 and Q3 are preset constants;
and where X1 is the largest number of times the first segment occurs in any single object class data element in the data element library, Xn is the total number of times the first segment occurs across all object class data elements in the data element library, Y1 is the largest number of times the second segment occurs in any single feature class data element in the data element library, Yn is the total number of times the second segment occurs across all feature class data elements in the data element library, Z1 is the largest number of times the third segment occurs in any single expression class data element in the data element library, and Zn is the total number of times the third segment occurs across all expression class data elements in the data element library;
4) the new word is judged according to its matching degree: if the matching degree exceeds the matching degree threshold, the new word is judged to be a synonym, and a synonym relationship is established between the new word and three data elements in the data element library, namely the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the expression class data element in which the third segment occurs most often;
5) the synonym relationships between the new word and the three data elements are stored in the synonym library.
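The steps of the embodiment can be sketched end to end as follows. This is a minimal illustration under several assumptions, not the applicant's implementation: the `DataElement` fields, the use of substring counting over a descriptive `text` field as the "occurrence count", the example library, the weights and the threshold are all hypothetical. The text does not say how a new word is split, so the three segments are passed in already split, in object, feature, expression order.

```python
from dataclasses import dataclass

CLASSES = ("object", "feature", "expression")


@dataclass(frozen=True)
class DataElement:
    name: str
    cls: str   # one of CLASSES
    text: str  # text in which segment occurrences are counted (assumed field)


def best_match(segment, cls, library):
    """Return (best element or None, max single count, total count) for one class."""
    best, best_count, total = None, 0, 0
    for element in library:
        if element.cls != cls:
            continue
        count = element.text.count(segment)  # non-overlapping substring count
        total += count
        if count > best_count:
            best, best_count = element, count
    return best, best_count, total


def insert_new_word(new_word, segments, library, synonym_library,
                    weights=(0.5, 0.3, 0.2), threshold=0.6):
    """Steps 2) to 5): score the new word and store it if it passes the threshold."""
    p = 0.0
    matched = []
    for segment, cls, weight in zip(segments, CLASSES, weights):
        element, max_count, total = best_match(segment, cls, library)
        if total:  # similarity is taken as 0 when the segment never occurs (assumption)
            p += weight * max_count / total
        if element is not None:
            matched.append(element.name)
    if p > threshold:
        # Record the synonym relationship with the best-matching data elements.
        synonym_library.setdefault(new_word, []).extend(matched)
    return p


# Hypothetical library; a real one would hold the known data elements.
EXAMPLE_LIBRARY = [
    DataElement("person", "object", "person; natural person; person record"),
    DataElement("vehicle", "object", "vehicle owned by a person"),
    DataElement("birth", "feature", "birth; date of birth"),
    DataElement("residence", "feature", "place of residence"),
    DataElement("date", "expression", "date; calendar date"),
    DataElement("code", "expression", "code value"),
]

synonym_library = {}
score = insert_new_word("person_birth_date", ("person", "birth", "date"),
                        EXAMPLE_LIBRARY, synonym_library)
```

In this made-up example the first segment occurs 3 times in its best object class data element out of 4 occurrences in total, and the other two segments occur only in one data element each, so the score is about 0.875 and the new word is stored as a synonym of the three matched data elements.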
The construction method for a public security synonym library provided by the present invention works on the basis of the existing data elements and synonyms: it calculates the matching degree between a new word and the existing data elements, treats the new word as a synonym when the matching degree exceeds the set value, and stores the synonym relationship in the synonym library. The calculation is simple, so computation time is short, computation speed is fast, and the result is objective.
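One property of the matching degree formula can be spelled out as a short derivation; it is not stated in the patent itself. Because the largest count in a single data element can never exceed the total count over all data elements of the same class, each similarity lies between 0 and 1 whenever the total is nonzero. If one further assumes non-negative weights that sum to 1 (an assumption; the text only says that Q1, Q2 and Q3 are preset constants), the matching degree also lies between 0 and 1, so a threshold chosen in that range is meaningful:

```latex
\begin{align*}
0 \le X_1 \le X_n,\ X_n > 0 \;&\Rightarrow\; 0 \le A = \frac{X_1}{X_n} \le 1
  \quad \text{(and likewise for } B = Y_1/Y_n,\ C = Z_1/Z_n\text{)} \\
Q_1, Q_2, Q_3 \ge 0,\ Q_1 + Q_2 + Q_3 = 1 \;&\Rightarrow\;
  0 \le P = A Q_1 + B Q_2 + C Q_3 \le Q_1 + Q_2 + Q_3 = 1
\end{align*}
```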
Claims (2)
1. A construction method for a public security synonym library, characterized by comprising the following steps:
First, a synonym library is built, a data element library is built from known data elements, a matching degree threshold is set, and each data element is assigned to one of three types: object class, feature class, or expression class;
When a new word needs to be inserted, the following steps are performed:
1) the name, type and length of the new word are obtained, and the new word is split according to the three types (object class, feature class, expression class) into three word segments: a first segment, a second segment and a third segment, where the first segment is the object class segment, the second segment is the feature class segment, and the third segment is the expression class segment;
2) the data element library is searched for the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the expression class data element in which the third segment occurs most often;
3) the matching degree of the new word is calculated using the following formulas:
P = A×Q1 + B×Q2 + C×Q3;
A = X1/Xn, B = Y1/Yn, C = Z1/Zn;
where P is the matching degree of the new word, A is the similarity between the first segment and the data element library, Q1 is the object class weight, B is the similarity between the second segment and the data element library, Q2 is the feature class weight, C is the similarity between the third segment and the data element library, Q3 is the expression class weight, and Q1, Q2 and Q3 are preset constants;
and where X1 is the largest number of times the first segment occurs in any single object class data element in the data element library, Xn is the total number of times the first segment occurs across all object class data elements in the data element library, Y1 is the largest number of times the second segment occurs in any single feature class data element in the data element library, Yn is the total number of times the second segment occurs across all feature class data elements in the data element library, Z1 is the largest number of times the third segment occurs in any single expression class data element in the data element library, and Zn is the total number of times the third segment occurs across all expression class data elements in the data element library;
4) the new word is judged according to its matching degree: if the matching degree exceeds the matching degree threshold, the new word is judged to be a synonym, and a synonym relationship is established between the new word and three data elements in the data element library, namely the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the expression class data element in which the third segment occurs most often;
5) the synonym relationships between the new word and the three data elements are stored in the synonym library.
2. A public security synonym library obtained by the construction method according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510190990.4A CN104765858A (en) | 2015-04-21 | 2015-04-21 | Construction method for public security synonym library and obtained public security synonym library |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510190990.4A CN104765858A (en) | 2015-04-21 | 2015-04-21 | Construction method for public security synonym library and obtained public security synonym library |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104765858A (en) | 2015-07-08 |
Family
ID=53647686
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510190990.4A Pending CN104765858A (en) | 2015-04-21 | 2015-04-21 | Construction method for public security synonym library and obtained public security synonym library |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104765858A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006323594A (en) * | 2005-05-18 | 2006-11-30 | Ntt Docomo Inc | Synonymous word extraction system and synonymous word extraction method |
CN101901325A (en) * | 2010-07-21 | 2010-12-01 | 赵步 | Copyright protection method |
CN102332137A (en) * | 2011-09-23 | 2012-01-25 | 纽海信息技术(上海)有限公司 | Goods matching method and system |
CN103455623A (en) * | 2013-09-12 | 2013-12-18 | 广东电子工业研究院有限公司 | Clustering mechanism capable of fusing multilingual literature |
CN103886093A (en) * | 2014-04-03 | 2014-06-25 | 江苏物联网研究发展中心 | Method for processing synonyms of electronic commerce search engine |
2015-04-21: application CN201510190990.4A filed in China, published as CN104765858A; status: Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018095281A1 (en) * | 2016-11-25 | 2018-05-31 | 阿里巴巴集团控股有限公司 | Name matching method and apparatus |
US10726028B2 (en) | 2016-11-25 | 2020-07-28 | Alibaba Group Holding Limited | Method and apparatus for matching names |
CN110222266A (en) * | 2019-05-31 | 2019-09-10 | 江苏三六五网络股份有限公司 | A kind of house property profession phonetic searching system and method based on speech recognition |
CN113139657A (en) * | 2021-04-08 | 2021-07-20 | 北京泰豪智能工程有限公司 | Method and device for realizing machine thinking |
CN113139657B (en) * | 2021-04-08 | 2024-03-29 | 北京泰豪智能工程有限公司 | Machine thinking realization method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20150708 |