CN103646017A - Acronym generating system for naming and working method thereof - Google Patents


Info

Publication number
CN103646017A
Authority
CN
China
Application number
CN201310673706.XA
Other languages
Chinese (zh)
Other versions
CN103646017B
Inventor
王晓亮 (Wang Xiaoliang)
张雪薇 (Zhang Xuewei)
陆桑璐 (Lu Sanglu)
Original Assignee
南京大学 (Nanjing University)
Priority date
2013-12-11
Filing date
2013-12-11
Publication date
2014-03-19
Application filed by Nanjing University (南京大学)
Priority to CN201310673706.XA
Publication of CN103646017A
Application granted
Publication of CN103646017B


Abstract

The invention discloses an acronym generating system for naming and a working method thereof. The system analyzes an input long character string and outputs a suitable acronym to be used as a name. It comprises an input/output page and a back-end support service platform. The input/output page accepts the long character string from which an acronym is to be generated and displays the resulting acronym. The back-end support service platform comprises a classification database, an acronym generation subsystem and a recommendation subsystem. The classification database stores the data needed to compute the classification tendency of each word, so that the back-end support program can analyze the statement entered by the user and generate an acronym whose meaning is related to that statement; the resulting names can be used in any field. The invention aims to improve on the current situation by letting the user obtain, for naming purposes, an acronym whose meaning is close to that of the original character string.

Description

Acronym generation system for naming and working method thereof

Technical field:

The present invention relates to a name generation system and its working method, and in particular to an acronym generation system for naming and its working method, which, given a character string composed of several words, provides an acronym consistent with the meaning of the string itself.

Background technology:

Acronym generation is a form of automated creativity: it simulates human thinking to analyze a character string. Every industry and field needs well-known, strong brands, so a name that is easy to pronounce and reflects the characteristics of the profession is particularly important. An acronym generation system for naming therefore has great application potential and broad prospects across industries and fields.

Existing acronym generation and acronym lookup services on the Internet essentially use initial-letter matching: they take the first letter of every word in the string to form the abbreviated name. However, there is not yet a practical way to generate acronyms related to the meaning of the string. Some acronym generators rank the generated acronyms by user votes, but voting requires user participation, so it performs poorly for strings that are rarely queried. Moreover, users in different fields have different needs, so acronyms ranked by voting often fail to express the intended meaning.

Building on existing acronym generation systems for naming, we use the computer to analyze the semantics of the character string and apply matching rules to construct acronyms with similar semantics. A background database is queried to check whether a candidate letter combination forms a word; if it does not, letters are added to or deleted from it until an acronym is produced.

Summary of the invention:

The invention provides an acronym generation system for naming and its working method, which improve on existing acronym generation systems so that they can generate acronyms matching the semantics of the original character string.

The invention adopts the following technical scheme: an acronym generation system for naming, comprising an input/output page and a back-end support service platform. The input/output page is used to enter the long character string from which an acronym is to be generated and to output the acronym used for naming. The back-end support service platform comprises:

A classification database, built by the back-end support program. It stores the data used to compute the classification tendency of each word; it is used to find the category of the statement entered by the user, and the database of that category is then used to match the required acronym;

An acronym generation subsystem, which queries the classification database to analyze the semantics and category of the original statement, finds semantically matching acronyms among the subsequences of the statement's character string, and ranks them by semantic relevance;

A recommendation subsystem, which, when none of the acronyms match, determines the semantics of the input statement, modifies some words of the statement or changes the word order without affecting its meaning, matches the result against the words in the classification database again to produce a matching acronym, and recommends it to the user.

The invention also adopts the following technical scheme: a working method of the acronym generation system for naming, comprising the following steps:

1). Input the long character string for which an acronym is to be generated, and confirm generation;

2). Extract each word of the input long character string and match it in turn against the words in the classification database to compute the type (category) of each word;

3). Determine the type to which the character string belongs and save it;

4). Take the initial letters of the content words in the long character string as fixed letters; on this basis, keeping the original character order, insert other letters from the long character string to find all possible acronyms, and match each of them in turn against the words in the database corresponding to the type found in step 3); every successful match is saved as a candidate acronym;

5). Sort all candidate acronyms by their relevance to the type; the type relevance is obtained from the classification database;

6). Display the sorted acronyms in the acronym output box and go to step 7); if no acronym of the relevant type can be generated, go to step 8);

7). Perform the reset action to prepare for the next acronym generation;

8). Enter the acronym recommendation subsystem, which modifies the statement without changing its meaning, generates an acronym, and feeds the modified statement back to the user.

The present invention has the following beneficial effects:

(1) The invention computes the category of the input string by classifying its words and matches each subsequence of the input string within that category, so that the meaning of the acronym is close to that of the input string;

(2) Compared with existing acronym generation systems, the invention greatly improves the relevance between the generated acronym and the original statement.

Brief description of the drawings:

Fig. 1 is a structural diagram of the acronym generation system for naming of the present invention.

Fig. 2 is a flowchart of acronym generation in the present invention.

Fig. 3 is a flowchart of classification database generation in the present invention.

Embodiment:

Referring to Figs. 1 to 3, the acronym generation system for naming of the present invention comprises an input/output page and a back-end support program. The input/output page comprises a statement input box, a name generation button, a reset button, a recommendation button and an acronym output box. The back-end support program has three parts: generation of the classification database, generation of acronyms, and the recommendation subsystem. The classification database is used to find the category of the statement entered by the user, and the database of that category is then used to match the required acronym. The three parts are described in detail below:

(1) Generation of the classification database

The classification database is built in advance by the back-end support program; it stores a large number of words together with the classification tendency of each word. Building it requires a large amount of training text. We first count every word that appears in the training texts, then compute the importance of each word to each individual text, and finally classify the texts using cosine similarity, from which the category of each word is obtained. The classification database is produced from the training texts as follows:

A: A processing program analyzes the texts to obtain the number of times each word occurs in each text, and saves the result in a preliminary database in the form <word, file ID [occurrence count]>;

B: For each tuple in the preliminary database, compute the importance of word $t_i$ to each file $d_j$, using the TF-IDF (term frequency-inverse document frequency) technique:

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$

$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$

In the formulas above, $n_{i,j}$ is the number of occurrences of word $t_i$ in file $d_j$, and the denominator is the total number of occurrences of all words in file $d_j$. $|D|$ is the total number of files, and $|\{j : t_i \in d_j\}|$ is the number of files that contain word $t_i$.

The relevance between every word in the preliminary database and every file is computed in this way and stored in a database with the format <word, relevance to file 1, relevance to file 2, ..., relevance to file n>.

C: For each file i, the relevance values of all words in the database described in B form a vector; compute the cosine similarity between every pair of files:

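$$\text{similarity} = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i \times B_i}{\sqrt{\sum_{i=1}^{n} (A_i)^2} \times \sqrt{\sum_{i=1}^{n} (B_i)^2}}$$

Here $A$ and $B$ are the vectors of two files, and $A_i$, $B_i$ are their relevance values for the $i$-th word.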

D: Classify the files according to the cosine similarities computed in C, then classify each word according to the categories of the files it occurs in. This yields the relevance of each word to each category, which forms the classification database.
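Steps A to D can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tokenizer (lower-cased alphabetic runs), the natural logarithm in idf, and the rule used for step D are all assumptions, because the description does not specify them. For step D the sketch assumes that a few training files are hand-labeled as seeds, that every other file takes the category of its most similar seed, and that a word's relevance to a category is the sum of its TF-IDF weights over that category's files. Names such as `build_category_db` and `seed_labels` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed tokenizer: lower-cased runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def preliminary_db(docs):
    """Step A: word -> {file_id: occurrence count}, i.e. <word, file ID [count]>."""
    db = defaultdict(dict)
    for file_id, text in docs.items():
        for word, count in Counter(tokenize(text)).items():
            db[word][file_id] = count
    return dict(db)


def tf_idf(prelim, num_files):
    """Step B: tfidf[i][j] = n_ij / sum_k n_kj * log(|D| / |{j : t_i in d_j}|)."""
    totals = Counter()  # sum_k n_kj: total word occurrences in each file d_j
    for postings in prelim.values():
        totals.update(postings)
    return {
        word: {
            f: (n / totals[f]) * math.log(num_files / len(postings))  # natural log assumed
            for f, n in postings.items()
        }
        for word, postings in prelim.items()
    }


def file_vectors(weights):
    """Transpose <word, relevance to file 1..n> rows into one sparse vector per file."""
    vectors = defaultdict(dict)
    for word, per_file in weights.items():
        for f, w in per_file.items():
            vectors[f][word] = w
    return dict(vectors)


def cosine(a, b):
    """Step C: cos(theta) = A.B / (||A|| ||B||); missing words count as 0."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_category_db(docs, seed_labels):
    """Step D under an assumed rule: each unlabeled file takes the category of its
    most similar seed file; a word's relevance to a category is the sum of its
    TF-IDF weights over the files in that category."""
    weights = tf_idf(preliminary_db(docs), len(docs))
    vectors = file_vectors(weights)
    labels = dict(seed_labels)
    for f in docs:
        if f not in labels:
            nearest = max(seed_labels, key=lambda s: cosine(vectors.get(f, {}), vectors.get(s, {})))
            labels[f] = seed_labels[nearest]
    category_db = defaultdict(lambda: defaultdict(float))
    for word, per_file in weights.items():
        for f, w in per_file.items():
            category_db[word][labels[f]] += w
    return {word: dict(cats) for word, cats in category_db.items()}, labels


docs = {
    "s1": "cloud server cloud storage",
    "s2": "game player game score",
    "d3": "cloud backup storage",
}
category_db, labels = build_category_db(docs, {"s1": "computing", "s2": "gaming"})
print(labels["d3"])  # computing
```

Sparse dictionaries stand in for the <word, relevance to file 1, ..., relevance to file n> rows: a missing entry means a relevance of zero, which keeps memory proportional to the words actually present in each file.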

(2) Generation of acronyms

Acronym generation has two stages:

(A): Analyzing the semantics of the user's input string:

First extract each word of the statement entered by the user and query the classification database built in (1) to determine the category of each word; from these, the category of the statement is obtained and recorded.
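Stage (A) can be sketched as summing, per category, the relevance of every word in the statement and keeping the highest-scoring category. The summation rule and the `category_db` layout (word mapped to {category: relevance}) are assumptions for illustration; the description only says that the statement's category is derived from the categories of its words. `classify_statement` is a hypothetical name.

```python
import re
from collections import Counter


def classify_statement(statement, category_db):
    """Return the category with the highest summed word relevance, or None.

    category_db maps word -> {category: relevance}; words not in it are ignored.
    """
    scores = Counter()
    for word in re.findall(r"[a-z]+", statement.lower()):
        scores.update(category_db.get(word, {}))  # adds each category's relevance
    return scores.most_common(1)[0][0] if scores else None


category_db = {
    "cloud": {"computing": 0.7},
    "storage": {"computing": 0.5},
    "game": {"gaming": 0.6},
    "engine": {"gaming": 0.2, "computing": 0.1},
}
print(classify_statement("Cloud Storage Engine", category_db))  # computing
```

A word such as "engine" that leans toward several categories contributes to each of them, so the statement's category is decided by the words that are most specific to one field.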

(B): Producing a ranked list of semantically related acronyms:

Initial-letter matching:

Because acronyms usually use the initial letters of the original statement as far as possible, the invention first forms an acronym from the initial letters of the user's statement and matches it against the database of the category recorded in (A). If the match succeeds, this acronym is marked as the best acronym.
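A minimal sketch of initial-letter matching, assuming the vocabulary of the recorded category is available as a set of lower-case words. The stop-word list that keeps function words such as "and" or "of" out of the initials is a hypothetical stand-in for the "content words" mentioned in step 4) of the Summary; `best_acronym` is likewise a hypothetical name.

```python
STOPWORDS = frozenset({"a", "an", "and", "for", "of", "or", "the"})  # hypothetical list


def initial_letters(statement):
    """Upper-case first letters of the words, skipping the assumed stop words."""
    return "".join(w[0].upper() for w in statement.split() if w.lower() not in STOPWORDS)


def best_acronym(statement, category_words):
    """Return the initial-letter acronym if it is a word of the recorded category, else None."""
    candidate = initial_letters(statement)
    return candidate if candidate.lower() in category_words else None


print(best_acronym("Cloud Application Resource Tracker", {"cart", "cast", "star"}))  # CART
```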

Inserting other letters of the sentence on top of the initials:

The probability that the initial letters alone form a match is not very high. When matching fails, we select some letters from the original statement, insert them in their original order into the sequence of initials, and then match again against the database of the category recorded in (A).
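The insertion stage can be sketched as an enumeration: every word's initial letter is kept, up to `max_extra` other letters of the string are added, and the original letter order is preserved. `max_extra=2` is an assumption chosen to mirror the N+2 length bound given in step D of the back-end procedure; the function names are hypothetical.

```python
from itertools import combinations


def expand_initials(words, max_extra=2):
    """Yield letter strings that keep every word's initial letter plus up to
    `max_extra` other letters of the string, all in their original order."""
    letters = [(ch.lower(), i == 0) for w in words for i, ch in enumerate(w)]
    optional = [k for k, (_, is_initial) in enumerate(letters) if not is_initial]
    for extra in range(max_extra + 1):
        for picked in combinations(optional, extra):
            chosen = set(picked)
            yield "".join(
                ch for k, (ch, is_initial) in enumerate(letters) if is_initial or k in chosen
            )


def match_expanded(statement, category_words, max_extra=2):
    """Return the upper-cased expansions that are words of the recorded category."""
    hits = {c for c in expand_initials(statement.split(), max_extra) if c in category_words}
    return sorted(c.upper() for c in hits)


print(match_expanded("Cloud Archive Manager", {"clam", "cam", "cram"}))  # ['CAM', 'CLAM']
```

Choosing letter positions with `itertools.combinations` and rebuilding the string in position order is what preserves the original letter order. "CRAM" is rejected because its "r" would have to come from before the "A" of "Archive". The number of candidates grows roughly as the number of letters raised to `max_extra`, which is why a small bound matters.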

The acronym generation part is initiated by the user, who enters a character string containing N words and clicks the generate button; the string is then passed to the back-end processing program, which proceeds as follows:

A. Extract all words of the input string to form sequence a, and all characters of the input string to form sequence b;

B. From sequence a of step A, use the TF-IDF technique described in the preparation stage to compute the category of the original string;

C. Take all words of the category database found in step B and build a trie from them, to facilitate acronym matching;

D. For sequence b of step A, traverse, as required by step 4) of the Summary of the invention, all of its subsequences whose length is less than or equal to N+2, and look each subsequence up in the trie of step C; save all hits to sequence c;

E. Sort sequence c by its relevance to the category in the classification database and print it to the screen. If no acronym of the relevant category can be generated, or the user requests it, go to step G;

F. Click the reset button to prepare for the next acronym generation;

G. Enter the acronym recommendation subsystem, which modifies the statement without changing its meaning, generates an acronym, and feeds the modified statement back to the user.
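Steps C to E can be sketched with a trie: a depth-first walk over the character sequence b descends in the trie when it takes a letter and abandons any branch with no matching child, so only prefixes of real category words are explored. The sketch assumes, following step 4) of the Summary, that every word's initial letter must be kept and that at most N+2 letters are used. The relevance scores passed to `rank` are illustrative values, and `TrieNode`, `build_trie`, `search_subsequences` and `rank` are hypothetical names.

```python
class TrieNode:
    """A trie node; `word` is set when a category word ends at this node."""

    def __init__(self):
        self.children = {}
        self.word = None


def build_trie(words):
    """Step C: insert every word of the chosen category into a trie."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.word = w.lower()
    return root


def search_subsequences(words, root):
    """Step D: collect trie words that are subsequences of the letters of `words`,
    keep every word's initial letter, and use at most N + 2 letters (N = len(words))."""
    letters = [(ch.lower(), i == 0) for w in words for i, ch in enumerate(w)]
    limit = len(words) + 2
    found = set()

    def dfs(pos, node, length):
        if pos == len(letters):
            if node.word is not None:
                found.add(node.word)
            return
        ch, is_initial = letters[pos]
        child = node.children.get(ch)
        if child is not None and length < limit:
            dfs(pos + 1, child, length + 1)  # take this letter: descend in the trie
        if not is_initial:
            dfs(pos + 1, node, length)  # skip a non-initial letter; initials cannot be skipped

    dfs(0, root, 0)
    return found


def rank(candidates, relevance):
    """Step E: order candidates by category relevance (highest first), ties alphabetically."""
    return sorted(candidates, key=lambda w: (-relevance.get(w, 0.0), w))


trie = build_trie(["clam", "cam", "cram", "camera"])
hits = search_subsequences("Cloud Archive Manager".split(), trie)
print(rank(hits, {"clam": 0.9, "cam": 0.4}))  # ['clam', 'cam']
```

A word is recorded only after the walk has consumed the whole character sequence, because any remaining initial letter must still be taken; "camera" is never reported since it would need six letters and an "a" after the final "r".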

(3) Recommendation subsystem

If the user is dissatisfied with the acronyms produced in (2), the recommendation subsystem offers better acronyms by modifying the statement. It modifies the statement only in ways that do not change the meaning of the original statement. After the statement has been modified, it is matched again against the words in the classification database recorded in (2). The system supports three ways of modifying the statement:

A. Reordering the words of the original statement

This method simply changes the order of the words in the original statement, which largely preserves the integrity of its meaning. For example, after the initial letters of the original statement are extracted, their order can be changed; to keep the meaning of the original statement and its grammar correct, words joined by "and" or "or" are good candidates for swapping.

B. Inserting synonyms into the original statement

When this method is used, conjunctions such as "and" or "or" can be placed between the synonyms to keep the grammar correct.

C. Replacing words in the original statement with synonyms or words of the same type

Synonyms are looked up in an existing dictionary and used to replace content words in the long character string.
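The three modification methods can be sketched as generators of rewritten statements, each of which is then rechecked with initial-letter matching. Assumptions: conjuncts are single words, the `synonyms` table is a small hypothetical dictionary standing in for the "existing dictionary", and only "and"/"or" are excluded from the initials. All function names are hypothetical.

```python
def initials(words, stopwords=frozenset({"and", "or"})):
    """Lower-case initial letters, ignoring the joining conjunctions."""
    return "".join(w[0].lower() for w in words if w.lower() not in stopwords)


def swap_conjuncts(words):
    """Method A: swap the two words around each 'and'/'or' (single-word conjuncts assumed)."""
    for i in range(1, len(words) - 1):
        if words[i].lower() in ("and", "or"):
            variant = list(words)
            variant[i - 1], variant[i + 1] = words[i + 1], words[i - 1]
            yield variant


def insert_synonyms(words, synonyms):
    """Method B: insert a synonym after a word, joined by 'and' to keep the grammar."""
    for i, w in enumerate(words):
        for alt in synonyms.get(w.lower(), []):
            yield words[: i + 1] + ["and", alt] + words[i + 1 :]


def substitute_synonyms(words, synonyms):
    """Method C: replace one word at a time with a synonym."""
    for i, w in enumerate(words):
        for alt in synonyms.get(w.lower(), []):
            yield words[:i] + [alt] + words[i + 1 :]


def recommend(statement, category_words, synonyms):
    """Return (modified statement, acronym) pairs whose initials form a category word."""
    words = statement.split()
    variants = [
        *swap_conjuncts(words),
        *insert_synonyms(words, synonyms),
        *substitute_synonyms(words, synonyms),
    ]
    results = []
    for variant in variants:
        acronym = initials(variant)
        if acronym in category_words:
            results.append((" ".join(variant), acronym.upper()))
    return results


print(recommend("Mobile Payments and Apps", {"map"}, {}))
# [('Mobile Apps and Payments', 'MAP')]
```

Returning the modified statement alongside the acronym matches the requirement that the rewritten statement be fed back to the user, so the user can confirm that its meaning is unchanged.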

Referring to Figs. 1 to 3, the working method of the acronym generation system for naming of the present invention comprises the following steps:

1). The user enters the long character string for which an acronym is to be generated and clicks the generate button;

2). The back-end processing program extracts each word of the user's input and matches it in turn against the words in the classification database to compute the type of each word;

3). The back-end processing program determines the type to which the character string most likely belongs and saves it;

4). The back-end processing program finds all subsequences of the long character string and matches each in turn against the words in the database corresponding to the type found in step 3); every successful match is saved as a candidate acronym;

5). Sort all candidate acronyms by their relevance to the type; the type relevance can be obtained from the classification database;

6). Display the sorted acronyms in the acronym output box and go to step 7); if no acronym of the relevant type can be generated, go to step 8);

7). Click the reset button to prepare for the next acronym generation;

8). Enter the acronym recommendation subsystem, which modifies the statement without changing its meaning, generates an acronym, and feeds the modified statement back to the user.

By improving on existing acronym generation systems, the acronym generation system for naming and its working method of the present invention can generate acronyms that match the semantics of the original character string.

The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make improvements without departing from the principle of the invention, and such improvements should also be regarded as falling within the scope of protection of the invention.

Claims (2)

1. An acronym generation system for naming, characterized in that the system comprises an input/output page and a back-end support service platform; the input/output page is used to enter the long character string from which an acronym is to be generated and to output the acronym used for naming; the back-end support service platform comprises:
a classification database, built by the back-end support program, which stores the data used to compute the classification tendency of each word and is used to find the category of the statement entered by the user, the database of that category then being used to match the required acronym;
an acronym generation subsystem, which queries the classification database to analyze the semantics and category of the original statement, finds semantically matching acronyms among the subsequences of the statement's character string, and ranks them by semantic relevance;
a recommendation subsystem, which, when none of the acronyms match, determines the semantics of the input statement, modifies some words of the statement or changes the word order without affecting its meaning, matches the result against the words in the classification database again to produce a matching acronym, and recommends it to the user.
2. A working method of the acronym generation system for naming, characterized by comprising the following steps:
1). Input the long character string for which an acronym is to be generated, and confirm generation;
2). Extract each word of the input long character string and match it in turn against the words in the classification database to compute the type of each word;
3). Determine the type to which the character string belongs and save it;
4). Take the initial letters of the content words in the long character string as fixed letters; on this basis, keeping the original character order, insert other letters from the long character string to find all possible acronyms, and match each of them in turn against the words in the database corresponding to the type found in step 3); every successful match is saved as a candidate acronym;
5). Sort all candidate acronyms by their relevance to the type; the type relevance is obtained from the classification database;
6). Display the sorted acronyms in the acronym output box and go to step 7); if no acronym of the relevant type can be generated, go to step 8);
7). Perform the reset action to prepare for the next acronym generation;
8). Enter the acronym recommendation subsystem, which modifies the statement without changing its meaning, generates an acronym, and feeds the modified statement back to the user.
CN201310673706.XA 2013-12-11 2013-12-11 Acronym generating system for naming and working method thereof CN103646017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310673706.XA CN103646017B (en) 2013-12-11 2013-12-11 Acronym generating system for naming and working method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310673706.XA CN103646017B (en) 2013-12-11 2013-12-11 Acronym generating system for naming and working method thereof

Publications (2)

Publication Number Publication Date
CN103646017A true CN103646017A (en) 2014-03-19
CN103646017B CN103646017B (en) 2017-01-04

Family

ID=50251236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310673706.XA CN103646017B (en) 2013-12-11 2013-12-11 Acronym generating system for naming and working method thereof

Country Status (1)

Country Link
CN (1) CN103646017B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10380247B2 (en) 2016-10-28 2019-08-13 Microsoft Technology Licensing, Llc Language-based acronym generation for strings

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1983271A (en) * 2005-12-16 2007-06-20 International Business Machines Corporation (国际商业机器公司) System and method for defining and translating chat abbreviations
CN101365012A (en) * 2008-10-06 2009-02-11 Shenzhen Huawei Communication Technologies Co., Ltd. (深圳华为通信技术有限公司) Abbreviation operating method and hand-hold communication terminal
CN101599075A (en) * 2009-07-02 2009-12-09 Tsinghua University (清华大学) Chinese abbreviation disposal route and device
WO2012047214A2 (en) * 2010-10-06 2012-04-12 Virtuoz, Sa Visual display of semantic information
CN102902660A (en) * 2011-07-26 2013-01-30 Miao Yushui (苗玉水) Holographic Chinese information processing method by Quanpin and Jianpin of Chinese phonetic codes
CN103020164A (en) * 2012-11-26 2013-04-03 North China Electric Power University (华北电力大学) Semantic search method based on multi-semantic analysis and personalized sequencing

Also Published As

Publication number Publication date
CN103646017B (en) 2017-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
GR01 Patent grant
C14 Grant of patent or utility model