CN110263184A - Data processing method and related device - Google Patents
- Publication number
- CN110263184A (CN201910540408.0A)
- Authority
- CN
- China
- Prior art keywords
- term
- concept
- result
- vocabulary
- preferred
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the present application provide a data processing method and related device that can automatically update, in an integrated term system, the concepts that changed in a new version of a source vocabulary. This improves the efficiency of version updates of the integrated term system, greatly reduces time cost, and shortens the lag between a source vocabulary version upgrade and the corresponding concept upgrade in the integrated term system. The method includes: registering a source vocabulary to be updated, to obtain a target source vocabulary; determining target vocabulary data in the target source vocabulary, the target vocabulary data including terms with assigned identifiers and concepts with assigned identifiers; matching the target vocabulary data against first vocabulary data of a first source vocabulary, to determine a target matching result of the target source vocabulary relative to the first source vocabulary; and updating the vocabulary data in the integrated term system according to the target matching result and preset rules.
Description
Technical field
This application relates to the field of data processing, and in particular to a data processing method and related device.
Background art
Term systems such as synonym lists, classification schemes, coding systems, thesauri, ontologies, and knowledge graphs are widely recognized in library and information science, natural language processing, medical informatics, and related fields for their power in describing, organizing, managing, and discovering information resources. Over the past few decades, the term systems in each field have mostly been compiled and developed for particular tasks and application environments, so their content settings (conceptualization, concept types, concept attributes, and semantic relations between concepts), data structures, and storage formats differ. This seriously limits communication between computer applications that use different term systems, and in turn limits interoperability and shared use across different information resource systems. Interoperation between term systems, which lets computer applications that use different term systems understand and talk to each other without barriers, has become a core technology for breaking this limitation. Building an integrated term system is one way to achieve such interoperability: several term systems of a particular subject domain are registered and brought together; with the term as the basic unit, the concept as the core, and the original relations of the source vocabularies as the basis, terms that denote the same concept in different source vocabularies are merged and linked to form new synonym groups or quasi-synonym groups, and a new source term is recommended as the preferred form of the concept; semantic association across source vocabularies is then realized through the merged concepts. Such an integrated term system, also called a multi-source term network system, has become information infrastructure for the interconnection and mutual trust of all kinds of information resources.
Current research on term system updating mainly concerns single term systems and covers updates to terms, concepts, attributes, and relations; related techniques include unknown-word recognition, term deletion, and synonym expansion. Work on updating integrated term systems concentrates on: 1) expansion with new source vocabularies, where a new vocabulary is added to an existing integrated term system through format conversion, lexical similarity computation, and similar means; and 2) problem correction, where problems hidden in the integrated term system are found through relation inconsistency checks and then corrected. Updating of existing source vocabularies still relies mainly on manual work and is concentrated at the term and concept level, that is, adding, deleting, and modifying the terms and concepts of a source vocabulary. Given the number and scale of source vocabularies, however, manual updating is costly in time and money and cannot meet efficiency and benefit requirements.
Summary of the invention
The embodiments of the present application provide a data processing method and related device that can automatically update, in an integrated term system, the concepts that changed in a new version of a source vocabulary. This improves the efficiency of version updates of the integrated term system, greatly reduces time cost, and shortens the lag between a source vocabulary version upgrade and the corresponding concept upgrade in the integrated term system.
A first aspect of the embodiments of the present application provides a data processing method applied to an integrated term system that includes at least one source vocabulary, the method comprising:
registering a source vocabulary to be updated, to obtain a target source vocabulary;
determining target vocabulary data in the target source vocabulary, the target vocabulary data including terms with assigned identifiers and concepts with assigned identifiers;
matching the target vocabulary data against first vocabulary data of a first source vocabulary, to determine a target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
updating the vocabulary data in the integrated term system according to the target matching result and preset rules.
Optionally, matching the target vocabulary data against the first vocabulary data of the first source vocabulary, to determine the target matching result of the target source vocabulary relative to the first source vocabulary, comprises:
performing string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result including a newly-added term result, an unchanged term result and/or a deleted term result, the target term being any term in the target vocabulary data;
performing, according to the term matching result, string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine a matching result of the concept synonym set, the matching result of the concept synonym set including a newly-added concept result, an unchanged concept synonym set result, a deleted concept result and/or a changed concept synonym set result, the first concept synonym set being any concept synonym set in the target vocabulary data;
performing, according to the term matching result and the matching result of the concept synonym set, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a matching result of the concept preferred term, the matching result of the concept preferred term including an unchanged concept preferred term result and/or a changed concept preferred term result, the first concept preferred term being the preferred term of any concept in the target vocabulary data;
wherein the term matching result, the matching result of the concept synonym set, and the matching result of the concept preferred term all belong to the target matching result.
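By way of illustration only, the second level of matching (concept synonym sets) might be sketched in Python as follows. The function name `classify_synonym_sets`, the labels, and the classification rules are assumptions made for this sketch: a set is treated as new when none of its terms occurs in the earlier version, unchanged when an identical set exists there, changed otherwise, and deleted when none of its terms survives in the new version. The application does not prescribe this exact rule set.

```python
def classify_synonym_sets(old_sets, new_sets):
    """Label synonym sets by exact string comparison of their terms.

    old_sets / new_sets: iterables of sets of term strings, one set per concept.
    Returns a list of (sorted_terms, label) pairs. Hypothetical helper.
    """
    old_sets = [frozenset(s) for s in old_sets]
    new_sets = [frozenset(s) for s in new_sets]
    old_terms = frozenset().union(*old_sets)   # every term in the earlier version
    new_terms = frozenset().union(*new_sets)   # every term in the new version
    old_lookup = set(old_sets)

    results = []
    for terms in new_sets:
        if terms in old_lookup:
            label = "Unchanged"            # identical set exists in the old version
        elif terms.isdisjoint(old_terms):
            label = "New"                  # all terms are newly added
        else:
            label = "Changed"              # some terms kept, some added or removed
        results.append((sorted(terms), label))
    for terms in old_sets:
        if terms.isdisjoint(new_terms):
            results.append((sorted(terms), "Deleted"))  # no term survives
    return results
```

Comparing whole sets rather than individual strings is what lets a change in one synonym surface as a change to the concept it belongs to.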
Optionally, updating the vocabulary data in the integrated term system according to the target matching result and the preset rules comprises:
when the term matching result is the deleted term result, obtaining the term corresponding to the deleted term result;
deleting the term corresponding to the deleted term result;
when the matching result of the concept synonym set is the deleted concept result, obtaining the concept synonym set corresponding to the deleted concept result;
deleting the concept synonym set corresponding to the deleted concept result.
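The deletion rule above can be sketched as follows. This is a minimal sketch that assumes the integrated term system is held in memory as a dictionary from concept identifier to a set of term strings; the function name `apply_deletions` and that data layout are hypothetical and not taken from the application.

```python
def apply_deletions(synonym_sets, deleted_terms, deleted_concept_ids):
    """Return a copy of the synonym sets with deleted terms and concepts removed.

    synonym_sets: dict mapping concept ID -> set of term strings (assumed layout).
    deleted_terms: term strings reported by the deleted term result.
    deleted_concept_ids: concept IDs reported by the deleted concept result.
    """
    deleted_terms = set(deleted_terms)
    deleted_concept_ids = set(deleted_concept_ids)
    updated = {}
    for concept_id, terms in synonym_sets.items():
        if concept_id in deleted_concept_ids:
            continue                                   # drop the whole synonym set
        updated[concept_id] = set(terms) - deleted_terms  # drop individual terms
    return updated
```

The sketch keeps a synonym set even if all of its terms were deleted; whether such an empty set should also be removed is left to the preset rules.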
Optionally, updating the vocabulary data in the integrated term system according to the target matching result and the preset rules comprises:
when the term matching result is the newly-added term result and the matching result of the concept synonym set is the changed concept synonym set result, obtaining at least one term corresponding to the newly-added term result;
determining whether a second concept synonym set that matches a first term exists in the integrated term system, the first term being any one of the at least one term;
if so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym set.
Optionally, when the second concept synonym set does not exist in the integrated term system, the method further comprises:
when the matching result of the concept preferred term is the changed concept preferred term result, obtaining N concept synonym sets in the integrated term system whose concept preferred terms have changed, where N >= 2;
calculating the similarity between the first term and the concept preferred term of each of the N concept synonym sets;
importing the first term into the N concept synonym sets according to the similarity;
when the matching result of the concept preferred term is the unchanged concept preferred term result, importing the first term into a first concept synonym set, the first concept synonym set being the concept synonym set that matches the first term among the concept synonym sets corresponding to the unchanged concept preferred term result.
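As a hedged illustration of the similarity-based import, the sketch below reads "according to the similarity" as "into the candidate set whose preferred term is most similar to the new term"; that reading is an assumption. The application does not name a similarity measure, so `difflib.SequenceMatcher` from the Python standard library is used purely as a stand-in, and `import_by_similarity` and the dictionary layout are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the measure used by the application."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def import_by_similarity(new_term, synonym_sets, preferred_terms):
    """Add new_term to the candidate set with the most similar preferred term.

    synonym_sets: dict concept ID -> set of term strings (the N candidate sets).
    preferred_terms: dict concept ID -> preferred term of that set.
    Returns the ID of the set that received the term.
    """
    best_id = max(
        preferred_terms,
        key=lambda concept_id: similarity(new_term, preferred_terms[concept_id]),
    )
    synonym_sets[best_id].add(new_term)
    return best_id
```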
Optionally, updating the vocabulary data in the integrated term system according to the target matching result and the preset rules comprises:
when the term matching result is the newly-added term result and the matching result of the concept synonym set is the newly-added concept result, calculating M similarities between a first preferred term and M preferred terms in the integrated term system, where the first preferred term is the preferred term of the concept synonym set that contains the first term, the concept synonym set to which the first preferred term belongs is in the same field as the concept synonym sets to which the M preferred terms belong, and M >= 2;
importing, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term.
Optionally, importing, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term comprises:
determining whether the maximum of the M similarities is greater than a preset threshold;
if so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the concept synonym set corresponding to the maximum similarity;
if not, treating the first term and the terms that are synonyms of the first term as a new concept synonym set.
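The threshold decision can be sketched as below. The threshold value 0.8, the similarity measure (`difflib.SequenceMatcher`, a stand-in), the function name `place_new_concept`, and the generated `NEW-…` identifier format are all assumptions of this sketch; the application only states that the maximum similarity is compared with a preset threshold.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; the application does not name a measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def place_new_concept(new_terms, first_preferred, synonym_sets, preferred_terms,
                      threshold=0.8):
    """Merge a new synonym group into the best-matching set, or create a new set.

    new_terms: the first term plus its synonyms from the new vocabulary version.
    first_preferred: preferred term of the synonym set the first term came from.
    synonym_sets: dict concept ID -> set of term strings in the integrated system.
    preferred_terms: dict concept ID -> preferred term (the M candidates).
    Returns the ID of the set that now holds the terms.
    """
    scores = {cid: similarity(first_preferred, term)
              for cid, term in preferred_terms.items()}
    best_id = max(scores, key=scores.get)
    if scores[best_id] > threshold:
        synonym_sets[best_id].update(new_terms)    # merge into the existing concept
        return best_id
    new_id = f"NEW-{len(synonym_sets) + 1}"        # hypothetical ID format
    synonym_sets[new_id] = set(new_terms)          # register as a new concept
    return new_id
```

A higher threshold creates more new concepts and fewer merges, so the value would need tuning against the vocabularies actually being integrated.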
Optionally, determining the target vocabulary data in the target source vocabulary comprises:
extracting the vocabulary data in the target source vocabulary;
calling an identifier allocation function to assign identifiers to the terms and the concepts in the vocabulary data respectively, to obtain the target vocabulary data.
A second aspect of the embodiments of the present application provides a data processing apparatus applied to an integrated term system that includes at least one source vocabulary, the apparatus comprising:
a registration unit, configured to register a source vocabulary to be updated, to obtain a target source vocabulary;
a determination unit, configured to determine target vocabulary data in the target source vocabulary, the target vocabulary data including terms with assigned identifiers and concepts with assigned identifiers;
a matching unit, configured to match the target vocabulary data against first vocabulary data of a first source vocabulary, to determine a target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
an updating unit, configured to update the vocabulary data in the integrated term system according to the target matching result and preset rules.
Optionally, the matching unit is specifically configured to:
perform string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result including a newly-added term result, an unchanged term result and/or a deleted term result, the target term being any term in the target vocabulary data;
perform, according to the term matching result, string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine a matching result of the concept synonym set, the matching result of the concept synonym set including a newly-added concept result, an unchanged concept synonym set result, a deleted concept result and/or a changed concept synonym set result, the first concept synonym set being any concept synonym set in the target vocabulary data;
perform, according to the term matching result and the matching result of the concept synonym set, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a matching result of the concept preferred term, the matching result of the concept preferred term including an unchanged concept preferred term result and/or a changed concept preferred term result, the first concept preferred term being the preferred term of any concept in the target vocabulary data;
wherein the term matching result, the matching result of the concept synonym set, and the matching result of the concept preferred term all belong to the target matching result.
Optionally, the updating unit is specifically configured to:
when the term matching result is the deleted term result, obtain the term corresponding to the deleted term result;
delete the term corresponding to the deleted term result;
when the matching result of the concept synonym set is the deleted concept result, obtain the concept synonym set corresponding to the deleted concept result;
delete the concept synonym set corresponding to the deleted concept result.
Optionally, the updating unit is further specifically configured to:
when the term matching result is the newly-added term result and the matching result of the concept synonym set is the changed concept synonym set result, obtain at least one term corresponding to the newly-added term result;
determine whether a second concept synonym set that matches a first term exists in the integrated term system, the first term being any one of the at least one term;
if so, import the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym set.
Optionally, the updating unit is further specifically configured to:
when the second concept synonym set does not exist in the integrated term system and the matching result of the concept preferred term is the changed concept preferred term result, obtain N concept synonym sets in the integrated term system whose concept preferred terms have changed, where N >= 2;
calculate the similarity between the first term and the concept preferred term of each of the N concept synonym sets;
import the first term into the N concept synonym sets according to the similarity;
when the matching result of the concept preferred term is the unchanged concept preferred term result, import the first term into a first concept synonym set, the first concept synonym set being the concept synonym set that matches the first term among the concept synonym sets corresponding to the unchanged concept preferred term result.
Optionally, the updating unit is specifically configured to:
when the term matching result is the newly-added term result and the matching result of the concept synonym set is the newly-added concept result, calculate M similarities between a first preferred term and M preferred terms in the integrated term system, where the first preferred term is the preferred term of the concept synonym set that contains the first term, the concept synonym set to which the first preferred term belongs is in the same field as the concept synonym sets to which the M preferred terms belong, and M >= 2;
import, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term.
Optionally, the updating unit importing, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term comprises:
determining whether the maximum of the M similarities is greater than a preset threshold;
if so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the concept synonym set corresponding to the maximum similarity;
if not, treating the first term and the terms that are synonyms of the first term as a new concept synonym set.
Optionally, the determination unit is specifically configured to:
extract the vocabulary data in the target source vocabulary;
call an identifier allocation function to assign identifiers to the terms and the concepts in the vocabulary data respectively, to obtain the target vocabulary data.
A third aspect of the embodiments of the present application provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the steps of the data processing method of the above aspects.
A fourth aspect of the embodiments of the present application provides a computer program product including instructions that, when run on a computer, cause the computer to perform the steps of the data processing method of the above aspects.
As can be seen from the above, the terms and concepts in the target source vocabulary can be matched against those in the first source vocabulary to obtain a target matching result, and the vocabulary data in the integrated term system can be updated according to that result. In this way, while following the existing rules of the integrated term system, the concepts that changed in the new version of the source vocabulary are updated automatically in the integrated term system, which improves the efficiency of version updates of the integrated term system, greatly reduces time cost, and shortens the lag between a source vocabulary version upgrade and the corresponding concept upgrade in the integrated term system.
Brief description of the drawings
Fig. 1 is a schematic diagram of an embodiment of the data processing method provided by the embodiments of the present application;
Fig. 2 is a schematic diagram of the virtual structure of the data processing apparatus provided by the embodiments of the present application;
Fig. 3 is a schematic diagram of the hardware structure of the server provided by the embodiments of the present application.
Detailed description of the embodiments
The terms "first", "second", "third", "fourth", and so on (if present) in the description, the claims, and the above drawings of this application are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The data processing method provided by the embodiments of the present application is described below from the perspective of a data processing apparatus, which may be a server or a service unit in a server; this is not specifically limited.
The terms used in the embodiments of the present application are explained first:
Term system: a system of concepts or terms of any type that is used to organize information and facilitate information management.
Integrated term system: a term network system that takes the term as the basic unit, the concept as the core, and the original relations of the source vocabularies as the basis, and that realizes semantic association across different source vocabularies through concepts.
Concept: a class formed by a group of ideas or objects; the basic element of a term system. It is usually described by one term or a group of synonyms.
Term: a word or phrase used to denote a concept.
Synonym: two or more terms that have the same meaning in general context but different word forms.
Preferred term: in general context, the term with the higher frequency of use within a group of synonyms that denote the same concept; it is the formal term for expressing the concept.
Non-preferred term: the other synonyms that denote the same concept serve as the non-preferred terms of the concept.
Source vocabulary: any source term system that can be used to build the integrated term system, including ontologies, thesauri, classification schemes, synonym lists, dictionaries, codes, keywords, user search terms, and so on.
Source term: a term in a source vocabulary, closely tied to that source vocabulary.
Source concept: a concept in a particular source vocabulary, closely tied to that source vocabulary.
Source term identifier (Identification, ID): the unique identifier that a source vocabulary assigns to a term in that vocabulary. It is permanent and unique.
Source concept ID: the unique identifier that a source vocabulary assigns to a concept in that vocabulary. It is permanent and unique.
Referring to Fig. 1, which is a schematic diagram of an embodiment of the data processing method provided by the embodiments of the present application. The method is applied to an integrated term system that includes at least one source vocabulary, and comprises:
101. Register the source vocabulary to be updated, to obtain the target source vocabulary.
In this embodiment, the data processing apparatus can treat the multiple versions of one source vocabulary as a single source vocabulary series and store each version as an independent term system in the source vocabulary repository of the integrated term system, and it assigns a unique identifier, SourceIDCode, to each version of each source vocabulary through the GetSourceCode function. The unique identifier consists of two parts: a "source vocabulary sequence code" and a "version serial number". Different versions of the same source series share the same "source vocabulary sequence code", while their "version serial numbers" all differ and increase in order of registration time. In other words, the data processing apparatus first obtains the source vocabulary to be updated and then registers it through the GetSourceCode function, to obtain a target source vocabulary that has been assigned a source vocabulary sequence code and a version serial number.
It should be noted that assigning a unique identifier SourceIDCode to each version of each source vocabulary through the GetSourceCode function is only an example; the unique identifier may of course be assigned to the target source vocabulary in other ways, which is not specifically limited.
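A minimal sketch of such a registration step is given below. Only the two-part structure of SourceIDCode (a sequence code shared by all versions of a series, plus a version serial number that grows with registration order) comes from the description; the class `SourceRegistry`, the method name `get_source_code`, and the `S0001-V001` text format are assumptions chosen for illustration.

```python
class SourceRegistry:
    """Hands out SourceIDCode values for registered source vocabulary versions."""

    def __init__(self):
        self._sequence_codes = {}   # series name -> source vocabulary sequence code
        self._version_counts = {}   # series name -> versions registered so far

    def get_source_code(self, series_name):
        """Register the next version of a series and return its identifier."""
        if series_name not in self._sequence_codes:
            # First version of a new series: allocate the next sequence code.
            self._sequence_codes[series_name] = len(self._sequence_codes) + 1
            self._version_counts[series_name] = 0
        # Later registrations always receive a larger version serial number.
        self._version_counts[series_name] += 1
        sequence_code = self._sequence_codes[series_name]
        version_number = self._version_counts[series_name]
        return f"S{sequence_code:04d}-V{version_number:03d}"
```

Because the sequence code is shared within a series, the earlier version that corresponds to a newly registered one can be found by matching on the sequence code alone.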
102. Determine the target vocabulary data in the target source vocabulary.
In this embodiment, the data processing apparatus can extract the vocabulary data in the target source vocabulary and call identifier allocation functions to assign identifiers to the terms and the concepts in the vocabulary data respectively, to obtain the target vocabulary data. Specifically, the data processing apparatus can extract all the data of the target source vocabulary through a vocabulary import wizard; when the terms and concepts in the target source vocabulary have not been assigned unique identifiers, it calls the source term unique identifier allocation function GetSourceTermID to assign source term IDs to the terms in the target source vocabulary, and calls the source concept unique identifier allocation function GetSourceConceptID to assign source concept IDs to the concepts in the target source vocabulary.
It should be noted that calling the source term unique identifier allocation function GetSourceTermID and the source concept unique identifier allocation function GetSourceConceptID to assign source term IDs and source concept IDs to the terms and concepts in the target source vocabulary is only an example; other allocation methods are of course possible, which is not specifically limited.
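One possible shape for such an allocation function is sketched below. The description names GetSourceTermID and GetSourceConceptID but does not give their behaviour in detail, so the helper `assign_missing_ids`, the record layout (a dictionary with an `"id"` key), and the prefix-plus-counter ID format are assumptions of this sketch.

```python
from itertools import count


def assign_missing_ids(records, prefix):
    """Give every record without an identifier a fresh, unused one.

    records: list of dicts with an optional "id" key (assumed layout).
    prefix: e.g. "T" for source term IDs or "C" for source concept IDs.
    Records that already carry an ID are left untouched, since source IDs
    are meant to be permanent.
    """
    used = {record["id"] for record in records if record.get("id")}
    counter = count(1)
    for record in records:
        if record.get("id"):
            continue
        candidate = f"{prefix}{next(counter):06d}"
        while candidate in used:             # skip IDs that are already taken
            candidate = f"{prefix}{next(counter):06d}"
        record["id"] = candidate
        used.add(candidate)
    return records
```

The same helper can be called once with a term prefix and once with a concept prefix, mirroring the two separate allocation functions in the description.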
103. Match the target vocabulary data against the first vocabulary data of the first source vocabulary, to determine the target matching result of the target source vocabulary relative to the first source vocabulary.
In this embodiment, the data processing apparatus can match the target vocabulary data against the first vocabulary data of the first source vocabulary, to determine the target matching result of the target source vocabulary relative to the first source vocabulary, where the first source vocabulary is the source vocabulary in the integrated term system that corresponds to the target source vocabulary. It can be understood that the first source vocabulary and the target source vocabulary are different versions of the same source vocabulary series, and that the target source vocabulary is the newer version.
It should be noted that the target matching result includes the term matching result, the matching result of the concept synonym set, and the matching result of the concept preferred term. How string matching is performed to obtain the target matching result is described below:
Step A. Perform string matching between the target term and the terms in the first vocabulary data to obtain the term matching result, which includes the newly-added term result, the unchanged term result and/or the deleted term result; the target term is any term in the target vocabulary data.
In this step, the data processing apparatus can perform exact string matching on the terms in the target source vocabulary and the first source vocabulary to determine the term matching result. Specifically, it can use the CompareSourceTermString function to match the target term against the terms in the first source vocabulary by exact string matching, which yields three possible matching results: the newly-added term result, the unchanged term result and/or the deleted term result. The three possible results are described below:
1. New term, corresponding to the newly-added term result: a term that does not exist in the first source vocabulary but exists in the target source vocabulary;
2. Unchanged term, corresponding to the unchanged term result: a term that exists in both the first source vocabulary and the target source vocabulary;
3. Deleted term, corresponding to the deleted term result: a term that exists in the first source vocabulary but does not exist in the target source vocabulary.
After the term matching result of the target source vocabulary and the first source vocabulary is obtained, the result is output to the CompareSourceTermResult table. The CompareSourceTermResult table has the following metadata: SourceIDCode (source vocabulary ID), SourceTermID (source vocabulary term ID), TermString (term string), and TermEdit (term change operation). The value of SourceIDCode is the unique identifier of the target source vocabulary, the value of SourceTermID is the source term ID in the target source vocabulary, the value of TermString is the string of the target term, and the value of TermEdit is one of the three possible term matching results: New, Unchanged and/or Deleted.
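Step A and its output table can be sketched as follows. The field names SourceIDCode, SourceTermID, TermString, and TermEdit come from the description; the Python function `compare_source_term_strings`, the dictionary inputs, and the choice to carry the earlier version's term ID on rows for deleted terms are assumptions of this sketch, not the behaviour of the CompareSourceTermString function itself.

```python
def compare_source_term_strings(source_id_code, old_terms, new_terms):
    """Exact string comparison of two versions of a source vocabulary.

    old_terms / new_terms: dict mapping SourceTermID -> TermString.
    Returns rows shaped like the CompareSourceTermResult table.
    """
    old_strings = set(old_terms.values())
    new_strings = set(new_terms.values())
    rows = []
    for term_id, term_string in new_terms.items():
        edit = "Unchanged" if term_string in old_strings else "New"
        rows.append({
            "SourceIDCode": source_id_code,
            "SourceTermID": term_id,
            "TermString": term_string,
            "TermEdit": edit,
        })
    for term_id, term_string in old_terms.items():
        if term_string not in new_strings:
            rows.append({
                "SourceIDCode": source_id_code,
                # Assumption: a deleted term has no ID in the target version,
                # so the ID from the earlier version is recorded instead.
                "SourceTermID": term_id,
                "TermString": term_string,
                "TermEdit": "Deleted",
            })
    return rows
```

Because the comparison is exact, a spelling or capitalization change shows up as one Deleted row and one New row rather than as an edit to a single term.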
Step B. According to the term matching result, perform string matching between the first concept synonym set and the concept synonym sets in the first vocabulary data to determine the matching result of the concept synonym set, which includes the newly-added concept result, the unchanged concept synonym set result, the deleted concept result and/or the changed concept synonym set result; the first concept synonym set is any concept synonym set in the target vocabulary data.
In this step, the data processing apparatus can, in combination with the term matching result, use the CompareSourceConceptTermString function to perform exact string matching between the concept synonym sets in the target source vocabulary and the concept synonym sets in the first source vocabulary, to obtain the matching result of the concept synonym set. (Matching here means comparing the terms in any concept synonym set of the target source vocabulary with the terms in the concept synonym sets of the first source vocabulary, to obtain how the terms in the target source vocabulary have changed relative to the terms in the first source vocabulary; since a concept synonym set contains at least two terms that are synonyms of each other, the change in the concept synonym set can be determined from the changes in its terms.) The matching result of the concept synonym set includes the newly-added concept result, the unchanged concept synonym set result, the deleted concept result and/or the changed concept synonym set result, each of which is described below:
1, it increases conceptual result New concept newly, is i.e. goes out in no appearance but target source vocabulary in the first source vocabulary
Existing concept, that is to say, that the term in the corresponding synonym collection of concept of target source vocabulary is newly-increased term;
2, the non-result of variations Unchanged Synonym of concept synonym collection comes in the first source vocabulary and target
Simultaneous concept in the vocabulary of source, that is to say, that the term in the corresponding synonym collection of concept of target source vocabulary is
Do not change term;
3, conceptual result Deleted concept is deleted, is i.e. have in the first source vocabulary but is not had in the vocabulary of target source
Some concepts, that is to say, that the term in the corresponding synonym collection of concept of the first source vocabulary is deletion term;
4, concept synonym collection result of variations Changed Synonym, i.e., with respect to the concept in the first source vocabulary, mesh
It marks that the term in the corresponding synonym collection of concept of source vocabulary is existing not to change term, and containing " new terminology " or " deletes
Term ".The matching result of the concept synonym collection is output in file CompareSourceConceptResult table,
In, the CompareSourceConceptResult table is equipped with metadata: SourceIDCode (source vocabulary ID),
SourceConceptID (source concept ID), SourceTermID (source vocabulary term ID), TermStrings (term character
String), TermEdit (nomenclature more operates), ConceptSynonymEdit (concept change operation), wherein SourceIDCode
Value is the unique identifier of target source vocabulary, and it is in the vocabulary of target source that SourceConceptID value, which is in step 102,
Concept distribution source concept ID in source concept ID, it is target source word that SourceTermID value, which is in step 102,
The source term ID of term allocation in table, the character string of TermStrings target terms, TermEdit value are target terms
The SourceTermID corresponding CompareSourceTermResult table of term matching result in step in value,
ConcepSynonymtEdit value is the matching result of the concept synonym in step B: New, Unchanged, Changed
And/or Deleted.
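One possible reading of Step B is sketched below. The function name, the tuple-based output and the rule used to relate a new synonym set to old ones (sharing at least one term string) are assumptions made for illustration; the application does not specify how CompareSourceConceptTermString is implemented.

```python
from typing import Dict, List, Set, Tuple


def compare_concept_synonyms(old_concepts: Dict[str, Set[str]],
                             new_concepts: Dict[str, Set[str]]
                             ) -> List[Tuple[str, str, str]]:
    """Classify concept synonym sets as New / Unchanged / Changed / Deleted.

    Both inputs map SourceConceptID -> set of term strings (assumed layout).
    Returns (version, SourceConceptID, ConceptSynonymEdit) tuples.
    """
    old_all = set().union(*old_concepts.values())
    new_all = set().union(*new_concepts.values())
    result = []
    for cid, terms in new_concepts.items():
        if not terms & old_all:
            # Every term of the set is a newly added term.
            result.append(("target", cid, "New"))
            continue
        # Old sets sharing at least one term with this set (assumed rule).
        related = set().union(*(o for o in old_concepts.values() if o & terms))
        edit = "Unchanged" if related == terms else "Changed"
        result.append(("target", cid, edit))
    for cid, terms in old_concepts.items():
        if not terms & new_all:
            # Every term of the old set is a deleted term.
            result.append(("first", cid, "Deleted"))
    return result
```

The version tag in each tuple keeps old and new concept IDs apart in case the two versions reuse the same ID for different concepts.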
Step C: according to the term matching result and the matching result of the concept synonym set, perform string matching between the first concept preferred term and the concept preferred terms in the first vocabulary data, to determine the matching result of the concept preferred term. The matching result of the concept preferred term includes: a concept preferred term unchanged result and/or a concept preferred term changed result. The first concept preferred term is the preferred term of any one concept in the target vocabulary data.
In this step, in combination with the above term matching result and the matching result of the concept synonym set, the CompareSourceConceptPreferredTerm function performs exact string matching between the concept preferred terms in the target source vocabulary and the concept preferred terms in the first source vocabulary, to obtain the matching result of the concept preferred terms in the target source vocabulary: the concept preferred term unchanged result and/or the concept preferred term changed result. That is, the concept preferred terms of the target source vocabulary are compared one by one with the concept preferred terms in the first source vocabulary to obtain the matching result of the concept preferred term, as described below:
1. Concept preferred term unchanged result (Unchanged PreferredTerm): the preferred term of the synonym set of one of the three kinds of concept has not changed, where the three kinds of concept are newly added concepts, concepts whose synonym set has not changed, and concepts whose synonym set has changed;
2. Concept preferred term changed result (Changed PreferredTerm): the preferred term of the synonym set of one of the three kinds of concept has changed, where the three kinds of concept are newly added concepts, concepts whose synonym set has not changed, and concepts whose synonym set has changed.
The concept preferred term unchanged result and the concept preferred term changed result are output to the CompareSourcePreferredTerm table file. The CompareSourcePreferredTerm table has the following metadata: SourceIDCode (source vocabulary ID), SourceConceptID (source concept ID), PreferredTermEdit (preferred term change operation) and PreferredTermID (preferred term ID). The SourceIDCode value is the unique identifier of the target source vocabulary, and the SourceConceptID value is the source concept ID from Step B. For the concept synonym set unchanged result and the concept synonym set changed result, PreferredTermEdit takes one of the two matching results obtained above, Unchanged or Changed; for newly added concepts, PreferredTermEdit uniformly takes the value New. The PreferredTermID value is the SourceTermID of the concept preferred term in the target source vocabulary.
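The assignment of PreferredTermEdit described above can be sketched as follows. The function name and inputs are hypothetical, and the sketch assumes, purely for illustration, that a concept keeps the same SourceConceptID in both vocabulary versions, which the application does not state.

```python
from typing import Dict


def compare_preferred_terms(concept_edits: Dict[str, str],
                            old_preferred: Dict[str, str],
                            new_preferred: Dict[str, str]) -> Dict[str, str]:
    """Return SourceConceptID -> PreferredTermEdit.

    concept_edits: SourceConceptID -> ConceptSynonymEdit from Step B.
    old_preferred / new_preferred: SourceConceptID -> preferred term string.
    Assumption: concept IDs are stable across the two versions.
    """
    result = {}
    for cid, edit in concept_edits.items():
        if edit == "Deleted":
            continue  # deleted concepts have no preferred term to compare
        if edit == "New":
            result[cid] = "New"  # newly added concepts uniformly take New
        elif old_preferred.get(cid) == new_preferred[cid]:
            result[cid] = "Unchanged"
        else:
            result[cid] = "Changed"
    return result
```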
104. Update the vocabulary data in the integrated term system according to the target matching result and preset rules.
In this embodiment, the vocabulary data in the integrated term system can be updated according to the target matching result and the preset rules.
In one embodiment, updating the vocabulary data in the integrated term system according to the target matching result and the preset rules includes:
when the term matching result is the deleted term result, obtaining the term corresponding to the deleted term result;
deleting the term corresponding to the deleted term result;
when the matching result of the concept synonym set is the deleted concept result, obtaining the concept synonym set corresponding to the deleted concept result;
deleting the concept synonym set corresponding to the deleted concept result.
That is, the terms whose TermEdit value is Deleted in the CompareSourceTermResult table are deleted together with their associated attributes and relationships (the associated attributes and relationships being, for example, the ID of the term, the ID of the source vocabulary to which the term belongs, and the relationship between the term and its corresponding concept), and the terms whose ConceptSynonymEdit value is Deleted in the CompareSourceConceptResult table are deleted together with their associated attributes and relationships.
In one embodiment, updating the vocabulary data in the integrated term system according to the target matching result and the preset rules includes:
when the term matching result is the newly added term result and the matching result of the concept synonym set is the concept synonym set changed result, obtaining at least one term corresponding to the newly added term result;
determining whether a second concept synonym set matching the first term exists in the integrated term system, the first term being any one of the at least one term;
if so, importing the first term, and the terms among the at least one term that are synonyms of the first term, into the second concept synonym set.
That is, when the term matching result is the newly added term result (the TermEdit value of the first term in CompareSourceTermResult is New) and the matching result of the concept synonym set is the concept synonym set changed result (the ConceptSynonymEdit value of the source concept containing the first term in the CompareSourceConceptResult table is Changed), it is determined whether a matching second concept synonym set exists for the first term in the integrated term system; if so, the first term, and the terms among the at least one term that are synonyms of the first term, are imported into the second concept synonym set.
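A minimal sketch of this import step follows. The text does not define what "matching" means for the second concept synonym set; the sketch assumes it means that the set already contains the exact string of the first term, and all names are hypothetical.

```python
from typing import Dict, Optional, Set


def import_into_matching_set(system_sets: Dict[str, Set[str]],
                             first_term: str,
                             synonyms: Set[str]) -> Optional[str]:
    """Import first_term and its synonyms into a matching synonym set.

    system_sets: concept ID -> term strings in the integrated term system.
    Returns the ID of the set that received the terms, or None if no set
    matches (assumed criterion: the set already contains first_term).
    """
    for cid, terms in system_sets.items():
        if first_term in terms:
            terms.update(synonyms | {first_term})
            return cid
    return None
```

A return value of None corresponds to the case in which no second concept synonym set exists, which the following embodiment handles.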
In one embodiment, when the second concept synonym set does not exist in the integrated term system and the matching result of the concept preferred term is the concept preferred term changed result, N concept synonym sets whose concept preferred term has changed are obtained from the integrated term system, where N >= 2;
the similarity between the first term and the concept preferred term of each of the N concept synonym sets is calculated;
the first term is imported into the N concept synonym sets according to the similarity;
when the matching result of the concept preferred term is the concept preferred term unchanged result, the first term is imported into the first concept synonym set, the first concept synonym set being the concept synonym set that matches the first term among the concept synonym sets corresponding to the concept preferred term unchanged result.
That is, when the second concept synonym set does not exist in the integrated term system, it is determined whether the matching result of the concept preferred term is the concept preferred term changed result. When the matching result of the concept preferred term is the concept preferred term changed result (the PreferredTermEdit value of the preferred term of the concept synonym set containing the first term in the CompareSourcePreferredTerm table is Changed), the N concept synonym sets whose concept preferred term has changed are obtained from the integrated term system, and the similarity between the first term and the concept preferred term of each of the N concept synonym sets is calculated by the Dice coefficient algorithm (the similarity can of course also be determined in other ways; this is only an example and does not limit it). The first term is then imported into the N concept synonym sets according to the similarity (importing here may mean importing the first term into the concept synonym set whose preferred term has the greatest similarity). When the matching result of the concept preferred term is the concept preferred term unchanged result (the PreferredTermEdit value of the preferred term of the concept synonym set containing the first term in the CompareSourcePreferredTerm table is Unchanged), the first term can be imported into the first concept synonym set, which is the concept synonym set corresponding to the concept preferred term that is a synonym of the first term, and which is contained in the concept synonym sets corresponding to the concept preferred term unchanged result.
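The similarity-based import can be illustrated as below. The text names the Dice coefficient but not the units it is computed over; this sketch assumes lower-cased character bigrams, a common choice, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def bigrams(text: str) -> Counter:
    """Multiset of character bigrams of the lower-cased text."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Dice coefficient 2*|X & Y| / (|X| + |Y|) over character bigrams."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:  # both strings shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * sum((x & y).values()) / total


def import_by_similarity(first_term: str,
                         preferred_terms: Dict[str, str],
                         synonym_sets: Dict[str, Set[str]]) -> str:
    """Add first_term to the set whose preferred term is most similar.

    preferred_terms: concept ID -> preferred term of that synonym set.
    synonym_sets:    concept ID -> term strings of that synonym set.
    """
    best = max(preferred_terms,
               key=lambda cid: dice(first_term, preferred_terms[cid]))
    synonym_sets[best].add(first_term)
    return best
```

For example, dice("night", "nacht") is 0.25, because the two words share one bigram ("ht") out of four bigrams each.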
In one embodiment, updating the vocabulary data in the integrated term system according to the target matching result and the preset rules includes:
when the term matching result is the newly added term result and the matching result of the concept synonym set is the newly added concept result, calculating M similarities between the first preferred term and M preferred terms in the integrated term system, where the first preferred term is the preferred term of the concept synonym set containing the first term, the concept synonym set to which the first preferred term belongs and the concept synonym sets to which the M preferred terms belong are in the same field, and M >= 2;
importing, according to the M similarities, the first term and the terms among the at least one term that are synonyms of the first term.
It should be noted that the M similarities between the first preferred term and the M preferred terms in the integrated term system can be calculated here by the Dice algorithm, and the first term and the terms among the at least one term that are synonyms of the first term are imported according to the M similarities.
In one embodiment, importing, according to the M similarities, the first term and the terms among the at least one term that are synonyms of the first term includes:
determining whether the maximum similarity among the M similarities is greater than a preset threshold;
if so, importing the first term and the terms among the at least one term that are synonyms of the first term into the concept synonym set corresponding to the maximum similarity;
if not, determining the first term and the terms that are synonyms of the first term as a new concept synonym set.
That is, a threshold can be set in advance, and it is determined whether the maximum similarity among the M similarities is greater than the preset threshold. When the maximum similarity is greater than the preset threshold, the first term and the terms among the at least one term that are synonyms of the first term are imported into the concept synonym set corresponding to the maximum similarity; when the maximum similarity is not greater than the preset threshold, the first term and the terms that are synonyms of the first term are taken as a new concept synonym set.
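The threshold decision can be sketched as follows, taking the M similarities as already computed. The function name, the new_concept_id parameter and the threshold value used in the usage note are hypothetical; the application does not give a concrete threshold.

```python
from typing import Dict, Set


def import_or_create(system_sets: Dict[str, Set[str]],
                     similarities: Dict[str, float],
                     first_term: str,
                     synonyms: Set[str],
                     threshold: float,
                     new_concept_id: str) -> str:
    """Import a new synonym group or register it as a new concept.

    similarities: concept ID -> similarity between the first preferred term
                  and that concept's preferred term (the M similarities).
    Returns the ID of the concept synonym set that now holds the group.
    """
    group = set(synonyms) | {first_term}
    best_id = max(similarities, key=similarities.get)
    if similarities[best_id] > threshold:
        # Maximum similarity exceeds the threshold: merge into that set.
        system_sets[best_id].update(group)
        return best_id
    # Otherwise the group becomes a new concept synonym set.
    system_sets[new_concept_id] = group
    return new_concept_id
```

With a threshold of 0.8, for instance, a best similarity of 0.9 merges the group into the existing set, while a best similarity of 0.5 creates a new set.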
It should be noted that, in each of the above update modes, the preferred term of each concept in the integrated term system that is involved in the target source vocabulary (assume the concept involved is X) is updated according to the following strategy:
If concept X is a newly added concept, the preferred term is recommended according to the existing concept preferred term recommendation algorithm of the integrated term system;
If concept X is not a newly added concept and the preferred term of concept X has not changed because of this update, the original preferred term is kept unchanged;
If concept X is not a newly added concept and its preferred term has been deleted because of this update, the preferred term is recommended again according to the existing concept preferred term recommendation algorithm of the integrated term system.
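The three-case strategy above can be sketched as below. The recommendation algorithm of the integrated term system is not described in the text, so it is passed in as a callable; the stand-in used in the usage note is purely hypothetical.

```python
from typing import Callable, Optional, Set


def update_preferred_term(is_new_concept: bool,
                          current_preferred: Optional[str],
                          terms_after_update: Set[str],
                          recommend: Callable[[Set[str]], str]) -> str:
    """Return the preferred term of concept X after the update.

    recommend stands in for the system's existing preferred-term
    recommendation algorithm, which is not specified here.
    """
    if is_new_concept:
        return recommend(terms_after_update)   # case 1: new concept
    if current_preferred in terms_after_update:
        return current_preferred               # case 2: keep the original
    return recommend(terms_after_update)       # case 3: preferred term deleted
```

A trivial stand-in such as `lambda terms: sorted(terms)[0]` is enough to exercise the three branches.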
It should be noted that, after the terms and concepts in the target source vocabulary have been updated, the concept attributes and relationships in the target source vocabulary can be inherited into the integrated term system according to the source term ID and the source concept ID, following the integrated term system's principles for inheriting the source attributes and relationships of concepts. How the inheritance is performed is not specifically limited here.
In view of the above, the terms and concepts in the target source vocabulary can be matched against those in the first source vocabulary to obtain the target matching result, and the vocabulary data in the integrated term system can be updated according to the target matching result. On the basis of following the existing rules of the integrated term system, the concepts that changed in the new version of the source vocabulary are updated automatically in the integrated term system, which improves the efficiency of version updates of the integrated term system, greatly saves time cost, and shortens the time difference between the concept upgrade of the integrated term system and the version upgrade of the source vocabulary.
The data processing method provided by the embodiments of the present application has been described above; the data processing device provided by the embodiments of the present application is described below.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of the data processing device provided by an embodiment of the present application. The data processing device is applied to an integrated term system that includes at least one source vocabulary, and comprises:
a registering unit 201, configured to register a source vocabulary to be updated, to obtain a target source vocabulary;
a determination unit 202, configured to determine the target vocabulary data in the target source vocabulary, the target vocabulary data including terms that have been assigned identifiers and concepts that have been assigned identifiers;
a matching unit 203, configured to match the target vocabulary data with the first vocabulary data of the first source vocabulary, to determine the target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
an updating unit 204, configured to update the vocabulary data in the integrated term system according to the target matching result and preset rules.
Optionally, the matching unit 203 is specifically configured to:
perform string matching between the target term and the terms in the first vocabulary data, to obtain the term matching result, the term matching result including: a newly added term result, an unchanged term result and/or a deleted term result, the target term being any one term in the target vocabulary data;
perform, according to the term matching result, string matching between the first concept synonym set and the concept synonym sets in the first vocabulary data, to determine the matching result of the concept synonym set, the matching result of the concept synonym set including: a newly added concept result, a concept synonym set unchanged result, a deleted concept result and/or a concept synonym set changed result, the first concept synonym set being any one concept synonym set in the target vocabulary data;
perform, according to the term matching result and the matching result of the concept synonym set, string matching between the first concept preferred term and the concept preferred terms in the first vocabulary data, to determine the matching result of the concept preferred term, the matching result of the concept preferred term including: a concept preferred term unchanged result and/or a concept preferred term changed result, the first concept preferred term being the preferred term of any one concept in the target vocabulary data;
wherein the term matching result, the matching result of the concept synonym set and the matching result of the concept preferred term all belong to the target matching result.
Optionally, the updating unit 204 is specifically configured to:
when the term matching result is the deleted term result, obtain the term corresponding to the deleted term result;
delete the term corresponding to the deleted term result;
when the matching result of the concept synonym set is the deleted concept result, obtain the concept synonym set corresponding to the deleted concept result;
delete the concept synonym set corresponding to the deleted concept result.
Optionally, the updating unit 204 is further specifically configured to:
when the term matching result is the newly added term result and the matching result of the concept synonym set is the concept synonym set changed result, obtain at least one term corresponding to the newly added term result;
determine whether a second concept synonym set matching the first term exists in the integrated term system, the first term being any one of the at least one term;
if so, import the first term, and the terms among the at least one term that are synonyms of the first term, into the second concept synonym set.
Optionally, the updating unit 204 is further specifically configured to:
when the second concept synonym set does not exist in the integrated term system and the matching result of the concept preferred term is the concept preferred term changed result, obtain N concept synonym sets whose concept preferred term has changed from the integrated term system, where N >= 2;
calculate the similarity between the first term and the concept preferred term of each of the N concept synonym sets;
import the first term into the N concept synonym sets according to the similarity;
when the matching result of the concept preferred term is the concept preferred term unchanged result, import the first term into the first concept synonym set, the first concept synonym set being the concept synonym set that matches the first term among the concept synonym sets corresponding to the concept preferred term unchanged result.
Optionally, the updating unit 204 is specifically configured to:
when the term matching result is the newly added term result and the matching result of the concept synonym set is the newly added concept result, calculate M similarities between the first preferred term and M preferred terms in the integrated term system, where the first preferred term is the preferred term of the concept synonym set containing the first term, the concept synonym set to which the first preferred term belongs and the concept synonym sets to which the M preferred terms belong are in the same field, and M >= 2;
import, according to the M similarities, the first term and the terms among the at least one term that are synonyms of the first term.
Optionally, the importing by the updating unit 204, according to the M similarities, of the first term and the terms among the at least one term that are synonyms of the first term includes:
determining whether the maximum similarity among the M similarities is greater than a preset threshold;
if so, importing the first term and the terms among the at least one term that are synonyms of the first term into the concept synonym set corresponding to the maximum similarity;
if not, determining the first term and the terms that are synonyms of the first term as a new concept synonym set.
Optionally, the determination unit 202 is specifically configured to:
extract the vocabulary data in the target source vocabulary;
call an identifier allocation function to assign identifiers to the terms and the concepts in the vocabulary data respectively, to obtain the target vocabulary data.
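The identifier allocation performed by the determination unit can be pictured with the following sketch. The function name allocate_identifiers and the ID format ("<source>-C<n>" and "<source>-T<n>") are assumptions chosen for the example; the application does not specify the allocation function or the form of the identifiers.

```python
from itertools import count
from typing import List


def allocate_identifiers(source_id: str,
                         concepts: List[List[str]]) -> List[dict]:
    """Assign a concept ID to each synonym set and a term ID to each term.

    concepts: one list of term strings per concept, as extracted from the
              target source vocabulary (assumed input shape).
    """
    term_counter = count(1)
    rows = []
    for concept_index, terms in enumerate(concepts, start=1):
        concept_id = f"{source_id}-C{concept_index}"   # hypothetical format
        for text in terms:
            rows.append({
                "SourceIDCode": source_id,
                "SourceConceptID": concept_id,
                "SourceTermID": f"{source_id}-T{next(term_counter)}",
                "TermString": text,
            })
    return rows
```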
The interaction between the units of the data processing device in this embodiment is as described in the embodiment shown in Fig. 1 above, and is not repeated here.
In view of the above, the terms and concepts in the target source vocabulary can be matched against those in the first source vocabulary to obtain the target matching result, and the vocabulary data in the integrated term system can be updated according to the target matching result. On the basis of following the existing rules of the integrated term system, the concepts that changed in the new version of the source vocabulary are updated automatically in the integrated term system, which improves the efficiency of version updates of the integrated term system, greatly saves time cost, and shortens the time difference between the concept upgrade of the integrated term system and the version upgrade of the source vocabulary.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 300 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
The steps performed by the data processing device in the above embodiments may be based on the server structure shown in Fig. 3.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, device and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
An embodiment of the present application also provides a storage medium on which a program is stored, the program implementing the data processing method when executed by a processor.
An embodiment of the present application also provides a processor for running a program, wherein the program executes the data processing method when run.
An embodiment of the present application also provides a device, the device including a processor, a memory, and a program stored on the memory and runnable on the processor, the processor performing the following steps when executing the program:
registering a source vocabulary to be updated, to obtain a target source vocabulary;
determining the target vocabulary data in the target source vocabulary, the target vocabulary data including terms that have been assigned identifiers and concepts that have been assigned identifiers;
matching the target vocabulary data with the first vocabulary data of the first source vocabulary, to determine the target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
updating the vocabulary data in the integrated term system according to the target matching result and preset rules.
In a specific implementation, the processor can implement any implementation of the embodiment corresponding to Fig. 1 when executing the program.
The device here may be a server, a PC, a PAD, a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, performs the following steps:
registering a source vocabulary to be updated, to obtain a target source vocabulary;
determining the target vocabulary data in the target source vocabulary, the target vocabulary data including terms that have been assigned identifiers and concepts that have been assigned identifiers;
matching the target vocabulary data with the first vocabulary data of the first source vocabulary, to determine the target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
updating the vocabulary data in the integrated term system according to the target matching result and preset rules.
In a specific implementation, any implementation of the embodiment corresponding to Fig. 1 can be implemented when the computer program product is executed.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage and the like) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPU), an input/output interface, a network interface and a memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.
Claims (18)
1. A data processing method, applied to an integrated term system, the integrated term system comprising at least one source vocabulary, characterized in that the method comprises:
registering a source vocabulary to be updated, to obtain a target source vocabulary;
determining target vocabulary data in the target source vocabulary, the target vocabulary data comprising terms to which identifiers have been assigned and concepts to which identifiers have been assigned;
matching the target vocabulary data against first vocabulary data of a first source vocabulary, to determine an object matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
updating the vocabulary data in the integrated term system according to the object matching result and preset rules.
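As an illustration of the first step of claim 1 only, the following minimal sketch shows how registering an incoming vocabulary version could also locate the "first source vocabulary", that is, the version already integrated under the same name. The class names `SourceVocabulary` and `IntegratedTermSystem`, the methods `register` and `commit`, and the sample data are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class SourceVocabulary:
    name: str                       # vocabulary name, e.g. "DemoVocab" (made up)
    version: str
    terms: set = field(default_factory=set)


class IntegratedTermSystem:
    """Hypothetical container holding one integrated version per source vocabulary."""

    def __init__(self):
        self._current = {}          # vocabulary name -> integrated SourceVocabulary

    def register(self, incoming):
        """Register the vocabulary to be updated.

        Returns (target, first): the registered target source vocabulary and the
        already integrated vocabulary with the same name, or None if there is none.
        """
        first = self._current.get(incoming.name)
        return incoming, first

    def commit(self, vocabulary):
        """Record a vocabulary version as the integrated one."""
        self._current[vocabulary.name] = vocabulary


system = IntegratedTermSystem()
system.commit(SourceVocabulary("DemoVocab", "2018", {"heart attack"}))
target, first = system.register(
    SourceVocabulary("DemoVocab", "2019", {"heart attack", "myocardial infarction"})
)
print(first.version, "->", target.version)
```

The matching and updating steps would then operate on the `(target, first)` pair; they are deliberately left out of this sketch.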
2. The method according to claim 1, characterized in that said matching the target vocabulary data against the first vocabulary data of the first source vocabulary, to determine the object matching result of the target source vocabulary relative to the first source vocabulary, comprises:
performing string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result comprising an added-term result, an unchanged-term result and/or a deleted-term result, the target term being any term in the target vocabulary data;
performing, according to the term matching result, string matching between a first concept synonym collection and the concept synonym collections in the first vocabulary data to determine a matching result of the concept synonym collection, the matching result of the concept synonym collection comprising an added-concept result, a collection-unchanged result, a deleted-concept result and/or a collection-changed result, the first concept synonym collection being any concept synonym collection in the target vocabulary data;
performing, according to the term matching result and the matching result of the concept synonym collection, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a matching result of the concept preferred term, the matching result of the concept preferred term comprising a preferred-term-unchanged result and/or a preferred-term-changed result, the first concept preferred term being any concept preferred term in the target vocabulary data;
wherein the term matching result, the matching result of the concept synonym collection and the matching result of the concept preferred term all belong to the object matching result.
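The three matching levels of claim 2 can be sketched with plain set operations on strings. This is only one possible reading: the `Concept` tuple, the function names, and the rule that a collection is "changed" when it overlaps an old collection without being identical to it are assumptions made for illustration, and the sketch computes the three results in one pass rather than strictly deriving each level from the previous one.

```python
from typing import NamedTuple


class Concept(NamedTuple):
    preferred: str       # preferred term of the concept
    synonyms: frozenset  # every term of the concept, preferred term included


def all_terms(concepts):
    """Union of the terms of all concepts."""
    return set().union(*(c.synonyms for c in concepts))


def match_vocabularies(old, new):
    """Compare a new vocabulary version (new) with the integrated one (old)."""
    old_terms, new_terms = all_terms(old), all_terms(new)
    terms = {
        "added": new_terms - old_terms,
        "unchanged": new_terms & old_terms,
        "deleted": old_terms - new_terms,
    }
    collections = {"added": [], "unchanged": [], "changed": [], "deleted": []}
    preferred = {"unchanged": [], "changed": []}
    for c in new:
        overlapping = [o for o in old if o.synonyms & c.synonyms]
        if not overlapping:                       # no shared term: new concept
            collections["added"].append(c.preferred)
            continue
        same = any(o.synonyms == c.synonyms for o in overlapping)
        collections["unchanged" if same else "changed"].append(c.preferred)
        kept = any(o.preferred == c.preferred for o in overlapping)
        preferred["unchanged" if kept else "changed"].append(c.preferred)
    # An old concept none of whose terms survive is treated as deleted.
    collections["deleted"] = [
        o.preferred for o in old if not (o.synonyms & new_terms)
    ]
    return terms, collections, preferred


old = [
    Concept("myocardial infarction", frozenset({"myocardial infarction", "heart attack"})),
    Concept("influenza", frozenset({"influenza", "flu"})),
    Concept("dropsy", frozenset({"dropsy"})),
]
new = [
    Concept("myocardial infarction", frozenset({"myocardial infarction", "heart attack", "MI"})),
    Concept("flu", frozenset({"influenza", "flu"})),
    Concept("COVID-19", frozenset({"COVID-19"})),
]
terms, collections, preferred = match_vocabularies(old, new)
```

With this sample data, "MI" and "COVID-19" are added terms, "dropsy" is a deleted term and a deleted concept, and the "flu" concept keeps its collection but changes its preferred term.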
3. The method according to claim 2, characterized in that said updating the vocabulary data in the integrated term system according to the object matching result and preset rules comprises:
when the term matching result is the deleted-term result, obtaining the term corresponding to the deleted-term result;
deleting the term corresponding to the deleted-term result;
when the matching result of the concept synonym collection is the deleted-concept result, obtaining the concept synonym collection corresponding to the deleted-concept result;
deleting the concept synonym collection corresponding to the deleted-concept result.
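A minimal sketch of the deletion rules of claim 3, assuming the integrated term system is held as a dictionary from concept identifier to a set of terms. That layout, the function name `apply_deletions`, and the sample identifiers are hypothetical.

```python
def apply_deletions(system, deleted_terms, deleted_concepts):
    """Apply the deleted-concept and deleted-term results to the system.

    system: dict mapping concept identifier -> set of terms (assumed layout).
    """
    for concept_id in deleted_concepts:
        system.pop(concept_id, None)     # drop the whole synonym collection
    for terms in system.values():
        terms -= deleted_terms           # remove deleted terms in place
    return system


system = {"C1": {"heart attack", "coronary"}, "C2": {"dropsy"}}
apply_deletions(system, deleted_terms={"coronary"}, deleted_concepts={"C2"})
print(system)
```

In a system that integrates several source vocabularies, a real implementation would presumably also check whether another source still contributes the term before removing it; the sketch ignores that.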
4. The method according to claim 2, characterized in that said updating the vocabulary data in the integrated term system according to the object matching result and preset rules comprises:
when the term matching result is the added-term result and the matching result of the concept synonym collection is the collection-changed result, obtaining at least one term corresponding to the added-term result;
determining whether a second concept synonym collection that matches a first term exists in the integrated term system, the first term being any term among the at least one term;
if so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym collection.
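The import rule of claim 4 might look as follows. The sketch assumes that a collection in the integrated term system "matches" when it shares at least one already known term with the changed collection from the new version; the claim does not spell out the matching criterion, so this is an assumption, as are the function name and the data layout.

```python
def import_new_terms(system, target_collection, added_terms):
    """Import added terms into the matching collection of the integrated system.

    system: dict concept identifier -> set of terms (assumed layout).
    target_collection: the changed synonym collection from the new version.
    added_terms: terms reported by the added-term result.
    Returns the identifier of the collection that received the terms, or None.
    """
    incoming = target_collection & added_terms   # first term and its new synonyms
    known = target_collection - added_terms      # terms the system may already hold
    for concept_id, terms in system.items():
        if terms & known:                        # assumed meaning of "matches"
            terms |= incoming
            return concept_id
    return None                                  # no second collection exists


system = {"C1": {"heart attack", "myocardial infarction"}, "C2": {"flu"}}
hit = import_new_terms(
    system,
    target_collection={"heart attack", "MI", "cardiac infarction"},
    added_terms={"MI", "cardiac infarction"},
)
miss = import_new_terms(system, {"bone fracture"}, {"bone fracture"})
```

A `None` return corresponds to the case where no second concept synonym collection exists and a fallback rule has to decide where the term goes.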
5. The method according to claim 4, characterized in that, when the second concept synonym collection does not exist in the integrated term system, the method further comprises:
when the matching result of the concept preferred term is the preferred-term-changed result, obtaining N concept synonym collections in the integrated term system whose concept preferred terms have changed, where N ≥ 2;
calculating the similarity between the first term and the concept preferred terms of the N concept synonym collections;
importing the first term into the N concept synonym collections according to the similarity;
when the matching result of the concept preferred term is the preferred-term-unchanged result, importing the first term into the first concept synonym collection, the first concept synonym collection being the concept synonym collection that matches the first term among the concept synonym collections corresponding to the preferred-term-unchanged result.
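The similarity-based import of claim 5 can be sketched as below. The claim does not name a similarity measure, so the character-based ratio from Python's standard `difflib.SequenceMatcher` is used purely as a stand-in, and importing into the single most similar collection is one assumed reading of "according to the similarity". The function names and sample terms are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the measure of the application."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def import_by_similarity(term, candidates):
    """Add term to the candidate collection whose preferred term is most similar.

    candidates: dict preferred term -> set of synonyms, for the N collections
    whose preferred term changed (assumed layout).
    """
    best = max(candidates, key=lambda preferred: similarity(term, preferred))
    candidates[best].add(term)
    return best


candidates = {
    "myocardial infarction": {"myocardial infarction", "heart attack"},
    "influenza": {"influenza", "flu"},
}
best = import_by_similarity("acute myocardial infarction", candidates)
print(best)
```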
6. The method according to claim 4, characterized in that said updating the vocabulary data in the integrated term system according to the object matching result and preset rules comprises:
when the term matching result is the added-term result and the matching result of the concept synonym collection is the added-concept result, calculating M similarities between a first preferred term and M preferred terms in the integrated term system, wherein the first preferred term is the preferred term of the concept synonym collection in which the first term is located, and the concept synonym collection to which the first preferred term belongs and the concept synonym collections to which the M preferred terms belong are in the same domain, M ≥ 2;
importing, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term.
7. The method according to claim 6, characterized in that said importing, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term comprises:
determining whether the maximum similarity among the M similarities is greater than a preset threshold;
if so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the concept synonym collection corresponding to the maximum similarity;
if not, determining the first term and the terms that are synonyms of the first term as a new concept synonym collection.
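The threshold decision of claims 6 and 7 can be sketched as follows. Token-level Jaccard similarity and the threshold value 0.5 are illustrative assumptions; the claims only require some similarity measure and some preset threshold. The function names and the dictionary layout are hypothetical.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity, used here only as a placeholder measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def place_new_concept(preferred, synonyms, system, threshold=0.5, sim=jaccard):
    """Merge a new concept into the most similar collection or create a new one.

    system: dict preferred term -> set of synonyms for collections of the same
    domain (assumed layout). Returns the preferred term of the receiving collection.
    """
    scores = {p: sim(preferred, p) for p in system}
    best = max(scores, key=scores.get, default=None)
    if best is not None and scores[best] > threshold:
        system[best] |= set(synonyms)     # maximum similarity exceeds the threshold
        return best
    system[preferred] = set(synonyms)     # otherwise start a new collection
    return preferred


system = {
    "myocardial infarction": {"myocardial infarction"},
    "viral pneumonia": {"viral pneumonia"},
}
merged = place_new_concept(
    "acute myocardial infarction", {"acute myocardial infarction", "AMI"}, system
)
created = place_new_concept("bone fracture", {"bone fracture"}, system)
```

Passing the similarity function as a parameter keeps the decision rule separate from the choice of measure, which the claims leave open.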
8. The method according to any one of claims 1 to 7, characterized in that said determining the target vocabulary data in the target source vocabulary comprises:
extracting the vocabulary data in the target source vocabulary;
calling an identifier assignment function to assign identifiers to the terms and the concepts in the vocabulary data respectively, to obtain the target vocabulary data.
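An identifier assignment function as mentioned in claim 8 could be as simple as a pair of counters. The identifier format (a letter prefix followed by a seven-digit number), the function names, and the input layout are assumptions made for this sketch.

```python
from itertools import count


def make_allocator(prefix):
    """Return a function that yields prefix0000001, prefix0000002, ... in turn."""
    counter = count(1)

    def allocate():
        return f"{prefix}{next(counter):07d}"

    return allocate


def assign_identifiers(concepts):
    """Assign identifiers to concepts and to their terms.

    concepts: list of lists of terms, one inner list per concept (assumed layout).
    Returns dict concept identifier -> {term identifier: term}.
    """
    new_concept_id = make_allocator("C")
    new_term_id = make_allocator("T")
    result = {}
    for terms in concepts:
        concept_id = new_concept_id()
        result[concept_id] = {new_term_id(): term for term in terms}
    return result


target_data = assign_identifiers([["heart attack", "myocardial infarction"], ["flu"]])
print(target_data)
```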
9. A data processing device, applied to an integrated term system, the integrated term system comprising at least one source vocabulary, characterized in that the device comprises:
a registering unit, configured to register a source vocabulary to be updated, to obtain a target source vocabulary;
a determining unit, configured to determine target vocabulary data in the target source vocabulary, the target vocabulary data comprising terms to which identifiers have been assigned and concepts to which identifiers have been assigned;
a matching unit, configured to match the target vocabulary data against first vocabulary data of a first source vocabulary, to determine an object matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
an updating unit, configured to update the vocabulary data in the integrated term system according to the object matching result and preset rules.
10. The device according to claim 9, characterized in that the matching unit is specifically configured to:
perform string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result comprising an added-term result, an unchanged-term result and/or a deleted-term result, the target term being any term in the target vocabulary data;
perform, according to the term matching result, string matching between a first concept synonym collection and the concept synonym collections in the first vocabulary data to determine a matching result of the concept synonym collection, the matching result of the concept synonym collection comprising an added-concept result, a collection-unchanged result, a deleted-concept result and/or a collection-changed result, the first concept synonym collection being any concept synonym collection in the target vocabulary data;
perform, according to the term matching result and the matching result of the concept synonym collection, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a matching result of the concept preferred term, the matching result of the concept preferred term comprising a preferred-term-unchanged result and/or a preferred-term-changed result, the first concept preferred term being any concept preferred term in the target vocabulary data;
wherein the term matching result, the matching result of the concept synonym collection and the matching result of the concept preferred term all belong to the object matching result.
11. The device according to claim 10, characterized in that the updating unit is specifically configured to:
when the term matching result is the deleted-term result, obtain the term corresponding to the deleted-term result;
delete the term corresponding to the deleted-term result;
when the matching result of the concept synonym collection is the deleted-concept result, obtain the concept synonym collection corresponding to the deleted-concept result;
delete the concept synonym collection corresponding to the deleted-concept result.
12. The device according to claim 10, characterized in that the updating unit is further specifically configured to:
when the term matching result is the added-term result and the matching result of the concept synonym collection is the collection-changed result, obtain at least one term corresponding to the added-term result;
determine whether a second concept synonym collection that matches a first term exists in the integrated term system, the first term being any term among the at least one term;
if so, import the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym collection.
13. The device according to claim 12, characterized in that the updating unit is further specifically configured to:
when the second concept synonym collection does not exist in the integrated term system and the matching result of the concept preferred term is the preferred-term-changed result, obtain N concept synonym collections in the integrated term system whose concept preferred terms have changed, where N ≥ 2;
calculate the similarity between the first term and the concept preferred terms of the N concept synonym collections;
import the first term into the N concept synonym collections according to the similarity;
when the matching result of the concept preferred term is the preferred-term-unchanged result, import the first term into the first concept synonym collection, the first concept synonym collection being the concept synonym collection that matches the first term among the concept synonym collections corresponding to the preferred-term-unchanged result.
14. The device according to claim 12, characterized in that the updating unit is specifically configured to:
when the term matching result is the added-term result and the matching result of the concept synonym collection is the added-concept result, calculate M similarities between a first preferred term and M preferred terms in the integrated term system, wherein the first preferred term is the preferred term of the concept synonym collection in which the first term is located, and the concept synonym collection to which the first preferred term belongs and the concept synonym collections to which the M preferred terms belong are in the same domain, M ≥ 2;
import, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term.
15. The device according to claim 14, characterized in that the updating unit importing, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term comprises:
determining whether the maximum similarity among the M similarities is greater than a preset threshold;
if so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the concept synonym collection corresponding to the maximum similarity;
if not, determining the first term and the terms that are synonyms of the first term as a new concept synonym collection.
16. The device according to any one of claims 9 to 15, characterized in that the determining unit is specifically configured to:
extract the vocabulary data in the target source vocabulary;
call an identifier assignment function to assign identifiers to the terms and the concepts in the vocabulary data respectively, to obtain the target vocabulary data.
17. A computer-readable storage medium, characterized in that it comprises instructions which, when run on a computer, cause the computer to perform the steps of the data processing method according to any one of claims 1 to 8.
18. A computer program product comprising instructions which, when the computer program product is run on a computer, cause the computer to perform the steps of the data processing method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910540408.0A CN110263184A (en) | 2019-06-20 | 2019-06-20 | A kind of data processing method and relevant device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910540408.0A CN110263184A (en) | 2019-06-20 | 2019-06-20 | A kind of data processing method and relevant device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110263184A (en) | 2019-09-20 |
Family
ID=67920096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910540408.0A Pending CN110263184A (en) | 2019-06-20 | 2019-06-20 | A kind of data processing method and relevant device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263184A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0241565A (en) * | 1988-08-01 | 1990-02-09 | Nippon Telegr & Teleph Corp <Ntt> | Thesaurus update supporting device |
CN101030157A (en) * | 2007-04-20 | 2007-09-05 | 北京搜狗科技发展有限公司 | Method and system for updating user vocabulary synchronouslly |
CN103324704A (en) * | 2013-06-17 | 2013-09-25 | 深圳先进技术研究院 | Method and system for dynamically updating knowledge base |
CN103823879A (en) * | 2014-02-28 | 2014-05-28 | 中国科学院计算技术研究所 | Method and system for automatically updating knowledge base oriented to online encyclopedia |
CN105930478A (en) * | 2016-05-03 | 2016-09-07 | 福州市勘测院 | Element object spatial information fingerprint-based spatial data change capture method |
CN107220326A (en) * | 2017-05-23 | 2017-09-29 | 至本医疗科技(上海)有限公司 | The information updating method and system of a kind of biomedical knowledge base |
CN109145171A (en) * | 2018-07-23 | 2019-01-04 | 广州市城市规划勘测设计研究院 | A kind of multiple dimensioned map data updating method |
Non-Patent Citations (1)
Title |
---|
孙海霞等: "科技知识组织体系语义互操作网络协同工作平台设计与实现", 《农业图书情报》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110704639A (en) * | 2019-09-30 | 2020-01-17 | 中国医学科学院医学信息研究所 | Abbreviation document generation method and device |
CN112765136A (en) * | 2021-04-07 | 2021-05-07 | 浙江太美医疗科技股份有限公司 | Storage method, upgrading method and device of medical coding dictionary |
CN113221543A (en) * | 2021-05-07 | 2021-08-06 | 中国医学科学院医学信息研究所 | Medical term integration method and system |
CN113221543B (en) * | 2021-05-07 | 2023-10-10 | 中国医学科学院医学信息研究所 | Medical term integration method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||