CN110263184A - Data processing method and related device - Google Patents


Info

Publication number
CN110263184A
Authority
CN
China
Prior art keywords
term
concept
result
vocabulary
preferred
Prior art date
Legal status
Pending
Application number
CN201910540408.0A
Other languages
Chinese (zh)
Inventor
孙海霞
钱庆
邓盼盼
李姣
沈柳
Current Assignee
Institute of Medical Information CAMS
Original Assignee
Institute of Medical Information CAMS
Priority date
Filing date
Publication date
Application filed by Institute of Medical Information CAMS
Priority to CN201910540408.0A
Publication of CN110263184A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application provide a data processing method and related device that automatically update, within an integrated terminology system, the concepts that changed in a new version of a source vocabulary. This improves the efficiency of version updates to the integrated terminology system, greatly reduces time cost, and shortens the lag between a source vocabulary's version upgrade and the corresponding concept upgrade in the integrated terminology system. The method comprises: registering a source vocabulary to be updated to obtain a target source vocabulary; determining target vocabulary data in the target source vocabulary, the target vocabulary data including terms and concepts to which identifiers have been assigned; matching the target vocabulary data against first vocabulary data of a first source vocabulary to determine a target matching result of the target source vocabulary relative to the first source vocabulary; and updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules.

Description

Data processing method and related device
Technical field
This application relates to the field of data processing, and in particular to a data processing method and related device.
Background art
Terminology systems such as synonym lists, classification schemes, coding systems, thesauri, ontologies and knowledge graphs have been widely recognized for their power in describing, organizing, managing and discovering information resources, in fields such as library and information organization, natural language processing and medical informatics. Over the past few decades, the terminology systems of each field have mostly been built and developed for a specific task and application environment, so they differ in how they model conceptualization, concept types, concept attributes and the semantic relations between concepts, as well as in data structure and storage format. This severely limits communication between computer applications that use different terminology systems, and in turn limits interoperability and shared use across information resource systems. Interoperation between terminology systems, which lets applications built on different terminology systems understand one another and exchange information without barriers, has become the core technology for removing this limitation. Building an integrated terminology system is one way to achieve such interoperability: several terminology systems in a given subject domain are registered and brought together; with the term as the basic unit, the concept as the core and the original relations of the source vocabularies as the foundation, terms from different source vocabularies that denote the same concept are merged and linked to form new synonym or quasi-synonym groups, and one source term is recommended as the preferred form of the concept; semantic association among the source vocabularies is then realized through the merged concepts. Such an integrated terminology system, also called a multi-source terminology network, has become information infrastructure for interoperability and mutual trust among diverse information resources.
Current research on updating terminology systems focuses mainly on single terminology systems and covers updates to terms, concepts, attributes and relations; related techniques include unknown-word identification, term deletion and synonym expansion. Work on updating integrated terminology systems concentrates on: 1) expansion with new source vocabularies, i.e., adding a new vocabulary to an existing integrated terminology system through format conversion, lexical similarity calculation and similar means; and 2) error correction, i.e., finding and correcting hidden problems in the integrated terminology system through relation-inconsistency checks. Updates to existing source vocabularies still rely mainly on manual work and concentrate on the term and concept levels, adding, deleting and modifying the terms and concepts of a source vocabulary. Given the number and size of source vocabularies, however, manual updating is costly in time and money and cannot meet efficiency and cost-effectiveness requirements.
Summary of the invention
The embodiments of the present application provide a data processing method and related device that automatically update, within an integrated terminology system, the concepts that changed in a new version of a source vocabulary, improving the efficiency of version updates to the integrated terminology system, greatly reducing time cost, and shortening the lag between concept upgrades in the integrated terminology system and version upgrades of the source vocabularies.
A first aspect of the embodiments of the present application provides a data processing method applied to an integrated terminology system, the integrated terminology system including at least one source vocabulary, the method comprising:
registering a source vocabulary to be updated to obtain a target source vocabulary;
determining target vocabulary data in the target source vocabulary, the target vocabulary data including terms to which identifiers have been assigned and concepts to which identifiers have been assigned;
matching the target vocabulary data against first vocabulary data of a first source vocabulary to determine a target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated terminology system that corresponds to the target source vocabulary; and
updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules.
Optionally, matching the target vocabulary data against the first vocabulary data of the first source vocabulary to determine the target matching result of the target source vocabulary relative to the first source vocabulary comprises:
performing string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result including a new-term result, an unchanged-term result and/or a deleted-term result, the target term being any term in the target vocabulary data;
performing, according to the term matching result, string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine a concept synonym set matching result, which includes a new-concept result, an unchanged-synonym-set result, a deleted-concept result and/or a changed-synonym-set result, the first concept synonym set being any concept synonym set in the target vocabulary data;
performing, according to the term matching result and the concept synonym set matching result, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a concept preferred term matching result, which includes an unchanged-preferred-term result and/or a changed-preferred-term result, the first concept preferred term being the preferred term of any concept in the target vocabulary data;
wherein the term matching result, the concept synonym set matching result and the concept preferred term matching result all belong to the target matching result.
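The concept-level and preferred-term-level matching can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: each concept is modeled as a hypothetical `ConceptSet` holding its synonym strings and its preferred term, and an old concept is taken as the counterpart of a new one when it shares the most term strings with it (the description does not say how counterparts are paired). Term-level results are implied by the set operations on term strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptSet:
    """Hypothetical model of one concept: its synonym strings and preferred term."""
    synonyms: frozenset[str]
    preferred: str


def classify_concepts(
    old: list[ConceptSet], new: list[ConceptSet]
) -> tuple[list[tuple[ConceptSet, str, str | None]], list[ConceptSet]]:
    """Classify target (new) concepts against the first (old) source vocabulary.

    Returns (results, deleted). Each result is (concept, synonym_set_result,
    preferred_term_result); preferred_term_result is None for new concepts.
    """
    old_terms = set().union(*(c.synonyms for c in old))
    new_terms = set().union(*(c.synonyms for c in new))
    results = []
    for concept in new:
        if not (concept.synonyms & old_terms):
            # Every term of this synonym set is a new term -> new concept.
            results.append((concept, "New", None))
            continue
        # Assumed pairing rule: the old concept sharing the most term strings.
        match = max(old, key=lambda o: len(o.synonyms & concept.synonyms))
        set_result = "Unchanged" if match.synonyms == concept.synonyms else "Changed"
        pref_result = "Unchanged" if match.preferred == concept.preferred else "Changed"
        results.append((concept, set_result, pref_result))
    # Old concepts none of whose terms survive in the new version are deleted.
    deleted = [o for o in old if not (o.synonyms & new_terms)]
    return results, deleted
```

For example, an unchanged synonym set whose preferred term was switched to another synonym is reported as ("Unchanged", "Changed"), while an old set whose terms all disappeared is reported as deleted.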
Optionally, updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules comprises:
when the term matching result is the deleted-term result, obtaining the term corresponding to the deleted-term result;
deleting the term corresponding to the deleted-term result;
when the concept synonym set matching result is the deleted-concept result, obtaining the concept synonym set corresponding to the deleted-concept result;
deleting the concept synonym set corresponding to the deleted-concept result.
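The two deletion rules can be sketched as follows, assuming (hypothetically) that the integrated terminology system is held in memory as a dict from concept ID to a set of term strings; the function name and layout are illustrative only.

```python
from __future__ import annotations


def apply_deletions(
    concepts: dict[str, set[str]],
    deleted_terms: set[str],
    deleted_concepts: set[str],
) -> None:
    """Apply the deletion rules in place.

    concepts: hypothetical in-memory view of the integrated system,
    mapping concept ID -> set of term strings.
    """
    # Rule 1: drop every concept synonym set flagged as a deleted concept.
    for concept_id in deleted_concepts:
        concepts.pop(concept_id, None)
    # Rule 2: drop every term flagged as deleted from the remaining sets.
    for terms in concepts.values():
        terms.difference_update(deleted_terms)
```

A real integrated system would probably remove only the record that belongs to the updated source vocabulary, since the same string may also come from other sources; this sketch ignores provenance for brevity.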
Optionally, updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules comprises:
when the term matching result is the new-term result and the concept synonym set matching result is the changed-synonym-set result, obtaining at least one term corresponding to the new-term result;
determining whether the integrated terminology system contains a second concept synonym set that matches a first term, the first term being any one of the at least one term;
if so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym set.
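This import rule can be sketched as below, assuming that "matches the first term" means an integrated concept synonym set already contains the same term string up to letter case (for example, contributed by another source vocabulary). The function name and the dict-of-sets layout are hypothetical.

```python
from __future__ import annotations


def import_into_existing(
    integrated: dict[str, set[str]],
    first_term: str,
    synonyms: set[str],
) -> str | None:
    """Merge first_term and its synonyms into a matching existing concept.

    Returns the ID of the second concept synonym set that received the terms,
    or None when no set matches (the caller then applies the fallback rules).
    """
    key = first_term.casefold()
    for concept_id, terms in integrated.items():
        # Assumed matching criterion: case-insensitive exact string match.
        if key in {t.casefold() for t in terms}:
            terms.add(first_term)
            terms.update(synonyms)
            return concept_id
    return None
```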
Optionally, when the second concept synonym set does not exist in the integrated terminology system, the method further comprises:
when the concept preferred term matching result is the changed-preferred-term result, obtaining the N concept synonym sets in the integrated terminology system whose preferred term changed, where N ≥ 2;
calculating the similarity between the first term and the preferred term of each of the N concept synonym sets;
importing the first term into one of the N concept synonym sets according to the similarities;
when the concept preferred term matching result is the unchanged-preferred-term result, importing the first term into the first concept synonym set, the first concept synonym set being, among the concept synonym sets corresponding to the unchanged-preferred-term result, the one that matches the first term.
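The similarity-based fallback can be sketched as below. The description does not name a similarity measure, so this sketch substitutes Python's standard `difflib.SequenceMatcher.ratio()` as an assumed stand-in, and it reads "import according to the similarity" as "import into the most similar of the N sets". The function names and data layout are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def import_by_preferred_similarity(
    changed: dict[str, tuple[str, set[str]]], first_term: str
) -> str:
    """Add first_term to the most similar of the N (>= 2) concept synonym sets.

    changed maps concept ID -> (new preferred term, synonym set) for the
    concepts whose preferred term changed. Returns the chosen concept ID.
    """
    best_id = max(changed, key=lambda cid: similarity(first_term, changed[cid][0]))
    changed[best_id][1].add(first_term)
    return best_id
```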
Optionally, updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules comprises:
when the term matching result is the new-term result and the concept synonym set matching result is the new-concept result, calculating M similarities between a first preferred term and M preferred terms in the integrated terminology system, where the first preferred term is the preferred term of the concept synonym set containing the first term, the concept synonym set of the first preferred term and the concept synonym sets of the M preferred terms belong to the same domain, and M ≥ 2;
importing, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term.
Optionally, importing, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term comprises:
determining whether the maximum of the M similarities is greater than a preset threshold;
if so, importing the first term and those of the at least one term that are synonyms of the first term into the concept synonym set corresponding to the maximum similarity;
if not, establishing the first term and the terms that are synonyms of the first term as a new concept synonym set.
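The threshold decision can be sketched as follows. The similarity measure (`difflib.SequenceMatcher.ratio()` from the Python standard library), the default threshold of 0.8, the `C<n>` format for new concept IDs and the function name are all assumptions; the caller is assumed to pass only the M ≥ 2 integrated concepts from the same domain.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def place_new_concept(
    integrated: dict[str, tuple[str, set[str]]],
    first_preferred: str,
    terms: set[str],
    threshold: float = 0.8,
) -> str:
    """Merge a new concept's terms into the best match, or create a new set.

    integrated maps concept ID -> (preferred term, synonym set), already
    restricted to the M >= 2 concepts in the same domain (assumption);
    terms holds the first term and its synonyms. Returns the receiving ID.
    """
    scores = {
        cid: SequenceMatcher(None, first_preferred.lower(), pref.lower()).ratio()
        for cid, (pref, _) in integrated.items()
    }
    best_id = max(scores, key=scores.get)
    if scores[best_id] > threshold:
        integrated[best_id][1].update(terms)
        return best_id
    # Below the threshold: the terms form a new concept synonym set.
    n = len(integrated) + 1
    while f"C{n}" in integrated:
        n += 1
    new_id = f"C{n}"
    integrated[new_id] = (first_preferred, set(terms))
    return new_id
```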
Optionally, determining the target vocabulary data in the target source vocabulary comprises:
extracting the vocabulary data from the target source vocabulary;
calling identifier assignment functions to assign identifiers to the terms and the concepts in the vocabulary data, respectively, to obtain the target vocabulary data.
A second aspect of the embodiments of the present application provides a data processing apparatus applied to an integrated terminology system, the integrated terminology system including at least one source vocabulary, the apparatus comprising:
a registering unit, configured to register a source vocabulary to be updated to obtain a target source vocabulary;
a determining unit, configured to determine target vocabulary data in the target source vocabulary, the target vocabulary data including terms to which identifiers have been assigned and concepts to which identifiers have been assigned;
a matching unit, configured to match the target vocabulary data against first vocabulary data of a first source vocabulary to determine a target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated terminology system that corresponds to the target source vocabulary; and
an updating unit, configured to update the vocabulary data in the integrated terminology system according to the target matching result and preset rules.
Optionally, the matching unit is specifically configured to:
perform string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result including a new-term result, an unchanged-term result and/or a deleted-term result, the target term being any term in the target vocabulary data;
perform, according to the term matching result, string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine a concept synonym set matching result, which includes a new-concept result, an unchanged-synonym-set result, a deleted-concept result and/or a changed-synonym-set result, the first concept synonym set being any concept synonym set in the target vocabulary data;
perform, according to the term matching result and the concept synonym set matching result, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a concept preferred term matching result, which includes an unchanged-preferred-term result and/or a changed-preferred-term result, the first concept preferred term being the preferred term of any concept in the target vocabulary data;
wherein the term matching result, the concept synonym set matching result and the concept preferred term matching result all belong to the target matching result.
Optionally, the updating unit is specifically configured to:
when the term matching result is the deleted-term result, obtain the term corresponding to the deleted-term result;
delete the term corresponding to the deleted-term result;
when the concept synonym set matching result is the deleted-concept result, obtain the concept synonym set corresponding to the deleted-concept result;
delete the concept synonym set corresponding to the deleted-concept result.
Optionally, the updating unit is further specifically configured to:
when the term matching result is the new-term result and the concept synonym set matching result is the changed-synonym-set result, obtain at least one term corresponding to the new-term result;
determine whether the integrated terminology system contains a second concept synonym set that matches a first term, the first term being any one of the at least one term;
if so, import the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym set.
Optionally, the updating unit is further specifically configured to:
when the second concept synonym set does not exist in the integrated terminology system and the concept preferred term matching result is the changed-preferred-term result, obtain the N concept synonym sets in the integrated terminology system whose preferred term changed, where N ≥ 2;
calculate the similarity between the first term and the preferred term of each of the N concept synonym sets;
import the first term into one of the N concept synonym sets according to the similarities;
when the concept preferred term matching result is the unchanged-preferred-term result, import the first term into the first concept synonym set, the first concept synonym set being, among the concept synonym sets corresponding to the unchanged-preferred-term result, the one that matches the first term.
Optionally, the updating unit is specifically configured to:
when the term matching result is the new-term result and the concept synonym set matching result is the new-concept result, calculate M similarities between a first preferred term and M preferred terms in the integrated terminology system, where the first preferred term is the preferred term of the concept synonym set containing the first term, the concept synonym set of the first preferred term and the concept synonym sets of the M preferred terms belong to the same domain, and M ≥ 2;
import, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term.
Optionally, the updating unit importing, according to the M similarities, the first term and those of the at least one term that are synonyms of the first term comprises:
determining whether the maximum of the M similarities is greater than a preset threshold;
if so, importing the first term and those of the at least one term that are synonyms of the first term into the concept synonym set corresponding to the maximum similarity;
if not, establishing the first term and the terms that are synonyms of the first term as a new concept synonym set.
Optionally, the determining unit is specifically configured to:
extract the vocabulary data from the target source vocabulary;
call identifier assignment functions to assign identifiers to the terms and the concepts in the vocabulary data, respectively, to obtain the target vocabulary data.
A third aspect of the embodiments of the present application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform the steps of the data processing method of the above aspects.
A fourth aspect of the embodiments of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to perform the steps of the data processing method of the above aspects.
As can be seen from the above, the terms and concepts in the target source vocabulary can be matched against those in the first source vocabulary to obtain a target matching result, and the vocabulary data in the integrated terminology system can be updated according to that result. While following the existing rules of the integrated terminology system, this automatically updates the concepts that changed in a new version of a source vocabulary, improves the efficiency of version updates to the integrated terminology system, greatly reduces time cost, and shortens the lag between concept upgrades in the integrated terminology system and version upgrades of the source vocabularies.
Brief description of the drawings
Fig. 1 is a schematic diagram of an embodiment of the data processing method provided by the embodiments of the present application;
Fig. 2 is a schematic diagram of the virtual structure of the data processing apparatus provided by the embodiments of the present application;
Fig. 3 is a schematic diagram of the hardware structure of the server provided by the embodiments of the present application.
Detailed description of the embodiments
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such process, method, product or device.
The data processing method provided by the embodiments of the present application is described below from the perspective of a data processing apparatus, which may be a server or a service unit within a server; this is not specifically limited.
First, the terms used in the embodiments of the present application are explained:
Terminology system: any of the various concept systems or terminologies used to organize information and facilitate information management.
Integrated terminology system: a terminology network that takes the term as the basic unit, the concept as the core and the original relations of the source vocabularies as the foundation, and that realizes semantic association among different source vocabularies through concepts.
Concept: a class formed by a group of ideas or objects; the basic element of a terminology system. A concept is generally described by one term or by a group of synonyms.
Term: a word or phrase used to designate a concept.
Synonyms: two or more terms that, in a general context, have the same meaning but different forms.
Preferred term: in a general context, the most frequently used of the synonyms denoting the same concept; it serves as the formal term expressing the concept.
Non-preferred term: any other synonym denoting the same concept; it serves as a non-preferred term of the concept.
Source vocabulary: any source terminology system that can be used to build the integrated terminology system, including ontologies, thesauri, classification schemes, synonym lists, dictionaries, codes, keywords, user search terms and the like.
Source term: a term in a source vocabulary; it is tied to that source vocabulary.
Source concept: a concept in a given source vocabulary; it is tied to that source vocabulary.
Source term identifier (Identification, ID): the unique identifier that a source vocabulary assigns to a term within that vocabulary; it is permanent and unique.
Source concept ID: the unique identifier that a source vocabulary assigns to a concept within that vocabulary; it is permanent and unique.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an embodiment of the data processing method provided by the embodiments of the present application. The method is applied to an integrated terminology system that includes at least one source vocabulary, and comprises:
101. Register the source vocabulary to be updated to obtain a target source vocabulary.
In this embodiment, the data processing apparatus may treat multiple versions of one source vocabulary as the same source vocabulary series, store each version of a source vocabulary as an independent terminology system in the source vocabulary library of the integrated terminology system, and assign each version of each source vocabulary a unique identifier, SourceIDCode, through the GetSourceCode function. The unique identifier consists of two parts: a "source vocabulary series code" and a "version serial number". Different versions of the same source series share the same series code, while their version serial numbers are all distinct and increase in order of registration time. That is, the data processing apparatus may first obtain the source vocabulary to be updated and then register it through the GetSourceCode function, obtaining a target source vocabulary that has been assigned a source vocabulary series code and a version serial number.
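The registration step can be illustrated with a small Python sketch. `SourceRegistry` and `get_source_code` are hypothetical stand-ins for the source vocabulary library and the GetSourceCode function; the concrete code format (an `S` plus a three-digit series code, followed by a two-digit version serial number) is an assumption, since the description specifies only the two-part structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceRegistry:
    """Hypothetical in-memory stand-in for the source vocabulary library."""
    series_codes: dict[str, str] = field(default_factory=dict)  # series name -> series code
    last_serial: dict[str, int] = field(default_factory=dict)   # series code -> latest version serial

    def get_source_code(self, series_name: str) -> str:
        """Register one version of a series and return its SourceIDCode.

        Versions of the same series share the series code; the version serial
        number grows with registration order.
        """
        code = self.series_codes.get(series_name)
        if code is None:
            # First version of a new series: allocate the next series code.
            code = f"S{len(self.series_codes) + 1:03d}"
            self.series_codes[series_name] = code
        serial = self.last_serial.get(code, 0) + 1
        self.last_serial[code] = serial
        return f"{code}{serial:02d}"
```

Registering two versions of one series and then a first version of a second series yields `S00101`, `S00102` and `S00201`.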
It should be noted that assigning a unique identifier SourceIDCode to each version of each source vocabulary through the GetSourceCode function is merely an example; the target source vocabulary may also be assigned a unique identifier in other ways, which is not specifically limited.
102. Determine the target vocabulary data in the target source vocabulary.
In this embodiment, the data processing apparatus may extract the vocabulary data from the target source vocabulary and call identifier assignment functions to assign identifiers to the terms and the concepts in the vocabulary data, thereby obtaining the target vocabulary data. Specifically, the data processing apparatus may extract all data of the target source vocabulary through a vocabulary import wizard; when the terms and concepts in the target source vocabulary have not been assigned unique identifiers, it calls the source term unique identifier assignment function GetSourceTermID to assign source term IDs to the terms in the target source vocabulary, and calls the source concept unique identifier assignment function GetSourceConceptID to assign source concept IDs to the concepts in the target source vocabulary.
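Identifier assignment for step 102 might look like the following sketch. `assign_ids` plays the combined role of GetSourceTermID and GetSourceConceptID; the `Term` and `Concept` classes and the `<SourceIDCode>-T/C<six digits>` ID format are hypothetical. The sketch assumes the imported vocabulary arrives without IDs and fills in only missing ones, so rerunning it does not overwrite identifiers that are already set.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass
class Term:
    string: str
    term_id: str | None = None


@dataclass
class Concept:
    terms: list[Term]
    preferred: str
    concept_id: str | None = None


def assign_ids(source_code: str, concepts: list[Concept]) -> None:
    """Assign source concept IDs and source term IDs where they are missing."""
    term_seq = count(1)
    concept_seq = count(1)
    for concept in concepts:
        if concept.concept_id is None:
            concept.concept_id = f"{source_code}-C{next(concept_seq):06d}"
        for term in concept.terms:
            if term.term_id is None:
                term.term_id = f"{source_code}-T{next(term_seq):06d}"
```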
It should be noted that using GetSourceTermID and GetSourceConceptID to assign source term IDs and source concept IDs to the terms and concepts in the target source vocabulary is merely an example; other assignment methods are also possible and are not specifically limited.
103. Match the target vocabulary data against the first vocabulary data of the first source vocabulary to determine the target matching result of the target source vocabulary relative to the first source vocabulary.
In this embodiment, the data processing apparatus may match the target vocabulary data against the first vocabulary data of the first source vocabulary to determine the target matching result of the target source vocabulary relative to the first source vocabulary, where the first source vocabulary is the source vocabulary in the integrated terminology system that corresponds to the target source vocabulary. It can be understood that the first source vocabulary and the target source vocabulary are different versions of the same source vocabulary series, and the target source vocabulary is the newer version.
It should be noted that the target matching result includes the term matching result, the concept synonym set matching result and the concept preferred term matching result. How string matching is performed to obtain the target matching result is described below:
Step A: perform string matching between the target term and the terms in the first vocabulary data to obtain a term matching result, which includes a new-term result, an unchanged-term result and/or a deleted-term result, the target term being any term in the target vocabulary data.
In this step, the data processing apparatus may perform exact string matching on the terms of the target source vocabulary and the first source vocabulary to determine the term matching result. Specifically, through the CompareSourceTermString function, the data processing apparatus performs exact string-based matching between the target term and the terms in the first source vocabulary, yielding three possible matching results: the new-term result, the unchanged-term result and the deleted-term result, explained below:
1. New term (new-term result): a term that does not exist in the first source vocabulary but exists in the target source vocabulary;
2. Unchanged term (unchanged-term result): a term that exists in both the first source vocabulary and the target source vocabulary;
3. Deleted term (deleted-term result): a term that exists in the first source vocabulary but not in the target source vocabulary.
After the term matching result of the target source vocabulary relative to the first source vocabulary is obtained, it is output to the CompareSourceTermResult table. This table has the following metadata fields: SourceIDCode (source vocabulary ID), SourceTermID (source vocabulary term ID), TermString (term string) and TermEdit (term change operation). SourceIDCode holds the unique identifier of the target source vocabulary, SourceTermID holds the source term ID in the target source vocabulary, TermString holds the string of the target term, and TermEdit holds one of the three possible term matching results: New, Unchanged or Deleted.
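Step A and its CompareSourceTermResult output can be sketched as a set comparison on exact term strings. `compare_source_terms` and `TermResultRow` are hypothetical names (stand-ins for CompareSourceTermString and the table rows); because a deleted term has no ID in the target source vocabulary, the sketch records its ID from the first source vocabulary, which is an assumption.

```python
from __future__ import annotations

from typing import NamedTuple


class TermResultRow(NamedTuple):
    """One row of the CompareSourceTermResult table (columns from the description)."""
    SourceIDCode: str
    SourceTermID: str
    TermString: str
    TermEdit: str  # "New", "Unchanged" or "Deleted"


def compare_source_terms(
    source_code: str, old_terms: dict[str, str], new_terms: dict[str, str]
) -> list[TermResultRow]:
    """Exact string matching of two versions of one source vocabulary.

    old_terms / new_terms map term string -> source term ID for the first
    (old) and target (new) versions respectively.
    """
    rows = []
    for string, term_id in new_terms.items():
        edit = "Unchanged" if string in old_terms else "New"
        rows.append(TermResultRow(source_code, term_id, string, edit))
    for string, term_id in old_terms.items():
        if string not in new_terms:
            # Deleted terms keep their ID from the first source vocabulary.
            rows.append(TermResultRow(source_code, term_id, string, "Deleted"))
    return rows
```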
Step B: According to the term matching result, perform string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine the matching result of the concept synonym set. This matching result includes a new-concept result, an unchanged-synonym-set result, a deleted-concept result and/or a changed-synonym-set result, and the first concept synonym set is any concept synonym set in the target vocabulary data.
In this step, the data processing apparatus, using the term matching result, may call the CompareSourceConceptTermString function to exactly match, by character string, the concept synonym sets in the target source vocabulary against the concept synonym sets in the first source vocabulary. (Here, matching means comparing the terms in any concept synonym set of the target source vocabulary with the terms in the concept synonym sets of the first source vocabulary, to find how each term has changed relative to the first source vocabulary. Because a concept synonym set contains at least two terms that are synonyms of one another, the change status of the set can be determined from the change status of its terms.) This yields the matching result of the concept synonym set, which is a new-concept result, an unchanged-synonym-set result, a deleted-concept result and/or a changed-synonym-set result, described below:
1. New concept (the new-concept result): a concept that does not appear in the first source vocabulary but appears in the target source vocabulary; that is, every term in the synonym set of this concept in the target source vocabulary is a new term;
2. Unchanged Synonym (the unchanged-synonym-set result): a concept that exists in both the first source vocabulary and the target source vocabulary; that is, every term in the synonym set of this concept in the target source vocabulary is an unchanged term;
3. Deleted concept (the deleted-concept result): a concept that exists in the first source vocabulary but not in the target source vocabulary; that is, every term in the synonym set of this concept in the first source vocabulary is a deleted term;
4. Changed Synonym (the changed-synonym-set result): relative to the concept in the first source vocabulary, the synonym set of the concept in the target source vocabulary contains unchanged terms as well as new terms or deleted terms.
The matching result of the concept synonym set is written to the CompareSourceConceptResult table, which has the following metadata fields: SourceIDCode (source vocabulary ID), SourceConceptID (source concept ID), SourceTermID (source vocabulary term ID), TermStrings (term string), TermEdit (term change operation) and ConceptSynonymEdit (concept change operation). SourceIDCode is the unique identifier of the target source vocabulary; SourceConceptID is the source concept ID assigned in step 102 to the concept in the target source vocabulary; SourceTermID is the source term ID assigned in step 102 to the term in the target source vocabulary; TermStrings is the string of the target term; TermEdit is the term matching result recorded for that SourceTermID in the CompareSourceTermResult table in Step A; and ConceptSynonymEdit is the matching result of the concept synonym set from Step B: New, Unchanged, Changed or Deleted.
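The four concept outcomes can be derived from the term-level results. The Python sketch below is an assumption-laden illustration, not the disclosed CompareSourceConceptTermString function: each vocabulary is a dict mapping SourceConceptID to the set of term strings in that concept's synonym set, and a target concept is related to first-vocabulary concepts only through shared term strings.

```python
from typing import Dict, List, Set, Tuple


def compare_source_concepts(target_concepts: Dict[str, Set[str]],
                            first_concepts: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Classify concept synonym sets as New, Unchanged, Changed or Deleted.

    Returns (vocabulary, SourceConceptID, ConceptSynonymEdit) tuples, where
    vocabulary is "target" or "first".
    """
    target_all = set().union(*target_concepts.values())
    first_all = set().union(*first_concepts.values())
    results = []
    for cid, terms in target_concepts.items():
        if not terms & first_all:
            edit = "New"          # every term of the set is a new term
        elif any(terms == old for old in first_concepts.values()):
            edit = "Unchanged"    # an identical synonym set exists in the first vocabulary
        else:
            edit = "Changed"      # unchanged terms mixed with new or deleted ones
        results.append(("target", cid, edit))
    for cid, terms in first_concepts.items():
        if not terms & target_all:
            results.append(("first", cid, "Deleted"))  # every term was deleted
    return results
```

Matching through shared term strings is one plausible reading of the text; a production system might instead align concepts across versions by source concept ID.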
Step C: According to the term matching result and the matching result of the concept synonym set, perform string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine the matching result of the concept preferred term. This matching result includes an unchanged-preferred-term result and/or a changed-preferred-term result, and the first concept preferred term is the preferred term of any concept in the target vocabulary data.
In this step, using the term matching result and the matching result of the concept synonym set, the CompareSourceConceptPreferredTerm function exactly matches, by character string, the concept preferred terms in the target source vocabulary against the concept preferred terms in the first source vocabulary. This yields the matching result of each concept preferred term in the target source vocabulary: unchanged or changed. In other words, each concept preferred term in the target source vocabulary is compared with the corresponding concept preferred term in the first source vocabulary. The two results are described below:
1. Unchanged PreferredTerm (the unchanged-preferred-term result): the preferred term of a synonym set in one of the three categories above has not changed. The three categories are new concept, unchanged synonym set and changed synonym set;
2. Changed PreferredTerm (the changed-preferred-term result): the preferred term of a synonym set in one of the three categories above has changed. The three categories are new concept, unchanged synonym set and changed synonym set.
Both results are written to the CompareSourcePreferredTerm table, which has the following metadata fields: SourceIDCode (source vocabulary ID), SourceConceptID (source concept ID), PreferredTermEdit (preferred term change operation) and PreferredTermID (preferred term ID). SourceIDCode is the unique identifier of the target source vocabulary, and SourceConceptID is the source concept ID from Step B. For concepts whose synonym set is unchanged or changed, PreferredTermEdit takes one of the two matching results obtained in this step, Unchanged or Changed; for new concepts, PreferredTermEdit is uniformly set to New. PreferredTermID is the SourceTermID of the concept's preferred term in the target source vocabulary.
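The PreferredTermEdit rule reduces to a small decision per concept. In this hypothetical Python sketch, concepts in the two versions are assumed to be aligned by SourceConceptID, and each preferred term is given as a (SourceTermID, term string) pair; the function name and data shapes are illustrative, not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

PreferredTerm = Tuple[str, str]  # (SourceTermID, term string)


def compare_preferred_terms(source_id_code: str,
                            concept_edits: Dict[str, str],
                            target_preferred: Dict[str, PreferredTerm],
                            first_preferred: Dict[str, PreferredTerm]) -> List[dict]:
    """Build CompareSourcePreferredTerm-style rows for concepts of the target vocabulary.

    concept_edits maps SourceConceptID -> ConceptSynonymEdit from Step B.
    """
    rows = []
    for cid, concept_edit in concept_edits.items():
        if concept_edit == "Deleted":
            continue  # deleted concepts have no preferred term in the target vocabulary
        term_id, term = target_preferred[cid]
        if concept_edit == "New":
            edit = "New"  # new concepts are uniformly marked New
        else:
            old: Optional[PreferredTerm] = first_preferred.get(cid)
            # Exact string comparison of the preferred terms of the two versions.
            edit = "Unchanged" if old is not None and old[1] == term else "Changed"
        rows.append({"SourceIDCode": source_id_code, "SourceConceptID": cid,
                     "PreferredTermEdit": edit, "PreferredTermID": term_id})
    return rows
```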
104. Update the vocabulary data in the integrated terminology system according to the target matching result and preset rules.
In this embodiment, the vocabulary data in the integrated terminology system can be updated according to the target matching result and preset rules.
In one embodiment, updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules includes:
When the term matching result is the deleted-term result, obtaining the term corresponding to the deleted-term result;
Deleting the term corresponding to the deleted-term result;
When the matching result of the concept synonym set is the deleted-concept result, obtaining the concept synonym set corresponding to the deleted-concept result;
Deleting the concept synonym set corresponding to the deleted-concept result.
That is, the terms whose TermEdit value is Deleted in the CompareSourceTermResult table are deleted together with their associated attributes and relationships (such as the term's ID, the ID of the source vocabulary the term belongs to, and the term's relationship to its concept), and the terms whose ConceptSynonymEdit value is Deleted in the CompareSourceConceptResult table are likewise deleted together with their associated attributes and relationships.
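A minimal sketch of the deletion rule, assuming the integrated terminology system is held in memory as two dicts (term ID to term string, and concept ID to the set of term IDs in its synonym set). A real system would also remove attribute and relationship records; here, removing a term ID from every synonym set stands in for deleting its term-concept relationships.

```python
from typing import Dict, Set


def apply_deletions(terms: Dict[str, str],
                    concepts: Dict[str, Set[str]],
                    term_edits: Dict[str, str],
                    concept_edits: Dict[str, str]) -> None:
    """Delete terms marked Deleted and concept synonym sets marked Deleted (in place)."""
    deleted_terms = {tid for tid, edit in term_edits.items() if edit == "Deleted"}
    deleted_concepts = {cid for cid, edit in concept_edits.items() if edit == "Deleted"}
    # Remove deleted concepts together with the terms of their synonym sets.
    for cid in deleted_concepts:
        deleted_terms |= concepts.pop(cid, set())
    # Remove deleted terms and their term-concept relationships.
    for tid in deleted_terms:
        terms.pop(tid, None)
    for synonyms in concepts.values():
        synonyms.difference_update(deleted_terms)
```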
In one embodiment, updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules includes:
When the term matching result is the new-term result and the matching result of the concept synonym set is the changed-synonym-set result, obtaining the at least one term corresponding to the new-term result;
Determining whether the integrated terminology system contains a second concept synonym set that matches a first term, the first term being any one of the at least one term;
If so, importing the first term, together with the terms among the at least one term that are synonyms of the first term, into the second concept synonym set.
That is, when the term matching result is the new-term result (the TermEdit value of the first term in the CompareSourceTermResult table is New) and the matching result of the concept synonym set is the changed-synonym-set result (the ConceptSynonymEdit value of the source concept containing the first term is Changed in the CompareSourceConceptResult table), it is determined whether the integrated terminology system contains a second concept synonym set that matches the first term. If it does, the first term and the terms among the at least one term that are synonyms of the first term are imported into the second concept synonym set.
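The text does not define when a concept synonym set in the integrated terminology system "matches" the first term. The sketch below assumes the simplest reading, exact membership of the first term's string in that set, and keeps synonym sets as sets of term strings; the function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def import_into_matching_concept(first_term: str,
                                 synonyms: Iterable[str],
                                 integrated_concepts: Dict[str, Set[str]]) -> Optional[str]:
    """Import a new term's synonyms into the matching (second) concept synonym set.

    Returns the matched concept ID, or None when no set matches, in which case
    the caller would fall back to the similarity-based rules.
    """
    for cid, synonym_set in integrated_concepts.items():
        if first_term in synonym_set:  # assumed definition of "matches"
            # The first term is already present; add its synonyms alongside it.
            synonym_set.update(synonyms)
            return cid
    return None
```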
In one embodiment, when the integrated terminology system contains no second concept synonym set and the matching result of the concept preferred term is the changed-preferred-term result, obtaining the N concept synonym sets in the integrated terminology system whose concept preferred terms have changed, where N ≥ 2;
Calculating the similarity between the first term and the concept preferred term of each of the N concept synonym sets;
Importing the first term into one of the N concept synonym sets according to the similarities;
When the matching result of the concept preferred term is the unchanged-preferred-term result, importing the first term into the first concept synonym set, which is the concept synonym set, among those corresponding to the unchanged-preferred-term result, that matches the first term.
That is, when the integrated terminology system contains no second concept synonym set, it is determined whether the matching result of the concept preferred term is the changed-preferred-term result. If the preferred term of the concept synonym set containing the first term has changed (its PreferredTermEdit value in the CompareSourcePreferredTerm table is Changed), the N concept synonym sets in the integrated terminology system whose concept preferred terms have changed are obtained, and the similarity between the first term and the concept preferred term of each of the N sets is calculated with the Dice coefficient algorithm (this is only an example and not a limitation; similarity may also be determined in other ways). The first term is then imported into one of the N sets according to the similarity, for example into the concept synonym set whose preferred term has the highest similarity. If the preferred term of the concept synonym set containing the first term has not changed (its PreferredTermEdit value in the CompareSourcePreferredTerm table is Unchanged), the first term may be imported into the first concept synonym set, that is, the concept synonym set of the concept preferred term that is synonymous with the first term; this set is one of the concept synonym sets corresponding to the unchanged-preferred-term result.
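The Dice coefficient named above can be computed in several ways. This Python sketch assumes character bigrams, compared as multisets and case-insensitively, which is a common choice but is not stated in the text. It then imports the first term into the candidate concept synonym set whose changed preferred term scores highest. Function names and data shapes are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def _bigrams(s: str) -> Counter:
    """Multiset of character bigrams of a lowercased string."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient 2|X∩Y| / (|X| + |Y|) over character bigram multisets."""
    x, y = _bigrams(a), _bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:  # both strings are shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * sum((x & y).values()) / total


def import_by_similarity(first_term: str,
                         preferred_terms: Dict[str, str],
                         concepts: Dict[str, Set[str]]) -> str:
    """Import first_term into the set whose preferred term is most similar.

    preferred_terms maps each of the N candidate concept IDs to its preferred term.
    """
    best = max(preferred_terms,
               key=lambda cid: dice_coefficient(first_term, preferred_terms[cid]))
    concepts[best].add(first_term)
    return best
```

For instance, "night" and "nacht" share only the bigram "ht" out of four bigrams each, giving 2 × 1 / 8 = 0.25.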
In one embodiment, updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules includes:
When the term matching result is the new-term result and the matching result of the concept synonym set is the new-concept result, calculating the M similarities between a first preferred term and M preferred terms in the integrated terminology system, where the first preferred term is the preferred term in the concept synonym set containing the first term, the concept synonym set of the first preferred term and those of the M preferred terms belong to the same domain, and M ≥ 2;
Importing, according to the M similarities, the first term and the terms among the at least one term that are synonyms of the first term.
Here, the M similarities between the first preferred term and the M preferred terms in the integrated terminology system may be calculated with the Dice algorithm, and the first term and its synonyms among the at least one term are imported according to these similarities.
In one embodiment, importing, according to the M similarities, the first term and the terms among the at least one term that are synonyms of the first term includes:
Determining whether the largest of the M similarities is greater than a preset threshold;
If so, importing the first term and the terms among the at least one term that are synonyms of the first term into the concept synonym set corresponding to the largest similarity;
If not, establishing the first term and the terms that are synonyms of the first term as a new concept synonym set.
That is, a threshold may be preset and the largest of the M similarities compared against it. If the largest similarity is greater than the threshold, the first term and the terms among the at least one term that are synonyms of it are imported into the concept synonym set corresponding to the largest similarity; if the largest similarity is not greater than the threshold, the first term and its synonyms form a new concept synonym set.
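The threshold rule for new concepts can be sketched as follows. The threshold value, the concept ID given to a newly created set, and the bigram-based Dice coefficient (repeated here so the block stands alone) are all assumptions of this illustration; the text leaves them open.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient over lowercased character bigram multisets (assumed variant)."""
    a, b = a.lower(), b.lower()
    x = Counter(a[i:i + 2] for i in range(len(a) - 1))
    y = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    return 2 * sum((x & y).values()) / total


def place_new_concept(first_preferred: str,
                      new_terms: Iterable[str],
                      domain_preferred: Dict[str, str],
                      concepts: Dict[str, Set[str]],
                      threshold: float,
                      new_concept_id: str) -> str:
    """Place a new concept's terms by comparing its preferred term with M same-domain preferred terms."""
    scores = {cid: dice_coefficient(first_preferred, p) for cid, p in domain_preferred.items()}
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        concepts[best].update(new_terms)       # merge into the most similar concept synonym set
        return best
    concepts[new_concept_id] = set(new_terms)  # otherwise create a new concept synonym set
    return new_concept_id
```

Note that a similarity exactly equal to the threshold falls into the "new concept" branch, matching the "greater than" test in the steps above.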
Note that in each of the update modes above, the preferred term of each concept involved in the target source vocabulary (denoted concept X) is updated in the integrated terminology system according to the following strategy:
If concept X is a new concept, a preferred term is recommended anew using the integrated terminology system's existing concept preferred-term recommendation algorithm;
If concept X is not a new concept and its preferred term is not changed by this update, the original preferred term is kept;
If concept X is not a new concept and its preferred term is deleted by this update, a preferred term is recommended anew using the integrated terminology system's existing concept preferred-term recommendation algorithm.
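The three cases above reduce to one decision per concept. In this sketch, recommend is a hypothetical callable standing in for the integrated terminology system's existing preferred-term recommendation algorithm, which is not described here. A preferred term that changed without being deleted is not covered by the text; the sketch assumes the current term is kept in that case.

```python
from typing import Callable, Optional, Set


def update_preferred_term(is_new_concept: bool,
                          current_preferred: Optional[str],
                          concept_terms: Set[str],
                          recommend: Callable[[Set[str]], str]) -> str:
    """Apply the preferred-term update strategy for one concept X."""
    if is_new_concept:
        return recommend(concept_terms)  # new concept: recommend afresh
    if current_preferred is None or current_preferred not in concept_terms:
        return recommend(concept_terms)  # preferred term deleted by this update
    return current_preferred             # otherwise keep the original preferred term
```

Passing the recommendation algorithm in as a parameter keeps the update strategy independent of how preferred terms are actually ranked.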
Note also that after the terms and concepts from the target source vocabulary have been updated, the concept attributes and relationships in the target source vocabulary can be inherited into the integrated terminology system by source term ID and source concept ID, following the system's inheritance principles for the source attributes and relationships of concepts. How the inheritance is carried out is not specifically limited here.
As can be seen from the above, the terms and concepts in the target source vocabulary can be matched against those in the first source vocabulary to obtain the target matching result, and the vocabulary data in the integrated terminology system can be updated according to that result. While following the integrated terminology system's existing rules, this automatically updates, within the system, the concepts that change in a new version of a source vocabulary. It improves the efficiency of version updates to the integrated terminology system, greatly reduces time cost, and shortens the lag between concept upgrades in the integrated terminology system and the corresponding source vocabulary version upgrades.
The data processing method provided by the embodiments of the present application has been described above; the data processing apparatus provided by the embodiments of the present application is described below.
Referring to Fig. 2, which is a schematic structural diagram of the data processing apparatus provided by an embodiment of the present application. The apparatus is applied to an integrated terminology system that includes at least one source vocabulary, and comprises:
A registering unit 201, configured to register a source vocabulary to be updated to obtain a target source vocabulary;
A determination unit 202, configured to determine the target vocabulary data in the target source vocabulary, the target vocabulary data including terms with assigned identifiers and concepts with assigned identifiers;
A matching unit 203, configured to match the target vocabulary data against the first vocabulary data of a first source vocabulary to determine the target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated terminology system that corresponds to the target source vocabulary;
An updating unit 204, configured to update the vocabulary data in the integrated terminology system according to the target matching result and preset rules.
Optionally, the matching unit 203 is specifically configured to:
Perform string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result including a new-term result, an unchanged-term result and/or a deleted-term result, and the target term being any term in the target vocabulary data;
Perform, according to the term matching result, string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine the matching result of the concept synonym set, this matching result including a new-concept result, an unchanged-synonym-set result, a deleted-concept result and/or a changed-synonym-set result, and the first concept synonym set being any concept synonym set in the target vocabulary data;
Perform, according to the term matching result and the matching result of the concept synonym set, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine the matching result of the concept preferred term, this matching result including an unchanged-preferred-term result and/or a changed-preferred-term result, and the first concept preferred term being the preferred term of any concept in the target vocabulary data;
Wherein the term matching result, the matching result of the concept synonym set and the matching result of the concept preferred term all belong to the target matching result.
Optionally, the updating unit 204 is specifically configured to:
When the term matching result is the deleted-term result, obtain the term corresponding to the deleted-term result;
Delete the term corresponding to the deleted-term result;
When the matching result of the concept synonym set is the deleted-concept result, obtain the concept synonym set corresponding to the deleted-concept result;
Delete the concept synonym set corresponding to the deleted-concept result.
Optionally, the updating unit 204 is further specifically configured to:
When the term matching result is the new-term result and the matching result of the concept synonym set is the changed-synonym-set result, obtain the at least one term corresponding to the new-term result;
Determine whether the integrated terminology system contains a second concept synonym set that matches a first term, the first term being any one of the at least one term;
If so, import the first term, together with the terms among the at least one term that are synonyms of the first term, into the second concept synonym set.
Optionally, the updating unit 204 is further specifically configured to:
When the integrated terminology system contains no second concept synonym set and the matching result of the concept preferred term is the changed-preferred-term result, obtain the N concept synonym sets in the integrated terminology system whose concept preferred terms have changed, where N ≥ 2;
Calculate the similarity between the first term and the concept preferred term of each of the N concept synonym sets;
Import the first term into one of the N concept synonym sets according to the similarities;
When the matching result of the concept preferred term is the unchanged-preferred-term result, import the first term into the first concept synonym set, which is the concept synonym set, among those corresponding to the unchanged-preferred-term result, that matches the first term.
Optionally, the updating unit 204 is specifically configured to:
When the term matching result is the new-term result and the matching result of the concept synonym set is the new-concept result, calculate the M similarities between a first preferred term and M preferred terms in the integrated terminology system, where the first preferred term is the preferred term in the concept synonym set containing the first term, the concept synonym set of the first preferred term and those of the M preferred terms belong to the same domain, and M ≥ 2;
Import, according to the M similarities, the first term and the terms among the at least one term that are synonyms of the first term.
Optionally, the updating unit 204 importing, according to the M similarities, the first term and the terms among the at least one term that are synonyms of the first term includes:
Determining whether the largest of the M similarities is greater than a preset threshold;
If so, importing the first term and the terms among the at least one term that are synonyms of the first term into the concept synonym set corresponding to the largest similarity;
If not, establishing the first term and the terms that are synonyms of the first term as a new concept synonym set.
Optionally, the determination unit 202 is specifically configured to:
Extract the vocabulary data in the target source vocabulary;
Call an identifier assignment function to assign identifiers to the terms and to the concepts in the vocabulary data, respectively, to obtain the target vocabulary data.
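The identifier assignment function itself is not specified. A hypothetical sketch, assuming sequential identifiers prefixed with the source vocabulary code (the "T"/"C" markers and six-digit zero padding are illustrative choices only), could look like this; the extracted vocabulary data is taken as a mapping from each source concept label to its list of term strings.

```python
from itertools import count
from typing import Dict, List, Tuple


def assign_identifiers(source_id_code: str,
                       vocabulary: Dict[str, List[str]]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Assign source term IDs and source concept IDs.

    Returns (term_ids, concept_terms): term_ids maps SourceTermID -> term string,
    and concept_terms maps SourceConceptID -> list of SourceTermIDs.
    """
    term_counter, concept_counter = count(1), count(1)
    term_ids: Dict[str, str] = {}
    concept_terms: Dict[str, List[str]] = {}
    for _concept_label, terms in vocabulary.items():
        concept_id = f"{source_id_code}-C{next(concept_counter):06d}"
        concept_terms[concept_id] = []
        for term in terms:
            term_id = f"{source_id_code}-T{next(term_counter):06d}"
            term_ids[term_id] = term
            concept_terms[concept_id].append(term_id)
    return term_ids, concept_terms
```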
The interaction between the units of the data processing apparatus in this embodiment is as described in the embodiment shown in Fig. 1 and is not repeated here.
As can be seen from the above, the apparatus achieves the same effects as the method: by matching the terms and concepts of the target source vocabulary against those of the first source vocabulary and updating the integrated terminology system from the resulting target matching result, it automatically updates changed concepts while following the system's existing rules, improves version-update efficiency, greatly reduces time cost, and shortens the lag between concept upgrades in the integrated terminology system and source vocabulary version upgrades.
Referring to Fig. 3, which is a schematic structural diagram of a server provided by an embodiment of the present application. The server 300 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 322 (for example, one or more processors), memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. A program stored in the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may further include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™ and FreeBSD™.
The steps performed by the data processing apparatus in the above embodiments may be based on the server structure shown in Fig. 3.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
An embodiment of the present application further provides a storage medium storing a program that, when executed by a processor, implements the data processing method.
An embodiment of the present application further provides a processor configured to run a program, where the program, when running, executes the data processing method.
An embodiment of the present application further provides a device comprising a processor, a memory, and a program stored in the memory and executable on the processor; when executing the program, the processor performs the following steps:
Registering a source vocabulary to be updated to obtain a target source vocabulary;
Determining the target vocabulary data in the target source vocabulary, the target vocabulary data including terms with assigned identifiers and concepts with assigned identifiers;
Matching the target vocabulary data against the first vocabulary data of a first source vocabulary to determine the target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated terminology system that corresponds to the target source vocabulary;
Updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules.
In a specific implementation, the processor can implement any implementation of the embodiment corresponding to Fig. 1 when executing the program.
The device here may be a server, a PC, a tablet (PAD), a mobile phone, and so on.
The present application further provides a computer program product that, when executed on a data processing device, performs the following steps:
Registering a source vocabulary to be updated to obtain a target source vocabulary;
Determining the target vocabulary data in the target source vocabulary, the target vocabulary data including terms with assigned identifiers and concepts with assigned identifiers;
Matching the target vocabulary data against the first vocabulary data of a first source vocabulary to determine the target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated terminology system that corresponds to the target source vocabulary;
Updating the vocabulary data in the integrated terminology system according to the target matching result and preset rules.
In a specific implementation, executing the computer program product can implement any implementation of the embodiment corresponding to Fig. 1.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable medium includes permanent and non-permanent, removable and non-removable media can be by any method Or technology come realize information store.Information can be computer readable instructions, data structure, the module of program or other data. The example of the storage medium of computer includes, but are not limited to phase change memory (PRAM), static random access memory (SRAM), moves State random access memory (DRAM), other kinds of random access memory (RAM), read-only memory (ROM), electric erasable Programmable read only memory (EEPROM), flash memory or other memory techniques, read-only disc read only memory (CD-ROM) (CD-ROM), Digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape disk storage or other magnetic storage devices or Any other non-transmission medium, can be used for storage can be accessed by a computing device information.As defined in this article, computer Readable medium does not include temporary computer readable media (transitory media), such as the data-signal and carrier wave of modulation.
It should also be noted that the terms "include", "comprise", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The foregoing are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (18)

1. A data processing method, applied to an integrated term system, the integrated term system comprising at least one source vocabulary, characterized in that the method comprises:
Registering a source vocabulary to be updated to obtain a target source vocabulary;
Determining target vocabulary data in the target source vocabulary, the target vocabulary data comprising terms with assigned identifiers and concepts with assigned identifiers;
Matching the target vocabulary data against first vocabulary data of a first source vocabulary to determine a target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
Updating the vocabulary data in the integrated term system according to the target matching result and preset rules.
2. The method according to claim 1, wherein matching the target vocabulary data against the first vocabulary data of the first source vocabulary to determine the target matching result of the target source vocabulary relative to the first source vocabulary comprises:
Performing string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result comprising a newly added term result, an unchanged term result, and/or a deleted term result, the target term being any term in the target vocabulary data;
Performing, according to the term matching result, string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine a concept synonym set matching result, the concept synonym set matching result comprising a newly added concept result, a concept synonym set unchanged result, a deleted concept result, and/or a concept synonym set changed result, the first concept synonym set being any concept synonym set in the target vocabulary data;
Performing, according to the term matching result and the concept synonym set matching result, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a concept preferred term matching result, the concept preferred term matching result comprising a concept preferred term unchanged result and/or a concept preferred term changed result, the first concept preferred term being any concept preferred term in the target vocabulary data;
Wherein the term matching result, the concept synonym set matching result, and the concept preferred term matching result belong to the target matching result.
3. The method according to claim 2, wherein updating the vocabulary data in the integrated term system according to the target matching result and the preset rules comprises:
When the term matching result is the deleted term result, obtaining the term corresponding to the deleted term result;
Deleting the term corresponding to the deleted term result;
When the concept synonym set matching result is the deleted concept result, obtaining the concept synonym set corresponding to the deleted concept result;
Deleting the concept synonym set corresponding to the deleted concept result.
4. The method according to claim 2, wherein updating the vocabulary data in the integrated term system according to the target matching result and the preset rules comprises:
When the term matching result is the newly added term result and the concept synonym set matching result is the concept synonym set changed result, obtaining at least one term corresponding to the newly added term result;
Determining whether a second concept synonym set matching a first term exists in the integrated term system, the first term being any one of the at least one term;
If so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym set.
5. The method according to claim 4, wherein when the second concept synonym set does not exist in the integrated term system, the method further comprises:
When the concept preferred term matching result is the concept preferred term changed result, obtaining N concept synonym sets in the integrated term system whose concept preferred terms have changed, where N ≥ 2;
Calculating similarities between the first term and the concept preferred terms in the N concept synonym sets;
Importing the first term into the N concept synonym sets according to the similarities;
When the concept preferred term matching result is the concept preferred term unchanged result, importing the first term into the first concept synonym set, the first concept synonym set being the concept synonym set, among the concept synonym sets corresponding to the concept preferred term unchanged result, that matches the first term.
6. The method according to claim 4, wherein updating the vocabulary data in the integrated term system according to the target matching result and the preset rules comprises:
When the term matching result is the newly added term result and the concept synonym set matching result is the newly added concept result, calculating M similarities between a first preferred term and M preferred terms in the integrated term system, wherein the first preferred term is the preferred term of the concept synonym set containing the first term, the concept synonym set to which the first preferred term belongs and the concept synonym sets to which the M preferred terms belong are in the same domain, and M ≥ 2;
Importing, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term.
7. The method according to claim 6, wherein importing, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term comprises:
Determining whether the maximum of the M similarities is greater than a preset threshold;
If so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the concept synonym set corresponding to the maximum similarity;
If not, designating the first term and the terms that are synonyms of the first term as a new concept synonym set.
8. The method according to any one of claims 1 to 7, wherein determining the target vocabulary data in the target source vocabulary comprises:
Extracting the vocabulary data from the target source vocabulary;
Calling an identifier assignment function to assign identifiers to the terms and to the concepts in the vocabulary data, respectively, to obtain the target vocabulary data.
9. A data processing apparatus, applied to an integrated term system, the integrated term system comprising at least one source vocabulary, characterized in that the apparatus comprises:
A registering unit, configured to register a source vocabulary to be updated to obtain a target source vocabulary;
A determining unit, configured to determine target vocabulary data in the target source vocabulary, the target vocabulary data comprising terms with assigned identifiers and concepts with assigned identifiers;
A matching unit, configured to match the target vocabulary data against first vocabulary data of a first source vocabulary to determine a target matching result of the target source vocabulary relative to the first source vocabulary, the first source vocabulary being the source vocabulary in the integrated term system that corresponds to the target source vocabulary;
An updating unit, configured to update the vocabulary data in the integrated term system according to the target matching result and preset rules.
10. The apparatus according to claim 9, wherein the matching unit is specifically configured to:
Perform string matching between a target term and the terms in the first vocabulary data to obtain a term matching result, the term matching result comprising a newly added term result, an unchanged term result, and/or a deleted term result, the target term being any term in the target vocabulary data;
Perform, according to the term matching result, string matching between a first concept synonym set and the concept synonym sets in the first vocabulary data to determine a concept synonym set matching result, the concept synonym set matching result comprising a newly added concept result, a concept synonym set unchanged result, a deleted concept result, and/or a concept synonym set changed result, the first concept synonym set being any concept synonym set in the target vocabulary data;
Perform, according to the term matching result and the concept synonym set matching result, string matching between a first concept preferred term and the concept preferred terms in the first vocabulary data to determine a concept preferred term matching result, the concept preferred term matching result comprising a concept preferred term unchanged result and/or a concept preferred term changed result, the first concept preferred term being any concept preferred term in the target vocabulary data;
Wherein the term matching result, the concept synonym set matching result, and the concept preferred term matching result belong to the target matching result.
11. The apparatus according to claim 10, wherein the updating unit is specifically configured to:
When the term matching result is the deleted term result, obtain the term corresponding to the deleted term result;
Delete the term corresponding to the deleted term result;
When the concept synonym set matching result is the deleted concept result, obtain the concept synonym set corresponding to the deleted concept result;
Delete the concept synonym set corresponding to the deleted concept result.
12. The apparatus according to claim 10, wherein the updating unit is further specifically configured to:
When the term matching result is the newly added term result and the concept synonym set matching result is the concept synonym set changed result, obtain at least one term corresponding to the newly added term result;
Determine whether a second concept synonym set matching a first term exists in the integrated term system, the first term being any one of the at least one term;
If so, import the first term, together with those of the at least one term that are synonyms of the first term, into the second concept synonym set.
13. The apparatus according to claim 12, wherein the updating unit is further specifically configured to:
When the second concept synonym set does not exist in the integrated term system and the concept preferred term matching result is the concept preferred term changed result, obtain N concept synonym sets in the integrated term system whose concept preferred terms have changed, where N ≥ 2;
Calculate similarities between the first term and the concept preferred terms in the N concept synonym sets;
Import the first term into the N concept synonym sets according to the similarities;
When the concept preferred term matching result is the concept preferred term unchanged result, import the first term into the first concept synonym set, the first concept synonym set being the concept synonym set, among the concept synonym sets corresponding to the concept preferred term unchanged result, that matches the first term.
14. The apparatus according to claim 12, wherein the updating unit is specifically configured to:
When the term matching result is the newly added term result and the concept synonym set matching result is the newly added concept result, calculate M similarities between a first preferred term and M preferred terms in the integrated term system, wherein the first preferred term is the preferred term of the concept synonym set containing the first term, the concept synonym set to which the first preferred term belongs and the concept synonym sets to which the M preferred terms belong are in the same domain, and M ≥ 2;
Import, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term.
15. The apparatus according to claim 14, wherein the updating unit importing, according to the M similarities, the first term together with those of the at least one term that are synonyms of the first term comprises:
Determining whether the maximum of the M similarities is greater than a preset threshold;
If so, importing the first term, together with those of the at least one term that are synonyms of the first term, into the concept synonym set corresponding to the maximum similarity;
If not, designating the first term and the terms that are synonyms of the first term as a new concept synonym set.
16. The apparatus according to any one of claims 9 to 15, wherein the determining unit is specifically configured to:
Extract the vocabulary data from the target source vocabulary;
Call an identifier assignment function to assign identifiers to the terms and to the concepts in the vocabulary data, respectively, to obtain the target vocabulary data.
17. A computer-readable storage medium, characterized by comprising instructions which, when run on a computer, cause the computer to perform the steps of the data processing method according to any one of claims 1 to 8.
18. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the data processing method according to any one of claims 1 to 8.
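The three-level matching of claim 2 (terms, then concept synonym sets, then concept preferred terms) can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the applicant's implementation: it assumes each concept is represented by its preferred term plus the set of all its term strings, and it assumes that a target concept corresponds to the earlier concept with which it shares the most unchanged terms. The claims do not state how concept synonym sets are paired, so that pairing rule and the names `Concept`, `ConceptMatch`, `match_terms`, and `match_concepts` are illustrative only.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


@dataclass(frozen=True)
class Concept:
    preferred: str             # concept preferred term
    synonyms: frozenset[str]   # concept synonym set (includes the preferred term)


class ConceptMatch(NamedTuple):
    target_index: Optional[int]   # index in the target vocabulary; None if deleted
    old_index: Optional[int]      # index in the first source vocabulary; None if added
    synonym_set: str              # "added", "unchanged", "changed" or "deleted"
    preferred: Optional[str]      # "unchanged" or "changed"; None when not paired


def match_terms(target: list[Concept], old: list[Concept]) -> dict[str, set[str]]:
    """Term-level string matching: classify each term string as added, unchanged or deleted."""
    new_terms = set().union(*(c.synonyms for c in target))
    old_terms = set().union(*(c.synonyms for c in old))
    return {
        "added": new_terms - old_terms,
        "unchanged": new_terms & old_terms,
        "deleted": old_terms - new_terms,
    }


def match_concepts(target: list[Concept], old: list[Concept],
                   term_result: dict[str, set[str]]) -> list[ConceptMatch]:
    """Pair concept synonym sets through shared unchanged terms, then compare preferred terms."""
    unchanged = term_result["unchanged"]
    results: list[ConceptMatch] = []
    paired_old: set[int] = set()
    for ti, tc in enumerate(target):
        # Count the unchanged term strings this target concept shares with each earlier concept.
        overlap = {oi: len(tc.synonyms & oc.synonyms & unchanged) for oi, oc in enumerate(old)}
        candidates = [oi for oi, n in overlap.items() if n > 0]
        if not candidates:
            results.append(ConceptMatch(ti, None, "added", None))
            continue
        oi = max(candidates, key=overlap.__getitem__)  # assumption: largest overlap wins
        paired_old.add(oi)
        set_status = "unchanged" if tc.synonyms == old[oi].synonyms else "changed"
        pref_status = "unchanged" if tc.preferred == old[oi].preferred else "changed"
        results.append(ConceptMatch(ti, oi, set_status, pref_status))
    # Earlier concepts that no target concept was paired with are reported as deleted.
    for oi in range(len(old)):
        if oi not in paired_old:
            results.append(ConceptMatch(None, oi, "deleted", None))
    return results
```

Under these assumptions, an update that adds "Febrile" to the concept {"Fever", "Pyrexia"} is reported as a changed concept synonym set whose preferred term is unchanged.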
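The deletion rule of claim 3 and the synonym-import rule of claim 4 might look like the following sketch. It assumes, purely for illustration, that the integrated term system is held as a dictionary from a hypothetical integrated concept identifier to a set of term strings, and that a concept synonym set "matches" a newly added term when it already contains that exact string (for example, because another source vocabulary contributed it). The function names are hypothetical.

```python
from typing import Optional


def apply_deletions(integrated: dict[str, set[str]],
                    deleted_terms: set[str],
                    deleted_concepts: set[str]) -> None:
    """Claim 3 sketch: remove deleted concept synonym sets, then deleted terms, in place."""
    for concept_id in deleted_concepts:
        integrated.pop(concept_id, None)
    for terms in integrated.values():
        terms.difference_update(deleted_terms)


def find_matching_concept(integrated: dict[str, set[str]], term: str) -> Optional[str]:
    """Return the ID of a concept synonym set that already contains `term` (exact string match)."""
    for concept_id, terms in integrated.items():
        if term in terms:
            return concept_id
    return None


def import_synonym_group(integrated: dict[str, set[str]], group: list[str]) -> Optional[str]:
    """Claim 4 sketch: if any new term matches an existing concept, import the whole synonym group.

    Returns the receiving concept ID, or None when no such second concept synonym set
    exists (the situation that claims 5 to 7 address).
    """
    for first_term in group:
        concept_id = find_matching_concept(integrated, first_term)
        if concept_id is not None:
            integrated[concept_id].update(group)
            return concept_id
    return None
```

Importing the whole group, rather than the single matching term, follows claim 4's wording that the first term is imported together with its synonyms among the newly added terms.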
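Claims 6 and 7 place a newly added concept by comparing its preferred term with the preferred terms of concepts in the same domain and applying a preset threshold. The sketch below rests on several assumptions: the claims name no similarity measure, so Python's `difflib.SequenceMatcher.ratio()` is substituted as a stand-in, and the 0.8 threshold, the `domain` field, and all function and identifier names are hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class IntegratedConcept:
    preferred: str                                # concept preferred term
    domain: str                                   # subject domain (hypothetical field)
    terms: set[str] = field(default_factory=set)  # concept synonym set


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1]; the claims do not specify a measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def place_new_concept(integrated: dict[str, IntegratedConcept],
                      first_preferred: str,
                      group: list[str],
                      domain: str,
                      new_concept_id: str,
                      threshold: float = 0.8) -> str:
    """Claims 6-7 sketch: merge into the most similar same-domain concept, or create a new one."""
    # Only the preferred terms of concepts in the same domain are compared.
    scores = {cid: similarity(first_preferred, c.preferred)
              for cid, c in integrated.items() if c.domain == domain}
    if scores:
        best = max(scores, key=scores.__getitem__)
        if scores[best] > threshold:
            integrated[best].terms.update(group)
            return best
    # No sufficiently similar concept: the group becomes a new concept synonym set.
    integrated[new_concept_id] = IntegratedConcept(first_preferred, domain, set(group))
    return new_concept_id
```

A character-level ratio is only a placeholder; surface similarity can merge concepts that differ in meaning, so a production system would likely need a domain-appropriate measure and a tuned threshold.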
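Claim 8 calls an identifier assignment function so that each term and each concept in the extracted vocabulary data receives its own identifier. A minimal sketch follows; the `T`/`C` prefixes, the seven-digit zero-padded counters, the input shape (a source concept key mapped to its term strings), and the rule that a term string shared by several concepts receives a single identifier are all assumptions for illustration, not details taken from the claims.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VocabularyData:
    term_ids: dict[str, str] = field(default_factory=dict)     # term string -> term identifier
    concept_ids: dict[str, str] = field(default_factory=dict)  # source concept key -> concept identifier


def make_id_allocator(prefix: str) -> Callable[[], str]:
    """Hypothetical identifier assignment function: prefix plus a zero-padded running counter."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter):07d}"


def assign_identifiers(raw_concepts: dict[str, list[str]]) -> VocabularyData:
    """Assign identifiers to the terms and concepts of extracted vocabulary data."""
    new_term_id = make_id_allocator("T")
    new_concept_id = make_id_allocator("C")
    data = VocabularyData()
    for concept_key, terms in raw_concepts.items():
        data.concept_ids[concept_key] = new_concept_id()
        for term in terms:
            if term not in data.term_ids:  # assumption: one identifier per distinct term string
                data.term_ids[term] = new_term_id()
    return data
```

Because fresh counters are created on each call, the same input always yields the same identifiers, which keeps repeated registrations of one vocabulary version comparable.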

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910540408.0A CN110263184A 2019-06-20 2019-06-20 Data processing method and related device

Publications (1)

Publication Number Publication Date
CN110263184A 2019-09-20

Family

ID=67920096

Country Status (1)

Country Link
CN (1) CN110263184A

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0241565A (en) * 1988-08-01 1990-02-09 Nippon Telegr & Teleph Corp <Ntt> Thesaurus update supporting device
CN101030157A (en) * 2007-04-20 2007-09-05 北京搜狗科技发展有限公司 Method and system for updating user vocabulary synchronouslly
CN103324704A (en) * 2013-06-17 2013-09-25 深圳先进技术研究院 Method and system for dynamically updating knowledge base
CN103823879A (en) * 2014-02-28 2014-05-28 中国科学院计算技术研究所 Method and system for automatically updating knowledge base oriented to online encyclopedia
CN105930478A (en) * 2016-05-03 2016-09-07 福州市勘测院 Element object spatial information fingerprint-based spatial data change capture method
CN107220326A (en) * 2017-05-23 2017-09-29 至本医疗科技(上海)有限公司 The information updating method and system of a kind of biomedical knowledge base
CN109145171A (en) * 2018-07-23 2019-01-04 广州市城市规划勘测设计研究院 A kind of multiple dimensioned map data updating method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
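Sun Haixia (孙海霞) et al.: "Design and Implementation of a Networked Collaborative Working Platform for Semantic Interoperability of Science and Technology Knowledge Organization Systems" (科技知识组织体系语义互操作网络协同工作平台设计与实现), Agricultural Library and Information (《农业图书情报》) *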

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704639A (en) * 2019-09-30 2020-01-17 中国医学科学院医学信息研究所 Abbreviation document generation method and device
CN112765136A (en) * 2021-04-07 2021-05-07 浙江太美医疗科技股份有限公司 Storage method, upgrading method and device of medical coding dictionary
CN113221543A (en) * 2021-05-07 2021-08-06 中国医学科学院医学信息研究所 Medical term integration method and system
CN113221543B (en) * 2021-05-07 2023-10-10 中国医学科学院医学信息研究所 Medical term integration method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination