CN105335378A - Multi-data source information processing device and method, and server - Google Patents

Multi-data source information processing device and method, and server

Info

Publication number
CN105335378A
Authority
CN
China
Prior art keywords
information
unit
entity
attribute
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410291263.2A
Other languages
Chinese (zh)
Inventor
张姝
孟遥
杨铭
缪庆亮
李贤华
房璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201410291263.2A
Publication of CN105335378A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a multi-data source information processing device and method, and a server. The device comprises a first judgment unit, a link unit, a first expansion unit and a second judgment unit. The first judgment unit performs same-entity judgment on information in at least two data sources; the link unit links the entities in the at least two data sources to preset external resources; the first expansion unit expands the attributes of the entities according to information in the external resources; and the second judgment unit judges whether the information after attribute expansion meets a preset condition. When the information after attribute expansion does not meet the preset condition, it is used to perform the same-entity judgment again; when it meets the preset condition, it is output. By judging same entities and linking to external resources for attribute expansion in an iterative manner, multi-data source information can be integrated, and the accuracy and comprehensiveness of the information are improved.

Description

Multi-data source information processing device, server and method
Technical field
The present invention relates to the field of communications technology, and in particular to a multi-data source information processing device, server and method.
Background
With the development of information technology, the amount of information in every technical field keeps growing. In many situations, information from multiple data sources has to be used, for example when querying data across multiple technology platforms, when government departments compile statistics, or when enterprises carry out information integration and analysis. Existing query and statistical analysis methods generally have to query and analyze the multiple data sources one by one.
It should be noted that the above introduction to the technical background is given only to allow a clear and complete explanation of the technical solution of the present invention and to aid the understanding of those skilled in the art. These solutions must not be regarded as known to those skilled in the art merely because they are set forth in the background section of the present invention.
Summary of the invention
Because the existing query and statistical analysis methods described above have to query and analyze multiple data sources one by one, querying and statistical analysis are inefficient, the amount of usable information is small, and accuracy is poor.
The embodiments of the present invention provide a multi-data source information processing device, server and method. Attributes are expanded by judging same entities and linking to external resources, and the judgment and linking are carried out iteratively, so that information from multiple data sources can be integrated effectively and the accuracy and comprehensiveness of the information are improved.
According to a first aspect of the embodiments of the present invention, a multi-data source information processing device is provided. The device comprises: a first judgment unit for performing same-entity judgment on information in at least two data sources; a link unit for linking the entities in the at least two data sources to a preset external resource; a first expansion unit for expanding the attributes of the entities according to information in the external resource; and a second judgment unit for judging whether the information after attribute expansion meets a preset condition, wherein, when the information after attribute expansion does not meet the preset condition, it is used to perform the same-entity judgment again, and when it meets the preset condition, it is output.
According to a second aspect of the embodiments of the present invention, a server is provided, the server comprising the multi-data source information processing device according to the first aspect of the embodiments of the present invention.
According to a third aspect of the embodiments of the present invention, a multi-data source information processing method is provided. The method comprises: performing same-entity judgment on information in at least two data sources; linking the entities in the at least two data sources to a preset external resource; expanding the attributes of the entities according to information in the external resource; and judging whether the information after attribute expansion meets a preset condition, wherein, when the information after attribute expansion does not meet the preset condition, it is used to perform the same-entity judgment again, and when it meets the preset condition, it is output.
The beneficial effect of the present invention is as follows: attributes are expanded by judging same entities and linking to external resources, and the judgment and linking are carried out iteratively, so that information from multiple data sources can be integrated effectively and the accuracy and comprehensiveness of the information are improved.
Particular embodiments of the present invention are disclosed in detail with reference to the following description and drawings, which indicate the ways in which the principle of the present invention may be employed. It should be understood that the embodiments of the present invention are not thereby limited in scope. Within the spirit and terms of the appended claims, the embodiments of the present invention include many changes, modifications and equivalents.
Features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, integer, step or component, but does not exclude the presence or addition of one or more other features, integers, steps or components.
Brief description of the drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the present invention. They constitute a part of the specification, illustrate embodiments of the present invention, and together with the written description explain the principle of the present invention. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a structural diagram of the multi-data source information processing device of Embodiment 1 of the present invention;
Fig. 2 is a structural diagram of the first judgment unit of Embodiment 1 of the present invention;
Fig. 3 is a flowchart of the method of performing same-entity judgment of Embodiment 1 of the present invention;
Fig. 4 is a structural diagram of the link unit of Embodiment 1 of the present invention;
Fig. 5 is a flowchart of the method of linking the entities in at least two data sources to a preset external resource of Embodiment 1 of the present invention;
Fig. 6 is a structural diagram of the first expansion unit of Embodiment 1 of the present invention;
Fig. 7 is a flowchart of the method of expanding the attributes of the entity according to information in the external resource of Embodiment 1 of the present invention;
Fig. 8 is a structural diagram of the translation unit of Embodiment 1 of the present invention;
Fig. 9 is a flowchart of the method of translating multilingual entities of Embodiment 1 of the present invention;
Fig. 10 is a structural diagram of the fusion unit of Embodiment 1 of the present invention;
Fig. 11 is a flowchart of the method of fusing same attributes in the information in at least two data sources of Embodiment 1 of the present invention;
Fig. 12 is a schematic block diagram of the system configuration of the server of Embodiment 2 of the present invention;
Fig. 13 is a flowchart of the multi-data source information processing method of Embodiment 3 of the present invention;
Fig. 14 is a flowchart of the multi-data source information processing method of Embodiment 4 of the present invention.
Detailed description
The foregoing and other features of the present invention will become apparent from the following description with reference to the drawings. The description and drawings specifically disclose particular embodiments of the present invention, showing some of the embodiments in which the principle of the present invention may be employed. It should be understood that the present invention is not limited to the described embodiments; on the contrary, the present invention includes all modifications, variations and equivalents falling within the scope of the appended claims.
Embodiment 1
Fig. 1 is a structural diagram of the multi-data source information processing device of Embodiment 1 of the present invention. As shown in Fig. 1, the device 100 comprises a first judgment unit 101, a link unit 102, a first expansion unit 103 and a second judgment unit 104, wherein:
The first judgment unit 101 performs same-entity judgment on information in at least two data sources;
The link unit 102 links the entities in the at least two data sources to a preset external resource;
The first expansion unit 103 expands the attributes of the entity according to information in the external resource;
The second judgment unit 104 judges whether the information after attribute expansion meets a preset condition. When the information after attribute expansion does not meet the preset condition, it is used to perform the same-entity judgment again; when it meets the preset condition, it is output.
As can be seen from the above embodiment, attributes are expanded by judging same entities and linking to external resources, and the judgment and linking are carried out iteratively, so that information from multiple data sources can be integrated effectively and the accuracy and comprehensiveness of the information are improved.
In this embodiment, the at least two data sources may comprise any one or more kinds of data source known in the art, for example databases, Excel spreadsheets, CSV files, CRC files and so on. The at least two data sources may be stored outside the multi-data source information processing device or inside it; the embodiments of the present invention do not limit where the data sources are stored.
In this embodiment, the entity may comprise any one or more kinds of entity known in the art, for example a person name, a place name or an organization name. The embodiments of the present invention do not limit the specific type or number of entities.
In this embodiment, same-entity judgment may cover two situations: whether different variants point to the same entity, and whether one and the same entity name is ambiguous and in fact denotes different entities. The embodiments of the present invention are, however, not limited to these two situations.
As to whether different variants point to the same entity: the same person name, organization name or place name may be written in different ways, so that multiple variants exist, and these variants need to be pointed to the same entity. As to whether the same entity name is ambiguous and in fact denotes different entities: the same person name may in fact refer to different people, or the same place name may in fact refer to different places, and these need to be separated so that they denote different entities.
In this embodiment, any existing method may be used to perform same-entity judgment on the information in the at least two data sources. The method of performing same-entity judgment of the embodiments of the present invention is described below by way of example.
Fig. 2 is a structural diagram of the first judgment unit of this embodiment. As shown in Fig. 2, the first judgment unit 101 comprises an establishing unit 201, a grouping unit 202 and a separation unit 203, wherein:
The establishing unit 201 compares the degree of similarity between entities and gathers the entities whose similarity exceeds a preset threshold, thereby building a candidate pool;
The grouping unit 202 merges and distinguishes the information in the candidate pool according to the information in other columns related to the entity, and uses a clustering method to divide it into distinct candidate entity groups;
The separation unit 203 uses a rule-based method to separate entities whose information within a candidate entity group contains mutually exclusive attributes.
Fig. 3 is a flowchart of the method of performing same-entity judgment of this embodiment. As shown in Fig. 3, the method comprises:
Step 301: compare the degree of similarity between entities and gather the entities whose similarity exceeds a preset threshold, thereby building a candidate pool;
Step 302: merge and distinguish the information in the candidate pool according to the information in other columns related to the entity, and use a clustering method to divide it into distinct candidate entity groups;
Step 303: use a rule-based method to separate entities whose information within a candidate entity group contains mutually exclusive attributes.
In this embodiment, the degree of similarity between entities is compared and the entities whose similarity exceeds a preset threshold are gathered together. For example, when judging person names, the degree of similarity between the individual names can be compared.
In this embodiment, the information in the candidate pool is merged and distinguished according to the information in other columns related to the entity, and a clustering method is used to divide it into distinct candidate entity groups. For example, when judging person names, information such as address and affiliation can be used to derive statistical knowledge, strongly discriminating attributes and so on. Any existing clustering method can then be used to divide the pool into different candidate entity groups, for example hierarchical agglomerative clustering or k-means clustering. The embodiments of the present invention do not limit the specific clustering method.
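Steps 301 to 303 can be sketched roughly as follows. This is only an illustrative sketch, not the method of the embodiment: the record fields ("name", "address", "employer", "birth_year"), the thresholds, the use of difflib string similarity, and the simple single-link grouping that stands in for a clustering method are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for any entity similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_candidate_pools(records, threshold=0.8):
    """Step 301: gather records whose names are similar above the threshold."""
    pools = []
    for rec in records:
        for pool in pools:
            if any(similarity(rec["name"], o["name"]) > threshold for o in pool):
                pool.append(rec)
                break
        else:
            pools.append([rec])
    return pools


def shares_value(a, b, keys):
    """True if both records carry the same non-empty value in any of the columns."""
    return any(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


def group_by_context(pool, keys=("address", "employer")):
    """Step 302: single-link grouping on other columns (assumed column names)."""
    groups = []
    for rec in pool:
        matching = [g for g in groups if any(shares_value(rec, o, keys) for o in g)]
        groups = [g for g in groups if not any(g is m for m in matching)]
        groups.append([rec] + [r for g in matching for r in g])
    return groups


def split_on_exclusive(group, key="birth_year"):
    """Step 303: records with conflicting values of an exclusive attribute are separated."""
    buckets, unknown = {}, []
    for rec in group:
        value = rec.get(key)
        if value is None:
            unknown.append(rec)
        else:
            buckets.setdefault(value, []).append(rec)
    if len(buckets) <= 1:
        return [group]  # no conflict, keep the group whole
    return list(buckets.values()) + ([unknown] if unknown else [])


def judge_same_entity(records):
    """Run steps 301-303 and return groups of records judged to be one entity."""
    result = []
    for pool in build_candidate_pools(records):
        for group in group_by_context(pool):
            result.extend(split_on_exclusive(group))
    return result
```

In this sketch, two records named "Zhang Wei" and "Zhang, Wei" that share an address end up in one group, while a third "Zhang Wei" with a different birth year is split off by the exclusivity rule.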
In this embodiment, after the first judgment unit 101 has performed same-entity judgment on the information in the at least two data sources, the link unit 102 links the entities in the at least two data sources to the preset external resource on the basis of the judgment result. The preset external resource may be any one or more existing external resources, for example Wikipedia or Freebase.
In this embodiment, any existing method may be used to link to the preset external resource. The method of linking the entities in the at least two data sources to the preset external resource of the embodiments of the present invention is described below by way of example.
Fig. 4 is a structural diagram of the link unit of this embodiment. As shown in Fig. 4, the link unit 102 comprises a third judgment unit 401, a first search unit 402 and a second search unit 403, wherein:
The third judgment unit 401 judges whether the entity is ambiguous;
The first search unit 402, when the entity is not ambiguous, looks up information in the external resource by exact matching and/or attribute expansion;
The second search unit 403, when the entity is ambiguous, distinguishes the candidates by using other attributes related to the entity together with the data in the external resource, and looks up information in the external resource.
Fig. 5 is a flowchart of the method of linking the entities in at least two data sources to the preset external resource of this embodiment. As shown in Fig. 5, the method comprises:
Step 501: judge whether the entity is ambiguous; when the result is "No", proceed to step 502, and when the result is "Yes", proceed to step 503;
Step 502: look up information in the external resource by exact matching and/or attribute expansion;
Step 503: distinguish the candidates by using other attributes related to the entity together with the data in the external resource, and look up information in the external resource.
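A minimal sketch of steps 501 to 503 follows. It assumes, purely for illustration, that the external resource is available as an in-memory dictionary of entries with a "title" and an "attributes" dictionary, and that an entity counts as ambiguous when more than one entry has the same title; a real resource such as Wikipedia would be accessed differently.

```python
def link_entity(entity, resource):
    """Return the key of the matching external-resource entry, or None.

    entity:   dict with a "name" plus any other known attributes
    resource: dict mapping entry key -> {"title": str, "attributes": dict}
              (a hypothetical in-memory stand-in for the external resource)
    """
    name = entity["name"].lower()
    candidates = [key for key, entry in resource.items()
                  if entry["title"].lower() == name]
    if not candidates:
        return None
    # Step 501: treat the entity as ambiguous if several entries match its name.
    if len(candidates) == 1:
        return candidates[0]  # step 502: exact match, no ambiguity

    # Step 503: disambiguate by counting agreeing values of the other attributes.
    def agreement(key):
        attrs = resource[key]["attributes"]
        return sum(1 for k, v in entity.items()
                   if k != "name" and v is not None and attrs.get(k) == v)

    best = max(candidates, key=agreement)
    return best if agreement(best) > 0 else None
```

When no other attribute agrees with any candidate, the sketch returns None rather than guessing, so an ambiguous entity is left unlinked until a later pass supplies more attributes.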
In this embodiment, after the entities in the at least two data sources have been linked to the preset external resource, the first expansion unit 103 expands the attributes of the entity according to information in the external resource. Any existing method may be used for this expansion. The method of expanding the attributes of the entity according to information in the external resource of the embodiments of the present invention is described below by way of example.
Fig. 6 is a structural diagram of the first expansion unit of this embodiment. As shown in Fig. 6, the first expansion unit 103 comprises a first expansion module 601 and a second expansion module 602, wherein:
The first expansion module 601 expands the attributes of the entity according to structured information in the external resource;
The second expansion module 602 extracts structured information from unstructured information in the external resource, and thereby expands the attributes of the entity.
In this embodiment, the first expansion unit 103 may comprise both the first expansion module 601 and the second expansion module 602, or only one of them.
Fig. 7 is a flowchart of the method of expanding the attributes of the entity according to information in the external resource of this embodiment. As shown in Fig. 7, the method comprises:
Step 701: expand the attributes of the entity according to structured information in the external resource;
Step 702: extract structured information from unstructured information in the external resource, and thereby expand the attributes of the entity.
In this embodiment, the method may comprise both step 701 and step 702, or only one of them.
In this embodiment, for example, the formatted Infobox information in Wikipedia, or other information on the page, can be used to expand the attributes of the entity.
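Steps 701 and 702 might look like the following sketch. The infobox is assumed to have been fetched and parsed into a plain dictionary already, and the single regular expression is only a placeholder for whatever information extraction method is actually used; the attribute name "birth_year" is hypothetical.

```python
import re


def expand_from_structured(entity, infobox):
    """Step 701: add attributes from structured data; existing values are kept."""
    expanded = dict(entity)
    for key, value in infobox.items():
        expanded.setdefault(key, value)
    return expanded


def expand_from_text(entity, text):
    """Step 702: pull a structured value out of free text (toy pattern only)."""
    expanded = dict(entity)
    match = re.search(r"\bborn in (\d{4})\b", text)
    if match and "birth_year" not in expanded:
        expanded["birth_year"] = int(match.group(1))
    return expanded


def expand_entity(entity, infobox=None, text=None):
    """Apply either step or both, depending on what the resource offers."""
    if infobox:
        entity = expand_from_structured(entity, infobox)
    if text:
        entity = expand_from_text(entity, text)
    return entity
```

Values already present in the entity are never overwritten here, which keeps the source data authoritative; whether that is the right policy depends on how much the external resource is trusted.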
In this embodiment, after the attributes of the entity have been expanded according to information in the external resource, the second judgment unit 104 judges whether the information after attribute expansion meets a preset condition. When the information after attribute expansion does not meet the preset condition, it is used to perform the same-entity judgment again; when it meets the preset condition, it is output.
In this embodiment, the preset condition can be determined according to actual needs. For example, the preset condition may be that the number of iterations reaches a set number, or that the amount of change in the information after attribute expansion is smaller than a set threshold.
In this embodiment, when the information after attribute expansion does not meet the preset condition, it is used to perform the same-entity judgment again; that is, the above steps are repeated on the basis of the expanded information until the preset condition is met. Through this iterative process, the accuracy and comprehensiveness of the information can be improved continually.
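The iteration and its two example stopping conditions can be sketched as below. The callable one_pass is a hypothetical stand-in for one round of same-entity judgment, linking and attribute expansion, and measuring change as the difference in the number of filled attribute values is an assumption chosen for simplicity.

```python
def filled(records):
    """Number of non-empty attribute values across all records."""
    return sum(1 for rec in records for value in rec.values() if value is not None)


def iterate(records, one_pass, max_iterations=10, min_change=1):
    """Repeat one_pass until a preset condition is met.

    Stops when the iteration count reaches max_iterations, or when the
    change produced by a pass is smaller than min_change.
    Returns the final records and the number of passes performed.
    """
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        updated = one_pass(records)
        change = abs(filled(updated) - filled(records))
        records = updated
        if change < min_change:
            break
    return records, iteration
```

Either condition alone is enough to end the loop, so a pass that stops adding information terminates the process even if the iteration limit is generous.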
In this embodiment, the device may further comprise a first completion unit 105. The first completion unit 105 completes the attributes of a same entity according to the information obtained from the same-entity judgment, and the information after attribute completion is used for the linking described above. When an attribute is completed, the source and/or confidence of that attribute is recorded.
In this embodiment, any existing method may be used to complete the attributes of a same entity. For example, the information in other columns can be used to fill in missing information, and the completion can be done by adding candidates.
For example, given two rows of information that point to the same person, if one row contains address information and the other does not, the address information can be added to the row that lacks it, and the source and/or confidence of the attribute is recorded when the address information is added. The confidence can be obtained by any existing method, for example from rules or from statistical information.
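The address example can be sketched as follows. The "source" column, the fixed confidence value and the shape of the candidate entries are assumptions for illustration; the text leaves open how the confidence is computed.

```python
def complete_attributes(rows, confidence=0.8):
    """Fill missing attributes of rows that were judged to be the same entity.

    rows: list of dicts, each with a "source" key naming its data source.
    A missing value is replaced by a list of candidates, each annotated
    with the source it came from and a confidence (a fixed placeholder here).
    """
    completed = []
    for row in rows:
        new_row = dict(row)
        for other in rows:
            if other is row:
                continue
            for key, value in other.items():
                if key == "source" or value is None:
                    continue
                if row.get(key) is None:  # only fill what the row itself lacks
                    candidates = new_row.get(key) or []
                    candidates.append({"value": value,
                                       "source": other["source"],
                                       "confidence": confidence})
                    new_row[key] = candidates
        completed.append(new_row)
    return completed
```

Keeping completed values as annotated candidates, rather than writing them in as plain values, lets later steps tell original data apart from inferred data.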
In this embodiment, the first completion unit 105 is an optional component and is indicated by a dashed box in Fig. 1.
Completing the attributes of a same entity according to the information obtained from the same-entity judgment further improves the accuracy and comprehensiveness of the information.
In this embodiment, the device may further comprise a translation unit 106. The translation unit 106 translates multilingual entities, and the translation result is used for the linking. Any existing method may be used to translate multilingual entities. The method of translating multilingual entities of the embodiments of the present invention is described below by way of example.
Fig. 8 is a structural diagram of the translation unit of this embodiment. As shown in Fig. 8, the translation unit 106 comprises a candidate acquisition unit 801, a retrieval unit 802 and a determination unit 803, wherein:
The candidate acquisition unit 801 uses machine translation to obtain translation candidates for the entity;
The retrieval unit 802 uses a search engine to retrieve co-occurrence statistics for the entity and its translation candidates, thereby obtaining possible candidate pairs;
The determination unit 803 uses the similarity between other information in the retrieval results and the corresponding attributes of the entity to determine, for each of the possible candidate pairs, the confidence that it is a correct translation.
Fig. 9 is a flowchart of the method of translating multilingual entities of this embodiment. As shown in Fig. 9, the method comprises:
Step 901: use machine translation to obtain translation candidates for the entity;
Step 902: use a search engine to retrieve co-occurrence statistics for the entity and its translation candidates, thereby obtaining possible candidate pairs;
Step 903: use the similarity between other information in the retrieval results and the corresponding attributes of the entity to determine, for each of the possible candidate pairs, the confidence that it is a correct translation.
In this embodiment, any existing method may be used to obtain the candidates by machine translation, to retrieve with a search engine, and to determine the confidence. For example, in a bibliographic database, information about a paper may be described bilingually in Chinese and English, and the same author may publish both Chinese literature and corresponding English literature. An existing name-oriented translation system can be used to find the translation candidates for a name, a search engine can be used to retrieve the Chinese-English co-occurrence statistics for the name, and an existing rule-based method can then use the similarity between other information in the retrieval results and the corresponding attributes of the name to determine, for each of the possible candidate pairs, the confidence that it is a correct translation.
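Steps 902 and 903 can be illustrated with the sketch below. The machine translation system and the search engine are external services, so their outputs are passed in as plain dictionaries. The scoring formula (an evenly weighted mix of co-occurrence share and attribute agreement) is an assumption made for the example and is not taken from the embodiment.

```python
def translation_confidence(entity_attrs, cooccurrence, retrieved_attrs, weight=0.5):
    """Score each translation candidate of an entity.

    entity_attrs:    known attributes of the source-language entity
    cooccurrence:    {candidate: co-occurrence count from a search engine}
    retrieved_attrs: {candidate: attributes found in the retrieval results}
    weight:          share of the score given to co-occurrence (hypothetical)
    """
    total = sum(cooccurrence.values())
    scores = {}
    for candidate, count in cooccurrence.items():
        if count == 0:
            continue  # step 902: only pairs that actually co-occur are kept
        cooc_share = count / total
        found = retrieved_attrs.get(candidate, {})
        matched = sum(1 for k, v in entity_attrs.items() if found.get(k) == v)
        # step 903: agreement between retrieved information and entity attributes
        attr_score = matched / len(entity_attrs) if entity_attrs else 0.0
        scores[candidate] = weight * cooc_share + (1 - weight) * attr_score
    return scores
```

A candidate that never co-occurs with the original name is dropped outright, while the remaining candidates are ranked by how well the retrieved context agrees with what is already known about the entity.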
In this embodiment, the translation unit 106 is an optional component and is indicated by a dashed box in Fig. 1.
Translating multilingual entities and using the translation result for the linking further improves the accuracy and comprehensiveness of the information and facilitates its statistical analysis.
In this embodiment, the device may further comprise a second completion unit 107, which completes the attributes of a same entity according to the result of translating the multilingual entities, the information after attribute completion being used for the linking described above.
In this embodiment, the second completion unit 107 may use any existing method to complete the attributes of a same entity. For example, it may use the same method as the first completion unit 105, which is not repeated here.
In this embodiment, the second completion unit 107 is an optional component and is indicated by a dashed box in Fig. 1.
Completing the attributes of a same entity according to the result of translating the multilingual entities, and using the information after attribute completion for the linking described above, further improves the accuracy and comprehensiveness of the information.
In this embodiment, the device may further comprise a fusion unit 108, which fuses same attributes in the information in the at least two data sources, the information after attribute fusion being used for the same-entity judgment described above. Any existing method may be used to fuse same attributes. The method of fusing same attributes in the information in at least two data sources of the embodiments of the present invention is described below by way of example.
Fig. 10 is a structural diagram of the fusion unit of this embodiment. As shown in Fig. 10, the fusion unit 108 comprises a fourth judgment unit 1001 and a fifth judgment unit 1002, wherein:
The fourth judgment unit 1001 preliminarily judges, according to the distributional similarity of different fields in the at least two data sources, whether the different fields may point to the same attribute;
The fifth judgment unit 1002 judges that the different fields point to the same attribute when the identical, repeated instances in the different fields exceed a preset ratio.
Fig. 11 is a flowchart of the method of fusing same attributes in the information in at least two data sources of this embodiment. As shown in Fig. 11, the method comprises:
Step 1101: preliminarily judge, according to the distributional similarity of different fields in the at least two data sources, whether the different fields may point to the same attribute;
Step 1102: when the identical, repeated instances in the different fields exceed a preset ratio, judge that the different fields point to the same attribute.
In this embodiment, any existing method may be used to make the preliminary judgment of whether the different fields may point to the same attribute. For example, differently named fields such as "name" and "Name" may express the same meaning; information such as the length of the instances in each field and their common N-grams can be used to judge preliminarily, from the distributional similarity, whether two fields may point to the same attribute.
In this embodiment, the preset ratio can be set according to actual needs; the embodiments of the present invention do not limit its value.
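Steps 1101 and 1102 can be sketched as follows. The character-bigram profile with cosine similarity, the mean-length comparison, the way overlap is measured (share of distinct values of the smaller field that also occur in the other) and all threshold values are assumptions for illustration only.

```python
from collections import Counter
from math import sqrt


def ngram_profile(values, n=2):
    """Character n-gram counts over all instances of a field."""
    counts = Counter()
    for value in values:
        text = str(value).lower()
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    return counts


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean_length(values):
    return sum(len(str(v)) for v in values) / len(values) if values else 0.0


def may_be_same_attribute(col_a, col_b, min_cosine=0.3, max_length_gap=0.5):
    """Step 1101: preliminary judgment from instance length and n-gram distribution."""
    la, lb = mean_length(col_a), mean_length(col_b)
    length_gap = abs(la - lb) / max(la, lb, 1)
    return (length_gap <= max_length_gap
            and cosine(ngram_profile(col_a), ngram_profile(col_b)) >= min_cosine)


def same_attribute(col_a, col_b, min_overlap=0.5):
    """Step 1102: confirm when the share of identical instances exceeds the ratio."""
    if not may_be_same_attribute(col_a, col_b):
        return False
    a, b = set(col_a), set(col_b)
    if not a or not b:
        return False
    return len(a & b) / min(len(a), len(b)) > min_overlap
```

The cheap distributional check runs first so that the instance comparison is only attempted for field pairs that look alike.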
In this embodiment, the fusion unit 108 is an optional component and is indicated by a dashed box in Fig. 1.
Fusing same attributes in the information in the at least two data sources, and using the information after attribute fusion for the same-entity judgment described above, further improves the accuracy and comprehensiveness of the information.
In this embodiment, the device may further comprise a cleaning unit 109, which performs data cleaning on the information in the at least two data sources, the information after data cleaning being used for the fusion of same attributes described above. Any existing method may be used to clean the information in the at least two data sources.
For example, a rule-based method can be used to clean the information in the at least two data sources. For data at the character level, processing such as character encoding handling, unification of full-width and half-width characters, and special symbol detection can be performed. For data at the string level, processing such as network string processing and determination of given name and surname can be performed, for example abbreviation of English names, analysis of the order of given name and surname, and correction of English spelling based on probability statistics.
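A few of these cleaning rules can be sketched as below. Unicode NFKC normalization is used here as one way to unify full-width and half-width characters, and the name rules (a comma marks "surname, given name"; otherwise the last token is taken as the surname) are simplifying assumptions that only suit English-style names.

```python
import re
import unicodedata


def clean_value(text):
    """Character-level cleaning: unify widths, drop control characters, tidy spaces."""
    text = unicodedata.normalize("NFKC", text)          # full-width -> half-width
    text = re.sub(r"[\u0000-\u001f\u007f]", " ", text)  # control characters
    return re.sub(r"\s+", " ", text).strip()


def parse_english_name(raw):
    """String-level cleaning: decide surname and given name, build an abbreviation."""
    text = clean_value(raw)
    if "," in text:
        # "Surname, Given" order
        surname, given = (part.strip() for part in text.split(",", 1))
    else:
        # assume "Given Surname" order
        parts = text.split(" ")
        surname, given = parts[-1], " ".join(parts[:-1])
    initials = "".join(word[0].upper() + "." for word in given.replace(".", " ").split())
    return {"surname": surname,
            "given": given,
            "abbreviated": f"{surname} {initials}".strip()}
```

Normalizing names to one abbreviated form before comparison helps variants such as "Smith, John Paul" and "John Paul Smith" fall into the same candidate pool during same-entity judgment.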
As can be seen from the above embodiment, by judging same entities and linking external resources to expand attributes, and by performing this judgment and linking iteratively, information from multiple data sources can be integrated effectively, improving the accuracy and comprehensiveness of the information.
Embodiment 2
An embodiment of the present invention provides a server, which comprises the multi-data-source information processing device described in Embodiment 1.
Figure 12 is a schematic block diagram of the system configuration of the server 1200 of Embodiment 2 of the present invention. As shown in Figure 12, the server 1200 may comprise a central processing unit 1201 and a memory 1202; the memory 1202 is coupled to the central processing unit 1201. The figure is exemplary; other types of structure may also be used to supplement or replace this structure, in order to implement telecommunications functions or other functions.
As shown in Figure 12, the server 1200 may further comprise a communication module 1203, an input unit 1204, a display 1205 and a power supply 1206.
In one embodiment, the functions of the multi-data-source information processing device may be integrated into the central processing unit 1201. The central processing unit 1201 may be configured to: perform same-entity judgment on information in at least two data sources; link the entities in the at least two data sources to a preset external resource; expand the attributes of the entities according to information in the external resource; and judge whether the information after attribute expansion meets a preset condition, where the information after attribute expansion is used for the same-entity judgment when it does not meet the preset condition, and is output when it meets the preset condition.
The central processing unit 1201 may further be configured to: perform attribute completion for the same entity according to the information obtained from the same-entity judgment, and use the attribute-completed information for the linking, wherein the source and/or confidence of each attribute is indicated when the attribute completion is performed.
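Attribute completion that records the source and confidence of each attribute might be sketched as below. The `AttributeValue` type, the per-source confidence table and the rule of keeping the value from the most trusted source are all assumptions made for illustration; this document does not prescribe how confidence is assigned.

```python
from dataclasses import dataclass


@dataclass
class AttributeValue:
    value: str
    source: str        # which data source supplied the value
    confidence: float  # how far the value is trusted, in [0, 1]


def complete_attributes(records, source_confidence):
    """Merge records judged to be the same entity into one attribute map.

    `records` is a list of (source_name, {attribute: value}) pairs.
    """
    merged = {}
    for source, attrs in records:
        conf = source_confidence.get(source, 0.5)  # assumed default for unknown sources
        for name, value in attrs.items():
            if value in (None, ""):
                continue  # this source has nothing to contribute for the attribute
            current = merged.get(name)
            # Keep the value coming from the more trusted source.
            if current is None or conf > current.confidence:
                merged[name] = AttributeValue(value, source, conf)
    return merged
```

Because each merged value carries its source and confidence, a later step can still tell which values came from a less trusted source.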
The central processing unit 1201 may further be configured to: translate multilingual entities and use the translation results for the linking. Translating a multilingual entity comprises: obtaining translation candidates for the entity by machine translation; retrieving, with a search engine, co-occurrence statistics of the entity and the translation candidates, thereby obtaining possible candidate pairs; and determining, from the similarity between other information in the retrieval results and the corresponding attributes of the entity, the confidence that each of the possible candidate pairs is a correct translation.
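The three translation steps above can be illustrated with the following sketch. No real machine-translation or search-engine API is used: the candidate list and the two callables `cooccurrence` and `snippets` are hypothetical stand-ins supplied by the caller, and the confidence measure (the share of known attribute values found in the retrieved text) is an assumption made for the example.

```python
def rank_translations(entity, attributes, candidates, cooccurrence, snippets,
                      min_cooccurrence=1):
    """Rank translation candidates for `entity` by confidence.

    candidates   : candidate translations, e.g. from a machine-translation system
    cooccurrence : callable (entity, candidate) -> co-occurrence count in search results
    snippets     : callable (entity, candidate) -> list of retrieved result texts
    """
    # Keep only candidates that actually co-occur with the entity.
    pairs = [c for c in candidates if cooccurrence(entity, c) >= min_cooccurrence]

    values = [v.lower() for v in attributes.values() if v]
    scored = []
    for cand in pairs:
        texts = [t.lower() for t in snippets(entity, cand)]
        # Confidence: share of the entity's attribute values found in the results.
        hits = sum(any(v in t for t in texts) for v in values)
        confidence = hits / len(values) if values else 0.0
        scored.append((cand, confidence))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Passing the retrieval functions in as parameters keeps the sketch independent of any particular search engine.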
The central processing unit 1201 may further be configured to: perform attribute completion for the same entity according to the results of translating the multilingual entities.
The central processing unit 1201 may further be configured to: fuse identical attributes in the information of the at least two data sources, and use the attribute-fused information for the same-entity judgment. Fusing identical attributes comprises: preliminarily judging, from the distribution similarity of different fields in the at least two data sources, whether the different fields may point to the same attribute; and, when the proportion of identical repeated instances in the different fields is greater than a preset ratio, judging that the different fields point to the same attribute.
The central processing unit 1201 may further be configured to: perform data cleaning on the information in the at least two data sources, and use the cleaned information for the fusion of identical attributes.
Performing same-entity judgment on the information in the at least two data sources comprises: judging whether different variants point to the same entity, and whether the same entity is ambiguous and thus in fact represents different entities.
Performing same-entity judgment on the information in the at least two data sources comprises: comparing the similarity between entities and gathering entities whose similarity is greater than a preset threshold, thereby establishing a candidate pool; merging and distinguishing the information in the candidate pool according to information in other columns related to the entities, and using a clustering method to divide it into distinct candidate entity groups; and using a rule-based method to separate entities in a candidate entity group whose information contains mutually exclusive attributes.
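The candidate-pool and rule-based separation steps can be sketched as follows. This is a simplified illustration under stated assumptions: name similarity is computed with `difflib.SequenceMatcher`, the threshold of 0.8 and the exclusive attribute `birth_year` are invented for the example, and the intermediate clustering on other columns is omitted, with a greedy grouping standing in for it.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_candidate_pools(records, threshold=0.8):
    """Gather records whose names are similar (single-link, via union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if name_similarity(records[i]["name"], records[j]["name"]) > threshold:
            parent[find(i)] = find(j)

    pools = {}
    for i, rec in enumerate(records):
        pools.setdefault(find(i), []).append(rec)
    return list(pools.values())


def split_exclusive(pool, exclusive_keys=("birth_year",)):
    """Rule: records that disagree on an exclusive attribute are different entities."""
    groups = []
    for rec in pool:
        for group in groups:
            compatible = all(
                rec.get(k) is None or g.get(k) is None or rec[k] == g[k]
                for g in group for k in exclusive_keys
            )
            if compatible:
                group.append(rec)
                break
        else:
            groups.append([rec])  # no compatible group found: start a new entity
    return groups
```

In this sketch, name variants such as "John Smith" and "Jon Smith" fall into one pool, while two records with the same name but different birth years are separated again by the exclusivity rule.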
Linking the entities in the at least two data sources to the preset external resource comprises: judging whether an entity is ambiguous; when the entity is not ambiguous, searching for information in the external resource by exact matching and/or attribute expansion; and, when the entity is ambiguous, distinguishing candidates by means of other attributes related to the entity and the data in the external resource, and searching for information in the external resource.
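The branching between unambiguous and ambiguous entities might be sketched like this. The sketch assumes that the external resource is available as a list of dictionaries and that an entity counts as ambiguous when more than one entry shares its name; both are simplifications introduced for the example, as is the refusal to link when the evidence is tied.

```python
def link_entity(entity, knowledge_base):
    """Return the external entry that `entity` should be linked to, or None.

    `knowledge_base` is a list of dicts, each with a "name" plus other attributes.
    """
    name = entity["name"].casefold()
    candidates = [e for e in knowledge_base if e["name"].casefold() == name]
    if not candidates:
        return None
    if len(candidates) == 1:
        # No ambiguity: the exact match is the link target.
        return candidates[0]

    # Ambiguity: score each candidate by how many other attributes agree.
    def agreement(entry):
        return sum(1 for k, v in entity.items()
                   if k != "name" and v is not None and entry.get(k) == v)

    ranked = sorted(candidates, key=agreement, reverse=True)
    # Refuse to link when the evidence does not separate the top two candidates.
    if agreement(ranked[0]) == agreement(ranked[1]):
        return None
    return ranked[0]
```

Returning `None` for a tie leaves the entity unlinked in this round, so a later round that has more attributes available can try again.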
Expanding the attributes of the entities according to the information in the external resource comprises: expanding the attributes of the entities according to structured information in the external resource; and/or extracting structured information from unstructured information in the external resource, thereby expanding the attributes of the entities.
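Both kinds of expansion can be illustrated with a small sketch. The layout of the external entry (a `structured` dictionary plus a free-text `text` field) and the two regular-expression patterns are assumptions made for the example; a real extractor for unstructured text would be considerably more elaborate.

```python
import re


def expand_attributes(entity, external_entry):
    """Return a copy of `entity` with attributes added from the external entry."""
    expanded = dict(entity)
    # Structured part: copy attributes the entity does not have yet.
    for key, value in external_entry.get("structured", {}).items():
        expanded.setdefault(key, value)
    # Unstructured part: pull simple facts out of free text with patterns.
    text = external_entry.get("text", "")
    patterns = {
        "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
        "birth_year": r"born in (\d{4})",
    }
    for key, pattern in patterns.items():
        if key in expanded:
            continue  # never overwrite a value that is already known
        match = re.search(pattern, text)
        if match:
            # Use the captured group when the pattern has one, else the whole match.
            expanded[key] = match.group(match.lastindex or 0)
    return expanded
```

Existing values are never overwritten here, so expansion only adds attributes to an entity.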
In another embodiment, the multi-data-source information processing device may be configured separately from the central processing unit 1201; for example, it may be configured as a chip connected to the central processing unit 1201, with its functions implemented under the control of the central processing unit.
In this embodiment, the server 1200 does not necessarily include all of the components shown in Figure 12.
As shown in Figure 12, the central processing unit 1201, sometimes also called a controller or operation control, may comprise a microprocessor or other processor device and/or logic device; the central processing unit 1201 receives input and controls the operation of the components of the server 1200.
The memory 1202 may be, for example, one or more of a buffer, a flash memory, a hard disk drive, a removable medium, a volatile memory, a non-volatile memory or another suitable device. It can store the above-mentioned information relating to failure and, in addition, a program that processes the relevant information. The central processing unit 1201 can execute the program stored in the memory 1202 to implement information storage, processing and the like. The functions of the other components are similar to those in the prior art and are not described again here. The components of the server 1200 may be implemented by dedicated hardware, firmware, software or a combination thereof without departing from the scope of the present invention.
As can be seen from the above embodiment, by judging same entities and linking external resources to expand attributes, and by performing this judgment and linking iteratively, information from multiple data sources can be integrated effectively, improving the accuracy and comprehensiveness of the information.
Embodiment 3
Figure 13 is a flowchart of the multi-data-source information processing method of Embodiment 3 of the present invention, which corresponds to the multi-data-source information processing device of Embodiment 1. As shown in Figure 13, the method comprises:
Step 1301: perform same-entity judgment on information in at least two data sources;
Step 1302: link the entities in the at least two data sources to a preset external resource;
Step 1303: expand the attributes of the entities according to information in the external resource;
Step 1304: judge whether the information after attribute expansion meets a preset condition; when the information after attribute expansion does not meet the preset condition, use it for the same-entity judgment, and when it meets the preset condition, output it.
In this embodiment, the method of performing the same-entity judgment, the method of linking the entities in the at least two data sources to the preset external resource, the method of expanding the attributes of the entities according to information in the external resource, and the method of judging whether the information after attribute expansion meets the preset condition are the same as described in Embodiment 1 and are not repeated here.
As can be seen from the above embodiment, by judging same entities and linking external resources to expand attributes, and by performing this judgment and linking iteratively, information from multiple data sources can be integrated effectively, improving the accuracy and comprehensiveness of the information.
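The iteration over steps 1301 to 1304 can be summarized in a short control-flow sketch. The four callables are placeholders for the judgment, linking, expansion and condition-checking methods of Embodiment 1; the `max_rounds` safety cap is an assumption added for the example and is not part of the described method.

```python
def integrate(records, judge_same_entity, link, expand, is_satisfied, max_rounds=10):
    """Iterate same-entity judgment, linking and attribute expansion.

    All four callables are supplied by the caller; `max_rounds` is a safety cap.
    """
    info = records
    for _ in range(max_rounds):
        entities = judge_same_entity(info)  # step 1301
        links = link(entities)              # step 1302
        info = expand(entities, links)      # step 1303
        if is_satisfied(info):              # step 1304: preset condition met, output
            break
        # Otherwise the expanded information feeds the next same-entity judgment.
    return info
```

Each round hands the expanded information back to the same-entity judgment, so attributes gained from the external resource can help resolve entities that could not be resolved in the previous round.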
Embodiment 4
Figure 14 is a flowchart of the multi-data-source information processing method of Embodiment 4 of the present invention, which corresponds to the multi-data-source information processing device of Embodiment 1. As shown in Figure 14, the method comprises:
Step 1401: perform data cleaning on the information in at least two data sources;
Step 1402: fuse identical attributes in the cleaned information;
Step 1403: perform the same-entity judgment;
Step 1404: perform attribute completion for the same entity according to the information obtained from the same-entity judgment, wherein the source and/or confidence of each attribute is indicated when the attribute completion is performed;
Step 1405: translate the multilingual entities in the attribute-completed information;
Step 1406: perform attribute completion for the same entity according to the translation results;
Step 1407: link the entities in the information after same-entity attribute completion to a preset external resource;
Step 1408: expand the attributes of the entities according to information in the external resource;
Step 1409: judge whether the information after attribute expansion meets a preset condition; if the result is "No", go to step 1403, and if the result is "Yes", go to step 1410;
Step 1410: output the information after attribute expansion.
In this embodiment, the method of performing data cleaning, the method of fusing identical attributes, the method of performing the same-entity judgment, the method of performing attribute completion for the same entity, the method of translating the multilingual entities in the attribute-completed information, the method of performing attribute completion according to the translation results, the method of linking the entities to the preset external resource, the method of expanding the attributes of the entities according to information in the external resource, and the method of judging whether the information after attribute expansion meets the preset condition are the same as described in Embodiment 1 and are not repeated here.
As can be seen from the above embodiment, by judging same entities and linking external resources to expand attributes, and by performing this judgment and linking iteratively, information from multiple data sources can be integrated effectively, improving the accuracy and comprehensiveness of the information.
An embodiment of the present invention further provides a computer-readable program, wherein, when the program is executed in a multi-data-source information processing device or server, the program causes a computer to carry out, in the information processing device or server, the multi-data-source information processing method described in Embodiment 3 or Embodiment 4.
An embodiment of the present invention further provides a storage medium storing a computer-readable program, wherein the computer-readable program causes a computer to carry out, in a multi-data-source information processing device or server, the multi-data-source information processing method described in Embodiment 3 or Embodiment 4.
The above devices and methods of the present invention may be implemented by hardware, or by hardware combined with software. The present invention relates to a computer-readable program which, when executed by a logic component, enables the logic component to implement the devices or constituent components described above, or to carry out the methods or steps described above. The present invention also relates to a storage medium for storing the above program, such as a hard disk, a magnetic disk, an optical disc, a DVD or a flash memory.
The present invention has been described above with reference to specific embodiments, but it should be clear to those skilled in the art that these descriptions are exemplary and do not limit the scope of protection of the present invention. Those skilled in the art may make various variations and modifications to the present invention in accordance with its spirit and principles, and these variations and modifications also fall within the scope of the present invention.
With respect to the embodiments including the above embodiments, the following supplementary notes are also disclosed:
Supplementary note 1. A multi-data-source information processing device, the information processing device comprising:
a first judgment unit, configured to perform same-entity judgment on information in at least two data sources;
a link unit, configured to link the entities in the at least two data sources to a preset external resource;
a first expansion unit, configured to expand the attributes of the entities according to information in the external resource; and
a second judgment unit, configured to judge whether the information after attribute expansion meets a preset condition, wherein, when the information after attribute expansion does not meet the preset condition, the information after attribute expansion is used for the same-entity judgment, and when the information after attribute expansion meets the preset condition, the information after attribute expansion is output.
Supplementary note 2. The information processing device according to supplementary note 1, wherein the information processing device further comprises:
a first completion unit, configured to perform attribute completion for the same entity according to the information obtained from the same-entity judgment, and to use the attribute-completed information for the linking, wherein the source and/or confidence of each attribute is indicated when the attribute completion is performed.
Supplementary note 3. The information processing device according to supplementary note 1, wherein the information processing device further comprises:
a translation unit, configured to translate multilingual entities and to use the translation results for the linking;
wherein the translation unit comprises:
a candidate acquiring unit, configured to obtain translation candidates for the entity by machine translation;
a retrieval unit, configured to retrieve, with a search engine, co-occurrence statistics of the entity and the translation candidates, thereby obtaining possible candidate pairs; and
a determining unit, configured to determine, from the similarity between other information in the retrieval results and the corresponding attributes of the entity, the confidence that each of the possible candidate pairs is a correct translation.
Supplementary note 4. The information processing device according to supplementary note 3, wherein the information processing device further comprises:
a second completion unit, configured to perform attribute completion for the same entity according to the results of translating the multilingual entities.
Supplementary note 5. The information processing device according to supplementary note 1, wherein the first judgment unit is configured to judge whether different variants point to the same entity, and whether the same entity is ambiguous and thus in fact represents different entities.
Supplementary note 6. The information processing device according to supplementary note 1, wherein the first judgment unit comprises:
an establishing unit, configured to compare the similarity between entities and to gather entities whose similarity is greater than a preset threshold, thereby establishing a candidate pool;
a grouping unit, configured to merge and distinguish the information in the candidate pool according to information in other columns related to the entities, and to use a clustering method to divide it into distinct candidate entity groups; and
a separating unit, configured to use a rule-based method to separate entities in a candidate entity group whose information contains mutually exclusive attributes.
Supplementary note 7. The information processing device according to supplementary note 1, wherein the link unit comprises:
a third judgment unit, configured to judge whether the entity is ambiguous;
a first search unit, configured to search for information in the external resource by exact matching and/or attribute expansion when the entity is not ambiguous; and
a second search unit, configured, when the entity is ambiguous, to distinguish candidates by means of other attributes related to the entity and the data in the external resource, and to search for information in the external resource.
Supplementary note 8. The information processing device according to supplementary note 1, wherein the first expansion unit comprises:
a first expansion module, configured to expand the attributes of the entities according to structured information in the external resource; and/or
a second expansion module, configured to extract structured information from unstructured information in the external resource, thereby expanding the attributes of the entities.
Supplementary note 9. The information processing device according to supplementary note 1, wherein the information processing device further comprises:
a fusion unit, configured to fuse identical attributes in the information of the at least two data sources, and to use the attribute-fused information for the same-entity judgment;
wherein the fusion unit comprises:
a fourth judgment unit, configured to judge preliminarily, from the distribution similarity of different fields in the at least two data sources, whether the different fields may point to the same attribute; and
a fifth judgment unit, configured to judge that the different fields point to the same attribute when the proportion of identical repeated instances in the different fields is greater than a preset ratio.
Supplementary note 10. The information processing device according to supplementary note 9, wherein the information processing device further comprises:
a cleaning unit, configured to perform data cleaning on the information in the at least two data sources, and to use the cleaned information for the fusion of identical attributes.
Supplementary note 11. A server, comprising the multi-data-source information processing device according to any one of supplementary notes 1 to 10.
Supplementary note 12. A multi-data-source information processing method, the information processing method comprising:
performing same-entity judgment on information in at least two data sources;
linking the entities in the at least two data sources to a preset external resource;
expanding the attributes of the entities according to information in the external resource; and
judging whether the information after attribute expansion meets a preset condition, wherein, when the information after attribute expansion does not meet the preset condition, the information after attribute expansion is used for the same-entity judgment, and when the information after attribute expansion meets the preset condition, the information after attribute expansion is output.
Supplementary note 13. The information processing method according to supplementary note 12, wherein the information processing method further comprises:
performing attribute completion for the same entity according to the information obtained from the same-entity judgment, and using the attribute-completed information for the linking, wherein the source and/or confidence of each attribute is indicated when the attribute completion is performed.
Supplementary note 14. The information processing method according to supplementary note 12, wherein the information processing method further comprises:
translating multilingual entities, and using the translation results for the linking;
wherein translating the multilingual entities comprises:
obtaining translation candidates for the entity by machine translation;
retrieving, with a search engine, co-occurrence statistics of the entity and the translation candidates, thereby obtaining possible candidate pairs; and
determining, from the similarity between other information in the retrieval results and the corresponding attributes of the entity, the confidence that each of the possible candidate pairs is a correct translation.
Supplementary note 15. The information processing method according to supplementary note 14, wherein the information processing method further comprises:
performing attribute completion for the same entity according to the results of translating the multilingual entities.
Supplementary note 16. The information processing method according to supplementary note 12, wherein performing same-entity judgment on the information in the at least two data sources comprises:
judging whether different variants point to the same entity, and whether the same entity is ambiguous and thus in fact represents different entities.
Supplementary note 17. The information processing method according to supplementary note 12, wherein performing same-entity judgment on the information in the at least two data sources comprises:
comparing the similarity between entities and gathering entities whose similarity is greater than a preset threshold, thereby establishing a candidate pool;
merging and distinguishing the information in the candidate pool according to information in other columns related to the entities, and using a clustering method to divide it into distinct candidate entity groups; and
using a rule-based method to separate entities in a candidate entity group whose information contains mutually exclusive attributes.
Supplementary note 18. The information processing method according to supplementary note 12, wherein linking the entities in the at least two data sources to the preset external resource comprises:
judging whether the entity is ambiguous;
when the entity is not ambiguous, searching for information in the external resource by exact matching and/or attribute expansion; and
when the entity is ambiguous, distinguishing candidates by means of other attributes related to the entity and the data in the external resource, and searching for information in the external resource.
Supplementary note 19. The information processing method according to supplementary note 12, wherein expanding the attributes of the entities according to the information in the external resource comprises:
expanding the attributes of the entities according to structured information in the external resource; and/or
extracting structured information from unstructured information in the external resource, thereby expanding the attributes of the entities.
Supplementary note 20. The information processing method according to supplementary note 12, wherein the information processing method further comprises:
fusing identical attributes in the information of the at least two data sources, and using the attribute-fused information for the same-entity judgment;
wherein fusing identical attributes in the information of the at least two data sources comprises:
judging preliminarily, from the distribution similarity of different fields in the at least two data sources, whether the different fields may point to the same attribute; and
when the proportion of identical repeated instances in the different fields is greater than a preset ratio, judging that the different fields point to the same attribute.

Claims (10)

1. A multi-data-source information processing device, the information processing device comprising:
a first judgment unit, configured to perform same-entity judgment on information in at least two data sources;
a link unit, configured to link the entities in the at least two data sources to a preset external resource;
a first expansion unit, configured to expand the attributes of the entities according to information in the external resource; and
a second judgment unit, configured to judge whether the information after attribute expansion meets a preset condition, wherein, when the information after attribute expansion does not meet the preset condition, the information after attribute expansion is used for the same-entity judgment, and when the information after attribute expansion meets the preset condition, the information after attribute expansion is output.
2. The information processing device according to claim 1, wherein the information processing device further comprises:
a first completion unit, configured to perform attribute completion for the same entity according to the information obtained from the same-entity judgment, and to use the attribute-completed information for the linking, wherein the source and/or confidence of each attribute is indicated when the attribute completion is performed.
3. The information processing device according to claim 1, wherein the information processing device further comprises:
a translation unit, configured to translate multilingual entities and to use the translation results for the linking;
wherein the translation unit comprises:
a candidate acquiring unit, configured to obtain translation candidates for the entity by machine translation;
a retrieval unit, configured to retrieve, with a search engine, co-occurrence statistics of the entity and the translation candidates, thereby obtaining possible candidate pairs; and
a determining unit, configured to determine, from the similarity between other information in the retrieval results and the corresponding attributes of the entity, the confidence that each of the possible candidate pairs is a correct translation.
4. The information processing device according to claim 3, wherein the information processing device further comprises:
a second completion unit, configured to perform attribute completion for the same entity according to the results of translating the multilingual entities.
5. The information processing device according to claim 1, wherein the first judgment unit is configured to judge whether different variants point to the same entity, and whether the same entity is ambiguous and thus in fact represents different entities.
6. The information processing device according to claim 1, wherein the first judgment unit comprises:
an establishing unit, configured to compare the similarity between entities and to gather entities whose similarity is greater than a preset threshold, thereby establishing a candidate pool;
a grouping unit, configured to merge and distinguish the information in the candidate pool according to information in other columns related to the entities, and to use a clustering method to divide it into distinct candidate entity groups; and
a separating unit, configured to use a rule-based method to separate entities in a candidate entity group whose information contains mutually exclusive attributes.
7. The information processing device according to claim 1, wherein the link unit comprises:
a third judgment unit, configured to judge whether the entity is ambiguous;
a first search unit, configured to search for information in the external resource by exact matching and/or attribute expansion when the entity is not ambiguous; and
a second search unit, configured, when the entity is ambiguous, to distinguish candidates by means of other attributes related to the entity and the data in the external resource, and to search for information in the external resource.
8. The information processing device according to claim 1, wherein the first expansion unit comprises:
a first expansion module, configured to expand the attributes of the entities according to structured information in the external resource; and/or
a second expansion module, configured to extract structured information from unstructured information in the external resource, thereby expanding the attributes of the entities.
9. The information processing device according to claim 1, wherein the information processing device further comprises:
a fusion unit, configured to fuse identical attributes in the information of the at least two data sources, and to use the attribute-fused information for the same-entity judgment;
wherein the fusion unit comprises:
a fourth judgment unit, configured to judge preliminarily, from the distribution similarity of different fields in the at least two data sources, whether the different fields may point to the same attribute; and
a fifth judgment unit, configured to judge that the different fields point to the same attribute when the proportion of identical repeated instances in the different fields is greater than a preset ratio.
10. An information processing method for multiple data sources, the information processing method comprising:
Performing a same-entity judgment on information in at least two data sources;
Linking the entities in the at least two data sources to a preset external resource;
Expanding the attributes of the entities according to information in the external resource;
Judging whether the attribute-expanded information meets a preset condition; when the attribute-expanded information does not meet the preset condition, using the attribute-expanded information for the same-entity judgment; and when the attribute-expanded information meets the preset condition, outputting the attribute-expanded information.
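The iteration in claim 10 can be sketched as a loop that repeats the same-entity judgment, linking and expansion until a preset condition holds. The sketch makes several assumptions that the claim does not fix: records are plain dictionaries, two records are the same entity when they share any attribute value, the external resource is keyed by a `name` attribute, and the preset condition is that a full round changes nothing (with `max_rounds` as a safety cap). All names are hypothetical.

```python
def same_entity(a: dict, b: dict) -> bool:
    """Toy rule: two records match if they share any attribute value."""
    return any(key in b and b[key] == value for key, value in a.items())


def merge_same_entities(records: list) -> list:
    """Greedy single-pass merge of records judged to be the same entity."""
    merged = []
    for record in records:
        for group in merged:
            if same_entity(group, record):
                for key, value in record.items():
                    group.setdefault(key, value)
                break
        else:
            merged.append(dict(record))
    return merged


def link_and_expand(record: dict, resource: dict) -> dict:
    """Link by name to the external resource and add missing attributes."""
    expanded = dict(record)
    for key, value in resource.get(record.get("name"), {}).items():
        expanded.setdefault(key, value)
    return expanded


def integrate(records: list, resource: dict, max_rounds: int = 10) -> list:
    current = [dict(r) for r in records]
    for _ in range(max_rounds):
        merged = merge_same_entities(current)
        expanded = [link_and_expand(r, resource) for r in merged]
        if expanded == current:  # assumed preset condition: nothing changed
            return expanded
        current = expanded  # feed expanded information back into the judgment
    return current


# Invented toy data: the two records share nothing until expansion adds "ticker".
source_records = [{"name": "Acme Corp"}, {"ticker": "ACME", "city": "Osaka"}]
external = {"Acme Corp": {"ticker": "ACME"}}
result = integrate(source_records, external)
```

With this toy data the first round cannot merge the two records; expansion adds the `ticker` attribute to the first record, and the second round then recognises both as the same entity, which is the effect the iteration is meant to achieve.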
CN201410291263.2A 2014-06-25 2014-06-25 Multi-data source information processing device and method, and server Pending CN105335378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410291263.2A CN105335378A (en) 2014-06-25 2014-06-25 Multi-data source information processing device and method, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410291263.2A CN105335378A (en) 2014-06-25 2014-06-25 Multi-data source information processing device and method, and server

Publications (1)

Publication Number Publication Date
CN105335378A true CN105335378A (en) 2016-02-17

Family

ID=55285919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410291263.2A Pending CN105335378A (en) 2014-06-25 2014-06-25 Multi-data source information processing device and method, and server

Country Status (1)

Country Link
CN (1) CN105335378A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202041A (en) * 2016-07-01 2016-12-07 北京奇虎科技有限公司 A kind of method and apparatus of the entity alignment problem solved in knowledge mapping
CN107122358A (en) * 2016-02-24 2017-09-01 阿里巴巴集团控股有限公司 Mix querying method and equipment
CN107480130A (en) * 2017-07-25 2017-12-15 西北工业大学 The property value homogeneity decision method of relation data based on WEB information
CN107784058A (en) * 2017-04-11 2018-03-09 平安医疗健康管理股份有限公司 Drug data processing method and processing device
CN108182295A (en) * 2018-02-09 2018-06-19 重庆誉存大数据科技有限公司 A kind of Company Knowledge collection of illustrative plates attribute extraction method and system
CN109034199A (en) * 2018-06-25 2018-12-18 泰康保险集团股份有限公司 Data processing method and device, storage medium and electronic equipment
CN109960722A (en) * 2019-03-31 2019-07-02 联想(北京)有限公司 A kind of information processing method and device
CN111415749A (en) * 2020-03-12 2020-07-14 深圳中兴网信科技有限公司 Information processing method, information processing apparatus, and computer-readable storage medium
CN113160956A (en) * 2021-04-21 2021-07-23 复旦大学附属中山医院 Patient management method and system based on multi-identity data fusion
CN113157996B (en) * 2020-01-23 2022-09-16 久瓴(上海)智能科技有限公司 Document information processing method and device, computer equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975772A (en) * 2006-12-22 2007-06-06 中国建设银行股份有限公司 Method and device for integrating information in multi-system
CN101482876A (en) * 2008-12-11 2009-07-15 南京大学 Weight-based link multi-attribute entity recognition method
US20120066363A1 (en) * 2010-09-15 2012-03-15 Oracle International Corporation System and method for using a gridlink data source to connect an application server with a clustered database
CN102495892A (en) * 2011-12-09 2012-06-13 北京大学 Webpage information extraction method
CN103246685A (en) * 2012-02-14 2013-08-14 株式会社理光 Method and equipment for normalizing attributes of object instance into features

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122358A (en) * 2016-02-24 2017-09-01 阿里巴巴集团控股有限公司 Mix querying method and equipment
CN106202041B (en) * 2016-07-01 2019-07-09 北京奇虎科技有限公司 A kind of method and apparatus of entity alignment problem in solution knowledge mapping
CN106202041A (en) * 2016-07-01 2016-12-07 北京奇虎科技有限公司 A kind of method and apparatus of the entity alignment problem solved in knowledge mapping
CN107784058A (en) * 2017-04-11 2018-03-09 平安医疗健康管理股份有限公司 Drug data processing method and processing device
CN107784058B (en) * 2017-04-11 2020-11-13 平安医疗健康管理股份有限公司 Medicine data processing method and device
CN107480130A (en) * 2017-07-25 2017-12-15 西北工业大学 The property value homogeneity decision method of relation data based on WEB information
CN107480130B (en) * 2017-07-25 2020-09-08 西北工业大学 Method for judging attribute value identity of relational data based on WEB information
CN108182295B (en) * 2018-02-09 2021-09-10 重庆电信系统集成有限公司 Enterprise knowledge graph attribute extraction method and system
CN108182295A (en) * 2018-02-09 2018-06-19 重庆誉存大数据科技有限公司 A kind of Company Knowledge collection of illustrative plates attribute extraction method and system
CN109034199A (en) * 2018-06-25 2018-12-18 泰康保险集团股份有限公司 Data processing method and device, storage medium and electronic equipment
CN109034199B (en) * 2018-06-25 2022-02-01 泰康保险集团股份有限公司 Data processing method and device, storage medium and electronic equipment
CN109960722A (en) * 2019-03-31 2019-07-02 联想(北京)有限公司 A kind of information processing method and device
CN109960722B (en) * 2019-03-31 2021-10-22 联想(北京)有限公司 Information processing method and device
CN113157996B (en) * 2020-01-23 2022-09-16 久瓴(上海)智能科技有限公司 Document information processing method and device, computer equipment and readable storage medium
CN111415749A (en) * 2020-03-12 2020-07-14 深圳中兴网信科技有限公司 Information processing method, information processing apparatus, and computer-readable storage medium
CN113160956A (en) * 2021-04-21 2021-07-23 复旦大学附属中山医院 Patient management method and system based on multi-identity data fusion

Similar Documents

Publication Publication Date Title
CN105335378A (en) Multi-data source information processing device and method, and server
CN111723215B (en) Device and method for establishing biotechnological information knowledge graph based on text mining
AU2016277558B2 (en) Generating a semantic network based on semantic connections between subject-verb-object units
CN101918945B (en) Automatic expanded language search
CN107832229A (en) A kind of system testing case automatic generating method based on NLP
CN111460083B (en) Method and device for constructing document title tree, electronic equipment and storage medium
US10558754B2 (en) Method and system for automating training of named entity recognition in natural language processing
US10360294B2 (en) Methods and systems for efficient and accurate text extraction from unstructured documents
CN101432685B (en) For the method and system of extending database search inquiry
EP3851975A1 (en) Method and apparatus for generating text topics, and electronic device
CN104199831A (en) Information processing method and device
US9183223B2 (en) System for non-deterministic disambiguation and qualitative entity matching of geographical locale data for business entities
KR101509727B1 (en) Apparatus for creating alignment corpus based on unsupervised alignment and method thereof, and apparatus for performing morphological analysis of non-canonical text using the alignment corpus and method thereof
CN111159330A (en) Database query statement generation method and device
CN103942212A (en) User interface character detecting method and device
CN108170752B (en) Template-based metadata management method and system
Döhmen et al. Multi-hypothesis CSV parsing
CN104620241A (en) Multi-language document clustering
CN111753029A (en) Entity relationship extraction method and device
US9208134B2 (en) Methods and systems for tokenizing multilingual textual documents
CN113032371A (en) Database grammar analysis method and device and computer equipment
CN110309258B (en) Input checking method, server and computer readable storage medium
Asadi et al. Pattern-based extraction of addresses from web page content
CN109344254B (en) Address information classification method and device
CN113569974B (en) Programming statement error correction method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160217
