CN105630788B - Method and apparatus for determining an approximate judgement with distinctive facts


Publication number: CN105630788B
Authority: CN (China)
Application number: CN201410587566.9A
Other versions: CN105630788A
Other languages: Chinese (zh)
Inventors: 张碧川, 黄耀海
Current assignee: Canon Inc
Original assignee: Canon Inc
Legal status: Active
Prior art keywords: item, true, distinctiveness, document, judgement

Abstract

The present invention relates to a method and apparatus for determining an approximate judgement with distinctive facts. The method includes: obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predefined type; extracting the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item; obtaining a first group of similar documents using the first judgement item and the first fact items, and extracting, from the first group of similar documents, second judgement items that differ from the first judgement item and second fact items; and detecting at least one approximate judgement with distinctive facts by using the first group of similar documents, the second judgement items and the second fact items.

Description

Method and apparatus for determining an approximate judgement with distinctive facts
Technical field
The present invention relates to searching for similar documents and, in particular, to searching for previously created documents that are similar to a current input document.
Background art
Users often need to make judgements or decisions using documents at hand. For example, a doctor can give a diagnosis by referring to existing diagnosis reports, a traveller can use a brochure to choose where to go, and a customer can decide which product to buy by referring to product introductions. To support such a judgement, a user can search for similar documents using the current document and see which judgements or decisions were made in similar cases in the past.
For example, in similar-document search processing, the documents most similar to an input document can be determined and output as the result.
US2013/0044925 proposes a similar case retrieval apparatus and a similar case retrieval method. In the method of that patent application, a judgement item is a keyword of a predefined type, namely the core keyword about which the user wants to decide. A fact item is information of a specified type associated with the judgement item. For an application involving diagnosis reports, diagnosis items such as a disease result or illness result are selected as judgement items, and finding items are selected as fact items. In this method, a diagnostic tree created from the diagnosis items and finding items is used for searching.
Figure 1A shows a flowchart of the similar case retrieval method of the prior-art patent application US2013/0044925. Referring to Figure 1A, an input document is received in step 110. In step 120, the judgement items and fact items of the input document are extracted. In step 130, a group of similar documents is retrieved using the extracted judgement items and extracted fact items of the input document.
Figure 1B shows a flowchart of the processing in US2013/0044925 that retrieves a group of similar documents using the extracted judgement items and extracted fact items of the input document. Referring to Figure 1B, in step 131, the relationships between judgement items and fact items are extracted. Then, in step 132, judgement items and fact items are selected based on the extracted relationships to build a diagnostic tree. Finally, in step 133, similar documents are retrieved from a document database using the diagnostic tree.
Fig. 1C shows a schematic diagram of the diagnostic tree in the prior-art patent application US2013/0044925. With the method of US2013/0044925, documents similar to the input document can be retrieved from a document database using a diagnostic tree such as the one shown in Fig. 1C.
Patent US 8,352,416 proposes another method for searching for similar documents. This U.S. patent mainly concerns searching diagnosis reports, and it searches using a structure composed of diagnostic results and finding items. For example, symptoms and a disease that frequently occur together may form a structure. If an earlier document in the document database has the same structure as the input document, that document is likely to be retrieved.
Fig. 2A shows a flowchart of the similar-document search method of patent US 8,352,416. Referring to Fig. 2A, an input document is received in step 210. In step 220, the judgement items and fact items of the input document are extracted. In step 230, a group of similar documents is retrieved using the extracted judgement items and extracted fact items of the input document.
Fig. 2B shows a flowchart of the processing in patent US 8,352,416 that retrieves a group of similar documents using the extracted judgement items and extracted fact items of the input document. Referring to Fig. 2B, in step 231, the relationships between judgement items and fact items are extracted. Then, in step 232, judgement items and fact items that have a predetermined relationship type are selected as a structure. Finally, in step 233, similar documents are retrieved from the document database using this structure.
Fig. 2C shows a schematic diagram of the structure used in the prior-art US 8,352,416. The structure of Fig. 2C shows semantic primitives and their counts, where the semantic primitives include disease names and descriptions of symptoms taken from diagnoses. Based on these counts, keyword combinations that include a desired keyword can be extracted, and entries other than the desired keyword can be extracted from those combinations as related keywords. Diagnosis reports that include the desired keyword, the related keywords, or both can then be retrieved. With the method of US 8,352,416, documents similar to the input document can be retrieved from a document database.
In the similar-document search methods of US2013/0044925 and US 8,352,416 and in other prior-art methods, keywords are extracted from the input document, and the relationships between the keywords are then analysed to find similar documents that contain similar keywords with similar relationships. These prior-art methods simply present the retrieved documents as the result and do not take into account the user's real purpose in searching.
Searching with a similar document differs from searching with a query. If the user searches for documents with a query, the query can reflect the user's purpose and the aspects the user cares about. When the user searches with a document, however, he or she is still focused on only one aspect, and that aspect is the judgement item of the document.
With the prior-art methods, only a list of documents can be returned to the user. The results mainly contain the same judgement item as the input document, so similar documents with different judgement items cannot be provided to the user. If the user wants to compare judgement items, he or she needs to read many documents, which is time-consuming.
The judgement items in the search results retrieved by the prior-art methods are essentially the same as the judgement item in the input document. Returning documents with the same judgement item is necessary, but returning similar documents with different judgement items is more useful. For example, a doctor gives a diagnostic result in a report. It is useful to return reports that have very similar finding items but different diagnostic results, for example reports on patients with the same examination indicators and the same symptoms but a different disease. This gives the doctor an important signal that the diagnosis in this case should be made carefully.
Therefore, a new technique that solves at least one of the problems of the prior art is desired.
Summary of the invention
An object of the present invention is to provide valuable information that matches the user's actual search purpose.
Another object of the present invention is to save the time users spend reading documents by organising the search results.
According to one aspect of the present invention, there is provided a method for determining an approximate judgement with distinctive facts, comprising: a document obtaining step of obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predefined type; a document analysis step of extracting the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item; a similar document analysis step of obtaining a first group of similar documents using the first judgement item and the first fact items, and of extracting, from the first group of similar documents, second judgement items that differ from the first judgement item and second fact items; and a detecting step of detecting at least one approximate judgement with distinctive facts by using the first group of similar documents, the second judgement items and the second fact items, wherein: a distinctive fact indicates a difference between the first judgement item and a second judgement item; and the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgement item from the second judgement item.
According to another aspect of the present invention, there is provided a method for similar-document search, comprising: receiving an input document; determining at least one approximate judgement with distinctive facts for the input document based on the above method for determining an approximate judgement with distinctive facts; and obtaining a group of similar documents for the input document using the at least one approximate judgement with distinctive facts.
According to a further aspect of the present invention, there is provided an apparatus for determining an approximate judgement with distinctive facts, comprising: a document obtaining unit for obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predefined type; a judgement item and fact item extraction unit for extracting judgement items and fact items; a document analysis unit for extracting, using the judgement item and fact item extraction unit, the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item; a similar document analysis unit for obtaining a first group of similar documents using the first judgement item and the first fact items, and for extracting, using the judgement item and fact item extraction unit, second judgement items that differ from the first judgement item and second fact items from the first group of similar documents; and a detection unit for detecting at least one approximate judgement with distinctive facts by using the first group of similar documents, the second judgement items and the second fact items, wherein: a distinctive fact indicates a difference between the first judgement item and a second judgement item; and the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgement item from the second judgement item.
According to a further aspect of the present invention, there is provided an apparatus for similar-document search, comprising: an input document receiving unit for receiving an input document; the above apparatus for determining an approximate judgement with distinctive facts, for determining at least one approximate judgement with distinctive facts for the input document; and a similar document obtaining unit for obtaining a group of similar documents for the input document using the at least one approximate judgement with distinctive facts.
One advantage of the present invention is that valuable information matching the user's actual search purpose can be provided.
A further advantage is that the search results can be organised, which saves the time users spend reading documents.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the drawings.
Brief description of the drawings
The drawings, which are incorporated in and form part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
Figure 1A shows a flowchart of the similar case retrieval method of the prior-art US2013/0044925.
Figure 1B shows a flowchart of the processing in US2013/0044925 that retrieves a group of similar documents using the extracted judgement items and extracted fact items of the input document.
Fig. 1C shows a schematic diagram of the diagnostic tree in the prior-art US2013/0044925.
Fig. 2A shows a flowchart of the similar-document search method of patent US 8,352,416.
Fig. 2B shows a flowchart of the processing in patent US 8,352,416 that retrieves a group of similar documents using the extracted judgement items and extracted fact items of the input document.
Fig. 2C shows a schematic diagram of the structure used in patent US 8,352,416.
Fig. 3 is a schematic block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the present invention can be implemented.
Fig. 4 shows a flowchart of processing for determining an approximate judgement with distinctive facts according to an embodiment of the present invention.
Fig. 5 shows an example of a radiography report.
Fig. 6 shows a flowchart of processing for extracting approximate judgements with distinctive facts by traversing fact items according to an embodiment of the present invention.
Fig. 7 shows a flowchart of processing for extracting original-judgement distinctive facts for each second judgement item according to an embodiment of the present invention.
Fig. 8 shows a flowchart of processing for extracting new-judgement distinctive facts for each second judgement item according to an embodiment of the present invention.
Fig. 9 shows a flowchart of processing for extracting approximate judgements with distinctive facts based on the minimum change distance according to an embodiment of the present invention.
Figure 10 shows a flowchart of processing for extracting approximate judgements with distinctive facts using important paths according to an embodiment of the present invention.
Figure 11 shows a schematic diagram of an example of generating candidate approximate judgements with distinctive facts.
Figure 12 shows a schematic diagram of an example of important path mining.
Figure 13 shows a flowchart of processing for extracting approximate judgements with distinctive facts by changing fact items according to an embodiment of the present invention.
Figure 14 shows a flowchart of processing for extracting approximate judgements with distinctive facts using a change tree according to an embodiment of the present invention.
Figure 15 shows a schematic diagram of an example of a change tree.
Figure 16 shows a flowchart of a method for similar-document search according to an embodiment of the present invention.
Figure 17 shows a functional block diagram of an apparatus 4000 for determining an approximate judgement with distinctive facts according to an embodiment of the present invention.
Figure 18 shows a functional block diagram of an apparatus 5000 for similar-document search according to an embodiment of the present invention.
Detailed description of the embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
Techniques, methods and apparatus known to a person of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 3 is a schematic block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the present invention can be implemented. The method of the present invention can be implemented on the hardware of the computer system 1000.
As shown in Fig. 3, the computer system includes a computer 1110. The computer 1110 includes a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190 and a peripheral interface 1195, which are connected via a system bus 1121.
The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136 and some program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146 and some program data 1147.
Removable non-volatile memories, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 may be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 may include a memory 1181, such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The peripheral interface 1195 is connected to a printer 1196 and a loudspeaker 1197.
The computer system shown in Fig. 3 is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
The computer system shown in Fig. 3 can be incorporated in any embodiment, either as a stand-alone computer or as a processing system in a device; one or more unnecessary components can be removed from it, and one or more additional components can be added to it.
Fig. 4 shows a flowchart of processing for determining an approximate judgement with distinctive facts according to an embodiment of the present invention.
As shown in Fig. 4, an input document is obtained in step 2100. In this application, the types of input document may include, but are not limited to, radiography reports, brochures and product introductions.
Here, a radiography report is taken as an example. Fig. 5 shows an example of a radiography report.
The document may contain keywords that can be classified as judgement items and fact items. The keywords in the input document may be referred to as the first judgement item and the first fact items. In one embodiment, a judgement item can be a keyword of a predefined type.
Next, in step 2200, the first judgement item and the first fact items are extracted from the obtained document.
Several methods exist for extracting keywords from a document using existing NLP techniques, such as named entity recognition, topic extraction and keyword extraction. After keywords have been extracted from the input document, it is important to identify which keywords are judgement items.
For a radiography report, document section information can be used to select the judgement items. For example, in a radiography report, the keywords in the "diagnosis" section can be selected as judgement items; this section information is the judgement item domain information. The judgement item domain can be, for example, diagnostic result, product type or destination.
Alternatively and/or additionally, judgement items can be selected according to a predetermined configuration or rule. In one embodiment, selecting keywords as judgement items may further include selecting keywords in a sentence according to a predetermined configuration, where the sentence expresses a subjective judgement and/or an objective result. For example, if the context of a keyword conveys a subjective judgement, the keyword can be selected as a judgement item.
Alternatively and/or additionally, the user can define which keywords are judgement items. For example, before searching, a doctor can choose certain keywords to be highlighted, select the diseases among these keywords as judgement items, and select the other information or symptoms (findings) as fact items. Each fact item is information associated with a judgement item.
In one embodiment, extracting fact items and judgement items from a document may further include: extracting keywords from the document; and identifying the judgement items among the extracted keywords and selecting the remaining keywords as fact items.
In one embodiment, keywords can be extracted from the document by at least one of the following operations: using a stored dictionary that includes judgement items and fact items; using document layout information; and using an extraction model trained on prepared training data.
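The section-based selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Keyword` class, the `split_items` function and the section names are assumptions introduced here, and the keyword "cough: present" is made up for the example. Keywords are assumed to have been extracted already, each tagged with the report section it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    text: str     # keyword as extracted, e.g. "tubercle: irregular"
    section: str  # report section in which the keyword was found


def split_items(keywords, judgement_section="diagnosis"):
    """Keywords from the judgement section become judgement items;
    all remaining keywords become fact items."""
    judgement_items = [k.text for k in keywords if k.section == judgement_section]
    fact_items = [k.text for k in keywords if k.section != judgement_section]
    return judgement_items, fact_items


# Hypothetical keywords of one radiography report.
report_keywords = [
    Keyword("lung cancer", "diagnosis"),
    Keyword("tubercle: irregular", "findings"),
    Keyword("cough: present", "findings"),
]
judgement_items, fact_items = split_items(report_keywords)
print(judgement_items)
print(fact_items)
```

A rule-based or user-defined selection, as in the alternatives above, would only change how the judgement keywords are picked; the remaining keywords would still be treated as fact items.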
Table 1 shows an example of the judgement items and fact items of a radiography report.
Table 1: example of the judgement items and fact items of a radiography report
Next, in step 2300, a first group of similar documents is obtained using the first judgement item and the first fact items, and second judgement items that differ from the first judgement item and second fact items are extracted from the first group of similar documents.
Many methods are known in the prior art for obtaining the first group of similar documents using the first judgement item and the first fact items. In one embodiment, the first group of similar documents can be input directly by the user. Alternatively, the first group of similar documents can be obtained by retrieval, for example using the method of U.S. Patent No. US 8,352,416. For the example of Table 1, the first group of similar documents consists of the 143 similar documents shown in Table 2, in which the documents are labelled by judgement item to show the distribution of judgements across these documents. In addition, judgement items and fact items are extracted from the first group of similar documents using the same method as in step 2200. The keywords in the first group of similar documents may be referred to as the second judgement items and the second fact items.
Table 2: example of the first group of similar documents
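The distribution of judgements across a group of similar documents can be computed with a simple count. The sketch below assumes each retrieved document has already been labelled with its judgement item; the function name `judgement_distribution` and the ten labels are hypothetical and do not reproduce the contents of Table 2.

```python
from collections import Counter


def judgement_distribution(labels):
    """Return the share of documents carrying each judgement item."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: n / total for item, n in counts.items()}


# Hypothetical labels for a tiny group of ten similar documents.
labels = ["lung cancer"] * 7 + ["bronchiectasis"] * 2 + ["emphysema"]
dist = judgement_distribution(labels)
for item, share in dist.items():
    print(f"{item}: {share:.0%}")
```

The shares of the judgement items that differ from the first judgement item are the quantities used later when the sensitivity of a fact item is calculated.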
Next, in step 2400, at least one approximate judgement with distinctive facts is detected by using the first group of similar documents, the second judgement items and the second fact items, wherein: a distinctive fact indicates a difference between the first judgement item and a second judgement item; the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgement item from the second judgement item. Note that the predetermined first threshold can be defined by the user based on experience.
The key point of the present invention is to find approximate judgements with distinctive facts. However, the real approximate judgements and distinctive facts may not coincide with the results obtained by the present invention. The present invention does not aim to find the real approximate judgements, because obtaining them requires very deep domain knowledge and is extremely difficult even for people. For example, it is hard for a doctor to determine the core distinguishing symptoms that separate two similar diseases, since those symptoms depend on the patient's age, sex, location and medical history.
In the present invention, only document analysis techniques are used to detect the approximate judgements and distinctive facts of the current input document from the first group of similar documents. In this case, it is assumed that the patient's age, sex, location and medical history are implied in the document in the form of keywords.
Next, the detailed processing of detecting at least one approximate judgement with distinctive facts by using the first group of similar documents, the second judgement items and the second fact items will be described.
According to one aspect of the present invention, approximate judgements with distinctive facts can be detected by traversing the fact items. In this process, each fact item of the input document is checked to identify which fact items are distinctive facts.
Fig. 6 shows a flowchart of processing for extracting approximate judgements with distinctive facts by traversing fact items according to an embodiment of the present invention.
Referring to Fig. 6, in step 2410, original-judgement distinctive facts are extracted for each second judgement item.
Fig. 7 shows a flowchart of processing for extracting original-judgement distinctive facts for each second judgement item according to an embodiment of the present invention.
As shown in Fig. 7, in step 2411, one of the first fact items can be selected as the target fact item. Next, in step 2412, the sensitivity of the target fact item is calculated.
In one embodiment, calculating the sensitivity of the target fact item may include: obtaining a second group of similar documents by using the first fact items with the target fact item deleted; extracting, from the second group of similar documents, third judgement items that differ from the first judgement item and third fact items; and calculating the sensitivity by using the distribution of the third judgement items in the second group of similar documents and the distribution of the second judgement items in the first group of similar documents.
For example, for the example of Table 1, the fact item "tubercle: irregular" can be deleted from the first fact items. Searching with the remaining fact items then yields 178 documents. Compared with the first group of 143 similar documents, there are 35 additional results, which are defined as the second group of similar documents. Table 3 shows an example of the second (additional) group of similar documents. The documents in Table 3 are labelled by third judgement item to show the distribution of judgements across these documents.
Table 3: example of the second group of similar documents
If the deleted fact is a distinctive fact of "lung cancer" (the first judgement), the additional results will contain other diagnostic results; otherwise, the additional results will still contain the judgement "lung cancer". To check whether the deleted fact is a distinctive fact, the 35 additional results (the second group of similar documents) are used.
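How the second group arises from deleting one fact item can be sketched with a toy retrieval function. The patent does not prescribe a retrieval method, so the matching rule used here (a document is returned when it contains every query fact) is an assumption made only for illustration, as are the names `retrieve` and `second_group`, the three-document database and the fact "cough: present".

```python
def retrieve(database, query_facts):
    """Toy retrieval (an assumption): ids of the documents that
    contain every fact item in the query."""
    return {doc_id for doc_id, facts in database.items() if query_facts <= facts}


def second_group(database, first_facts, target_fact):
    """Documents that are found only after the target fact item is deleted."""
    first_group = retrieve(database, first_facts)
    relaxed_group = retrieve(database, first_facts - {target_fact})
    return relaxed_group - first_group


# Hypothetical database: document id -> set of fact items.
database = {
    "d1": {"tubercle: irregular", "cough: present"},
    "d2": {"cough: present"},
    "d3": {"cough: present", "pleural effusion: present"},
}
first_facts = {"tubercle: irregular", "cough: present"}
extra = second_group(database, first_facts, "tubercle: irregular")
print(sorted(extra))
```

In the example of the text, the same difference is 178 - 143 = 35 additional documents.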
The sensitivity of the fact item "tubercle: irregular" relative to "lung cancer" can be calculated as follows.
Sensitivity = (distribution of the third judgement items in the second group of similar documents) / (distribution of the second judgement items in the first group of similar documents)
For example, the 143 results contain three types of diagnostic result that differ from the first judgement item (that is, the second judgement items: bronchiectasis, lung abscess and emphysema), and the 35 results contain two types of diagnostic result that differ from the first judgement item (that is, the third judgement items: bronchiectasis and lung abscess).
Sensitivity = (60% + 35%) / (15% + 5% + 10%) = 95% / 30% ≈ 3.17
Referring back to Fig. 7, in step 2413, if the calculated sensitivity is equal to or greater than a predetermined second threshold, the target fact item can be selected as an original-judgement distinctive fact. Note that the user can define this threshold based on experience.
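The sensitivity (susceptibility) calculation and the threshold test of step 2413 can be reproduced in a few lines. In this sketch the percentages are the ones quoted in the text, but which percentage belongs to which disease is an assumption, since Tables 2 and 3 are not reproduced here; the function name `sensitivity` and the threshold value 2.0 are also hypothetical.

```python
def sensitivity(third_shares, second_shares):
    """Summed share of differing judgements in the second group divided by
    the summed share of differing judgements in the first group."""
    numerator = sum(third_shares.values())
    denominator = sum(second_shares.values())
    if denominator == 0:
        raise ValueError("the first group contains no differing judgement items")
    return numerator / denominator


# Shares of judgement items other than "lung cancer".
# The assignment of each percentage to a disease is assumed.
second_shares = {"bronchiectasis": 0.15, "lung abscess": 0.05, "emphysema": 0.10}
third_shares = {"bronchiectasis": 0.60, "lung abscess": 0.35}

SECOND_THRESHOLD = 2.0  # hypothetical; the text leaves this value to the user

score = sensitivity(third_shares, second_shares)
is_distinctive = score >= SECOND_THRESHOLD
print(round(score, 2), is_distinctive)
```

With this hypothetical threshold, "tubercle: irregular" would be kept as an original-judgement distinctive fact; a stricter threshold could reject it.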
Referring back to Fig. 6, if an original-judgement distinctive fact is detected, the second group of similar documents is examined further to determine whether other approximate judgements exist. This is done in step 2420, in which new-judgement distinctive facts are extracted for each second judgement item.
Fig. 8 shows a flowchart of processing for extracting new-judgement distinctive facts for each second judgement item according to an embodiment of the present invention.
As shown in figure 8, judging that item is similar at second group with corresponding third by using third fact item in step 2421 Appearance ratio in document calculates the correlation of each third fact item.
For example, for the third judgment item "lung abscess", the third fact items can be extracted, and one third fact item that is not contained in the input document is "pleural effusion: existing". It must be checked whether this fact item is highly correlated with the judgment item "lung abscess". In one embodiment, a correlation is calculated.
For example, 12 documents in the second group of similar documents relate to "lung abscess", and 11 of them contain the fact item "pleural effusion: existing", so the correlation = 11/12.
Next, in step 2422, if the correlation of a third fact item is equal to or greater than a predetermined third threshold, that third fact item is selected as a new-judgment distinguishing fact.
In the above example, because the correlation of "pleural effusion: existing" is greater than the predetermined threshold (for example, 80%, which the user can define based on experience), the fact item "pleural effusion: existing" is selected as a new-judgment distinguishing fact.
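The correlation check of steps 2421 and 2422 might be sketched as follows. The document layout (a dict with `judgment` and `facts` keys), the function name `fact_correlation`, and the filler documents labeled "other" are assumptions for illustration; only the 11-of-12 count and the 80% threshold come from the text.

```python
def fact_correlation(documents, judgment, fact):
    """Share of the documents with the given judgment item that contain the fact item."""
    with_judgment = [doc for doc in documents if doc["judgment"] == judgment]
    if not with_judgment:
        return 0.0
    hits = sum(1 for doc in with_judgment if fact in doc["facts"])
    return hits / len(with_judgment)


# Second group of similar documents: 12 relate to lung abscess, 11 of them
# contain the fact item. The remaining 23 documents are hypothetical filler.
second_group = (
    [{"judgment": "lung abscess", "facts": {"pleural effusion: existing"}}] * 11
    + [{"judgment": "lung abscess", "facts": set()}]
    + [{"judgment": "other", "facts": set()}] * 23
)

THIRD_THRESHOLD = 0.8  # the 80% example threshold from the text

corr = fact_correlation(second_group, "lung abscess", "pleural effusion: existing")
is_new_distinguishing_fact = corr >= THIRD_THRESHOLD
print(f"correlation = {corr:.3f}, selected: {is_new_distinguishing_fact}")
```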
Next, referring back to Fig. 6, in step 2430, the extracted original-judgment distinguishing facts and new-judgment distinguishing facts can be used to calculate the change distance between each second judgment item and the first judgment item.
In the above example, the first judgment item is "lung cancer" and one of the second judgment items is "lung abscess"; the change distance can be calculated as the sum of the number of original-judgment distinguishing facts and the number of new-judgment distinguishing facts.
Next, in step 2440, the second judgment items whose change distance is less than a predetermined first threshold can be used to generate approximate judgments with distinguishing facts.
Alternatively, a predetermined number of approximate judgments with distinguishing facts that have the smallest change distances can be generated.
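Steps 2430 and 2440 can be illustrated with a short sketch. The names `change_distance` and `select_approximate`, the candidate table, and the threshold value 3 are hypothetical; the text only specifies that the distance is the sum of the two fact counts and that selection uses either a threshold or a fixed number of closest candidates.

```python
def change_distance(original_facts, new_facts):
    """Number of original-judgment plus new-judgment distinguishing facts."""
    return len(original_facts) + len(new_facts)


def select_approximate(candidates, first_threshold=None, top_k=None):
    """Keep judgments below the threshold and/or the top_k smallest distances."""
    scored = sorted(
        (change_distance(original, new), judgment)
        for judgment, (original, new) in candidates.items()
    )
    if first_threshold is not None:
        scored = [(d, j) for d, j in scored if d < first_threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [(judgment, distance) for distance, judgment in scored]


# judgment item -> (original-judgment facts, new-judgment facts)
candidates = {
    "lung abscess": (["tubercle: irregular"], ["pleural effusion: existing"]),
    "bronchiectasis": (["tubercle: irregular"], []),
}

by_threshold = select_approximate(candidates, first_threshold=3)
closest = select_approximate(candidates, top_k=1)
print(by_threshold, closest)
```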
In addition, if there is no new-judgment distinguishing fact, the "removed diversity" result of the second group of similar documents can be used to detect approximate judgments. "Removed diversity" reflects the fact that the judgments of the similar documents are diverse. Adding certain distinguishing facts reduces the diversity of the judgments, and the judgments that are removed as a result are the approximate judgments. The removed diversity can simply be set to the most frequent diagnosis among the different diagnoses. For example, if 60% of the second group of similar documents relate to "bronchiectasis", which is higher than a user-defined threshold, the judgment "bronchiectasis" can be determined to be an approximate judgment, and the fact item "tubercle: irregular" can be determined to be its distinguishing fact.
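One simple reading of this fallback is a majority check over the judgments in the second group. The sketch below assumes that reading; the function name `dominant_judgment`, the document counts, and the 50% threshold are illustrative assumptions, not values given in the text.

```python
from collections import Counter


def dominant_judgment(judgments, threshold):
    """Return the most frequent judgment if its share exceeds the threshold."""
    if not judgments:
        return None
    label, count = Counter(judgments).most_common(1)[0]
    return label if count / len(judgments) > threshold else None


# Hypothetical second group of 35 documents, 21 of which (60%) are bronchiectasis.
second_group_judgments = (
    ["bronchiectasis"] * 21 + ["lung abscess"] * 12 + ["lung cancer"] * 2
)

approximate = dominant_judgment(second_group_judgments, threshold=0.5)
print(approximate)
```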
In the above example, two approximate judgments with distinguishing facts can be found:
"lung abscess": "tubercle: irregular", "pleural effusion: existing".
"bronchiectasis": "tubercle: irregular".
According to another aspect of the present invention, approximate judgments with distinguishing facts are extracted based on the minimum change distance. In this process, each judgment item of the similar documents is checked to identify which judgment items are approximate judgments.
Fig. 9 is a flowchart of processing for extracting approximate judgments with distinguishing facts based on the minimum change distance according to an embodiment of the present invention.
As shown in Fig. 9, in step 2510, the fact distance of each document in the first group of similar documents is calculated as the distance between that document and the input document, where this distance is obtained by counting the fact items that differ between the two documents.
For example, the first group of similar documents contains 100 similar documents with different diagnostic results (second judgment items): 20 bronchiectasis documents, 35 lung abscess documents, 15 pulmonary emphysema documents, 20 pulmonary tuberculosis documents, and 10 pneumonia documents. In one embodiment, the fact distance of each document is calculated by counting how many of its facts differ from the first fact items.
For example, the first pulmonary emphysema document has 4 fact items that differ from the first fact items; the second pulmonary emphysema document has 2; the third pulmonary emphysema document has 3; and so on.
Next, in step 2520, the fact distances calculated for the documents in the first group of similar documents are used to calculate the change distance between each second judgment item and the first judgment item, which serves as the judgment item distance of that second judgment item. The change distance is calculated by averaging the fact distances of the documents in the first group of similar documents that have that second judgment item.
The judgment item distance can be calculated by averaging the fact distances of all documents that have a given judgment item. For example, there are 20 similar documents for pulmonary tuberculosis, and the fact distance of each document is calculated in step 2510; the judgment item distance of pulmonary tuberculosis is then the sum of these fact distances divided by 20.
In the same way, the judgment item distances for the above example are calculated in step 2520 as follows:
Lung abscess: 1.87
Pulmonary emphysema: 2.48
Pulmonary tuberculosis: 2.68
In this example, lung abscess is the judgment item most similar to that of the input document.
Next, in step 2530, if the judgment item distance of a second judgment item is equal to or less than a predetermined fourth threshold, that second judgment item can be selected as an approximate judgment. Note that the user can define this threshold based on experience.
In the above example, the fourth threshold can be defined as 2. Because the judgment item distance of lung abscess is less than this threshold, lung abscess is selected as an approximate judgment.
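Steps 2510 to 2530 can be sketched as follows. The function names and the four sample documents are hypothetical, and counting differing fact items as a symmetric set difference (facts deleted plus facts added) is an assumption about how the count is taken.

```python
from collections import defaultdict


def fact_distance(input_facts, doc_facts):
    """Count the fact items present in exactly one of the two documents."""
    return len(set(input_facts) ^ set(doc_facts))


def judgment_item_distances(input_facts, documents):
    """Average the fact distances of the documents sharing each judgment item."""
    buckets = defaultdict(list)
    for judgment, facts in documents:
        buckets[judgment].append(fact_distance(input_facts, facts))
    return {judgment: sum(ds) / len(ds) for judgment, ds in buckets.items()}


def approximate_judgments(distances, fourth_threshold):
    """Select judgment items whose distance does not exceed the threshold."""
    return sorted(j for j, d in distances.items() if d <= fourth_threshold)


input_facts = {"tubercle: irregular", "lymph node: enlargement", "shade: existing"}
similar_documents = [
    ("lung abscess", {"lymph node: enlargement", "shade: existing",
                      "pleural effusion: existing"}),            # distance 2
    ("lung abscess", input_facts | {"pleural effusion: existing"}),  # distance 1
    ("pulmonary emphysema", {"shade: existing"}),                # distance 2
    ("pulmonary emphysema", {"cough: existing"}),                # distance 4
]

distances = judgment_item_distances(input_facts, similar_documents)
selected = approximate_judgments(distances, fourth_threshold=2)
print(distances, selected)
```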
Next, in step 2540, the distinguishing facts of the approximate judgment can be extracted by identifying the fact items that differ between the first fact items and the facts of the approximate judgment.
For example, among the 35 lung abscess documents, 30 documents do not include "tubercle: irregular", which is one of the first facts, and 29 documents include "pleural effusion: existing", which is not one of the first facts. Therefore, the fact items "tubercle: irregular" and "pleural effusion: existing" are identified as the distinguishing facts of lung abscess.
It can thus be seen that deleting the fact item "tubercle: irregular" and adding the fact item "pleural effusion: existing" changes the judgment item from lung cancer to lung abscess, which can be written as:
(<tubercle: irregular> → <pleural effusion: existing>) → (lung cancer → lung abscess); distance = 2
Because the fact item "tubercle: irregular" disappears, the change distance in this respect is counted as 1. In addition, a new fact item "pleural effusion: existing" appears, so the change distance in this respect is also counted as 1. The total change distance is therefore 2.
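The counting in step 2540 could look like the sketch below. The text does not say how many documents must agree before a fact item counts as distinguishing, so the `min_ratio` of 0.8 is purely an assumption, as are the function name and the way the 35 sample documents are constructed.

```python
def distinguishing_facts(input_facts, judgment_docs, min_ratio=0.8):
    """Find facts that mostly disappear from, or mostly appear in, the documents."""
    n = len(judgment_docs)
    if n == 0:
        return [], [], 0
    removed = sorted(
        f for f in input_facts
        if sum(f not in doc for doc in judgment_docs) / n >= min_ratio
    )
    new_candidates = set().union(*judgment_docs) - set(input_facts)
    added = sorted(
        f for f in new_candidates
        if sum(f in doc for doc in judgment_docs) / n >= min_ratio
    )
    return removed, added, len(removed) + len(added)


input_facts = {"tubercle: irregular", "shade: existing"}

# 35 lung abscess documents: 30 lack "tubercle: irregular",
# 29 contain "pleural effusion: existing" (counts from the text).
abscess_docs = []
for i in range(35):
    facts = {"shade: existing"}
    if i < 5:
        facts.add("tubercle: irregular")
    if i >= 6:
        facts.add("pleural effusion: existing")
    abscess_docs.append(facts)

removed, added, distance = distinguishing_facts(input_facts, abscess_docs)
print(removed, added, distance)
```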
According to another aspect of the present invention, important path mining can be used to extract approximate judgments with distinguishing facts.
Fig. 10 is a flowchart of processing for extracting approximate judgments with distinguishing facts using important path mining according to an embodiment of the present invention.
As shown in Fig. 10, in step 2610, a candidate approximate judgment with distinguishing facts can be generated for each document in the first group of similar documents by identifying the fact items that differ between the first fact items and the facts of that document.
For each document whose judgment item differs from that of the input document, the differing judgment item is initially assumed to be a candidate approximate judgment, and all differing fact items are assumed to be distinguishing facts. A candidate approximate judgment with distinguishing facts is then generated.
Fig. 11 is a schematic diagram of an example of generating candidate approximate judgments with distinguishing facts. In this example, the fact items (findings) of the input document include "age: 50", "tubercle: irregular", "lymph node: enlargement", "gender: women", and "shade: existing", and the judgment item (diagnostic result) is "lung cancer".
For the input document, 100 similar documents can be obtained (note that these 100 similar documents are obtained by a different method, so they are unrelated to the 143 documents above), and the judgment items of 70 of them differ from "lung cancer". Of these 70 similar documents, 20% have the judgment item bronchiectasis, 35% lung abscess, 15% pulmonary emphysema, 20% pulmonary tuberculosis, and 10% pneumonia. The relationship between the input document and a similar document with a different judgment item can be written as "(finding <shade: existing> → 0) → (lung cancer → bronchiectasis); distance = 1". This means that deleting the fact item "shade: existing" changes the judgment item from "lung cancer" to "bronchiectasis", and that the fact distance between the input document and the similar document is 1.
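Candidate generation in step 2610 amounts to two set differences per similar document. The following is a minimal sketch; the dict layout and the function name `candidate` are assumptions, and the similar document shown is constructed to match the "distance = 1" relationship quoted above.

```python
def candidate(input_doc, similar_doc):
    """Build a candidate approximate judgment, or None if the judgments agree."""
    if similar_doc["judgment"] == input_doc["judgment"]:
        return None
    deleted = sorted(input_doc["facts"] - similar_doc["facts"])
    added = sorted(similar_doc["facts"] - input_doc["facts"])
    return {
        "deleted": deleted,
        "added": added,
        "change": (input_doc["judgment"], similar_doc["judgment"]),
        "distance": len(deleted) + len(added),
    }


input_doc = {
    "judgment": "lung cancer",
    "facts": {"age: 50", "tubercle: irregular", "lymph node: enlargement",
              "gender: women", "shade: existing"},
}
similar_doc = {
    "judgment": "bronchiectasis",
    "facts": {"age: 50", "tubercle: irregular", "lymph node: enlargement",
              "gender: women"},
}

result = candidate(input_doc, similar_doc)
print(result)
```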
Next, important path mining is used to extract approximate judgments with distinguishing facts from the candidate approximate judgments with distinguishing facts. The detailed steps are as follows.
In step 2620, a transfer graph can be generated, in which each end node is a judgment item and each non-end node is a fact item.
Next, in step 2630, all candidate approximate judgments with distinguishing facts can be arranged in the transfer graph, where each path connecting two end nodes in the transfer graph represents one candidate approximate judgment with distinguishing facts. In other words, if two nodes are included in the same candidate approximate judgment with distinguishing facts, an edge can be drawn between them, which produces a path connecting two judgment item nodes in the transfer graph.
Next, in step 2640, the importance of each edge in the transfer graph can be calculated by recording the connection frequency of each edge connecting any two nodes in the transfer graph.
In step 2650, important edges, whose importance is equal to or greater than a predetermined fifth threshold, are identified. In other words, if the importance of an edge reaches the predetermined threshold, the edge is identified as important. Note that the user can determine the predetermined fifth threshold based on experience.
Next, in step 2660, at least one distinguishing path can be generated, where the distinguishing path consists of important edges and connects a second judgment item to the first judgment item.
Fig. 12 is a schematic diagram of an example of important path mining. As shown in Fig. 12, the end nodes in the transfer graph are "lung cancer" and "pulmonary tuberculosis", which are judgment items. The non-end nodes in the transfer graph include "shade: existing", "pleural effusion: existing", "lymph node: enlargement", and "tubercle: irregular", which are fact items. If two nodes are included in the same candidate approximate judgment with distinguishing facts, an edge is drawn between them. Important edges are marked with thick lines. The distinguishing path runs from "lung cancer" to "lymph node: enlargement" to "shade: existing" to "pulmonary tuberculosis".
Finally, in step 2670, each distinguishing path is translated into an approximate judgment with distinguishing facts.
In the above example, the important path can be translated into:
(finding <lymph node: enlargement> → <shade: existing>) → (lung cancer → pulmonary tuberculosis); distance = 2
This means that deleting the fact item "lymph node: enlargement" and adding the fact item "shade: existing" changes the judgment item from lung cancer to pulmonary tuberculosis. As described above, the change distance is 2.
Therefore, approximate judgments with distinguishing facts can be extracted by the processing shown in Fig. 10.
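Steps 2620 to 2660 can be sketched with an edge counter and a breadth-first search. Several choices here are assumptions rather than details from the text: each candidate is represented as an ordered node list, edges are treated as directed along that order, and the fifth threshold is an absolute count. The sample paths are hypothetical.

```python
from collections import Counter, deque


def edge_counts(paths):
    """Record how often each edge occurs across all candidate paths."""
    counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
    return counts


def distinguishing_path(paths, start, goal, fifth_threshold):
    """Search for a path from start to goal that uses only important edges."""
    important = {e for e, c in edge_counts(paths).items() if c >= fifth_threshold}
    adjacency = {}
    for a, b in important:
        adjacency.setdefault(a, []).append(b)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(adjacency.get(path[-1], [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


frequent = ["lung cancer", "lymph node: enlargement", "shade: existing",
            "pulmonary tuberculosis"]
rare = ["lung cancer", "tubercle: irregular", "pleural effusion: existing",
        "pulmonary tuberculosis"]
candidate_paths = [frequent, frequent, frequent, rare]

found = distinguishing_path(candidate_paths, "lung cancer",
                            "pulmonary tuberculosis", fifth_threshold=2)
print(found)
```

The number of fact nodes on the returned path (here two) corresponds to the change distance of the resulting approximate judgment.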
According to another aspect of the present invention, approximate judgments with distinguishing facts can be extracted by changing fact items. In this process, each fact item that differs between the input document and the similar documents is checked to identify which fact items are distinguishing facts.
Fig. 13 is a flowchart of processing for extracting approximate judgments with distinguishing facts by changing fact items according to an embodiment of the present invention.
As shown in Fig. 13, in step 2710, candidate distinguishing facts can be generated. Generating candidate distinguishing facts may include: specifying candidate original-judgment distinguishing facts using first fact items that differ from the second fact items; and specifying candidate new-judgment distinguishing facts using second fact items that differ from the first fact items, where the sum of the number of candidate original-judgment distinguishing facts and the number of candidate new-judgment distinguishing facts equals a predetermined number (that is, a predetermined change distance).
For example, a traveler may want to use the current travel guide for Tokyo Tower to search for similar travel guides. Each travel guide includes certain features of a destination, which may be called the user's items of interest, and the destination is the place the user wants to compare. Therefore, the destination can be treated as the judgment item, and the user's items of interest can be treated as fact items.
In this step, many similar travel guides can be retrieved that describe information about destinations, such as price, time required for travel, travel mode, architectural style, and so on.
For each destination, candidate distinguishing facts can be generated.
For example, the current destination is Tokyo Tower, and Senso-ji Temple is the destination of interest. The fact items that differ between the two destinations may include:
Tokyo Tower: <price: 200> <building: modern>
Senso-ji Temple: <price: 100> <building: religion>
Candidate distinguishing facts can therefore be generated.
Next, in step 2720, the candidate distinguishing facts can be verified against the first group of similar documents. Verifying the candidate distinguishing facts may include: identifying the documents in the first group of similar documents that contain the candidate new-judgment distinguishing facts but not the candidate original-judgment distinguishing facts, and whose judgment items differ from the first judgment item; and, if one of the judgment items of the identified documents is a concentrated judgment item, marking the candidate distinguishing facts as verified, where the ratio of the documents corresponding to the concentrated judgment item among all identified documents is equal to or greater than a predetermined sixth threshold.
For example, for the candidate distinguishing facts (<building: modern> → <building: religion>) (meaning that a document contains the fact <building: religion> but not the fact <building: modern>), 10 travel guides are found that include <building: religion> and do not include <building: modern>. Nine of these travel guides relate to Senso-ji Temple, and this proportion is greater than the predetermined threshold (for example, 60%), so Senso-ji Temple is a concentrated judgment item. The candidate distinguishing facts (<building: modern> → <building: religion>) are therefore verified, and Senso-ji Temple is the concentrated judgment item. Note that the user can also define this threshold based on experience.
Next, in step 2730, approximate judgments with distinguishing facts can be generated, where the verified candidate distinguishing facts are selected as the distinguishing facts and the concentrated judgment item is selected as the approximate judgment.
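The verification of step 2720 can be sketched as a filter followed by a majority check. The function name `verify_candidate`, the dict layout, and the sample travel guides (including the label "other destination") are hypothetical; the 9-of-10 proportion and the 60% threshold follow the example in the text.

```python
from collections import Counter


def verify_candidate(documents, first_judgment, removed, added, sixth_threshold):
    """Return the concentrated judgment item, or None if verification fails."""
    matched = [
        doc["judgment"] for doc in documents
        if doc["judgment"] != first_judgment
        and added <= doc["facts"]            # contains every new-judgment fact
        and not (removed & doc["facts"])     # contains no original-judgment fact
    ]
    if not matched:
        return None
    judgment, count = Counter(matched).most_common(1)[0]
    return judgment if count / len(matched) >= sixth_threshold else None


guides = (
    [{"judgment": "Senso-ji Temple",
      "facts": {"building: religion", "price: 100"}}] * 9
    + [{"judgment": "other destination", "facts": {"building: religion"}}]
    + [{"judgment": "Tokyo Tower",
        "facts": {"building: modern", "price: 200"}}] * 5
)

concentrated = verify_candidate(
    guides, "Tokyo Tower",
    removed={"building: modern"}, added={"building: religion"},
    sixth_threshold=0.6,
)
print(concentrated)
```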
Examples of approximate judgments with distinguishing facts in the travel guide search example are as follows.
(1) (<building: modern> → <building: religion>) → (Tokyo Tower → Senso-ji Temple); distance = 2
(2) (<building: modern> → <building: imperial>) → (Tokyo Tower → Imperial Palace Square); distance = 2
(3) (<travel mode: land> → <travel mode: waterborne>) → (Tokyo Tower → Sumida River cruise); distance = 2
(4) (<time: in 2 hours> → 0) → (Tokyo Tower → Eight Treasures national garden); distance = 1
Item (1) means that deleting the fact item "building: modern" from the input travel guide and adding the fact item "building: religion" changes the judgment item (destination) from Tokyo Tower to Senso-ji Temple, and that the change distance (the number of distinguishing facts) is 2.
Item (2) means that deleting the fact item "building: modern" from the input travel guide and adding the fact item "building: imperial" changes the judgment item (destination) from Tokyo Tower to Imperial Palace Square, and that the change distance (the number of distinguishing facts) is 2.
Item (3) means that deleting the fact item "travel mode: land" from the input travel guide and adding the fact item "travel mode: waterborne" changes the judgment item (destination) from Tokyo Tower to the Sumida River cruise, and that the change distance (the number of distinguishing facts) is 2.
Item (4) means that deleting the fact item "time: in 2 hours" from the input travel guide changes the judgment item (destination) from Tokyo Tower to the Eight Treasures national garden, and that the change distance (the number of distinguishing facts) is 1.
Therefore, approximate judgments with distinguishing facts can be extracted by changing fact items using the method shown in Fig. 13.
According to another aspect of the present invention, a change tree can be used to extract approximate judgments with distinguishing facts. In this method, domain knowledge can be used to improve similar document search.
Fig. 14 is a flowchart of processing for extracting approximate judgments with distinguishing facts using a change tree according to an embodiment of the present invention.
As shown in Fig. 14, in step 2810, a change tree for the input document can be obtained, where the change tree is structured data specific to a set of knowledge information related to the input document, in which each non-end node is a fact item and each end node is a judgment item.
For example, a customer may want to decide which camera to buy. The customer may find the current introduction of a type of card camera unsatisfactory and may search for similar camera introductions.
In this case, the product type is what the user wants to compare, so the product type can be treated as the judgment item and the product parameter items can be treated as fact items.
In this field, structured knowledge that has been constructed by hand or mined by knowledge mining techniques may already exist. This structured knowledge is called a change tree. It can be used to organize search results.
Fig. 15 is a schematic diagram of an example of a change tree. In this example, one end node is "card camera". The other end nodes are "compact camera", "SLR camera", "professional camera", and "telephoto camera". The features of the various types of camera, that is, the fact items, form the non-end nodes.
Next, in step 2820, an approximate judgment with distinguishing facts can be generated by selecting a path that links two end nodes in the obtained change tree.
For example, for the change tree in Fig. 15, the rightmost branch can be selected. This branch can be translated into the following approximate judgment with distinguishing facts:
(parameter <optical zoom: 5 times> → parameter <optical zoom: 50 times>) → (card camera → telephoto camera); distance = 2
This approximate judgment with distinguishing facts means that deleting the fact item "optical zoom: 5 times" from the input product introduction and adding the fact item "optical zoom: 50 times" changes the judgment item (product type) from card camera to telephoto camera, and that the change distance (the number of distinguishing facts) is 2.
Therefore, approximate judgments with distinguishing facts can be extracted based on the processing shown in Fig. 14.
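Reading a path between two end nodes off a change tree can be sketched with a child-to-parent map. The tree below is hypothetical (the actual layout of Fig. 15 is not reproduced here), and excluding the shared ancestor from the distinguishing facts is an assumption chosen so that the result matches the distance of 2 in the example.

```python
def path_to_root(parent, node):
    """List the nodes from the given node up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def approximate_from_tree(parent, first, second):
    """Split the path between two end nodes into deleted and added fact items."""
    up = path_to_root(parent, first)
    down = path_to_root(parent, second)
    down_set = set(down)
    common = next(n for n in up if n in down_set)  # lowest shared ancestor
    deleted = up[1:up.index(common)]
    added = down[1:down.index(common)]
    return deleted, added, len(deleted) + len(added)


# Hypothetical change tree stored as child -> parent.
parent = {
    "card camera": "optical zoom: 5 times",
    "telephoto camera": "optical zoom: 50 times",
    "optical zoom: 5 times": "lens: fixed",
    "optical zoom: 50 times": "lens: fixed",
    "SLR camera": "lens: interchangeable",
    "lens: fixed": "camera",
    "lens: interchangeable": "camera",
}

deleted, added, distance = approximate_from_tree(
    parent, "card camera", "telephoto camera"
)
print(deleted, added, distance)
```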
Alternatively and/or additionally, extracting approximate judgments with distinguishing facts may also include: detecting similar distinguishing facts; merging the similar distinguishing facts; and adjusting the approximate judgments with distinguishing facts using the merged distinguishing facts.
For example, the two fact items "tumor size: 3.7cm" and "tumor size: 3.9cm" can be merged into one fact item "tumor size: 3.5~4.0cm". The merged fact can then be used to adjust the approximate judgments with distinguishing facts.
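One possible way to merge such numeric fact items is to group them into fixed-width ranges. This sketch assumes a bin width of 0.5 cm and a "name: value cm" format, neither of which is prescribed by the text; the function name `merge_numeric_facts` is likewise hypothetical.

```python
import math
import re
from collections import defaultdict

NUMERIC_FACT = re.compile(r"^(.+): (\d+(?:\.\d+)?)cm$")


def merge_numeric_facts(facts, bin_width=0.5):
    """Merge facts with the same name whose values fall in the same range."""
    bins = defaultdict(list)
    others = []
    for fact in facts:
        match = NUMERIC_FACT.match(fact)
        if match:
            low = math.floor(float(match.group(2)) / bin_width) * bin_width
            bins[(match.group(1), low)].append(fact)
        else:
            others.append(fact)
    merged = []
    for (name, low), group in sorted(bins.items()):
        if len(group) >= 2:
            merged.append(f"{name}: {low:.1f}~{low + bin_width:.1f}cm")
        else:
            merged.extend(group)  # a lone fact is kept unchanged
    return merged + others


facts = ["tumor size: 3.7cm", "tumor size: 3.9cm", "shade: existing"]
merged_facts = merge_numeric_facts(facts)
print(merged_facts)
```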
In one embodiment, the approximate judgments with distinguishing facts can be presented by outputting a list of all approximate judgments with distinguishing facts.
In one embodiment, the approximate judgments with distinguishing facts can be presented by outputting the approximate judgments with distinguishing facts whose change distance is less than a predetermined seventh threshold, or by outputting a predetermined number of approximate judgments with distinguishing facts that have the smallest change distances. Note that the user can determine the predetermined seventh threshold based on experience.
In one embodiment, the approximate judgments with distinguishing facts can be presented by: calculating the coverage rate of each approximate judgment with distinguishing facts, where the coverage rate is the proportion of documents in the first group of similar documents that match the approximate judgment with distinguishing facts; and outputting the approximate judgments with distinguishing facts whose coverage rate is equal to or greater than a predetermined eighth threshold, or outputting a predetermined number of approximate judgments with distinguishing facts that have the highest coverage rates. Note that the user can determine the predetermined eighth threshold based on experience.
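Coverage-based presentation might be sketched as below. What it means for a document to "match" an approximate judgment is not spelled out in the text; this sketch assumes a match requires the same judgment item, all added facts present, and all deleted facts absent. The function names, sample documents, and the threshold value are hypothetical.

```python
def coverage_rate(documents, judgment, removed, added):
    """Share of documents that match the approximate judgment and its facts."""
    if not documents:
        return 0.0
    matched = sum(
        1 for doc in documents
        if doc["judgment"] == judgment
        and added <= doc["facts"]
        and not (removed & doc["facts"])
    )
    return matched / len(documents)


def present(candidates, documents, eighth_threshold):
    """List candidates at or above the threshold, highest coverage first."""
    rated = [(coverage_rate(documents, j, r, a), j) for j, r, a in candidates]
    return [(j, rate) for rate, j in sorted(rated, reverse=True)
            if rate >= eighth_threshold]


first_group = (
    [{"judgment": "lung abscess", "facts": {"pleural effusion: existing"}}] * 4
    + [{"judgment": "bronchiectasis", "facts": set()}]
    + [{"judgment": "lung cancer", "facts": {"tubercle: irregular"}}] * 5
)
candidates = [
    ("lung abscess", {"tubercle: irregular"}, {"pleural effusion: existing"}),
    ("bronchiectasis", {"tubercle: irregular"}, set()),
]

shown = present(candidates, first_group, eighth_threshold=0.2)
print(shown)
```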
In one embodiment, the approximate judgments with distinguishing facts can be presented by outputting the change tree together with the approximate judgments with distinguishing facts.
In one embodiment, the fact difference between the facts of the first judgment item and the facts of the approximate judgment can be presented, where the fact difference causes the change from the first judgment item to the approximate judgment. Through this process, the user can clearly see which fact differences cause the change from the first judgment item to another judgment item, and if a document contains such a fact difference, the user can pay more attention to that document. For example, because "pleural effusion: existing" is the essential difference between "lung cancer" and "lung abscess", a doctor should pay more attention to the fact item "pleural effusion: existing" if it is present. The doctor can re-examine the fact item "pleural effusion: existing" to provide an accurate diagnosis. This is the actual search purpose of a doctor who performs a document search.
In one embodiment, for each approximate judgment with distinguishing facts, the sentences in the input document that correspond to the original-judgment distinguishing facts can be indicated, and the new-judgment distinguishing facts can also be indicated in the input document. Through this process, the important parts of the document can be highlighted, which makes the document easier for the user to read.
In addition, some documents may contain multiple judgment items; for example, a patient may have two diseases at the same time. In this case, the relationships between the facts and each judgment item should be detected, and the judgment items and fact items can be classified to obtain a series of judgment items with their fact items. In addition, the input document can be treated as a combination of two documents, and for each different judgment item with its fact items, approximate judgments with distinguishing facts can be extracted by the above method.
Using the above method, valuable information that matches the user's actual search purpose can be provided.
Furthermore, the search results can be organized, which saves the user time when reading documents.
Fig. 16 is a flowchart of a method for similar document search according to an embodiment of the present invention.
As shown in Fig. 16, in step 3100, an input document can be obtained. Next, in step 3200, at least one approximate judgment with distinguishing facts can be determined for the input document based on the above method of the present invention. Next, in step 3300, the at least one approximate judgment with distinguishing facts can be used to obtain a group of documents similar to the input document.
In one embodiment, the input document is a radiography report that includes finding items and diagnosis items; the finding items are selected as the first fact items, and the diagnosis item is selected as the first judgment item.
In one embodiment, the input document is a travel guide that includes the user's items of interest and a destination item; the user's items of interest are selected as the first fact items, and the destination item is selected as the first judgment item.
In one embodiment, the input document is a product introduction that includes product parameter items and a product type item; the product parameter items are selected as the first fact items, and the product type item is selected as the first judgment item.
Fig. 17 is a functional block diagram of an apparatus 4000 for determining approximate judgments with distinguishing facts according to an embodiment of the present invention. The apparatus 4000 shown in Fig. 17 can implement the method shown in Fig. 4 for determining approximate judgments with distinguishing facts. All functional blocks of the apparatus 4000 (the various units included in the apparatus 4000, whether or not they are shown in the figures) can be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the present invention. Those skilled in the art will understand that the functional blocks described in Fig. 17 can be combined or divided into sub-blocks to implement the principles of the invention described above. Therefore, the description herein can support any possible combination, division, or further definition of the functional blocks described herein.
As shown in Fig. 17, according to an aspect of the present invention, the apparatus 4000 for determining approximate judgments with distinguishing facts may include: a document obtaining unit 4100, a judgment item and fact item extraction unit 4200, a document analysis unit 4300, a similar document analysis unit 4400, and a detection unit 4500 for approximate judgments with distinguishing facts. The document obtaining unit 4100 is configured to obtain a document, where the obtained document contains a first judgment item and the first judgment item is a keyword of a predefined type. The judgment item and fact item extraction unit 4200 is configured to extract judgment items and fact items. The document analysis unit 4300 is configured to use the judgment item and fact item extraction unit to extract the first judgment item and first fact items from the obtained document, where each first fact item is information associated with the first judgment item. The similar document analysis unit 4400 is configured to obtain a first group of similar documents using the first judgment item and the first fact items, and to use the judgment item and fact item extraction unit 4200 to extract, from the first group of similar documents, second judgment items that differ from the first judgment item, together with second fact items. The detection unit 4500 is configured to detect at least one approximate judgment with distinguishing facts by using the first group of similar documents, the second judgment items, and the second fact items. The distinguishing facts indicate the difference between the first judgment item and a second judgment item. The approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, where the change distance indicates the level of difficulty of distinguishing the first judgment item from the second judgment item.
In one embodiment, the judgment item and fact item extraction unit 4200 may further include: a keyword extraction unit for extracting keywords from a document; a judgment item recognition unit for identifying the judgment item among the extracted keywords; and a fact item selection unit for selecting the remaining keywords as fact items.
In one embodiment, the judgment item recognition unit may further include at least one of the following units: a unit for selecting a keyword from a judgment item entry field as the judgment item; a unit for selecting a keyword as the judgment item according to a predetermined configuration; and a unit for selecting a keyword as the judgment item by the user.
In one embodiment, the unit for selecting a keyword as the judgment item according to a predetermined configuration may further include: a unit for selecting a keyword in a sentence, where the sentence expresses a subjective judgment and/or an objective result.
In one embodiment, judge that detection unit 4500 can also include: original with the true approximation of distinctiveness Distinctiveness fact extraction unit is judged, for second judge that item extracts the original judgement distinctiveness fact for each;Newly judge area Other sexual behavior reality extraction unit, for second judge that item extracts for each and newly judging that distinctiveness is true;Change metrics calculation unit, For using the original judgement distinctiveness fact of extraction and newly judging that the distinctiveness fact calculates each and second judges item and first Judge the change distance between item;And first approximate judgement generate unit, be less than scheduled for changing distance using it The second of one threshold value judges that item generates the approximate judgement for having distinctiveness true.
In one embodiment, the original-judgment distinguishing fact extraction unit may further include: a target fact item selection unit for selecting one of the first fact items as a target fact item; a sensitivity calculation unit for calculating the sensitivity of the target fact item, which includes: a second-group similar document obtaining unit for obtaining a second group of similar documents by using the first fact items with the target fact item deleted; a third judgment item and third fact item extraction unit for extracting, from the second group of similar documents, third judgment items that differ from the first judgment item, together with third fact items; and a sensitivity calculation subunit for calculating the sensitivity by using the distribution of the third judgment items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents; and an original-judgment distinguishing fact selection unit for selecting the target fact item as an original-judgment distinguishing fact if the calculated sensitivity is equal to or greater than a predetermined second threshold.
In one embodiment, the new-judgment distinguishing fact extraction unit may further include: a correlation calculation unit for calculating the correlation of each third fact item by using the co-occurrence ratio of the third fact item and the corresponding third judgment item in the second group of similar documents; and a new-judgment distinguishing fact selection unit for selecting a third fact item as a new-judgment distinguishing fact if the correlation of the third fact item is equal to or greater than a predetermined third threshold.
In one embodiment, the detection unit 4500 may further include: a fact distance calculation unit for calculating the fact distance of each document in the first group of similar documents by calculating the distance between that document and the obtained document, where the distance is calculated by counting the fact items that differ between the two documents; a judgment item distance calculation unit for calculating the judgment item distance of each second judgment item by calculating the change distance between that second judgment item and the first judgment item from the calculated fact distances, where the change distance between each second judgment item and the first judgment item is calculated by averaging the fact distances of the documents in the first group of similar documents; a second judgment item selection unit for selecting a second judgment item as the approximate judgment if its judgment item distance is equal to or less than a predetermined fourth threshold; and a distinguishing fact extraction unit for extracting the distinguishing facts for the approximate judgment by identifying the fact items that differ between the first fact items and the facts of the approximate judgment.
In one embodiment, the detection unit 4500 may further include: a candidate approximate judgment generation unit for generating, for each document in the first group of similar documents, a candidate approximate judgment with distinguishing facts by identifying the fact items that differ between the first fact items and the facts of that document; and an approximate judgment extraction unit for extracting approximate judgments with distinguishing facts from the candidate approximate judgments with distinguishing facts, which includes: a transfer graph generation unit for generating a transfer graph, where each terminal node of the transfer graph is a judgment item and each non-terminal node of the transfer graph is a fact item; a candidate approximate judgment arrangement unit for arranging all candidate approximate judgments with distinguishing facts in the transfer graph, where each path connecting two terminal nodes in the transfer graph represents one candidate approximate judgment with distinguishing facts; an importance calculation unit for calculating the importance of each edge in the transfer graph by recording the connection frequency of each edge that connects any two nodes in the transfer graph; an important edge identification unit for identifying important edges whose importance is equal to or greater than a predetermined fifth threshold; a distinguishing path generation unit for generating at least one distinguishing path, where a distinguishing path consists of important edges and connects a second judgment item to the first judgment item; and a translation unit for translating each distinguishing path into an approximate judgment with distinguishing facts.
In one embodiment, the detection unit 4500 may further include: a candidate distinguishing fact generation unit for generating candidate distinguishing facts, which includes: a candidate original-judgment distinguishing fact designation unit for designating candidate original-judgment distinguishing facts using first fact items that differ from the second fact items; and a candidate new-judgment distinguishing fact designation unit for designating candidate new-judgment distinguishing facts using second fact items that differ from the first fact items, where the number of candidate original-judgment distinguishing facts plus the number of candidate new-judgment distinguishing facts equals a predetermined number; a candidate distinguishing fact verification unit for verifying the candidate distinguishing facts in the first group of similar documents, which includes: a document identification unit for identifying documents in the first group of similar documents that contain the candidate new-judgment distinguishing facts but do not contain the candidate original-judgment distinguishing facts, and whose judgment items differ from the first judgment item; and a candidate distinguishing fact marking unit for marking the candidate distinguishing facts as verified if one of the judgment items of the identified documents is a concentrated judgment item, where the proportion of documents corresponding to the concentrated judgment item among all identified documents is equal to or greater than a predetermined sixth threshold; and a second approximate judgment generation unit for generating the approximate judgment with distinguishing facts, which includes: a verified candidate distinguishing fact selection unit for selecting the verified candidate distinguishing facts as the distinguishing facts; and a concentrated judgment item selection unit for selecting the concentrated judgment item as the approximate judgment.
In one embodiment, the detection unit 4500 may further include: a change tree obtaining unit for obtaining a change tree for the obtained document, where the change tree is structured data specific to a set of knowledge information related to the obtained document, each non-terminal node is a fact item, and each terminal node is a judgment item; and a third approximate judgment generation unit for generating the approximate judgment with distinguishing facts by selecting a path that links two terminal nodes in the obtained change tree.
In one embodiment, the detection unit 4500 may further include: a similar distinguishing fact detection unit for detecting similar distinguishing facts; a similar distinguishing fact merging unit for merging the similar distinguishing facts; and an approximate judgment adjustment unit for adjusting the approximate judgment with distinguishing facts using the merged distinguishing facts.
In one embodiment, the apparatus 4000 for determining an approximate judgment with distinguishing facts may further include a first approximate judgment display unit for presenting the approximate judgments with distinguishing facts by outputting a list of all approximate judgments with distinguishing facts.
In one embodiment, the apparatus 4000 for determining an approximate judgment with distinguishing facts may further include a second approximate judgment display unit for presenting the approximate judgments with distinguishing facts by one of the following operations: outputting the approximate judgments with distinguishing facts whose change distance is less than a predetermined seventh threshold, or outputting a predetermined number of approximate judgments with distinguishing facts that have the smallest change distances.
In one embodiment, the apparatus 4000 for determining an approximate judgment with distinguishing facts may further include a third approximate judgment display unit for presenting the approximate judgments with distinguishing facts, which further includes: a coverage calculation unit for calculating the coverage of each approximate judgment with distinguishing facts, where the coverage is the proportion of documents in the first group of similar documents that match the approximate judgment with distinguishing facts; and an approximate judgment output unit for outputting the approximate judgments with distinguishing facts whose coverage is equal to or greater than a predetermined eighth threshold, or outputting a predetermined number of approximate judgments with distinguishing facts that have the highest coverage.
In one embodiment, the apparatus 4000 for determining an approximate judgment with distinguishing facts may further include a fourth approximate judgment display unit for presenting the approximate judgment with distinguishing facts by outputting the change tree together with the approximate judgment with distinguishing facts.
In one embodiment, the apparatus 4000 for determining an approximate judgment with distinguishing facts may further include a fact difference display unit for presenting the fact difference between the facts of the first judgment item and the facts of the approximate judgment, where the fact difference causes the change from the first judgment item to the approximate judgment.
In one embodiment, the apparatus 4000 for determining an approximate judgment with distinguishing facts may further include an indication unit for indicating, for each approximate judgment with distinguishing facts, the sentences in the obtained document that correspond to the original-judgment distinguishing facts, and for indicating the new-judgment distinguishing facts in the obtained document.
Figure 18 shows a functional block diagram of an apparatus 5000 for similar document search according to an embodiment of the present invention. The apparatus 5000 shown in Figure 18 can implement the method for similar document search shown in Figure 16. All functional blocks of the apparatus 5000 (the various units included in the apparatus 5000, whether or not they are shown in the figure) may be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the present invention. Those skilled in the art will appreciate that the functional blocks described in Figure 18 may be combined or divided into sub-blocks to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional blocks described herein.
As shown in Figure 18, according to one aspect of the present invention, the apparatus 5000 for similar document search may include: an input document receiving unit 5100, the apparatus 4000 for determining an approximate judgment with distinguishing facts, and a similar document obtaining unit 5200. The input document receiving unit 5100 is configured to receive an input document. The apparatus 4000 for determining an approximate judgment with distinguishing facts is configured to determine at least one approximate judgment with distinguishing facts for the input document. The similar document obtaining unit 5200 is configured to use the at least one approximate judgment with distinguishing facts to obtain a group of similar documents for the input document.
In one embodiment, the document is a radiography report that includes finding items and diagnosis items; the finding items are selected as the first fact items, and the diagnosis item is selected as the first judgment item.
In one embodiment, the document is a travel guide that includes items of interest to the user and travel destination items; the items of interest to the user are selected as the first fact items, and the travel destination item is selected as the first judgment item.
In one embodiment, the document is a product introduction that includes product parameter items and product type items; the product parameter items are selected as the first fact items, and the product type item is selected as the first judgment item.
In addition, according to another aspect of the present invention, an apparatus for determining an approximate judgment with distinguishing facts can be provided. The apparatus can be implemented in the computer system 1000 shown in Fig. 3. The apparatus may include a processor and a memory storing instructions that, when executed by the processor, cause the processor to perform the following operations: obtaining a document, where the obtained document contains a first judgment item and the first judgment item is a keyword of a predefined type; extracting the first judgment item and first fact items from the obtained document, where each first fact item is information associated with the first judgment item; obtaining a first group of similar documents using the first judgment item and the first fact items, and extracting, from the first group of similar documents, second judgment items that differ from the first judgment item, together with second fact items; and detecting at least one approximate judgment with distinguishing facts by using the first group of similar documents, the second judgment items, and the second fact items, where a distinguishing fact indicates a difference between the first judgment item and a second judgment item, the approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, where the change distance indicates how difficult it is to distinguish the first judgment item from the second judgment item.
In one embodiment, extracting the fact items and the judgment item from a document may further include: extracting keywords from the document; and identifying the judgment item among the extracted keywords and selecting the remaining keywords as fact items.
In one embodiment, identifying the judgment item may further include at least one of the following: selecting a keyword from a judgment item entry field as the judgment item; selecting a keyword as the judgment item according to a predetermined configuration; and selecting a keyword as the judgment item through a user.
In one embodiment, selecting a keyword as the judgment item according to a predetermined configuration may further include selecting a keyword in a sentence, where the sentence expresses a subjective judgment and/or an objective result.
In one embodiment, detecting at least one approximate judgment with distinguishing facts may include: extracting original-judgment distinguishing facts for each second judgment item; extracting new-judgment distinguishing facts for each second judgment item; calculating the change distance between each second judgment item and the first judgment item using the extracted original-judgment distinguishing facts and new-judgment distinguishing facts; and generating the approximate judgment with distinguishing facts from the second judgment items whose change distance is less than the predetermined first threshold.
In one embodiment, extracting the original-judgment distinguishing facts includes: selecting one of the first fact items as a target fact item; calculating the sensitivity of the target fact item, which includes: obtaining a second group of similar documents by using the first fact items with the target fact item deleted; extracting, from the second group of similar documents, third judgment items that differ from the first judgment item, together with third fact items; and calculating the sensitivity by using the distribution of the third judgment items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents; and, if the calculated sensitivity is equal to or greater than a predetermined second threshold, selecting the target fact item as an original-judgment distinguishing fact.
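The text above states only that the sensitivity is calculated from the two judgment item distributions; it does not give a formula. The following sketch therefore rests on an assumption: it uses the total variation distance between the two distributions as a stand-in measure, so that a fact item whose removal shifts the mix of judgment items strongly receives a high sensitivity. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def distribution(judgments):
    """Relative frequency of each judgment item in a group of similar documents."""
    counts = Counter(judgments)
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}


def sensitivity(first_group_judgments, second_group_judgments):
    """Assumed measure: total variation distance between the two distributions.

    first_group_judgments:  second judgment items found in the first group.
    second_group_judgments: third judgment items found in the second group
                            (retrieved with the target fact item deleted).
    """
    p = distribution(first_group_judgments)
    q = distribution(second_group_judgments)
    return 0.5 * sum(abs(p.get(j, 0.0) - q.get(j, 0.0)) for j in set(p) | set(q))


# The target fact item would be kept as an original-judgment distinguishing
# fact when this value reaches the predetermined second threshold.
score = sensitivity(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

Other divergence measures could be substituted; the value ranges from 0 (identical distributions) to 1 (disjoint distributions) under this choice.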
In one embodiment, extracting the new-judgment distinguishing facts includes: calculating the correlation of each third fact item by using the co-occurrence ratio of the third fact item and the corresponding third judgment item in the second group of similar documents; and, if the correlation of a third fact item is equal to or greater than a predetermined third threshold, selecting that third fact item as a new-judgment distinguishing fact.
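A minimal sketch of the correlation step follows. It assumes each document in the second group is reduced to a pair of one judgment item and a set of fact items, and it interprets the co-occurrence ratio as the share of documents carrying a given judgment item that also contain the fact item. That interpretation, the data layout, and the function names are assumptions for illustration.

```python
def correlation(fact, judgment, docs):
    """Share of documents with `judgment` that also contain `fact`."""
    with_judgment = [facts for j, facts in docs if j == judgment]
    if not with_judgment:
        return 0.0
    return sum(fact in facts for facts in with_judgment) / len(with_judgment)


def new_judgment_distinguishing_facts(judgment, docs, threshold):
    """Fact items whose correlation with `judgment` reaches the third threshold."""
    candidates = set().union(*(facts for j, facts in docs if j == judgment))
    return {f for f in candidates if correlation(f, judgment, docs) >= threshold}


# Hypothetical second group of similar documents: (judgment item, fact items).
docs = [("B", {"x", "y"}), ("B", {"x"}), ("C", {"y"})]
selected = new_judgment_distinguishing_facts("B", docs, threshold=0.6)
```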
In one embodiment, detecting at least one approximate judgment with distinguishing facts may include: calculating the fact distance of each document in the first group of similar documents by calculating the distance between that document and the obtained document, where the distance is calculated by counting the fact items that differ between the two documents; calculating the judgment item distance of each second judgment item by calculating the change distance between that second judgment item and the first judgment item from the calculated fact distances, where the change distance between each second judgment item and the first judgment item is calculated by averaging the fact distances of the documents in the first group of similar documents; if the judgment item distance of a second judgment item is equal to or less than a predetermined fourth threshold, selecting that second judgment item as the approximate judgment; and extracting the distinguishing facts for the approximate judgment by identifying the fact items that differ between the first fact items and the facts of the approximate judgment.
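The distance-based detection described above can be sketched as follows. The sketch assumes each similar document is a pair of one judgment item (already different from the first judgment item) and a set of fact items, counts differing fact items as the size of the symmetric difference, and averages the fact distances over the documents that share a judgment item. The grouping by judgment item and all names are illustrative assumptions.

```python
from collections import defaultdict


def fact_distance(facts_a, facts_b):
    """Number of fact items present in exactly one of the two documents."""
    return len(set(facts_a) ^ set(facts_b))


def judgment_item_distances(query_facts, similar_docs):
    """Average fact distance per second judgment item."""
    grouped = defaultdict(list)
    for judgment, facts in similar_docs:
        grouped[judgment].append(fact_distance(query_facts, facts))
    return {j: sum(d) / len(d) for j, d in grouped.items()}


def approximate_judgments(query_facts, similar_docs, threshold):
    """Second judgment items whose distance is at most the fourth threshold."""
    distances = judgment_item_distances(query_facts, similar_docs)
    return {j: d for j, d in distances.items() if d <= threshold}


# Hypothetical obtained document and first group of similar documents.
query_facts = {"a", "b", "c"}
similar_docs = [("B", {"a", "b"}), ("B", {"a", "b", "d"}), ("C", {"x", "y"})]
result = approximate_judgments(query_facts, similar_docs, threshold=2)
```

The distinguishing facts for a selected judgment item could then be read off as the symmetric difference between `query_facts` and the fact items of the matching documents.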
In one embodiment, detecting at least one approximate judgment with distinguishing facts may include: generating, for each document in the first group of similar documents, a candidate approximate judgment with distinguishing facts by identifying the fact items that differ between the first fact items and the facts of that document; and extracting approximate judgments with distinguishing facts from the candidate approximate judgments with distinguishing facts, which includes: generating a transfer graph, where each terminal node of the transfer graph is a judgment item and each non-terminal node of the transfer graph is a fact item; arranging all candidate approximate judgments with distinguishing facts in the transfer graph, where each path connecting two terminal nodes in the transfer graph represents one candidate approximate judgment with distinguishing facts; calculating the importance of each edge in the transfer graph by recording the connection frequency of each edge that connects any two nodes in the transfer graph; identifying important edges whose importance is equal to or greater than a predetermined fifth threshold; generating at least one distinguishing path, where the distinguishing path consists of important edges and connects a second judgment item to the first judgment item; and translating each distinguishing path into an approximate judgment with distinguishing facts.
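The edge importance calculation in the transfer graph can be illustrated with a short sketch. It assumes each candidate approximate judgment is already laid out as a path (a list of nodes) that starts and ends at a judgment item with fact items in between, treats edges as undirected, and takes the number of candidate paths that use an edge as its connection frequency. These modelling choices and the names are assumptions; path generation from the important edges is not shown.

```python
from collections import Counter


def edge_importance(paths):
    """Count how many times each undirected edge is used across all paths."""
    counts = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[frozenset((u, v))] += 1
    return counts


def important_edges(paths, threshold):
    """Edges whose importance reaches the predetermined fifth threshold."""
    return {e for e, n in edge_importance(paths).items() if n >= threshold}


# Hypothetical candidate paths: judgment "A" to judgments "B" or "C" via facts.
paths = [
    ["A", "f1", "f2", "B"],
    ["A", "f1", "f3", "B"],
    ["A", "f1", "f2", "C"],
]
kept = important_edges(paths, threshold=2)
```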
In one embodiment, detecting at least one approximate judgment with distinguishing facts may include: generating candidate distinguishing facts, which includes: designating candidate original-judgment distinguishing facts using first fact items that differ from the second fact items; and designating candidate new-judgment distinguishing facts using second fact items that differ from the first fact items, where the number of candidate original-judgment distinguishing facts plus the number of candidate new-judgment distinguishing facts equals a predetermined number; verifying the candidate distinguishing facts in the first group of similar documents, which includes: identifying documents in the first group of similar documents that contain the candidate new-judgment distinguishing facts but do not contain the candidate original-judgment distinguishing facts, and whose judgment items differ from the first judgment item; and, if one of the judgment items of the identified documents is a concentrated judgment item, marking the candidate distinguishing facts as verified, where the proportion of documents corresponding to the concentrated judgment item among all identified documents is equal to or greater than a predetermined sixth threshold; and generating the approximate judgment with distinguishing facts, which includes: selecting the verified candidate distinguishing facts as the distinguishing facts; and selecting the concentrated judgment item as the approximate judgment.
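The verification step above can be sketched as follows. The sketch assumes each similar document is a pair of one judgment item and a set of fact items, and it reads "contains the candidate new-judgment distinguishing facts but not the candidate original-judgment distinguishing facts" as: all new facts present and none of the original facts present. That reading, the use of the most frequent judgment item as the concentrated judgment item, and the function name are assumptions.

```python
from collections import Counter


def verify_candidate(original_facts, new_facts, first_judgment, similar_docs, threshold):
    """Return the concentrated judgment item if the candidate is verified, else None."""
    matched = [
        j for j, facts in similar_docs
        if j != first_judgment            # judgment differs from the first one
        and new_facts <= facts            # all candidate new-judgment facts present
        and not (original_facts & facts)  # no candidate original-judgment fact present
    ]
    if not matched:
        return None
    judgment, count = Counter(matched).most_common(1)[0]
    # Verified only if the dominant judgment reaches the sixth threshold.
    return judgment if count / len(matched) >= threshold else None


# Hypothetical first group of similar documents.
similar_docs = [
    ("B", {"n1", "x"}), ("B", {"n1"}), ("C", {"n1"}),
    ("B", {"n1", "o1"}), ("A", {"n1"}),
]
approx = verify_candidate({"o1"}, {"n1"}, "A", similar_docs, threshold=0.6)
```

In this example three documents are identified (two with "B", one with "C"), so "B" is returned for a threshold of 0.6 and nothing is returned for a stricter threshold such as 0.8.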
In one embodiment, detecting at least one approximate judgment with distinguishing facts may include: obtaining a change tree for the obtained document, where the change tree is structured data specific to a set of knowledge information related to the obtained document, each non-terminal node is a fact item, and each terminal node is a judgment item; and generating the approximate judgment with distinguishing facts by selecting a path that links two terminal nodes in the obtained change tree.
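Selecting a path between two terminal nodes of a change tree can be sketched as below. The sketch assumes the tree is given as a hypothetical child-to-parent mapping in which leaves are judgment items and inner nodes are fact items; the path runs from one leaf up to the lowest common ancestor and down to the other leaf. The representation and names are illustrative, and the code raises `StopIteration` if the two nodes share no ancestor.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to the root, following the child-to-parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def leaf_to_leaf_path(a, b, parent):
    """Path linking terminal nodes `a` and `b` through their lowest common ancestor."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    ancestors_a = set(up_a)
    common = next(n for n in up_b if n in ancestors_a)
    return up_a[:up_a.index(common) + 1] + up_b[:up_b.index(common)][::-1]


# Hypothetical change tree: judgments "A", "B", "C" hang below facts "f1", "f2".
parent = {"f1": "root", "f2": "root", "A": "f1", "B": "f1", "C": "f2"}
near = leaf_to_leaf_path("A", "B", parent)
far = leaf_to_leaf_path("A", "C", parent)
```

The fact items on the returned path would play the role of the distinguishing facts between the two judgment items at its ends.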
In one embodiment, detecting at least one approximate judgment with distinguishing facts may further include: detecting similar distinguishing facts; merging the similar distinguishing facts; and adjusting the approximate judgment with distinguishing facts using the merged distinguishing facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgments with distinguishing facts by outputting a list of all approximate judgments with distinguishing facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgments with distinguishing facts by outputting the approximate judgments with distinguishing facts whose change distance is less than a predetermined seventh threshold, or by outputting a predetermined number of approximate judgments with distinguishing facts that have the smallest change distances.
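The two output options above (a distance threshold or a fixed number of closest results) can be sketched as follows, assuming the change distances are already available in a hypothetical mapping from approximate judgment to distance. The function names are illustrative.

```python
def select_by_threshold(distances, threshold):
    """Approximate judgments with change distance below the seventh threshold,
    ordered from the smallest distance to the largest."""
    return sorted((j for j, d in distances.items() if d < threshold), key=distances.get)


def select_top_k(distances, k):
    """The k approximate judgments with the smallest change distances."""
    return sorted(distances, key=distances.get)[:k]


# Hypothetical change distances per approximate judgment.
distances = {"B": 1.5, "C": 5.0, "D": 0.5}
below = select_by_threshold(distances, 2.0)
closest = select_top_k(distances, 1)
```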
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operations: calculating the coverage of each approximate judgment with distinguishing facts, where the coverage is the proportion of documents in the first group of similar documents that match the approximate judgment with distinguishing facts; and presenting the approximate judgments with distinguishing facts by outputting the approximate judgments with distinguishing facts whose coverage is equal to or greater than a predetermined eighth threshold, or by outputting a predetermined number of approximate judgments with distinguishing facts that have the highest coverage.
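A minimal sketch of the coverage calculation follows. The text does not define when a document "matches" an approximate judgment with distinguishing facts, so the sketch assumes a document matches when it carries the approximate judgment item and contains all of its distinguishing facts. That matching rule, the data layout, and the names are assumptions.

```python
def coverage(approx_judgment, distinguishing_facts, similar_docs):
    """Share of the first group of similar documents that match the judgment."""
    if not similar_docs:
        return 0.0
    matched = sum(
        1 for j, facts in similar_docs
        if j == approx_judgment and distinguishing_facts <= facts
    )
    return matched / len(similar_docs)


# Hypothetical first group of similar documents: (judgment item, fact items).
similar_docs = [("B", {"x", "y"}), ("B", {"y"}), ("C", {"x"}), ("B", {"x"})]
ratio = coverage("B", {"x"}, similar_docs)
```

Judgments whose `ratio` reaches the eighth threshold, or the predetermined number of judgments with the highest `ratio`, would then be output.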
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgment with distinguishing facts by outputting the change tree together with the approximate judgment with distinguishing facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the fact difference between the facts of the first judgment item and the facts of the approximate judgment, where the fact difference causes the change from the first judgment item to the approximate judgment.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operations: for each approximate judgment with distinguishing facts, indicating the sentences in the obtained document that correspond to the original-judgment distinguishing facts, and indicating the new-judgment distinguishing facts in the obtained document.
In addition, according to another aspect of the present invention, an apparatus for similar document search can be provided. The apparatus may include a processor and a memory storing instructions that, when executed by the processor, cause the processor to perform the following operations: receiving an input document; determining at least one approximate judgment with distinguishing facts for the input document based on the above method; and obtaining a group of similar documents for the input document using the at least one approximate judgment with distinguishing facts.
In one embodiment, the document is a radiography report that includes finding items and diagnosis items; the finding items are selected as the first fact items, and the diagnosis item is selected as the first judgment item.
In one embodiment, the document is a travel guide that includes items of interest to the user and travel destination items; the items of interest to the user are selected as the first fact items, and the travel destination item is selected as the first judgment item.
In one embodiment, the document is a product introduction that includes product parameter items and product type items; the product parameter items are selected as the first fact items, and the product type item is selected as the first judgment item.
Note that those skilled in the art will clearly understand that the embodiments in this application can be combined arbitrarily.
The method and system of the present invention may be implemented in many ways. For example, the method and system of the present invention may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The order of the steps of the method given above is for illustration only; the steps of the method of the present invention are not limited to the order specifically described above unless otherwise stated. In addition, in some embodiments, the present invention may also be embodied as programs recorded in a recording medium, where these programs include machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers a recording medium that stores a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present invention. Those skilled in the art should understand that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.

Claims (44)

1. A method for determining an approximate judgment with distinguishing facts, comprising:
A) a document obtaining step for obtaining a document, wherein the obtained document contains a first judgment item, and the first judgment item is a keyword of a predefined type;
B) a document analysis step for extracting the first judgment item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgment item;
C) a similar document analysis step for obtaining a first group of similar documents using the first judgment item and the first fact items, and for extracting, from the first group of similar documents, second judgment items that differ from the first judgment item, together with second fact items;
D) a detecting step for approximate judgments with distinguishing facts, for detecting at least one approximate judgment with distinguishing facts by using the first group of similar documents, the second judgment items, and the second fact items, wherein:
the at least one approximate judgment with distinguishing facts consists of distinguishing facts and an approximate judgment;
the distinguishing facts indicate a difference between the first judgment item and a second judgment item; and
the approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, wherein the change distance is determined according to the distinguishing facts between the first judgment item and the second judgment item.
2. The method of claim 1, wherein extracting the fact items and the judgment item from a document further comprises:
extracting keywords from the document; and
identifying the judgment item among the extracted keywords, and selecting the remaining keywords as the fact items.
3. The method of claim 2, wherein identifying the judgment item further comprises at least one of the following:
selecting a keyword from a judgment item entry field as the judgment item;
selecting a keyword as the judgment item according to a predetermined configuration; and
selecting a keyword as the judgment item through a user.
4. The method of claim 3, wherein selecting a keyword as the judgment item according to a predetermined configuration further comprises:
selecting a keyword in a sentence, wherein the sentence expresses a subjective judgment and/or an objective result.
5. The method of claim 1, wherein the detecting step for approximate judgments with distinguishing facts comprises:
1) extracting original-judgment distinguishing facts for each second judgment item;
2) extracting new-judgment distinguishing facts for each second judgment item;
3) calculating the change distance between each second judgment item and the first judgment item using the extracted original-judgment distinguishing facts and new-judgment distinguishing facts; and
4) generating the approximate judgment with distinguishing facts from the second judgment items whose change distance is less than the predetermined first threshold.
6. The method according to claim 5, wherein extracting the original-judgement distinguishing fact comprises:
Selecting one of the first fact items as a target fact item;
Calculating the sensitivity of the target fact item, comprising:
Obtaining a second group of similar documents by using the first fact items with the target fact item deleted;
Extracting, from the second group of similar documents, third judgement items different from the first judgement item and third fact items; and
Calculating the sensitivity by using the distribution of the third judgement items in the second group of similar documents and the distribution of the second judgement items in the first group of similar documents; and
If the calculated sensitivity is equal to or greater than a predetermined second threshold, selecting the target fact item as the original-judgement distinguishing fact.
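The sensitivity calculation of claim 6 compares two distributions of judgement items. The claim does not name a measure, so the sketch below assumes the total variation distance between the two normalised frequency distributions; the function names and sample labels are hypothetical.

```python
from collections import Counter


def distribution(judgements):
    """Normalised frequency of each judgement item in a group of documents."""
    counts = Counter(judgements)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {j: c / total for j, c in counts.items()}


def sensitivity(first_group_judgements, second_group_judgements):
    """Assumed measure: total variation distance between the two distributions.

    0.0 means deleting the target fact item did not change which judgements
    the similar documents carry; 1.0 means the judgements changed completely.
    """
    p = distribution(first_group_judgements)
    q = distribution(second_group_judgements)
    return 0.5 * sum(abs(p.get(j, 0.0) - q.get(j, 0.0)) for j in set(p) | set(q))


# Judgement items of the first group (all first fact items used) and of the
# second group (target fact item deleted before retrieval).
print(sensitivity(["a", "a", "b", "b"], ["a", "a", "a", "b"]))
```

A target fact item would then be kept as an original-judgement distinguishing fact when this value reaches the second threshold.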
7. The method according to claim 6, wherein extracting the new-judgement distinguishing fact comprises:
Calculating the correlation of each third fact item by using the ratio at which the third fact item occurs together with the corresponding third judgement item in the second group of similar documents; and
If the correlation of a third fact item is equal to or greater than a predetermined third threshold, selecting the third fact item as the new-judgement distinguishing fact.
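The occurrence ratio of claim 7 can be sketched as below. The sketch assumes the ratio is the share of documents carrying a given third judgement item that also contain the third fact item, and that facts already present in the obtained document are not candidates; both are assumptions, as are the function names and the sample documents.

```python
def correlation(documents, fact, judgement):
    """Share of documents with `judgement` that also contain `fact`.

    `documents` is a list of (judgement item, set of fact items) pairs.
    """
    with_judgement = [facts for j, facts in documents if j == judgement]
    if not with_judgement:
        return 0.0
    return sum(fact in facts for facts in with_judgement) / len(with_judgement)


def new_distinguishing_facts(documents, judgement, first_facts, third_threshold):
    """Third fact items whose correlation with `judgement` reaches the threshold."""
    candidates = set().union(*(facts for j, facts in documents if j == judgement))
    candidates -= set(first_facts)  # assumption: only facts absent from the obtained document
    return sorted(f for f in candidates
                  if correlation(documents, f, judgement) >= third_threshold)


second_group = [("B", {"x", "y"}), ("B", {"x"}), ("B", {"z"}), ("A", {"x"})]
print(new_distinguishing_facts(second_group, "B", first_facts={"z"}, third_threshold=0.5))
```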
8. The method according to claim 1, wherein the step of detecting the approximate judgement with a distinguishing fact comprises:
1) calculating the fact distance of each document in the first group of similar documents by calculating the distance between that document and the obtained document, wherein the distance between each document in the first group of similar documents and the obtained document is calculated by counting the fact items that differ between the two documents;
2) calculating the judgement item distance of each second judgement item by using the calculated fact distance of each document in the first group of similar documents to calculate the change distance between each second judgement item and the first judgement item, wherein the change distance between each second judgement item and the first judgement item is calculated by averaging the fact distances of the documents in the first group of similar documents;
3) if the judgement item distance of a second judgement item is equal to or less than a predetermined fourth threshold, selecting the second judgement item as the approximate judgement; and
4) extracting the distinguishing fact of the approximate judgement by identifying the fact items that differ between the first fact items and the facts of the approximate judgement.
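Steps 1) to 3) of claim 8 can be sketched as follows. The sketch assumes that "differing fact items" means the symmetric difference of the two fact sets and that the average is taken over the documents sharing the same second judgement item; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def fact_distance(facts_a, facts_b):
    """Count the fact items present in exactly one of the two documents."""
    return len(set(facts_a) ^ set(facts_b))


def judgement_distances(first_facts, similar_docs):
    """Average fact distance per second judgement item.

    `similar_docs` is a list of (judgement item, set of fact items) pairs.
    """
    per_judgement = defaultdict(list)
    for judgement, facts in similar_docs:
        per_judgement[judgement].append(fact_distance(first_facts, facts))
    return {j: sum(d) / len(d) for j, d in per_judgement.items()}


def select_approximate(first_facts, similar_docs, fourth_threshold):
    """Second judgement items whose judgement item distance is within the threshold."""
    distances = judgement_distances(first_facts, similar_docs)
    return {j: d for j, d in distances.items() if d <= fourth_threshold}


first_facts = {"a", "b", "c"}
first_group = [("B", {"a", "b"}), ("B", {"a", "b", "d"}), ("C", {"x", "y"})]
print(select_approximate(first_facts, first_group, fourth_threshold=2))
```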
9. The method according to claim 1, wherein the step of detecting the approximate judgement with a distinguishing fact comprises:
1) generating, for each document, a candidate approximate judgement with a distinguishing fact by identifying the fact items that differ between the first fact items and the facts of each document in the first group of similar documents;
2) extracting the approximate judgement with a distinguishing fact by using the candidate approximate judgements with distinguishing facts, comprising:
Generating a transition graph, wherein each end node in the transition graph is a judgement item, and each non-end node in the transition graph is a fact item;
Arranging all candidate approximate judgements with distinguishing facts in the transition graph, wherein each path connecting two end nodes in the transition graph represents one candidate approximate judgement with a distinguishing fact;
Calculating the importance of each edge in the transition graph by recording the connection frequency of each edge connecting any two nodes in the transition graph;
Identifying the important edges whose importance is equal to or greater than a predetermined fifth threshold;
Generating at least one distinguishing path, wherein the distinguishing path consists of important edges and connects a second judgement item to the first judgement item; and
Converting each distinguishing path into the approximate judgement with a distinguishing fact.
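The transition-graph procedure of claim 9 can be sketched as follows. This is a simplified illustration under several assumptions: each candidate is given as a list running from the first judgement item through fact-change nodes to a second judgement item, edge importance is the edge's frequency divided by the number of candidates, and a distinguishing path is a candidate path all of whose edges are important. The `-c` / `+d` node labels (fact removed, fact added) and the function names are hypothetical.

```python
from collections import Counter


def edges(path):
    """Consecutive node pairs along a path."""
    return list(zip(path, path[1:]))


def distinguishing_paths(candidate_paths, fifth_threshold):
    """Keep candidate paths made up only of important edges."""
    frequency = Counter(e for path in candidate_paths for e in edges(path))
    total = len(candidate_paths)
    # Assumed importance: relative connection frequency of the edge.
    important = {e for e, n in frequency.items() if n / total >= fifth_threshold}
    kept = []
    for path in candidate_paths:
        if all(e in important for e in edges(path)) and path not in kept:
            kept.append(path)
    return kept


candidates = [
    ["A", "-c", "+d", "B"],
    ["A", "-c", "+d", "B"],
    ["A", "-c", "+e", "C"],
    ["A", "-f", "D"],
]
print(distinguishing_paths(candidates, fifth_threshold=0.5))
```

A fuller implementation might search the graph for new paths assembled from important edges rather than filtering the existing candidates; the filter is used here only to keep the sketch short.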
10. The method according to claim 1, wherein the step of detecting the approximate judgement with a distinguishing fact comprises:
1) generating candidate distinguishing facts, comprising:
Specifying candidate original-judgement distinguishing facts by using first fact items that differ from the second fact items;
Specifying candidate new-judgement distinguishing facts by using second fact items that differ from the first fact items, wherein the sum of the number of candidate original-judgement distinguishing facts and the number of candidate new-judgement distinguishing facts is equal to a predetermined number;
2) verifying the candidate distinguishing facts in the first group of similar documents, comprising:
Identifying the documents in the first group of similar documents that contain the candidate new-judgement distinguishing facts but do not contain the candidate original-judgement distinguishing facts, wherein the judgement items of the identified documents are different from the first judgement item; and
If one of the judgement items of the identified documents is a concentrated judgement item, marking the candidate distinguishing facts as verified, wherein the ratio of the documents corresponding to the concentrated judgement item among all identified documents is equal to or greater than a predetermined sixth threshold; and
3) generating the approximate judgement with a distinguishing fact, comprising:
Selecting the verified candidate distinguishing facts as the distinguishing fact; and
Selecting the concentrated judgement item as the approximate judgement.
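The verification step of claim 10 can be sketched as follows. The sketch assumes a document "contains" the candidate new-judgement facts when it holds all of them and "does not contain" the candidate original-judgement facts when it holds none of them; the function name and sample documents are hypothetical.

```python
from collections import Counter


def verify_candidate(similar_docs, first_judgement, original, new, sixth_threshold):
    """Return the concentrated judgement item, or None if the candidate is not verified.

    `similar_docs` is a list of (judgement item, set of fact items) pairs.
    """
    matched = [
        j for j, facts in similar_docs
        if j != first_judgement
        and set(new) <= set(facts)              # all candidate new facts present
        and not (set(original) & set(facts))    # no candidate original fact present
    ]
    if not matched:
        return None
    judgement, count = Counter(matched).most_common(1)[0]
    return judgement if count / len(matched) >= sixth_threshold else None


first_group = [
    ("B", {"a", "d"}), ("B", {"d"}), ("C", {"a", "d"}),
    ("A", {"a", "c"}), ("B", {"c", "d"}),
]
print(verify_candidate(first_group, "A", original={"c"}, new={"d"}, sixth_threshold=0.6))
```

When a judgement item is returned, the candidate facts become the distinguishing fact and the returned item becomes the approximate judgement, as in step 3).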
11. The method according to claim 1, wherein the step of detecting the approximate judgement with a distinguishing fact comprises:
1) obtaining a change tree for the obtained document, wherein the change tree is structured data specific to a set of knowledge information related to the obtained document, each non-end node is a fact item, and each end node is a judgement item; and
2) generating the approximate judgement with a distinguishing fact by selecting a path linking two end nodes in the obtained change tree.
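Selecting a path between two end nodes of a change tree, as in step 2) of claim 11, can be sketched as follows. The sketch assumes the tree is stored as a child-to-parent mapping and that both end nodes belong to the same tree; the node labels and function names are hypothetical.

```python
def path_to_root(parent, node):
    """Nodes from `node` up to the root, following the child-to-parent mapping."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def path_between(parent, leaf_a, leaf_b):
    """Path linking two end nodes through their lowest common ancestor.

    Assumes both leaves are in the same tree, so a common ancestor exists.
    """
    up_a = path_to_root(parent, leaf_a)
    up_b = path_to_root(parent, leaf_b)
    ancestors_a = set(up_a)
    common = next(n for n in up_b if n in ancestors_a)
    return up_a[:up_a.index(common) + 1] + up_b[:up_b.index(common)][::-1]


# Hypothetical change tree: end nodes are judgement items, inner nodes are fact items.
parent = {
    "judgement-1": "fact-x",
    "judgement-2": "fact-y",
    "fact-x": "fact-root",
    "fact-y": "fact-root",
}
print(path_between(parent, "judgement-1", "judgement-2"))
```

The fact items along the returned path are the facts that separate the two judgement items.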
12. The method according to any one of claims 5 to 11, wherein the step of detecting the approximate judgement with a distinguishing fact further comprises:
Detecting similar distinguishing facts;
Merging the similar distinguishing facts; and
Adjusting the approximate judgement with a distinguishing fact by using the merged distinguishing facts.
13. The method according to claim 1, further comprising: presenting the approximate judgement with a distinguishing fact by outputting a list of all approximate judgements with distinguishing facts.
14. The method according to claim 1, further comprising presenting the approximate judgement with a distinguishing fact by:
Outputting the approximate judgements with distinguishing facts whose change distance is less than a predetermined seventh threshold, or
Outputting a predetermined number of approximate judgements with distinguishing facts that have the smallest change distances.
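The two output options of claim 14 can be sketched as follows, assuming each detected result is a dictionary with a `distance` field holding its change distance; that record layout and the function names are hypothetical.

```python
import heapq


def below_threshold(results, seventh_threshold):
    """Results whose change distance is less than the seventh threshold."""
    return [r for r in results if r["distance"] < seventh_threshold]


def closest(results, count):
    """The `count` results with the smallest change distances."""
    return heapq.nsmallest(count, results, key=lambda r: r["distance"])


results = [
    {"judgement": "B", "distance": 2},
    {"judgement": "C", "distance": 5},
    {"judgement": "D", "distance": 3},
]
print(below_threshold(results, 4))
print(closest(results, 1))
```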
15. The method according to claim 1, further comprising presenting the approximate judgement with a distinguishing fact by:
Calculating the coverage rate of each approximate judgement with a distinguishing fact, wherein the coverage rate is the ratio, within the first group of similar documents, of the documents that match the approximate judgement with a distinguishing fact; and
Outputting the approximate judgements with distinguishing facts whose coverage rate is equal to or greater than a predetermined eighth threshold, or outputting a predetermined number of approximate judgements with distinguishing facts that have the largest coverage rates.
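The coverage rate of claim 15 can be sketched as follows. The claim does not define when a document "matches"; this sketch assumes a match means the document carries the approximate judgement, contains every new-judgement fact, and contains none of the original-judgement facts. The function name and sample documents are hypothetical.

```python
def coverage_rate(similar_docs, judgement, original, new):
    """Share of the first group of similar documents matching the approximate judgement.

    `similar_docs` is a list of (judgement item, set of fact items) pairs.
    """
    if not similar_docs:
        return 0.0
    matched = sum(
        1 for j, facts in similar_docs
        if j == judgement
        and set(new) <= set(facts)
        and not (set(original) & set(facts))
    )
    return matched / len(similar_docs)


first_group = [("B", {"d"}), ("B", {"c", "d"}), ("C", {"d"}), ("B", {"a", "d"})]
print(coverage_rate(first_group, "B", original={"c"}, new={"d"}))
```

Results can then be filtered against the eighth threshold, or sorted by coverage rate to output the highest-ranked ones.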
16. The method according to claim 11, further comprising: presenting the approximate judgement with a distinguishing fact by outputting the change tree together with the approximate judgement with a distinguishing fact.
17. The method according to claim 1, further comprising presenting the fact difference between the facts of the first judgement item and the facts of the approximate judgement, wherein the fact difference causes the change from the first judgement item to the approximate judgement.
18. The method according to any one of claims 5 to 7, further comprising: for each approximate judgement with a distinguishing fact, indicating the sentence in the obtained document that corresponds to the original-judgement distinguishing fact, and indicating the new-judgement distinguishing fact in the obtained document.
19. A method for similar document retrieval, comprising:
A) receiving an input document;
B) determining at least one approximate judgement with a distinguishing fact for the input document based on the method according to any one of claims 1 to 18; and
C) obtaining a group of similar documents for the input document by using the at least one approximate judgement with a distinguishing fact.
20. The method according to claim 19, wherein
The input document is a radiography report including finding items and diagnosis items, the finding items are selected as the first fact items, and the diagnosis items are selected as the first judgement items.
21. The method according to claim 19, wherein
The input document is a shell folder including items of interest to the user and travel destination items, the items of interest to the user are selected as the first fact items, and the travel destination items are selected as the first judgement items.
22. The method according to claim 19, wherein
The input document is a product introduction including product parameter items and product type items, the product parameter items are selected as the first fact items, and the product type items are selected as the first judgement items.
23. A device for determining an approximate judgement with a distinguishing fact, comprising:
A) a document obtaining unit for obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predetermined type;
B) a judgement item and fact item extraction unit for extracting judgement items and fact items;
C) a document analysis unit for extracting the first judgement item and first fact items from the obtained document by using the judgement item and fact item extraction unit, wherein each first fact item is information associated with the first judgement item;
D) a similar document analysis unit for obtaining a first group of similar documents by using the first judgement item and the first fact items, and for extracting, from the first group of similar documents by using the judgement item and fact item extraction unit, second judgement items different from the first judgement item and second fact items;
E) a detection unit for approximate judgements with distinguishing facts, for detecting at least one approximate judgement with a distinguishing fact by using the first group of similar documents, the second judgement items and the second fact items, wherein:
The at least one approximate judgement with a distinguishing fact consists of a distinguishing fact and an approximate judgement;
The distinguishing fact indicates the difference between the first judgement item and a second judgement item; and
The approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance is determined according to the distinguishing fact between the first judgement item and the second judgement item.
24. The device according to claim 23, wherein the judgement item and fact item extraction unit further comprises:
A keyword extraction unit for extracting keywords from the document;
A judgement item identification unit for identifying the judgement items among the extracted keywords; and
A fact item selection unit for selecting the remaining keywords as the fact items.
25. The device according to claim 24, wherein the judgement item identification unit further comprises at least one of the following units:
A unit for selecting keywords from a judgement item entry field as judgement items;
A unit for selecting keywords as judgement items according to a predetermined configuration; and
A unit for having a user select keywords as judgement items.
26. The device according to claim 25, wherein the unit for selecting keywords as judgement items according to a predetermined configuration further comprises:
A unit for selecting the keywords in a sentence, wherein the sentence expresses a subjective judgement and/or an objective result.
27. The device according to claim 23, wherein the detection unit for approximate judgements with distinguishing facts further comprises:
1) an original-judgement distinguishing fact extraction unit for extracting an original-judgement distinguishing fact for each second judgement item;
2) a new-judgement distinguishing fact extraction unit for extracting a new-judgement distinguishing fact for each second judgement item;
3) a change distance calculation unit for calculating the change distance between each second judgement item and the first judgement item by using the extracted original-judgement distinguishing facts and new-judgement distinguishing facts; and
4) a first approximate judgement generation unit for generating the approximate judgement with a distinguishing fact by using the second judgement items whose change distance is less than the predetermined first threshold.
28. The device according to claim 27, wherein the original-judgement distinguishing fact extraction unit further comprises:
A target fact item selection unit for selecting one of the first fact items as a target fact item;
A sensitivity calculation unit for calculating the sensitivity of the target fact item, comprising:
A second-group similar document obtaining unit for obtaining a second group of similar documents by using the first fact items with the target fact item deleted;
A third judgement item and third fact item extraction unit for extracting, from the second group of similar documents, third judgement items different from the first judgement item and third fact items; and
A sensitivity calculation subunit for calculating the sensitivity by using the distribution of the third judgement items in the second group of similar documents and the distribution of the second judgement items in the first group of similar documents; and
An original-judgement distinguishing fact selection unit for selecting the target fact item as the original-judgement distinguishing fact if the calculated sensitivity is equal to or greater than a predetermined second threshold.
29. The device according to claim 28, wherein the new-judgement distinguishing fact extraction unit further comprises:
A correlation calculation unit for calculating the correlation of each third fact item by using the ratio at which the third fact item occurs together with the corresponding third judgement item in the second group of similar documents; and
A new-judgement distinguishing fact selection unit for selecting a third fact item as the new-judgement distinguishing fact if the correlation of the third fact item is equal to or greater than a predetermined third threshold.
30. The device according to claim 23, wherein the detection unit for approximate judgements with distinguishing facts further comprises:
1) a fact distance calculation unit for calculating the fact distance of each document in the first group of similar documents by calculating the distance between that document and the obtained document, wherein the distance between each document in the first group of similar documents and the obtained document is calculated by counting the fact items that differ between the two documents;
2) a judgement item distance calculation unit for calculating the judgement item distance of each second judgement item by using the calculated fact distance of each document in the first group of similar documents to calculate the change distance between each second judgement item and the first judgement item, wherein the change distance between each second judgement item and the first judgement item is calculated by averaging the fact distances of the documents in the first group of similar documents;
3) a second judgement item selection unit for selecting a second judgement item as the approximate judgement if the judgement item distance of the second judgement item is equal to or less than a predetermined fourth threshold; and
4) a distinguishing fact extraction unit for extracting the distinguishing fact of the approximate judgement by identifying the fact items that differ between the first fact items and the facts of the approximate judgement.
31. The device according to claim 23, wherein the detection unit for approximate judgements with distinguishing facts further comprises:
1) a candidate approximate judgement generation unit for generating, for each document, a candidate approximate judgement with a distinguishing fact by identifying the fact items that differ between the first fact items and the facts of each document in the first group of similar documents;
2) an approximate judgement extraction unit for extracting the approximate judgement with a distinguishing fact by using the candidate approximate judgements with distinguishing facts, comprising:
A transition graph generation unit for generating a transition graph, wherein each end node in the transition graph is a judgement item, and each non-end node in the transition graph is a fact item;
A candidate approximate judgement arrangement unit for arranging all candidate approximate judgements with distinguishing facts in the transition graph, wherein each path connecting two end nodes in the transition graph represents one candidate approximate judgement with a distinguishing fact;
An importance calculation unit for calculating the importance of each edge in the transition graph by recording the connection frequency of each edge connecting any two nodes in the transition graph;
An important edge identification unit for identifying the important edges whose importance is equal to or greater than a predetermined fifth threshold;
A distinguishing path generation unit for generating at least one distinguishing path, wherein the distinguishing path consists of important edges and connects a second judgement item to the first judgement item; and
A conversion unit for converting each distinguishing path into the approximate judgement with a distinguishing fact.
32. The device according to claim 23, wherein the detection unit for approximate judgements with distinguishing facts further comprises:
1) a candidate distinguishing fact generation unit for generating candidate distinguishing facts, comprising:
A candidate original-judgement distinguishing fact specifying unit for specifying candidate original-judgement distinguishing facts by using first fact items that differ from the second fact items;
A candidate new-judgement distinguishing fact specifying unit for specifying candidate new-judgement distinguishing facts by using second fact items that differ from the first fact items, wherein the sum of the number of candidate original-judgement distinguishing facts and the number of candidate new-judgement distinguishing facts is equal to a predetermined number;
2) a candidate distinguishing fact verification unit for verifying the candidate distinguishing facts in the first group of similar documents, comprising:
A document identification unit for identifying the documents in the first group of similar documents that contain the candidate new-judgement distinguishing facts but do not contain the candidate original-judgement distinguishing facts, wherein the judgement items of the identified documents are different from the first judgement item; and
A candidate distinguishing fact marking unit for marking the candidate distinguishing facts as verified if one of the judgement items of the identified documents is a concentrated judgement item, wherein the ratio of the documents corresponding to the concentrated judgement item among all identified documents is equal to or greater than a predetermined sixth threshold; and
3) a second approximate judgement generation unit for generating the approximate judgement with a distinguishing fact, comprising:
A verified candidate distinguishing fact selection unit for selecting the verified candidate distinguishing facts as the distinguishing fact; and
A concentrated judgement item selection unit for selecting the concentrated judgement item as the approximate judgement.
33. The device according to claim 23, wherein the detection unit for approximate judgements with distinguishing facts further comprises:
1) a change tree obtaining unit for obtaining a change tree for the obtained document, wherein the change tree is structured data specific to a set of knowledge information related to the obtained document, each non-end node is a fact item, and each end node is a judgement item; and
2) a third approximate judgement generation unit for generating the approximate judgement with a distinguishing fact by selecting a path linking two end nodes in the obtained change tree.
34. The device according to any one of claims 27 to 33, wherein the detection unit for approximate judgements with distinguishing facts further comprises:
A similar distinguishing fact detection unit for detecting similar distinguishing facts;
A similar distinguishing fact merging unit for merging the similar distinguishing facts; and
An approximate judgement adjustment unit for adjusting the approximate judgement with a distinguishing fact by using the merged distinguishing facts.
35. The device according to claim 23, further comprising a first approximate judgement display unit for presenting the approximate judgement with a distinguishing fact by outputting a list of all approximate judgements with distinguishing facts.
36. The device according to claim 23, further comprising a second approximate judgement display unit for presenting the approximate judgement with a distinguishing fact by:
Outputting the approximate judgements with distinguishing facts whose change distance is less than a predetermined seventh threshold, or
Outputting a predetermined number of approximate judgements with distinguishing facts that have the smallest change distances.
37. The device according to claim 23, further comprising a third approximate judgement display unit for presenting the approximate judgement with a distinguishing fact, which further comprises:
A coverage rate calculation unit for calculating the coverage rate of each approximate judgement with a distinguishing fact, wherein the coverage rate is the ratio, within the first group of similar documents, of the documents that match the approximate judgement with a distinguishing fact; and
An approximate judgement output unit for outputting the approximate judgements with distinguishing facts whose coverage rate is equal to or greater than a predetermined eighth threshold, or for outputting a predetermined number of approximate judgements with distinguishing facts that have the largest coverage rates.
38. The device according to claim 33, further comprising a fourth approximate judgement display unit for presenting the approximate judgement with a distinguishing fact by outputting the change tree together with the approximate judgement with a distinguishing fact.
39. The device according to claim 23, further comprising a fact difference display unit for presenting the fact difference between the facts of the first judgement item and the facts of the approximate judgement, wherein the fact difference causes the change from the first judgement item to the approximate judgement.
40. The device according to any one of claims 27 to 29, further comprising an indicating unit for, for each approximate judgement with a distinguishing fact, indicating the sentence in the obtained document that corresponds to the original-judgement distinguishing fact, and indicating the new-judgement distinguishing fact in the obtained document.
41. A device for similar document retrieval, comprising:
A) an input document receiving unit for receiving an input document;
B) the device for determining an approximate judgement with a distinguishing fact according to any one of claims 23 to 40, for determining at least one approximate judgement with a distinguishing fact for the input document; and
C) a similar document obtaining unit for obtaining a group of similar documents for the input document by using the at least one approximate judgement with a distinguishing fact.
42. The device according to claim 41, wherein
The input document is a radiography report including finding items and diagnosis items, the finding items are selected as the first fact items, and the diagnosis items are selected as the first judgement items.
43. The device according to claim 41, wherein
The input document is a shell folder including items of interest to the user and travel destination items, the items of interest to the user are selected as the first fact items, and the travel destination items are selected as the first judgement items.
44. The device according to claim 41, wherein
The input document is a product introduction including product parameter items and product type items, the product parameter items are selected as the first fact items, and the product type items are selected as the first judgement items.
CN201410587566.9A 2014-10-28 2014-10-28 Method and apparatus for determining the approximate judgement for having distinctiveness true Active CN105630788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410587566.9A CN105630788B (en) 2014-10-28 2014-10-28 Method and apparatus for determining the approximate judgement for having distinctiveness true

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410587566.9A CN105630788B (en) 2014-10-28 2014-10-28 Method and apparatus for determining the approximate judgement for having distinctiveness true

Publications (2)

Publication Number Publication Date
CN105630788A CN105630788A (en) 2016-06-01
CN105630788B true CN105630788B (en) 2019-05-03

Family

ID=56045742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410587566.9A Active CN105630788B (en) 2014-10-28 2014-10-28 Method and apparatus for determining the approximate judgement for having distinctiveness true

Country Status (1)

Country Link
CN (1) CN105630788B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362735B (en) * 2019-07-15 2022-05-13 北京百度网讯科技有限公司 Method and device for judging the authenticity of a statement, electronic device, readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101567011A (en) * 2008-04-22 2009-10-28 株式会社Ntt都科摩 Document processing device and document processing method
CN103294671A (en) * 2012-02-22 2013-09-11 腾讯科技(深圳)有限公司 Document detection method and system
CN103903164A (en) * 2014-03-25 2014-07-02 华南理工大学 Semi-supervised automatic aspect extraction method and system based on domain information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007219880A (en) * 2006-02-17 2007-08-30 Fujitsu Ltd Reputation information processing program, method, and apparatus

Also Published As

Publication number Publication date
CN105630788A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
Liu et al. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations
CN105760495B (en) A kind of knowledge based map carries out exploratory searching method for bug problem
Chuang et al. Topic model diagnostics: Assessing domain relevance via topical alignment
CN104573130B (en) The entity resolution method and device calculated based on colony
Chou et al. PaperVis: Literature review made easy
CN101566997A (en) Determining words related to given set of words
CN101223525A (en) Relationship networks
Laenen et al. Web search of fashion items with multimodal querying
JP2015532495A (en) System and method for presenting and navigating network data sets
Strötgen et al. TimeTrails: a system for exploring spatio-temporal information in documents
CN106095738A (en) Recommendation tables single slice
CN112966091A (en) Knowledge graph recommendation system fusing entity information and heat
Zigkolis et al. Collaborative event annotation in tagged photo collections
Li et al. Attribute-aware explainable complementary clothing recommendation
Yang et al. Managing discoveries in the visual analytics process
JPWO2010013472A1 (en) Data classification system, data classification method, and data classification program
CN105630788B (en) Method and apparatus for determining the approximate judgement for having distinctiveness true
Villaespesa et al. A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection
KR20190023503A (en) Image based patent search apparatus
JP5117589B2 (en) Document analysis apparatus and program
Nguyen et al. Social tagging analytics for processing unlabeled resources: A case study on non-geotagged photos
JP2014102625A (en) Information retrieval system, program, and method
Jayashree et al. Multimodal web page segmentation using self-organized multi-objective clustering
Yoon et al. A conference paper exploring system based on citing motivation and topic
Pocco et al. DRIFT: A visual analytic tool for scientific literature exploration based on textual and image content

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant