CN105630788A - Method and device for determining approximate judgment with distinctive truth - Google Patents


Info

Publication number
CN105630788A
Authority
CN
China
Prior art keywords
item
true
distinctiveness
judgement
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410587566.9A
Other languages
Chinese (zh)
Other versions
CN105630788B (en)
Inventor
张碧川
黄耀海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to CN201410587566.9A priority Critical patent/CN105630788B/en
Publication of CN105630788A publication Critical patent/CN105630788A/en
Application granted granted Critical
Publication of CN105630788B publication Critical patent/CN105630788B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention relates to a method and a device for determining approximate judgment with distinctive truth. The method comprises the steps of acquiring a document, wherein the acquired document comprises a first judgment item, and the first judgment item is keywords of a predetermined type; extracting the first judgment item and first truth items from the acquired document, wherein each first truth item is the related information of the first judgment item; acquiring a first group of similar documents by using the first judgment item and the first truth items, and extracting a second judgment item different from the first judgment item and second truth items from the first group of similar documents; and detecting at least one approximate judgment with the distinctive truth by using the first group of similar documents, the second judgment item and the second truth items.

Description

Method and device for determining an approximate judgment with distinctive facts
Technical field
The present invention relates to searching for similar documents, and in particular to searching for previously created documents that are similar to a currently input document.
Background art
Users often need to make a judgment or decision with the help of documents at hand. For example, a doctor can give a diagnosis by referring to existing diagnosis reports, a traveller can use a travel brochure to choose a destination, and a customer can decide which product to buy by referring to product introductions. A user can search for documents similar to the current document to support the judgment and to see what judgments or decisions were made in similar cases in the past.
For example, in similar document search, the documents most similar to a given input document can be determined and returned as the output result.
US2013/0044925 proposes a similar case retrieval device and a similar case retrieval method. In the method of that patent application, a judgment item is a keyword of a predefined type; it is the core keyword that the user wants to determine. A fact item is information of a specified type that is associated with the judgment item. For an application involving diagnosis reports, a diagnosis item, such as a disease outcome or disease result, is selected as the judgment item, and finding items are selected as fact items. In that method, a diagnostic tree created from the diagnosis items and finding items is used for the search.
Fig. 1A is a flow chart of the similar case retrieval method in prior-art patent application US2013/0044925. With reference to Fig. 1A, in step 110, an input document is received. In step 120, the judgment items and fact items of the input document are extracted. In step 130, the extracted judgment items and fact items of the input document are used to retrieve a group of similar documents.
Fig. 1B is a flow chart of the process in US2013/0044925 of retrieving a group of similar documents using the extracted judgment items and fact items of the input document. With reference to Fig. 1B, in step 131, the relations between the judgment items and fact items are extracted. Then, in step 132, judgment items and fact items are selected based on the extracted relations to build a diagnostic tree. Finally, in step 133, the diagnostic tree is used to retrieve similar documents from a document database.
Fig. 1C is a schematic diagram of the diagnostic tree in prior-art patent application US2013/0044925. With the method of US2013/0044925, a diagnostic tree such as that shown in Fig. 1C can be used to retrieve documents similar to the input document from a document database.
Patent US8,352,416 proposes another method for searching for similar documents. That patent mainly concerns diagnosis report search and performs the search using structures composed of diagnosis results and finding items. For example, symptoms that frequently occur together and a disease may form a structure. If an earlier document in the document database has the same structure as the input document, that document is likely to be retrieved.
Fig. 2A is a flow chart of the similar document search method in patent US8,352,416. With reference to Fig. 2A, in step 210, an input document is received. In step 220, the judgment items and fact items of the input document are extracted. In step 230, the extracted judgment items and fact items of the input document are used to retrieve a group of similar documents.
Fig. 2B is a flow chart of the process in patent US8,352,416 of retrieving a group of similar documents using the extracted judgment items and fact items of the input document. With reference to Fig. 2B, in step 231, the relations between the judgment items and fact items are extracted. Then, in step 232, judgment items and fact items that have a predetermined relation type are selected as a structure. Finally, in step 233, the structure is used to retrieve similar documents from the document database.
Fig. 2C is a schematic diagram of the structure used in prior-art patent US8,352,416. The structure of Fig. 2C shows semantic units and their counts; the semantic units include descriptions of symptoms and names of diagnosed diseases. Based on these counts, combinations of keywords that include a desired keyword can be extracted, and entries other than the desired keyword can be extracted from those combinations as related keywords. Diagnosis reports that include one or both of the desired keyword and the related keywords can then be retrieved. With the method of US8,352,416, documents similar to the input document can be retrieved from a document database.
In the similar document search methods of US2013/0044925 and US8,352,416 and in other prior-art methods, keywords are extracted from the input document and the relations between the keywords are then analyzed in order to find similar documents that contain similar keywords with similar relations. These methods simply present a list of documents as the result and do not take into account the user's actual purpose in searching.
Searching for similar documents differs from searching with a query. When a user searches for documents with a query, the query reflects the user's purpose and the aspects the user cares about. However, when a user searches for similar documents using a document, he or she is still concerned with only one aspect, namely the judgment item of that document.
With the prior-art methods, only a list of documents is returned to the user. The results mainly contain the same judgment item as the input document, and similar documents with different judgment items are not provided. If the user wants to compare judgment items, he or she has to read many documents, which is time-consuming.
The judgment items in the search results retrieved by prior-art methods are substantially the same as the judgment item in the input document. Returning documents with the same judgment item is necessary, but returning similar documents with different judgment items is more useful. For example, a doctor gives a diagnosis when writing a report. It is useful to return reports that have very similar finding items but different diagnosis results, for example reports with the same patient examination indices and the same patient symptoms but a different disease. Such reports give the doctor an important signal that the diagnosis in this case should be made with caution.
Therefore, a new technique is desired that solves at least one of the problems of the prior art.
Summary of the invention
An object of the present invention is to provide valuable information that matches the user's actual search purpose.
Another object of the present invention is to save the user's reading time by organizing the search results.
According to one aspect of the present invention, there is provided a method for determining an approximate judgment with distinctive facts, comprising: a document obtaining step of obtaining a document, wherein the obtained document contains a first judgment item, and the first judgment item is a keyword of a predefined type; a document analysis step of extracting the first judgment item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgment item; a similar document analysis step of obtaining a first group of similar documents using the first judgment item and the first fact items, and extracting, from the first group of similar documents, second judgment items different from the first judgment item and second fact items; and a detecting step of detecting at least one approximate judgment with distinctive facts by using the first group of similar documents, the second judgment items and the second fact items, wherein: a distinctive fact indicates a difference between the first judgment item and a second judgment item; and the approximate judgment is one of the second judgment items, the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, and the change distance indicates the level of difficulty of distinguishing the first judgment item from the second judgment item.
According to another aspect of the present invention, there is provided a method for similar document search, comprising: receiving an input document; determining at least one approximate judgment with distinctive facts for the input document, based on the above method for determining an approximate judgment with distinctive facts; and obtaining a group of similar documents for the input document using the at least one approximate judgment with distinctive facts.
According to a further aspect of the present invention, there is provided a device for determining an approximate judgment with distinctive facts, comprising: a document obtaining unit for obtaining a document, wherein the obtained document contains a first judgment item, and the first judgment item is a keyword of a predefined type; a judgment item and fact item extraction unit for extracting judgment items and fact items; a document analysis unit for extracting the first judgment item and first fact items from the obtained document using the judgment item and fact item extraction unit, wherein each first fact item is information associated with the first judgment item; a similar document analysis unit for obtaining a first group of similar documents using the first judgment item and the first fact items, and for extracting, using the judgment item and fact item extraction unit, second judgment items different from the first judgment item and second fact items from the first group of similar documents; and an approximate judgment detection unit for detecting at least one approximate judgment with distinctive facts by using the first group of similar documents, the second judgment items and the second fact items, wherein: a distinctive fact indicates a difference between the first judgment item and a second judgment item; and the approximate judgment is one of the second judgment items, the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, and the change distance indicates the level of difficulty of distinguishing the first judgment item from the second judgment item.
According to a further aspect of the present invention, there is provided a device for similar document search, comprising: an input document receiving unit for receiving an input document; the above device for determining an approximate judgment with distinctive facts, for determining at least one approximate judgment with distinctive facts for the input document; and a similar document obtaining unit for obtaining a group of similar documents for the input document using the at least one approximate judgment with distinctive facts.
One advantage of the present invention is that it provides valuable information that matches the user's actual search purpose.
A further advantage is that the search results can be organized, which saves the user's reading time.
Further features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.
Fig. 1A is a flow chart of the similar case retrieval method in prior-art US2013/0044925.
Fig. 1B is a flow chart of the process in US2013/0044925 of retrieving a group of similar documents using the extracted judgment items and fact items of the input document.
Fig. 1C is a schematic diagram of the diagnostic tree in prior-art US2013/0044925.
Fig. 2A is a flow chart of the similar document search method in patent US8,352,416.
Fig. 2B is a flow chart of the process in patent US8,352,416 of retrieving a group of similar documents using the extracted judgment items and fact items of the input document.
Fig. 2C is a schematic diagram of the structure used in patent US8,352,416.
Fig. 3 is a schematic block diagram of the hardware configuration of a computer system 1000 that can implement embodiments of the present invention.
Fig. 4 is a flow chart of a process for determining an approximate judgment with distinctive facts according to an embodiment of the present invention.
Fig. 5 shows an example of a radiography report.
Fig. 6 is a flow chart of a process for extracting approximate judgments with distinctive facts by traversing fact items according to an embodiment of the present invention.
Fig. 7 is a flow chart of a process for extracting original-judgment distinctive facts for each second judgment item according to an embodiment of the present invention.
Fig. 8 is a flow chart of a process for extracting new-judgment distinctive facts for each second judgment item according to an embodiment of the present invention.
Fig. 9 is a flow chart of a process for extracting approximate judgments with distinctive facts based on the minimum change distance according to an embodiment of the present invention.
Fig. 10 is a flow chart of a process for extracting approximate judgments with distinctive facts using important paths according to an embodiment of the present invention.
Fig. 11 is a schematic diagram of an example of generating candidate approximate judgments with distinctive facts.
Fig. 12 is a schematic diagram of an example of important path mining.
Fig. 13 is a flow chart of a process for extracting approximate judgments with distinctive facts by changing fact items according to an embodiment of the present invention.
Fig. 14 is a flow chart of a process for extracting approximate judgments with distinctive facts using a change tree according to an embodiment of the present invention.
Fig. 15 is a schematic diagram of an example of a change tree.
Fig. 16 is a flow chart of a method for similar document search according to an embodiment of the present invention.
Fig. 17 is a functional block diagram of a device 4000 for determining an approximate judgment with distinctive facts according to an embodiment of the present invention.
Fig. 18 is a functional block diagram of a device 5000 for similar document search according to an embodiment of the present invention.
Detailed description of the embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
Techniques, methods and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be discussed further in subsequent figures.
Fig. 3 is a schematic block diagram of the hardware configuration of a computer system 1000 that can implement embodiments of the present invention. The method of the present invention can be implemented on the hardware of the computer system 1000.
As shown in Fig. 3, the computer system includes a computer 1110. The computer 1110 includes a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190 and an output peripheral interface 1195, which are connected via a system bus 1121.
The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136 and some program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146 and some program data 1147.
Removable non-volatile memories, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices, such as a mouse 1161 and a keyboard 1162, are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 can include a memory 1181, such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The output peripheral interface 1195 is connected to a printer 1196 and a speaker 1197.
The computer system shown in Fig. 3 is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
The computer system shown in Fig. 3 can be incorporated in any embodiment, either as a stand-alone computer or as a processing system in a device; one or more unnecessary components can be removed from it, and one or more additional components can be added to it.
Fig. 4 is a flow chart of a process for determining an approximate judgment with distinctive facts according to an embodiment of the present invention.
As shown in Fig. 4, in step 2100, an input document is obtained. In this application, the types of input document include, but are not limited to, radiography reports, travel brochures and product introductions.
Here, a radiography report is used as an example. Fig. 5 shows an example of a radiography report.
The document can contain keywords that can be classified as judgment items and fact items. The keywords in the input document are referred to as the first judgment item and the first fact items. In one embodiment, a judgment item can be a keyword of a predefined type.
Next, in step 2200, the first judgment item and the first fact items are extracted from the obtained document.
Several existing NLP techniques can be used to extract keywords from a document, such as entity recognition, topic extraction and keyword extraction. After the keywords have been extracted from the input document, it is important to identify which keywords are judgment items.
For a radiography report, document section information can be used to select the judgment item. For example, in a radiography report, the keywords in the "diagnosis" section can be selected as judgment items; this document section information is the entry field information of the judgment item. For example, the judgment item entry field can be a diagnosis result, a product type, a destination, and so on.
Alternatively and/or additionally, judgment items can be selected according to a predetermined configuration or rule. In one embodiment, selecting keywords as judgment items according to a predetermined configuration can also include selecting keywords in a sentence that expresses a subjective judgment and/or an objective result. For example, if the context of a keyword expresses a subjective judgment, the keyword can be selected as a judgment item.
Alternatively and/or additionally, users can themselves define which keywords are judgment items. For example, before searching, a doctor can select some keywords to be highlighted, select the diseases among these keywords as judgment items, and select the other information or symptoms (findings) as fact items. Each fact item is information associated with the judgment item.
In one embodiment, extracting fact items and judgment items from a document can also include: extracting keywords from the document; and identifying the judgment items among the extracted keywords and selecting the remaining keywords as fact items.
In one embodiment, keywords can be extracted from the document by at least one of the following: using a stored dictionary that includes judgment items and fact items; using document layout information; and using an extraction model trained on prepared training data.
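The dictionary-based and section-based options described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_items`, the representation of a report as a dictionary of sections, and the fact item "calcification: none" are assumptions introduced here. Keywords found in the "diagnosis" section are treated as judgment items and all remaining keywords as fact items.

```python
def extract_items(report, dictionary, judgment_section="diagnosis"):
    """Return (judgment_items, fact_items) found in a sectioned report.

    report: dict mapping a section name to its text (assumed layout).
    dictionary: iterable of known keywords (judgment and fact items).
    """
    judgment_items, fact_items = set(), set()
    for section, text in report.items():
        lowered = text.lower()
        for keyword in dictionary:
            if keyword.lower() in lowered:
                if section == judgment_section:
                    judgment_items.add(keyword)
                else:
                    fact_items.add(keyword)
    # Keywords identified as judgment items are not repeated as fact items.
    fact_items -= judgment_items
    return sorted(judgment_items), sorted(fact_items)


# Illustrative data only; "calcification: none" is a made-up fact item.
dictionary = ["pulmonary carcinoma", "lung abscess",
              "tuberosity: irregular", "calcification: none"]
report = {
    "findings": "Tuberosity: irregular. Calcification: none.",
    "diagnosis": "Suspected pulmonary carcinoma.",
}
judgment_items, fact_items = extract_items(report, dictionary)
print(judgment_items)
print(fact_items)
```

Plain substring matching is used only to keep the sketch short; an entity recognizer or a trained extraction model could take its place without changing the split into judgment items and fact items.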
Table 1 shows an example of the judgment item and fact items of a radiography report.
Table 1: Example of the judgment item and fact items of a radiography report
Next, in step 2300, the first judgment item and the first fact items are used to obtain a first group of similar documents, and second judgment items different from the first judgment item, together with second fact items, are extracted from the first group of similar documents.
Many methods are known in the prior art for obtaining the first group of similar documents using the first judgment item and the first fact items. In one embodiment, the first group of similar documents can be input directly by the user. Alternatively, the first group of similar documents can be obtained by retrieval, for example using the method of U.S. Patent No. US8,352,416. For the example of Table 1, the first group of similar documents consists of the 143 similar documents shown in Table 2, where the documents are labelled by judgment item in order to show the distribution of judgments among these documents. In addition, judgment items and fact items are extracted from the first group of similar documents using the same method as in step 2200. The keywords in the first group of similar documents are referred to as the second judgment items and the second fact items.
Table 2: Example of the first group of similar documents
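Labelling a group of similar documents by judgment item and reading off the distribution can be sketched as below. The function name `judgment_distribution`, the document layout, and the 20-document toy group are assumptions made for illustration; the toy counts are not the contents of Table 2.

```python
from collections import Counter


def judgment_distribution(documents, exclude=None):
    """Share of each judgment item in the group, as a fraction of all documents.

    The judgment item given in `exclude` (typically the first judgment item)
    is left out of the returned mapping but still counts toward the total.
    """
    counts = Counter(doc["judgment"] for doc in documents)
    total = len(documents)
    return {j: n / total for j, n in counts.items() if j != exclude}


# Toy group of 20 documents (illustrative counts only).
first_group = (
    [{"judgment": "pulmonary carcinoma"}] * 14
    + [{"judgment": "bronchiectasis"}] * 3
    + [{"judgment": "lung abscess"}] * 1
    + [{"judgment": "emphysema"}] * 2
)
second_judgment_shares = judgment_distribution(first_group, exclude="pulmonary carcinoma")
print(second_judgment_shares)
```

The returned shares of the judgment items that differ from the first judgment item are the quantities that enter the sensitivity calculation described later.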
Next, in step 2400, at least one approximate judgment with distinctive facts is detected by using the first group of similar documents, the second judgment items and the second fact items, wherein: a distinctive fact indicates a difference between the first judgment item and a second judgment item; the approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, where the change distance indicates the level of difficulty of distinguishing the first judgment item from the second judgment item. Note that the predetermined first threshold can be defined empirically by the user.
The key point of the present invention is finding approximate judgments with distinctive facts. However, the true approximate judgments and distinctive facts may not coincide with the results obtained by the present invention. The present invention does not aim to find the true approximate judgments, because doing so requires very deep domain knowledge and is extremely difficult even for people. For example, it is difficult for a doctor to determine the core differentiating symptoms for distinguishing two similar diseases, since these depend on the patient's age, sex, location and medical history.
In the present invention, document analysis techniques alone are used to detect, from the first group of similar documents, the approximate judgments and distinctive facts for the current input document. It is assumed here that the patient's age, sex, location and medical history are implied in the documents in the form of keywords.
Next, the detailed process of detecting at least one approximate judgment with distinctive facts by using the first group of similar documents, the second judgment items and the second fact items will be described.
According to one aspect of the present invention, approximate judgments with distinctive facts can be detected by traversing the fact items. In this process, each fact item of the input document is checked in order to identify which fact items are distinctive facts.
Fig. 6 is a flow chart of a process for extracting approximate judgments with distinctive facts by traversing fact items according to an embodiment of the present invention.
With reference to Fig. 6, in step 2410, original-judgment distinctive facts are extracted for each second judgment item.
Fig. 7 is a flow chart of a process for extracting original-judgment distinctive facts for each second judgment item according to an embodiment of the present invention.
As shown in Fig. 7, in step 2411, one of the first fact items is selected as a target fact item. Next, in step 2412, the sensitivity of the target fact item is calculated.
In one embodiment, calculating the sensitivity of the target fact item may include: obtaining a second group of similar documents using the first fact items with the target fact item deleted; extracting, from the second group of similar documents, third judgment items different from the first judgment item and third fact items; and calculating the sensitivity using the distribution of the third judgment items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents.
For the example of Table 1, the fact item "tuberosity: irregular" can be deleted from the first fact items. Searching with the remaining fact items then yields 178 documents. Compared with the first group of similar documents, which contains 143 documents, there are 35 additional results, and these are defined as the second group of similar documents. Table 3 shows an example of the second (additional) group of similar documents. The documents in Table 3 are labelled by third judgment item in order to show the distribution of judgments among these documents.
Table 3: Example of the second group of similar documents
If the deleted fact item is a distinctive fact of "pulmonary carcinoma" (the first judgment item), the additional results will contain other diagnosis results; otherwise, the results will still contain the judgment "pulmonary carcinoma". The 35 additional results (the second group of similar documents) are used to check whether the deleted fact item is a distinctive fact.
The sensitivity of the fact item "tuberosity: irregular" with respect to "pulmonary carcinoma" can be calculated as follows.
Sensitivity = (distribution of the third judgment items in the second group of similar documents) / (distribution of the second judgment items in the first group of similar documents)
For example, the 143 results contain three types of diagnosis result that differ from the first judgment item (namely the second judgment items: bronchiectasis, lung abscess and emphysema), and the 35 results contain two types of diagnosis result that differ from the first judgment item (namely the third judgment items: bronchiectasis and lung abscess).
Sensitivity = (60% + 35%) / (15% + 5% + 10%) = 95% / 30% ≈ 3.17
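The arithmetic of the worked example can be reproduced with a few lines of code. This sketch reads "distribution" as the summed share of the judgment items that differ from the first judgment item, which is how the numbers above are combined; the function name `sensitivity` is an assumption introduced here.

```python
def sensitivity(third_shares, second_shares):
    """Ratio of the summed shares of differing judgment items in the second
    group of similar documents to the summed shares in the first group."""
    denominator = sum(second_shares)
    if denominator == 0:
        raise ValueError("the first group contains no differing judgment items")
    return sum(third_shares) / denominator


# Shares taken from the worked example in the text.
value = sensitivity([0.60, 0.35], [0.15, 0.05, 0.10])
print(round(value, 2))  # 3.17
```

A value well above 1 means that deleting the target fact item admits proportionally more documents with a different judgment, which is what makes the fact item a candidate distinctive fact.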
Referring back to Fig. 7, in step 2413, if the calculated sensitivity is equal to or greater than a predetermined second threshold, the target fact item can be selected as an original-judgment distinctive fact. Note that this threshold can be defined by users according to their experience.
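Steps 2411 to 2413 can be put together as a loop over the first fact items. The following is a sketch under stated assumptions, not the patented retrieval method: `search` is a toy stand-in that matches documents containing every query fact item, the corpus and the fact item "cough: exist" are invented for illustration, and target fact items are skipped when the sensitivity is undefined.

```python
def search(corpus, facts):
    """Toy retrieval (assumption): a document matches if it has every query fact."""
    return [doc for doc in corpus if set(facts) <= doc["facts"]]


def other_share(docs, first_judgment):
    """Fraction of documents whose judgment differs from the first judgment item."""
    if not docs:
        return 0.0
    return sum(doc["judgment"] != first_judgment for doc in docs) / len(docs)


def original_distinctive_facts(corpus, first_judgment, first_facts, second_threshold):
    first_group = search(corpus, first_facts)
    first_ids = {doc["id"] for doc in first_group}
    base = other_share(first_group, first_judgment)
    selected = []
    for target in first_facts:                          # step 2411
        remaining = [f for f in first_facts if f != target]
        # Second group: the additional results obtained without the target.
        second_group = [doc for doc in search(corpus, remaining)
                        if doc["id"] not in first_ids]
        if base == 0 or not second_group:
            continue  # sensitivity undefined; skipping is an assumption
        sens = other_share(second_group, first_judgment) / base   # step 2412
        if sens >= second_threshold:                    # step 2413
            selected.append(target)
    return selected


A, B = "tuberosity: irregular", "cough: exist"  # B is a made-up fact item
rows = [("pulmonary carcinoma", {A, B})] * 3 + [
    ("lung abscess", {A, B}), ("lung abscess", {B}), ("bronchiectasis", {B}),
    ("pulmonary carcinoma", {A}), ("pulmonary carcinoma", {A}),
]
corpus = [{"id": i, "judgment": j, "facts": f} for i, (j, f) in enumerate(rows)]
print(original_distinctive_facts(corpus, "pulmonary carcinoma", [A, B], 2.0))
```

In this toy corpus, deleting "tuberosity: irregular" admits only documents with other judgments (sensitivity 4.0), while deleting "cough: exist" admits only further "pulmonary carcinoma" documents (sensitivity 0), so only the former is selected.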
Referring back to Fig. 6, if be detected that an original judgement distinctiveness is true, we will check second group of similar document further, in order to determines whether there is other approximate judgement. This carries out in step 2420, is wherein that each second judges that item extracts and newly judge the distinctiveness fact.
Fig. 8 illustrates according to the embodiment of the present invention for second judging that item extracts the flow chart of the process newly judging the distinctiveness fact for each.
As shown in Fig. 8, in step 2421, the relevance of each third true item is calculated from the ratio at which that third true item appears together with the corresponding third judgment item in the second group of similar documents.
For example, for the third judgment item "lung abscess", third true items can be extracted; one third true item that is not included in the input document is "hydrothorax: exist". This true item is checked to see whether it is highly correlated with the judgment item "lung abscess". In one embodiment, a relevance value is calculated.
For example, the second group of similar documents contains 12 documents relating to "lung abscess", and 11 of them contain the true item "hydrothorax: exist", so relevance = 11/12 ≈ 0.92.
Next, in step 2422, if the relevance of a third true item is equal to or greater than a predetermined third threshold, that third true item is selected as a new-judgment distinctive fact.
In the above example, because the relevance of "hydrothorax: exist" is greater than the predetermined threshold (for example, 80%, which can be defined by the user based on experience), the true item "hydrothorax: exist" is selected as a new-judgment distinctive fact.
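Steps 2421 and 2422 can be illustrated with the following Python sketch. The document representation (a judgment label plus a set of true items) and the function name are assumptions; the counts 11 of 12 and the 80% threshold are the ones given in the example.

```python
def relevance(documents, judgment, fact):
    """Share of documents with the given judgment that also contain the fact."""
    with_judgment = [facts for j, facts in documents if j == judgment]
    if not with_judgment:
        return 0.0
    return sum(1 for facts in with_judgment if fact in facts) / len(with_judgment)

# 12 "lung abscess" documents, 11 of which contain "hydrothorax: exist".
# The bronchiectasis documents are hypothetical filler.
docs = (
    [("lung abscess", {"hydrothorax: exist"})] * 11
    + [("lung abscess", set())]
    + [("bronchiectasis", set())] * 3
)

score = relevance(docs, "lung abscess", "hydrothorax: exist")
THIRD_THRESHOLD = 0.80
is_new_distinctive = score >= THIRD_THRESHOLD
```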
Next, referring back to Fig. 6, in step 2430, the change distance between each second judgment item and the first judgment item can be calculated using the extracted original-judgment distinctive facts and new-judgment distinctive facts.
In the above example, the first judgment item is "pulmonary carcinoma" and one of the second judgment items is "lung abscess"; the change distance can be calculated as the sum of the number of original-judgment distinctive facts and the number of new-judgment distinctive facts.
Next, in step 2440, approximate judgments with distinctive facts can be generated from the second judgment items whose change distance is less than a predetermined first threshold.
Alternatively, a predetermined number of approximate judgments with distinctive facts having the smallest change distances can be generated.
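Steps 2430 and 2440 amount to counting distinctive facts and then filtering or ranking by that count. The sketch below assumes a simple dictionary layout and a hypothetical first threshold of 3; neither is specified in the description.

```python
def change_distance(original_facts, new_facts):
    """Number of original-judgment plus new-judgment distinctive facts."""
    return len(original_facts) + len(new_facts)

# judgment -> (original-judgment distinctive facts, new-judgment distinctive facts)
candidates = {
    "lung abscess": ({"tuberosity: irregular"}, {"hydrothorax: exist"}),
    "bronchiectasis": ({"tuberosity: irregular"}, set()),
}
distances = {j: change_distance(o, n) for j, (o, n) in candidates.items()}

FIRST_THRESHOLD = 3  # hypothetical value
below_threshold = sorted(j for j, d in distances.items() if d < FIRST_THRESHOLD)

# Alternative: keep only the k judgments with the smallest change distance.
k = 1
top_k = sorted(distances, key=distances.get)[:k]
```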
In addition, if no new-judgment distinctive fact exists, approximate judgments can be detected from the "removed diversity" of the second group of similar documents. "Removed diversity" refers to the fact that the judgments of similar documents are diverse: adding a certain distinctive fact reduces the diversity of the judgments, and a judgment removed in this way is an approximate judgment. The removed diversity can simply be taken as the most frequent diagnosis among the different diagnoses. For example, if 60% of the second group of similar documents relate to "bronchiectasis", and this is higher than a user-defined threshold, the judgment "bronchiectasis" can be determined to be an approximate judgment, and the true item "tuberosity: irregular" can be determined to be its distinctive fact.
In the above example, two approximate judgments with distinctive facts can be found:
" lung abscess ": " tuberosity: irregular ", " hydrothorax: exist ".
" bronchiectasis ": " tuberosity: irregular "
According to another aspect of the present invention, approximate judgments with distinctive facts can be extracted based on minimum change distance. In this process, every judgment item in the similar documents is examined in order to identify which judgment items are approximate judgments.
Fig. 9 is a flowchart of a process for extracting approximate judgments with distinctive facts based on minimum change distance according to an embodiment of the present invention.
As shown in Fig. 9, in step 2510, the fact distance of each document in the first group of similar documents is calculated as the distance between that document and the input document, where this distance is the count of true items that differ between the two documents.
For example, suppose the first group of similar documents contains 100 similar documents with different diagnostic results (second judgment items): 20 documents for bronchiectasis, 35 for lung abscess, 15 for emphysema, 20 for pulmonary tuberculosis and 10 for pneumonia. In one embodiment, the fact distance of each document can be calculated by counting how many of its true items differ from the first true items.
For example, the first emphysema document has 4 true items that differ from the first true items; the second emphysema document has 2; the third emphysema document has 3; and so on.
Next, in step 2520, the judgment item distance of each second judgment item, that is, the change distance between that second judgment item and the first judgment item, is calculated by averaging the fact distances of the documents in the first group of similar documents that carry that judgment item.
In other words, the judgment item distance is the average of the fact distances of all documents of one judgment item. For example, there are 20 similar documents for pulmonary tuberculosis, and the fact distance of each was calculated in step 2510; the judgment item distance of pulmonary tuberculosis is then the sum of these fact distances divided by 20.
In the same way, the judgment item distances calculated in step 2520 for the preceding example are as follows:
Lung abscess: 1.87
Emphysema: 2.48
Pulmonary tuberculosis: 2.68
...
In this example, it can be seen that lung abscess is the judgment item most similar to that of the input document.
Next, in step 2530, if the judgment item distance of a second judgment item is equal to or less than a predetermined fourth threshold, that second judgment item can be selected as an approximate judgment. Note that this threshold can be defined by the user based on experience.
In the above example, the fourth threshold can be defined as 2. Because the judgment item distance of lung abscess is less than this threshold, lung abscess is selected as an approximate judgment.
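Steps 2510 to 2530 can be sketched as follows. The sketch assumes that "differing true items" means the symmetric difference of two sets of true items, which is one plausible reading of the description; the document contents are hypothetical, and only the fourth threshold of 2 comes from the example.

```python
from collections import defaultdict

def fact_distance(input_facts, doc_facts):
    """Count true items present in exactly one of the two documents."""
    return len(set(input_facts) ^ set(doc_facts))

def judgment_distances(input_facts, documents):
    """Average fact distance per judgment item."""
    per_judgment = defaultdict(list)
    for judgment, facts in documents:
        per_judgment[judgment].append(fact_distance(input_facts, facts))
    return {j: sum(d) / len(d) for j, d in per_judgment.items()}

input_facts = {"age: 50", "tuberosity: irregular", "lymph node: enlargement"}
documents = [  # hypothetical similar documents
    ("lung abscess", {"age: 50", "lymph node: enlargement", "hydrothorax: exist"}),
    ("lung abscess", {"age: 50", "tuberosity: irregular",
                      "lymph node: enlargement", "hydrothorax: exist"}),
    ("emphysema", {"age: 50", "shade: exist"}),
    ("emphysema", {"shade: exist"}),
]

distances = judgment_distances(input_facts, documents)
FOURTH_THRESHOLD = 2
approximate = [j for j, d in distances.items() if d <= FOURTH_THRESHOLD]
```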
Next, in step 2540, the distinctive facts of this approximate judgment can be extracted by identifying the true items that differ between the first true items and the true items of the approximate judgment.
For example, among the 35 lung abscess documents, 30 do not contain the first true item "tuberosity: irregular", and 29 contain the true item "hydrothorax: exist", which is not a first true item. Therefore, the true items "tuberosity: irregular" and "hydrothorax: exist" are identified as the distinctive facts of lung abscess.
It can thus be seen that deleting the true item "tuberosity: irregular" and adding the true item "hydrothorax: exist" changes the judgment item from pulmonary carcinoma to lung abscess. This can be written as:
(<tuberosity: irregular> → <hydrothorax: exist>) → (pulmonary carcinoma → lung abscess); Distance=2
Because the true item "tuberosity: irregular" disappears, the change distance in this respect is counted as 1. In addition, the new true item "hydrothorax: exist" appears, so the change distance in this respect is also counted as 1. The total change distance is therefore 2.
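The extraction in step 2540 can be approximated by checking how often each true item is missing from, or newly present in, the documents of the approximate judgment. The description gives the counts (30 of 35 and 29 of 35) but no cut-off, so the `min_ratio` parameter below is a hypothetical addition, as are the function name and the extra true item "age: 50".

```python
def distinctive_facts(input_facts, docs, min_ratio):
    """Return (removed, added) true items for one approximate judgment."""
    n = len(docs)
    if n == 0:
        return set(), set()
    # First true items that are absent from most documents of the judgment.
    removed = {f for f in input_facts
               if sum(f not in d for d in docs) / n >= min_ratio}
    # True items outside the input document that most documents contain.
    others = set().union(*docs) - set(input_facts)
    added = {f for f in others if sum(f in d for d in docs) / n >= min_ratio}
    return removed, added

input_facts = {"age: 50", "tuberosity: irregular"}
# 35 lung abscess documents: 5 keep "tuberosity: irregular",
# 29 contain "hydrothorax: exist".
docs = []
for i in range(35):
    facts = {"age: 50"}
    if i < 5:
        facts.add("tuberosity: irregular")
    if i < 29:
        facts.add("hydrothorax: exist")
    docs.append(facts)

removed, added = distinctive_facts(input_facts, docs, min_ratio=0.8)
distance = len(removed) + len(added)
```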
According to another aspect of the present invention, important path mining can be used to extract approximate judgments with distinctive facts.
Fig. 10 is a flowchart of a process for extracting approximate judgments with distinctive facts using important path mining according to an embodiment of the present invention.
As shown in Fig. 10, in step 2610, a candidate approximate judgment with distinctive facts can be generated for each document in the first group of similar documents by identifying the true items that differ between the first true items and the true items of that document.
For each document whose judgment item differs from that of the input document, the differing judgment item is first assumed to be a candidate approximate judgment, and all differing true items are assumed to be distinctive facts. A candidate approximate judgment with distinctive facts is then generated.
Fig. 11 is a schematic diagram of an example of generating candidate approximate judgments with distinctive facts. In this example, the true items (findings) of the input document include "age: 50", "tuberosity: irregular", "lymph node: enlargement", "sex: women" and "shade: exist", and the judgment item (diagnostic result) is "pulmonary carcinoma".
For this input document, 100 similar documents can be obtained (note that these 100 similar documents are obtained in a different way, so they are unrelated to the 143 documents above), and the judgment items of 70 of them differ from "pulmonary carcinoma". Of these 70 similar documents, 20% have the judgment item bronchiectasis, 35% lung abscess, 15% emphysema, 20% pulmonary tuberculosis and 10% pneumonia. The relation between a similar document with a different judgment item and the input document can be written as "(finding <shade: exist> → 0) → (pulmonary carcinoma → bronchiectasis); Distance=1". This means that deleting the true item "shade: exist" changes the judgment item from "pulmonary carcinoma" to "bronchiectasis", and that the fact distance between the input document and the similar document is 1.
Next, important path mining is applied to the candidate approximate judgments with distinctive facts in order to extract the approximate judgments with distinctive facts. The detailed steps are as follows.
In step 2620, a transition graph can be generated, in which each end node is a judgment item and each non-end node is a true item.
Next, in step 2630, all candidate approximate judgments with distinctive facts can be arranged in the transition graph, where each path connecting two end nodes in the graph represents one candidate approximate judgment with distinctive facts. In other words, if two nodes are included in the same candidate approximate judgment with distinctive facts, an edge is drawn between them, so that paths connecting two judgment item nodes are produced in the transition graph.
Next, in step 2640, the importance of each edge in the transition graph can be calculated by recording how frequently that edge connects its two nodes.
In step 2650, the important edges, whose importance is equal to or greater than a predetermined fifth threshold, are identified. In other words, if the importance of an edge reaches the predetermined threshold, the edge is identified as an important edge. Note that the predetermined fifth threshold can be determined empirically by the user.
Next, in step 2660, at least one distinctive path can be generated, where the distinctive path consists of important edges and connects a second judgment item to the first judgment item.
Fig. 12 is a schematic diagram of an example of important path mining. As shown in Fig. 12, the end nodes in the transition graph are "pulmonary carcinoma" and "pulmonary tuberculosis", which are judgment items. The non-end nodes include "shade: exist", "hydrothorax: exist", "lymph node: enlargement" and "tuberosity: irregular", which are true items. If two nodes are included in the same candidate approximate judgment with distinctive facts, an edge is drawn between them. Important edges are marked with thick lines. The distinctive path runs from "pulmonary carcinoma" to "lymph node: enlargement" to "shade: exist" to "pulmonary tuberculosis".
Finally, in step 2670, each distinctive path is translated into an approximate judgment with distinctive facts.
In the above example, the important path can be translated into:
(finding <lymph node: enlargement> → <shade: exist>) → (pulmonary carcinoma → pulmonary tuberculosis); Distance=2
This means that deleting the true item "lymph node: enlargement" and adding the true item "shade: exist" changes the judgment item from pulmonary carcinoma to pulmonary tuberculosis. In addition, as described above, the change distance is 2.
Therefore, approximate judgments with distinctive facts can be extracted by the process shown in Fig. 10.
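Steps 2620 to 2660 can be sketched with an edge counter and a breadth-first search. Several points are assumptions for the sake of illustration: each candidate is stored as a node sequence from the first judgment item to a second judgment item, edge importance is a raw occurrence count, and the candidate paths and the threshold of 2 are invented.

```python
from collections import Counter, deque

def distinctive_path(candidate_paths, min_count, start, goal):
    """Find a path from start to goal that uses only important edges."""
    edge_counts = Counter()
    for path in candidate_paths:
        for a, b in zip(path, path[1:]):
            if a != b:
                edge_counts[frozenset((a, b))] += 1
    adjacency = {}
    for edge, count in edge_counts.items():
        if count >= min_count:  # keep important edges only
            a, b = tuple(edge)
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    queue, seen = deque([[start]]), {start}
    while queue:  # breadth-first search over important edges
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(adjacency.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

frequent = ["pulmonary carcinoma", "lymph node: enlargement",
            "shade: exist", "pulmonary tuberculosis"]
rare = ["pulmonary carcinoma", "tuberosity: irregular",
        "hydrothorax: exist", "pulmonary tuberculosis"]
path = distinctive_path([frequent] * 3 + [rare], min_count=2,
                        start="pulmonary carcinoma",
                        goal="pulmonary tuberculosis")
```

The non-end nodes of the returned path are the distinctive facts, so the change distance of the resulting approximate judgment is the number of true-item nodes on the path.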
According to another aspect of the present invention, approximate judgments with distinctive facts can be extracted by changing true items. In this process, each true item that differs between the input document and a similar document is examined in order to identify which true items are distinctive facts.
Fig. 13 is a flowchart of a process for extracting approximate judgments with distinctive facts by changing true items according to an embodiment of the present invention.
As shown in Fig. 13, in step 2710, candidate distinctive facts can be generated. Generating candidate distinctive facts may include: designating first true items that differ from the second true items as candidate original-judgment distinctive facts; and designating second true items that differ from the first true items as candidate new-judgment distinctive facts, where the sum of the number of candidate original-judgment distinctive facts and the number of candidate new-judgment distinctive facts equals a predetermined number (that is, a predetermined change distance).
For example, a traveler may want to use the current travel directory for Tokyo Tower to search for similar travel directories. Each travel directory contains certain features of a destination, which can be called the items of interest to the user, and the destination is the place the user wants to compare. Therefore, the destination can be treated as the judgment item, and the items of interest to the user can be treated as true items.
In this step, many similar travel directories can be retrieved that describe information about destinations, such as price, required travel time, travel mode, architectural style and so on.
For each destination, candidate distinctive facts can be generated.
For example, the current destination is Tokyo Tower, and Sensō-ji is the destination being examined. The true items that differ between the two destinations may include:
Tokyo Tower: <price: 200> <building: modern>
Sensō-ji: <price: 100> <building: religion>
Candidate distinctive facts can therefore be generated.
Next, in step 2720, the candidate distinctive facts can be verified in the first group of similar documents. This verification may include: identifying the documents in the first group of similar documents that contain the candidate new-judgment distinctive facts but do not contain the candidate original-judgment distinctive facts, and whose judgment items differ from the first judgment item; and, if one of the judgment items of the identified documents is a concentrated judgment item, marking the candidate distinctive facts as verified, where a concentrated judgment item is one whose documents account for a proportion of all identified documents that is equal to or greater than a predetermined sixth threshold.
For example, consider the candidate distinctive fact (<building: modern> → <building: religion>), which means that a document contains the fact <building: religion> but does not contain the fact <building: modern>. Ten travel directories are found that contain <building: religion> and do not contain <building: modern>, and 9 of them relate to Sensō-ji. This proportion (90%) exceeds the predetermined threshold (for example, 60%), so Sensō-ji is a concentrated judgment item. The candidate distinctive fact (<building: modern> → <building: religion>) is therefore marked as verified, with Sensō-ji as the concentrated judgment item. Note that this threshold can also be defined by the user based on experience.
Next, in step 2730, approximate judgments with distinctive facts can be generated, where the verified candidate distinctive facts are selected as the distinctive facts and the concentrated judgment item is selected as the approximate judgment.
Examples of approximate judgments with distinctive facts in the travel directory search example are as follows.
(1) (<building: modern> → <building: religion>) → (Tokyo Tower → Sensō-ji); Distance=2
(2) (<building: modern> → <building: imperial family>) → (Tokyo Tower → Imperial Palace Plaza); Distance=2
(3) (<travel mode: land> → <travel mode: waterborne>) → (Tokyo Tower → Sumida River cruise); Distance=2
(4) (<time: in 2 hours> → 0) → (Tokyo Tower → Hylotelephium erythrostictum (Miq.) H.Ohba garden country garden); Distance=1
Item (1) means that deleting the true item "building: modern" from the input travel directory and adding the true item "building: religion" changes the judgment item (destination) from Tokyo Tower to Sensō-ji, and that the change distance (the number of distinctive facts) is 2.
Item (2) means that deleting the true item "building: modern" from the input travel directory and adding the true item "building: imperial family" changes the judgment item (destination) from Tokyo Tower to Imperial Palace Plaza, and that the change distance (the number of distinctive facts) is 2.
Item (3) means that deleting the true item "travel mode: land" from the input travel directory and adding the true item "travel mode: waterborne" changes the judgment item (destination) from Tokyo Tower to the Sumida River cruise, and that the change distance (the number of distinctive facts) is 2.
Item (4) means that deleting the true item "time: in 2 hours" from the input travel directory changes the judgment item (destination) from Tokyo Tower to Hylotelephium erythrostictum (Miq.) H.Ohba garden country garden, and that the change distance (the number of distinctive facts) is 1.
Therefore, approximate judgments with distinctive facts can be extracted by changing true items using the method shown in Fig. 13.
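The verification in step 2720 can be sketched as a filter followed by a majority check. The function name and document layout are assumptions, and "other temple" is a hypothetical destination; the counts (9 of 10) and the 60% threshold follow the travel example, in which the temple destination is written here as "Sensō-ji".

```python
from collections import Counter

def verify_candidate(docs, removed_fact, added_fact, first_judgment, min_ratio):
    """Return the concentrated judgment item, or None if not verified."""
    matched = [
        judgment for judgment, facts in docs
        if added_fact in facts
        and removed_fact not in facts
        and judgment != first_judgment
    ]
    if not matched:
        return None
    judgment, count = Counter(matched).most_common(1)[0]
    return judgment if count / len(matched) >= min_ratio else None

docs = (
    [("Sensō-ji", {"building: religion", "price: 100"})] * 9
    + [("other temple", {"building: religion"})]  # hypothetical destination
    + [("Tokyo Tower", {"building: modern", "price: 200"})] * 5
)

SIXTH_THRESHOLD = 0.60
approximate = verify_candidate(
    docs, "building: modern", "building: religion", "Tokyo Tower", SIXTH_THRESHOLD
)
```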
According to another aspect of the present invention, a change tree can be used to extract approximate judgments with distinctive facts. In this method, domain knowledge can be used to improve similar document search.
Fig. 14 is a flowchart of a process for extracting approximate judgments with distinctive facts using a change tree according to an embodiment of the present invention.
As shown in Fig. 14, in step 2810, a change tree for the input document can be obtained, where the change tree is structured data representing a specific group of knowledge related to the input document; each non-end node of the tree is a true item, and each end node is a judgment item.
For example, a customer may want to decide which camera to buy. The customer may consider that the current introduction of one type of card camera is not good enough and may search for similar camera introductions.
In this case, the product type is what the user wants to compare, so the product type can be treated as the judgment item, and the product parameter items can be treated as true items.
In this field, structured knowledge may exist that has been constructed manually or mined by knowledge mining techniques. We call this structured knowledge a change tree. It can be used to organize search results.
Fig. 15 is a schematic diagram of an example of a change tree. In this example, one end node is "card camera". The other end nodes are "compact camera", "SLR camera", "professional camera" and "telephoto camera". The features of the various types of camera, that is, the true items, form the non-end nodes.
Next, in step 2820, an approximate judgment with distinctive facts can be generated by selecting a path that links two end nodes in the obtained change tree.
For example, for the change tree in Fig. 15, the rightmost branch can be selected. This branch can be translated into the following approximate judgment with distinctive facts:
(parameter <optical zoom: 5 times> → parameter <optical zoom: 50 times>) → (card camera → telephoto camera); Distance=2
This approximate judgment with distinctive facts means that deleting the true item "optical zoom: 5 times" from the input product introduction and adding the true item "optical zoom: 50 times" changes the judgment item (product type) from card camera to telephoto camera, and that the change distance (the number of distinctive facts) is 2.
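Reading a rule off a change tree can be sketched as finding the path between two leaves. The tree below is hypothetical, since Fig. 15 is not reproduced here: only the two zoom facts and the two camera types come from the example, while the "lens: ..." nodes and the root are invented. Treating the shared ancestor as unchanged is also an assumption.

```python
def ancestors(node, parent):
    """Chain from a node up to the root, starting with the node itself."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def tree_rule(parent, source, target):
    """Return (removed facts, added facts, change distance) between two leaves."""
    up, down = ancestors(source, parent), ancestors(target, parent)
    shared = set(down)
    common = next(n for n in up if n in shared)  # lowest common ancestor
    removed = up[1:up.index(common)]
    added = down[1:down.index(common)]
    return removed, added, len(removed) + len(added)

# child -> parent; leaves are judgment items, inner nodes are true items.
parent = {
    "card camera": "optical zoom: 5 times",
    "telephoto camera": "optical zoom: 50 times",
    "optical zoom: 5 times": "lens: fixed",
    "optical zoom: 50 times": "lens: fixed",
    "SLR camera": "lens: interchangeable",
    "lens: fixed": "root",
    "lens: interchangeable": "root",
}

removed, added, distance = tree_rule(parent, "card camera", "telephoto camera")
```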
Therefore, approximate judgments with distinctive facts can be extracted based on the process shown in Fig. 14.
Alternatively and/or additionally, extracting approximate judgments with distinctive facts may further include: detecting similar distinctive facts; merging the similar distinctive facts; and adjusting the approximate judgments with distinctive facts using the merged distinctive facts.
For example, the two true items "tumor size: 3.7cm" and "tumor size: 3.9cm" can be merged into one true item "tumor size: 3.5-4.0cm". The merged true item can then be used to adjust the approximate judgments with distinctive facts.
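One simple way to merge numerically similar true items is to map each value to a fixed-width range. The description does not say how the range 3.5 to 4.0 cm is chosen, so the 0.5 cm bin width and the function name below are assumptions.

```python
import math

def merge_tumor_size(value_cm, bin_width=0.5):
    """Map a measured size to the label of the fixed-width range containing it."""
    low = math.floor(value_cm / bin_width) * bin_width
    return f"tumor size: {low:.1f}-{low + bin_width:.1f}cm"

merged = {merge_tumor_size(v) for v in (3.7, 3.9)}
```

Both measurements fall into the same range, so the two true items collapse into one merged true item.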
In one embodiment, the approximate judgments with distinctive facts can be presented by outputting a list of all approximate judgments with distinctive facts.
In one embodiment, the approximate judgments with distinctive facts can be presented by outputting those whose change distance is less than a predetermined seventh threshold, or by outputting a predetermined number of approximate judgments with distinctive facts having the smallest change distances. Note that the predetermined seventh threshold can be determined empirically by the user.
In one embodiment, the approximate judgments with distinctive facts can be presented as follows: the coverage of each approximate judgment with distinctive facts is calculated, where the coverage is the proportion of documents in the first group of similar documents that match that approximate judgment with distinctive facts; then the approximate judgments with distinctive facts whose coverage is equal to or greater than a predetermined eighth threshold are output, or a predetermined number of approximate judgments with distinctive facts having the largest coverage are output. Note that the predetermined eighth threshold can be determined empirically by the user.
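The coverage-based presentation can be sketched as follows. What it means for a document to "match" an approximate judgment is not defined in detail, so the matching rule used here (same judgment item, all added facts present, no removed fact present) is an assumption, as are the sample documents and the eighth threshold of 0.3.

```python
def coverage(rule, documents):
    """Share of documents that match a (removed, added, target) rule."""
    removed, added, target = rule
    if not documents:
        return 0.0
    matches = sum(
        1 for judgment, facts in documents
        if judgment == target and added <= facts and not (removed & facts)
    )
    return matches / len(documents)

rules = [
    (frozenset({"tuberosity: irregular"}), frozenset({"hydrothorax: exist"}), "lung abscess"),
    (frozenset({"tuberosity: irregular"}), frozenset(), "bronchiectasis"),
]
documents = (  # hypothetical first group of similar documents
    [("lung abscess", {"hydrothorax: exist"})] * 4
    + [("lung abscess", {"hydrothorax: exist", "tuberosity: irregular"})]
    + [("bronchiectasis", {"age: 50"})] * 2
    + [("pulmonary carcinoma", {"tuberosity: irregular"})] * 3
)

scores = {rule[2]: coverage(rule, documents) for rule in rules}
EIGHTH_THRESHOLD = 0.3
shown = [target for target, c in scores.items() if c >= EIGHTH_THRESHOLD]
top_1 = sorted(scores, key=scores.get, reverse=True)[:1]
```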
In one embodiment, the approximate judgments with distinctive facts can be presented by outputting the change tree together with the approximate judgments with distinctive facts.
In one embodiment, the fact difference between the true items of the first judgment item and the true items of the approximate judgment can be presented, where this fact difference is what causes the change from the first judgment item to the approximate judgment. Through this process, the user can clearly see which fact differences cause the change from the first judgment item to another judgment item, and can pay more attention to a document if it contains such a fact difference. For example, because "hydrothorax: exist" is the essential difference between "pulmonary carcinoma" and "lung abscess", a doctor should pay more attention to the true item "hydrothorax: exist" if it is present. The doctor can re-examine the true item "hydrothorax: exist" in order to give an accurate diagnosis. This is the actual purpose of the doctor's document search.
In one embodiment, for each approximate judgment with distinctive facts, the sentences in the input document that correspond to the original-judgment distinctive facts can be indicated, and the new-judgment distinctive facts can also be indicated in the input document. Through this process, the important parts of the document can be highlighted, which makes the document easier for the user to read.
In addition, some documents may contain multiple judgment items; for example, a patient may have two diseases at the same time. In this case, the relationship between each judgment item and the true items should be detected, and the judgment items and true items can be classified so as to obtain a series of judgment items, each with its own true items. The input document can then be treated as a combination of two documents, and approximate judgments with distinctive facts can be extracted by the above methods for each judgment item together with its true items.
With the above methods, valuable information that matches the user's actual search purpose can be provided.
Furthermore, the search results can be organized, which saves the user time when reading documents.
Fig. 16 is a flowchart of a method for similar document search according to an embodiment of the present invention.
As shown in Fig. 16, in step 3100, an input document can be obtained. Next, in step 3200, at least one approximate judgment with distinctive facts can be determined for the input document based on the above methods of the present invention. Next, in step 3300, a group of similar documents for the input document can be obtained using the at least one approximate judgment with distinctive facts.
In one embodiment, the input document is a radiology report including finding items and a diagnosis item; the finding items are selected as the first true items, and the diagnosis item is selected as the first judgment item.
In one embodiment, the input document is a travel directory including items of interest to the user and a destination item; the items of interest to the user are selected as the first true items, and the destination item is selected as the first judgment item.
In one embodiment, the input document is a product introduction including product parameter items and a product type item; the product parameter items are selected as the first true items, and the product type item is selected as the first judgment item.
Fig. 17 is a functional block diagram of a device 4000 for determining an approximate judgment with distinctive facts according to an embodiment of the present invention. The device 4000 shown in Fig. 17 can implement the method shown in Fig. 4 for determining an approximate judgment with distinctive facts. All functional blocks of the device 4000 (the various units it includes, whether or not shown in the figure) can be implemented in hardware, software, or a combination of hardware and software so as to carry out the principles of the invention. Those skilled in the art will understand that the functional blocks described in Fig. 17 can be combined or divided into sub-blocks in order to carry out the principles of the invention described above. The description herein therefore supports any possible combination, division or further definition of the functional blocks described herein.
As shown in Fig. 17, according to one aspect of the present invention, the device 4000 for determining an approximate judgment with distinctive facts may include: a document obtaining unit 4100, a judgment item and true item extraction unit 4200, a document analysis unit 4300, a similar document analysis unit 4400, and a detection unit 4500 for approximate judgments with distinctive facts. The document obtaining unit 4100 is configured to obtain a document, where the obtained document contains a first judgment item and the first judgment item is a keyword of a predetermined type. The judgment item and true item extraction unit 4200 is configured to extract judgment items and true items. The document analysis unit 4300 is configured to use the judgment item and true item extraction unit 4200 to extract the first judgment item and first true items from the obtained document, where each first true item is information associated with the first judgment item. The similar document analysis unit 4400 is configured to obtain a first group of similar documents using the first judgment item and the first true items, and to use the judgment item and true item extraction unit 4200 to extract, from the first group of similar documents, second judgment items that differ from the first judgment item, together with second true items. The detection unit 4500 is configured to detect at least one approximate judgment with distinctive facts using the first group of similar documents, the second judgment items and the second true items. The distinctive facts indicate the difference between the first judgment item and a second judgment item.
The approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, where the change distance indicates how difficult it is to distinguish the first judgment item from the second judgment item.
In one embodiment, the judgment item and fact item extraction unit 4200 may further include: a keyword extraction unit for extracting keywords from a document; a judgment item recognition unit for identifying the judgment item among the extracted keywords; and a fact selection unit for selecting the remaining keywords as fact items.
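The division of labour among these three units can be sketched as follows. This is a minimal illustration only: the function name `split_judgment_and_facts`, the use of a fixed judgment vocabulary to recognise the judgment item, and the sample keywords are all assumptions made for the example and are not taken from the specification.

```python
def split_judgment_and_facts(keywords, judgment_vocabulary):
    """Pick the first keyword found in the (assumed) judgment vocabulary as the
    judgment item and treat every remaining keyword as a fact item."""
    judgment = next((k for k in keywords if k in judgment_vocabulary), None)
    facts = [k for k in keywords if k != judgment]
    return judgment, facts


# Hypothetical keywords already extracted from a report.
keywords = ["nodule", "calcification", "smooth margin", "benign tumor"]
judgment, facts = split_judgment_and_facts(
    keywords, {"benign tumor", "malignant tumor"}
)
```

Here the vocabulary lookup stands in for whichever recognition strategy an embodiment actually uses (entry field, predetermined configuration, or user selection).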
In one embodiment, the judgment item recognition unit may further include at least one of the following units: a unit for selecting a keyword from a judgment item entry field as the judgment item; a unit for selecting a keyword as the judgment item according to a predetermined configuration; and a unit for having a user select a keyword as the judgment item.
In one embodiment, the unit for selecting a keyword as the judgment item according to a predetermined configuration may further include a unit for selecting a keyword in a sentence, where the sentence expresses a subjective judgment and/or an objective result.
In one embodiment, the detection unit 4500 for approximate judgments with distinctive facts may further include: an original-judgment distinctive fact extraction unit, for extracting original-judgment distinctive facts for each second judgment item; a new-judgment distinctive fact extraction unit, for extracting new-judgment distinctive facts for each second judgment item; a change distance calculation unit, for calculating the change distance between each second judgment item and the first judgment item by using the extracted original-judgment distinctive facts and new-judgment distinctive facts; and a first approximate judgment generation unit, for generating the approximate judgment with distinctive facts from the second judgment items whose change distance is less than the predetermined first threshold.
In one embodiment, the original-judgment distinctive fact extraction unit may further include: a target fact item selection unit, for selecting one of the first fact items as a target fact item; a sensitivity calculation unit, for calculating the sensitivity of the target fact item, which includes: a second similar document group obtaining unit, for obtaining a second group of similar documents by using the first fact items with the target fact item deleted; a third judgment item and third fact item extraction unit, for extracting, from the second group of similar documents, third judgment items that differ from the first judgment item, together with third fact items; and a sensitivity calculation subunit, for calculating the sensitivity by using the distribution of the third judgment items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents; and an original-judgment distinctive fact selection unit, for selecting the target fact item as an original-judgment distinctive fact if the calculated sensitivity is equal to or greater than a predetermined second threshold.
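The sensitivity test described above can be illustrated with a short sketch. The text only states that the sensitivity is computed from the two judgment item distributions; the choice of total variation distance as the comparison measure, the function names, and the threshold value 0.3 are assumptions made purely for illustration. Both groups are assumed to be non-empty.

```python
from collections import Counter


def judgment_distribution(judgments):
    """Relative frequency of each judgment item in a group of similar documents."""
    counts = Counter(judgments)
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}


def sensitivity(second_judgments, third_judgments):
    """Total variation distance between the two distributions (assumed measure)."""
    p = judgment_distribution(second_judgments)
    q = judgment_distribution(third_judgments)
    return 0.5 * sum(abs(p.get(j, 0.0) - q.get(j, 0.0)) for j in set(p) | set(q))


# Judgment items retrieved with all first fact items (first group) and after
# deleting the target fact item (second group); the values are hypothetical.
first_group = ["B", "B", "B", "C"]
second_group = ["B", "C", "C", "C"]
s = sensitivity(first_group, second_group)
is_original_distinctive = s >= 0.3  # hypothetical second threshold
```

A large shift in the distribution after the target fact item is deleted suggests that the item strongly influences the judgment, which is why it would be kept as an original-judgment distinctive fact.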
In one embodiment, the new-judgment distinctive fact extraction unit may further include: a relevance calculation unit, for calculating the relevance of each third fact item by using the ratio at which the third fact item occurs together with the corresponding third judgment item in the second group of similar documents; and a new-judgment distinctive fact selection unit, for selecting a third fact item as a new-judgment distinctive fact if its relevance is equal to or greater than a predetermined third threshold.
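As a rough sketch of the relevance calculation, the occurrence ratio can be read as the share of documents carrying a given third judgment item that also contain the third fact item. That reading, the document layout (a dict with `judgment` and `facts` keys), and the sample data are assumptions for illustration.

```python
def relevance(fact, judgment, documents):
    """Share of documents with `judgment` that also contain `fact`
    (one possible reading of the occurrence ratio)."""
    with_judgment = [d for d in documents if d["judgment"] == judgment]
    if not with_judgment:
        return 0.0
    return sum(fact in d["facts"] for d in with_judgment) / len(with_judgment)


# Hypothetical second group of similar documents.
second_group_docs = [
    {"judgment": "C", "facts": {"f1", "f4"}},
    {"judgment": "C", "facts": {"f4"}},
    {"judgment": "C", "facts": {"f2"}},
    {"judgment": "D", "facts": {"f4"}},
]
r = relevance("f4", "C", second_group_docs)
```

A fact item would then be selected as a new-judgment distinctive fact when `r` reaches the predetermined third threshold.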
In one embodiment, the detection unit 4500 for approximate judgments with distinctive facts may further include: a fact distance calculation unit, for calculating the fact distance of each document in the first group of similar documents as the distance between that document and the obtained document, where the distance between two documents is calculated by counting the fact items that differ between them; a judgment item distance calculation unit, for calculating the judgment item distance of each second judgment item, that is, the change distance between that second judgment item and the first judgment item, by averaging the fact distances of the documents in the first group of similar documents; a second judgment item selection unit, for selecting a second judgment item as the approximate judgment if its judgment item distance is equal to or less than a predetermined fourth threshold; and a distinctive fact extraction unit, for extracting the distinctive facts for the approximate judgment by identifying the fact items that differ between the first fact items and the fact items of the approximate judgment.
In one embodiment, the detection unit 4500 for approximate judgments with distinctive facts may further include: a candidate approximate judgment generation unit, for generating a candidate approximate judgment with distinctive facts for each document in the first group of similar documents by identifying the fact items that differ between the first fact items and the fact items of that document; and an approximate judgment extraction unit, for extracting the approximate judgments with distinctive facts from the candidates, which includes: a transfer graph generation unit for generating a transfer graph, where each end node in the transfer graph is a judgment item and each non-end node is a fact item; a candidate approximate judgment arrangement unit, for arranging all candidate approximate judgments with distinctive facts in the transfer graph, where each path connecting two end nodes in the transfer graph represents one candidate; an importance calculation unit, for calculating the importance of each edge in the transfer graph by recording how frequently that edge connects any two nodes; an important edge identification unit, for identifying the important edges whose importance is equal to or greater than a predetermined fifth threshold; a distinctive path generation unit, for generating at least one distinctive path, where a distinctive path consists of important edges and connects a second judgment item to the first judgment item; and a translation unit, for translating each distinctive path into an approximate judgment with distinctive facts.
In one embodiment, the detection unit 4500 for approximate judgments with distinctive facts may further include: a candidate distinctive fact generation unit for generating candidate distinctive facts, which includes: a candidate original-judgment distinctive fact designating unit, for designating candidate original-judgment distinctive facts from the first fact items that differ from the second fact items; and a candidate new-judgment distinctive fact designating unit, for designating candidate new-judgment distinctive facts from the second fact items that differ from the first fact items, where the number of candidate original-judgment distinctive facts plus the number of candidate new-judgment distinctive facts equals a predetermined number; a candidate distinctive fact verification unit, for verifying the candidate distinctive facts against the first group of similar documents, which includes: a document identification unit, for identifying the documents in the first group of similar documents that contain the candidate new-judgment distinctive facts but not the candidate original-judgment distinctive facts, and whose judgment item differs from the first judgment item; and a candidate distinctive fact marking unit, for marking the candidate distinctive facts as verified if one judgment item of the identified documents is a concentrated judgment item, that is, if the ratio of documents with that judgment item among all identified documents is equal to or greater than a predetermined sixth threshold; and a second approximate judgment generation unit, for generating the approximate judgment with distinctive facts, which includes: a verified candidate distinctive fact selection unit, for selecting the verified candidate distinctive facts as the distinctive facts; and a concentrated judgment item selection unit, for selecting the concentrated judgment item as the approximate judgment.
In one embodiment, the detection unit 4500 for approximate judgments with distinctive facts may further include: a change tree obtaining unit, for obtaining a change tree for the obtained document, where the change tree is structured data representing a group of specific knowledge related to the obtained document, each non-end node of the tree is a fact item, and each end node is a judgment item; and a third approximate judgment generation unit, for generating the approximate judgment with distinctive facts by selecting a path that links two end nodes of the obtained change tree.
In one embodiment, the detection unit 4500 for approximate judgments with distinctive facts may further include: a similar distinctive fact detection unit for detecting similar distinctive facts; a similar distinctive fact merging unit for merging the similar distinctive facts; and an approximate judgment adjustment unit, for adjusting the approximate judgment with distinctive facts by using the merged distinctive facts.
In one embodiment, the device 4000 for determining an approximate judgment with distinctive facts may further include a first approximate judgment presentation unit, for presenting the approximate judgments with distinctive facts by outputting a list of all of them.
In one embodiment, the device 4000 for determining an approximate judgment with distinctive facts may further include a second approximate judgment presentation unit, for presenting the approximate judgments with distinctive facts by one of the following operations: outputting the approximate judgments with distinctive facts whose change distance is less than a predetermined seventh threshold, or outputting a predetermined number of approximate judgments with distinctive facts that have the smallest change distances.
In one embodiment, the device 4000 for determining an approximate judgment with distinctive facts may further include a third approximate judgment presentation unit for presenting the approximate judgments with distinctive facts, which includes: a coverage rate calculation unit, for calculating the coverage rate of each approximate judgment with distinctive facts, where the coverage rate is the ratio of documents in the first group of similar documents that match the approximate judgment with distinctive facts; and an approximate judgment output unit, for outputting the approximate judgments with distinctive facts whose coverage rate is equal to or greater than a predetermined eighth threshold, or for outputting a predetermined number of approximate judgments with distinctive facts that have the highest coverage rates.
In one embodiment, the device 4000 for determining an approximate judgment with distinctive facts may further include a fourth approximate judgment presentation unit, for presenting the approximate judgments with distinctive facts by outputting the change tree together with the approximate judgments with distinctive facts.
In one embodiment, the device 4000 for determining an approximate judgment with distinctive facts may further include a fact difference presentation unit, for presenting the fact difference between the facts of the first judgment item and the facts of the approximate judgment, where the fact difference causes the change from the first judgment item to the approximate judgment.
In one embodiment, the device 4000 for determining an approximate judgment with distinctive facts may further include an indicating unit, for indicating, for each approximate judgment with distinctive facts, the sentences in the obtained document that correspond to the original-judgment distinctive facts, and for indicating the new-judgment distinctive facts in the obtained document.
Figure 18 shows a functional block diagram of the device 5000 for similar document searching according to an embodiment of the present invention. The device 5000 shown in Figure 18 can implement the similar document searching method shown in Figure 16. All functional blocks of the device 5000 (the various units it includes, whether or not they are shown in the figure) can be implemented in hardware, software, or a combination of hardware and software to carry out the principles of the invention. Those skilled in the art will appreciate that the functional blocks depicted in Figure 18 can be combined or divided into sub-blocks to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional blocks described herein.
As shown in Figure 18, according to one aspect of the present invention, the device 5000 for similar document searching may include an input document receiving unit 5100, the device 4000 for determining an approximate judgment with distinctive facts, and a similar document obtaining unit 5200. The input document receiving unit 5100 is configured to receive an input document. The device 4000 for determining an approximate judgment with distinctive facts is configured to determine at least one approximate judgment with distinctive facts for the input document. The similar document obtaining unit 5200 is configured to obtain a group of similar documents for the input document by using the at least one approximate judgment with distinctive facts.
In one embodiment, the input document is a radiographic report that includes finding items and a diagnosis item; the finding items are selected as the first fact items, and the diagnosis item is selected as the first judgment item.
In one embodiment, the input document is a travel brochure that includes items of interest to the user and a travel destination item; the items of interest to the user are selected as the first fact items, and the travel destination item is selected as the first judgment item.
In one embodiment, the input document is a product introduction that includes product parameter items and a product type item; the product parameter items are selected as the first fact items, and the product type item is selected as the first judgment item.
In addition, according to another aspect of the present invention, a device for determining an approximate judgment with distinctive facts is provided. The device can be implemented in the computer system 1000 shown in Fig. 3. The device may include a processor and a memory storing instructions that, when executed by the processor, cause the processor to perform the following operations: obtaining a document, where the obtained document contains a first judgment item and the first judgment item is a keyword of a predetermined type; extracting the first judgment item and first fact items from the obtained document, where each first fact item is information associated with the first judgment item; obtaining a first group of similar documents by using the first judgment item and the first fact items, and extracting, from the first group of similar documents, second judgment items that differ from the first judgment item, together with second fact items; and detecting at least one approximate judgment with distinctive facts by using the first group of similar documents, the second judgment items, and the second fact items, where the distinctive facts indicate the difference between the first judgment item and a second judgment item. The approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, where the change distance indicates how difficult it is to distinguish the first judgment item from the second judgment item.
In one embodiment, extracting the fact items and the judgment item from a document may further include: extracting keywords from the document; and identifying the judgment item among the extracted keywords and selecting the remaining keywords as fact items.
In one embodiment, identifying the judgment item may further include at least one of the following: selecting a keyword from a judgment item entry field as the judgment item; selecting a keyword as the judgment item according to a predetermined configuration; and having a user select a keyword as the judgment item.
In one embodiment, selecting a keyword as the judgment item according to a predetermined configuration may further include selecting a keyword in a sentence, where the sentence expresses a subjective judgment and/or an objective result.
In one embodiment, detecting at least one approximate judgment with distinctive facts may include: extracting original-judgment distinctive facts for each second judgment item; extracting new-judgment distinctive facts for each second judgment item; calculating the change distance between each second judgment item and the first judgment item by using the extracted original-judgment distinctive facts and new-judgment distinctive facts; and generating the approximate judgment with distinctive facts from the second judgment items whose change distance is less than the predetermined first threshold.
In one embodiment, extracting the original-judgment distinctive facts includes: selecting one of the first fact items as a target fact item; calculating the sensitivity of the target fact item, which includes: obtaining a second group of similar documents by using the first fact items with the target fact item deleted; extracting, from the second group of similar documents, third judgment items that differ from the first judgment item, together with third fact items; and calculating the sensitivity by using the distribution of the third judgment items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents; and, if the calculated sensitivity is equal to or greater than a predetermined second threshold, selecting the target fact item as an original-judgment distinctive fact.
In one embodiment, extracting the new-judgment distinctive facts includes: calculating the relevance of each third fact item by using the ratio at which the third fact item occurs together with the corresponding third judgment item in the second group of similar documents; and, if the relevance of a third fact item is equal to or greater than a predetermined third threshold, selecting that third fact item as a new-judgment distinctive fact.
In one embodiment, detecting at least one approximate judgment with distinctive facts may include: calculating the fact distance of each document in the first group of similar documents as the distance between that document and the obtained document, where the distance between two documents is calculated by counting the fact items that differ between them; calculating the judgment item distance of each second judgment item, that is, the change distance between that second judgment item and the first judgment item, by averaging the calculated fact distances of the documents in the first group of similar documents; if the judgment item distance of a second judgment item is equal to or less than a predetermined fourth threshold, selecting that second judgment item as the approximate judgment; and extracting the distinctive facts for the approximate judgment by identifying the fact items that differ between the first fact items and the fact items of the approximate judgment.
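The distance-based embodiment lends itself to a compact sketch. Counting differing fact items is implemented here as the size of the symmetric set difference, and the average is taken over the documents that share the same second judgment item; both choices, along with the function names, the document layout, and the threshold value, are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def fact_distance(facts_a, facts_b):
    """Number of fact items present in exactly one of the two documents."""
    return len(set(facts_a) ^ set(facts_b))


def change_distances(query_facts, similar_docs):
    """Average fact distance per second judgment item (assumed grouping)."""
    by_judgment = defaultdict(list)
    for doc in similar_docs:
        by_judgment[doc["judgment"]].append(fact_distance(query_facts, doc["facts"]))
    return {j: mean(ds) for j, ds in by_judgment.items()}


# Hypothetical first fact items and first group of similar documents.
query_facts = {"a", "b", "c"}
similar_docs = [
    {"judgment": "B", "facts": {"a", "b", "d"}},
    {"judgment": "B", "facts": {"a", "b"}},
    {"judgment": "C", "facts": {"x", "y"}},
]
distances = change_distances(query_facts, similar_docs)
fourth_threshold = 2  # hypothetical value
approximate = [j for j, d in distances.items() if d <= fourth_threshold]
```

The distinctive facts for a selected judgment would then be the differing fact items themselves, for example `query_facts ^ doc["facts"]` for a matching document.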
In one embodiment, detecting at least one approximate judgment with distinctive facts may include: generating a candidate approximate judgment with distinctive facts for each document in the first group of similar documents by identifying the fact items that differ between the first fact items and the fact items of that document; and extracting the approximate judgments with distinctive facts from the candidates, which includes: generating a transfer graph, where each end node in the transfer graph is a judgment item and each non-end node is a fact item; arranging all candidate approximate judgments with distinctive facts in the transfer graph, where each path connecting two end nodes in the transfer graph represents one candidate; calculating the importance of each edge in the transfer graph by recording how frequently that edge connects any two nodes; identifying the important edges whose importance is equal to or greater than a predetermined fifth threshold; generating at least one distinctive path, where the distinctive path consists of important edges and connects a second judgment item to the first judgment item; and translating each distinctive path into an approximate judgment with distinctive facts.
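The edge-importance filtering on the transfer graph can be sketched as follows. Each candidate is represented as a tuple of node labels running from the first judgment item through fact items to a second judgment item; this encoding, the use of directed edges, the `-`/`+` prefixes on fact labels, and the threshold value are assumptions for illustration rather than details given in the text.

```python
from collections import Counter


def edges_of(path):
    """Consecutive node pairs along a path."""
    return list(zip(path, path[1:]))


def distinctive_paths(candidate_paths, min_frequency):
    """Keep the candidate paths made up entirely of important edges."""
    importance = Counter(e for p in candidate_paths for e in edges_of(p))
    important = {e for e, n in importance.items() if n >= min_frequency}
    kept = []
    for p in candidate_paths:
        if all(e in important for e in edges_of(p)) and p not in kept:
            kept.append(p)
    return kept


# Hypothetical candidates: "A" is the first judgment item, "B" a second one.
candidates = [
    ("A", "-f1", "+f2", "B"),
    ("A", "-f1", "+f2", "B"),
    ("A", "-f3", "B"),
]
paths = distinctive_paths(candidates, min_frequency=2)
```

Each surviving path can then be read back as an approximate judgment: its last node is the judgment item and its interior nodes are the distinctive facts.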
In one embodiment, detecting at least one approximate judgment with distinctive facts may include: generating candidate distinctive facts, which includes: designating candidate original-judgment distinctive facts from the first fact items that differ from the second fact items; and designating candidate new-judgment distinctive facts from the second fact items that differ from the first fact items, where the number of candidate original-judgment distinctive facts plus the number of candidate new-judgment distinctive facts equals a predetermined number; verifying the candidate distinctive facts against the first group of similar documents, which includes: identifying the documents in the first group of similar documents that contain the candidate new-judgment distinctive facts but not the candidate original-judgment distinctive facts, and whose judgment item differs from the first judgment item; and, if one judgment item of the identified documents is a concentrated judgment item, marking the candidate distinctive facts as verified, where the ratio of documents with the concentrated judgment item among all identified documents is equal to or greater than a predetermined sixth threshold; and generating the approximate judgment with distinctive facts, which includes: selecting the verified candidate distinctive facts as the distinctive facts; and selecting the concentrated judgment item as the approximate judgment.
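The verification step can be illustrated with a small sketch. It assumes that "not containing the candidate original-judgment distinctive facts" means containing none of them, and that the concentrated judgment item is the most frequent judgment among the identified documents; the function name, document layout, and ratio values are hypothetical.

```python
from collections import Counter


def verify_candidate(original_facts, new_facts, first_judgment, similar_docs, min_ratio):
    """Return the concentrated judgment item if the candidate is verified, else None."""
    matched = [
        d["judgment"]
        for d in similar_docs
        if set(new_facts) <= set(d["facts"])            # contains all new facts
        and not set(original_facts) & set(d["facts"])   # contains no original fact
        and d["judgment"] != first_judgment
    ]
    if not matched:
        return None
    judgment, count = Counter(matched).most_common(1)[0]
    return judgment if count / len(matched) >= min_ratio else None


# Hypothetical first group of similar documents; "A" is the first judgment item.
docs = [
    {"judgment": "B", "facts": {"f2", "f9"}},
    {"judgment": "B", "facts": {"f2"}},
    {"judgment": "C", "facts": {"f2", "f7"}},
    {"judgment": "B", "facts": {"f1", "f2"}},  # still has the original fact
    {"judgment": "A", "facts": {"f2"}},        # same judgment as the input
]
result = verify_candidate(["f1"], ["f2"], "A", docs, min_ratio=0.6)
```

When a judgment item is returned, the candidate distinctive facts are treated as verified and that item becomes the approximate judgment.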
In one embodiment, detecting at least one approximate judgment with distinctive facts may include: obtaining a change tree for the obtained document, where the change tree is structured data representing a group of specific knowledge related to the obtained document, each non-end node of the tree is a fact item, and each end node is a judgment item; and generating the approximate judgment with distinctive facts by selecting a path that links two end nodes of the obtained change tree.
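Selecting a path between two end nodes of a change tree amounts to walking from each end node up to their nearest common ancestor. The sketch below stores the tree as a child-to-parent mapping; that representation, the function name, and the sample tree are assumptions for illustration, and both end nodes are assumed to belong to the same tree.

```python
def path_between_leaves(parent, leaf_a, leaf_b):
    """Path from leaf_a to leaf_b through their nearest common ancestor."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up_a, up_b = ancestors(leaf_a), ancestors(leaf_b)
    common = next(n for n in up_a if n in up_b)
    # Climb from leaf_a to the common ancestor, then descend to leaf_b.
    return up_a[: up_a.index(common) + 1] + up_b[: up_b.index(common)][::-1]


# Hypothetical change tree: end nodes are judgment items, other nodes fact items.
parent = {
    "benign": "smooth margin",
    "malignant": "irregular margin",
    "smooth margin": "nodule",
    "irregular margin": "nodule",
}
path = path_between_leaves(parent, "benign", "malignant")
```

The fact items along the returned path are the ones whose change leads from one judgment item to the other.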
In one embodiment, detecting at least one approximate judgment with distinctive facts may further include: detecting similar distinctive facts; merging the similar distinctive facts; and adjusting the approximate judgment with distinctive facts by using the merged distinctive facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgments with distinctive facts by outputting a list of all of them.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgments with distinctive facts by outputting those whose change distance is less than a predetermined seventh threshold, or by outputting a predetermined number of approximate judgments with distinctive facts that have the smallest change distances.
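The two presentation options described here, filtering by a threshold or keeping a fixed number of closest results, can be sketched as follows. The function names, the mapping from judgment item to change distance, and the numeric values are assumptions for illustration.

```python
import heapq


def select_by_threshold(distances, max_distance):
    """Judgment items whose change distance is below the threshold."""
    return sorted(j for j, d in distances.items() if d < max_distance)


def select_top_k(distances, k):
    """The k judgment items with the smallest change distances, closest first."""
    return heapq.nsmallest(k, distances, key=distances.get)


# Hypothetical change distances per approximate judgment.
change_distance = {"B": 1.5, "C": 5.0, "D": 0.5}
below_threshold = select_by_threshold(change_distance, max_distance=2.0)
closest_two = select_top_k(change_distance, k=2)
```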
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operations: calculating the coverage rate of each approximate judgment with distinctive facts, where the coverage rate is the ratio of documents in the first group of similar documents that match the approximate judgment with distinctive facts; and presenting the approximate judgments with distinctive facts by outputting those whose coverage rate is equal to or greater than a predetermined eighth threshold, or by outputting a predetermined number of approximate judgments with distinctive facts that have the highest coverage rates.
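A coverage rate of this kind can be sketched in a few lines. The text does not spell out what counts as a match; the sketch assumes a document matches when it carries the same judgment item and contains every distinctive fact, and the function name, document layout, and sample data are likewise hypothetical.

```python
def coverage(judgment, distinctive_facts, similar_docs):
    """Fraction of similar documents that match the approximate judgment
    (same judgment item and all distinctive facts present; assumed criterion)."""
    if not similar_docs:
        return 0.0
    matches = sum(
        d["judgment"] == judgment and set(distinctive_facts) <= set(d["facts"])
        for d in similar_docs
    )
    return matches / len(similar_docs)


# Hypothetical first group of similar documents.
group = [
    {"judgment": "B", "facts": {"f2", "f3"}},
    {"judgment": "B", "facts": {"f3"}},
    {"judgment": "C", "facts": {"f2"}},
    {"judgment": "B", "facts": {"f2"}},
]
rate = coverage("B", {"f2"}, group)
```

Approximate judgments would then be output when `rate` reaches the eighth threshold, or ranked by `rate` to keep a predetermined number.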
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgments with distinctive facts by outputting the change tree together with the approximate judgments with distinctive facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the fact difference between the facts of the first judgment item and the facts of the approximate judgment, where the fact difference causes the change from the first judgment item to the approximate judgment.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: for each approximate judgment with distinctive facts, indicating the sentences in the obtained document that correspond to the original-judgment distinctive facts, and indicating the new-judgment distinctive facts in the obtained document.
In addition, according to another aspect of the present invention, a device for similar document searching is provided. The device may include a processor and a memory storing instructions that, when executed by the processor, cause the processor to perform the following operations: receiving an input document; determining at least one approximate judgment with distinctive facts for the input document based on the method described above; and obtaining a group of similar documents for the input document by using the at least one approximate judgment with distinctive facts.
In one embodiment, the input document is a radiographic report that includes finding items and a diagnosis item; the finding items are selected as the first fact items, and the diagnosis item is selected as the first judgment item.
In one embodiment, the input document is a travel brochure that includes items of interest to the user and a travel destination item; the items of interest to the user are selected as the first fact items, and the travel destination item is selected as the first judgment item.
In one embodiment, the input document is a product introduction that includes product parameter items and a product type item; the product parameter items are selected as the first fact items, and the product type item is selected as the first judgment item.
Note that those skilled in the art will clearly understand that the embodiments in this application can be combined in any manner.
The method and system of the present invention can be implemented in many ways. For example, they can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The order of the method steps given above is for illustration only, and the steps of the method of the present invention are not limited to the order described above unless specifically stated otherwise. Furthermore, in some embodiments, the present invention can also be embodied as programs recorded on a recording medium, where the programs include machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present invention. Those skilled in the art should also understand that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.

Claims (44)

1. A method for determining an approximate judgment with distinctive facts, comprising:
A) a document obtaining step of obtaining a document, wherein the obtained document contains a first judgment item, and the first judgment item is a keyword of a predetermined type;
B) a document analysis step of extracting the first judgment item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgment item;
C) a similar document analysis step of obtaining a first group of similar documents by using the first judgment item and the first fact items, and extracting, from the first group of similar documents, second judgment items different from the first judgment item and second fact items;
D) a step of detecting approximate judgments with distinctive facts, in which at least one approximate judgment with distinctive facts is detected by using the first group of similar documents, the second judgment items, and the second fact items, wherein:
the distinctive facts indicate the difference between the first judgment item and the second judgment item; and
the approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgment item from the second judgment item.
2. the method for claim 1, wherein extracts true item from document and judges that item also includes:
Key word is extracted from described document; And
From the key word extracted, identify described judgement item, and select remaining key word as described true item.
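As an illustration of the split described in claim 2, the sketch below assumes that keyword extraction has already produced a list of keywords and that judgment items can be recognized by membership in a known vocabulary. The function name `split_items`, the vocabulary, and the sample keywords are hypothetical; the claims do not prescribe how the judgment item is recognized beyond the options listed in claim 3.

```python
def split_items(keywords, judgment_vocabulary):
    """Split extracted keywords into judgment items and truth items.

    Assumption: a keyword is a judgment item if it appears in
    `judgment_vocabulary`; every remaining keyword is a truth item.
    """
    judgment_items = [k for k in keywords if k in judgment_vocabulary]
    truth_items = [k for k in keywords if k not in judgment_vocabulary]
    return judgment_items, truth_items


# Hypothetical keywords extracted from one document.
keywords = ["finding 1", "finding 2", "diagnosis X"]
judgments, truths = split_items(keywords, {"diagnosis X", "diagnosis Y"})
```

The original keyword order is preserved in both output lists, which keeps later steps such as sentence highlighting straightforward.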
3. The method of claim 2, wherein identifying the judgment item further comprises at least one of the following:
selecting a keyword from a judgment item entry field as the judgment item;
selecting a keyword as the judgment item according to a predetermined configuration; and
selecting, by a user, a keyword as the judgment item.
4. The method of claim 3, wherein selecting a keyword as the judgment item according to a predetermined configuration further comprises:
selecting a keyword in a sentence, wherein the sentence expresses a subjective judgment and/or an objective result.
5. the method for claim 1, the wherein said approximate judgement detecting step having distinctiveness true includes:
1) for each, second to judge that item extracts original judgement distinctiveness true;
2) second judge that item extracts for each and newly judge the distinctiveness fact;
3) use the original judgement distinctiveness extracted true and newly judge the distinctiveness fact, calculating each and second judge that item and first judges the change distance between item; And
4) use it to change distance and judge item less than the second of predetermined first threshold, there is described in generation the approximate judgement that distinctiveness is true.
6. The method of claim 5, wherein extracting the original-judgment distinctive truth comprises:
selecting one of the first truth items as a target truth item;
calculating the sensitivity of the target truth item, including:
obtaining a second group of similar documents by using the first truth items with the target truth item deleted;
extracting, from the second group of similar documents, third judgment items different from the first judgment item and third truth items; and
calculating the sensitivity by using the distribution of the third judgment items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents; and
if the calculated sensitivity is equal to or greater than a predetermined second threshold, selecting the target truth item as the original-judgment distinctive truth.
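Claim 6 compares two judgment-item distributions but does not state a formula for the sensitivity. The sketch below is one possible reading that assumes total variation distance between the two distributions; the function names and the toy data are hypothetical, and any other distribution distance could be substituted.

```python
from collections import Counter


def distribution(judgments):
    """Relative frequency of each judgment item in a list of document judgments."""
    counts = Counter(judgments)
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()} if total else {}


def sensitivity(first_group_judgments, second_group_judgments):
    """Assumed measure: total variation distance between the distribution of
    second judgment items (first group) and third judgment items (second group)."""
    p = distribution(first_group_judgments)
    q = distribution(second_group_judgments)
    return 0.5 * sum(abs(p.get(j, 0.0) - q.get(j, 0.0)) for j in set(p) | set(q))


# Judgment items of the similar documents retrieved with and without the target truth item.
with_target = ["B", "B", "C", "C"]
without_target = ["B", "C", "C", "C"]
score = sensitivity(with_target, without_target)  # 0.25 for this toy data
```

Under this reading, a target truth item whose removal barely shifts the distribution gets a score near 0 and would not be selected as an original-judgment distinctive truth.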
7. The method of claim 6, wherein extracting the new-judgment distinctive truth comprises:
calculating the relevance of each third truth item by using the ratio at which the third truth item appears together with the corresponding third judgment item in the second group of similar documents; and
if the relevance of a third truth item is equal to or greater than a predetermined third threshold, selecting the third truth item as the new-judgment distinctive truth.
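The relevance in claim 7 can be sketched as a co-occurrence ratio. The version below assumes each document is a `(judgment_item, truth_items)` pair and takes the ratio over the documents that carry the given judgment item; the choice of denominator and the name `relevance` are assumptions, since the claim does not fix them.

```python
def relevance(truth_item, judgment_item, documents):
    """Share of documents with `judgment_item` that also contain `truth_item`.

    `documents` is a list of (judgment_item, set_of_truth_items) pairs.
    """
    with_judgment = [truths for judgment, truths in documents if judgment == judgment_item]
    if not with_judgment:
        return 0.0
    return sum(truth_item in truths for truths in with_judgment) / len(with_judgment)


# Hypothetical second group of similar documents.
second_group = [("B", {"x", "y"}), ("B", {"x"}), ("C", {"y"})]
relevance_x = relevance("x", "B", second_group)  # 1.0: "x" appears in every "B" document
relevance_y = relevance("y", "B", second_group)  # 0.5: "y" appears in one of two "B" documents
```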
8. the method for claim 1, the wherein said approximate judgement detecting step having distinctiveness true includes:
1) by calculating the distance between each document and the document obtained in first group of similar document, calculate the fact that each document in first group of similar document distance, wherein by using the counting of the different true item between two documents, calculate the distance between each document in first group of similar document and obtained document;
2) the fact that the calculating using each document in first group of similar document distance is passed through, second judge that item and first judges the change distance between item by calculating each, calculate each second judgement item distance judging item, wherein be averaged by the distance of the fact that to each document in first group of similar document, calculate each second judge item and first judge between change distance;
3) if second judge item judge that item distance is equal to or less than the 4th predetermined threshold value, select second to judge that item is as described approximate judgement; And
4) passing through to identify the fact that the first true item and the described approximate judgement different fact items between item, the distinctiveness extracting described approximate judgement is true.
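The four steps of claim 8 can be sketched as follows. The sketch assumes each similar document is a `(judgment_item, truth_items)` pair, averages truth distances per second judgment item, and takes the truth items of an approximate judgment to be the union over its documents. The last two points and all names are assumptions, not details given in the claim.

```python
from collections import defaultdict


def truth_distance(truths_a, truths_b):
    """Count of truth items present in exactly one of the two documents."""
    return len(set(truths_a) ^ set(truths_b))


def approximate_judgments(first_judgment, first_truths, similar_docs, threshold):
    """Return {judgment: (change_distance, distinctive_truths)} for judgments
    whose average truth distance is at or below `threshold`."""
    distances = defaultdict(list)
    truths_by_judgment = defaultdict(set)
    for judgment, truths in similar_docs:
        if judgment == first_judgment:
            continue  # only second judgment items are candidates
        distances[judgment].append(truth_distance(first_truths, truths))
        truths_by_judgment[judgment] |= set(truths)
    result = {}
    for judgment, ds in distances.items():
        change_distance = sum(ds) / len(ds)
        if change_distance <= threshold:
            distinctive = set(first_truths) ^ truths_by_judgment[judgment]
            result[judgment] = (change_distance, distinctive)
    return result


# Hypothetical first group of similar documents for a document judged "A".
docs = [("B", {"t1", "t3"}), ("B", {"t1", "t2", "t3"}), ("C", {"t4", "t5"}), ("A", {"t1", "t2"})]
found = approximate_judgments("A", {"t1", "t2"}, docs, threshold=2)
```

With this toy data, "B" has truth distances 2 and 1 (average 1.5) and is kept, while "C" has distance 4 and is rejected.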
9. the method for claim 1, the wherein said approximate judgement detecting step having distinctiveness true includes:
1) passing through to identify the fact that each document in the first true item and first group of similar document different true items between item, the candidate of generation each document described has the approximate judgement that distinctiveness is true;
2) what use described candidate has the approximate approximate judgement judging to have the distinctiveness fact described in extraction that distinctiveness is true, including:
Producing transfer figure, each endpoint node in wherein said transfer figure is to judge item, and each the non-end node in described transfer figure is true item;
The approximate judgement having distinctiveness true of all candidates is arranged in described transfer figure, wherein connects the approximate judgement with the distinctiveness fact of each paths one candidate of instruction of two endpoint nodes in described transfer figure;
By recording the rate of connections on each the limit connecting any two node in described transfer figure, calculate the importance on each limit in described transfer figure;
Identify its importance important limit equal to or more than the 5th predetermined threshold value;
Producing at least one distinctiveness path, wherein said distinctiveness path is made up of important limit, and described distinctiveness path judges that by second item is connected to the first judgement item; And
There is described in being translated in each distinctiveness path the approximate judgement that distinctiveness is true.
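The edge-frequency idea in claim 9 can be sketched with candidate paths represented as tuples that start at the first judgment item, pass through the differing truth items, and end at a second judgment item. The ordering of truth items along a path is an assumption, and this simplified version only keeps candidate paths whose edges are all important rather than composing new paths from important edges.

```python
from collections import Counter


def edge_importance(paths):
    """Count how often each directed edge (a, b) occurs across all candidate paths."""
    counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
    return counts


def distinctive_paths(paths, threshold):
    """Keep each distinct candidate path whose every edge has importance >= threshold."""
    importance = edge_importance(paths)
    kept = []
    for path in paths:
        edges = zip(path, path[1:])
        if all(importance[e] >= threshold for e in edges) and path not in kept:
            kept.append(path)
    return kept


# Hypothetical candidates: two documents suggest A -> t2 -> B, one suggests A -> t5 -> C.
candidates = [("A", "t2", "B"), ("A", "t2", "B"), ("A", "t5", "C")]
selected = distinctive_paths(candidates, threshold=2)
```

Each kept path reads directly as an approximate judgment with distinctive truth: its last node is the approximate judgment and its inner nodes are the distinctive truths.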
10. the method for claim 1, the wherein said approximate judgement detecting step having distinctiveness true includes:
1) produce candidate and distinguish sexual behavior in fact, including:
Use the first true item different from the second true item, it is intended that the original judgement distinctiveness of candidate is true;
Use the true item of second different with the first true item, it is intended that candidate newly judges that distinctiveness is true, that number that wherein candidate's original judgement distinctiveness is true and candidate newly judge the true number of distinctiveness and be equal to predetermined number;
2) verify that the candidate in first group of similar document distinguishes sexual behavior real, including:
The described candidate that comprises identified in first group of similar document newly judges that but distinctiveness is true does not comprise the document that the original judgement distinctiveness of described candidate is true, and the judgement item of the document identified is different from the first judgement item; And
If the one of the document identified judges that item is to concentrate to judge item, described candidate is distinguished real being labeled as of sexual behavior and has verified that, wherein concentrate the document judging item ratio in all documents identified equal to or more than the 6th predetermined threshold value corresponding to described; And
3) there is the approximate judgement that distinctiveness is true described in generation, including:
Selecting the candidate having verified that to distinguish sexual behavior implementation is that described distinctiveness is true; And
Described concentration is selected to judge that item is as described approximate judgement.
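The verification step of claim 10 can be sketched as a filter followed by a majority check. The sketch assumes documents are `(judgment_item, truth_items)` pairs and reads "does not contain the candidate original-judgment distinctive truths" as "contains none of them"; that reading and the name `verify_candidate` are assumptions.

```python
from collections import Counter


def verify_candidate(original_truths, new_truths, first_judgment, similar_docs, ratio_threshold):
    """Return the concentrated judgment item if the candidate is verified, else None."""
    matched = [
        judgment
        for judgment, truths in similar_docs
        if judgment != first_judgment
        and set(new_truths) <= set(truths)               # contains all new-judgment truths
        and not (set(original_truths) & set(truths))     # contains no original-judgment truth
    ]
    if not matched:
        return None
    judgment, count = Counter(matched).most_common(1)[0]
    return judgment if count / len(matched) >= ratio_threshold else None


# Hypothetical first group of similar documents for a document judged "A".
docs = [
    ("B", {"t1", "t3"}),
    ("B", {"t3"}),
    ("C", {"t1", "t3"}),
    ("B", {"t2", "t3"}),  # excluded: still contains the original truth t2
    ("A", {"t3"}),        # excluded: same judgment as the obtained document
]
verified = verify_candidate({"t2"}, {"t3"}, "A", docs, ratio_threshold=0.6)
```

Here three documents pass the filter and two of them carry "B", so "B" is returned for a threshold of 0.6 and would be rejected for a threshold above two thirds.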
11. the method for claim 1, the wherein said approximate judgement detecting step having distinctiveness true includes:
1) obtaining the change tree about the document obtained, wherein said change sets the structural data being specific for relevant to the document obtained group knowledge information, and each of which non-end node is true item, and each endpoint node is to judge item; And
2) by a paths of two endpoint nodes in selecting the change that link obtains to set, there is described in generation the approximate judgement that distinctiveness is true.
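Selecting a path between two end nodes of a change tree, as in claim 11, can be sketched with a child-to-parent mapping. The tree encoding, the function names, and the node labels are hypothetical; the claim only requires that non-end nodes are truth items and end nodes are judgment items.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, following child -> parent links."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def path_between(leaf_a, leaf_b, parent):
    """Path from leaf_a to leaf_b through their lowest common ancestor."""
    up_a = path_to_root(leaf_a, parent)
    up_b = path_to_root(leaf_b, parent)
    ancestors_a = set(up_a)
    # First node on leaf_b's way up that is also an ancestor of leaf_a.
    common = next(n for n in up_b if n in ancestors_a)
    return up_a[: up_a.index(common) + 1] + list(reversed(up_b[: up_b.index(common)]))


# Hypothetical change tree: truth items t0, t1, t2 are inner nodes; A and B are judgment items.
parent = {"t1": "t0", "t2": "t0", "A": "t1", "B": "t2"}
path = path_between("A", "B", parent)
```

The inner nodes of the returned path are the truth items that separate the two judgment items, which is what the approximate judgment with distinctive truth reports.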
12. The method of any one of claims 5 to 11, wherein the step of detecting approximate judgments with distinctive truth further comprises:
detecting similar distinctive truths;
merging the similar distinctive truths; and
adjusting the approximate judgment with distinctive truth by using the merged distinctive truths.
13. The method of claim 1, further comprising presenting the approximate judgments with distinctive truth by outputting a list of all approximate judgments with distinctive truth.
14. The method of claim 1, further comprising presenting the approximate judgments with distinctive truth by:
outputting the approximate judgments with distinctive truth whose change distance is less than a predetermined seventh threshold, or
outputting a predetermined number of approximate judgments with distinctive truth having the smallest change distances.
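The two output options of claim 14 amount to a threshold filter or a top-k selection over change distances. A minimal sketch, assuming change distances are held in a dictionary keyed by judgment item (the function name and parameters are hypothetical):

```python
def select_by_change_distance(change_distances, max_distance=None, top_k=None):
    """Rank judgments by ascending change distance, then apply either option.

    `max_distance` keeps entries strictly below the threshold;
    `top_k` keeps the k entries with the smallest distances.
    """
    ranked = sorted(change_distances.items(), key=lambda kv: kv[1])
    if max_distance is not None:
        ranked = [(j, d) for j, d in ranked if d < max_distance]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


# Hypothetical change distances of three second judgment items.
distances = {"B": 1.5, "C": 4.0, "D": 2.0}
below_threshold = select_by_change_distance(distances, max_distance=3.0)
closest_one = select_by_change_distance(distances, top_k=1)
```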
15. the method for claim 1, also include by following operation present described in there is the approximate judgement that distinctiveness is true:
Calculating each coverage rate with the true approximate judgement of distinctiveness, wherein said coverage rate is the ratio with the described approximate document judging to mate with the distinctiveness fact in first group of similar document; And
Export the approximate judgement that distinctiveness is true that has equal to or more than the 8th predetermined threshold value of its coverage rate or the approximate judgement with the distinctiveness fact with maximal cover rate of output predetermined number.
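The coverage rate of claim 15 can be sketched as a simple proportion. The claim does not define what it means for a document to "match"; the version below assumes a document matches when it carries the approximate judgment and contains all of its distinctive truths. That matching rule and the name `coverage_rate` are assumptions.

```python
def coverage_rate(judgment, distinctive_truths, similar_docs):
    """Proportion of documents in the group that match the approximate judgment.

    `similar_docs` is a list of (judgment_item, set_of_truth_items) pairs.
    """
    if not similar_docs:
        return 0.0
    matches = sum(
        1
        for doc_judgment, truths in similar_docs
        if doc_judgment == judgment and set(distinctive_truths) <= set(truths)
    )
    return matches / len(similar_docs)


# Hypothetical first group of similar documents.
docs = [("B", {"t3"}), ("B", {"t1"}), ("C", {"t3"}), ("B", {"t3", "t4"})]
rate = coverage_rate("B", {"t3"}, docs)  # 0.5: two of the four documents match
```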
16. The method of claim 11, further comprising presenting the approximate judgment with distinctive truth by outputting the change tree together with the approximate judgment with distinctive truth.
17. The method of claim 1, further comprising presenting the truth difference between the truths of the first judgment item and the truths of the approximate judgment, wherein the truth difference causes the change from the first judgment item to the approximate judgment.
18. The method of any one of claims 5 to 7, further comprising, for each approximate judgment with distinctive truth, indicating the sentence in the obtained document that corresponds to the original-judgment distinctive truth, and indicating the new-judgment distinctive truth in the obtained document.
19. A method for similar document search, comprising:
A) receiving an input document;
B) determining at least one approximate judgment with distinctive truth for the input document based on the method of any one of claims 1 to 18; and
C) obtaining a group of similar documents for the input document by using the at least one approximate judgment with distinctive truth.
20. The method of claim 19, wherein
the input document is a radiography report including finding items and diagnosis items, the finding items being selected as the first truth items and the diagnosis items being selected as the first judgment item.
21. The method of claim 19, wherein
the input document is a shell folder including items of interest to the user and travel destination items, the items of interest to the user being selected as the first truth items and the travel destination items being selected as the first judgment item.
22. The method of claim 19, wherein
the input document is a product introduction including product parameter items and product type items, the product parameter items being selected as the first truth items and the product type items being selected as the first judgment item.
23. A device for determining an approximate judgment with distinctive truth, comprising:
A) a document obtaining unit for obtaining a document, wherein the obtained document contains a first judgment item, the first judgment item being a keyword of a predetermined type;
B) a judgment item and truth item extraction unit for extracting judgment items and truth items;
C) a document analysis unit for extracting, by using the judgment item and truth item extraction unit, the first judgment item and first truth items from the obtained document, wherein each first truth item is information associated with the first judgment item;
D) a similar document analysis unit for obtaining a first group of similar documents by using the first judgment item and the first truth items, and for extracting, by using the judgment item and truth item extraction unit, second judgment items different from the first judgment item and second truth items from the first group of similar documents;
E) a unit for detecting approximate judgments with distinctive truth, for detecting at least one approximate judgment with distinctive truth by using the first group of similar documents, the second judgment items, and the second truth items, wherein:
the distinctive truth indicates the difference between the first judgment item and a second judgment item; and
the approximate judgment is one of the second judgment items, and the change distance between the approximate judgment and the first judgment item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgment item from the second judgment item.
24. The device of claim 23, wherein the judgment item and truth item extraction unit further comprises:
a keyword extraction unit for extracting keywords from the document;
a judgment item identification unit for identifying the judgment item among the extracted keywords; and
a truth item selection unit for selecting the remaining keywords as the truth items.
25. The device of claim 24, wherein the judgment item identification unit further comprises at least one of the following units:
a unit for selecting a keyword from a judgment item entry field as the judgment item;
a unit for selecting a keyword as the judgment item according to a predetermined configuration; and
a unit for selecting, by a user, a keyword as the judgment item.
26. The device of claim 25, wherein the unit for selecting a keyword as the judgment item according to a predetermined configuration further comprises:
a unit for selecting a keyword in a sentence, wherein the sentence expresses a subjective judgment and/or an objective result.
27. The device of claim 23, wherein the unit for detecting approximate judgments with distinctive truth further comprises:
1) an original-judgment distinctive truth extraction unit for extracting an original-judgment distinctive truth for each second judgment item;
2) a new-judgment distinctive truth extraction unit for extracting a new-judgment distinctive truth for each second judgment item;
3) a change distance calculation unit for calculating the change distance between each second judgment item and the first judgment item by using the extracted original-judgment distinctive truth and new-judgment distinctive truth; and
4) a first approximate judgment generation unit for generating the approximate judgment with distinctive truth by using the second judgment items whose change distance is less than the predetermined first threshold.
28. The device of claim 27, wherein the original-judgment distinctive truth extraction unit further comprises:
a target truth item selection unit for selecting one of the first truth items as a target truth item;
a sensitivity calculation unit for calculating the sensitivity of the target truth item, including:
a second similar document group obtaining unit for obtaining a second group of similar documents by using the first truth items with the target truth item deleted;
a third judgment item and third truth item extraction unit for extracting, from the second group of similar documents, third judgment items different from the first judgment item and third truth items; and
a sensitivity calculation subunit for calculating the sensitivity by using the distribution of the third judgment items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents; and
an original-judgment distinctive truth selection unit for selecting the target truth item as the original-judgment distinctive truth if the calculated sensitivity is equal to or greater than a predetermined second threshold.
29. The device of claim 28, wherein the new-judgment distinctive truth extraction unit further comprises:
a relevance calculation unit for calculating the relevance of each third truth item by using the ratio at which the third truth item appears together with the corresponding third judgment item in the second group of similar documents; and
a new-judgment distinctive truth selection unit for selecting a third truth item as the new-judgment distinctive truth if the relevance of the third truth item is equal to or greater than a predetermined third threshold.
30. The device of claim 23, wherein the unit for detecting approximate judgments with distinctive truth further comprises:
1) a truth distance calculation unit for calculating a truth distance for each document in the first group of similar documents by calculating the distance between that document and the obtained document, wherein the distance between each document in the first group of similar documents and the obtained document is calculated by using a count of the truth items that differ between the two documents;
2) a judgment item distance calculation unit for calculating a judgment item distance for each second judgment item by calculating the change distance between that second judgment item and the first judgment item using the calculated truth distances of the documents in the first group of similar documents, wherein the change distance between each second judgment item and the first judgment item is calculated by averaging the truth distances of the documents in the first group of similar documents;
3) a second judgment item selection unit for selecting a second judgment item as the approximate judgment if the judgment item distance of the second judgment item is equal to or less than a predetermined fourth threshold; and
4) a distinctive truth extraction unit for extracting the distinctive truth of the approximate judgment by identifying the truth items that differ between the first truth items and the truth items of the approximate judgment.
31. The device of claim 23, wherein the unit for detecting approximate judgments with distinctive truth further comprises:
1) a candidate approximate judgment generation unit for generating a candidate approximate judgment with distinctive truth for each document in the first group of similar documents by identifying the truth items that differ between the first truth items and the truth items of that document;
2) an approximate judgment extraction unit for extracting the approximate judgment with distinctive truth by using the candidate approximate judgments with distinctive truth, including:
a transfer graph generation unit for generating a transfer graph, wherein each end node in the transfer graph is a judgment item and each non-end node in the transfer graph is a truth item;
a candidate approximate judgment arrangement unit for arranging all candidate approximate judgments with distinctive truth in the transfer graph, wherein each path connecting two end nodes in the transfer graph indicates one candidate approximate judgment with distinctive truth;
an importance calculation unit for calculating the importance of each edge in the transfer graph by recording the connection frequency of each edge connecting any two nodes in the transfer graph;
an important edge identification unit for identifying important edges whose importance is equal to or greater than a predetermined fifth threshold;
a distinctive path generation unit for generating at least one distinctive path, wherein the distinctive path consists of important edges and connects a second judgment item to the first judgment item; and
a translation unit for translating each distinctive path into the approximate judgment with distinctive truth.
32. The device of claim 23, wherein the unit for detecting approximate judgments with distinctive truth further comprises:
1) a candidate distinctive truth generation unit for generating candidate distinctive truths, including:
a candidate original-judgment distinctive truth specifying unit for specifying candidate original-judgment distinctive truths by using first truth items that differ from the second truth items;
a candidate new-judgment distinctive truth specifying unit for specifying candidate new-judgment distinctive truths by using second truth items that differ from the first truth items, wherein the sum of the number of candidate original-judgment distinctive truths and the number of candidate new-judgment distinctive truths is equal to a predetermined number;
2) a candidate distinctive truth verification unit for verifying the candidate distinctive truths in the first group of similar documents, including:
a document identification unit for identifying documents in the first group of similar documents that contain the candidate new-judgment distinctive truths but do not contain the candidate original-judgment distinctive truths, the judgment items of the identified documents being different from the first judgment item; and
a candidate distinctive truth marking unit for marking the candidate distinctive truths as verified if one judgment item of the identified documents is a concentrated judgment item, wherein the ratio of the documents corresponding to the concentrated judgment item among all the identified documents is equal to or greater than a predetermined sixth threshold; and
3) a second approximate judgment generation unit for generating the approximate judgment with distinctive truth, including:
a verified candidate distinctive truth selection unit for selecting the verified candidate distinctive truths as the distinctive truth; and
a concentrated judgment item selection unit for selecting the concentrated judgment item as the approximate judgment.
33. The device of claim 23, wherein the unit for detecting approximate judgments with distinctive truth further comprises:
1) a change tree obtaining unit for obtaining a change tree for the obtained document, wherein the change tree is structured data of a specific group of knowledge information related to the obtained document, each non-end node of which is a truth item and each end node of which is a judgment item; and
2) a third approximate judgment generation unit for generating the approximate judgment with distinctive truth by selecting a path that links two end nodes in the obtained change tree.
34. The device of any one of claims 27 to 33, wherein the unit for detecting approximate judgments with distinctive truth further comprises:
a similar distinctive truth detection unit for detecting similar distinctive truths;
a similar distinctive truth merging unit for merging the similar distinctive truths; and
an approximate judgment adjustment unit for adjusting the approximate judgment with distinctive truth by using the merged distinctive truths.
35. The device of claim 23, further comprising a first approximate judgment presentation unit for presenting the approximate judgments with distinctive truth by outputting a list of all approximate judgments with distinctive truth.
36. The device of claim 23, further comprising a second approximate judgment presentation unit for presenting the approximate judgments with distinctive truth by:
outputting the approximate judgments with distinctive truth whose change distance is less than a predetermined seventh threshold, or
outputting a predetermined number of approximate judgments with distinctive truth having the smallest change distances.
37. The device of claim 23, further comprising a third approximate judgment presentation unit for presenting the approximate judgments with distinctive truth, which further comprises:
a coverage rate calculation unit for calculating the coverage rate of each approximate judgment with distinctive truth, wherein the coverage rate is the proportion of documents in the first group of similar documents that match the approximate judgment with distinctive truth; and
an approximate judgment output unit for outputting the approximate judgments with distinctive truth whose coverage rate is equal to or greater than a predetermined eighth threshold, or outputting a predetermined number of approximate judgments with distinctive truth having the highest coverage rates.
38. The device of claim 33, further comprising a fourth approximate judgment presentation unit for presenting the approximate judgment with distinctive truth by outputting the change tree together with the approximate judgment with distinctive truth.
39. The device of claim 23, further comprising a truth difference presentation unit for presenting the truth difference between the truths of the first judgment item and the truths of the approximate judgment, wherein the truth difference causes the change from the first judgment item to the approximate judgment.
40. The device of any one of claims 27 to 29, further comprising an indication unit for indicating, for each approximate judgment with distinctive truth, the sentence in the obtained document that corresponds to the original-judgment distinctive truth, and for indicating the new-judgment distinctive truth in the obtained document.
41. A device for similar document search, comprising:
A) an input document receiving unit for receiving an input document;
B) the device for determining an approximate judgment with distinctive truth according to any one of claims 23 to 40, for determining at least one approximate judgment with distinctive truth for the input document; and
C) a similar document obtaining unit for obtaining a group of similar documents for the input document by using the at least one approximate judgment with distinctive truth.
42. The device of claim 41, wherein
the input document is a radiography report including finding items and diagnosis items, the finding items being selected as the first truth items and the diagnosis items being selected as the first judgment item.
43. The device of claim 41, wherein
the input document is a shell folder including items of interest to the user and travel destination items, the items of interest to the user being selected as the first truth items and the travel destination items being selected as the first judgment item.
44. The device of claim 41, wherein
the input document is a product introduction including product parameter items and product type items, the product parameter items being selected as the first truth items and the product type items being selected as the first judgment item.
CN201410587566.9A 2014-10-28 2014-10-28 Method and device for determining approximate judgment with distinctive truth Active CN105630788B (en)

Publications (2)

Publication Number Publication Date
CN105630788A true CN105630788A (en) 2016-06-01
CN105630788B CN105630788B (en) 2019-05-03

Family

ID=56045742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410587566.9A Active CN105630788B (en) 2014-10-28 2014-10-28 Method and device for determining approximate judgment with distinctive truth

Country Status (1)

Country Link
CN (1) CN105630788B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070198530A1 (en) * 2006-02-17 2007-08-23 Fujitsu Limited Reputation information processing program, method, and apparatus
CN101567011A (en) * 2008-04-22 2009-10-28 株式会社Ntt都科摩 Document processing device and document processing method
CN103294671A (en) * 2012-02-22 2013-09-11 腾讯科技(深圳)有限公司 Document detection method and system
CN103903164A (en) * 2014-03-25 2014-07-02 华南理工大学 Semi-supervised automatic aspect extraction method and system based on domain information

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362735A (en) * 2019-07-15 2019-10-22 北京百度网讯科技有限公司 Method and device for judging the authenticity of a statement, electronic device, readable medium
CN110362735B (en) * 2019-07-15 2022-05-13 北京百度网讯科技有限公司 Method and device for judging the authenticity of a statement, electronic device, readable medium

Also Published As

Publication number Publication date
CN105630788B (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN104756106B (en) Characterizing data sources in a data storage system
CN105488196B (en) Automatic hot-topic mining system based on an Internet corpus
CN105760495B (en) Exploratory search method for bug problems based on a knowledge graph
CN105843850B (en) Search optimization method and device
CN104573130B (en) Entity resolution method and device based on crowd computing
Gipp et al. Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus
CN101566997A (en) Determining words related to given set of words
JP2007249584A (en) Client database creation method, data retrieval method, data retrieval system, data retrieval filtering system, client database creation program, data retrieval program, data retrieval filtering program, and computer-readable recording medium storing program or equipment recording program
JP2015532495A (en) System and method for presenting and navigating network data sets
CN106709037A (en) Movie recommendation method based on heterogeneous information network
Singh et al. Revisiting subject classification in academic databases: A comparison of the classification accuracy of web of science, scopus & dimensions
Ioannakis et al. RETRIEVAL—an online performance evaluation tool for information retrieval methods
KR101011726B1 (en) Apparatus and method for providing snippet
Zigkolis et al. Collaborative event annotation in tagged photo collections
Grandjean et al. Translating networks: assessing correspondence between network visualisation and analytics
Alobaid et al. Typology-based semantic labeling of numeric tabular data
JP5500070B2 (en) Data classification system, data classification method, and data classification program
CN113139096B (en) Video dataset labeling method and device
Kocak et al. NEgatiVE results in Radiomics research (NEVER): A meta-research study of publication bias in leading radiology journals
Chamberlain et al. Scalable visualisation of sentiment and stance
JP2007164633A (en) Content retrieval method, system thereof, and program thereof
KR20190023503A (en) Image based patent search apparatus
CN105630788A (en) Method and device for determining approximate judgment with distinctive truth
CN110471835B (en) Similarity detection method and system based on code files of power information system
Bianchi et al. Exploring the potentialities of automatic extraction of university webometric information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant