CN105630788B - Method and apparatus for determining approximate judgements having distinguishing facts - Google Patents
- Publication number
- CN105630788B CN201410587566.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The present invention relates to a method and apparatus for determining approximate judgements having distinguishing facts. The method includes: obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predefined type; extracting the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item; obtaining a first group of similar documents using the first judgement item and the first fact items, and extracting, from the first group of similar documents, second judgement items different from the first judgement item and second fact items; and detecting at least one approximate judgement having distinguishing facts by using the first group of similar documents, the second judgement items and the second fact items.
Description
Technical field
The present invention relates to searching for similar documents, and in particular to searching for previously created documents that are similar to a current input document.
Background art
Users often need to make judgements or decisions with the help of documents at hand. For example, a doctor can give a diagnosis by referring to existing diagnosis reports, a traveller can use a travel brochure to choose where to go, and a customer can decide which product to buy by referring to product introductions. To help with a judgement, a user can take the document at hand as the current document, search for similar documents, and see what judgements or decisions were made in similar cases in the past.
For example, in similar-document search processing, the documents most similar to an input document can be determined and returned as the output result.
US2013/0044925 proposes a similar case retrieval device and a similar case retrieval method. In the method of that patent application, a judgement item is a keyword of a predefined type, namely the core keyword that the user wants to decide on. A fact item is information of certain specified types associated with the judgement item. For an application concerning diagnosis reports, diagnosis items such as disease results or condition results are selected as judgement items, and finding items are selected as fact items. In that method, a diagnostic tree created from the diagnosis items and the finding items is used for searching.
Fig. 1A shows a flowchart of the similar case retrieval method in the prior-art patent application US2013/0044925. Referring to Fig. 1A, an input document is received in step 110. In step 120, the judgement items and fact items of the input document are extracted. In step 130, a group of similar documents is retrieved using the extracted judgement items and the extracted fact items of the input document.
Fig. 1B shows a flowchart of the processing in US2013/0044925 of retrieving a group of similar documents using the extracted judgement items and the extracted fact items of the input document. Referring to Fig. 1B, in step 131, the relationships between the judgement items and the fact items are extracted. Then, in step 132, judgement items and fact items are selected based on the extracted relationships to build a diagnostic tree. Finally, in step 133, similar documents are retrieved from a document database using the diagnostic tree.
Fig. 1C shows a schematic diagram of the diagnostic tree in the prior-art patent application US2013/0044925. With the method of US2013/0044925, documents similar to the input document can be retrieved from a document database using a diagnostic tree such as the one shown in Fig. 1C.
Patent US 8,352,416 proposes another, similar method for searching for similar documents. That US patent relates mainly to diagnosis report search, and searches using structures composed of diagnosis results and finding items. For example, symptoms and a disease that frequently occur together may constitute a structure. If a previous document in the document database has the same structure as the input document, that document is likely to be retrieved.
Fig. 2A shows a flowchart of the similar-document search method in patent US 8,352,416. Referring to Fig. 2A, an input document is received in step 210. In step 220, the judgement items and fact items of the input document are extracted. In step 230, a group of similar documents is retrieved using the extracted judgement items and the extracted fact items of the input document.
Fig. 2B shows a flowchart of the processing in patent US 8,352,416 of retrieving a group of similar documents using the extracted judgement items and the extracted fact items of the input document. Referring to Fig. 2B, in step 231, the relationships between the judgement items and the fact items are extracted. Then, in step 232, a judgement item and fact items that have a predetermined type of relationship are selected as a structure. Finally, in step 233, similar documents are retrieved from the document database using the structure.
Fig. 2C shows a schematic diagram of the structure used in the prior-art US 8,352,416. The structure of Fig. 2C shows semantic units and the counts of the semantic units; the semantic units include names of diseases and diagnoses describing symptoms. Based on these counts, combinations that include a desired keyword can be extracted, and entries other than the desired keyword can further be extracted from the extracted combinations as related keywords. Diagnosis reports that include one or both of the desired keyword and the related keywords can then be retrieved. With the method of US 8,352,416, documents similar to the input document can be retrieved from a document database.
In the similar-document search methods of US2013/0044925 and US 8,352,416, and in other prior-art methods, keywords are extracted from the input document and the relationships between the keywords are then analysed, in order to find similar documents that contain similar keywords with similar relationships. These prior-art methods simply display a list of resulting documents, without considering the user's real purpose in searching.
Searching for similar documents differs from searching with a query. If a user searches for documents with a query, the query can reflect the user's purpose and the aspects the user cares about. When a user searches for similar documents with a document, however, he or she is still focused on one particular aspect, and that aspect is the judgement item of the document.
Using the method for the prior art, only a series of document can be returned to user.As a result main to include and input text
The identical judgement item of shelves, cannot provide the user with the different certain similar documents for judging item.If user wants ratio
Compared with item is judged, he/her needs to read many documents, this is time-consuming.
The judgement items in the search results retrieved by the prior-art methods are substantially the same as the judgement item in the input document. Returning documents with the same judgement item is necessary, but returning similar documents with different judgement items is more useful. For example, a doctor gives a diagnosis result in a report. It is useful to return reports that have very similar finding items but different diagnosis results, for instance reports with the same examination indices and the same patient symptoms but a different disease. Such reports give the doctor an important signal that the diagnosis in this case should be made carefully.
Therefore, it is desirable to propose a new technique that solves at least one of the problems of the prior art.
Summary of the invention
An object of the present invention is to provide valuable information that matches the user's actual search purpose.
Another object of the present invention is to save the time the user spends reading documents by organizing the search results.
According to one aspect of the invention, there is provided a method for determining approximate judgements having distinguishing facts, comprising: a document obtaining step of obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predefined type; a document analysis step of extracting the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item; a similar-document analysis step of obtaining a first group of similar documents using the first judgement item and the first fact items, and of extracting, from the first group of similar documents, second judgement items different from the first judgement item and second fact items; and a step of detecting approximate judgements having distinguishing facts, in which at least one approximate judgement having distinguishing facts is detected by using the first group of similar documents, the second judgement items and the second fact items, wherein: a distinguishing fact indicates a difference between the first judgement item and a second judgement item; and the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgement item from the second judgement item.
According to another aspect of the invention, there is provided a method for similar-document search, comprising: receiving an input document; determining at least one approximate judgement having distinguishing facts for the input document, based on the above method for determining approximate judgements having distinguishing facts; and obtaining a group of documents similar to the input document using the at least one approximate judgement having distinguishing facts.
According to a further aspect of the invention, there is provided an apparatus for determining approximate judgements having distinguishing facts, comprising: a document obtaining unit for obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predefined type; a judgement item and fact item extraction unit for extracting judgement items and fact items; a document analysis unit for extracting, using the judgement item and fact item extraction unit, the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item; a similar-document analysis unit for obtaining a first group of similar documents using the first judgement item and the first fact items, and for extracting, using the judgement item and fact item extraction unit, second judgement items different from the first judgement item and second fact items from the first group of similar documents; and a detection unit for approximate judgements having distinguishing facts, for detecting at least one approximate judgement having distinguishing facts by using the first group of similar documents, the second judgement items and the second fact items, wherein: a distinguishing fact indicates a difference between the first judgement item and a second judgement item; and the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgement item from the second judgement item.
According to a further aspect of the invention, there is provided an apparatus for similar-document search, comprising: an input document creation unit for receiving an input document; the above apparatus for determining approximate judgements having distinguishing facts, for determining at least one approximate judgement having distinguishing facts for the input document; and a similar-document obtaining unit for obtaining a group of documents similar to the input document using the at least one approximate judgement having distinguishing facts.
One advantage of the present invention is that valuable information matching the user's actual search purpose can be provided.
A further advantage is that the search results can be organized, which saves the time the user spends reading documents.
Other features and advantages of the invention will become apparent from the following detailed description of exemplary embodiments of the invention with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1A shows a flowchart of the similar case retrieval method in the prior-art US2013/0044925.
Fig. 1B shows a flowchart of the processing in US2013/0044925 of retrieving a group of similar documents using the extracted judgement items and the extracted fact items of the input document.
Fig. 1C shows a schematic diagram of the diagnostic tree in the prior-art US2013/0044925.
Fig. 2A shows a flowchart of the similar-document search method in patent US 8,352,416.
Fig. 2B shows a flowchart of the processing in patent US 8,352,416 of retrieving a group of similar documents using the extracted judgement items and the extracted fact items of the input document.
Fig. 2C shows a schematic diagram of the structure used in patent US 8,352,416.
Fig. 3 is a schematic block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the invention can be implemented.
Fig. 4 shows a flowchart of processing for determining approximate judgements having distinguishing facts according to an embodiment of the invention.
Fig. 5 shows an example of a radiography report.
Fig. 6 shows a flowchart of processing for extracting approximate judgements having distinguishing facts by traversing fact items according to an embodiment of the invention.
Fig. 7 shows a flowchart of processing for extracting original-judgement distinguishing facts for each second judgement item according to an embodiment of the invention.
Fig. 8 shows a flowchart of processing for extracting new-judgement distinguishing facts for each second judgement item according to an embodiment of the invention.
Fig. 9 shows a flowchart of processing for extracting approximate judgements having distinguishing facts based on the minimum change distance according to an embodiment of the invention.
Fig. 10 shows a flowchart of processing for extracting approximate judgements having distinguishing facts using important paths according to an embodiment of the invention.
Fig. 11 shows a schematic diagram of an example of generating candidate approximate judgements having distinguishing facts.
Fig. 12 shows a schematic diagram of an example of important path mining.
Fig. 13 shows a flowchart of processing for extracting approximate judgements having distinguishing facts by changing fact items according to an embodiment of the invention.
Fig. 14 shows a flowchart of processing for extracting approximate judgements having distinguishing facts using a change tree according to an embodiment of the invention.
Fig. 15 shows a schematic diagram of an example of a change tree.
Fig. 16 shows a flowchart of a method for similar-document search according to an embodiment of the invention.
Fig. 17 shows a functional block diagram of an apparatus 4000 for determining approximate judgements having distinguishing facts according to an embodiment of the invention.
Fig. 18 shows a functional block diagram of an apparatus 5000 for similar-document search according to an embodiment of the invention.
Detailed description of the embodiments
Various exemplary embodiments of the invention will now be described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
Techniques, methods and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but, where appropriate, they should be regarded as part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the following figures; once an item has been defined in one figure, it need not be discussed further in subsequent figures.
Fig. 3 is a schematic block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the invention can be implemented. The method of the invention can be implemented on the hardware of the computer system 1000.
As shown in Fig. 3, the computer system includes a computer 1110. The computer 1110 includes a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190 and a peripheral interface 1195, which are connected via a system bus 1121.
The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136 and certain program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146 and certain program data 1147.
Removable non-volatile memories, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 may include a memory 1181, such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The peripheral interface 1195 is connected to a printer 1196 and a loudspeaker 1197.
The computer system shown in Fig. 3 is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
The computer system shown in Fig. 3 can be incorporated in any embodiment, either as a stand-alone computer or as a processing system in a device; one or more unnecessary components can be removed from it, and one or more additional components can be added to it.
Fig. 4 shows a flowchart of processing for determining approximate judgements having distinguishing facts according to an embodiment of the invention.
As shown in Fig. 4, an input document is obtained in step 2100. In this application, the types of input document may include, but are not limited to, radiography reports, travel brochures and product introductions.
Here, a radiography report is taken as an example. Fig. 5 shows an example of a radiography report.
The document may contain certain keywords that can be classified as judgement items and fact items. The keywords in the input document are referred to as the first judgement item and the first fact items. In one embodiment, a judgement item can be a keyword of a predefined type.
Next, in step 2200, the first judgement item and the first fact items are extracted from the obtained document.
Several methods exist for extracting keywords from documents using existing NLP techniques, such as named entity recognition, topic extraction and keyword extraction. After the keywords have been extracted from the input document, it is important to identify which keywords are judgement items.
For radiography reports, document section information can be used to select the judgement items. For example, in a radiography report, the keywords in the "diagnosis" section can be selected as judgement items, and this document section information is the judgement item domain information. The judgement item domain can be, for example, diagnosis result, product type or destination.
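The section-based selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_section`, the section names and the finding "cough: present" are assumptions introduced here, and the keywords are assumed to have been extracted and grouped by section already.

```python
def split_by_section(keywords_by_section, judgement_section="diagnosis"):
    """Split extracted keywords into judgement items and fact items.

    Keywords found in the judgement section become judgement items;
    keywords from every other section become fact items.
    """
    judgement_items, fact_items = [], []
    for section, keywords in keywords_by_section.items():
        if section.strip().lower() == judgement_section:
            judgement_items.extend(keywords)
        else:
            fact_items.extend(keywords)
    return judgement_items, fact_items


# Hypothetical report: section names and "cough: present" are illustrative only.
report = {
    "Findings": ["tubercle: irregular", "cough: present"],
    "Diagnosis": ["lung cancer"],
}
judgement_items, fact_items = split_by_section(report)
print(judgement_items)  # ['lung cancer']
print(fact_items)       # ['tubercle: irregular', 'cough: present']
```

For another domain, the `judgement_section` argument would name the section that carries the decision, for example a product type or a destination.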
Alternatively and/or additionally, judgement items can be selected according to a predetermined configuration or rule. In one embodiment, selecting keywords as judgement items may also include selecting the keywords in a sentence according to a predetermined configuration, wherein the sentence expresses a subjective judgement and/or an objective result. For example, if the context of a keyword conveys the meaning of a subjective judgement, the keyword can be selected as a judgement item.
Alternatively and/or additionally, users can themselves define keywords as judgement items. For example, before searching, a doctor can choose certain keywords to be highlighted, select the diseases among these keywords as judgement items, and select other information or symptoms (findings) as fact items. Each fact item is information associated with a judgement item.
In one embodiment, extracting fact items and judgement items from a document may also include: extracting keywords from the document; and identifying the judgement items among the extracted keywords and selecting the remaining keywords as fact items.
In one embodiment, keywords can be extracted from the document by at least one of the following operations: using a stored dictionary that includes judgement items and fact items; using document layout information; and using an extraction model trained on prepared training data.
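Of the three extraction options, the dictionary lookup is the simplest to illustrate. The sketch below assumes a tiny hand-written dictionary and plain substring matching; the names `JUDGEMENT_TERMS`, `FACT_TERMS` and `extract_items` and the sample sentence are hypothetical, and a practical system would need tokenization and handling of negation.

```python
# Hypothetical dictionary entries; a real dictionary would be curated per domain.
JUDGEMENT_TERMS = {"lung cancer", "bronchiectasis", "lung abscess"}
FACT_TERMS = {"tubercle", "pleural effusion"}


def extract_items(text):
    """Return (judgement items, fact items) found in the text by dictionary lookup."""
    lowered = text.lower()
    judgement_items = sorted(term for term in JUDGEMENT_TERMS if term in lowered)
    fact_items = sorted(term for term in FACT_TERMS if term in lowered)
    return judgement_items, fact_items


text = "Irregular tubercle in the right upper lobe. Diagnosis: lung cancer."
judgements, facts = extract_items(text)
print(judgements)  # ['lung cancer']
print(facts)       # ['tubercle']
```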
Table 1 shows an example of the judgement items and fact items of a radiography report.
Table 1: Example of the judgement items and fact items of a radiography report
Next, in step 2300, a first group of similar documents is obtained using the first judgement item and the first fact items, and second judgement items different from the first judgement item, together with second fact items, are extracted from the first group of similar documents.
Many methods are known in the prior art for obtaining a first group of similar documents using the first judgement item and the first fact items. In one embodiment, the first group of similar documents can be input directly by the user. Alternatively, the first group of similar documents can be obtained by retrieval, for example with the method of US Patent No. 8,352,416. For the example of Table 1, the first group of similar documents consists of the 143 similar documents shown in Table 2, in which the documents are labelled by judgement item to show the distribution of judgements among them. In addition, judgement items and fact items are extracted from the first group of similar documents using the same method as in step 2200. The keywords in the first group of similar documents are referred to as the second judgement items and the second fact items.
Table 2: Example of the first group of similar documents
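Labelling a group of documents by judgement item and reading off the distribution can be sketched as below. The 20-document group is hypothetical and merely chosen so that the shares come out as 70%, 15%, 10% and 5%; the function name `judgement_distribution` is likewise an assumption.

```python
from collections import Counter


def judgement_distribution(judgements):
    """Map each judgement item to its share of the documents in a group."""
    counts = Counter(judgements)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {judgement: count / total for judgement, count in counts.items()}


# Hypothetical group of 20 documents, one judgement label per document.
first_group = (
    ["lung cancer"] * 14
    + ["bronchiectasis"] * 3
    + ["pulmonary emphysema"] * 2
    + ["lung abscess"]
)
distribution = judgement_distribution(first_group)
for judgement, share in distribution.items():
    print(f"{judgement}: {share:.0%}")
```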
Next, in step 2400, at least one approximate judgement having distinguishing facts is detected by using the first group of similar documents, the second judgement items and the second fact items, wherein: a distinguishing fact indicates a difference between the first judgement item and a second judgement item; and the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance indicates the level of difficulty of distinguishing the first judgement item from the second judgement item. Note that the predetermined first threshold can be defined empirically by the user.
The key point of the invention is to find approximate judgements having distinguishing facts. The true approximate judgements and distinguishing facts, however, may not coincide with the results obtained by the invention. The invention is not intended to find the true approximate judgements, because obtaining them requires very deep domain knowledge and is extremely difficult for people. For example, it is hard for a doctor to determine the core distinguishing symptoms that separate two similar diseases, since the core distinguishing symptoms depend on the patient's age, gender, location and medical history.
In the invention, only document analysis techniques are used to detect, from the first group of similar documents, the approximate judgements and distinguishing facts of the current input document. In this case, it is assumed that the patient's age, gender, location and medical history are implied in the documents in the form of keywords.
Next, the detailed processing of detecting at least one approximate judgement having distinguishing facts by using the first group of similar documents, the second judgement items and the second fact items is described.
According to one aspect of the invention, approximate judgements having distinguishing facts can be detected by traversing the fact items. In this process, each fact item of the input document is checked in order to identify which fact items are distinguishing facts.
Fig. 6 shows a flowchart of processing for extracting approximate judgements having distinguishing facts by traversing fact items according to an embodiment of the invention.
Referring to Fig. 6, in step 2410, original-judgement distinguishing facts are extracted for each second judgement item.
Fig. 7 shows a flowchart of processing for extracting original-judgement distinguishing facts for each second judgement item according to an embodiment of the invention.
As shown in Fig. 7, in step 2411, one of the first fact items can be selected as the target fact item. Next, in step 2412, the sensitivity of the target fact item is calculated.
In one embodiment, calculating the sensitivity of the target fact item may include: obtaining a second group of similar documents using the first fact items with the target fact item deleted; extracting, from the second group of similar documents, third judgement items different from the first judgement item and third fact items; and calculating the sensitivity by using the distribution of the third judgement items in the second group of similar documents and the distribution of the second judgement items in the first group of similar documents.
For example, for the example of Table 1, the fact item "tubercle: irregular" can be deleted from the first fact items. A search using the remaining facts then returns 178 documents. Compared with the first group of 143 similar documents, there are 35 additional results, which are defined as the second group of similar documents. Table 3 shows an example of the second (additional) group of similar documents. The documents in Table 3 are labelled by third judgement item to show the distribution of judgements among them.
Table 3: Example of the second group of similar documents
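The way the second group arises as the additional results of a relaxed query can be sketched as follows. The retrieval here is a deliberately simple stand-in, assuming that a document matches when it contains every queried fact; the source leaves the retrieval method open, and the index contents and the names `search` and `second_group` are hypothetical.

```python
def search(index, facts):
    """Return the ids of documents whose fact items include every queried fact."""
    return {doc_id for doc_id, doc_facts in index.items() if facts <= doc_facts}


def second_group(index, first_facts, target_fact):
    """Documents gained by dropping target_fact from the query."""
    first_group = search(index, first_facts)
    relaxed_group = search(index, first_facts - {target_fact})
    return relaxed_group - first_group


# Hypothetical index: document id -> set of fact items.
index = {
    "doc1": {"tubercle: irregular", "cough: present"},
    "doc2": {"cough: present"},
    "doc3": {"cough: present", "pleural effusion: existing"},
    "doc4": {"fever: present"},
}
first_facts = {"tubercle: irregular", "cough: present"}
extra = second_group(index, first_facts, "tubercle: irregular")
print(sorted(extra))  # ['doc2', 'doc3']
```

In the numbers of the example above, the relaxed query returns 178 documents, the original query 143, and the difference of 35 documents forms the second group.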
If the deleted fact is a distinguishing fact of "lung cancer" (the first judgement), the additional results will include other diagnosis results; otherwise, the additional results will still contain the judgement "lung cancer". To check whether the deleted fact is a distinguishing fact, the 35 additional results (the second group of similar documents) are used.
The sensitivity of the fact item "tubercle: irregular" with respect to "lung cancer" can be calculated as follows.
Sensitivity = (distribution of the third judgement items in the second group of similar documents) / (distribution of the second judgement items in the first group of similar documents)
For example, the 143 results contain three types of diagnosis result different from the first judgement item (that is, the second judgement items: bronchiectasis, lung abscess and pulmonary emphysema), and the 35 results contain two types of diagnosis result different from the first judgement item (that is, the third judgement items: bronchiectasis and lung abscess).
Sensitivity = (60% + 35%) / (15% + 5% + 10%) = 95% / 30% ≈ 3.17
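The arithmetic of this example can be checked with a short sketch. The assignment of the individual percentages to diseases in the first group is an assumption based on the order in which they are listed, the function name `sensitivity` is illustrative, and the value of `SECOND_THRESHOLD` is hypothetical because the source leaves the second threshold to the user.

```python
def sensitivity(third_in_second_group, second_in_first_group):
    """Ratio of the summed shares of non-original judgements in the two groups."""
    return sum(third_in_second_group.values()) / sum(second_in_first_group.values())


# Shares in percent. The per-disease assignment is assumed; only the totals matter.
third_in_second_group = {"bronchiectasis": 60, "lung abscess": 35}
second_in_first_group = {"bronchiectasis": 15, "lung abscess": 5, "pulmonary emphysema": 10}

value = sensitivity(third_in_second_group, second_in_first_group)
print(round(value, 2))  # 3.17

SECOND_THRESHOLD = 2.0  # hypothetical value; the source leaves it to the user
is_distinguishing = value >= SECOND_THRESHOLD
print(is_distinguishing)  # True
```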
Referring back to Fig. 7, in step 2413, if the calculated sensitivity is equal to or greater than a predetermined second threshold, the target fact item can be selected as an original-judgement distinguishing fact. Note that this threshold can be defined by users according to their experience.
Referring back to Fig. 6, if an original-judgement distinguishing fact is detected, the second group of similar documents is checked further to determine whether there are other approximate judgements. This is done in step 2420, in which new-judgement distinguishing facts are extracted for each second judgement item.
Fig. 8 shows a flowchart of processing for extracting new-judgement distinguishing facts for each second judgement item according to an embodiment of the invention.
As shown in Fig. 8, in step 2421, the relevance of each third fact item is calculated by using the ratio at which the third fact item occurs together with the corresponding third judgement item in the second group of similar documents.
For example, for the third judgement item "lung abscess", the third fact items can be extracted, and one third fact item that is not included in the input document is "pleural effusion: existing". It is then checked whether this fact item is highly relevant to the judgement item "lung abscess".
In one embodiment, the relevance is calculated.
For example, 12 documents in the second group of similar documents relate to "lung abscess", and 11 of them have the fact item "pleural effusion: existing", so relevance = 11/12.
Next, in step 2422, if the correlation of third fact item is equal to or more than scheduled third threshold value,
Third fact item is then selected to judge the distinctiveness fact as new.
In the above example, because the correlation of " pleural effusion: existing " be greater than predetermined threshold (for example, 80%, can
Rule of thumb defined by user), therefore select facts item " pleural effusion: existing " judges the distinctiveness fact as new.
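The relevance check of steps 2421 and 2422 can be sketched as follows. This is only an illustration under stated assumptions: the document representation (a dict holding a `judgements` set and a `facts` set) and the names `relevance` and `THIRD_THRESHOLD` are introduced here and do not come from the patent; the data reproduce the 11-of-12 example above.

```python
def relevance(docs, judgement, fact):
    """Share of the documents carrying `judgement` that also contain `fact`."""
    with_judgement = [d for d in docs if judgement in d["judgements"]]
    if not with_judgement:
        return 0.0
    hits = sum(1 for d in with_judgement if fact in d["facts"])
    return hits / len(with_judgement)


# Hypothetical second group: 12 documents involve "lung abscess",
# and 11 of them contain the fact item "pleural effusion: existing".
docs = (
    [{"judgements": {"lung abscess"}, "facts": {"pleural effusion: existing"}}] * 11
    + [{"judgements": {"lung abscess"}, "facts": set()}]
)

THIRD_THRESHOLD = 0.8  # user-defined, as in the example above
score = relevance(docs, "lung abscess", "pleural effusion: existing")
is_new_distinctive_fact = score >= THIRD_THRESHOLD
```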
Next, referring back to Fig. 6, in step 2430, the change distance between each second judgement item and the first judgement item is calculated using the extracted original-judgement distinctive facts and new-judgement distinctive facts.
In the above example, the first judgement item is "lung cancer" and one of the second judgement items is "lung abscess". The change distance can be calculated as the sum of the number of original-judgement distinctive facts and the number of new-judgement distinctive facts.
Next, in step 2440, approximate judgements with distinctive facts are generated from the second judgement items whose change distance is less than a predetermined first threshold.
Alternatively, a predetermined number of approximate judgements with distinctive facts that have the smallest change distances may be generated.
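Steps 2430 and 2440 can be sketched as below. The function names, the dict layout, and the threshold value 3 are assumptions made for illustration only; the patent specifies only that the change distance is the sum of the two fact counts and that selection uses either a threshold or a fixed number of results.

```python
def change_distance(original_facts, new_facts):
    """Number of original-judgement plus new-judgement distinctive facts."""
    return len(original_facts) + len(new_facts)


def select_approximate(candidates, first_threshold=None, top_k=None):
    """candidates maps a second judgement item to (original_facts, new_facts)."""
    scored = sorted(
        (change_distance(orig, new), judgement)
        for judgement, (orig, new) in candidates.items()
    )
    if first_threshold is not None:
        # Keep only judgements whose change distance is below the threshold.
        scored = [(d, j) for d, j in scored if d < first_threshold]
    if top_k is not None:
        # Alternatively keep a fixed number of smallest-distance judgements.
        scored = scored[:top_k]
    return [(j, d) for d, j in scored]


candidates = {
    "lung abscess": ({"tubercle: irregular"}, {"pleural effusion: existing"}),
    "bronchiectasis": ({"tubercle: irregular"}, set()),
}
below_threshold = select_approximate(candidates, first_threshold=3)
closest = select_approximate(candidates, top_k=1)
```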
In addition, if no new-judgement distinctive fact exists, the approximate judgement can be detected from the "removed diversity" of the second group of similar documents. "Removed diversity" refers to the following: the judgements of the similar documents are diverse and numerous; adding some judgement distinctive facts reduces this diversity; and the judgements that are thereby removed are the approximate judgements. The removed diversity can simply be set to the most frequent diagnosis among the different diagnoses. For example, if 60% of the second group of similar documents involve "bronchiectasis", which is higher than a user-defined threshold, then the judgement "bronchiectasis" can be determined to be an approximate judgement, and the fact item "tubercle: irregular" can be determined to be its distinctive fact.
In the above example, two approximate judgements with distinctive facts are found:
"Lung abscess": "tubercle: irregular", "pleural effusion: existing".
"Bronchiectasis": "tubercle: irregular".
According to another aspect of the present invention, approximate judgements with distinctive facts are extracted based on the minimum change distance. In this processing, each judgement item of the similar documents is examined to identify which judgement items are approximate judgements.
Fig. 9 shows a flowchart of the processing for extracting approximate judgements with distinctive facts based on the minimum change distance according to an embodiment of the present invention.
As shown in Fig. 9, in step 2510, the fact distance of each document in the first group of similar documents is calculated as the distance between that document and the input document, where this distance is obtained by counting the fact items that differ between the two documents.
For example, the first group of similar documents contains 100 similar documents with different diagnostic results (second judgement items): 20 documents for bronchiectasis, 35 for lung abscess, 15 for pulmonary emphysema, 20 for pulmonary tuberculosis, and 10 for pneumonia. In one embodiment, the fact distance of each document can be calculated by counting how many of its facts differ from the first fact items.
For example, the first pulmonary emphysema document has 4 fact items that differ from the first fact items; the second pulmonary emphysema document has 2; the third has 3; and so on.
Next, in step 2520, the calculated fact distance of each document in the first group of similar documents is used to calculate the change distance between each second judgement item and the first judgement item, which gives the judgement item distance of each second judgement item. The change distance between a second judgement item and the first judgement item is calculated by averaging the fact distances of the documents in the first group of similar documents that have that second judgement item.
In other words, the judgement item distance is the average of the fact distances of all documents of one judgement item. For example, there are 20 similar documents for pulmonary tuberculosis, and the fact distance of each document was calculated in step 2510; the judgement item distance of pulmonary tuberculosis is then the sum of these fact distances divided by 20.
In the same way, the judgement item distances calculated in step 2520 for the above example are as follows:
Lung abscess: 1.87
Pulmonary emphysema: 2.48
Pulmonary tuberculosis: 2.68
…
In this example, it can be seen that lung abscess is the judgement item most similar to that of the input document.
Next, in step 2530, if the judgement item distance of a second judgement item is equal to or less than a predetermined fourth threshold, that second judgement item may be selected as an approximate judgement. Note that the threshold may be defined by the user based on experience.
In the above example, the fourth threshold may be defined as 2. Because the judgement item distance of lung abscess is less than this threshold, lung abscess is selected as an approximate judgement.
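Steps 2510 to 2530 can be sketched as follows. This is a hedged illustration: counting differing fact items as the size of the symmetric difference is an assumption (the patent says only that differing fact items are counted), and the function names and the toy documents are hypothetical, chosen so that the three emphysema documents have fact distances 4, 2, and 3 as in the example above.

```python
from collections import defaultdict


def fact_distance(input_facts, doc_facts):
    """Count fact items that appear in exactly one of the two documents."""
    return len(set(input_facts) ^ set(doc_facts))


def judgement_item_distances(input_facts, similar_docs):
    """Average the fact distances of the documents of each judgement item."""
    per_judgement = defaultdict(list)
    for judgement, facts in similar_docs:
        per_judgement[judgement].append(fact_distance(input_facts, facts))
    return {j: sum(ds) / len(ds) for j, ds in per_judgement.items()}


def approximate_judgements(distances, fourth_threshold):
    """Judgement items whose distance is at most the fourth threshold."""
    return sorted(j for j, d in distances.items() if d <= fourth_threshold)


input_facts = {"tubercle: irregular", "lymph node: enlargement",
               "shade: existing", "age: 50"}
similar_docs = [
    ("pulmonary emphysema", set()),                           # distance 4
    ("pulmonary emphysema", {"shade: existing", "age: 50"}),  # distance 2
    ("pulmonary emphysema", {"age: 50"}),                     # distance 3
    ("lung abscess", {"lymph node: enlargement", "shade: existing",
                      "age: 50", "pleural effusion: existing"}),  # distance 2
]
distances = judgement_item_distances(input_facts, similar_docs)
selected = approximate_judgements(distances, fourth_threshold=2)
```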
Next, in step 2540, the distinctive facts of the approximate judgement are extracted by identifying the fact items that differ between the first fact items and the facts of the approximate judgement.
For example, among the 35 lung abscess documents, 30 documents do not contain "tubercle: irregular", which is one of the first fact items, and 29 documents contain "pleural effusion: existing", which is not one of the first fact items. Therefore, the fact items "tubercle: irregular" and "pleural effusion: existing" are identified as the distinctive facts of lung abscess.
It can thus be seen that deleting the fact item "tubercle: irregular" and adding the fact item "pleural effusion: existing" causes the judgement item to change from lung cancer to lung abscess, which can be written as:
(<tubercle: irregular> → <pleural effusion: existing>) → (lung cancer → lung abscess); distance = 2
Because the fact item "tubercle: irregular" disappears, the change distance for this aspect is counted as 1. In addition, a new fact item "pleural effusion: existing" appears, so the change distance for this aspect is also counted as 1. The total change distance is therefore 2.
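Step 2540 can be sketched as below. The patent gives only the counts (30 of 35 and 29 of 35) and does not state how large a share of documents is required, so the `min_share` parameter and its value 0.8 are assumptions added here for illustration; the function name and the generated documents are likewise hypothetical.

```python
def distinctive_facts(input_facts, judgement_docs, min_share=0.8):
    """Return (removed, added) fact items for one approximate judgement.

    removed: input facts missing from at least `min_share` of the documents.
    added:   non-input facts present in at least `min_share` of the documents.
    """
    n = len(judgement_docs)
    if n == 0:
        return set(), set()
    input_facts = set(input_facts)
    seen = set().union(*judgement_docs)
    removed = {f for f in input_facts
               if sum(f not in d for d in judgement_docs) / n >= min_share}
    added = {f for f in seen - input_facts
             if sum(f in d for d in judgement_docs) / n >= min_share}
    return removed, added


# 35 hypothetical lung abscess documents: 30 lack "tubercle: irregular"
# and 29 contain "pleural effusion: existing".
docs = []
for i in range(35):
    facts = {"shade: existing"}
    if i < 5:
        facts.add("tubercle: irregular")
    if i < 29:
        facts.add("pleural effusion: existing")
    docs.append(facts)

removed, added = distinctive_facts({"tubercle: irregular", "shade: existing"}, docs)
total_change_distance = len(removed) + len(added)
```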
According to another aspect of the present invention, approximate judgements with distinctive facts can be extracted using important path mining.
Fig. 10 shows a flowchart of the processing for extracting approximate judgements with distinctive facts using important path mining according to an embodiment of the present invention.
As shown in Fig. 10, in step 2610, a candidate approximate judgement with distinctive facts is generated for each document in the first group of similar documents by identifying the fact items that differ between the first fact items and the facts of that document.
For each document whose judgement item differs from that of the input document, the differing judgement item is first assumed to be a candidate approximate judgement, and all differing fact items are assumed to be distinctive facts. A candidate approximate judgement with distinctive facts is then generated.
Fig. 11 is a schematic diagram showing an example of generating candidate approximate judgements with distinctive facts. In this example, the fact items (findings) of the input document include "age: 50", "tubercle: irregular", "lymph node: enlargement", "gender: female", and "shade: existing", and the judgement item (diagnostic result) is "lung cancer".
For this input document, 100 similar documents are obtained (note that these 100 similar documents are obtained using a different method, so they are unrelated to the 143 documents above), and the judgement items of 70 of the similar documents differ from "lung cancer". Of these 70 similar documents, 20% have the judgement item bronchiectasis, 35% lung abscess, 15% pulmonary emphysema, 20% pulmonary tuberculosis, and 10% pneumonia. The relationship between the input document and a similar document with a different judgement item can be written as "(finding <shade: existing> → 0) → (lung cancer → bronchiectasis); distance = 1". This means that deleting the fact item "shade: existing" changes the judgement item from "lung cancer" to "bronchiectasis", and that the fact distance between the input document and the similar document is 1.
Next, the important path mining method is used to extract approximate judgements with distinctive facts from the candidate approximate judgements with distinctive facts. The detailed steps are as follows.
In step 2620, a transition graph is generated, in which each end node is a judgement item and each non-end node is a fact item.
Next, in step 2630, all candidate approximate judgements with distinctive facts are arranged in the transition graph, where each path connecting two end nodes in the transition graph represents one candidate approximate judgement with distinctive facts. In other words, if two nodes are included in one candidate approximate judgement with distinctive facts, an edge is drawn between them, which produces a path connecting two judgement item nodes in the transition graph.
Next, in step 2640, the importance of each edge in the transition graph is calculated by recording the connection frequency of each edge connecting any two nodes in the transition graph.
In step 2650, important edges, whose importance is equal to or greater than a predetermined fifth threshold, are identified. In other words, if the importance of an edge reaches the predetermined threshold, the edge is identified as important. Note that the predetermined fifth threshold may be determined by the user based on experience.
Next, in step 2660, at least one distinctive path is generated, where the distinctive path consists of important edges and connects a second judgement item to the first judgement item.
Fig. 12 is a schematic diagram showing an example of important path mining. As shown in Fig. 12, the end nodes in the transition graph are "lung cancer" and "pulmonary tuberculosis", which are judgement items. The non-end nodes in the transition graph include "shade: existing", "pleural effusion: existing", "lymph node: enlargement", and "tubercle: irregular", which are fact items. If two nodes are included in one candidate approximate judgement with distinctive facts, an edge is drawn between them. Important edges are marked with thick lines. The distinctive path runs from "lung cancer" to "lymph node: enlargement" to "shade: existing" to "pulmonary tuberculosis".
Finally, in step 2670, each distinctive path is translated into an approximate judgement with distinctive facts.
In the above example, the important path can be translated into:
(finding <lymph node: enlargement> → <shade: existing>) → (lung cancer → pulmonary tuberculosis); distance = 2
This means that deleting the fact item "lymph node: enlargement" and adding the fact item "shade: existing" causes the judgement item to change from lung cancer to pulmonary tuberculosis. In addition, as described above, the change distance is 2.
Therefore, approximate judgements with distinctive facts can be extracted by the processing shown in Fig. 10.
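Steps 2620 to 2670 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: each candidate is encoded as a node sequence (first judgement item, fact items, second judgement item), edges are treated as undirected, edge importance is the raw connection count, and a breadth-first search over important edges stands in for the path generation of step 2660. The function name and the candidate counts are hypothetical.

```python
from collections import Counter, deque


def mine_important_paths(candidate_paths, first_judgement, fifth_threshold):
    """Map each reachable judgement item to a path made of important edges."""
    edge_count = Counter()
    end_nodes = set()
    for path in candidate_paths:
        end_nodes.update((path[0], path[-1]))
        for a, b in zip(path, path[1:]):
            edge_count[frozenset((a, b))] += 1  # connection frequency

    # Adjacency restricted to edges whose importance reaches the threshold.
    adjacency = {}
    for edge, count in edge_count.items():
        if count >= fifth_threshold and len(edge) == 2:
            a, b = tuple(edge)
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    # Breadth-first search from the first judgement item.
    results = {}
    seen = {first_judgement}
    queue = deque([[first_judgement]])
    while queue:
        path = queue.popleft()
        for nxt in sorted(adjacency.get(path[-1], ())):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in end_nodes:
                results[nxt] = path + [nxt]
            else:
                queue.append(path + [nxt])
    return results


frequent = ["lung cancer", "lymph node: enlargement",
            "shade: existing", "pulmonary tuberculosis"]
rare = ["lung cancer", "tubercle: irregular",
        "pleural effusion: existing", "pulmonary tuberculosis"]
paths = mine_important_paths([frequent] * 3 + [rare], "lung cancer", 2)
```

Under this encoding, the change distance of a mined path is the number of fact-item nodes on it, which is 2 for the path found here.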
According to another aspect of the present invention, approximate judgements with distinctive facts can be extracted by changing fact items. In this processing, each fact item that differs between the input document and the similar documents is examined to identify which fact items are distinctive facts.
Fig. 13 shows a flowchart of the processing for extracting approximate judgements with distinctive facts by changing fact items according to an embodiment of the present invention.
As shown in Fig. 13, in step 2710, candidate distinctive facts are generated. Generating the candidate distinctive facts may include: designating first fact items that differ from the second fact items as candidate original-judgement distinctive facts; and designating second fact items that differ from the first fact items as candidate new-judgement distinctive facts, where the sum of the number of candidate original-judgement distinctive facts and the number of candidate new-judgement distinctive facts equals a predetermined number (that is, a predetermined change distance).
For example, a traveller may want to use the current travel brochure for Tokyo Tower to search for similar travel brochures. Each travel brochure includes certain features of a destination, which can be called items of interest to the user, and the destination is what the user wants to compare. Therefore, the destination can be treated as the judgement item, and the items of interest to the user can be treated as fact items.
In this step, many similar travel brochures can be retrieved that describe information about destinations, such as price, time required for the trip, travel mode, and architectural style.
For each destination, candidate distinctive facts can be generated.
For example, the current destination is Tokyo Tower, and Senso-ji Temple is the destination under consideration. The fact items that differ between the two destinations may include:
Tokyo Tower: <price: 200> <building: modern>
Senso-ji Temple: <price: 100> <building: religion>
Candidate distinctive facts can therefore be generated.
Next, in step 2720, the candidate distinctive facts are verified against the first group of similar documents. The verification may include: identifying, in the first group of similar documents, the documents that contain the candidate new-judgement distinctive facts but not the candidate original-judgement distinctive facts and whose judgement item differs from the first judgement item; and, if one of the judgement items of the identified documents is a concentrated judgement item, marking the candidate distinctive facts as verified, where a concentrated judgement item is one for which the ratio of its documents to all identified documents is equal to or greater than a predetermined sixth threshold.
For example, for the candidate distinctive fact (<building: modern> → <building: religion>) (which means that the fact <building: religion> is included but the fact <building: modern> is not), 10 travel brochures are found that include the fact <building: religion> and do not include <building: modern>. Of these, 9 travel brochures relate to Senso-ji Temple, and this proportion (9 of 10, or 90%) is greater than the predetermined threshold (for example, 60%), so Senso-ji Temple is a concentrated judgement item. The candidate distinctive fact (<building: modern> → <building: religion>) is therefore verified, and Senso-ji Temple is the concentrated judgement item. Note that this threshold may also be defined by the user based on experience.
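The verification of step 2720 can be sketched as follows. The function name, the `(judgement, facts)` tuple layout, and the sample brochures (including the "other destination" entry) are hypothetical and chosen only to reproduce the 9-of-10 example above.

```python
from collections import Counter


def verify_candidate(docs, first_judgement, removed, added, sixth_threshold):
    """Return the concentrated judgement item, or None if verification fails.

    docs is a list of (judgement_item, fact_set) pairs. A document is
    identified when its judgement differs from `first_judgement`, it contains
    every `added` fact, and it contains none of the `removed` facts.
    """
    identified = [
        judgement for judgement, facts in docs
        if judgement != first_judgement
        and added <= facts
        and not (removed & facts)
    ]
    if not identified:
        return None
    judgement, count = Counter(identified).most_common(1)[0]
    if count / len(identified) >= sixth_threshold:
        return judgement
    return None


brochures = (
    [("Senso-ji Temple", {"building: religion", "price: 100"})] * 9
    + [("other destination", {"building: religion"})]
    + [("Tokyo Tower", {"building: modern", "price: 200"})] * 3
)
concentrated = verify_candidate(
    brochures, "Tokyo Tower",
    removed={"building: modern"}, added={"building: religion"},
    sixth_threshold=0.6,
)
```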
Next, in step 2730, approximate judgements with distinctive facts are generated by selecting the verified candidate distinctive facts as the distinctive facts and selecting the concentrated judgement item as the approximate judgement.
Examples of approximate judgements with distinctive facts in the travel brochure search are as follows.
(1) (<building: modern> → <building: religion>) → (Tokyo Tower → Senso-ji Temple); distance = 2
(2) (<building: modern> → <building: imperial>) → (Tokyo Tower → Imperial Palace Plaza); distance = 2
(3) (<travel mode: land> → <travel mode: water>) → (Tokyo Tower → Sumida River cruise); distance = 2
(4) (<time: within 2 hours> → 0) → (Tokyo Tower → Eight Treasures National Garden); distance = 1
Item (1) means that deleting the fact item "building: modern" from the input travel brochure and adding the fact item "building: religion" causes the judgement item (destination) to change from Tokyo Tower to Senso-ji Temple, and that the change distance (the number of distinctive facts) is 2.
Item (2) means that deleting the fact item "building: modern" from the input travel brochure and adding the fact item "building: imperial" causes the judgement item (destination) to change from Tokyo Tower to Imperial Palace Plaza, and that the change distance (the number of distinctive facts) is 2.
Item (3) means that deleting the fact item "travel mode: land" from the input travel brochure and adding the fact item "travel mode: water" causes the judgement item (destination) to change from Tokyo Tower to the Sumida River cruise, and that the change distance (the number of distinctive facts) is 2.
Item (4) means that deleting the fact item "time: within 2 hours" from the input travel brochure causes the judgement item (destination) to change from Tokyo Tower to Eight Treasures National Garden, and that the change distance (the number of distinctive facts) is 1.
Therefore, approximate judgements with distinctive facts can be extracted by changing fact items using the method shown in Fig. 13.
According to another aspect of the present invention, approximate judgements with distinctive facts can be extracted using a change tree. In this method, domain knowledge can be used to improve similar-document search.
Fig. 14 shows a flowchart of the processing for extracting approximate judgements with distinctive facts using a change tree according to an embodiment of the present invention.
As shown in Fig. 14, in step 2810, a change tree for the input document is obtained. The change tree is structured data specific to a set of knowledge information related to the input document, in which each non-end node is a fact item and each end node is a judgement item.
For example, a customer may want to decide which camera to buy. The customer may find the current introduction to one type of card camera unsatisfactory, and may search for similar camera introductions.
In this case, the product type is what the user wants to compare, so the product type can be treated as the judgement item and the product parameter items can be treated as fact items.
In this field, structured knowledge that was constructed by hand or mined by knowledge mining techniques may already exist. We refer to this structured knowledge as a change tree. It can be used to organize search results.
Fig. 15 is a schematic diagram showing an example of a change tree. In this example, one end node is "card camera". The other end nodes are "compact camera", "SLR camera", "professional camera", and "telephoto camera". The features of the various types of camera, that is, the fact items, form the non-end nodes.
Next, in step 2820, an approximate judgement with distinctive facts is generated by selecting a path that links two end nodes in the obtained change tree.
For example, for the change tree in Fig. 15, we can select the rightmost branch. This branch can be translated into the following approximate judgement with distinctive facts:
(parameter <optical zoom: 5 times> → parameter <optical zoom: 50 times>) → (card camera → telephoto camera); distance = 2
This approximate judgement with distinctive facts means that deleting the fact item "optical zoom: 5 times" from the input product introduction and adding the fact item "optical zoom: 50 times" causes the judgement item (product type) to change from card camera to telephoto camera, and that the change distance (the number of distinctive facts) is 2.
Therefore, approximate judgements with distinctive facts can be extracted by the processing shown in Fig. 14.
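Step 2820 can be sketched as follows. Fig. 15 is not reproduced in the text, so the tree below is purely hypothetical: it is stored as a child-to-parent map, the shared root "lens: fixed" is invented for illustration, and treating the fact items on the path between two end nodes (excluding shared ancestors) as the distinctive facts is an assumption of this sketch.

```python
def path_to_root(parent, node):
    """Fact items met when walking from an end node up to the root."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def approximate_from_tree(parent, first_leaf, second_leaf):
    """Translate the path linking two end nodes into (removed, added, distance)."""
    up = path_to_root(parent, first_leaf)
    down = path_to_root(parent, second_leaf)
    shared = set(up) & set(down)  # common ancestors are not distinctive
    removed = [fact for fact in up if fact not in shared]
    added = [fact for fact in down if fact not in shared]
    return removed, added, len(removed) + len(added)


# Hypothetical fragment of a change tree for cameras.
parent = {
    "optical zoom: 5 times": "lens: fixed",
    "optical zoom: 50 times": "lens: fixed",
    "card camera": "optical zoom: 5 times",
    "telephoto camera": "optical zoom: 50 times",
}
removed, added, distance = approximate_from_tree(
    parent, "card camera", "telephoto camera"
)
```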
Alternatively and/or additionally, extracting approximate judgements with distinctive facts may also include: detecting similar distinctive facts; merging the similar distinctive facts; and adjusting the approximate judgements with distinctive facts using the merged distinctive facts.
For example, the two fact items "tumor size: 3.7cm" and "tumor size: 3.9cm" can be merged into one fact item "tumor size: 3.5~4.0cm". The merged fact can then be used to adjust the approximate judgements with distinctive facts.
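One way to merge similar numeric fact items is to bin their values, as sketched below. The patent does not say how similarity is detected or how the merged range is chosen, so the fixed 0.5 cm bin width, the regular expression, and the function name are all assumptions made for this illustration.

```python
import math
import re

# Matches fact items of the form "<name>: <number>cm".
_NUMERIC_FACT = re.compile(r"^(.*): (\d+(?:\.\d+)?)cm$")


def merge_numeric_facts(facts, step=0.5):
    """Replace numeric centimetre facts with the range bin they fall into."""
    merged = set()
    for fact in facts:
        match = _NUMERIC_FACT.match(fact)
        if match is None:
            merged.add(fact)  # non-numeric facts pass through unchanged
            continue
        name, value = match.group(1), float(match.group(2))
        low = math.floor(value / step) * step
        merged.add(f"{name}: {low:.1f}~{low + step:.1f}cm")
    return merged


merged = merge_numeric_facts(
    {"tumor size: 3.7cm", "tumor size: 3.9cm", "tubercle: irregular"}
)
```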
In one embodiment, the approximate judgements with distinctive facts can be presented by outputting a list of all approximate judgements with distinctive facts.
In one embodiment, the approximate judgements with distinctive facts can be presented by the following operation: outputting the approximate judgements with distinctive facts whose change distance is less than a predetermined seventh threshold, or outputting a predetermined number of approximate judgements with distinctive facts that have the smallest change distances. Note that the predetermined seventh threshold may be determined by the user based on experience.
In one embodiment, the approximate judgements with distinctive facts can be presented by the following operations: calculating the coverage of each approximate judgement with distinctive facts, where the coverage is the ratio, within the first group of similar documents, of the documents that match the approximate judgement with distinctive facts; and outputting the approximate judgements with distinctive facts whose coverage is equal to or greater than a predetermined eighth threshold, or outputting a predetermined number of approximate judgements with distinctive facts that have the largest coverage. Note that the predetermined eighth threshold may be determined by the user based on experience.
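The coverage-based presentation can be sketched as follows. What it means for a document to "match" an approximate judgement is not spelled out in the text; this sketch assumes a match when the document has the judgement item, contains all added facts, and contains none of the removed facts. The function names, the threshold 0.3, and the ten sample documents are hypothetical.

```python
def coverage(docs, judgement, removed, added):
    """Share of all similar documents that match the approximate judgement."""
    if not docs:
        return 0.0
    matched = sum(
        1 for doc_judgement, facts in docs
        if doc_judgement == judgement
        and added <= facts
        and not (removed & facts)
    )
    return matched / len(docs)


def present(candidates, docs, eighth_threshold=None, top_n=None):
    """Rank candidates by coverage, then filter by threshold and/or count."""
    scored = sorted(
        ((coverage(docs, j, removed, added), j) for j, removed, added in candidates),
        key=lambda item: (-item[0], item[1]),
    )
    if eighth_threshold is not None:
        scored = [item for item in scored if item[0] >= eighth_threshold]
    return [(j, c) for c, j in scored[:top_n]]


docs = (
    [("lung abscess", {"pleural effusion: existing"})] * 4
    + [("lung abscess", {"tubercle: irregular"})]
    + [("bronchiectasis", set())] * 2
    + [("lung cancer", {"tubercle: irregular"})] * 3
)
candidates = [
    ("lung abscess", {"tubercle: irregular"}, {"pleural effusion: existing"}),
    ("bronchiectasis", {"tubercle: irregular"}, set()),
]
above_threshold = present(candidates, docs, eighth_threshold=0.3)
ranked = present(candidates, docs)
```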
In one embodiment, the approximate judgements with distinctive facts can be presented by outputting the change tree together with the approximate judgements with distinctive facts.
In one embodiment, the fact difference between the facts of the first judgement item and the facts of the approximate judgement can be presented, where this fact difference causes the judgement item to change from the first judgement item to the approximate judgement. In this way, the user can clearly see which fact differences cause the judgement item to change from the first judgement item to another, and can pay more attention to a document that contains such a difference. For example, because "pleural effusion: existing" is the essential difference between "lung cancer" and "lung abscess", a doctor should pay more attention to the fact item "pleural effusion: existing" if it is present. The doctor can re-examine this fact item in order to give an accurate diagnosis. This is the doctor's actual purpose in searching the documents.
In one embodiment, for each approximate judgement with distinctive facts, the sentences in the input document that correspond to the original-judgement distinctive facts can be indicated, and the new-judgement distinctive facts can also be indicated in the input document. In this way, the important parts of the document are highlighted, which makes the document easier for the user to read.
In addition, some documents may contain multiple judgement items; for example, a patient may have two diseases at the same time. In this case, the relationship between each judgement item and the facts should be detected, and the judgement items and fact items can be classified to obtain a series of judgement items, each with its own fact items. The input document can then be treated as a combination of two documents, and approximate judgements with distinctive facts can be extracted by the above method for each distinct judgement item and its fact items.
With the above method, valuable information that matches the user's actual search purpose can be provided.
Furthermore, the search results can be organized so as to save the user time when reading the documents.
Fig. 16 shows a flowchart of a method for similar-document search according to an embodiment of the present invention.
As shown in Fig. 16, in step 3100, an input document is obtained. Next, in step 3200, at least one approximate judgement with distinctive facts is determined for the input document based on the above method of the present invention. Next, in step 3300, the at least one approximate judgement with distinctive facts is used to obtain a group of documents similar to the input document.
In one embodiment, the document is a radiography report that includes finding items and a diagnosis item; the finding items are selected as the first fact items, and the diagnosis item is selected as the first judgement item.
In one embodiment, the document is a travel brochure that includes items of interest to the user and a travel destination item; the items of interest to the user are selected as the first fact items, and the travel destination item is selected as the first judgement item.
In one embodiment, the document is a product introduction that includes product parameter items and a product type item; the product parameter items are selected as the first fact items, and the product type item is selected as the first judgement item.
Fig. 17 shows a functional block diagram of a device 4000 for determining approximate judgements with distinctive facts according to an embodiment of the present invention. The device 4000 shown in Fig. 17 can implement the method shown in Fig. 4 for determining approximate judgements with distinctive facts. The functional blocks of the device 4000 (the various units included in the device 4000, whether or not they are shown in the figure) can be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the present invention. Those skilled in the art will understand that the functional blocks described in Fig. 17 can be combined or divided into sub-blocks to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional blocks described herein.
As shown in Fig. 17, according to one aspect of the present invention, the device 4000 for determining approximate judgements with distinctive facts may include: a document obtaining unit 4100, a judgement item and fact item extraction unit 4200, a document analysis unit 4300, a similar document analysis unit 4400, and a detection unit 4500 for approximate judgements with distinctive facts. The document obtaining unit 4100 is configured to obtain a document, where the obtained document contains a first judgement item, and the first judgement item is a keyword of a predefined type. The judgement item and fact item extraction unit 4200 is configured to extract judgement items and fact items. The document analysis unit 4300 is configured to use the judgement item and fact item extraction unit 4200 to extract the first judgement item and first fact items from the obtained document, where each first fact item is information associated with the first judgement item. The similar document analysis unit 4400 is configured to obtain a first group of similar documents using the first judgement item and the first fact items, and to use the judgement item and fact item extraction unit 4200 to extract, from the first group of similar documents, second judgement items that differ from the first judgement item, together with second fact items. The detection unit 4500 is configured to detect at least one approximate judgement with distinctive facts by using the first group of similar documents, the second judgement items, and the second fact items. The distinctive facts indicate the difference between the first judgement item and a second judgement item. The approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, where the change distance indicates the level of difficulty of distinguishing the first judgement item from the second judgement item.
In one embodiment, the judgement item and fact item extraction unit 4200 may further include: a keyword extraction unit for extracting keywords from a document; a judgement item identification unit for identifying the judgement item from the extracted keywords; and a fact item selection unit for selecting the remaining keywords as fact items.
In one embodiment, the judgement item identification unit may further include at least one of the following units: a unit for selecting a keyword from a judgement item entry field as the judgement item; a unit for selecting a keyword as the judgement item according to a predetermined configuration; and a unit for allowing the user to select a keyword as the judgement item.
In one embodiment, the unit for selecting a keyword as the judgement item according to a predetermined configuration may further include: a unit for selecting a keyword in a sentence, where the sentence expresses a subjective judgement and/or an objective result.
In one embodiment, the detection unit 4500 for approximate judgements with distinctive facts may further include: an original-judgement distinctive fact extraction unit for extracting original-judgement distinctive facts for each second judgement item; a new-judgement distinctive fact extraction unit for extracting new-judgement distinctive facts for each second judgement item; a change distance calculation unit for calculating the change distance between each second judgement item and the first judgement item using the extracted original-judgement distinctive facts and new-judgement distinctive facts; and a first approximate judgement generation unit for generating approximate judgements with distinctive facts from the second judgement items whose change distance is less than a predetermined first threshold.
In one embodiment, the original-judgement distinguishing fact extraction unit may further include: a target fact item selection unit, for selecting one of the first fact items as a target fact item; a sensitivity calculation unit, for calculating the sensitivity of the target fact item, which includes: a second group similar document obtaining unit, for obtaining a second group of similar documents by using the first fact items with the target fact item deleted; a third judgement item and third fact item extraction unit, for extracting from the second group of similar documents third judgement items different from the first judgement item, together with third fact items; and a sensitivity calculation subunit, for calculating the sensitivity by using the distribution of the third judgement items in the second group of similar documents and the distribution of the second judgement items in the first group of similar documents; and an original-judgement distinguishing fact selection unit, for selecting the target fact item as an original-judgement distinguishing fact if the calculated sensitivity is equal to or greater than a predetermined second threshold.
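The sensitivity calculation described above can be sketched as follows. The text does not state how the two distributions are compared, so this sketch assumes the total variation distance between them; the function names and the plain list inputs are likewise illustrative assumptions rather than part of the described device.

```python
from collections import Counter


def distribution(judgement_items):
    """Turn a list of judgement items into relative frequencies."""
    counts = Counter(judgement_items)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: n / total for item, n in counts.items()}


def sensitivity(second_items, third_items):
    """Assumed measure: total variation distance between the distribution of
    second judgement items (first group) and third judgement items (second
    group, retrieved after deleting the target fact item)."""
    p = distribution(second_items)
    q = distribution(third_items)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def is_original_distinguishing_fact(second_items, third_items, second_threshold):
    """The target fact item is kept when its sensitivity reaches the threshold."""
    return sensitivity(second_items, third_items) >= second_threshold
```

Under this assumption, a target fact item whose removal barely changes which judgement items are retrieved gets a sensitivity near 0 and is not selected.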
In one embodiment, the new-judgement distinguishing fact extraction unit may further include: a correlation calculation unit, for calculating the correlation of each third fact item by using the appearance ratio of the third fact item and the corresponding third judgement item in the second group of similar documents; and a new-judgement distinguishing fact selection unit, for selecting a third fact item as a new-judgement distinguishing fact if the correlation of that third fact item is equal to or greater than a predetermined third threshold.
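A minimal sketch of the correlation step follows. The text only says that an "appearance ratio" is used, so the exact ratio here is an assumption: the share of second-group documents carrying a given third judgement item that also contain the third fact item. The document layout (a dict with `judgement` and `facts` keys) and the function names are hypothetical.

```python
def correlation(fact_item, judgement_item, documents):
    """Assumed ratio: among documents with `judgement_item`, the fraction
    that also contain `fact_item`."""
    with_judgement = [d for d in documents if d["judgement"] == judgement_item]
    if not with_judgement:
        return 0.0
    hits = sum(1 for d in with_judgement if fact_item in d["facts"])
    return hits / len(with_judgement)


def new_distinguishing_facts(fact_items, judgement_item, documents, third_threshold):
    """Keep the third fact items whose correlation reaches the third threshold."""
    return [f for f in fact_items
            if correlation(f, judgement_item, documents) >= third_threshold]
```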
In one embodiment, the approximate judgement with distinguishing facts detection unit 4500 may further include: a fact distance calculation unit, for calculating the fact distance of each document in the first group of similar documents by calculating the distance between that document and the obtained document, wherein the distance between each document in the first group of similar documents and the obtained document is calculated by counting the fact items that differ between the two documents; a judgement item distance calculation unit, for calculating the judgement item distance of each second judgement item by using the calculated fact distances of the documents in the first group of similar documents to calculate the change distance between each second judgement item and the first judgement item, wherein the change distance between each second judgement item and the first judgement item is calculated by averaging the fact distances of the documents in the first group of similar documents; a second judgement item selection unit, for selecting a second judgement item as an approximate judgement if its judgement item distance is equal to or less than a predetermined fourth threshold; and a distinguishing fact extraction unit, for extracting the distinguishing facts for the approximate judgement by identifying the fact items that differ between the first fact items and the facts of the approximate judgement.
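The fact distance and judgement item distance described above can be sketched as follows. The sketch assumes that the "count of differing fact items" is the size of the symmetric difference of the two fact sets, and that the average is taken over the similar documents sharing the same second judgement item. The data layout and function names are illustrative assumptions.

```python
from collections import defaultdict


def fact_distance(facts_a, facts_b):
    """Number of fact items present in exactly one of the two documents."""
    return len(set(facts_a) ^ set(facts_b))


def judgement_item_distances(input_facts, similar_docs):
    """Average fact distance per second judgement item."""
    grouped = defaultdict(list)
    for doc in similar_docs:
        grouped[doc["judgement"]].append(fact_distance(input_facts, doc["facts"]))
    return {j: sum(ds) / len(ds) for j, ds in grouped.items()}


def approximate_judgements(input_facts, similar_docs, fourth_threshold):
    """Second judgement items whose distance is at most the fourth threshold."""
    distances = judgement_item_distances(input_facts, similar_docs)
    return {j: d for j, d in distances.items() if d <= fourth_threshold}
```

With this reading, a judgement item whose supporting documents differ from the input in only a few fact items is reported as an approximate judgement, and those differing fact items are its distinguishing facts.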
In one embodiment, the approximate judgement with distinguishing facts detection unit 4500 may further include: a candidate approximate judgement generation unit, for generating a candidate approximate judgement with distinguishing facts for each document in the first group of similar documents by identifying the fact items that differ between the first fact items and the facts of that document; and an approximate judgement extraction unit, for extracting the approximate judgement with distinguishing facts using the candidate approximate judgements with distinguishing facts, which includes: a transition graph generation unit for generating a transition graph, wherein each end node in the transition graph is a judgement item and each non-end node in the transition graph is a fact item; a candidate approximate judgement arrangement unit, for arranging all candidate approximate judgements with distinguishing facts in the transition graph, wherein each path connecting two end nodes in the transition graph represents one candidate approximate judgement with distinguishing facts; an importance calculation unit, for calculating the importance of each edge in the transition graph by recording the connection frequency of each edge that connects any two nodes in the transition graph; an important edge identification unit, for identifying important edges whose importance is equal to or greater than a predetermined fifth threshold; a distinguishing path generation unit, for generating at least one distinguishing path, wherein the distinguishing path consists of important edges and connects a second judgement item to the first judgement item; and a translation unit, for translating each distinguishing path into an approximate judgement with distinguishing facts.
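The transition graph processing can be sketched as follows. This is a simplified reading under stated assumptions: each candidate is given as a node list that starts at the first judgement item, passes through the differing fact items in some fixed order, and ends at a second judgement item; the importance of an edge is the number of candidate paths that use it; and a distinguishing path is a candidate path made up only of important edges. The node labels and function names are hypothetical.

```python
from collections import Counter


def edge_importance(paths):
    """Count how often each directed edge is used across all candidate paths."""
    counts = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[(u, v)] += 1
    return counts


def distinguishing_paths(paths, fifth_threshold):
    """Keep each distinct candidate path whose edges all reach the threshold."""
    importance = edge_importance(paths)
    kept = []
    for path in paths:
        edges = zip(path, path[1:])
        if all(importance[e] >= fifth_threshold for e in edges) and path not in kept:
            kept.append(path)
    return kept
```

A fuller implementation could also assemble new paths from important edges that no single candidate contains; this sketch does not attempt that.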
In one embodiment, the approximate judgement with distinguishing facts detection unit 4500 may further include: a candidate distinguishing fact generation unit for generating candidate distinguishing facts, which includes: a candidate original-judgement distinguishing fact designation unit, for designating candidate original-judgement distinguishing facts using first fact items that differ from the second fact items; and a candidate new-judgement distinguishing fact designation unit, for designating candidate new-judgement distinguishing facts using second fact items that differ from the first fact items, wherein the sum of the number of candidate original-judgement distinguishing facts and the number of candidate new-judgement distinguishing facts equals a predetermined number; a candidate distinguishing fact verification unit, for verifying the candidate distinguishing facts in the first group of similar documents, which includes: a document identification unit, for identifying documents in the first group of similar documents that contain the candidate new-judgement distinguishing facts but do not contain the candidate original-judgement distinguishing facts, and whose judgement item differs from the first judgement item; and a candidate distinguishing fact marking unit, for marking the candidate distinguishing facts as verified if one judgement item of the identified documents is a concentrated judgement item, wherein the proportion of documents corresponding to the concentrated judgement item among all identified documents is equal to or greater than a predetermined sixth threshold; and a second approximate judgement generation unit, for generating the approximate judgement with distinguishing facts, which includes: a verified candidate distinguishing fact selection unit, for selecting the verified candidate distinguishing facts as the distinguishing facts; and a concentrated judgement item selection unit, for selecting the concentrated judgement item as the approximate judgement.
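The verification of candidate distinguishing facts can be sketched as follows. The sketch assumes that a document "contains" the candidate new-judgement facts when it holds all of them and "does not contain" the candidate original-judgement facts when it holds none of them; the data layout and the function name are illustrative assumptions.

```python
from collections import Counter


def verify_candidate(original_facts, new_facts, first_judgement,
                     similar_docs, sixth_threshold):
    """Return the concentrated judgement item if the candidate is verified,
    otherwise None."""
    identified = [
        d for d in similar_docs
        if set(new_facts) <= set(d["facts"])                # has every new fact
        and not (set(original_facts) & set(d["facts"]))     # has no original fact
        and d["judgement"] != first_judgement
    ]
    if not identified:
        return None
    # The most frequent judgement item is the only one that can be concentrated.
    item, count = Counter(d["judgement"] for d in identified).most_common(1)[0]
    return item if count / len(identified) >= sixth_threshold else None
```

When a value is returned, the candidate facts would be kept as the distinguishing facts and the returned item used as the approximate judgement.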
In one embodiment, the approximate judgement with distinguishing facts detection unit 4500 may further include: a change tree obtaining unit, for obtaining a change tree for the obtained document, wherein the change tree is structured data representing a set of knowledge information specifically related to the obtained document, each non-end node of the change tree is a fact item, and each end node is a judgement item; and a third approximate judgement generation unit, for generating the approximate judgement with distinguishing facts by selecting a path that links two end nodes in the obtained change tree.
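Selecting a path that links two end nodes of a change tree can be sketched as follows. The tree representation (a child-to-parent mapping) and all node labels are assumptions for illustration; the text does not prescribe a data structure.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, following parent links."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def path_between_leaves(leaf_a, leaf_b, parent):
    """Path from one judgement item (leaf) to another through the fact items
    on the way up to, and down from, their lowest common ancestor."""
    up_a = path_to_root(leaf_a, parent)
    up_b = path_to_root(leaf_b, parent)
    ancestors_a = set(up_a)
    # First node on b's upward path that is also on a's upward path.
    meet = next(n for n in up_b if n in ancestors_a)
    return up_a[:up_a.index(meet) + 1] + list(reversed(up_b[:up_b.index(meet)]))


# Hypothetical change tree: leaves are judgement items, inner nodes are fact items.
parent = {
    "judgement A": "fact 1",
    "judgement B": "fact 2",
    "fact 1": "fact 0",
    "fact 2": "fact 0",
}
example_path = path_between_leaves("judgement A", "judgement B", parent)
```

The fact items on the returned path play the role of the distinguishing facts between the two judgement items.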
In one embodiment, the approximate judgement with distinguishing facts detection unit 4500 may further include: a similar distinguishing fact detection unit, for detecting similar distinguishing facts; a similar distinguishing fact merging unit, for merging the similar distinguishing facts; and an approximate judgement adjustment unit, for adjusting the approximate judgement with distinguishing facts using the merged distinguishing facts.
In one embodiment, the device 4000 for determining approximate judgements with distinguishing facts may further include a first approximate judgement presentation unit, for presenting the approximate judgements with distinguishing facts by outputting a list of all approximate judgements with distinguishing facts.
In one embodiment, the device 4000 for determining approximate judgements with distinguishing facts may further include a second approximate judgement presentation unit, for presenting the approximate judgements with distinguishing facts by either of the following operations: outputting the approximate judgements with distinguishing facts whose change distance is less than a predetermined seventh threshold, or outputting a predetermined number of approximate judgements with distinguishing facts that have the smallest change distances.
In one embodiment, the device 4000 for determining approximate judgements with distinguishing facts may further include a third approximate judgement presentation unit for presenting the approximate judgements with distinguishing facts, which further includes: a coverage rate calculation unit, for calculating the coverage rate of each approximate judgement with distinguishing facts, wherein the coverage rate is the proportion of documents in the first group of similar documents that match the approximate judgement with distinguishing facts; and an approximate judgement output unit, for outputting the approximate judgements with distinguishing facts whose coverage rate is equal to or greater than a predetermined eighth threshold, or outputting a predetermined number of approximate judgements with distinguishing facts that have the highest coverage rates.
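The coverage rate and the two output options can be sketched as follows. What counts as a "matching" document is not spelled out in the text; this sketch assumes a document matches when it carries the approximate judgement, contains all new-judgement distinguishing facts, and contains none of the original-judgement distinguishing facts. The field and function names are hypothetical.

```python
def matches(doc, approx):
    """Assumed matching rule between a similar document and an approximate judgement."""
    facts = set(doc["facts"])
    return (doc["judgement"] == approx["judgement"]
            and set(approx["new_facts"]) <= facts
            and not (set(approx["original_facts"]) & facts))


def coverage_rate(approx, similar_docs):
    """Share of first-group documents that match the approximate judgement."""
    if not similar_docs:
        return 0.0
    return sum(matches(d, approx) for d in similar_docs) / len(similar_docs)


def select_by_coverage(approxes, similar_docs, eighth_threshold=None, top_n=None):
    """Output either everything at or above the threshold, or the top_n by coverage."""
    scored = [(coverage_rate(a, similar_docs), a) for a in approxes]
    if eighth_threshold is not None:
        return [a for rate, a in scored if rate >= eighth_threshold]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [a for _, a in ranked[:top_n]]
```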
In one embodiment, the device 4000 for determining approximate judgements with distinguishing facts may further include a fourth approximate judgement presentation unit, for presenting the approximate judgement with distinguishing facts by outputting the change tree together with the approximate judgement with distinguishing facts.
In one embodiment, the device 4000 for determining approximate judgements with distinguishing facts may further include a fact difference presentation unit, for presenting the fact difference between the facts of the first judgement item and the facts of the approximate judgement, wherein the fact difference causes the change from the first judgement item to the approximate judgement.
In one embodiment, the device 4000 for determining approximate judgements with distinguishing facts may further include an indication unit, for indicating, for each approximate judgement with distinguishing facts, the sentences in the obtained document that correspond to the original-judgement distinguishing facts, and for indicating the new-judgement distinguishing facts in the obtained document.
Figure 18 shows a functional block diagram of a device 5000 for similar document search according to an embodiment of the present invention. The device 5000 shown in Figure 18 can implement the method for similar document search shown in Figure 16. All functional blocks of the device 5000 (the various units included in the device 5000, whether or not they are shown in the figure) can be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the present invention. Those skilled in the art will understand that the functional blocks described in Figure 18 can be combined or divided into sub-blocks to implement the principles of the invention described above. The description herein therefore supports any possible combination, division, or further definition of the functional blocks described herein.
As shown in Figure 18, according to one aspect of the present invention, the device 5000 for similar document search may include: an input document creation unit 5100, the device 4000 for determining approximate judgements with distinguishing facts, and a similar document obtaining unit 5200. The input document creation unit 5100 is configured to receive an input document. The device 4000 for determining approximate judgements with distinguishing facts is configured to determine at least one approximate judgement with distinguishing facts for the input document. The similar document obtaining unit 5200 is configured to use the at least one approximate judgement with distinguishing facts to obtain a group of similar documents for the input document.
In one embodiment, the document is a radiography report that includes finding items and diagnosis items; the finding items are selected as the first fact items, and the diagnosis items are selected as the first judgement items.
In one embodiment, the document is a travel guide that includes items of interest to the user and travel destination items; the items of interest to the user are selected as the first fact items, and the travel destination items are selected as the first judgement items.
In one embodiment, the document is a product introduction that includes product parameter items and product type items; the product parameter items are selected as the first fact items, and the product type items are selected as the first judgement items.
In addition, according to another aspect of the present invention, a device for determining approximate judgements with distinguishing facts can be provided. The device can be implemented in the computer system 1000 shown in Fig. 3. The device may include a processor and a memory storing instructions that, when executed by the processor, cause the processor to perform the following operations: obtaining a document, wherein the obtained document contains a first judgement item and the first judgement item is a keyword of a predetermined type; extracting the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item; obtaining a first group of similar documents using the first judgement item and the first fact items, and extracting from the first group of similar documents second judgement items different from the first judgement item, together with second fact items; and detecting at least one approximate judgement with distinguishing facts by using the first group of similar documents, the second judgement items, and the second fact items, wherein the distinguishing facts indicate the difference between the first judgement item and a second judgement item; the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance indicates how difficult it is to distinguish the first judgement item from the second judgement item.
In one embodiment, extracting the fact items and the judgement item from the document may further include: extracting keywords from the document; and identifying the judgement item from the extracted keywords and selecting the remaining keywords as the fact items.
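The split of extracted keywords into judgement items and fact items can be sketched as follows. Keyword extraction itself is outside this sketch; `judgement_terms` is a hypothetical stand-in for whatever predetermined configuration, entry field, or user selection identifies judgement items.

```python
def split_items(keywords, judgement_terms):
    """Keywords recognised as judgement items are separated out;
    every remaining keyword is treated as a fact item."""
    judgement_items = [k for k in keywords if k in judgement_terms]
    fact_items = [k for k in keywords if k not in judgement_terms]
    return judgement_items, fact_items


# Illustrative keywords only; not taken from the source document.
keywords = ["finding 1", "finding 2", "diagnosis X"]
judgement_items, fact_items = split_items(keywords, {"diagnosis X", "diagnosis Y"})
```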
In one embodiment, identifying the judgement item may further include at least one of the following: selecting a keyword from a judgement item entry field as the judgement item; selecting a keyword as the judgement item according to a predetermined configuration; and selecting a keyword as the judgement item by the user.
In one embodiment, selecting a keyword as the judgement item according to a predetermined configuration may further include: selecting a keyword in a sentence, wherein the sentence expresses a subjective judgement and/or an objective result.
In one embodiment, detecting at least one approximate judgement with distinguishing facts may include: extracting original-judgement distinguishing facts for each second judgement item; extracting new-judgement distinguishing facts for each second judgement item; calculating the change distance between each second judgement item and the first judgement item using the extracted original-judgement distinguishing facts and new-judgement distinguishing facts; and generating the approximate judgement with distinguishing facts using the second judgement items whose change distance is less than a predetermined first threshold.
In one embodiment, extracting the original-judgement distinguishing facts includes: selecting one of the first fact items as a target fact item; calculating the sensitivity of the target fact item, which includes: obtaining a second group of similar documents by using the first fact items with the target fact item deleted; extracting from the second group of similar documents third judgement items different from the first judgement item, together with third fact items; and calculating the sensitivity by using the distribution of the third judgement items in the second group of similar documents and the distribution of the second judgement items in the first group of similar documents; and, if the calculated sensitivity is equal to or greater than a predetermined second threshold, selecting the target fact item as an original-judgement distinguishing fact.
In one embodiment, extracting the new-judgement distinguishing facts includes: calculating the correlation of each third fact item by using the appearance ratio of the third fact item and the corresponding third judgement item in the second group of similar documents; and, if the correlation of a third fact item is equal to or greater than a predetermined third threshold, selecting that third fact item as a new-judgement distinguishing fact.
In one embodiment, detecting at least one approximate judgement with distinguishing facts may include: calculating the fact distance of each document in the first group of similar documents by calculating the distance between that document and the obtained document, wherein the distance between each document in the first group of similar documents and the obtained document is calculated by counting the fact items that differ between the two documents; calculating the judgement item distance of each second judgement item by using the calculated fact distances of the documents in the first group of similar documents to calculate the change distance between each second judgement item and the first judgement item, wherein the change distance between each second judgement item and the first judgement item is calculated by averaging the fact distances of the documents in the first group of similar documents; if the judgement item distance of a second judgement item is equal to or less than a predetermined fourth threshold, selecting that second judgement item as an approximate judgement; and extracting the distinguishing facts for the approximate judgement by identifying the fact items that differ between the first fact items and the facts of the approximate judgement.
In one embodiment, detecting at least one approximate judgement with distinguishing facts may include: generating a candidate approximate judgement with distinguishing facts for each document in the first group of similar documents by identifying the fact items that differ between the first fact items and the facts of that document; and extracting the approximate judgement with distinguishing facts using the candidate approximate judgements with distinguishing facts, which includes: generating a transition graph, wherein each end node in the transition graph is a judgement item and each non-end node in the transition graph is a fact item; arranging all candidate approximate judgements with distinguishing facts in the transition graph, wherein each path connecting two end nodes in the transition graph represents one candidate approximate judgement with distinguishing facts; calculating the importance of each edge in the transition graph by recording the connection frequency of each edge that connects any two nodes in the transition graph; identifying important edges whose importance is equal to or greater than a predetermined fifth threshold; generating at least one distinguishing path, wherein the distinguishing path consists of important edges and connects a second judgement item to the first judgement item; and translating each distinguishing path into an approximate judgement with distinguishing facts.
In one embodiment, detecting at least one approximate judgement with distinguishing facts may include: generating candidate distinguishing facts, which includes: designating candidate original-judgement distinguishing facts using first fact items that differ from the second fact items; and designating candidate new-judgement distinguishing facts using second fact items that differ from the first fact items, wherein the sum of the number of candidate original-judgement distinguishing facts and the number of candidate new-judgement distinguishing facts equals a predetermined number; verifying the candidate distinguishing facts in the first group of similar documents, which includes: identifying documents in the first group of similar documents that contain the candidate new-judgement distinguishing facts but do not contain the candidate original-judgement distinguishing facts, and whose judgement item differs from the first judgement item; and, if one judgement item of the identified documents is a concentrated judgement item, marking the candidate distinguishing facts as verified, wherein the proportion of documents corresponding to the concentrated judgement item among all identified documents is equal to or greater than a predetermined sixth threshold; and generating the approximate judgement with distinguishing facts, which includes: selecting the verified candidate distinguishing facts as the distinguishing facts; and selecting the concentrated judgement item as the approximate judgement.
In one embodiment, detecting at least one approximate judgement with distinguishing facts may include: obtaining a change tree for the obtained document, wherein the change tree is structured data representing a set of knowledge information specifically related to the obtained document, each non-end node is a fact item, and each end node is a judgement item; and generating the approximate judgement with distinguishing facts by selecting a path that links two end nodes in the obtained change tree.
In one embodiment, detecting at least one approximate judgement with distinguishing facts may further include: detecting similar distinguishing facts; merging the similar distinguishing facts; and adjusting the approximate judgement with distinguishing facts using the merged distinguishing facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgements with distinguishing facts by outputting a list of all approximate judgements with distinguishing facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgements with distinguishing facts by outputting the approximate judgements with distinguishing facts whose change distance is less than a predetermined seventh threshold, or by outputting a predetermined number of approximate judgements with distinguishing facts that have the smallest change distances.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operations: calculating the coverage rate of each approximate judgement with distinguishing facts, wherein the coverage rate is the proportion of documents in the first group of similar documents that match the approximate judgement with distinguishing facts; and presenting the approximate judgements with distinguishing facts by outputting those whose coverage rate is equal to or greater than a predetermined eighth threshold, or by outputting a predetermined number of approximate judgements with distinguishing facts that have the highest coverage rates.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the approximate judgement with distinguishing facts by outputting the change tree together with the approximate judgement with distinguishing facts.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operation: presenting the fact difference between the facts of the first judgement item and the facts of the approximate judgement, wherein the fact difference causes the change from the first judgement item to the approximate judgement.
In one embodiment, the memory further stores instructions that, when executed by the processor, cause the processor to perform the following operations: for each approximate judgement with distinguishing facts, indicating the sentences in the obtained document that correspond to the original-judgement distinguishing facts, and indicating the new-judgement distinguishing facts in the obtained document.
In addition, according to another aspect of the present invention, a device for similar document search can be provided. The device may include a processor and a memory storing instructions that, when executed by the processor, cause the processor to perform the following operations: receiving an input document; determining at least one approximate judgement with distinguishing facts for the input document based on the above method; and obtaining a group of similar documents for the input document using the at least one approximate judgement with distinguishing facts.
In one embodiment, the document is a radiography report that includes finding items and diagnosis items; the finding items are selected as the first fact items, and the diagnosis items are selected as the first judgement items.
In one embodiment, the document is a travel guide that includes items of interest to the user and travel destination items; the items of interest to the user are selected as the first fact items, and the travel destination items are selected as the first judgement items.
In one embodiment, the document is a product introduction that includes product parameter items and product type items; the product parameter items are selected as the first fact items, and the product type items are selected as the first judgement items.
Note that those skilled in the art will clearly understand that the embodiments in this application can be combined arbitrarily.
The method and system of the present invention may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The order of the method steps given above is for illustration only; the steps of the method of the present invention are not limited to the order specifically described above unless otherwise stated. In addition, in some embodiments the present invention may also be embodied as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present invention. The present invention therefore also covers a recording medium storing a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present invention. Those skilled in the art should understand that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.
Claims (44)
1. A method for determining approximate judgements with distinguishing facts, comprising:
a) a document obtaining step, for obtaining a document, wherein the obtained document contains a first judgement item, and the first judgement item is a keyword of a predetermined type;
b) a document analysis step, for extracting the first judgement item and first fact items from the obtained document, wherein each first fact item is information associated with the first judgement item;
c) a similar document analysis step, for obtaining a first group of similar documents using the first judgement item and the first fact items, and for extracting from the first group of similar documents second judgement items different from the first judgement item, together with second fact items;
d) an approximate judgement with distinguishing facts detection step, for detecting at least one approximate judgement with distinguishing facts by using the first group of similar documents, the second judgement items, and the second fact items, wherein:
the at least one approximate judgement with distinguishing facts consists of distinguishing facts and an approximate judgement;
the distinguishing facts indicate the difference between the first judgement item and a second judgement item; and
the approximate judgement is one of the second judgement items, and the change distance between the approximate judgement and the first judgement item is less than a predetermined first threshold, wherein the change distance is determined according to the distinguishing facts between the first judgement item and the second judgement item.
2. The method of claim 1, wherein extracting the fact items and the judgement item from the document further comprises:
extracting keywords from the document; and
identifying the judgement item from the extracted keywords, and selecting the remaining keywords as the fact items.
3. The method of claim 2, wherein identifying the judgement item further comprises at least one of the following:
selecting a keyword from a judgement item entry field as the judgement item;
selecting a keyword as the judgement item according to a predetermined configuration; and
selecting a keyword as the judgement item by the user.
4. The method of claim 3, wherein selecting a keyword as the judgement item according to a predetermined configuration further comprises:
selecting a keyword in a sentence, wherein the sentence expresses a subjective judgement and/or an objective result.
5. The method of claim 1, wherein the approximate judgement with distinguishing facts detection step comprises:
1) extracting original-judgement distinguishing facts for each second judgement item;
2) extracting new-judgement distinguishing facts for each second judgement item;
3) calculating the change distance between each second judgement item and the first judgement item using the extracted original-judgement distinguishing facts and new-judgement distinguishing facts; and
4) generating the approximate judgement with distinguishing facts using the second judgement items whose change distance is less than the predetermined first threshold.
6. The method of claim 5, wherein extracting the original-judgment distinctive facts comprises:
selecting one of the first fact items as a target fact item;
calculating the sensitivity of the target fact item, comprising:
obtaining a second group of similar documents by using the first fact items with the target fact item deleted;
extracting, from the second group of similar documents, third judgment items different from the first judgment item, and third fact items; and
calculating the sensitivity by using the distribution of the third judgment items in the second group of similar
documents and the distribution of the second judgment items in the first group of similar documents; and
if the calculated sensitivity is equal to or greater than a predetermined second threshold, selecting the target
fact item as an original-judgment distinctive fact.
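Claim 6 does not state how the two distributions are compared. The following Python sketch is one possible reading only: it assumes the sensitivity is the total variation distance between the judgment item distribution of the first group and that of the second group. The document representation (a dict with a `"judgment"` key) and the function names are hypothetical.

```python
from collections import Counter


def distribution(docs):
    """Relative frequency of each judgment item in a group of documents."""
    counts = Counter(d["judgment"] for d in docs)
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()} if total else {}


def sensitivity(first_group, second_group):
    """Assumed measure: total variation distance between the distributions.

    A large value means that deleting the target fact item changed which
    judgment items the similar documents carry.
    """
    p = distribution(first_group)
    q = distribution(second_group)
    return 0.5 * sum(abs(p.get(j, 0.0) - q.get(j, 0.0)) for j in set(p) | set(q))


# Hypothetical groups retrieved with and without the target fact item.
first_group = [{"judgment": j} for j in ["tuberculosis", "tuberculosis", "tumor", "tumor"]]
second_group = [{"judgment": j} for j in ["tuberculosis", "tumor", "tumor", "tumor"]]
score = sensitivity(first_group, second_group)
```

The target fact item would then be kept as an original-judgment distinctive fact when `score` reaches the second threshold.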
7. The method of claim 6, wherein extracting the new-judgment distinctive facts comprises:
calculating the relevance of each third fact item by using the occurrence ratio of the third fact item together with
the corresponding third judgment item in the second group of similar documents; and
if the relevance of a third fact item is equal to or greater than a predetermined third threshold, selecting the
third fact item as a new-judgment distinctive fact.
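The relevance test of claim 7 might be sketched in Python as follows. The sketch assumes that the "occurrence ratio" is the share of documents carrying a given third judgment item that also contain the third fact item; this definition, the document representation and the function names are assumptions, not taken from the patent.

```python
def relevance(fact, judgment, documents):
    """Share of documents with `judgment` that also contain `fact`."""
    with_judgment = [d for d in documents if d["judgment"] == judgment]
    if not with_judgment:
        return 0.0
    hits = sum(1 for d in with_judgment if fact in d["facts"])
    return hits / len(with_judgment)


def new_judgment_distinctive_facts(judgment, documents, third_threshold):
    """Fact items whose relevance to `judgment` reaches the threshold."""
    candidates = set().union(
        *(d["facts"] for d in documents if d["judgment"] == judgment)
    )
    return sorted(
        f for f in candidates if relevance(f, judgment, documents) >= third_threshold
    )


# Hypothetical second group of similar documents.
second_group = [
    {"judgment": "tuberculosis", "facts": {"cavity", "nodule"}},
    {"judgment": "tuberculosis", "facts": {"cavity"}},
    {"judgment": "tumor", "facts": {"mass"}},
]
selected = new_judgment_distinctive_facts("tuberculosis", second_group, 0.6)
```

With this data "cavity" appears in every tuberculosis document and is selected, while "nodule" appears in only half of them and is not.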
8. The method of claim 1, wherein the step of detecting the approximate judgment with distinctive facts comprises:
1) calculating the fact distance of each document in the first group of similar documents by calculating the
distance between that document and the obtained document, wherein the distance between each document in the
first group of similar documents and the obtained document is calculated by using the count of differing fact items between the two documents;
2) calculating the judgment item distance of each second judgment item by using the calculated fact distance of
each document in the first group of similar documents to calculate the change distance between each second
judgment item and the first judgment item, wherein the change distance between each second judgment item and the
first judgment item is calculated by averaging the fact distances of the documents in the first group of similar documents;
3) if the judgment item distance of a second judgment item is equal to or less than a predetermined fourth
threshold, selecting the second judgment item as the approximate judgment; and
4) extracting the distinctive facts of the approximate judgment by identifying the fact items that differ between
the first fact items and the facts of the approximate judgment.
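Steps 1) to 3) of claim 8 can be sketched in Python as below. The sketch assumes that the count of differing fact items is the size of the symmetric difference of the two fact sets, and that the average for a second judgment item is taken over the similar documents carrying that judgment item; both readings, like the names used, are assumptions.

```python
from collections import defaultdict


def fact_distance(facts_a, facts_b):
    """Number of fact items present in exactly one of the two documents."""
    return len(set(facts_a) ^ set(facts_b))


def approximate_judgments(input_doc, similar_docs, fourth_threshold):
    """Map each qualifying second judgment item to its judgment item distance."""
    distances = defaultdict(list)
    for doc in similar_docs:
        if doc["judgment"] == input_doc["judgment"]:
            continue  # second judgment items differ from the first one
        distances[doc["judgment"]].append(
            fact_distance(input_doc["facts"], doc["facts"])
        )
    result = {}
    for judgment, values in distances.items():
        mean = sum(values) / len(values)
        if mean <= fourth_threshold:
            result[judgment] = mean
    return result


# Hypothetical input document and first group of similar documents.
input_doc = {"judgment": "pneumonia", "facts": {"nodule", "fever"}}
first_group = [
    {"judgment": "tuberculosis", "facts": {"nodule", "cavity"}},
    {"judgment": "tuberculosis", "facts": {"nodule", "fever", "cavity"}},
    {"judgment": "tumor", "facts": {"mass", "weight loss"}},
]
approx = approximate_judgments(input_doc, first_group, fourth_threshold=2)
```

In this example "tuberculosis" has an average fact distance of 1.5 and is kept, while "tumor" has a distance of 4 and is dropped.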
9. The method of claim 1, wherein the step of detecting the approximate judgment with distinctive facts comprises:
1) generating a candidate approximate judgment with distinctive facts for each document by identifying the fact
items that differ between the first fact items and the facts of each document in the first group of similar documents;
2) extracting the approximate judgment with distinctive facts by using the candidate approximate judgments with
distinctive facts, comprising:
generating a transition graph, wherein each terminal node in the transition graph is a judgment item and each
non-terminal node in the transition graph is a fact item;
arranging all candidate approximate judgments with distinctive facts in the transition graph, wherein each path
connecting two terminal nodes in the transition graph represents one candidate approximate judgment with distinctive facts;
calculating the importance of each edge in the transition graph by recording the connection frequency of each edge
connecting any two nodes in the transition graph;
identifying important edges whose importance is equal to or greater than a predetermined fifth threshold;
generating at least one distinctive path, wherein the distinctive path consists of important edges and connects a
second judgment item to the first judgment item; and
translating each distinctive path into an approximate judgment with distinctive facts.
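The graph-based extraction of claim 9 might look like the following Python sketch. It assumes that each candidate is stored as a node path from the first judgment item through its differing fact items to a second judgment item, that edge importance is the raw count of candidates using the edge, and that a "-" or "+" prefix marks a removed or added fact; all of these conventions and names are hypothetical.

```python
from collections import Counter, defaultdict


def edge_importance(candidate_paths):
    """Count how many candidate paths use each directed edge."""
    counts = Counter()
    for path in candidate_paths:
        counts.update(zip(path, path[1:]))
    return counts


def distinctive_paths(candidate_paths, first_judgment, judgments, fifth_threshold):
    """Paths from the first judgment item to another judgment item that
    use only edges whose importance reaches the threshold."""
    graph = defaultdict(list)
    for (src, dst), n in edge_importance(candidate_paths).items():
        if n >= fifth_threshold:
            graph[src].append(dst)
    paths, stack = [], [[first_judgment]]
    while stack:
        path = stack.pop()
        for nxt in graph[path[-1]]:
            if nxt in path:
                continue  # avoid cycles
            if nxt in judgments:
                paths.append(path + [nxt])  # reached a terminal node
            else:
                stack.append(path + [nxt])
    return sorted(paths)


# Hypothetical candidates derived from three similar documents.
candidates = [
    ["pneumonia", "-fever", "+cavity", "tuberculosis"],
    ["pneumonia", "-fever", "+cavity", "tuberculosis"],
    ["pneumonia", "+mass", "tumor"],
]
found = distinctive_paths(
    candidates, "pneumonia", {"pneumonia", "tuberculosis", "tumor"}, fifth_threshold=2
)
```

Only the path towards "tuberculosis" survives, because each of its edges is used by two candidates, whereas the edges towards "tumor" are used once.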
10. The method of claim 1, wherein the step of detecting the approximate judgment with distinctive facts comprises:
1) generating candidate distinctive facts, comprising:
specifying candidate original-judgment distinctive facts by using first fact items that differ from the second fact items;
specifying candidate new-judgment distinctive facts by using second fact items that differ from the first fact items,
wherein the sum of the number of candidate original-judgment distinctive facts and the number of candidate
new-judgment distinctive facts is equal to a predetermined number;
2) verifying the candidate distinctive facts in the first group of similar documents, comprising:
identifying documents in the first group of similar documents that contain the candidate new-judgment distinctive
facts but do not contain the candidate original-judgment distinctive facts, wherein the judgment items of the
identified documents are different from the first judgment item; and
if one judgment item of the identified documents is a concentrated judgment item, marking the candidate distinctive
facts as verified, wherein the ratio of the documents corresponding to the concentrated judgment item among all
identified documents is equal to or greater than a predetermined sixth threshold; and
3) generating the approximate judgment with distinctive facts, comprising:
selecting the verified candidate distinctive facts as the distinctive facts; and
selecting the concentrated judgment item as the approximate judgment.
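The verification step of claim 10 can be sketched in Python as follows. The sketch assumes that a document "contains" the candidate new-judgment distinctive facts when it holds all of them and none of the candidate original-judgment distinctive facts, and that the concentrated judgment item is the most frequent judgment item among the identified documents; the names are hypothetical.

```python
from collections import Counter


def verify_candidate(original_facts, new_facts, first_judgment,
                     similar_docs, sixth_threshold):
    """Return the concentrated judgment item if the candidate is verified,
    otherwise None."""
    identified = [
        d for d in similar_docs
        if new_facts <= d["facts"]                # holds every new fact
        and not (original_facts & d["facts"])     # holds no original fact
        and d["judgment"] != first_judgment
    ]
    if not identified:
        return None
    judgment, count = Counter(d["judgment"] for d in identified).most_common(1)[0]
    if count / len(identified) >= sixth_threshold:
        return judgment
    return None


# Hypothetical first group of similar documents.
first_group = [
    {"judgment": "tuberculosis", "facts": {"nodule", "cavity"}},
    {"judgment": "tuberculosis", "facts": {"cavity"}},
    {"judgment": "tumor", "facts": {"cavity", "mass"}},
    {"judgment": "pneumonia", "facts": {"fever", "nodule"}},
]
verified = verify_candidate({"fever"}, {"cavity"}, "pneumonia", first_group, 0.6)
```

Three documents are identified and two of them carry "tuberculosis", so the candidate is verified at a threshold of 0.6 but would be rejected at 0.8.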
11. The method of claim 1, wherein the step of detecting the approximate judgment with distinctive facts comprises:
1) obtaining a change tree for the obtained document, wherein the change tree is structured data specific to a set
of knowledge information related to the obtained document, each non-terminal node is a fact item, and each
terminal node is a judgment item; and
2) generating the approximate judgment with distinctive facts by selecting a path linking two terminal nodes in the
obtained change tree.
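The claim leaves the storage format of the change tree open. As a rough Python sketch, assume the tree is kept as a child-to-parent mapping; the path linking two terminal nodes then runs from one leaf up to their lowest common ancestor and down to the other leaf. The mapping and the function name are hypothetical.

```python
def path_between(parent, leaf_a, leaf_b):
    """Path from leaf_a to leaf_b through their lowest common ancestor."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up_a = ancestors(leaf_a)
    up_b = ancestors(leaf_b)
    common = next(n for n in up_a if n in up_b)
    # Climb from leaf_a to the common ancestor, then descend to leaf_b.
    return up_a[: up_a.index(common) + 1] + up_b[: up_b.index(common)][::-1]


# Hypothetical change tree: fact items are inner nodes, judgment items are leaves.
parent = {
    "pneumonia": "fever",
    "tuberculosis": "cavity",
    "fever": "nodule",
    "cavity": "nodule",
}
path = path_between(parent, "pneumonia", "tuberculosis")
```

The fact items on the returned path would serve as the distinctive facts, and the leaf at the far end as the approximate judgment.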
12. The method of any one of claims 5 to 11, wherein the step of detecting the approximate judgment with distinctive
facts further comprises:
detecting similar distinctive facts;
merging the similar distinctive facts; and
adjusting the approximate judgment with distinctive facts by using the merged distinctive facts.
13. The method of claim 1, further comprising: presenting the approximate judgments with distinctive facts by
outputting a list of all approximate judgments with distinctive facts.
14. The method of claim 1, further comprising presenting the approximate judgments with distinctive facts by one of
the following operations:
outputting the approximate judgments with distinctive facts whose change distance is less than a predetermined seventh threshold; or
outputting a predetermined number of approximate judgments with distinctive facts that have the smallest change distances.
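The two output options of claim 14 amount to a threshold filter or a top-N selection over change distances. A minimal Python sketch, assuming each approximate judgment is a hypothetical `(name, change_distance)` pair:

```python
def select_by_distance(judgments, seventh_threshold=None, top_n=None):
    """Keep judgments below the threshold, or else the top_n closest ones."""
    ranked = sorted(judgments, key=lambda item: item[1])
    if seventh_threshold is not None:
        return [item for item in ranked if item[1] < seventh_threshold]
    return ranked[:top_n]


# Hypothetical approximate judgments with their change distances.
items = [("tuberculosis", 1.5), ("tumor", 4.0), ("abscess", 2.0)]
below_threshold = select_by_distance(items, seventh_threshold=2.5)
closest = select_by_distance(items, top_n=1)
```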
15. The method of claim 1, further comprising presenting the approximate judgments with distinctive facts by the
following operations:
calculating the coverage rate of each approximate judgment with distinctive facts, wherein the coverage rate is the
ratio, in the first group of similar documents, of the documents that match the approximate judgment with distinctive facts; and
outputting the approximate judgments with distinctive facts whose coverage rate is equal to or greater than a
predetermined eighth threshold, or outputting a predetermined number of approximate judgments with distinctive facts that have the highest coverage rates.
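The coverage rate of claim 15 could be computed as in the Python sketch below. The claim does not define when a document "matches"; the sketch assumes a match means that the document carries the approximate judgment, contains all added facts and contains none of the removed facts. The field names `judgment`, `added` and `removed` are hypothetical.

```python
def coverage(approx, similar_docs):
    """Share of similar documents that match the approximate judgment."""
    if not similar_docs:
        return 0.0
    matched = sum(
        1 for d in similar_docs
        if d["judgment"] == approx["judgment"]
        and approx["added"] <= d["facts"]
        and not (approx["removed"] & d["facts"])
    )
    return matched / len(similar_docs)


# Hypothetical first group of similar documents.
first_group = [
    {"judgment": "tuberculosis", "facts": {"nodule", "cavity"}},
    {"judgment": "tuberculosis", "facts": {"cavity"}},
    {"judgment": "tumor", "facts": {"cavity", "mass"}},
    {"judgment": "pneumonia", "facts": {"fever", "nodule"}},
]
approx = {"judgment": "tuberculosis", "added": {"cavity"}, "removed": {"fever"}}
rate = coverage(approx, first_group)
```

Two of the four documents match, so this approximate judgment would be output for any eighth threshold up to 0.5.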
16. The method of claim 11, further comprising: presenting the approximate judgment with distinctive facts by
outputting the change tree together with the approximate judgment with distinctive facts.
17. The method of claim 1, further comprising presenting the fact differences between the facts of the first judgment item and the facts of the approximate judgment,
wherein the fact differences cause the change from the first judgment item to the approximate judgment.
18. The method of any one of claims 5 to 7, further comprising: for each approximate judgment with distinctive facts,
indicating the sentences in the obtained document that correspond to the original-judgment distinctive facts, and
indicating the new-judgment distinctive facts in the obtained document.
19. A method for similar document search, comprising:
a) receiving an input document;
b) determining at least one approximate judgment with distinctive facts of the input document based on the method
of any one of claims 1 to 18; and
c) obtaining a group of similar documents of the input document by using the at least one approximate judgment with
distinctive facts.
20. The method of claim 19, wherein
the input document is a radiography report including finding items and diagnosis items, the finding items are
selected as the first fact items, and the diagnosis items are selected as the first judgment items.
21. The method of claim 19, wherein
the input document is a shell folder including user interest items and travel destination items, the user interest
items are selected as the first fact items, and the travel destination items are selected as the first judgment items.
22. The method of claim 19, wherein
the input document is a product introduction including product parameter items and product type items, the product
parameter items are selected as the first fact items, and the product type items are selected as the first judgment items.
23. An apparatus for determining an approximate judgment with distinctive facts, comprising:
a) a document obtaining unit for obtaining a document, wherein the obtained document contains a first judgment
item, and the first judgment item is a keyword of a predefined type;
b) a judgment item and fact item extraction unit for extracting judgment items and fact items;
c) a document analysis unit for extracting the first judgment item and first fact items from the obtained document
by using the judgment item and fact item extraction unit, wherein each first fact item is information associated with the first judgment item;
d) a similar document analysis unit for obtaining a first group of similar documents by using the first judgment
item and the first fact items, and for extracting, from the first group of similar documents by using the judgment
item and fact item extraction unit, second judgment items different from the first judgment item, and second fact items;
e) a detection unit for approximate judgments with distinctive facts, for detecting at least one approximate
judgment with distinctive facts by using the first group of similar documents, the second judgment items and the second fact items, wherein:
the at least one approximate judgment with distinctive facts consists of distinctive facts and an approximate judgment;
the distinctive facts indicate the difference between the first judgment item and a second judgment item; and
the approximate judgment is one of the second judgment items, and the change distance between the approximate
judgment and the first judgment item is less than a predetermined first threshold, wherein the change distance is
determined according to the distinctive facts between the first judgment item and the second judgment item.
24. The apparatus of claim 23, wherein the judgment item and fact item extraction unit further comprises:
a keyword extraction unit for extracting keywords from the document;
a judgment item identification unit for identifying the judgment items among the extracted keywords; and
a fact item selection unit for selecting the remaining keywords as the fact items.
25. The apparatus of claim 24, wherein the judgment item identification unit further comprises at least one of the following units:
a unit for selecting keywords from a judgment item entry field as judgment items;
a unit for selecting keywords as judgment items according to a predetermined configuration; and
a unit for selecting keywords as judgment items by a user.
26. The apparatus of claim 25, wherein the unit for selecting keywords as judgment items according to a
predetermined configuration further comprises:
a unit for selecting the keywords in a sentence, wherein the sentence expresses a subjective judgment and/or an objective result.
27. The apparatus of claim 23, wherein the detection unit for approximate judgments with distinctive facts further comprises:
1) an original-judgment distinctive fact extraction unit for extracting original-judgment distinctive facts for
each second judgment item;
2) a new-judgment distinctive fact extraction unit for extracting new-judgment distinctive facts for each second judgment item;
3) a change distance calculation unit for calculating the change distance between each second judgment item and
the first judgment item by using the extracted original-judgment distinctive facts and new-judgment distinctive facts; and
4) a first approximate judgment generation unit for generating the approximate judgment with distinctive facts by
using the second judgment items whose change distance is less than the predetermined first threshold.
28. The apparatus of claim 27, wherein the original-judgment distinctive fact extraction unit further comprises:
a target fact item selection unit for selecting one of the first fact items as a target fact item;
a sensitivity calculation unit for calculating the sensitivity of the target fact item, comprising:
a second-group similar document obtaining unit for obtaining a second group of similar documents by using the first
fact items with the target fact item deleted;
a third judgment item and third fact item extraction unit for extracting, from the second group of similar
documents, third judgment items different from the first judgment item, and third fact items; and
a sensitivity calculation subunit for calculating the sensitivity by using the distribution of the third judgment
items in the second group of similar documents and the distribution of the second judgment items in the first group of similar documents; and
an original-judgment distinctive fact selection unit for selecting the target fact item as an original-judgment
distinctive fact if the calculated sensitivity is equal to or greater than a predetermined second threshold.
29. The apparatus of claim 28, wherein the new-judgment distinctive fact extraction unit further comprises:
a relevance calculation unit for calculating the relevance of each third fact item by using the occurrence ratio of
the third fact item together with the corresponding third judgment item in the second group of similar documents; and
a new-judgment distinctive fact selection unit for selecting a third fact item as a new-judgment distinctive fact if
the relevance of the third fact item is equal to or greater than a predetermined third threshold.
30. The apparatus of claim 23, wherein the detection unit for approximate judgments with distinctive facts further comprises:
1) a fact distance calculation unit for calculating the fact distance of each document in the first group of
similar documents by calculating the distance between that document and the obtained document, wherein the
distance between each document in the first group of similar documents and the obtained document is calculated by using the count of differing fact items between the two documents;
2) a judgment item distance calculation unit for calculating the judgment item distance of each second judgment
item by using the calculated fact distance of each document in the first group of similar documents to calculate
the change distance between each second judgment item and the first judgment item, wherein the change distance
between each second judgment item and the first judgment item is calculated by averaging the fact distances of the
documents in the first group of similar documents;
3) a second judgment item selection unit for selecting a second judgment item as the approximate judgment if the
judgment item distance of the second judgment item is equal to or less than a predetermined fourth threshold; and
4) a distinctive fact extraction unit for extracting the distinctive facts of the approximate judgment by
identifying the fact items that differ between the first fact items and the facts of the approximate judgment.
31. The apparatus of claim 23, wherein the detection unit for approximate judgments with distinctive facts further comprises:
1) a candidate approximate judgment generation unit for generating a candidate approximate judgment with
distinctive facts for each document by identifying the fact items that differ between the first fact items and the
facts of each document in the first group of similar documents;
2) an approximate judgment extraction unit for extracting the approximate judgment with distinctive facts by using
the candidate approximate judgments with distinctive facts, comprising:
a transition graph generation unit for generating a transition graph, wherein each terminal node in the transition
graph is a judgment item and each non-terminal node in the transition graph is a fact item;
a candidate approximate judgment arrangement unit for arranging all candidate approximate judgments with
distinctive facts in the transition graph, wherein each path connecting two terminal nodes in the transition graph
represents one candidate approximate judgment with distinctive facts;
an importance calculation unit for calculating the importance of each edge in the transition graph by recording the
connection frequency of each edge connecting any two nodes in the transition graph;
an important edge identification unit for identifying important edges whose importance is equal to or greater than a predetermined fifth threshold;
a distinctive path generation unit for generating at least one distinctive path, wherein the distinctive path
consists of important edges and connects a second judgment item to the first judgment item; and
a translation unit for translating each distinctive path into an approximate judgment with distinctive facts.
32. The apparatus of claim 23, wherein the detection unit for approximate judgments with distinctive facts further comprises:
1) a candidate distinctive fact generation unit for generating candidate distinctive facts, comprising:
a candidate original-judgment distinctive fact specifying unit for specifying candidate original-judgment
distinctive facts by using first fact items that differ from the second fact items;
a candidate new-judgment distinctive fact specifying unit for specifying candidate new-judgment distinctive facts by
using second fact items that differ from the first fact items, wherein the sum of the number of candidate
original-judgment distinctive facts and the number of candidate new-judgment distinctive facts is equal to a
predetermined number;
2) a candidate distinctive fact verification unit for verifying the candidate distinctive facts in the first group of similar documents, comprising:
a document identification unit for identifying documents in the first group of similar documents that contain the
candidate new-judgment distinctive facts but do not contain the candidate original-judgment distinctive facts,
wherein the judgment items of the identified documents are different from the first judgment item; and
a candidate distinctive fact marking unit for marking the candidate distinctive facts as verified if one judgment
item of the identified documents is a concentrated judgment item, wherein the ratio of the documents corresponding
to the concentrated judgment item among all identified documents is equal to or greater than a predetermined sixth threshold; and
3) a second approximate judgment generation unit for generating the approximate judgment with distinctive facts, comprising:
a verified candidate distinctive fact selection unit for selecting the verified candidate distinctive facts as the
distinctive facts; and
a concentrated judgment item selection unit for selecting the concentrated judgment item as the approximate judgment.
33. The apparatus of claim 23, wherein the detection unit for approximate judgments with distinctive facts further comprises:
1) a change tree obtaining unit for obtaining a change tree for the obtained document, wherein the change tree is
structured data specific to a set of knowledge information related to the obtained document, each non-terminal
node is a fact item, and each terminal node is a judgment item; and
2) a third approximate judgment generation unit for generating the approximate judgment with distinctive facts by
selecting a path linking two terminal nodes in the obtained change tree.
34. The apparatus of any one of claims 27 to 33, wherein the detection unit for approximate judgments with
distinctive facts further comprises:
a similar distinctive fact detection unit for detecting similar distinctive facts;
a similar distinctive fact merging unit for merging the similar distinctive facts; and
an approximate judgment adjustment unit for adjusting the approximate judgment with distinctive facts by using the merged distinctive facts.
35. The apparatus of claim 23, further comprising a first approximate judgment presentation unit for presenting the
approximate judgments with distinctive facts by outputting a list of all approximate judgments with distinctive facts.
36. The apparatus of claim 23, further comprising a second approximate judgment presentation unit for presenting
the approximate judgments with distinctive facts by one of the following operations:
outputting the approximate judgments with distinctive facts whose change distance is less than a predetermined seventh threshold; or
outputting a predetermined number of approximate judgments with distinctive facts that have the smallest change distances.
37. The apparatus of claim 23, further comprising a third approximate judgment presentation unit for presenting
the approximate judgments with distinctive facts, which further comprises:
a coverage rate calculation unit for calculating the coverage rate of each approximate judgment with distinctive
facts, wherein the coverage rate is the ratio, in the first group of similar documents, of the documents that match the approximate judgment with distinctive facts; and
an approximate judgment output unit for outputting the approximate judgments with distinctive facts whose coverage
rate is equal to or greater than a predetermined eighth threshold, or outputting a predetermined number of approximate judgments with distinctive facts that have the highest coverage rates.
38. The apparatus of claim 33, further comprising a fourth approximate judgment presentation unit for presenting
the approximate judgment with distinctive facts by outputting the change tree together with the approximate judgment with distinctive facts.
39. The apparatus of claim 23, further comprising a fact difference presentation unit for presenting the fact
differences between the facts of the first judgment item and the facts of the approximate judgment, wherein the
fact differences cause the change from the first judgment item to the approximate judgment.
40. The apparatus of any one of claims 27 to 29, further comprising an indication unit for, for each approximate
judgment with distinctive facts, indicating the sentences in the obtained document that correspond to the original-judgment distinctive facts,
and indicating the new-judgment distinctive facts in the obtained document.
41. An apparatus for similar document search, comprising:
a) an input document receiving unit for receiving an input document;
b) the apparatus for determining an approximate judgment with distinctive facts according to any one of claims 23
to 40, for determining at least one approximate judgment with distinctive facts of the input document; and
c) a similar document obtaining unit for obtaining a group of similar documents of the input document by using the
at least one approximate judgment with distinctive facts.
42. The apparatus of claim 41, wherein
the input document is a radiography report including finding items and diagnosis items, the finding items are
selected as the first fact items, and the diagnosis items are selected as the first judgment items.
43. The apparatus of claim 41, wherein
the input document is a shell folder including user interest items and travel destination items, the user interest
items are selected as the first fact items, and the travel destination items are selected as the first judgment items.
44. The apparatus of claim 41, wherein
the input document is a product introduction including product parameter items and product type items, the product
parameter items are selected as the first fact items, and the product type items are selected as the first judgment items.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410587566.9A CN105630788B (en) | 2014-10-28 | 2014-10-28 | Method and apparatus for determining the approximate judgement for having distinctiveness true |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410587566.9A CN105630788B (en) | 2014-10-28 | 2014-10-28 | Method and apparatus for determining the approximate judgement for having distinctiveness true |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105630788A (en) | 2016-06-01 |
CN105630788B (en) | 2019-05-03 |
Family
ID=56045742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410587566.9A Active CN105630788B (en) | 2014-10-28 | 2014-10-28 | Method and apparatus for determining the approximate judgement for having distinctiveness true |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105630788B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110362735B (en) * | 2019-07-15 | 2022-05-13 | 北京百度网讯科技有限公司 | Method and device for judging the authenticity of a statement, electronic device, readable medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101567011A (en) * | 2008-04-22 | 2009-10-28 | 株式会社Ntt都科摩 | Document processing device and document processing method |
CN103294671A (en) * | 2012-02-22 | 2013-09-11 | 腾讯科技(深圳)有限公司 | Document detection method and system |
CN103903164A (en) * | 2014-03-25 | 2014-07-02 | 华南理工大学 | Semi-supervised automatic aspect extraction method and system based on domain information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007219880A (en) * | 2006-02-17 | 2007-08-30 | Fujitsu Ltd | Reputation information processing program, method, and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN105630788A (en) | 2016-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Deepfashion: Powering robust clothes recognition and retrieval with rich annotations | |
CN105760495B (en) | A kind of knowledge based map carries out exploratory searching method for bug problem | |
Chuang et al. | Topic model diagnostics: Assessing domain relevance via topical alignment | |
CN104573130B (en) | The entity resolution method and device calculated based on colony | |
Chou et al. | PaperVis: Literature review made easy | |
CN101566997A (en) | Determining words related to given set of words | |
CN101223525A (en) | Relationship networks | |
Laenen et al. | Web search of fashion items with multimodal querying | |
JP2015532495A (en) | System and method for presenting and navigating network data sets | |
Strötgen et al. | TimeTrails: a system for exploring spatio-temporal information in documents | |
CN106095738A (en) | Recommendation tables single slice | |
CN112966091A (en) | Knowledge graph recommendation system fusing entity information and heat | |
Zigkolis et al. | Collaborative event annotation in tagged photo collections | |
Li et al. | Attribute-aware explainable complementary clothing recommendation | |
Yang et al. | Managing discoveries in the visual analytics process | |
JPWO2010013472A1 (en) | Data classification system, data classification method, and data classification program | |
CN105630788B (en) | Method and apparatus for determining the approximate judgement for having distinctiveness true | |
Villaespesa et al. | A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection | |
KR20190023503A (en) | Image based patent search apparatus | |
JP5117589B2 (en) | Document analysis apparatus and program | |
Nguyen et al. | Social tagging analytics for processing unlabeled resources: A case study on non-geotagged photos | |
JP2014102625A (en) | Information retrieval system, program, and method | |
Jayashree et al. | Multimodal web page segmentation using self-organized multi-objective clustering | |
Yoon et al. | A conference paper exploring system based on citing motivation and topic | |
Pocco et al. | DRIFT: A visual analytic tool for scientific literature exploration based on textual and image content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||