WO2017072822A1 - Relevance evaluation system and method, program, and recording medium - Google Patents

Relevance evaluation system and method, program, and recording medium

Info

Publication number
WO2017072822A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
evaluation
test data
relevance
components
Prior art date
Application number
PCT/JP2015/005479
Other languages
French (fr)
Japanese (ja)
Inventor
秀樹 武田
和巳 蓮子
Original Assignee
株式会社Ubic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Ubic filed Critical 株式会社Ubic
Priority to PCT/JP2015/005479 priority Critical patent/WO2017072822A1/en
Priority to JP2017547201A priority patent/JPWO2017072822A1/en
Publication of WO2017072822A1 publication Critical patent/WO2017072822A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to a relevance evaluation system, method, program for evaluating relevance between data, and a recording medium storing it.
  • a data aggregate (hereinafter simply referred to as “data”) composed of many data components (for example, “word” in the case of document data) always has a characteristic in its contents.
  • For data composed of a large number of data components, it may be necessary to evaluate the characteristics of the data objectively without comparing the contents in detail.
  • One such method calculates, for each piece of data, a characteristic value representing similarity and compares the data by that value.
  • Patent Document 1 discloses an example of similar document search.
  • a feature word characterizing the description content is extracted from a document set made up of a large number of documents in advance, and a set of feature words is created.
  • a feature vector from a data component serving as a reference is calculated and stored for the feature word.
  • For an input document, the degree of similarity (score value) is calculated by comparison with the feature words, and the document with the most similar score value is determined to be the closest to the input document.
  • the type of data is not limited to document data as disclosed in Patent Document 1, and data having various types of morphemes such as image data and audio data as data components can be considered. Therefore, an index that causes a difference in the degree of relevance of the data with respect to the reference data is obtained by a simple method.
  • The above problem is solved by a relevance evaluation system that evaluates the relevance of test data with respect to reference data. The system includes a data acquisition unit that acquires the reference data and the test data; an evaluation component extraction unit that extracts, from the test data, the evaluation components that represent the characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and a relevance evaluation unit that calculates a feature coefficient based on the appearance order of the evaluation components of the test data in the arrangement direction of the test data.
  • The problem is also solved by a relevance evaluation method in which a relevance evaluation system comprising a computer evaluates the relevance between reference data and test data. The method acquires the reference data and the test data; extracts, from the test data, the evaluation components that represent the characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and calculates a feature coefficient based on the appearance order of the evaluation components of the test data in the arrangement direction of the test data.
  • The problem is also solved by a relevance evaluation program that can be executed in a relevance evaluation system comprising a computer and that evaluates the relevance between reference data and test data. The program executes a step of acquiring the reference data and the test data; a step of extracting, from the test data, the evaluation components that represent the characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and a step of calculating the feature coefficient based on the appearance order of the evaluation components of the test data in the arrangement direction of the test data.
  • The problem is also solved by a storage medium that stores a relevance evaluation program executable in a relevance evaluation system comprising a computer, the program evaluating the relevance between reference data and test data by performing the steps of acquiring the reference data and the test data; extracting, from the test data, the evaluation components that represent the characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and calculating the feature coefficient based on the appearance order of the evaluation components of the test data in the arrangement direction of the test data.
  • the present invention makes it possible to select data closest to the reference data for two or more data.
  • FIG. 5 is a conceptual diagram showing reference data R.
  • FIG. 3 is a conceptual diagram showing test data T. A diagram contrasting the evaluation components of the reference data R with the evaluation components of the test data T, each in order of appearance.
  • FIG. 1 is an example of a hardware configuration of the system 1.
  • the system 1 includes a server device 10 and a client terminal 11.
  • the server device 10 includes an arithmetic device 10a that performs calculation and a storage device 10b that stores data.
  • the server device 10 can execute main processing of data analysis.
  • the client terminal 11 can execute a data analysis related process in the server device 10.
  • the storage device 10b is, for example, any recording medium (for example, a memory or a hard disk) that can store data (including digital data and analog data).
  • the arithmetic device 10a is a controller (for example, a central processing unit (CPU)) that can execute a control program stored in a recording medium.
  • the computing device 10a is a computer or a computer system (a system that realizes data analysis by operating a plurality of computers in an integrated manner) that analyzes data stored at least temporarily in a recording medium.
  • The computing device 10a may be configured as a management computer (not shown) external to the server device 10, and the storage device 10b may be configured as a data storage server device 13 serving as an external storage device of the server device 10.
  • the management computer may include, for example, a memory, a controller, a bus, an input / output interface, and a communication interface.
  • Application programs that can control the respective devices of the client terminal 11, the server device 10, and the management computer (not shown) are stored in the memory provided in each of the client terminal 11, the server device 10, and the management computer (not shown).
  • The application program (software resource) and the hardware resources cooperate to operate each device.
  • the storage device 10b is composed of, for example, a disk array system, and can include a database that records data and results of evaluation / classification of the data.
  • The server device 10 and the storage device 10b are connected by direct-attached storage (DAS) or a storage area network (SAN).
  • the client terminal 11 presents data in the middle of the processing process in the server device 10 to the user. As a result, the user can input, that is, provide classification information through bidirectional exchange via the client terminal 11.
  • The client terminal 11 includes, for example, a memory, a controller, a bus, an input/output interface (for example, a keyboard and a display), and a communication interface (for communication via communication means using a predetermined network).
  • the client terminal 11 may be configured to include an input device 12 such as a scanner.
  • the hardware configuration shown in FIG. 1 is merely an example, and the system 1 can be realized by other hardware configurations.
  • Part or all of the processing may be executed in the server device 10, or part or all of it may be executed in the client terminal 11.
  • The input device 12 is connected to the client terminal 11 and can transmit data to the server device 10.
  • Alternatively, the input device 12 may be connected directly to the server device 10 and input data to the server from there. Those skilled in the art will understand that various hardware configurations can implement the system 1 and that the configuration is not limited to the one illustrated in FIG. 1.
  • FIG. 2 is a diagram illustrating the reference data R and the test data T that are comparison targets of the relevance in the present invention.
  • The question here is which of the test data T1 and the test data T2 is more highly related to the reference data R.
  • the feature coefficient is calculated as an index for evaluating the relevance, thereby evaluating the high relevance.
  • Both the test data T1, T2 and the reference data R are aggregates of data components.
  • the test data T1 is composed of a plurality of data components t1
  • the test data T2 is composed of a plurality of data components t2
  • the reference data R is composed of a plurality of data components r.
  • the data type of the test data T and the reference data R is not particularly limited. It may be document data, or any data aggregate, such as image data and audio data, as long as it is an aggregate of unit data.
  • The data components include, for example, morphemes, keywords, sentences, paragraphs, and/or metadata (for example, the header information of an e-mail) constituting a document; partial voices, volume (gain) information, and/or timbre information constituting audio; partial images, partial pixels, and/or luminance information constituting an image; and frame images, motion information, and/or three-dimensional information constituting a video.
  • In the following, the test data T and the reference data R are assumed to be document data, and the data components are text data, typically the words or phrases that constitute the document.
  • the types of data are typically the same, but they are not necessarily the same.
  • For example, when the reference data R is document data whose data components are words and the test data T is speech data, the degree of relevance can be evaluated by comparing the data components as characters with the word data as speech.
  • FIG. 3 is a diagram showing data components.
  • the arrangement direction of data constituent elements constituting the reference data R is defined.
  • the arrangement direction necessary for evaluating the contents of the reference data R is determined.
  • the arrangement direction of the data components is determined from the left to the right.
  • After the rightmost data component of a line, the leftmost data component of the next line down follows, and from that position the arrangement direction again runs to the right.
  • the order of character strings is the arrangement direction.
  • the arrangement direction most appropriate for evaluation is determined.
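  • As a minimal sketch of the left-to-right, line-by-line arrangement direction described above, the following flattens rows of data components into a single sequence. The function name `reading_order` and the row-of-lists input format are assumptions made for illustration; the source does not prescribe a data structure.

```python
def reading_order(rows):
    """Flatten rows of data components into one sequence.

    Assumption: each row is a list of components already ordered left to
    right, and rows are listed from top to bottom.
    """
    sequence = []
    for row in rows:
        # After the rightmost component of a row, continue with the
        # leftmost component of the next row down.
        sequence.extend(row)
    return sequence


rows = [["a", "b", "c"], ["d", "e"]]
print(reading_order(rows))  # ['a', 'b', 'c', 'd', 'e']
```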
  • Among the data components, a plurality of data components that best represent the characteristics of the content of the reference data R are extracted as evaluation components, sequentially in order of appearance along the predefined arrangement direction. In the example shown in FIG. 4, five evaluation components m1, m2, m3, m4, and m5 are selected. The evaluation components and their order of appearance are chosen so as to represent the characteristics of the content of the reference data R most accurately.
  • the evaluation components m1, m2, m3, m4, and m5 of the reference data R and their appearance order function as predetermined criteria for evaluating the relevance of the test data T.
  • FIG. 4 is a diagram showing the test data T.
  • the arrangement direction of the data components constituting the test data T is determined.
  • An arrangement direction necessary for evaluating the contents of the test data T is determined.
  • the arrangement direction of the data components is determined from the left to the right.
  • After the rightmost data component of a line, the leftmost data component of the next line down follows, and from that position the arrangement direction again runs to the right.
  • the evaluation components m1, m2, m3, m4, and m5 previously defined in the reference data R are detected in the order of appearance.
  • the data components m1, m4, m3, and m2 of the test data T corresponding to the evaluation components are detected in the order of appearance.
  • The data component of the test data T corresponding to the evaluation component m5 is not detected. That is, among the five evaluation components m1, m2, m3, m4, and m5 previously defined in the reference data R, the data components m1, m2, m3, and m4 are extracted from the test data T, and their order of appearance is m1, m4, m3, m2.
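  • The detection step above can be sketched as follows. This is only an illustration: the function name `extract_in_order` is hypothetical, and recording only the first occurrence of each evaluation component is an assumption, since the source does not say how repeated occurrences are handled.

```python
def extract_in_order(test_components, evaluation_components):
    """Return the evaluation components found in the test data,
    in the order in which they first appear along the arrangement direction."""
    wanted = set(evaluation_components)
    seen = set()
    found = []
    for component in test_components:
        # Assumption: only the first occurrence of a component counts.
        if component in wanted and component not in seen:
            seen.add(component)
            found.append(component)
    return found


reference_evaluation = ["m1", "m2", "m3", "m4", "m5"]
test_data = ["x", "m1", "y", "m4", "m3", "z", "m2"]
print(extract_in_order(test_data, reference_evaluation))
# ['m1', 'm4', 'm3', 'm2']  (m5 is not detected)
```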
  • FIG. 5 shows the evaluation components m1, m2, m3, m4, and m5 of the reference data R in order of appearance (upper side in FIG. 5) and the evaluation components m1, m4, m3, and m2 of the test data T in order of appearance (lower side in FIG. 5).
  • a characteristic coefficient (Order) which is an index indicating the degree of relevance is defined as follows.
  • The characteristic coefficient (Order) is the ratio of “the number of two-element combinations, selected from the evaluation components detected in the test data T, whose order of appearance is the same as that of the evaluation components of the reference data R” to “the number of combinations for selecting two from the evaluation components detected in the test data T”. For the denominator, when the number of evaluation components detected in the test data T is N, the number of combinations of two evaluation components is N(N-1)/2.
  • In the example of FIGS. 4 and 5, four evaluation components m1, m2, m3, and m4 are detected in the test data T, so there are six combinations: (m1, m2), (m1, m3), (m1, m4), (m2, m3), (m2, m4), and (m3, m4).
  • The numerator is the number of those combinations whose order of appearance in the test data T is the same as in the reference data R. Only the order of appearance is considered; whether another component appears between the two is not evaluated. In the example of FIGS. 4 and 5, three combinations, (m1, m2), (m1, m3), and (m1, m4), have the same order of appearance as in the reference data R. The presence of m4 between m1 and m3 is not subject to evaluation. In this case, therefore, the characteristic coefficient (Order) is 3/6 = 0.5.
  • When every combination appears in the same order as in the reference data R, T(N) / F(N) = 1.0. That is, the more evaluation components appear in the test data T in the same order as in the reference data R, the higher the relevance between the test data T and the reference data R, and the closer the characteristic coefficient (Order) is to 1. Conversely, when the relevance between the test data T and the reference data R is low, the characteristic coefficient (Order) is close to 0.
  • The characteristic coefficient (Order) satisfies 0 ≤ characteristic coefficient (Order) ≤ 1.
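  • The definition of the characteristic coefficient (Order) can be illustrated with a short sketch. The function name `order_coefficient` is hypothetical, and returning 0.0 when fewer than two evaluation components are detected is an assumption; the source does not define that case.

```python
from itertools import combinations


def order_coefficient(reference_order, test_order):
    """Ratio of two-element combinations of detected evaluation components
    whose order of appearance in the test data matches the reference data."""
    test_rank = {c: i for i, c in enumerate(test_order)}
    # Detected components, listed in reference order, so every pair (a, b)
    # below has a before b in the reference data.
    detected = [c for c in reference_order if c in test_rank]
    pairs = list(combinations(detected, 2))  # N(N-1)/2 combinations
    if not pairs:
        return 0.0  # assumption: undefined in the source for N < 2
    same_order = sum(1 for a, b in pairs if test_rank[a] < test_rank[b])
    return same_order / len(pairs)


reference = ["m1", "m2", "m3", "m4", "m5"]
test = ["m1", "m4", "m3", "m2"]
print(order_coefficient(reference, test))  # 0.5
```

  • With the values of the example, the three matching pairs (m1, m2), (m1, m3), and (m1, m4) out of six give 0.5, in agreement with the text.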
  • FIG. 6 is a diagram illustrating an example of a functional block configuration of the system 1.
  • the system 1 includes, for example, a reference data acquisition unit 21, a test data acquisition unit 22, an arrangement direction determination unit 23, an evaluation component extraction unit 24, a component storage unit 25, and a component relevance evaluation unit 26.
  • a route from the reference data acquisition unit 21 to the component storage unit 25 via the arrangement direction determination unit 23 and the evaluation component extraction unit 24 is a learning process for the reference data R.
  • The route from the test data acquisition unit 22 to the component relevance evaluation unit 26 via the arrangement direction determination unit 23 and the evaluation component extraction unit 24 is the relevance evaluation process for the test data T with respect to the reference data R.
  • the reference data acquisition unit 21 acquires the reference data input from the input device 12 or the client terminal 11 or all data components constituting the reference data R already stored in the storage device 10b.
  • When the reference data acquisition unit 21 and the test data acquisition unit 22 have acquired all the data components, they output the data to the arrangement direction determination unit 23, which determines the arrangement direction of these data components and associates the data components with it. All the data components associated with the arrangement direction are output to the evaluation component extraction unit 24.
  • the determination of the arrangement direction may be omitted depending on the data by using the data arrangement direction when the data is acquired in the reference data acquisition unit 21 and the test data acquisition unit 22 as they are. In this case, the arrangement direction determination unit 23 becomes unnecessary. Further, the determination of the alignment direction may be performed by the reference data acquisition unit 21 and the test data acquisition unit 22 or may be performed by the evaluation component extraction unit 24.
  • the evaluation component extraction unit 24 extracts a component group that most representatively represents the content feature of the reference data R.
  • the user can select a component group using the client terminal 11.
  • the “component group” is a group of data components.
  • the “component group” selected by the evaluation component extraction unit 24 is output to the component storage unit 25.
  • the component storage unit 25 stores the “component group” in the storage device 10 b or the data storage server device 13.
  • the evaluation component extraction unit 24 extracts the evaluation components m1, m2, m3, m4, and m5 from the data components that constitute the reference data R for which the arrangement direction is determined.
  • the number of evaluation components extracted by the evaluation component extraction unit 24 is arbitrarily determined according to the characteristics of the reference data R.
  • the evaluation component extraction unit 24 outputs the extracted evaluation components m1, m2, m3, m4, and m5 to the component storage unit 25.
  • the component storage unit 25 stores in the storage device 10b or the data storage server device 13. The above is the learning process of relevance evaluation.
  • the relevance evaluation process for the test data T with respect to the reference data R will be described.
  • The arrangement direction determination unit 23 and the evaluation component extraction unit 24 described above function in the same way in the relevance evaluation process for the test data T with respect to the reference data R. That is, as shown in FIG. 6, like the reference data acquisition unit 21, the test data acquisition unit 22 acquires all the data components constituting the test data T that is input from the input device 12 or the client terminal 11 or that is already stored in the storage device 10b.
  • When the test data acquisition unit 22 has acquired all the data components, it outputs them to the arrangement direction determination unit 23.
  • the reference data acquisition unit 21 and the test data acquisition unit 22 do not need to be configured separately, and can be the same data acquisition unit.
  • the arrangement direction determination unit 23 determines the arrangement direction and associates the data components. All the data components associated with the arrangement direction are output to the evaluation component extraction unit 24.
  • The evaluation component extraction unit 24 extracts the evaluation components stored in the storage device 10b or the data storage server device 13 from all the data components of the test data T associated with the arrangement direction. Not all evaluation components are necessarily extracted. Among the data components of the test data T, those corresponding to the evaluation components selected from the reference data R in the learning process are extracted in order of appearance.
  • In this example, the evaluation component extraction unit 24 extracts the evaluation components in the order of appearance m1, m4, m3, and m2 according to the arrangement direction.
  • the extracted evaluation components m1, m4, m3, and m2 are output to the component relationship evaluation unit 26.
  • the component relevance evaluation unit 26 calculates the characteristic coefficient (Order) described above.
  • The component relevance evaluation unit 26 reads the evaluation value associated with each component input from the evaluation component extraction unit 24 from an arbitrary memory (for example, the storage device 10b) and evaluates the target data based on the evaluation value.
  • The evaluation value is a weighting value set in advance for each evaluation component selected in the reference data R according to its characteristics. More specifically, the component relevance evaluation unit 26 can derive an index of the target data (for example, numerical values, letters, and/or symbols that make the target data orderable) by, for example, adding up the evaluation values associated with the components that constitute at least a part of the target data. A score value, for example, can be used as this index.
  • the score value (Score) is an index for quantitatively evaluating the strength of relevance of the test data T with respect to the data components of the reference data R.
  • the calculation method of the score value (Score) is not limited.
  • The score value may be calculated by a general method as long as the content of the reference data R can be appropriately evaluated. For example, the score value can be expressed by an equation that combines the evaluation value defined for each evaluation component extracted from the reference data R with the frequency at which that evaluation component appears in the test data T.
  • the component relevance evaluation unit 26 can associate the test data T with the score value and store both in the storage device 10b.
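  • The source's score equation is not reproduced in this text, so the following is only one plausible reading, stated as an assumption: the score value is taken as the sum, over the evaluation components, of the evaluation value (weight) multiplied by the frequency of that component in the test data T. The function name `score_value` and the sample weights are hypothetical.

```python
from collections import Counter


def score_value(test_components, evaluation_values):
    """Assumed score: sum of (evaluation value x frequency in the test data).

    evaluation_values maps each evaluation component to its preset weight.
    """
    counts = Counter(test_components)
    # Counter returns 0 for components that do not appear in the test data.
    return sum(weight * counts[component]
               for component, weight in evaluation_values.items())


weights = {"m1": 2.0, "m2": 1.5, "m3": 4.0}  # illustrative weights only
print(score_value(["m1", "x", "m1", "m2"], weights))  # 5.5
```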
  • reference data R is fetched (S101).
  • the arrangement direction of the data components is determined for the read reference data R (S102).
  • Among the data components, a plurality of data components that best represent the characteristics of the content of the reference data R are extracted together with their appearance order along the predefined arrangement direction and are defined as an evaluation component group for relevance evaluation (S103).
  • the extracted evaluation component group and its appearance order data are stored in the storage device 10b (S104). The above is the learning process using the reference data R for relevance evaluation.
  • test data T is fetched (S105).
  • arrangement direction of the data components constituting the test data T is determined (S106).
  • Evaluation components for relevance evaluation that have been determined in advance in the learning process are extracted from the test data T for which the arrangement direction has been determined (S107).
  • From the evaluation components extracted from the test data T, the combinations whose order of appearance is the same as in the reference data R are identified (S108).
  • A feature coefficient based on the appearance order of the evaluation components of the test data in the arrangement direction of the test data is calculated (S109).
  • The feature coefficient expresses the degree to which the appearance order of the two-element combinations selected from the evaluation components extracted from the test data T coincides with the predefined appearance order of the evaluation components of the reference data R.
  • With the score value (Score) alone, it can be difficult to determine which test data has the higher relevance to the reference data R.
  • By using the characteristic coefficient (Order), it can be determined that the larger the characteristic coefficient, the higher the relevance to the reference data R. For example, in the case of FIG. 2, when the test data T1 and the test data T2 both have a score value of 70 with respect to the reference data R and their characteristic coefficients (Order) with respect to the reference data R are 0.6 and 0.8, respectively, it can be determined that the test data T2 is more relevant to the reference data R.
  • It is also possible to display, on a display means such as a display or a printer, a distribution diagram in which one axis is assigned to the score value and the other to the feature coefficient (Order), thereby giving the user information, based on the two elements “score value” and “feature coefficient”, from which the relevance of the test data T to the reference data R can easily be judged.
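  • The comparison in the T1/T2 example can be sketched as a simple ranking. Ordering by score value first and using the feature coefficient (Order) to separate equal scores is an assumption made for this illustration; the source only states that a larger coefficient indicates higher relevance. The function name `rank_by_relevance` is hypothetical.

```python
def rank_by_relevance(candidates):
    """Rank test data names, most relevant first.

    candidates maps a name to a (score value, feature coefficient) pair.
    Assumption: compare score values first, then feature coefficients.
    """
    return sorted(candidates, key=lambda name: candidates[name], reverse=True)


# Values from the example: both score 70, Order 0.6 and 0.8.
candidates = {"T1": (70, 0.6), "T2": (70, 0.8)}
print(rank_by_relevance(candidates))  # ['T2', 'T1']
```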
  • FIG. 8 shows the algorithm of the program according to the second embodiment.
  • In the first embodiment, the component relevance evaluation unit 26 in the functional block of FIG. 6 only calculates the feature coefficient (Order).
  • In the second embodiment, the component relevance evaluation unit 26 uses the calculated feature coefficient (Order) as a correction value for the score value calculated in advance for the test data T.
  • the steps from the step of taking in the reference data R (S201) to the step of calculating the characteristic coefficient (Order) (S209) are the same as the steps S101 to S109 of the first embodiment.
  • After calculating the feature coefficient (Order), the component relevance evaluation unit 26 corrects, with the feature coefficient, the score value (Score RAW) calculated in advance for the test data T (S210).
  • the score value may be calculated by a general method as long as the content of the reference data R can be appropriately evaluated.
  • Even when the characteristic coefficient (Order) is large, a difference in score values makes comparison difficult.
  • By using the score value corrected by the feature coefficient, it can be determined that the larger the corrected score value, the higher the relevance to the reference data R.
  • For example, when the score values (Score RAW) of the test data T1 and the test data T2 with respect to the reference data R are 72 and 71, respectively, and their feature coefficients (Order) with respect to the reference data R are 0.65 and 0.67, respectively, the score values corrected by the feature coefficient are 45.5 and 46.9, respectively.
  • Since the corrected score value is higher for the test data T2, it can be determined that the test data T2 is more relevant to the reference data R.
  • The score values (Score RAW) of the test data T1 and the test data T2 with respect to the reference data R are calculated separately from the feature coefficient (Order). That is, this form can be used when the evaluation component group used to calculate the score value differs from the one used to calculate the feature coefficient (Order).
  • the calculation of the score value and the calculation of the characteristic coefficient are carried out by a series of processes using a common evaluation component determined in advance by the reference data R.
  • FIG. 9 is a diagram illustrating an example of a functional block configuration of the system 1 according to the third embodiment.
  • the system 1 includes a reference data acquisition unit 21, a test data acquisition unit 22, an arrangement direction determination unit 23, an evaluation component extraction unit 24, and a component storage unit 25. Since these are the same as those in the first embodiment, description thereof is omitted.
  • the third embodiment further includes a component relevance evaluation unit 26, a score value calculation unit 27, and a score value correction unit 28.
  • the evaluation component extraction unit 24 extracts an evaluation component group that most appropriately represents the content of the reference data R, and classifies it into N groups.
  • the score value calculation unit 27 calculates a score value (Score (i) RAW ) for each of the N groups.
  • the score value may be calculated by a general method as long as the content of the reference data R can be appropriately evaluated.
  • For each group of the evaluation component group, the component relevance evaluation unit 26 calculates the feature coefficient (Order), which is the ratio of two-element combinations having the same order of appearance as in the reference data R. The calculation method of the characteristic coefficient (Order) is as described in the first embodiment. The score value correcting unit 28 then multiplies the score value (Score (i) RAW) by the feature coefficient (Order) for each group and calculates the sum of the products.
  • FIG. 10 shows an algorithm in the third embodiment.
  • reference data R is fetched (S301).
  • the arrangement direction of the data components is determined for the read reference data R (S302).
  • among the data components, a plurality of data components that best represent the characteristics of the content of the reference data R are extracted, together with their appearance order according to the predefined arrangement direction, and defined as evaluation components for relevance evaluation.
  • the evaluation component group is classified into N groups (S303).
  • the extracted evaluation components and their appearance order data are stored in the storage device 10b (S304). The above is the learning process using the reference data R for relevance evaluation.
  • test data is fetched (S305).
  • arrangement direction of the data components constituting the test data T is determined (S306).
  • Evaluation components for relevance evaluation that have been determined in advance in the learning process are extracted from the test data T for which the arrangement direction has been determined (S307).
  • a score value (Score (i) RAW ) is calculated for each of the N groups of evaluation components (S308).
  • from the two-element combinations of the detected evaluation components, those with the same order of appearance as in the reference data R are extracted.
  • the degree of coincidence between the appearance order of the selected two-element combinations and the appearance order of the evaluation component group of the reference data R defined in advance is acquired.
  • the degree of coincidence can be, for example, the feature coefficient (Order) expressed as a match rate.
  • the feature coefficient (Order) = (number of combinations to which 1 is assigned) / (total number of two-element combinations) is calculated (S310).
  • the score value (Score (i) RAW ) is multiplied by the feature coefficient (Order), and the sum is calculated (S311).
  • since the score value (Score (i) RAW ) and the feature coefficient (Order) are calculated from the same evaluation component group, the calculation process is simplified and the score value is easily calculated.
  • the determination based on the corrected score value is the same as in the first and second embodiments.
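The correction in S311 can be illustrated with a minimal Python sketch. It assumes that one raw score and one feature coefficient are already available per group; the function name `corrected_score` and the sample numbers are hypothetical and do not come from the specification.

```python
from typing import Sequence


def corrected_score(raw_scores: Sequence[float],
                    order_coefficients: Sequence[float]) -> float:
    """Sum of Score(i)RAW * Order(i) over the N groups (step S311)."""
    if len(raw_scores) != len(order_coefficients):
        raise ValueError("one feature coefficient is needed per group")
    return sum(s * o for s, o in zip(raw_scores, order_coefficients))


# Hypothetical values: two test data sets with identical raw scores
# but different agreement in order of appearance.
raw = [2.0, 1.0, 4.0]
print(corrected_score(raw, [1.0, 0.5, 0.25]))
print(corrected_score(raw, [0.5, 0.5, 0.25]))
```

With identical raw scores, the data set whose evaluation components appear in an order closer to the reference data receives the higher corrected score, which is the tie-breaking behaviour the description aims for.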
  • the control block of the data analysis system may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software using a CPU.
  • the system includes a CPU that executes the instructions of a program (the control program for the data analysis system), which is software that implements each function; a ROM (Read Only Memory) or a storage device (these are referred to as “recording media”) in which the program and various data are recorded so as to be readable by the computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like.
  • the object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it.
  • as the recording medium, a “non-transitory tangible medium” such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • the program may be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) that can transmit the program.
  • the present invention can also be realized in the form of a data signal embedded in a carrier wave in which the program is embodied by electronic transmission. Note that the above program can be implemented in any programming language. Also, any recording medium that records the above program falls within the scope of the present invention.
  • Such systems include, for example, discovery support systems, forensic systems, e-mail monitoring systems, medical application systems (e.g., pharmacovigilance support systems, clinical trial efficiency systems, medical risk hedging systems, fall prediction (fall prevention) systems, prognosis prediction systems, diagnosis support systems, etc.), Internet application systems (e.g., smart mail systems, information aggregation (curation) systems, user monitoring systems, social media management systems, etc.), information leakage detection systems, project evaluation systems, marketing support systems, intellectual property evaluation systems, fraud monitoring systems, call center escalation systems, credit check systems, and other artificial intelligence systems that analyze big data; the invention may be implemented as any system that can evaluate the relevance of data to a given case.
  • depending on the application, preprocessing (for example, extracting an important part from the data and making only that part the analysis target) may be applied to the data, or the mode of displaying the data analysis result may be changed. It will be understood by those skilled in the art that a variety of such variations can exist, and all such variations fall within the scope of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This relevance evaluation system is configured to evaluate relevance of examined data to reference data, and comprises: a data acquisition unit for acquiring each of reference data and examined data; an evaluation component extraction unit for extracting, from the examined data, evaluation components representing features of the reference data among data components of the reference data in the order of appearance corresponding to an arrangement direction of data components of the examined data; and a relevance evaluation unit for calculating a feature coefficient based on the order of appearance of the evaluation components of the examined data in the arrangement direction of the examined data. Even when a plurality of examined data groups show no difference in score values representing features, it is possible to recognize an examined data group that has a high degree of relevance to a reference data group.

Description

Relevance evaluation system, method, program, and recording medium
The present invention relates to a relevance evaluation system, method, and program for evaluating relevance between data, and to a recording medium storing the program.
A data aggregate (hereinafter simply “data”) composed of many data components (for example, “words” in the case of document data) always has characteristic features in its content. For data with a large number of data components, it may be necessary to evaluate those features objectively without comparing the contents in detail. One such method calculates, for each piece of data, a characteristic value representing similarity and compares the degrees of similarity of the data.
As an example of this method, Patent Document 1 discloses similar document search. There, feature words characterizing the described content are extracted in advance from a document set consisting of a large number of documents, and a set of feature words is created. Then, for each document in the document set, a feature vector is calculated for the feature words from a reference data component and stored. Subsequently, the input document is compared against the feature words to calculate a similarity, and the document with the most similar score value is judged to be the closest to the input document. Thus, in similar document search, it is common to calculate a degree of similarity (hereinafter “score value”) based on reference data and use it to determine how similar the data are. In Patent Document 1, weighting is applied from a grammatical point of view in order to increase the accuracy of determining the degree of similarity in similarity search.
Patent Document 1: JP 2014-106665 A
In a general method of examining how strongly a plurality of data relate to reference data, such as the method disclosed in Patent Document 1, when a plurality of data having the same score value are found, it is not possible to determine which of them is most relevant to the reference data. For this reason, the conventional approach to judging the degree of relevance of data has generally been to improve the accuracy of that judgment by improving the calculation accuracy of the score value.
However, the type of data is not limited to document data as disclosed in Patent Document 1; data whose data components are various kinds of morphemes, such as image data and audio data, are also conceivable. Therefore, there is a need for an index that, by a simple method, produces a difference in the degree of relevance of data to the reference data.
The problem is solved by a relevance evaluation system that evaluates the relevance of test data to reference data, the system comprising: a data acquisition unit that acquires the reference data and the test data; an evaluation component extraction unit that extracts, from the test data, evaluation components representing the features of the reference data among the data components of the reference data, in the order of appearance according to the arrangement direction of the data components of the test data; and a relevance evaluation unit that calculates a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
The problem is also solved by a relevance evaluation method for evaluating the relevance between reference data and test data using a relevance evaluation system comprising a computer, the method comprising: acquiring the reference data and the test data; extracting, from the test data, evaluation components representing the features of the reference data among the data components of the reference data, in the order of appearance according to the arrangement direction of the data components of the test data; and calculating a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
The problem is also solved by a relevance evaluation program executable in a relevance evaluation system comprising a computer, the program evaluating the relevance between reference data and test data and executing: a step of acquiring the reference data and the test data; a step of extracting, from the test data, evaluation components representing the features of the reference data among the data components of the reference data, in the order of appearance according to the arrangement direction of the data components of the test data; and a step of calculating a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
The problem is also solved by a storage medium storing a relevance evaluation program that is executable in a relevance evaluation system comprising a computer and that evaluates the relevance between reference data and test data, the program executing: a step of acquiring the reference data and the test data; a step of extracting, from the test data, evaluation components representing the features of the reference data among the data components of the reference data, in the order of appearance according to the arrangement direction of the data components of the test data; and a step of calculating a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
The present invention makes it possible to select, from two or more pieces of data, the data closest to the reference data.
FIG. 1 is a diagram of the hardware configuration of the component relevance evaluation system 1 of the present invention.
FIG. 2 is a diagram explaining the reference data R and the test data T whose relevance is compared in the present invention.
FIG. 3 is a conceptual diagram showing the reference data R.
FIG. 4 is a conceptual diagram showing the test data T.
FIG. 5 is a diagram contrasting the evaluation components of the reference data R, taking the order of appearance into account, with the evaluation components of the test data T, taking the order of appearance into account.
FIG. 6 is a functional block diagram of the component relevance evaluation system 1 of Embodiment 1 of the present invention.
FIG. 7 is a diagram showing the algorithm of the program of Embodiment 1 of the present invention.
FIG. 8 is a diagram showing the algorithm of the program of Embodiment 2 of the present invention.
FIG. 9 is a functional block diagram of the component relevance evaluation system 1 of Embodiment 3 of the present invention.
FIG. 10 is a diagram showing the algorithm of the program of Embodiment 3 of the present invention.
(Embodiment 1)
[Hardware configuration of the component relevance evaluation system]
The component relevance evaluation system of the present invention (hereinafter simply “system”) will be described with reference to FIG. 1. FIG. 1 shows an example of the hardware configuration of the system 1. The system 1 includes a server device 10 and a client terminal 11. The server device 10 includes an arithmetic device 10a that performs calculations and a storage device 10b that stores data.
The server device 10 can execute the main processing of data analysis. The client terminal 11 can execute processing related to the data analysis in the server device 10. The storage device 10b is, for example, any recording medium (for example, a memory or a hard disk) that can store data (including digital data and analog data). The arithmetic device 10a is a controller (for example, a central processing unit (CPU)) that can execute a control program stored in a recording medium. The arithmetic device 10a is a computer or a computer system (a system that realizes data analysis through the integrated operation of a plurality of computers) that analyzes data stored at least temporarily in a recording medium. Note that the arithmetic device 10a may be configured as a management computer (not shown) external to the server device 10, and the storage device 10b may be configured as a data storage server device 13, that is, as an external storage device of the server device 10.
The management computer (not shown) may include, for example, a memory, a controller, a bus, an input/output interface, and a communication interface. The memories of the client terminal 11, the server device 10, and the management computer (not shown) each store an application program that can control the respective device. As each controller executes its application program, the application program (software resource) and the hardware resources cooperate to operate each device.
The storage device 10b is composed of, for example, a disk array system, and can include a database that records data and the results of evaluation and classification of the data. The server device 10 and the storage device 10b are connected by direct-attached storage (DAS) or a storage area network (SAN).
The client terminal 11 presents to the user data from the middle of the processing in the server device 10. This allows the user to provide input, that is, to give classification information, interactively via the client terminal 11. The client terminal 11 may include, for example, a memory, a controller, a bus, an input/output interface (for example, a keyboard and a display), and a communication interface (which connects the client terminal 11 and the server device 10 so that they can communicate, by communication means using a predetermined network). The client terminal 11 may also be configured to include an input device 12 such as a scanner.
Note that the hardware configuration shown in FIG. 1 is merely an example, and the system 1 can be realized by other hardware configurations. For example, part or all of the processing may be executed in the server device 10, or part or all of it may be executed in the client terminal 11. In this embodiment, the input device 12 is connected to the client terminal 11 so that it can transmit to the server device 10; however, the input device 12 may instead be connected directly to the server device 10 and provide input to the server from there. Those skilled in the art will understand that a variety of hardware configurations can realize the system 1, and the configuration is not limited to the one illustrated in FIG. 1.
[Principle of relevance evaluation of the component relevance evaluation system]
Next, the principle of relevance evaluation in the component relevance evaluation system of the present invention will be described with reference to FIG. 2. FIG. 2 is a diagram explaining the reference data R and the test data T whose relevance is compared in the present invention. The component relevance evaluation of the present invention determines whether two or more test data T (in this embodiment, test data T1 and test data T2) are highly relevant to the reference data R. A feature coefficient is calculated as an index for relevance evaluation, and the degree of relevance is evaluated with it. The test data T1 and T2 and the reference data R are all aggregates of data components. That is, the test data T1 is composed of a plurality of data components t1, the test data T2 of a plurality of data components t2, and the reference data R of a plurality of data components r. The type of the test data T and the reference data R is not particularly limited. They may be document data, image data, audio data, or any other aggregate of unit data. Accordingly, the data components may be morphemes, keywords, sentences, paragraphs, and/or metadata (for example, e-mail header information) constituting a document; partial sounds, volume (gain) information, and/or timbre information constituting audio; partial images, partial pixels, and/or luminance information constituting an image; or frame images, motion information, and/or three-dimensional information constituting video.
For example, if the test data T and the reference data R are document data, the data components are text data, typically the words and phrases that make up the document. The test data T and the reference data R are typically of the same data type, but they need not be. For example, when the reference data R is document data whose data components are words and the test data T is audio data, the degree of relevance can be evaluated by comparing the data components as text with the word data as audio.
Next, the principle of relevance evaluation in the component relevance evaluation system of the present invention will be described with reference to FIGS. 3 to 5. FIG. 3 is a diagram showing data components. First, the arrangement direction of the data components constituting the reference data R is defined; that is, the arrangement direction needed to evaluate the content of the reference data R is determined. In the example shown in FIG. 3, the data components are arranged from left to right, and after the rightmost data component comes the leftmost data component of the next row down, from which the arrangement again proceeds to the right. As a simple example, in the case of document data, the order of the character string is the arrangement direction. In the case of image data or the like, however, the arrangement direction most appropriate for the evaluation is determined.
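As a rough illustration of what fixing an arrangement direction means for grid-like data, the following Python sketch flattens a two-dimensional layout either row by row (the direction described for FIG. 3) or column by column. The function name `arrange` and the `direction` parameter are assumptions made for this example only; the specification does not prescribe how the direction is represented.

```python
from typing import List, Sequence, TypeVar

C = TypeVar("C")


def arrange(grid: Sequence[Sequence[C]], direction: str = "row") -> List[C]:
    """Linearize a 2-D layout of data components in a chosen direction."""
    if direction == "row":
        # Left to right, then the leftmost component of the next row down.
        return [c for row in grid for c in row]
    if direction == "column":
        # Top to bottom, then the top component of the next column.
        width = max((len(row) for row in grid), default=0)
        return [row[j] for j in range(width) for row in grid if j < len(row)]
    raise ValueError(f"unknown direction: {direction!r}")


grid = [["a", "b", "c"],
        ["d", "e", "f"]]
print(arrange(grid))            # row-wise order
print(arrange(grid, "column"))  # column-wise order
```

The same components yield different appearance orders under different directions, which is why the direction has to be determined before any order-based comparison.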
Next, from the data components constituting the reference data R, a plurality of data components that best represent the characteristics of the content of the reference data R are extracted as evaluation components, in the order of appearance according to the predefined arrangement direction of the data components. In the example shown in FIG. 4, five evaluation components m1, m2, m3, m4, and m5 are selected. The evaluation components and their order of appearance are selected so as to represent the characteristics of the content of the reference data R as accurately as possible. The evaluation components m1, m2, m3, m4, and m5 of the reference data R and their order of appearance serve as the predetermined criteria for evaluating the relevance of the test data T.
Next, the evaluation of how strongly the test data T relates to the reference data R will be described. FIG. 4 is a diagram showing the test data T. First, as with the reference data R, the arrangement direction of the data components constituting the test data T is determined; that is, the arrangement direction needed to evaluate the content of the test data T is determined. In the example shown in FIG. 4, as in FIG. 3, the data components are arranged from left to right, and after the rightmost data component comes the leftmost data component of the next row down, from which the arrangement again proceeds to the right. Next, the evaluation components m1, m2, m3, m4, and m5 defined in advance for the reference data R are detected in the test data T in their order of appearance. In the example of FIG. 4, the data components of the test data T corresponding to the evaluation components are detected in the order m1, m4, m3, m2. No data component corresponding to the evaluation component m5 is detected. That is, of the five evaluation components m1, m2, m3, m4, and m5 defined in advance for the reference data R, the data components m1, m2, m3, and m4 are extracted from the test data T, and their order of appearance is m1, m4, m3, m2.
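The detection step described above can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the function name `extract_in_order` is hypothetical, the filler tokens are invented, and counting only the first occurrence of a repeated component is an assumption, since the text does not say how repeats are handled.

```python
from typing import Hashable, Iterable, List


def extract_in_order(components: Iterable[Hashable],
                     evaluation_components: Iterable[Hashable]) -> List[Hashable]:
    """Return the evaluation components found, in order of first appearance."""
    wanted = set(evaluation_components)
    seen = set()
    found = []
    for c in components:  # components are already in arrangement order
        if c in wanted and c not in seen:  # assumption: first occurrence only
            seen.add(c)
            found.append(c)
    return found


# Test data laid out so that the detection order matches the FIG. 4 example.
test_data = ["x", "m1", "y", "m4", "m3", "z", "m2"]
print(extract_in_order(test_data, ["m1", "m2", "m3", "m4", "m5"]))
# → ['m1', 'm4', 'm3', 'm2']  (m5 is not detected)
```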
Next, the evaluation components m1, m4, m3, m2 detected in the test data T in order of appearance are compared with their order of appearance in the reference data R to examine the relevance of the test data T to the reference data R. FIG. 5 contrasts the evaluation components m1, m2, m3, m4, and m5 of the reference data R, taking the order of appearance into account (upper side of FIG. 5), with the evaluation components m1, m4, m3, m2 of the test data T, taking the order of appearance into account (lower side of FIG. 5). Here, the feature coefficient (Order), an index of the degree of relevance, is defined as follows.

$$\mathrm{Order} = \frac{T(N)}{F(N)}$$

where N is the number of evaluation components detected in the test data T, F(N) = N(N-1)/2 is the number of combinations of two of those components, and T(N) is the number of those combinations whose order of appearance is the same as in the reference data R.
The feature coefficient (Order) is the ratio of “the number of two-element combinations, selected from the evaluation components detected in the test data T, whose order of appearance is the same as that of the evaluation components of the reference data R” to “the number of combinations of two evaluation components selected from those detected in the test data T”. For the denominator, if N is the number of evaluation components detected in the test data T, the number of combinations of two of them is N(N-1)/2. In the example of FIGS. 4 and 5, four evaluation components m1, m2, m3, and m4 are detected in the test data T, so there are six combinations: (m1, m2), (m1, m3), (m1, m4), (m2, m3), (m2, m4), and (m3, m4).
The numerator is the number of those two-element combinations, selected from the evaluation components detected in the test data T, whose order of appearance is the same as that of the evaluation components of the reference data R. Only the order of appearance is considered; whether another component appears between the two components is not evaluated. In the example of FIGS. 4 and 5, three of the above combinations have the same order of appearance as in the reference data R: (m1, m2), (m1, m3), and (m1, m4). The fact that m4 lies between m1 and m3 is not evaluated. Therefore, in this case, the feature coefficient (Order) is 0.5.

$$\mathrm{Order} = \frac{3}{6} = 0.5$$
If the test data is completely identical to the reference data R, the order of appearance is the same for every pair of two evaluation components, so the ratio of the numerator to the denominator, T(N)/F(N), equals 1.0. In other words, the more pairs of evaluation components appear in the test data T in the same order as in the reference data R, the higher the relevance between the test data T and the reference data R, and the closer the feature coefficient (Order) is to 1. Conversely, when the relevance between the test data T and the reference data R is low, the feature coefficient (Order) is close to 0. Therefore, a larger feature coefficient (Order) indicates higher relevance between the test data T and the reference data R, and a smaller feature coefficient (Order) indicates lower relevance. The feature coefficient satisfies 0 ≦ feature coefficient (Order) ≦ 1.
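The pair-counting definition above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `order_coefficient` and the representation of each data set as a list of component labels are assumptions, and each detected component is assumed to appear once in the list.

```python
from itertools import combinations


def order_coefficient(reference_order, test_order):
    """Share of component pairs in test_order whose relative order
    matches reference_order (hypothetical helper, not from the source)."""
    # Position of every evaluation component in the reference data R.
    rank = {component: i for i, component in enumerate(reference_order)}
    # All N(N-1)/2 pairs of components detected in the test data T,
    # each pair listed in the order the two components appear in T.
    pairs = list(combinations(test_order, 2))
    if not pairs:
        raise ValueError("at least two detected evaluation components are required")
    # A pair matches when the component that comes first in T
    # also comes first in R; components in between are ignored.
    matches = sum(1 for first, second in pairs if rank[first] < rank[second])
    return matches / len(pairs)


# Example of FIGS. 4 and 5: R holds m1..m5, T yields m1, m4, m3, m2.
example = order_coefficient(["m1", "m2", "m3", "m4", "m5"],
                            ["m1", "m4", "m3", "m2"])
```

With the inputs of FIGS. 4 and 5, three of the six pairs match, so `example` evaluates to 0.5, in agreement with the value stated in the text.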
[Functional block configuration of component relevance evaluation system]
FIG. 6 is a diagram illustrating an example of the functional block configuration of the system 1. The system 1 includes, for example, a reference data acquisition unit 21, a test data acquisition unit 22, an arrangement direction determination unit 23, an evaluation component extraction unit 24, a component storage unit 25, and a component relevance evaluation unit 26. The path from the reference data acquisition unit 21 through the arrangement direction determination unit 23 and the evaluation component extraction unit 24 to the component storage unit 25 is the learning process for the reference data R. The path from the test data acquisition unit 22 through the arrangement direction determination unit 23 and the evaluation component extraction unit 24 to the component relevance evaluation unit 26 is the process of evaluating the relevance of the test data T to the reference data R.
The reference data acquisition unit 21 acquires all the data components constituting the reference data R, whether that data is input from the input device 12 or the client terminal 11 or is already stored in the storage device 10b.
When the reference data acquisition unit 21 and the test data acquisition unit 22 have acquired all the data components, they output them to the arrangement direction determination unit 23, which determines the arrangement direction of the data components and associates it with them. All the data components with which the arrangement direction has been associated are output to the evaluation component extraction unit 24. Depending on the data, the determination of the arrangement direction can be omitted by using, as is, the order in which the data was acquired by the reference data acquisition unit 21 and the test data acquisition unit 22. In that case, the arrangement direction determination unit 23 is unnecessary. The arrangement direction may also be determined by the reference data acquisition unit 21 and the test data acquisition unit 22, or by the evaluation component extraction unit 24. The evaluation component extraction unit 24 extracts the component group that most representatively expresses the content features of the reference data R. In this process, the user can also select the component group using the client terminal 11. Here, a "component group" is a group of data components. The component group selected by the evaluation component extraction unit 24 is output to the component storage unit 25, which stores it in the storage device 10b or the data storage server device 13.
The evaluation component extraction unit 24 extracts the evaluation components m1, m2, m3, m4, and m5 from the data components constituting the reference data R whose arrangement direction has been determined. The number of evaluation components extracted by the evaluation component extraction unit 24 is set freely according to the characteristics of the reference data R. The evaluation component extraction unit 24 outputs the extracted evaluation components m1, m2, m3, m4, and m5 to the component storage unit 25, which stores them in the storage device 10b or the data storage server device 13. This completes the learning process for relevance evaluation.
Next, the process of evaluating the relevance of the test data T to the reference data R is described. The above description of the arrangement direction determination unit 23 and the evaluation component extraction unit 24 applies equally to this evaluation process. That is, as shown in FIG. 6, the test data acquisition unit 22, like the reference data acquisition unit 21, acquires all the data components constituting the test data T, whether that data is input from the input device 12 or the client terminal 11 or is already stored in the storage device 10b.
When the test data acquisition unit 22 has acquired all the data components, it outputs them to the arrangement direction determination unit 23. The reference data acquisition unit 21 and the test data acquisition unit 22 need not be separate; a single data acquisition unit can serve as both. The arrangement direction determination unit 23 determines the arrangement direction of the data components and associates it with them. All the data components with which the arrangement direction has been associated are output to the evaluation component extraction unit 24. The evaluation component extraction unit 24 extracts, from all the data components of the test data T, the evaluation components that the component storage unit 25 stored in the storage device 10b or the data storage server device 13. Not every evaluation component is necessarily extracted: among the data components of the test data T, those corresponding to the evaluation components selected in the learning process for the reference data R are extracted in their order of appearance. In the example of FIG. 4, the evaluation component extraction unit 24 extracts the evaluation components in the order of appearance m1, m4, m3, m2 along the arrangement direction. The extracted evaluation components m1, m4, m3, and m2 are output to the component relevance evaluation unit 26, which calculates the feature coefficient (Order) described above.
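The extraction step for the test data T can be sketched as a single pass over its data components. The sketch below makes two assumptions that the text does not settle: the test data has already been turned into a list of components in its arrangement direction, and when an evaluation component occurs more than once, its first occurrence defines its position. The function name `extract_in_order` is hypothetical.

```python
def extract_in_order(data_components, evaluation_components):
    """Return the evaluation components found in data_components,
    in order of first appearance (assumed handling of repeats)."""
    wanted = set(evaluation_components)
    seen = set()
    detected = []
    for component in data_components:
        # Keep only components chosen in the learning process,
        # and record each of them once.
        if component in wanted and component not in seen:
            seen.add(component)
            detected.append(component)
    return detected


# Hypothetical test data: m5 is absent, m1 occurs twice.
detected = extract_in_order(
    ["a", "m1", "b", "m4", "m3", "m1", "c", "m2"],
    ["m1", "m2", "m3", "m4", "m5"],
)
```

Here `detected` is `["m1", "m4", "m3", "m2"]`, the same sequence as in the example of FIG. 4; m5 is not detected and therefore does not take part in the pair count.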
The component relevance evaluation unit 26 also reads, from any memory (for example, the storage device 10b), the evaluation values associated with the components input from the evaluation component extraction unit 24, and evaluates the target data based on those evaluation values. An evaluation value is a weight set in advance for each evaluation component selected in the reference data R, according to its characteristics. More specifically, the component relevance evaluation unit 26 can derive an index of the target data (for example, a number, character, and/or symbol that allows the target data to be ranked) by summing the evaluation values associated with the components that constitute at least part of the target data. A score value, for example, can be used as this index. Here, the score value (Score) is an index that quantitatively evaluates the strength of the relevance of the test data T to the data components of the reference data R. Any method of calculating the score value (Score) may be used as long as it quantitatively expresses that strength, and a general technique suffices as long as it properly evaluates the content of the reference data R. As one example, the score value can be expressed by the following expression, in terms of the evaluation value defined for each evaluation component extracted from the reference data R and the frequency with which that evaluation component appears in the test data T. The component relevance evaluation unit 26 can associate the test data T with its score value and store both in the storage device 10b.

Figure JPOXMLDOC01-appb-I000003
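Because the score expression itself survives only as an image reference, the sketch below should be read as one plausible instance rather than the expression of the application: it assumes the score is the sum, over evaluation components, of the evaluation value multiplied by the component's frequency in the test data T. The function name `score_value` and the sample weights are hypothetical.

```python
from collections import Counter


def score_value(data_components, evaluation_values):
    """Assumed score: sum of (evaluation value x frequency in the test data).
    The exact expression in the source is not recoverable from the text."""
    frequency = Counter(data_components)
    # Counter returns 0 for components that never appear in the test data.
    return sum(weight * frequency[component]
               for component, weight in evaluation_values.items())


# Hypothetical weights for three evaluation components.
weights = {"m1": 2.0, "m2": 5.0, "m3": 1.5}
score = score_value(["m1", "x", "m1", "m3"], weights)
```

In this example m1 appears twice, m3 once, and m2 not at all, so `score` is 2.0 × 2 + 1.5 × 1 = 5.5.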
In the above, each element referred to as a "unit" is a functional element realized when the controller of the system 1 executes a program, so "unit" may be read as "process" or "function". A "unit" can also be replaced by hardware resources. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware alone, software alone, or a combination of the two, and they are not limited to any one of these.
[Algorithm configuration of the program executed in the component relevance evaluation system]
Next, the algorithm of the program that the system 1 executes for the above functions is described. First, the reference data R is read in (S101). The arrangement direction of the data components of the reference data R is then determined (S102). From the data components of the reference data R whose arrangement direction has been determined, a plurality of data components that best represent the characteristics of its content are extracted together with their order of appearance along the predefined arrangement direction, and are defined as the evaluation component group for relevance evaluation (S103). The extracted evaluation component group and its order-of-appearance data are stored in the storage device 10b (S104). This completes the learning process based on the reference data R.

The process of evaluating the relevance of the test data T to the reference data R follows. First, the test data T is read in (S105). The arrangement direction of the data components constituting the test data T is then determined (S106). The evaluation components for relevance evaluation that were determined in the learning process are extracted from the test data T whose arrangement direction has been determined (S107). Among the evaluation components extracted from the test data T, those whose order of appearance is the same as in the reference data R are extracted (S108). A feature coefficient based on the order of appearance of the evaluation components in the arrangement direction of the test data is then calculated. The feature coefficient can express the degree to which the order of appearance of each pair of two components selected from the extracted evaluation components of the test data T matches the predefined order of appearance of the evaluation components in the reference data R. The degree of matching can be expressed, for example, as a match rate, which corresponds to the feature coefficient (Order) described above. For example, over all pairs of two components selected from the extracted evaluation components of the test data T, "1" is assigned to each pair whose order of appearance matches and "0" to each pair whose order does not match. As described above, only the order of appearance is considered; whether another component appears between the two is not evaluated. Finally, feature coefficient (Order) = (number of pairs assigned 1) / (total number of pairs) is calculated (S109).
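Steps S101 to S109 can be sketched end to end as a small two-phase class. This is an illustrative sketch under stated assumptions, not the program of the application: the class name `OrderEvaluator` and its methods are hypothetical, both data sets are assumed to arrive as lists already ordered in their arrangement direction (so S102 and S106 reduce to using the list order), and the choice of evaluation components in S103 is passed in rather than computed.

```python
from itertools import combinations


class OrderEvaluator:
    """Hypothetical two-phase sketch of S101-S109."""

    def __init__(self):
        self.rank = {}  # evaluation component -> position in reference data R

    def learn(self, reference_components, evaluation_components):
        # S101-S104: read R, keep the chosen evaluation components
        # together with their order of appearance.
        wanted = set(evaluation_components)
        self.rank = {}
        for component in reference_components:
            if component in wanted and component not in self.rank:
                self.rank[component] = len(self.rank)

    def evaluate(self, test_components):
        # S105-S107: read T and extract the learned evaluation components
        # in order of first appearance.
        detected = []
        for component in test_components:
            if component in self.rank and component not in detected:
                detected.append(component)
        # S108: assign 1 to each pair whose order matches R, else 0.
        flags = {(a, b): 1 if self.rank[a] < self.rank[b] else 0
                 for a, b in combinations(detected, 2)}
        if not flags:
            return None, flags  # fewer than two components detected
        # S109: Order = (pairs assigned 1) / (total number of pairs).
        return sum(flags.values()) / len(flags), flags


evaluator = OrderEvaluator()
evaluator.learn(["m1", "x", "m2", "m3", "y", "m4", "m5"],
                ["m1", "m2", "m3", "m4", "m5"])
order, flags = evaluator.evaluate(["m1", "z", "m4", "m3", "m2"])
```

For this input, three of the six pairs receive the flag 1, so `order` is 0.5. Returning `None` when fewer than two evaluation components are detected is a design choice of the sketch; the text does not say how that case is handled.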
As a result, when there are, for example, two or more sets of test data T whose score values (Score) are identical or very close, the degree of relevance to the reference data R could previously not be determined from the score value (Score) alone, whereas in the present invention, calculating the feature coefficient (Order) makes it possible to determine that the test data with the larger feature coefficient is more relevant to the reference data R. For example, in the case of FIG. 2, if the score values of the test data T1 and the test data T2 with respect to the reference data R are both 70 and their feature coefficients (Order) are 0.6 and 0.8, respectively, the test data T2 can be judged to be more relevant to the reference data R.
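The tie-breaking use of the feature coefficient can be sketched as a two-key sort. The tuple layout `(name, score, order)` and the function name `rank_by_relevance` are assumptions made for illustration; the values for T1 and T2 are the ones given in the example above.

```python
def rank_by_relevance(results):
    """Sort test data by score, then by feature coefficient (Order),
    both descending. Each result is a (name, score, order) tuple."""
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


# T1 and T2 share the score 70; Order decides between them.
ranking = rank_by_relevance([("T1", 70, 0.6), ("T2", 70, 0.8)])
```

Because the scores tie, the second sort key places T2 ahead of T1, matching the judgment described in the text.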
Furthermore, when there are two or more sets of test data T whose score values are not identical but very close, a distribution chart with the score value on one axis and the feature coefficient (Order) on the other can be shown on a display means such as a display or a printer. This provides the user with information from which the relevance of the test data T to the reference data R can easily be judged using two factors, the score value and the feature coefficient.
(Embodiment 2)
Embodiment 1 above described a form in which the system 1 calculates the feature coefficient (Order) and the judgment is made using it. However, by using the feature coefficient (Order) to correct the score value, the degree of relevance of the test data T can be evaluated with a corrected score value. This is described below as Embodiment 2.
In Embodiment 2, the system 1 as a hardware configuration and the functional block diagram are the same as in Embodiment 1, so only the differences are described here, with reference to FIG. 6 and FIG. 8. FIG. 8 shows the algorithm of the program of Embodiment 2. In Embodiment 1, the component relevance evaluation unit 26 among the functional blocks of FIG. 6 only calculated the feature coefficient (Order). In Embodiment 2, the component relevance evaluation unit 26 calculates the feature coefficient (Order) as a correction value for the score value that has been calculated in advance for the test data T. In FIG. 8, the steps from reading in the reference data R (S201) to calculating the feature coefficient (Order) (S209) are the same as steps S101 to S109 of Embodiment 1.
In Embodiment 2, after calculating the feature coefficient (Order), the component relevance evaluation unit 26 uses it to correct the score value (Score RAW) that was calculated in advance for the test data T, as shown below (S210). As described in Embodiment 1, a general technique suffices for calculating the score value as long as it properly evaluates the content of the reference data R.

Figure JPOXMLDOC01-appb-I000004
Embodiment 2 is particularly useful when there are two or more sets of test data T whose score values are very close but not identical. In that situation, comparison can be difficult because the score values differ even when the feature coefficient (Order) is large. Using the score value corrected by the feature coefficient makes it possible to determine that the test data with the larger corrected score value is more relevant to the reference data R.
For example, in the case of FIG. 2, if the score values (Score RAW) of the test data T1 and the test data T2 with respect to the reference data R are 72 and 71, respectively, and their feature coefficients (Order) are 0.65 and 0.67, respectively, the score values corrected by the feature coefficient are 72 × 0.65 = 46.8 and 71 × 0.67 ≈ 47.6, respectively. Thus, although the uncorrected score value of the test data T1 is higher, the test data T2 can be judged to be more relevant to the reference data R.
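The correction step (S210) can be sketched as follows, assuming that the corrected score is simply the raw score multiplied by the feature coefficient. The correction expression is present in the source only as an image reference, so this product form is an assumption, and the function name `corrected_score` and the sample values are hypothetical.

```python
def corrected_score(raw_score, order):
    """Assumed correction: raw score scaled by the feature coefficient."""
    if not 0.0 <= order <= 1.0:
        raise ValueError("the feature coefficient must lie between 0 and 1")
    return raw_score * order


# Hypothetical values: A has the higher raw score, B the better ordering.
corrected_a = corrected_score(80, 0.5)
corrected_b = corrected_score(78, 0.75)
```

With these hypothetical inputs the corrected scores are 40.0 and 58.5, so B ranks above A even though its raw score is lower, which is the kind of reversal this embodiment is meant to expose.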
(Embodiment 3)
In the system 1 of Embodiment 2 above, the score values (Score RAW) of the test data T1 and the test data T2 with respect to the reference data R are calculated separately from the feature coefficient (Order). That form can therefore be used when the evaluation component group used to calculate the score value differs from the one used to calculate the feature coefficient (Order). In Embodiment 3, the score value and the feature coefficient are calculated in a single series of processes, using common evaluation components determined in advance from the reference data R. This is described below as Embodiment 3.
FIG. 9 is a diagram illustrating an example of the functional block configuration of the system 1 according to Embodiment 3. As in Embodiment 1, the system 1 includes a reference data acquisition unit 21, a test data acquisition unit 22, an arrangement direction determination unit 23, an evaluation component extraction unit 24, and a component storage unit 25. These are the same as in Embodiment 1, so only the differences are described. In addition to these, Embodiment 3 includes a component relevance evaluation unit 26, a score value calculation unit 27, and a score value correction unit 28. The evaluation component extraction unit 24 extracts the evaluation component group that most aptly represents the content of the reference data R and classifies it into N groups. The score value calculation unit 27 calculates a score value (Score(i)RAW) for each of the N groups. A general technique suffices for calculating the score value as long as it properly evaluates the content of the reference data R. For each group of evaluation components, the component relevance evaluation unit 26 calculates, by the method of Embodiment 1, the feature coefficient (Order), that is, the proportion of pairs of two selected components whose order of appearance is the same as in the reference data R. The method of calculating the feature coefficient (Order) is as described in Embodiment 1. The score value correction unit 28 then multiplies the score value (Score(i)RAW) by the feature coefficient (Order) for each group and calculates the sum of the products, as follows.

Figure JPOXMLDOC01-appb-I000005
Next, the algorithm of Embodiment 3 is described with reference to FIG. 10, which shows that algorithm. First, the reference data R is read in (S301). The arrangement direction of the data components of the reference data R is then determined (S302). From the data components of the reference data R whose arrangement direction has been determined, a plurality of data components that best represent the characteristics of its content are extracted together with their order of appearance along the predefined arrangement direction, and are defined as the evaluation components for relevance evaluation. At this point, the evaluation component group is classified into N groups (S303). The extracted evaluation components and their order-of-appearance data are stored in the storage device 10b (S304). This completes the learning process based on the reference data R.

The process of evaluating the relevance of the test data T to the reference data R follows. First, the test data T is read in (S305). The arrangement direction of the data components constituting the test data T is then determined (S306). The evaluation components for relevance evaluation that were determined in the learning process are extracted from the test data T whose arrangement direction has been determined (S307). A score value (Score(i)RAW) is calculated for each of the N groups of evaluation components (S308). In parallel, within each of the N groups, the evaluation components whose order of appearance is the same as in the reference data R are extracted, and the degree to which the order of appearance of each pair of two components selected from the extracted evaluation components of the test data T matches the predefined order of appearance in the reference data R is obtained. The degree of matching can be expressed, for example, as a match rate, which serves as the feature coefficient (Order) described above. For example, over all pairs of two components selected from the extracted evaluation components of the test data T, "1" is assigned to each pair whose order of appearance matches and "0" to each pair whose order does not match (S309). As described above, only the order of appearance is considered; whether another component appears between the two is not evaluated. Then feature coefficient (Order) = (number of pairs assigned 1) / (total number of pairs) is calculated (S310). For each group, the score value (Score(i)RAW) is multiplied by the feature coefficient (Order), and the sum of the products is calculated (S311).
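Steps S308 to S311 can be sketched as a sum of per-group products. The sketch assumes that the per-group raw scores have already been calculated by whatever method is in use, that the detected evaluation components are supplied as one list in order of appearance with no repeats, and that a group with fewer than two detected components contributes a feature coefficient of 0. The last point is an assumption of the sketch, not something the text specifies. The names `group_order` and `corrected_total` are hypothetical.

```python
from itertools import combinations


def group_order(group_reference_order, detected):
    """Feature coefficient (Order) restricted to one group (S309-S310)."""
    rank = {c: i for i, c in enumerate(group_reference_order)}
    in_group = [c for c in detected if c in rank]
    pairs = list(combinations(in_group, 2))
    if not pairs:
        return 0.0  # assumption: no pair means no order evidence
    return sum(1 for a, b in pairs if rank[a] < rank[b]) / len(pairs)


def corrected_total(groups, raw_scores, detected):
    """Sum over groups of Score(i)RAW x Order for that group (S311)."""
    return sum(raw_scores[name] * group_order(reference_order, detected)
               for name, reference_order in groups.items())


# Hypothetical grouping (N = 2) with precomputed raw scores per group.
groups = {"g1": ["m1", "m2", "m3", "m4"], "g2": ["m5", "m6"]}
raw_scores = {"g1": 40.0, "g2": 10.0}
detected = ["m1", "m4", "m3", "m2", "m5", "m6"]
total = corrected_total(groups, raw_scores, detected)
```

In this example group g1 has a feature coefficient of 0.5 and group g2 of 1.0, so `total` is 40.0 × 0.5 + 10.0 × 1.0 = 30.0. With a single group, the same function reduces to the correction of Embodiment 2 under the product assumption.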
As a result, because the score value (Score(i)RAW) and the feature coefficient (Order) are calculated from the same evaluation component group, the calculation process is simplified and the score value is easier to calculate. Judgment based on the corrected score value is the same as in Embodiments 1 and 2.
[Example of implementation using software and hardware]
The control blocks of the data analysis system may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or by software using a CPU. In the latter case, the system includes a CPU that executes the instructions of a program (the control program of the data analysis system), which is software that implements each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and so on. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. A "non-transitory tangible medium", such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit, can be used as the recording medium. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, a broadcast wave, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission. The program can be implemented in any programming language. Any recording medium on which the program is recorded also falls within the scope of the present invention.
[Other application examples]
The system can be realized as an artificial intelligence system that analyzes big data (any system capable of evaluating the relevance between data and a predetermined case), for example, a discovery support system, a forensic system, an e-mail monitoring system, a medical application system (for example, a pharmacovigilance support system, a clinical trial efficiency improvement system, a medical risk hedging system, a fall prediction (fall prevention) system, a prognosis prediction system, or a diagnosis support system), an Internet application system (for example, a smart mail system, an information aggregation (curation) system, a user monitoring system, or a social media management system), an information leakage detection system, a project evaluation system, a marketing support system, an intellectual property evaluation system, a fraudulent transaction monitoring system, a call center escalation system, or a credit investigation system. Depending on the field to which the data analysis system of the present invention is applied, and in consideration of circumstances peculiar to that field, the data may be preprocessed (for example, important portions may be extracted from the data and only those portions subjected to data analysis), or the manner in which the data analysis results are displayed may be changed. Those skilled in the art will understand that a wide variety of such modifications can exist, and all of them fall within the scope of the present invention.
The present invention is not limited to the embodiments described above, and various modifications can be made within the scope of the claims. Embodiments obtained by appropriately combining the technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.
[Description of symbols]
 1  System
 10 Server apparatus
 11 Client terminal
 12 Input device
 13 Data storage server apparatus
 21 Reference data acquisition unit
 22 Test data acquisition unit
 23 Arrangement direction determination unit
 24 Evaluation component extraction unit
 25 Component storage unit
 26 Component relevance evaluation unit
 27 Score value calculation unit
 28 Score value correction unit

Claims (7)

  1.  A relevance evaluation system that evaluates the relevance of test data to reference data, the relevance evaluation system comprising:
     a data acquisition unit that acquires the reference data and the test data;
     an evaluation component extraction unit that extracts, from the test data, evaluation components that represent characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and
     a relevance evaluation unit that calculates a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
  2.  The relevance evaluation system according to claim 1,
     wherein the feature coefficient is the ratio of the number of two-component combinations, among the total number, whose evaluation components appear in the same order as in the reference data, to the total number of combinations of two components selected from the evaluation components of the test data.
  3.  The relevance evaluation system according to claim 1,
     wherein the relevance evaluation unit performs an operation of multiplying a score value of the test data by the feature coefficient.
  4.  The relevance evaluation system according to claim 1,
     wherein the evaluation component extraction unit classifies the extracted evaluation components of the test data into a plurality of groups,
     the relevance evaluation system comprises a score value calculation unit that calculates, for each of the plurality of groups, a score value based on the extracted evaluation components,
     the relevance evaluation unit calculates the feature coefficient for each of the plurality of groups, and
     the relevance evaluation system comprises a score value correction unit that multiplies, for each of the plurality of groups, the score value by the feature coefficient and calculates the sum of the resulting products over all of the plurality of groups.
  5.  A method for evaluating the relevance between reference data and test data by a relevance evaluation system comprising a computer, the method comprising:
     acquiring the reference data and the test data;
     extracting, from the test data, evaluation components that represent characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and
     calculating a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
  6.  A relevance evaluation program executable in a relevance evaluation system comprising a computer, the program evaluating the relevance between reference data and test data, the program executing:
     a step of acquiring the reference data and the test data;
     a step of extracting, from the test data, evaluation components that represent characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and
     a step of calculating a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
  7.  A storage medium storing a relevance evaluation program that is executable in a relevance evaluation system comprising a computer and that evaluates the relevance between reference data and test data, the program executing:
     a step of acquiring the reference data and the test data;
     a step of extracting, from the test data, evaluation components that represent characteristics of the reference data among the data components of the reference data, in order of appearance along the arrangement direction of the data components of the test data; and
     a step of calculating a feature coefficient based on the order of appearance of the evaluation components of the test data in the arrangement direction of the test data.
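
The feature coefficient of claim 2 and the group-wise score correction of claim 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names `feature_coefficient` and `corrected_score`, the use of word lists as data, the rule of keeping only the first occurrence of a repeated component, and the return value of 0.0 when fewer than two evaluation components are found are all assumptions made for illustration.

```python
from itertools import combinations


def feature_coefficient(reference, test):
    """Ratio of component pairs that keep the reference order (claim 2 sketch).

    `reference` lists the evaluation components in their order of appearance
    in the reference data; `test` lists the data components of the test data
    in its arrangement direction. Both are hypothetical inputs.
    """
    # Rank of each evaluation component in the reference data
    # (assumption: only the first occurrence counts).
    ref_rank = {}
    for index, component in enumerate(reference):
        ref_rank.setdefault(component, index)

    # Extract evaluation components from the test data in order of appearance.
    seen = set()
    extracted = []
    for component in test:
        if component in ref_rank and component not in seen:
            seen.add(component)
            extracted.append(component)

    # Every pair (a, b) produced here has a before b in the test data.
    pairs = list(combinations(extracted, 2))
    if not pairs:
        return 0.0  # assumption: no pair means no order information

    same_order = sum(1 for a, b in pairs if ref_rank[a] < ref_rank[b])
    return same_order / len(pairs)


def corrected_score(group_scores, group_coefficients):
    """Sum over groups of score value times feature coefficient (claim 4 sketch)."""
    return sum(s * k for s, k in zip(group_scores, group_coefficients))


if __name__ == "__main__":
    reference = ["a", "b", "c", "d"]
    test = ["a", "x", "c", "b", "d"]  # "x" is not an evaluation component
    print(feature_coefficient(reference, test))
    print(corrected_score([2.0, 4.0], [0.5, 1.0]))
```

In this sketch, a test document that preserves the reference order yields a coefficient of 1, and one that fully reverses it yields 0, so multiplying a score value by the coefficient (as in claim 3) lowers the score of test data whose evaluation components appear in a different order.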


PCT/JP2015/005479 2015-10-30 2015-10-30 Relevance evaluation system and method, program, and recording medium WO2017072822A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2015/005479 WO2017072822A1 (en) 2015-10-30 2015-10-30 Relevance evaluation system and method, program, and recording medium
JP2017547201A JPWO2017072822A1 (en) 2015-10-30 2015-10-30 Relevance evaluation system, method, program, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/005479 WO2017072822A1 (en) 2015-10-30 2015-10-30 Relevance evaluation system and method, program, and recording medium

Publications (1)

Publication Number Publication Date
WO2017072822A1 (en) 2017-05-04

Family

ID=58629917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/005479 WO2017072822A1 (en) 2015-10-30 2015-10-30 Relevance evaluation system and method, program, and recording medium

Country Status (2)

Country Link
JP (1) JPWO2017072822A1 (en)
WO (1) WO2017072822A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010055373A (en) * 2008-08-28 2010-03-11 Sky Co Ltd Note evaluation device or note evaluation program
JP2011113426A (en) * 2009-11-30 2011-06-09 Fujitsu Ltd Dictionary generation device, dictionary generating program, and dictionary generation method
JP2012252484A (en) * 2011-06-02 2012-12-20 Hitachi Systems Ltd Reply automatic creation system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006277413A (en) * 2005-03-29 2006-10-12 Toshiba Corp Document classification device and document classification method

Also Published As

Publication number Publication date
JPWO2017072822A1 (en) 2018-07-26

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15907186; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2017547201; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 15907186; Country of ref document: EP; Kind code of ref document: A1