WO2023074457A1 - Matching system, matching method, program, and trained model - Google Patents

Matching system, matching method, program, and trained model

Info

Publication number
WO2023074457A1
WO2023074457A1 PCT/JP2022/038679 JP2022038679W WO2023074457A1 WO 2023074457 A1 WO2023074457 A1 WO 2023074457A1 JP 2022038679 W JP2022038679 W JP 2022038679W WO 2023074457 A1 WO2023074457 A1 WO 2023074457A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
matching
seeds
needs
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/038679
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
拓己 石渡
恵美子 寄▲崎▼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Konica Minolta Inc
Original Assignee
Konica Minolta Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Konica Minolta Inc
Priority to JP2023556340A
Publication of WO2023074457A1
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/908 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals

Definitions

  • This disclosure relates to matching systems, matching methods, programs, and trained models.
  • Patent Document 1 discloses a technique that uses seeds data, such as in-house technology stored in memory, together with needs data from the public and customers, and applies a SWOT analysis to identify business fields in which the retained seeds meet public needs.
  • The purpose of this disclosure is to provide a matching system, matching method, program, and trained model that can evaluate the match between seeds and needs quantitatively, more easily and more objectively.
  • The disclosed matching system comprises: an acquisition unit that acquires first data related to needs and second data related to seeds; an analysis unit that quantifies and analyzes the acquired first data and second data; and an output unit that outputs matching information indicating the degree of matching between the first data and the second data using the analysis result from the analysis unit.
  • The invention according to claim 2 is the matching system according to claim 1, wherein, in the quantification, the analysis unit converts the first data and the second data into vectors representing a multidimensional space.
  • The analysis unit calculates the degree of matching based on a degree of similarity between a first vector obtained from the first data and a second vector obtained from the second data.
  • The analysis unit has a trained model that takes as inputs a first vector obtained from the first data and a second vector obtained from the second data and outputs the degree of matching.
  • The invention according to claim 5 is the matching system according to claim 4, wherein the trained model is based on a pattern recognition algorithm.
  • The invention according to claim 6 is the matching system according to any one of claims 1 to 5, further comprising an input unit for acquiring input data, wherein the acquisition unit acquires the first data or the second data from the input data.
  • The invention according to claim 7 is the matching system according to any one of claims 1 to 6, wherein the first data includes at least one predetermined element related to needs, and the second data includes at least one predetermined element related to seeds.
  • The elements of the first data include at least one of the purpose, content, field, and name of the needs, and the elements of the second data include at least one of the features, functions, competing technologies, and name of the seeds.
  • The invention according to claim 9 is the matching system according to claim 7 or 8, further comprising an input unit for acquiring input data, wherein the acquisition unit extracts the elements when acquiring the first data or the second data from the input data.
  • Each of the first data and the second data includes at least a noun or a verb indicating a function.
  • The invention according to claim 11 is the matching system according to claim 10, wherein the first data and the second data each include an object of the noun or verb indicating the function.
  • A matching method performed by a control unit of a computer includes: an acquisition step of acquiring first data related to needs and second data related to seeds; an analysis step of quantifying and analyzing the acquired first data and second data, respectively; and an output step of outputting matching information indicating the degree of matching between the first data and the second data using the analysis result of the analysis step.
  • The invention according to claim 13 is a program that causes a computer to function as: acquisition means for acquiring first data related to needs and second data related to seeds; analysis means for quantifying and analyzing the acquired first data and second data, respectively; and output means for outputting matching information indicating the degree of matching between the first data and the second data using the analysis result from the analysis means.
  • A trained model takes as inputs a first vector obtained from first data related to needs and a second vector obtained from second data related to seeds, and outputs the degree of matching between the first data and the second data.
  • FIG. 1 is an overall configuration diagram of a matching system according to an embodiment.
  • FIG. 2A is a diagram explaining the content of seeds/needs data.
  • FIG. 2B is a flowchart showing the control procedure of the database generation process.
  • FIG. 3 is a flowchart showing the control procedure of the correspondence search control process.
  • FIG. 4 is a flowchart showing the processing procedure of the matching degree calculation process called in the correspondence search control process.
  • FIG. 5A is a diagram explaining the setting of the degree of matching in a second example.
  • FIG. 5B is a flowchart showing the control procedure of the trained model generation process.
  • FIG. 6 is a flowchart showing the control procedure of the second example of the matching degree calculation process, which uses a trained model.
  • A further flowchart shows the control procedure of the matrix generation process related to collaborative filtering.
  • A further flowchart shows the control procedure of the third example of the matching degree calculation process, which uses collaborative filtering.
  • A further diagram shows an example of output results.
  • FIG. 1 is an overall configuration diagram of a matching system 100 of this embodiment.
  • The matching system 100 of this embodiment includes an information processing device 1, a database device 2, and a terminal device 3.
  • The information processing device 1 is a computer that performs the matching-related processing of this embodiment and may be, for example, an ordinary PC (Personal Computer).
  • The information processing device 1 is connected to the database device 2, refers to the data held in the database device 2, and writes and adds new data.
  • The database device 2 need not be directly connected to the information processing device 1 and may instead be accessible through the network N.
  • The information processing device 1 includes a control unit 11 (acquisition unit, analysis unit, output unit), a storage unit 12, a communication unit 13, and the like.
  • The control unit 11 has a hardware processor, such as a CPU, that performs arithmetic processing and controls the operation of the information processing device 1, and a memory such as a RAM.
  • The storage unit 12 has a nonvolatile memory and stores a program 121, setting data, and the like.
  • The program 121 includes a control program for the correspondence search control process of this embodiment.
  • The program 121 may include part or all of the database generation process, trained model generation process, and matrix generation process described later.
  • The program 121 also includes the generated trained model 1210.
  • The nonvolatile memory may be, for example, flash memory or an HDD (Hard Disk Drive).
  • The communication unit 13 controls transmission and reception of data with external devices via the network N.
  • The external devices include the terminal device 3.
  • The communication standard for data transmission and reception may be, for example, a LAN (Local Area Network) standard (TCP/IP, etc.).
  • The information processing device 1 may also include a display unit, an operation reception unit, and the like.
  • The database device 2 has a storage unit 21.
  • The storage unit 21 stores and holds the needs/seeds data 211 acquired in advance, as well as learning data and trained models for converting each term, phrase, sentence, and the like into a multidimensional vector (semantic vector).
  • The terminal device 3 may be an ordinary PC or a mobile terminal (smartphone, etc.) and is used for operations such as inputting data to be matched and displaying matching results.
  • The terminal device 3 includes a control unit 31, a communication unit 32, a display unit 33, an operation reception unit 34 (input unit), and the like.
  • The control unit 31 includes a hardware processor that performs arithmetic processing and centrally controls the operation of the terminal device 3, and a memory.
  • The communication unit 32 controls transmission and reception of data with external devices via the network N.
  • The external devices include the information processing device 1 described above.
  • The communication standard for data transmission and reception is the same as that of the information processing device 1; that is, it includes a LAN standard (TCP/IP, etc.).
  • The display unit 33 has a display screen on which characters can be displayed and performs display operations on the display screen under the control of the control unit 31.
  • The display screen is not particularly limited but is, for example, a liquid crystal display (LCD).
  • The operation reception unit 34 receives an input operation from the user of the terminal device 3 and outputs the content of the input operation to the control unit 31 as an operation signal.
  • The operation reception unit 34 has, for example, a keyboard and a pointing device.
  • The pointing device includes a mouse and the like.
  • The operation reception unit 34 may have a touch panel or the like that overlaps the display screen. Alternatively, these may be externally attached peripheral devices.
  • The matching system 100 matches seeds with needs. Seeds are, for example, knowledge related to technology and intellectual property in general owned by an entity itself (for example, a corporation such as one's own company), that is, things including intangibles (here, things do not include people). Needs are, for example, the demands and wishes of customers and society.
  • The degree of matching is an index that indicates whether the seeds can satisfy the needs, or equivalently whether the needs can be satisfied by the seeds. When the need is a problem, it can be an index of whether the seeds can be the solution.
  • Data input is received by the operation reception unit 34 of the terminal device 3 or the like.
  • The operation reception unit 34 may directly accept a data input operation, or may accept an input operation specifying a file name and, if necessary, the path where the file is located.
  • The control unit 31 transmits the input data received by the operation reception unit 34 to the information processing device 1 through the communication unit 32 and the network N.
  • Each seeds element and each needs element is quantified by representing (converting) it as a multidimensional vector (a vector representing a multidimensional space), that is, an array of numerical values representing the magnitudes of multiple semantic components. The degree of matching is then evaluated quantitatively according to the degree of agreement (distance) between the multidimensional vector of the seeds (second vector) and the multidimensional vector of the needs (first vector).
  • FIG. 2A is a diagram for explaining the content of seeds/needs data.
  • The seeds and needs elements concisely present the required information.
  • Here, four elements are defined for needs, namely the purpose (Why), content (What), field, and name of the request, and four elements are defined for seeds, namely the features, functions, competing technologies, and name of the technology.
  • These eight elements are known as the Elevator Pitch syntax, a syntax for short, to-the-point business speeches.
  • The four elements related to needs and the four elements related to seeds may be extracted from different data sets.
  • The four elements of needs can be acquired from information collected by sales representatives, mass media information, Internet information, and the like.
  • The four elements related to seeds can be obtained from technical documents within the company.
  • Technical documents may be internal documents only, or may include contractual documents, publicly available press releases, patent documents, and the like.
  • The number of elements may be narrowed down, as long as the elements related to needs and the elements related to seeds still correspond to each other. For example, a total of four elements may be obtained: the purpose and content of the request for the needs, and the features and functions of the technology for the seeds.
  • The above eight elements need not be extracted according to the elevator pitch syntax; other items may be set so that the correspondence between needs and seeds can be appropriately evaluated quantitatively, as described later.
  • For example, items (elements) related to needs may be determined in advance from business factors such as management status and market size, and items (elements) related to seeds may be determined in advance to include the status of joint research and the status of disclosure.
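  • As an illustration of the element sets described above, the following minimal sketch models the four needs elements and four seeds elements as plain records. The class names, the `narrowed` helper, and the sample values are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class NeedsElements:
    """Four elements describing a need (first data)."""
    purpose: str   # why the request exists
    content: str   # what is requested
    field: str     # field of the request
    name: str      # name of the request


@dataclass(frozen=True)
class SeedsElements:
    """Four elements describing a seed (second data)."""
    features: str
    functions: str
    competing_technologies: str
    name: str


def narrowed(record, keys):
    """Keep only the chosen elements, e.g. two per side instead of four."""
    data = asdict(record)
    return {key: data[key] for key in keys}


# Hypothetical sample values, for illustration only.
need = NeedsElements("reduce energy use", "keep bath water warm",
                     "housing", "warm bath request")
seed = SeedsElements("thin insulating film", "maintain temperature",
                     "foam insulation", "film X")

need_pair = narrowed(need, ["purpose", "content"])
seed_pair = narrowed(seed, ["features", "functions"])
```

  • Narrowing both sides with matching keys keeps the needs and seeds records comparable, which is the condition the text places on reducing the number of elements.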
  • The process of extracting these elements from the original document may be done manually by the person in charge. Alternatively, part or all of the processing may be performed by the information processing device 1 or another terminal device based on an input, instruction, or the like from the terminal device 3.
  • When elements are extracted partly automatically, for example, the input data is decomposed into morphemes such as words using morphological analysis, and syntactic analysis is then used to determine dependencies between morphemes, co-occurrence relationships, and the like.
  • Predicates (that is, verbs and nouns used in functional expressions, such as nouns that can be combined with "do" to form verbs) and objects are used as the minimum units, and elements are extracted by adding modifiers to them as necessary and according to conditions.
  • For example, "hot water" is included in the modifiers or objects, "temperature" is the object, and "maintain" is the predicate.
  • The gist of the content may be determined based on, for example, the name of the document file or the title of the text document. Through morphological analysis, syntactic analysis, and the like, the main content, purpose, function, features, and so on corresponding to the name can be extracted automatically.
  • FIG. 2B is a flowchart showing the control procedure, by the control unit 11, of the database generation process executed by the information processing device 1 or the like. For example, one or many text data items are prepared in advance in readable form, and this process, which generates the comparison target data, is executed in response to a predetermined input operation or at an execution timing such as periodic processing. The database generation process may be executed such that a setting indicating whether the text data to be read corresponds to needs or to seeds can be acquired.
  • The control unit 11 selects and acquires one text data item from the prepared text data (step S301; acquisition unit, acquisition step, acquisition means).
  • The control unit 11 determines whether the input data is data related to needs or data related to seeds, and extracts the four elements corresponding to the determination result (step S302).
  • Four elements may be extracted for each of needs and seeds.
  • The control unit 11 organizes the contents of the four extracted elements into verbs (predicates), objects, and modifiers (step S303).
  • The control unit 11 determines whether or not the organized content overlaps with content already stored in the storage unit 21 of the database device 2 (step S304).
  • The control unit 11 stores the organized data in the storage unit 21 of the database device 2, either adding it as new data or partially updating existing data according to whether there is duplication (step S305). If the newly organized data completely overlaps with existing content, no update is needed, so the control unit 11 may omit step S305.
  • The control unit 11 determines whether or not all the input data to be processed has been acquired (step S306). If it is determined that all the input data has been acquired ("YES" in step S306), the control unit 11 ends the database generation process. If it is determined that acquisition of the input data has not finished ("NO" in step S306), the processing of the control unit 11 returns to step S301.
  • Here, each input data item is read in turn, and its four elements are extracted and organized. Extraction and organization of the four elements may be performed.
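  • The loop of steps S301 to S306 can be sketched as follows. This is only an outline under stated assumptions: `extract_elements` is a hypothetical callable standing in for the morphological and syntactic analysis, the dictionary layout is invented for illustration, and only exact duplicates are skipped (the partial update mentioned for step S305 is not modeled).

```python
def generate_database(texts, extract_elements, database=None):
    """Sketch of steps S301-S306: read each text, organize it, store it once.

    extract_elements is a hypothetical callable returning (kind, organized),
    where kind is "needs" or "seeds" and organized is a dict with
    "predicate", "object" and "modifiers" keys.
    """
    if database is None:
        database = {"needs": [], "seeds": []}
    for text in texts:                               # S301 / S306 loop
        kind, organized = extract_elements(text)     # S302, S303
        if organized in database[kind]:              # S304: exact duplicate
            continue                                 # S305 may be skipped
        database[kind].append(organized)             # S305: add new entry
    return database


def toy_extract(text):
    # Hypothetical stand-in for real analysis; expects
    # "kind|predicate|object|modifier,modifier".
    kind, predicate, obj, modifiers = text.split("|")
    return kind, {"predicate": predicate, "object": obj,
                  "modifiers": sorted(modifiers.split(","))}


db = generate_database(
    ["needs|maintain|temperature|hot water",
     "seeds|insulate|container|thin,light",
     "needs|maintain|temperature|hot water"],   # duplicate, stored once
    toy_extract,
)
```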
  • Using the needs/seeds data 211, a database obtained in advance manually or automatically by the database generation process, processing is performed to search for seeds that match a separately input need, or for needs that match input seeds.
  • FIG. 3 is a flowchart showing the control procedure, by the control unit 11, of the correspondence search control process executed by the information processing device 1.
  • This correspondence search control process is started, for example, when the control unit 11 acquires a search execution command together with the needs or seeds input data entered at the terminal device 3.
  • The input data has the necessary elements determined in advance as described above.
  • The control unit 11 acquires input data from the terminal device 3 (step S101; acquisition unit, acquisition step, acquisition means). The control unit 11 acquires a setting indicating whether the input data relates to needs or to seeds (step S102).
  • The control unit 11 determines whether the input is data related to needs (first data) (step S103). If the input is determined to be data related to needs ("YES" in step S103), the control unit 11 sets the held data related to seeds (second data) as the search target (step S104), and the processing proceeds to step S106. If the input is determined not to be data related to needs, that is, to be data related to seeds (second data) ("NO" in step S103), the control unit 11 sets the held data related to needs (first data) as the search target (step S105), and the processing proceeds to step S106.
  • The control unit 11 executes the matching degree calculation process described later to calculate the degree of matching of each search target (step S106).
  • The control unit 11 extracts the search target seeds or needs data whose degree of matching satisfies the criteria (step S107).
  • The control unit 11 processes the extracted data as necessary and outputs it to the terminal device 3 as matching information in an easy-to-read form (step S108; output unit, output step, output means). The control unit 11 then ends the correspondence search control process.
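  • The branching in steps S101 to S108 can be outlined as below. This is a hedged sketch, not the patented implementation: the `held_data` layout, the `degree_of` callable, the numeric `threshold`, and the word-overlap toy scorer are all assumptions introduced for illustration, and a larger score is taken to mean a better match.

```python
def correspondence_search(input_record, input_kind, held_data, degree_of, threshold):
    """Sketch of steps S101-S108.

    held_data: {"needs": [...], "seeds": [...]} (hypothetical layout)
    degree_of: callable(input_record, candidate) -> float, larger is better
    threshold: hypothetical criterion used in step S107
    """
    if input_kind not in ("needs", "seeds"):             # S102: input setting
        raise ValueError("input_kind must be 'needs' or 'seeds'")
    # S103-S105: needs are searched against held seeds, and vice versa.
    target_kind = "seeds" if input_kind == "needs" else "needs"
    scored = [(degree_of(input_record, candidate), candidate)
              for candidate in held_data[target_kind]]   # S106
    hits = [pair for pair in scored if pair[0] >= threshold]   # S107
    hits.sort(key=lambda pair: pair[0], reverse=True)    # S108: best first
    return hits


def word_overlap(a, b):
    # Toy degree (Jaccard overlap of words), standing in for the real calculation.
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


held = {"needs": ["keep water warm"],
        "seeds": ["film keeps water warm", "fast printer"]}
result = correspondence_search("keep water warm", "needs", held, word_overlap, 0.3)
```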
  • Numerical evaluation is performed by converting the contents, extracted and organized in the form of verbs (predicates) and objects, into multidimensional vectors.
  • The number of dimensions of the multidimensional vector is not particularly limited but is, for example, 50 to 200 dimensions in total. Alternatively, the contents may be converted element by element into multidimensional vectors, and these may simply be combined.
  • For example, the purpose and the content of a need may each be represented by a 50-dimensional vector, and the need may be represented by the combined 100-dimensional vector.
  • Similarly, the features and the functions of a seed may each be represented by a 50-dimensional vector, and the seed may be represented by the combined 100-dimensional vector.
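  • A minimal sketch of the simple combination described above, assuming that "combined" means concatenation. The helper name `combine` and the placeholder component values are hypothetical; real values would come from an embedding model.

```python
def combine(element_vectors):
    """Concatenate per-element vectors into one vector for a need or a seed."""
    combined = []
    for vector in element_vectors:
        combined.extend(vector)
    return combined


DIM = 50  # per-element dimensionality taken from the example in the text

# Placeholder values, not real embeddings.
purpose_vec = [0.1] * DIM
content_vec = [0.2] * DIM

need_vec = combine([purpose_vec, content_vec])  # 100 components in total
```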
  • Known techniques for converting natural language expressions into multidimensional vector expressions include, without particular limitation, Word2vec, doc2vec derived from it, and BERT (Bidirectional Encoder Representations from Transformers).
  • The machine learning for these conversions may be executed within the matching system 100 of this embodiment, an already trained model may be acquired from outside and used, or an external server may be accessed to use these programs.
  • The degree of matching is represented by the distance (an example of similarity) between a first vector representing needs and a second vector representing seeds.
  • Here, cosine similarity is used as the distance.
  • Cosine similarity is the inner product of two vectors divided by the product of their respective magnitudes. If the two vectors are unit vectors, the cosine similarity is simply the inner product of the two vectors.
  • The distance may be represented by other indices such as the Euclidean distance.
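  • The definition of cosine similarity given above, together with the Euclidean alternative, can be written out as a short sketch. The function names are illustrative, and the zero-vector check is an added safeguard that the source does not discuss.

```python
import math


def cosine_similarity(u, v):
    """Inner product divided by the product of the two magnitudes."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Alternative index mentioned in the text."""
    return math.dist(u, v)


parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])      # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # unrelated direction
```

  • For unit vectors such as [0.6, 0.8] and [0.8, 0.6], the result equals the plain inner product (0.96 here), as the text notes.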
  • FIG. 4 is a flowchart showing the processing procedure of the matching degree calculation process called in the above correspondence search control process.
  • This matching degree calculation process constitutes the analysis step in the matching method of this embodiment and also constitutes the analysis means in the program 121.
  • The control unit 11 converts the content of the input data into a multidimensional vector as described above (step S151).
  • The control unit 11 acquires one search target data item from the needs/seeds data 211 (step S152).
  • The control unit 11 converts the acquired search target data into a multidimensional vector (step S153).
  • The control unit 11 calculates the distance (for example, cosine similarity) between the multidimensional vector of the input data and the multidimensional vector of the search target data (step S154).
  • The control unit 11 determines whether or not all search target data has been acquired (step S155). If it is determined that all search target data has been acquired ("YES" in step S155), the control unit 11 ends the matching degree calculation process and returns to the correspondence search control process.
  • If it is determined that not all search target data has been acquired, that is, some search target data remains ("NO" in step S155), the processing of the control unit 11 returns to step S152.
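  • The loop of steps S151 to S155 might look like the following outline. The `embed` callable and the toy word-count embedding are hypothetical, and Euclidean distance is used only as a stand-in for whichever distance index is chosen.

```python
import math


def calculate_degrees(input_text, targets, embed, distance=math.dist):
    """Sketch of steps S151-S155: one distance per search target.

    embed: hypothetical callable mapping text to a fixed-length vector
    distance: callable on two vectors (Euclidean distance by default here)
    """
    input_vec = embed(input_text)                    # S151
    results = []
    for target in targets:                           # S152 / S155 loop
        target_vec = embed(target)                   # S153
        results.append((target, distance(input_vec, target_vec)))  # S154
    return results


def toy_embed(text):
    # Toy 3-dimensional "embedding": counts of three fixed words.
    words = text.split()
    return [words.count(word) for word in ("water", "warm", "print")]


out = calculate_degrees("warm water", ["keep water warm", "print fast"], toy_embed)
```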
  • As another example (second example) of calculating the degree of matching, a machine learning model related to image recognition may be used.
  • For example, a 50 × 2 pixel matrix, in which the component values of the multidimensional vectors of the purpose and content of the needs (requests) are each arranged in one column, is combined with a 50 × 2 pixel matrix, in which the component values of the multidimensional vectors of the features and functions of the seeds (technologies) are each arranged in one column, to generate a 50 × 4 pixel matrix (200 pixels, each component value corresponding to a tone value).
  • The degree of matching is obtained from how similar this matrix pattern is to the tendency of matrix patterns observed when seeds and needs match.
  • FIG. 5A is a diagram explaining the setting of the degree of matching in this second example.
  • The 200-pixel matrix pattern is determined as described above.
  • Although a matrix pattern of 4 rows and 50 columns is used here, the pattern is not limited to this. It may be a simple vector in which 200 elements are arranged in a line, or a matrix pattern with another number of rows and columns, such as 2 rows and 100 columns.
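  • The alternative layouts mentioned above (a flat 200-element vector, 4 rows by 50 columns, or 2 rows by 100 columns) hold the same 200 values. The sketch below shows one way to arrange them; the row-major ordering, the `to_matrix` helper, and the placeholder values are assumptions for illustration.

```python
def to_matrix(flat, rows, cols):
    """Arrange a flat list of component values into rows x cols (row-major)."""
    if len(flat) != rows * cols:
        raise ValueError("rows * cols must equal the number of components")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


DIM = 50
# Placeholder vectors; the values stand in for embedding components.
purpose, content = [0.1] * DIM, [0.2] * DIM       # needs side
features, functions = [0.3] * DIM, [0.4] * DIM    # seeds side

flat = purpose + content + features + functions   # 200 components in a line
pattern_4x50 = to_matrix(flat, 4, 50)    # one element per row
pattern_2x100 = to_matrix(flat, 2, 100)  # needs on row 0, seeds on row 1
```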
  • The degree of matching of the matrix pattern is obtained by inputting this 200-pixel matrix (that is, the first vector and the second vector) to the trained model 1210, a machine learning model trained in advance.
  • The trained model 1210 may be generated from the pairs of seeds data and needs data stored in the database device 2 as described above.
  • First, the distance (cosine similarity) between the second vector of a given seed and the first vector of a given need is obtained as described above.
  • For pairs that have a high degree of matching according to the distance (a small value in the case of cosine similarity) and satisfy the matching condition, the above matrix data is generated and used as learning data.
  • The machine learning model is trained by associating this learning data with a numerical value representing high matching ("Good") as teacher data.
  • In addition, multiple pairs are extracted that have a low degree of matching according to the distance obtained above (a large value in the case of cosine similarity) and satisfy the non-matching condition.
  • Learning data is generated by further randomly rearranging the vector components of each seeds element and each needs element across the extracted pairs.
  • The machine learning model is trained by associating this learning data with a numerical value representing low matching ("Bad") as teacher data.
  • As a result, a trained model 1210 is obtained that outputs, as the degree of matching, a numerical value or the like indicating the rate of high matching for the input matrix pattern (first data and second data).
  • As the algorithm of the machine learning model, a supervised pattern recognition algorithm may be used, for example a support vector machine or a neural network, and in particular deep learning.
  • A trained model that outputs a need matching the input multidimensional vector of a seed and a trained model that outputs a seed matching the input multidimensional vector of a need may be a common model or may be generated separately.
  • FIG. 5B is a flowchart showing the control procedure, by the control unit 11, of the trained model generation process. The machine learning model is prepared in an untrained state with its algorithm determined in advance, and the process can be started in response to a predetermined input operation from the terminal device 3 or automatically when the stored needs and seeds data in the database device 2 is updated.
  • The control unit 11 converts the content of a given seed or need into a multidimensional vector (step S201).
  • The control unit 11 acquires one comparison target data item (needs data if the input is seeds, and seeds data if the input is needs) (step S202).
  • The control unit 11 converts the acquired comparison target data into a multidimensional vector (step S203).
  • The control unit 11 calculates the distance between the two obtained multidimensional vectors (step S204).
  • The control unit 11 determines whether the calculated distance is within the lower reference (less than the lower reference value) (step S205). If it is determined to be within the lower reference ("YES" in step S205), the control unit 11 generates matrix data combining the two multidimensional vectors (step S206).
  • The generated matrix data is not particularly limited but may be, for example, 4 rows and 50 columns as described above.
  • The control unit 11 inputs the generated matrix data to the machine learning model.
  • The control unit 11 optimizes the parameters of the machine learning model by setting "Good" (high matching) as teacher data for this matrix data and backpropagating the deviation (error) from the output result (step S207). The processing of the control unit 11 then proceeds to step S210.
  • If it is determined in step S205 that the calculated distance is not within the lower reference (is greater than or equal to the lower reference value) ("NO" in step S205), the control unit 11 determines whether or not the calculated distance is within the upper reference (greater than the upper reference value) (step S208). A distance within the upper reference is greater than one within the lower reference. Between the upper reference and the lower reference, there may be a range of distances that falls within neither. If it is determined that the distance is within the upper reference ("YES" in step S208), the control unit 11 stores the set of original needs data and seeds data from which the two multidimensional vectors used for the distance were generated (step S209). Then, the processing of the control unit 11 proceeds to step S210.
  • The control unit 11 determines whether or not all data to be compared has been acquired (step S210). If it is determined that not all the data to be compared has been acquired (there is data to be compared that has not been acquired) ("NO" in step S210), the processing of the control unit 11 returns to step S202.
  • The control unit 11 determines whether or not all the needs or seeds data to be input has been input (step S211). If it is determined that not all the needs or seeds data to be input has been input (there is data that has not been input) ("NO" in step S211), the processing of the control unit 11 returns to step S201. At this time, all acquisition information for the comparison target data is initialized.
  • The control unit 11 takes the sets of needs data and seeds data stored in the process of step S209, replaces some of the elements of one of them (the needs data or the seeds data) with the corresponding elements from another stored set as appropriate, regenerates the multidimensional vectors, and generates matrix data combining them (step S212). The replacement data may be chosen so that the replacement does not bring the needs and the seeds closer together.
  • The control unit 11 inputs the matrix data to the machine learning model.
  • The control unit 11 sets "Bad", indicating low compatibility, as the teacher data and optimizes the parameters of the machine learning model by, for example, backpropagating the difference (error) between the output result and the teacher data (step S213).
  • The trained model 1210 is stored in the storage unit 12, and the control unit 11 ends the trained model generation process.
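The labelling logic of steps S205 to S209 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the use of cosine distance, the 100-dimensional vectors folded into 50-column rows (giving the 4 x 50 example size), and the names `cosine_distance`, `to_matrix`, and `split_training_pairs` are all assumptions made for the example. The element replacement and re-vectorization of step S212 and the model update itself are not shown.

```python
import math

def cosine_distance(a, b):
    """Assumed distance measure: 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def to_matrix(need_vec, seed_vec, cols=50):
    """Fold both vectors into rows of `cols` values and stack them (S206).
    Two 100-dimensional vectors give the 4 x 50 example size."""
    rows = []
    for vec in (need_vec, seed_vec):
        rows.extend(vec[k:k + cols] for k in range(0, len(vec), cols))
    return rows

def split_training_pairs(need_vecs, seed_vecs, lower, upper):
    """Return ("Good"-labelled matrices, index pairs kept for later "Bad" examples)."""
    good_examples, far_pairs = [], []
    for i, need in enumerate(need_vecs):
        for j, seed in enumerate(seed_vecs):
            distance = cosine_distance(need, seed)
            if distance < lower:        # "YES" in S205
                good_examples.append((to_matrix(need, seed), "Good"))
            elif distance > upper:      # "YES" in S208
                far_pairs.append((i, j))  # stored for S212 (S209)
            # Distances between the two references are used for neither label.
    return good_examples, far_pairs
```

Pairs whose distance falls between the two reference values are simply skipped, which mirrors the statement that a range belonging to neither reference may exist.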
  • FIG. 6 is a flowchart showing the control procedure of the matching degree calculation process of the second example, which uses the trained model 1210 generated in this way.
  • This matching degree calculation process includes steps S161 and S162 in place of step S154 of the matching degree calculation process of the first example.
  • The other processes are the same; identical processing is given the same reference numerals, and detailed description thereof is omitted.
  • When the content of the search target data has been converted into a multidimensional vector in the process of step S153, the control unit 11 combines the multidimensional vector for the input content with the multidimensional vector for the search target data to generate matrix data (step S161). The control unit 11 inputs this matrix data to the trained model 1210 and performs the arithmetic processing of the trained model 1210. The control unit 11 acquires the degree of matching output from the trained model 1210 as a result of this processing (step S162). Then, the processing of the control unit 11 proceeds to step S155.
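Steps S161 and S162 can be sketched as below. The sketch assumes the model is any callable that takes the combined matrix and returns a number; `combine`, `degrees_of_fit`, and `stand_in_model` are hypothetical names, and `stand_in_model` is only a placeholder used to make the example runnable, not the trained model 1210.

```python
import math

def combine(input_vec, target_vec, cols=50):
    """S161: stack the two vectors, folded into rows of `cols` values, into one matrix."""
    rows = []
    for vec in (input_vec, target_vec):
        rows.extend(vec[k:k + cols] for k in range(0, len(vec), cols))
    return rows

def degrees_of_fit(input_vec, target_vecs, model):
    """S162: feed each combined matrix to the model and collect its outputs."""
    return [float(model(combine(input_vec, target))) for target in target_vecs]

def stand_in_model(matrix):
    """Placeholder only (assumption): cosine similarity of the upper and lower halves."""
    half = len(matrix) // 2
    a = [x for row in matrix[:half] for x in row]
    b = [x for row in matrix[half:] for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

In practice the callable would wrap whatever inference routine the trained model provides; only the matrix construction and the per-target loop are taken from the description.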
  • Collaborative filtering defines the correspondence between two parameters (here, needs and seeds). When one parameter is input, items of the other parameter are selected and output based on the selection tendencies of similar items: specifically, items of the other parameter that have been selected for a similar item but not yet for the input item.
  • FIG. 7A is a diagram illustrating the correspondence between needs and seeds related to this collaborative filtering.
  • Needs and seeds are arranged in a matrix, and "1" is entered for each correspondence that has been selected.
  • The seeds selected for need 03 and for need NM show similar tendencies (seeds 02, 05, 06, etc.), so the two needs have close correspondence relationships.
  • When need 03 is newly input, seed 01, which has been selected for need NM but not for need 03, is output.
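The need 03 / need NM example can be reproduced with a small sketch. The selection table below is illustrative only (the actual contents of FIG. 7A are not reproduced here), and the use of Jaccard overlap to judge how similar two needs' selections are is an assumption; `recommend_seeds` is a hypothetical helper name.

```python
def recommend_seeds(selection, need, top_k=1):
    """selection maps each need to the set of seeds already selected for it.
    Returns seeds chosen for the most similar needs but not yet for `need`."""
    target = selection[need]

    def jaccard(other):
        union = target | other
        return len(target & other) / len(union) if union else 0.0

    # Rank the other needs by how closely their selected seeds overlap.
    neighbours = sorted(
        (n for n in selection if n != need),
        key=lambda n: jaccard(selection[n]),
        reverse=True,
    )[:top_k]

    recommended = []
    for n in neighbours:
        for seed in sorted(selection[n] - target):
            if seed not in recommended:
                recommended.append(seed)
    return recommended

# Illustrative data only.
selection = {
    "need 03": {"seed 02", "seed 05", "seed 06"},
    "need NM": {"seed 01", "seed 02", "seed 05", "seed 06"},
    "need 01": {"seed 03", "seed 04"},
}
print(recommend_seeds(selection, "need 03"))  # ['seed 01']
```

Need NM shares three of four seeds with need 03, so it is the nearest neighbour, and the one seed it has that need 03 lacks is returned.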
  • In the matching system 100 of the present embodiment, which uses this collaborative filtering technique, when needs data or seeds data (one side) is input, needs or seeds on the same side whose multidimensional vectors are close to the multidimensional vector of the input content are selected, and the seeds or needs (the other side) corresponding to the selected needs or seeds are then selected. That is, because the input needs or seeds are not necessarily identical to ones already held, close ones are used instead. In collaborative filtering, appropriate output cannot be produced unless a certain amount of selection has already been made and the similarity tendencies have been established.
  • The selection that defines a correspondence relationship is not limited to a selection operation in the correspondence search control process.
  • The selection may also include correspondences established through the development and sale of actual products. Specifically, instead of the database generation process described above, which extracts seeds and needs from separate text data, needs data described in correspondence with seeds data in data such as product development information and sales information may be acquired and defined as being in a selected correspondence.
  • FIG. 7B is a flowchart showing a control procedure by the control unit 11 for matrix generation processing related to collaborative filtering.
  • The control unit 11 acquires the contents of each stored seed and need (one to four elements each) from the database device 2 and converts them into multidimensional vectors (step S251).
  • The control unit 11 assigns each seed and need, together with its multidimensional vector, to a row or column of the two-dimensional matrix (step S252).
  • The control unit 11 sets "1" in each cell, among the cells indicating combinations of seeds and needs, whose combination has been selected (step S253). For example, every cell may be initialized to "0", and only the cells to be set to "1" changed from "0" to "1". Then, the control unit 11 ends the matrix generation process.
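The initialization and marking described for step S253 can be sketched as follows. The identifiers and the list-of-lists layout are assumptions for illustration, and the multidimensional vectors attached to each row and column in step S252 are omitted.

```python
def build_selection_matrix(need_ids, seed_ids, selected_pairs):
    """Rows are needs, columns are seeds; a cell is 1 when that pair has been selected."""
    need_row = {need: i for i, need in enumerate(need_ids)}
    seed_col = {seed: j for j, seed in enumerate(seed_ids)}
    # Every cell starts at 0 (initial value).
    matrix = [[0] * len(seed_ids) for _ in need_ids]
    # Only the selected combinations are changed from 0 to 1.
    for need, seed in selected_pairs:
        matrix[need_row[need]][seed_col[seed]] = 1
    return matrix
```

For example, `build_selection_matrix(["n1", "n2"], ["s1", "s2", "s3"], [("n1", "s2"), ("n2", "s3")])` marks exactly two cells and leaves the rest at their initial value.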
  • FIG. 8 is a flow chart showing the control procedure by the control unit 11 of the matching degree calculation process of the third example using this collaborative filtering technique. In this matching level calculation process, only step S151 in the matching level calculation process of the first example shown in FIG.
  • The control unit 11 calculates the distance between the input data and each item of data of the same classification (seeds or needs) set in the matrix (step S171).
  • The control unit 11 extracts a reference number of data items in order of the calculated distance, closest first (step S172).
  • The control unit 11 selects the data of the other classification that has been selected in correspondence with the extracted data of the same classification (step S173).
  • The control unit 11 calculates a score for each item of the selected data of the other classification (step S174).
  • The score may be determined based on, for example, the absolute value of the cosine similarity or the reciprocal of the Euclidean distance, so that the larger the distance (the lower the similarity), the smaller the score.
  • The control unit 11 outputs each item of selected data together with its score (step S175). Then, the control unit 11 ends the matching degree calculation process and returns the processing to the correspondence search control process.
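Steps S171 to S175 can be sketched as follows. Several choices here are assumptions made for the example rather than details from the description: Euclidean distance is used, the reference number of items is taken nearest first, the score is `1 / (1 + distance)` (a variant of the reciprocal that avoids division by zero), and an item reached through several neighbours keeps its best score. `match_by_selection` is a hypothetical name.

```python
import math

def match_by_selection(input_vec, same_class_vecs, selection, reference_number=2):
    """selection[i][j] is 1 when other-class item j has been selected for same-class item i.
    Returns (other-class index, score) pairs, highest score first."""
    # S171: distance from the input to every held item of the same classification.
    distances = [math.dist(input_vec, vec) for vec in same_class_vecs]
    # S172: keep the reference number of nearest items.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:reference_number]
    scores = {}
    for i in nearest:
        score = 1.0 / (1.0 + distances[i])  # larger distance gives a smaller score
        # S173: other-class items already selected for this neighbour.
        for j, flag in enumerate(selection[i]):
            if flag:
                # S174: keep the best score seen for each candidate.
                scores[j] = max(scores.get(j, 0.0), score)
    # S175: output the candidates together with their scores.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the input is compared only with items of its own classification, the other-class candidates are reached indirectly through past selections, which is why the approach needs an adequate selection history.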
  • FIG. 9 is a diagram showing an example of output results. Here, there is shown a bar graph in which some of the requirements (needs) held in advance for the technology (seed) D are listed in percent.
  • The matching system 100 of this embodiment includes the control unit 11 of the information processing device 1.
  • The control unit 11, as an acquisition unit, acquires first data related to needs and second data related to seeds; as an analysis unit, quantifies and analyzes the acquired first data and second data; and, as an output unit, uses the analysis result to output matching information indicating the degree of matching between the first data and the second data.
  • According to this matching system 100, the match between seeds and needs can be evaluated quantitatively, more easily and more objectively.
  • In quantifying seeds and needs, the control unit 11 converts the first data and the second data into vectors in a multidimensional space. Natural language expressions of seeds and needs still carry a large amount of information even when summarized concisely. By expressing them as multidimensional vectors, numerical values corresponding to their meaning can be obtained more accurately.
  • The control unit 11 calculates the degree of matching based on the degree of similarity (for example, cosine similarity) between the first vector obtained from the first data and the second vector obtained from the second data. With such processing, a numerical similarity can be obtained from the first vector and the second vector by simple calculation.
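Cosine similarity between two vectors is their dot product divided by the product of their lengths. A minimal sketch follows; the handling of zero vectors and the percent conversion in `degree_of_matching_percent` are assumptions added for illustration and are not taken from the description.

```python
import math

def cosine_similarity(first_vec, second_vec):
    """Dot product divided by the product of the vector lengths, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(first_vec, second_vec))
    norm = math.sqrt(sum(x * x for x in first_vec)) * math.sqrt(sum(y * y for y in second_vec))
    if norm == 0.0:
        return 0.0  # assumption: a zero vector is treated as having no similarity
    return dot / norm

def degree_of_matching_percent(first_vec, second_vec):
    """Hypothetical mapping to a percentage: negative similarity is clipped to 0."""
    return 100.0 * max(0.0, cosine_similarity(first_vec, second_vec))
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and the result does not depend on vector length, so only the direction of the vectors is compared.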
  • The control unit 11, as the analysis unit, uses the trained model 1210, which takes the first vector and the second vector as inputs and outputs the degree of matching.
  • By training the machine learning model appropriately and having it output the degree of matching between the first vector and the second vector, the matching system 100 can obtain an objective and quantitative evaluation.
  • This trained model 1210 is based on a pattern recognition algorithm. By performing pattern recognition with this trained model 1210 on the matrix pattern in which the component values of the multidimensional vectors are arranged, the matching system 100 can appropriately and quantitatively evaluate the overall degree of similarity of the multidimensional vectors.
  • The terminal device 3 also includes an operation reception unit 34 as an input unit that acquires input data.
  • The control unit 11, as the acquisition unit, acquires the first data or the second data from the input data.
  • The degree of matching with the other type of data is then calculated for a large number of the data items held in the database device 2 (here, in a round-robin manner). Therefore, with the matching system 100, a user who wants information on the other side that matches a given need or seed can obtain the desired information easily and appropriately.
  • The first data includes at least one predetermined element related to needs, and the second data includes at least one predetermined element related to seeds.
  • Elements of the first data include at least one of the purpose, content, field, and name of the needs, and elements of the second data include at least one of the features, functions, competing technologies, and names of the seeds.
  • When the control unit 11, as the acquisition unit, acquires the input data accepted by the operation reception unit 34, it extracts at least one of the eight elements of the elevator pitch syntax from the input data. In this way, elements may be extracted automatically from input document data. This eliminates the need to organize and prepare the input information manually in advance, thereby reducing labor.
  • The first data and the second data each include at least a noun or a verb indicating a function.
  • This makes it possible to obtain the degree of matching with high accuracy.
  • The first data and the second data each include an object for the noun or verb indicating a function.
  • Because the target of the operation is also included in the first data and the second data, input data that expresses the operation content or the request content concisely and more accurately can be obtained. This also improves the accuracy of the degree of matching obtained from the quantified data.
  • The matching method of the present embodiment includes an acquisition step of acquiring first data related to needs and second data related to seeds, an analysis step of quantifying and analyzing the acquired first data and second data, and an output step of outputting matching information indicating the degree of matching between the first data and the second data using the analysis result of the analysis step.
  • The trained model 1210 of the present embodiment takes as inputs the first vector obtained from the first data related to needs and the second vector obtained from the second data related to seeds, and outputs the degree of matching between the first data and the second data. With such a trained model 1210, the overall degree of matching between the seeds data and the needs data can be obtained as a more appropriate and objective value.
  • In the above embodiment, one item of needs or seeds data is input, the degree of matching is calculated against a plurality of held data items of the other type, and those with a high degree of matching are output. Alternatively, a large number of needs data items and seeds data items may be combined in a round-robin manner to detect omissions in implementation.
  • In the above embodiment, the needs/seeds data 211 held in the storage unit 21 of the database device 2 is converted into a multidimensional vector after being acquired in the matching degree calculation process.
  • Alternatively, the multidimensional vectors may be held in advance in the needs/seeds data 211.
  • The multidimensional vector data in the needs/seeds data 211 may be updated as needed.
  • The quantification has been described as multidimensional vectorization, but information that can be expressed with scalar values may be expressed with scalar values alone.
  • The degree of matching is obtained based on image recognition technology. For example, the one-dimensional arrays of the first vector and the second vector may each be approximated as a waveform, and the degree of matching obtained from the similarity of the waveforms.
  • Image data including graphs and the like is generated and output as output data, but the present invention is not limited to this.
  • Text data or the like may simply be output, a predetermined dedicated output format may be defined in a structured language, or data may be output in the standard data format of spreadsheet software.
  • The output may be sent to a printer or the like to form an image instead of being sent back to the terminal device 3.
  • The input data may be arranged concisely according to other criteria, or may simply be extracted from data expressed in natural language in units of the necessary sentences and clauses without rearrangement.
  • The information processing device 1, the database device 2, and the terminal device 3 have been described as separate configurations, but all processing may be performed by a single computer (matching device).
  • The correspondence search control process may be distributed among the control units of a plurality of server devices.
  • The terminal device 3 is not limited to one specific device, and a plurality of terminal devices may exist.
  • In the above description, the storage unit 12, made up of a non-volatile memory such as an HDD or a flash memory, is given as an example of a computer-readable medium storing the program 121 for control such as the calculation of the degree of matching of the present invention, but the medium is not limited to this. Other non-volatile memories such as MRAM and portable recording media such as CD-ROMs and DVDs are also applicable as computer-readable media.
  • A carrier wave is also applicable to the present invention as a medium for providing the program data according to the present invention via a communication line.
  • The specific configurations, and the contents and procedures of the processing operations, shown in the above embodiments can be changed as appropriate without departing from the scope of the present invention.
  • The scope of the present invention includes the scope of the invention described in the claims and the scope of equivalents thereof.
  • This invention can be used for matching systems, matching methods, programs, and trained models.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Library & Information Science (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/JP2022/038679 2021-10-26 2022-10-18 Matching system, matching method, program, and trained model Ceased WO2023074457A1 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023556340A JPWO2023074457A1 2021-10-26 2022-10-18

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-174687 2021-10-26
JP2021174687 2021-10-26

Publications (1)

Publication Number Publication Date
WO2023074457A1 (ja) 2023-05-04

Family

ID=86157701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/038679 Ceased WO2023074457A1 (ja) 2021-10-26 2022-10-18 Matching system, matching method, program, and trained model

Country Status (2)

Country Link
JP (1) JPWO2023074457A1
WO (1) WO2023074457A1

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025017901A1 (ja) * 2023-07-20 2025-01-23 三菱電機株式会社 Information proposal system and information proposal method
JP7849562B1 (ja) * 2025-12-19 2026-04-21 株式会社AIST Solutions Needs-seeds matching support system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019013344A1 (ja) * 2017-07-14 2019-01-17 株式会社マスターリンク Information processing device
JP2019211846A (ja) * 2018-05-31 2019-12-12 リンカーズ株式会社 Technical information provision system
JP2021026413A (ja) * 2019-08-01 2021-02-22 株式会社大和総研 Matching system and program
JP2021157363A (ja) * 2020-03-26 2021-10-07 株式会社野村総合研究所 Needs matching device and program

Also Published As

Publication number Publication date
JPWO2023074457A1 2023-05-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22886785

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023556340

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22886785

Country of ref document: EP

Kind code of ref document: A1