WO2023074457A1 - Matching system, matching method, program, and trained model

Info

Publication number: WO2023074457A1
Application number: PCT/JP2022/038679
Authority: WO
Inventors: 拓己石渡; 恵美子寄▲崎▼
Original assignee: コニカミノルタ株式会社
Priority date: 2021-10-26
Filing date: 2022-10-18
Publication date: 2023-05-04

Abstract

Provided are a matching system, a matching method, a program, and a trained model that are capable of quantitatively evaluating matching of seeds for needs more easily and objectively. This matching system comprises: an acquisition unit that acquires first data on needs and second data on seeds; an analysis unit that quantifies and analyzes the acquired first data and second data; and an output unit that uses the result of analysis by the analysis unit to output matching information indicating the degree of matching of the second data for the first data.

Description

Matching system, matching method, program and trained model

This disclosure relates to matching systems, matching methods, programs, and trained models.

Conventionally, in business development and the like, it is important to sort out the relationship between the seeds (technology, knowledge, equipment, human resources, and so on) owned by a corporation or other entity and the needs of the world. In many cases, however, researchers and developers do not understand the needs of the world, and strategic planners do not recognize their own organization's seeds, so the two often fail to be linked successfully.

In recent years, there have been an increasing number of cases where large-scale data is used to make decisions on business selection, efficiency improvement, and new development. Patent Document 1 discloses a technique that uses seed data, such as in-house technology, and needs data of the public and customers, both stored in memory, and applies a SWOT analysis to identify business fields in which the seeds held meet public needs.

Patent Document 1: Japanese Patent Application Laid-Open No. 2009-20712

However, with conventional technology, the criteria for evaluating the match between seeds and needs have not been established, making objective quantitative evaluation difficult.

The purpose of this disclosure is to provide a matching system, a matching method, a program, and a trained model that allow the match between seeds and needs to be evaluated quantitatively more easily and objectively.

In order to achieve the above object, the invention according to claim 1 is a matching system comprising:
an acquisition unit that acquires first data related to needs and second data related to seeds;
an analysis unit that quantifies and analyzes each of the acquired first data and second data; and
an output unit that outputs, using the analysis result of the analysis unit, matching information indicating the degree of matching between the first data and the second data.

Further, the invention according to claim 2 is the matching system according to claim 1, wherein
the analysis unit, in the quantification, converts each of the first data and the second data into a vector representing a multidimensional space.

Further, the invention according to claim 3 is the matching system according to claim 2, wherein
the analysis unit calculates the degree of matching based on a degree of similarity between a first vector obtained from the first data and a second vector obtained from the second data.

Further, the invention according to claim 4 is the matching system according to claim 2, wherein
the analysis unit has a trained model that takes as inputs a first vector obtained from the first data and a second vector obtained from the second data and outputs the degree of matching.

Further, the invention according to claim 5 is the matching system according to claim 4, wherein
the trained model is based on a pattern recognition algorithm.

Further, the invention according to claim 6 is the matching system according to any one of claims 1 to 5,
further comprising an input unit that acquires input data, wherein
the acquisition unit acquires the first data or the second data from the input data.

Further, the invention according to claim 7 is the matching system according to any one of claims 1 to 6, wherein
the first data includes at least one predetermined element related to needs, and
the second data includes at least one predetermined element related to seeds.

Further, the invention according to claim 8 is the matching system according to claim 7, wherein
the elements of the first data include at least one of the purpose, content, field, and name of the needs, and
the elements of the second data include at least one of the features, functions, competing technologies, and name of the seeds.

Further, the invention according to claim 9 is the matching system according to claim 7 or 8,
further comprising an input unit that acquires input data, wherein
the acquisition unit extracts the elements when acquiring the first data or the second data from the input data.

Further, the invention according to claim 10 is the matching system according to any one of claims 1 to 9, wherein
each of the first data and the second data includes at least a noun or a verb indicating a function.

Further, the invention according to claim 11 is the matching system according to claim 10, wherein
each of the first data and the second data includes an object of the noun or verb indicating the function.

Further, the invention according to claim 12 is a matching method performed by a control unit of a computer, the method including:
an acquisition step of acquiring first data related to needs and second data related to seeds;
an analysis step of quantifying and analyzing each of the acquired first data and second data; and
an output step of outputting, using the analysis result of the analysis step, matching information indicating the degree of matching between the first data and the second data.

Further, the invention according to claim 13 is a program that causes a computer to function as:
acquisition means for acquiring first data related to needs and second data related to seeds;
analysis means for quantifying and analyzing each of the acquired first data and second data; and
output means for outputting, using the analysis result of the analysis means, matching information indicating the degree of matching between the first data and the second data.

Further, the invention according to claim 14 is a trained model that
takes as inputs a first vector obtained from first data related to needs and a second vector obtained from second data related to seeds, and outputs the degree of matching between the first data and the second data.

According to this disclosure, the match between seeds and needs can be quantitatively evaluated more easily and objectively.

FIG. 1 is an overall configuration diagram of a matching system according to an embodiment.
FIG. 2A is a diagram explaining the content of seeds/needs data.
FIG. 2B is a flowchart showing a control procedure of database generation processing.
FIG. 3 is a flowchart showing a control procedure of correspondence search control processing.
FIG. 4 is a flowchart showing a processing procedure of matching degree calculation processing called in the correspondence search control processing.
FIG. 5A is a diagram explaining the setting of the degree of matching in a second example.
FIG. 5B is a flowchart showing a control procedure of trained model generation processing.
The remaining drawings are, in order: a flowchart showing a control procedure of matching degree calculation processing of the second example, which uses a trained model; a diagram explaining the correspondence between needs and seeds in collaborative filtering; a flowchart showing a control procedure of matrix generation processing related to collaborative filtering; a flowchart showing a control procedure of matching degree calculation processing of a third example, which uses a collaborative filtering technique; and a diagram showing an example of output results.

Embodiments will be described below with reference to the drawings.
FIG. 1 is an overall configuration diagram of a matching system 100 of this embodiment.
The matching system 100 of this embodiment includes an information processing device 1, a database device 2, and a terminal device 3.

The information processing device 1 is a computer that performs the processing related to matching in this embodiment, and may be, for example, an ordinary PC (Personal Computer). The information processing device 1 is connected to the database device 2, refers to the held data stored in the database device 2, and writes and adds new data. The database device 2 need not be directly connected to the information processing device 1 and may instead be accessible through the network N.

The information processing device 1 includes a control unit 11 (acquisition unit, analysis unit, output unit), a storage unit 12, a communication unit 13, and the like.
The control unit 11 has a hardware processor, such as a CPU, that performs arithmetic processing and controls the operation of the information processing device 1, and a memory such as a RAM.

The storage unit 12 has a nonvolatile memory and stores a program 121, setting data, and the like. The program 121 includes a control program for the correspondence search control process of this embodiment. The program 121 may also include part or all of the database generation process, the trained model generation process, and the matrix generation process described later. When the trained model generation process is included, the program 121 also includes the generated trained model 1210.
The nonvolatile memory may be, for example, a flash memory or an HDD (Hard Disk Drive).

The communication unit 13 controls transmission and reception of data with external devices via the network N. The external devices include the terminal device 3. The communication standard for data transmission and reception may be, for example, a LAN (Local Area Network) standard (TCP/IP, etc.).

The information processing device 1 may also include a display unit, an operation reception unit, and the like.

The database device 2 has a storage unit 21. The storage unit 21 stores and holds the needs/seeds data 211 acquired in advance, as well as learning data and trained models for converting terms, phrases, sentences, and the like into multidimensional vectors (semantic vectors).

The terminal device 3 may be an ordinary PC or a mobile terminal (smartphone, etc.), and performs operations such as inputting data to be matched and displaying matching results.

The terminal device 3 includes a control unit 31, a communication unit 32, a display unit 33, an operation reception unit 34 (input unit), and the like. The control unit 31 includes a hardware processor that performs arithmetic processing and centrally controls the operation of the terminal device 3, and a memory. The communication unit 32 controls transmission and reception of data with external devices via the network N. The external devices include the information processing device 1 described above. The communication standard for data transmission and reception is the same as that of the information processing device 1; that is, it includes a LAN standard (TCP/IP, etc.).

The display unit 33 has a display screen on which characters can be displayed, and performs display operations on the display screen under the control of the control unit 31. The display screen is not particularly limited, but is, for example, a liquid crystal display (LCD) screen.

The operation reception unit 34 receives an input operation from the user of the terminal device 3 and outputs the content of the input operation to the control unit 31 as an operation signal. The operation reception unit 34 has, for example, a keyboard and a pointing device. The pointing device includes a mouse and the like. In addition to or instead of this, the operation reception unit 34 may have a touch panel or the like that overlaps the display screen. Alternatively, these may be externally attached peripheral devices.

Next, matching in the matching system 100 of this embodiment will be described.
The matching system 100 matches seeds, such as knowledge relating to technology and intellectual property in general owned by the party itself (for example, a corporation such as one's own company) (that is, things, including intangible ones; here, things do not include people), with needs such as the demands and wishes of customers and society.

In this matching system 100, when data related to seeds (second data) or data related to needs (first data) is input (that is, either one of the first data and the second data), the input data is compared with the other kind of data held by the matching system 100, and items of the other kind of data that have a high degree of matching (matching degree) with the input data are output. In other words, the degree of matching is an index indicating whether the seeds can satisfy the needs, or equivalently whether the needs can be satisfied by the seeds. When the need is a problem, it can be an index indicating whether the seeds can be a solution. Data input is received by the operation reception unit 34 of the terminal device 3 or the like. The operation reception unit 34 may directly accept a data input operation, or may accept an input operation specifying a file name and, if necessary, the path where the file is located. The control unit 31 transmits the input data received by the operation reception unit 34 from the communication unit 32 to the information processing device 1 via the network N.

In the analysis for matching, each element of the seeds and each element of the needs is quantified by representing (converting) it as a multidimensional vector (a vector representing a multidimensional space), that is, an array of numerical values representing the magnitudes of multiple semantic components. The degree of matching is then quantitatively evaluated according to the degree of agreement (distance) between the multidimensional vector of the seeds (second vector) and the multidimensional vector of the needs (first vector).

FIG. 2A is a diagram for explaining the content of seeds/needs data.
Preferably, the elements of seeds and needs present the required information concisely. Here, for example, four elements are defined for needs, namely the purpose (Why), content (What), field, and name of the request, and four elements are defined for seeds, namely the features, functions, competing technologies, and name of the technology. These eight elements are known as the elevator pitch syntax, a syntax for short, to-the-point business speeches.

In the data stored in the database device 2 as the needs/seeds data 211 to be compared with the input data, the four elements related to needs and the four elements related to seeds, out of these eight elements, may be extracted from different data sets. For example, the four elements of needs can be acquired from information collected by sales representatives, mass media information, Internet information, and the like. Also, for example, the four elements related to the seeds can be obtained from technical documents within the company. The technical documents may be limited to internal documents, or may include contractual documents, publicly available press releases, patent documents, and the like.

It should be noted that it is not always necessary to use all eight elements, and the number of elements may be narrowed down and acquired and used within a range in which the elements related to needs and the elements related to seeds correspond. For example, a total of four elements may be obtained: two elements of the purpose and content of the request for the needs and two elements of the features and functions of the technology for the seeds.
In addition, the eight elements need not necessarily be extracted according to the elevator pitch syntax; other items may be set so that the correspondence between needs and seeds can be appropriately evaluated quantitatively as described later. For example, items (elements) related to needs may be determined in advance from business factors such as management status and market size, and items (elements) related to seeds, such as the status of joint research and the status of disclosure, may likewise be determined in advance.

The process of extracting these elements from the original document may be performed manually by a person in charge. Alternatively, part or all of the processing may be performed by the information processing device 1 or another terminal device based on an input, instruction, or the like from the terminal device 3. When part of the extraction is automated, for example, the input data is decomposed into morphemes such as words by morphological analysis, and dependencies between morphemes, co-occurrence relationships, and the like are then obtained by syntactic analysis. Tendencies can then be determined easily using well-known text mining techniques, such as extracting words and phrases that appear frequently in the information. Based on this result, the person in charge or the like may extract the four elements while selecting words in line with those tendencies.
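
As a minimal sketch of the frequency-based text mining mentioned above, the following Python example counts frequently appearing words. It is an illustration only: the regular-expression tokenizer is a crude stand-in for a real morphological analyzer, and the names `tokenize`, `frequent_terms`, and `STOP_WORDS` are assumptions that do not appear in this disclosure.

```python
import re
from collections import Counter

# Assumed stop-word list; a real system would rely on morphological analysis.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is"}


def tokenize(text):
    """Split text into lowercase word tokens (a stand-in for morphological analysis)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def frequent_terms(text, top_n=3):
    """Return the top_n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


sample = "The container keeps hot water hot. The container maintains water temperature."
top_terms = frequent_terms(sample)
```

In this sample, "container", "hot", and "water" each appear twice, so they are the candidates a person in charge might consider when wording the four elements.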

The longer these extracted elements are, the more information they contain, but they also tend to include less important words. In the matching system 100 of the present embodiment, for example, a predicate, that is, a verb or a noun used to express a function (such as a noun that becomes a verb when combined with "do"), and its object are used as the minimum unit, and elements are extracted by adding modifiers to them as necessary and according to conditions. For example, for a heat-insulating structure (a heat-retaining container, etc.), the element is organized into "hot water" (included in the modifiers or the object), "temperature" (object), "maintain" (predicate), and so on.
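
One possible way to hold an element organized in this manner is sketched below in Python. The class name `Phrase` and its fields are hypothetical and chosen only for illustration; the disclosure does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """One organized element: a predicate and its object, plus optional modifiers."""
    predicate: str          # verb or function-expressing noun, e.g. "maintain"
    obj: str                # object of the predicate, e.g. "temperature"
    modifiers: tuple = ()   # optional qualifiers, e.g. ("hot water",)

    def text(self):
        """Render the phrase as a short string, for example for later vectorization."""
        return " ".join([self.predicate, *self.modifiers, self.obj])


insulation = Phrase(predicate="maintain", obj="temperature", modifiers=("hot water",))
print(insulation.text())  # maintain hot water temperature
```

Keeping the predicate and object mandatory and the modifiers optional mirrors the idea that the former are the minimum unit while the latter are added only as needed.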

Also, if all the processing is performed by the information processing device 1 or the like, the gist of the content may be determined based on, for example, the name of the document file or the title of the text document. The main content, purpose, functions, features, and so on corresponding to that name can then be extracted automatically through the morphological analysis, syntactic analysis, and the like.

FIG. 2B is a flowchart showing the control procedure, by the control unit 11, of the database generation process executed by the information processing device 1 or the like. For example, one or many items of text data are prepared in advance in readable form, and this process, which generates the comparison target data, is executed in response to a predetermined input operation or at an execution timing such as that of periodic processing. Note that the database generation process may be executed in such a way that a setting indicating whether the text data to be read corresponds to needs or seeds can be acquired.

When the database generation process is started, the control unit 11 selects and acquires one text data from the prepared text data (step S301; acquisition unit, acquisition step, acquisition means). The control unit 11 determines whether the input data is data related to needs or data related to seeds, and extracts four elements corresponding to the determination result (step S302). In addition, when a plurality of needs or seeds are specified in one text data, four elements may be extracted for each.

The control unit 11 organizes the contents of the four extracted elements into verbs (predicates), objects, and modifiers (step S303). The control unit 11 determines whether or not the organized content overlaps with content already stored in the storage unit 21 of the database device 2 (step S304). Depending on whether there is overlap, the control unit 11 stores the organized data in the storage unit 21 of the database device 2 as a new addition or as a partial update (step S305). If the newly organized data completely overlaps with the existing content, no update is needed, so the control unit 11 may omit the process of step S305.

The control unit 11 determines whether or not all the input data to be processed have been acquired (step S306). When it is determined that all the input data have been obtained (“YES” in step S306), the control unit 11 ends the database generation process. If it is determined that the acquisition of input data has not ended ("NO" in step S306), the process of the control unit 11 returns to step S301.
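
The loop of steps S301 to S306 can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation of the embodiment: the kind of each text ("needs" or "seeds") is supplied by the caller, records are identified by their name element, and `toy_extract`, `build_database`, and the field names are hypothetical.

```python
NEEDS_FIELDS = ("name", "purpose", "content", "field")
SEEDS_FIELDS = ("name", "features", "functions", "competing_technologies")


def toy_extract(kind, text):
    """Hypothetical extractor: expects four '|'-separated fields in a fixed order."""
    labels = NEEDS_FIELDS if kind == "needs" else SEEDS_FIELDS
    return dict(zip(labels, text.split("|")))


def build_database(documents, extract, database=None):
    """Loop over prepared texts and store their elements (cf. steps S301 to S306)."""
    database = {} if database is None else database
    for kind, text in documents:            # S301: take one text; the loop end is S306
        elements = extract(kind, text)      # S302 and S303: extract and organize
        key = (kind, elements["name"])      # assumption: the name identifies a record
        existing = database.get(key)
        if existing == elements:            # S304: complete duplicate
            continue                        # S305 may be skipped
        database[key] = {**(existing or {}), **elements}  # S305: add or partially update
    return database


documents = [
    ("seeds", "vacuum flask|double wall|maintain temperature|foam box"),
    ("seeds", "vacuum flask|double wall|maintain temperature|foam box"),  # duplicate
    ("needs", "cold chain|keep vaccines cold|temperature-controlled transport|logistics"),
]
database = build_database(documents, toy_extract)
```

Calling `build_database` again with a changed record of the same name overwrites only the elements that differ, which corresponds to the partial update in step S305.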

In the above procedure, each item of input data is read in turn, and the four elements are extracted and organized for it. The extraction and organization of the four elements may also be performed in other ways.

With reference to the needs/seeds data 211, a database obtained in advance either manually or automatically through the database generation process, processing is performed to search for seeds that match a separately input need, or for needs that match input seeds.

FIG. 3 is a flowchart showing the control procedure, by the control unit 11, of the correspondence search control process executed by the information processing device 1. This correspondence search control process is started, for example, when the control unit 11 acquires a search execution command together with needs or seeds input data entered at the terminal device 3. The input data has the necessary elements determined in advance as described above.

When the correspondence search control process is started, the control unit 11 acquires input data from the terminal device 3 (step S101; acquisition unit, acquisition step, acquisition means). The control unit 11 acquires a setting as to whether the input data is data related to needs or seeds (step S102).

The control unit 11 determines whether the input is data related to needs (first data) (step S103). If the input is determined to be data related to needs (first data) ("YES" in step S103), the control unit 11 sets the held data related to seeds (second data) as the search target (step S104). Then, the processing of the control unit 11 proceeds to step S106. If the input is determined not to be data related to needs (that is, to be data related to seeds (second data)) ("NO" in step S103), the control unit 11 sets the held data related to needs (first data) as the search target (step S105). Then, the processing of the control unit 11 proceeds to step S106.

After proceeding to step S106, the control unit 11 executes a matching degree calculation process, which will be described later, to calculate the matching degree of each search target (step S106). The control unit 11 extracts the data of seeds or needs to be searched whose matching degree satisfies the criteria (step S107). The control unit 11 appropriately processes the extracted data as necessary and outputs it to the terminal device 3 as matching information in an easy-to-read form (step S108; output unit, output step, output means). Then, the control unit 11 terminates the correspondence search control process.
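
The flow of steps S103 to S108 might be outlined as in the following Python sketch. It is not the embodiment's implementation: `search_matches`, `dot`, the layout of `held_data`, and the fixed `threshold` used as the criterion in step S107 are all assumptions made for illustration, and the scoring function is passed in so that any similarity measure can be used.

```python
def search_matches(input_kind, input_vector, held_data, score, threshold=0.8):
    """Return (name, score) pairs of the opposite kind that satisfy the criterion.

    held_data maps "needs" and "seeds" to {name: vector}; score is any
    similarity function for which a larger value means a better match.
    """
    if input_kind not in ("needs", "seeds"):
        raise ValueError("input_kind must be 'needs' or 'seeds'")
    # S103 to S105: needs are matched against held seeds, and vice versa.
    target_kind = "seeds" if input_kind == "needs" else "needs"
    # S106: compute the degree of matching for every search target.
    scored = [(name, score(input_vector, vec))
              for name, vec in held_data[target_kind].items()]
    # S107: keep only the targets that satisfy the criterion (assumed: a threshold).
    hits = [(name, s) for name, s in scored if s >= threshold]
    # S108: order the result for presentation, best match first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


def dot(a, b):
    """Inner product; equals the cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))


held = {
    "seeds": {"vacuum insulation": [1.0, 0.0], "fast printing": [0.0, 1.0]},
    "needs": {"keep drinks hot": [0.6, 0.8]},
}
result = search_matches("needs", [0.96, 0.28], held, dot)
```

Here only "vacuum insulation" passes the threshold for the input need, so it is the single item returned.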

Next, calculation of the degree of matching will be described.
In the matching system 100 of the present embodiment, numerical evaluation is performed by converting the contents extracted and organized in the form of verbs (predicates) and objects into multidimensional vectors. The number of dimensions of the multidimensional vector is not particularly limited, but is, for example, 50 to 200 dimensions in total. Alternatively, each element may be converted into a multidimensional vector separately, and these vectors may simply be concatenated. For example, the purpose and the content of a need may each be represented by a 50-dimensional vector, and the need may be represented by the combined 100-dimensional vector. Similarly, a seed feature and a seed function may each be represented by a 50-dimensional vector, and the seed may be represented by the combined 100-dimensional vector. Known techniques for converting natural language expressions into multidimensional vector representations include, without particular limitation, Word2vec, doc2vec derived from it, and BERT (Bidirectional Encoder Representations from Transformers). The machine learning for these conversions may be executed within the matching system 100 of the present embodiment, an already trained model may be acquired from outside and used, or an external server may be accessed to use such a program.
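
The element-by-element conversion and concatenation can be illustrated as follows. Note the assumptions: `toy_embed` is a hypothetical, hash-based placeholder that only produces deterministic numbers of the right shape and carries no semantic meaning; in practice a model such as Word2vec, doc2vec, or BERT would take its place. The names `embed_record` and `ELEMENT_DIM` are likewise illustrative.

```python
import hashlib

ELEMENT_DIM = 50  # per-element size, following the 50-dimension example in the text


def toy_embed(text, dim=ELEMENT_DIM):
    """Hypothetical stand-in for a real embedding model: a deterministic pseudo-vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    # Cycle through the digest bytes and scale each one into the range [-1, 1].
    return [digest[i % len(digest)] / 127.5 - 1.0 for i in range(dim)]


def embed_record(elements, keys):
    """Embed each element separately and concatenate the vectors in key order."""
    vector = []
    for key in keys:
        vector.extend(toy_embed(elements[key]))
    return vector


need = {"purpose": "keep drinks hot outdoors", "content": "maintain hot water temperature"}
need_vector = embed_record(need, ("purpose", "content"))
print(len(need_vector))  # 100
```

Fixing the key order matters: the same position in the combined vector must always correspond to the same element so that vectors of different records remain comparable.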

As a first example, the degree of matching is represented by the distance (an example of similarity) between a first vector representing needs and a second vector representing seeds. For example, cosine similarity is used as the distance. Cosine similarity is the inner product of two vectors divided by the product of their respective magnitudes. If the two vectors are unit vectors, the cosine similarity is simply the inner product of the two vectors. Alternatively, the distance may be represented by another index such as the Euclidean distance.
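
The two measures named above can be written out as a short Python sketch using only the standard library. The function names are illustrative, and the handling of zero vectors (raising an error) is an assumption, since the text does not discuss that case.

```python
import math


def cosine_similarity(a, b):
    """Inner product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norms


def euclidean_distance(a, b):
    """Straight-line distance between the end points of a and b."""
    return math.dist(a, b)


orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # unrelated directions
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])     # same direction
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Cosine similarity depends only on direction, so vectors of different lengths pointing the same way score close to 1, whereas the Euclidean distance also reflects differences in magnitude.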

FIG. 4 is a flow chart showing the processing procedure of the degree-of-match calculation process called in the above correspondence search control process.
This matching degree calculation process constitutes an analysis step in the matching method of this embodiment, and also constitutes an analysis means in the program 121 .
When the matching degree calculation process is called, the control unit 11 converts the content of the input data into a multidimensional vector as described above (step S151).

The control unit 11 acquires one search target data from the needs/seeds data 211 (step S152). The control unit 11 converts the acquired search target data into a multidimensional vector (step S153). The control unit 11 calculates the distance (for example, cosine similarity) between the multidimensional vector related to the input data and the multidimensional vector related to the search target data (step S154).

The control unit 11 determines whether or not all search target data has been acquired (step S155). If it is determined that all search target data has been acquired ("YES" in step S155), the control unit 11 terminates the matching degree calculation process and returns the process to the corresponding search control process.

If it is determined that not all search target data has been acquired (there is search target data that has not been acquired) ("NO" in step S155), the processing of the control unit 11 returns to step S152.

Alternatively, as another example (second example) of obtaining the degree of matching, a machine learning model for image recognition may be used. For example, a 50 × 2 pixel matrix, in which the component values of the multidimensional vectors of the purpose and the content of a need (request) are each arranged in one column, is combined with a 50 × 2 pixel matrix, in which the component values of the multidimensional vectors of the features and the functions of a seed (technology) are each arranged in one column, to generate a 50 × 4 pixel matrix (200 pixels, with each component value corresponding to a tone value). The degree of matching is then obtained from how similar this matrix pattern is to the tendency of matrix patterns observed when seeds and needs match.

FIG. 5A is a diagram explaining the setting of the degree of matching in this second example.
As shown in FIG. 5A, the 200-pixel matrix pattern is determined. Although a matrix pattern of 4 rows and 50 columns is used here, it is not limited to this. It may be a simple vector in which 200 elements are arranged in a line, or may be a matrix pattern with other numbers of rows and columns, such as 2 rows and 100 columns.
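
The arrangement of the four element vectors into a matrix pattern, and the alternative layouts mentioned above, can be sketched as follows. The helper names `to_matrix` and `reshape` are hypothetical, and the mapping of component values to tone values is left out.

```python
def to_matrix(need_purpose, need_content, seed_features, seed_functions):
    """Stack four element vectors as rows: 4 rows x 50 columns for 50-dimensional inputs."""
    rows = [list(need_purpose), list(need_content),
            list(seed_features), list(seed_functions)]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all element vectors must have the same length")
    return rows


def reshape(matrix, n_rows):
    """Rearrange the same values, in row-major order, into n_rows equal rows."""
    flat = [value for row in matrix for value in row]
    if n_rows <= 0 or len(flat) % n_rows != 0:
        raise ValueError("n_rows must evenly divide the number of values")
    width = len(flat) // n_rows
    return [flat[i * width:(i + 1) * width] for i in range(n_rows)]


matrix = to_matrix([0.0] * 50, [0.1] * 50, [0.2] * 50, [0.3] * 50)
two_by_hundred = reshape(matrix, 2)   # 2 rows x 100 columns
flat_vector = reshape(matrix, 1)[0]   # a single line of 200 values
```

Whatever layout is chosen, it has to be the same for training and for inference, because the trained model learns patterns tied to the position of each value.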

The degree of matching of the matrix pattern is obtained by inputting this 200-pixel matrix (that is, the first vector and the second vector) into the trained model 1210, which is obtained by training a machine learning model in advance. The trained model 1210 may be generated from combinations of the seeds data and the needs data stored in the database device 2, as described above.

Specifically, the distance (cosine similarity) between the second vector for the data of a certain seed and the first vector for the data of a certain need is obtained as described above. For pairs that have a high degree of matching according to this distance (a small value when the distance is taken as one minus the cosine similarity) and satisfy the matching condition, the above matrix data is generated and used as learning data. The machine learning model is trained by associating this learning data with a numerical value representing high matching ("Good") as teacher data.

On the other hand, multiple pairs are extracted that have a low degree of matching according to the distance obtained above (a large value when the distance is taken as one minus the cosine similarity) and satisfy the non-matching condition. Learning data is generated by further randomly rearranging, among the extracted sets, the vector components corresponding to each element of the seeds and each element of the needs. The machine learning model is trained by associating this learning data with a numerical value representing low matching ("Bad") as teacher data.
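
The selection of pairs for the "Good" and "Bad" learning data can be sketched as below. This is an illustration under explicit assumptions: the distance is taken as one minus the cosine similarity so that a smaller value means a closer match, and the reference values `lower` and `upper` as well as the function names are hypothetical. Zero vectors are not handled.

```python
import math


def cosine_distance(a, b):
    """One minus the cosine similarity, so that a smaller value means a closer match."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def split_by_distance(pairs, lower=0.2, upper=0.8):
    """Sort (need_vector, seed_vector) pairs into matching and non-matching groups.

    lower and upper are assumed reference values; pairs whose distance falls
    between them are used for neither group.
    """
    matching, non_matching = [], []
    for need, seed in pairs:
        d = cosine_distance(need, seed)
        if d < lower:
            matching.append((need, seed))        # candidates for "Good" learning data
        elif d > upper:
            non_matching.append((need, seed))    # source material for "Bad" learning data
    return matching, non_matching


pairs = [
    ([1.0, 0.0], [1.0, 0.0]),   # same direction: distance 0
    ([1.0, 0.0], [0.0, 1.0]),   # orthogonal: distance 1
    ([1.0, 0.0], [1.0, 1.0]),   # in between: distance of about 0.29
]
good, bad_sources = split_by_distance(pairs)
```

Leaving a gap between the two reference values keeps ambiguous pairs out of both groups, which is one way to avoid training on uncertain labels.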

Through this training, a trained model 1210 is obtained that outputs, as the degree of matching, a numerical value or the like representing the rate of high matching for the input matrix pattern (first data and second data). As the algorithm of the machine learning model, for example, a supervised pattern recognition algorithm may be used, such as a support vector machine or a neural network, in particular deep learning.
Note that the trained model that outputs needs matching an input multidimensional vector of a certain seed and the trained model that outputs seeds matching an input multidimensional vector of a certain need may be a common model or may be generated separately.
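
To make the idea of a supervised model that outputs a rate of high matching concrete, the following Python sketch trains a logistic regression, which can be viewed as a single-layer neural network, by stochastic gradient descent. It is deliberately a simplified stand-in and not the support vector machine or deep learning model mentioned above; the function names, the toy data, and the hyperparameters are all assumptions. A linear model of this kind cannot capture general interactions between need and seed components, so it only illustrates the training loop and the output format.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, x):
    """Return the estimated rate of high matching (0 to 1) for a flattened matrix x."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train(samples, labels, epochs=200, learning_rate=0.5):
    """Fit a logistic-regression model; labels are 1.0 for "Good" and 0.0 for "Bad"."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Gradient of the log loss with respect to the pre-activation value.
            error = predict((weights, bias), x) - y
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Toy, linearly separable data standing in for flattened matrix patterns.
samples = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.8, 0.0, 0.1],   # labeled "Good"
           [0.0, 0.0, 1.0, 1.0], [0.1, 0.0, 0.9, 1.0]]   # labeled "Bad"
labels = [1.0, 1.0, 0.0, 0.0]
model = train(samples, labels)
```

After training, `predict(model, x)` returns a value between 0 and 1 that can be read as the degree of matching for the pattern `x`.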

FIG. 5B is a flowchart showing the control procedure, by the control unit 11, of the trained model generation process. For this process, a machine learning model whose algorithm has been determined in advance is prepared in an untrained state. The process is started in response to a predetermined input operation from the terminal device 3, or may be started automatically when the held needs and seeds data in the database device 2 is updated.

When the learned model generation process is started, the control unit 11 converts the content of a certain seed or need into a multidimensional vector (step S201). The control unit 11 acquires one comparison target data (needs data if the input is seeds, and seeds data if the input is needs) (step S202). The control unit 11 converts the obtained comparison target data into a multidimensional vector (step S203).

The control unit 11 calculates the distance between the two obtained multidimensional vectors (step S204). The control unit 11 determines whether the calculated distance is within the lower reference (less than the lower reference value) (step S205). If it is determined to be within the lower reference ("YES" in step S205), the control unit 11 generates matrix data combining the two multidimensional vectors (step S206). The generated matrix data is not particularly limited, but may be, for example, 4 rows and 50 columns as described above. The control unit 11 inputs the generated matrix data to the machine learning model. The control unit 11 then sets the high-matching label "Good" as teacher data for this matrix data and optimizes the parameters of the machine learning model by backpropagating the deviation (error) from the output result (step S207). Then, the processing of the control unit 11 proceeds to step S210.

If it is determined in step S205 that the calculated distance is not within the lower reference (is greater than or equal to the lower reference value) ("NO" in step S205), the control unit 11 determines whether the distance is within the upper reference (greater than the upper reference value) (step S208). A distance within the upper reference is greater than one within the lower reference. Between the upper reference and the lower reference, there may be a range of distances that belongs to neither. If it is determined that the distance is within the upper reference ("YES" in step S208), the control unit 11 stores the set of original needs data and seeds data underlying the two multidimensional vectors from which the distance was obtained (step S209). Then, the processing of the control unit 11 proceeds to step S210.

After proceeding to step S210, the control unit 11 determines whether or not all the comparison target data has been acquired (step S210). If it is determined that not all the comparison target data has been acquired (some comparison target data remains unacquired) ("NO" in step S210), the processing of the control unit 11 returns to step S202. If it is determined that all the comparison target data has been acquired ("YES" in step S210), the control unit 11 determines whether or not all the needs or seeds data to be input has been input (step S211). If it is determined that not all the needs or seeds data to be input has been input (some data remains to be input) ("NO" in step S211), the processing of the control unit 11 returns to step S201. At this time, all acquisition information for the comparison target data is initialized.

When it is determined that all the needs or seeds data to be input has been input ("YES" in step S211), the control unit 11 takes the sets of needs data and seeds data stored in step S209, replaces some of the elements of one set with the corresponding elements of another stored set as appropriate, generates the multidimensional vectors again, and generates matrix data combining them (step S212). The replacement data may be chosen so that the distance between the needs and the seeds does not become small as a result of the replacement. The control unit 11 inputs the matrix data to the machine learning model. The control unit 11 then sets "Bad", which indicates a low degree of matching, as teacher data and optimizes the parameters of the machine learning model by, for example, backpropagating the difference (error) between the output result and the teacher data (step S213). After all the generated matrix data has been input and the parameters have been optimized by backpropagation of errors or the like, the trained model 1210 is stored in the storage unit 12, and the control unit 11 ends the trained model generation process.
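
The sorting of pairs in steps S204 to S211 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function names, the choice of Euclidean distance, the threshold values, and the two-dimensional toy vectors are all hypothetical, and the element replacement of step S212 and the training of the model itself are omitted.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sort_pairs(inputs, targets, lower, upper):
    """Split (input, target) index pairs by vector distance.

    Pairs closer than `lower` are treated as "Good" examples (S205 to S207).
    Pairs farther than `upper` are set aside as raw material for "Bad"
    examples (S208, S209). Pairs in between are used for neither.
    """
    good, far = [], []
    for i, u in enumerate(inputs):        # outer loop closed by step S211
        for j, v in enumerate(targets):   # inner loop closed by step S210
            d = euclidean(u, v)           # step S204
            if d < lower:
                good.append((i, j))
            elif d > upper:
                far.append((i, j))
    return good, far


# Hypothetical toy vectors standing in for vectorized needs and seeds.
needs = [[0.0, 0.0], [1.0, 1.0]]
seeds = [[0.1, 0.0], [5.0, 5.0], [1.0, 2.0]]
good, far = sort_pairs(needs, seeds, lower=0.5, upper=3.0)
print("Good pairs:", good)
print("Stored for Bad examples:", far)
```

Keeping a gap between the two thresholds, as the text allows, means that ambiguous pairs contribute to neither label.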

FIG. 6 is a flowchart showing the control procedure of the matching degree calculation process of the second example, which uses the trained model 1210 generated in this way. This matching degree calculation process includes steps S161 and S162 in place of step S154 of the matching degree calculation process of the first example. The other processes are the same; identical processing is given the same reference signs, and detailed description thereof is omitted.

When the content of the search target data has been converted into a multidimensional vector in step S153, the control unit 11 combines the multidimensional vector of the input content with the multidimensional vector of the search target data to generate matrix data (step S161). The control unit 11 inputs this matrix data to the trained model 1210 and performs the arithmetic processing of the trained model 1210. The control unit 11 acquires the degree of matching (the output of the trained model 1210) obtained as a result of this processing (step S162). Then, the processing of the control unit 11 proceeds to step S155.
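
One way the matrix data of step S161 could be assembled is sketched below. The text does not state how the two vectors are arranged; the sketch assumes, purely for illustration, that each vector has 100 components and is folded into two rows of 50, giving the 4 rows by 50 columns mentioned as an example. The function names `fold` and `combine` are hypothetical, and the trained model itself is not shown.

```python
def fold(vector, columns):
    """Fold a flat vector into rows of `columns` components each."""
    if len(vector) % columns != 0:
        raise ValueError("vector length must be a multiple of the column count")
    return [vector[k:k + columns] for k in range(0, len(vector), columns)]


def combine(first, second, columns=50):
    """Stack the folded first and second vectors into one matrix."""
    return fold(first, columns) + fold(second, columns)


# Hypothetical 100-component vectors for one need and one seed.
need_vec = [0.01 * k for k in range(100)]
seed_vec = [1.0 - 0.01 * k for k in range(100)]

matrix = combine(need_vec, seed_vec)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

Arranging the components as a fixed-size grid is what lets a pattern recognition model treat the pair of vectors like an image.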

Alternatively, a collaborative filtering technique may be used as a third example of calculating the degree of matching.
Collaborative filtering defines correspondences between two kinds of parameter (here, needs and seeds). When one parameter is input, items of the other parameter that have not yet been selected or output for that input are selected and output, based on the selection tendencies of similar items.

FIG. 7A is a diagram illustrating the correspondence between needs and seeds related to this collaborative filtering.
In FIG. 7A, needs and seeds are arranged in a matrix, and "1" is input for the corresponding relationship that has been selected. Here, the tendencies of the seeds selected for the need 03 and the need NM are similar (seeds 02, 05, 06, etc.), and the corresponding relationships are close. Here, when the need 03 is newly input, the seeds 01 that have been selected for the needs NM and not selected for the needs 03 are output.
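
The idea illustrated by FIG. 7A can be sketched in a few lines. The selection table below is hypothetical: only the overlap of need 03 and need NM on seeds 02, 05 and 06 and the output of seed 01 follow the description, while the row for need 01 is invented for contrast. The Jaccard index is used here as one simple measure of how similar two selection tendencies are; it is an assumption of this sketch, not a measure named in the text.

```python
def jaccard(a, b):
    """Overlap of two selection sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(selections, target):
    """Find the need most similar to `target` and return what only it selected."""
    others = [name for name in selections if name != target]
    nearest = max(others, key=lambda name: jaccard(selections[target], selections[name]))
    return nearest, sorted(selections[nearest] - selections[target])


# Hypothetical selection table (a "1" in FIG. 7A corresponds to set membership).
selections = {
    "need 01": {"seed 03", "seed 04"},
    "need 03": {"seed 02", "seed 05", "seed 06"},
    "need NM": {"seed 01", "seed 02", "seed 05", "seed 06"},
}

nearest, suggested = recommend(selections, "need 03")
print(nearest, suggested)
```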

In the matching system 100 of the present embodiment, this collaborative filtering technique is used as follows. When needs data or seeds data (one kind) is input, held needs or seeds of the same kind whose multidimensional vectors are close to the multidimensional vector of the input content are selected, and the seeds or needs (the other kind) that correspond to the selected needs or seeds are then selected. That is, since the input needs or seeds are not necessarily identical to those already held, close ones are used instead. In this collaborative filtering, appropriate output cannot be produced until a certain number of selections have been made and the similarity tendencies have been established.

In this case, the selection that defines a correspondence is not limited to a selection operation in the correspondence search control process. It may also include correspondences established through the development and sale of actual products. Specifically, instead of the database generation process described above, which extracts seeds and needs from separate text data, needs data described in association with seeds data in data such as product development information and sales information may be acquired and defined as being in a selected correspondence.

FIG. 7B is a flowchart showing a control procedure by the control unit 11 for matrix generation processing related to collaborative filtering.
First, the control unit 11 acquires the content of each seed and need (one to four elements each) stored in the database device 2 and converts each into a multidimensional vector (step S251).

The control unit 11 assigns each seed and need, together with its multidimensional vector, to a row or column of the two-dimensional matrix (step S252). For each cell representing a combination of a seed and a need, the control unit 11 sets "1" in the cells for combinations that have been selected (step S253). For example, every cell may be initialized to "0", and only the cells to be set to "1" need be changed from "0" to "1". Then, the control unit 11 terminates the matrix generation process.
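
Steps S252 and S253 amount to filling a zero-initialized table. A minimal sketch follows, assuming rows are needs and columns are seeds; the function name `build_matrix` and the sample names are hypothetical, and the multidimensional vectors attached to each row and column are left out for brevity.

```python
def build_matrix(needs, seeds, selected_pairs):
    """Return a needs-by-seeds matrix with 1 for each selected combination."""
    # Every cell starts at 0; only selected cells are changed to 1.
    matrix = [[0] * len(seeds) for _ in needs]
    row = {name: i for i, name in enumerate(needs)}
    col = {name: j for j, name in enumerate(seeds)}
    for need, seed in selected_pairs:
        matrix[row[need]][col[seed]] = 1
    return matrix


# Hypothetical needs, seeds and selection history.
needs = ["need 01", "need 02"]
seeds = ["seed 01", "seed 02", "seed 03"]
selected = [("need 01", "seed 02"), ("need 02", "seed 01"), ("need 02", "seed 03")]

for line in build_matrix(needs, seeds, selected):
    print(line)
```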

FIG. 8 is a flow chart showing the control procedure by the control unit 11 of the matching degree calculation process of the third example using this collaborative filtering technique.
In this matching degree calculation process, only step S151 is shared with the matching degree calculation process of the first example; the steps that follow it are replaced by steps S171 to S175.

Following step S151, the control unit 11 calculates the distance between the input data and each item of data of the same classification (seeds or needs) set in the matrix (step S171). The control unit 11 extracts a reference number of data items in order of closeness, starting from the smallest calculated distance (step S172).

The control unit 11 selects other classified data that has been selected corresponding to the extracted same classified data (step S173). The control unit 11 calculates a score for each of the selected other classification data (step S174). The score may be determined based on the absolute value of the cosine similarity, the reciprocal of the Euclidean distance, or the like so that the larger the distance (the smaller the degree of similarity), the smaller the score.

The control unit 11 outputs each selection data together with the score (step S175). Then, the control unit 11 terminates the matching degree calculation process and returns the process to the correspondence search control process.
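
Steps S171 to S175 can be sketched as a nearest-neighbour lookup followed by scoring. Several details here are assumptions made for the sketch: the function and variable names, the use of Euclidean distance, the score 1 / (1 + d) (a variant of the reciprocal distance that avoids division by zero), and the rule that an item reached through several neighbours keeps its highest score.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_by_collaborative_filtering(query, stored, selections, k):
    """Return (item, score) pairs of the other classification, best first."""
    # S171: distance from the input to every held item of the same classification.
    distances = sorted((euclidean(query, vec), name) for name, vec in stored.items())
    scores = {}
    for d, name in distances[:k]:              # S172: the k closest items
        score = 1.0 / (1.0 + d)                # S174: larger distance, smaller score
        for other in selections.get(name, []):  # S173: items selected for them
            scores[other] = max(scores.get(other, 0.0), score)
    # S175: output each selected item together with its score.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical held needs (as vectors) and the seeds selected for each.
stored_needs = {"need A": [0.0, 0.0], "need B": [3.0, 4.0], "need C": [10.0, 10.0]}
selections = {"need A": ["seed 1"], "need B": ["seed 1", "seed 2"], "need C": ["seed 3"]}

result = match_by_collaborative_filtering([0.0, 0.0], stored_needs, selections, k=2)
for name, score in result:
    print(name, round(score, 3))
```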

FIG. 9 is a diagram showing an example of output results.
Here, a bar graph is shown that lists, in percent, the degree of matching of some of the requirements (needs) held in advance with respect to the technology (seed) D.

By outputting image data for illustration in this way, the result of the degree of matching can be shown to the user of the terminal device 3 (matching system 100) in a more intuitive and easily understood form. At this time, the reference value for list display (such as the lower limit of the degree of matching) may be changed in accordance with the user's input operation on the operation reception unit 34, so that the accuracy the user requires of the output result can be adjusted. When the reference value is set high, the display is narrowed down to highly accurate information only; when the user wants a larger amount of reference information rather than accuracy, the reference value is set low so that a wide variety of information is displayed.
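
The effect of the reference value on the displayed list can be sketched as a simple filter. The function name and the percentages below are hypothetical and are not taken from FIG. 9.

```python
def filter_for_display(results, reference_value):
    """Keep results at or above the reference value, highest first."""
    kept = [(name, score) for name, score in results if score >= reference_value]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical degrees of matching (percent) of held needs for one seed.
results = [("need A", 82.0), ("need B", 47.5), ("need C", 65.0)]

print(filter_for_display(results, 60.0))  # high reference value: fewer, stronger matches
print(filter_for_display(results, 40.0))  # low reference value: more, weaker matches included
```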

As described above, the matching system 100 of this embodiment includes the control unit 11 of the information processing device 1. The control unit 11, as an acquisition unit, acquires first data related to needs and second data related to seeds; as an analysis unit, quantifies and analyzes the acquired first data and second data; and, as an output unit, uses the analysis result to output matching information indicating the degree of matching between the first data and the second data.
In this way, seeds and needs are quantified and the degree of matching is judged quantitatively, which makes it possible to obtain information about the other that matches a given seed or need. Therefore, according to this matching system 100, the match between seeds and needs can be quantitatively evaluated more easily and objectively.

In addition, the control unit 11, as an analysis unit, converts the first data and the second data into vectors in a multidimensional space when quantifying seeds and needs. Natural language expressions of seeds and needs still contain a large amount of information even when summarized concisely. By converting them into multidimensional vectors, numerical values corresponding to their meaning can be obtained more accurately.

Further, the control unit 11, as the analysis unit, calculates the degree of matching based on the degree of similarity (for example, cosine similarity) between the first vector obtained from the first data and the second vector obtained from the second data. With such processing, a numerical similarity can be obtained from the first vector and the second vector by simple calculation.
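
As an illustration of how simple this calculation is, cosine similarity can be written in a few lines of standard Python. The function name and the sample vectors are hypothetical; an actual system would apply the same formula to the vectors produced by its own text-to-vector conversion.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, from -1.0 to 1.0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical low-dimensional vectors for one need and two seeds.
need = [1.0, 2.0, 0.0]
seed_close = [2.0, 4.0, 0.1]
seed_far = [0.0, 0.0, 3.0]

print(cosine_similarity(need, seed_close))  # close to 1: similar direction
print(cosine_similarity(need, seed_far))    # 0: orthogonal, no similarity
```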

In addition, the control unit 11, as the analysis unit, uses the trained model 1210, which takes the first vector and the second vector as inputs and outputs the degree of matching. By appropriately training the machine learning model so that it outputs the degree of matching between the first vector and the second vector, the matching system 100 can obtain an objective and quantitative evaluation.

Also, this trained model 1210 is based on a pattern recognition algorithm. By performing pattern recognition with this trained model 1210 on the matrix pattern in which the component values of the multidimensional vectors are arranged, the matching system 100 can appropriately and quantitatively evaluate the degree of overall similarity of the multidimensional vectors.

The terminal device 3 also includes an operation reception unit 34 as an input unit that acquires input data.
The control unit 11, as an acquisition unit, acquires the first data or the second data from the input data. When one of needs and seeds is specified and input in this way, the degree of matching with the other is calculated for most of the data held in the database device 2 (here, in a round-robin manner). Therefore, in the matching system 100, when a user wishes to obtain information on the other that matches a given need or seed, the desired information can be obtained easily and appropriately.

Also, the first data includes at least one predetermined element related to needs, and the second data includes at least one predetermined element related to seeds. By standardizing the extracted items in this way and making it easier to associate the first data with the second data, the matching system 100 can calculate the degree of matching more appropriately and improve the accuracy of the matching information.

The elements of the first data include at least one of the purpose, content, field, and name of the needs, and the elements of the second data include at least one of the features, functions, competing technologies, and names of the seeds. By using as input data information that expresses needs and seeds simply and concisely in this way, these data can be quantified more accurately while noise is reduced. Therefore, this matching system 100 can improve the accuracy of the output matching information, including the result of the degree of matching.

Further, when the control unit 11 acquires the input data accepted by the operation accepting unit 34 as an acquisition unit, the control unit 11 extracts at least one of the eight elements related to the elevator pitch syntax from the input data.
In this way, elements may be automatically extracted from input document data. This eliminates the need to organize and generate input information manually in advance, thereby reducing labor.

Also, the first data and the second data each include at least a noun or a verb indicating a function. In particular, for needs and seeds relating to technical content, acquiring input data that includes natural language expressions appropriately describing what operation is performed or required makes it possible to calculate the degree of matching with high accuracy.

Also, the first data and the second data each include an object for a noun or a verb indicating a function. In addition to the above, since the target of the operation is also included in the first data and the second data, it is possible to obtain input data that more accurately expresses the operation content or the request content in a concise manner. It is also possible to improve the accuracy of the fitness results obtained based on the quantified data.

In addition, the matching method of the present embodiment includes an acquisition step of acquiring first data related to needs and second data related to seeds, an analysis step of quantifying and analyzing the acquired first data and second data, and an output step of outputting matching information indicating the degree of matching between the first data and the second data using the analysis result of the analysis step. With such a matching method, seeds and needs can be quantified and the degree of matching judged quantitatively, so that information about the other can be obtained. Therefore, according to this matching method, the match between seeds and needs can be quantitatively evaluated more easily and objectively.

In addition, by installing the program 121 for the matching method on a computer (such as the information processing device 1) and executing it, the match between seeds and needs can be quantitatively evaluated easily and objectively on a general-purpose device without requiring a special configuration.

In addition, the trained model 1210 of the present embodiment takes as inputs the first vector obtained from the first data related to needs and the second vector obtained from the second data related to seeds, and outputs the degree of matching between the first data and the second data. With such a trained model 1210, the overall degree of matching between the seeds data and the needs data can be obtained as a more appropriate and objective value.

It should be noted that the present invention is not limited to the above embodiments, and various modifications are possible.
For example, in the above embodiment, one item of needs or seeds data is input, the degree of matching with each of a plurality of held data items of the other kind is calculated, and those with a high degree of matching are output. However, when large numbers of needs data and seeds data are both provided, they may be combined in a round-robin manner to detect omissions in implementation.

Further, in the above embodiment, the needs/seeds data 211 held in the storage unit 21 of the database device 2 is acquired in the matching degree calculation process and then converted into a multidimensional vector, but the multidimensional vectors may instead be held in advance in the needs/seeds data 211. In this case, when the trained model for converting natural language data into multidimensional vectors is updated, the multidimensional vector data in the needs/seeds data 211 may be updated as needed.

Also, in the third example of the above embodiment, only a binary value is set in collaborative filtering depending on whether or not an output has been made, but the present invention is not limited to this. Multi-valued settings may be made by adding or weighting values according to the output frequency, the actual implementation status after output, and, regardless of the output result, users' impressions (such as "likes") of ideas developed and distributed for the combination. The value to be set is not limited to an integer and may be any real number, positive or negative. The matrix data may be updated as needed by performing the matrix generation process with these taken into account. Further, although this collaborative filtering has been described as considering the degree of similarity between the multidimensional vectors of the input data and the held data of the same kind, the degree of similarity need not always be considered. The processing may be switched so that the similarity is no longer considered once it becomes unnecessary, for example as the number of output settings increases.

Also, in the above embodiment, quantification has been described as conversion into multidimensional vectors, but information within a range that can be expressed by scalar values may be expressed by scalar values alone.

Further, in the above embodiment, as the second example, in which the entire component array of the first vector is compared with the entire component array of the second vector, the degree of matching is obtained based on image recognition technology, but the present invention is not limited to this. For example, the one-dimensional component arrays of the first vector and the second vector may each be approximated as a waveform, and the degree of matching may be obtained based on the similarity of the waveforms.

In addition, when a machine learning model is used, methods based on decision trees, such as random forests and gradient boosting, may be applied to the above component arrays as the machine learning model, instead of the methods mainly used for pattern recognition described above.

Also, in the above embodiment, it is assumed that image data including graphs and the like is generated and output as output data, but the present invention is not limited to this. Text data or the like may simply be output, a predetermined dedicated output format may be defined in a structured language, or data may be output in the standard data format of spreadsheet software. Also, the output may be sent to a printer or the like to form an image, instead of being sent back to the terminal device 3.

Also, in the above embodiment, it was explained that four elements each of needs and seeds are extracted according to the structure of the elevator pitch syntax and used for quantification, but it is not limited to this. The input data may be concisely arranged according to other criteria, or may be simply extracted from the data expressed in natural language in units of necessary sentences and clauses without rearrangement.

Further, in the above embodiment, the description assumes that a noun or verb indicating a function, its object, and, if necessary, modifiers are arranged in combination and then quantified, but the present invention is not limited to this. Adjectives and the like may also be included.

Also, in the above embodiment, the information processing device 1, the database device 2, and the terminal device 3 are described as separate configurations, but all processing may be performed by a single computer (matching device). Conversely, the correspondence search control process may be distributed among the control units of a plurality of server devices. Moreover, the terminal device 3 is not limited to one specific device, and a plurality of terminal devices may exist.

In the above description, the storage unit 12, which consists of a non-volatile memory such as an HDD or a flash memory, has been given as an example of a computer-readable medium storing the program 121 for control such as the calculation of the degree of matching according to the present invention, but the medium is not limited to this. Other non-volatile memories such as MRAM, and portable recording media such as CD-ROMs and DVD discs, can also be used as computer-readable media. A carrier wave is also applicable to the present invention as a medium for providing the program data according to the present invention via a communication line.
In addition, the specific configurations, contents and procedures of processing operations, etc. shown in the above embodiments can be changed as appropriate without departing from the scope of the present invention. The scope of the present invention includes the scope of the invention described in the claims and the scope of equivalents thereof.

This invention can be used for matching systems, matching methods, programs, and trained models.

1 information processing device
11 control unit
12 storage unit
121 program
1210 trained model
13 communication unit
2 database device
21 storage unit
211 needs/seeds data
3 terminal device
31 control unit
32 communication unit
33 display unit
34 operation reception unit
100 matching system
N network

Claims

1. A matching system comprising:
an acquisition unit that acquires first data related to needs and second data related to seeds;
an analysis unit that quantifies and analyzes the acquired first data and second data; and
an output unit that outputs matching information indicating the degree of matching between the first data and the second data using the result of analysis by the analysis unit.
2. The matching system according to claim 1, wherein the analysis unit converts the first data and the second data into vectors in a multidimensional space in the quantification.
3. The matching system according to claim 2, wherein the analysis unit calculates the degree of matching based on the degree of similarity between the first vector obtained from the first data and the second vector obtained from the second data.
4. The matching system as described, wherein the analysis unit has a trained model that takes as inputs a first vector obtained from the first data and a second vector obtained from the second data and outputs the degree of matching.
5. The matching system according to claim 4, wherein the trained model is based on a pattern recognition algorithm.
6. The matching system according to any one of claims 1 to 5, further comprising an input unit that acquires input data,
wherein the acquisition unit acquires the first data or the second data from the input data.
7. The matching system according to any one of claims 1 to 6, wherein the first data includes at least one predetermined element related to needs, and
the second data includes at least one predetermined element related to seeds.
8. The matching system according to claim 7, wherein the elements of the first data include at least one of the purpose, content, field, and name of the needs, and
the elements of the second data include at least one of the features, functions, competing technologies, and names of the seeds.
9. The matching system according to claim 7, further comprising an input unit that acquires input data,
wherein the acquisition unit extracts the element when acquiring the first data or the second data from the input data.
10. The matching system according to any one of claims 1 to 9, wherein the first data and the second data each include at least a noun or a verb indicating a function.
11. The matching system according to claim 10, wherein the first data and the second data each include an object for the noun or verb indicating the function.
12. A matching method performed by a control unit of a computer, the method comprising:
an acquisition step of acquiring first data related to needs and second data related to seeds;
an analysis step of quantifying and analyzing the acquired first data and second data; and
an output step of outputting matching information indicating the degree of matching between the first data and the second data using the analysis result of the analysis step.
13. A program that causes a computer to function as:
acquisition means for acquiring first data related to needs and second data related to seeds;
analysis means for quantifying and analyzing the acquired first data and second data; and
output means for outputting matching information indicating the degree of matching between the first data and the second data using the result of analysis by the analysis means.
14. A trained model that takes as inputs a first vector obtained from first data related to needs and a second vector obtained from second data related to seeds, and outputs the degree of matching between the first data and the second data.