WO2022252014A1 - Method for intelligently matching the supply and demand of innovation and entrepreneurship services - Google Patents
- Publication number
- WO2022252014A1 (application PCT/CN2021/097254)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- supply
- demand
- innovation
- matching
- mapping
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06313—Resource planning in a project environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
Definitions
- The invention belongs to the field of innovation and entrepreneurship technology services, and specifically relates to a method for intelligently matching the supply and demand of innovation and entrepreneurship services.
- This patent is based on the integrated research, development, and application demonstration of AI-based innovation and entrepreneurship service technology. It addresses the problem of quickly (for example, within one week) building a complete innovation and entrepreneurship service platform for a given region (such as Xiong'an or Hainan) or industry (such as high-speed rail training). There are many mass innovation and entrepreneurship service platforms, but they struggle to serve the new industries and regions that keep emerging, and they struggle to respond quickly to the "chain chief" (链长) of a local industrial chain.
- This patent connects many innovation and entrepreneurship service platforms to a fourth-party service platform, which greatly shortens the construction period and improves efficiency for all parties.
- The platform has been demonstrated successfully in innovation and entrepreneurship ecosystems with distinctive Chinese characteristics, including high-speed rail training, innovation and entrepreneurship in Xiong'an New Area, Hainan, Party-building work, and national high-tech parks.
- The new fourth-party service platform of this patent uses key common artificial intelligence technologies developed for the needs of mass entrepreneurship and innovation, including service robots, domain transfer learning, and knowledge graphs for seven innovation and entrepreneurship domains. The patent describes both this combination and its routine deployment as firsts in the industry.
- The main steps are: (1) artificial intelligence technology automatically discovers regional and industry characteristics, vectorizes them, assigns weights, and completes a regional or industry portrait; (2) the module (microservice) factory of the existing platform matches modules to the region or industry, assembles processes automatically, and quickly assembles the platform; (3) regional or industry customization is achieved with a small amount of manual drag-and-drop process editing and low-code configuration.
- This patent includes the following content: a method for intelligently matching the supply and demand of innovation and entrepreneurship services, including the following steps (see Figure 1 for details):
- User portrait: obtain the user's direct attributes, computed attributes, and tag attributes.
- Digital twin: add the three types of attributes from step 1 to the digital twin of the organization to form a model.
- Supply and demand mapping: obtain the text describing the supply, establish a mapping from text to features, calculate the similarity between the features and the user, and obtain the supply and demand mapping.
- Step 1 of the method includes the following process:
- A portrait is generated according to the portrait rules.
- In step 3 of that process, there are three ways to obtain attribute values:
- DTO: Digital Twin of an Organization.
- The user's direct attributes, computed attributes, and tag attributes are added to the DTO to form a model. Scene elements are then added, so that the combination of user attribute × scene, that is, the (user attribute, scene) pair, becomes the most basic unit of the model.
- The text describing the supply is obtained and a mapping from text to features is established; the similarity between the features and the user's attributes is then calculated to obtain the supply and demand mapping.
- The user's digital twin model triggers a change in demand and updates the supply and demand mapping; scene elements are added, and the (user attribute, scene) combination serves as the trigger condition.
- The scenarios that the technology can serve include but are not limited to: (1) providing relatively complete information before an investment fund's "invest" and "exit" decisions; (2) between "investing" and "exiting", tracking the company's changes in scientific and technological innovation in real time to support prediction; (3) offline face-to-face consultation; (4) how demand is raised: enterprises consult online and provide keywords; (5) how supply is delivered: online real-time question answering, online search, and online reading.
- This patent studies intelligent matching of innovation and entrepreneurship service supply and demand, and transforms the supply-demand matching problem into automated attribute generation and element configuration. The main innovations are: (1) raw data is normalized using the "science and technology innovation attributes" (科创属性) framework of the China Securities Regulatory Commission, including the extraction of attributes and the calculation of their element configuration; (2) after a digital twin is established for the enterprise, the model of this patent is used to make predictions.
- Figure 1 is a schematic flow chart of the method for intelligently matching the supply and demand of innovation and entrepreneurship services.
- Figure 2 shows the digital twin model of an enterprise's asset process, with two-dimensional input.
- Fig. 3 is a working schematic diagram of how the matching technology solves the talent policy recommendation problem in the embodiment of this patent.
- Fig. 4 is a schematic diagram of a talent portrait in the embodiment of this patent.
- Fig. 5 is a pLSA model diagram of the patent embodiment.
- Fig. 6 is a schematic diagram of a policy intelligent analysis process in this patent embodiment.
- In the embodiment, the applicant uses this technical solution to solve the talent policy recommendation problem with matching technology. Governments at all levels have many policies on talent introduction, innovation, and entrepreneurship, and talents cannot quickly make sense of them.
- It is therefore necessary to establish a policy-to-talent matching system; the working framework of this embodiment is shown in Figure 2. It addresses how to describe a talent, how to describe a policy, how to establish a mapping from policy to talent, and how to help users find the policy they are looking for when talent information is insufficient.
- Talent portraits solve the problem of describing talent.
- User portrait refers to the process of information mining and analysis application by obtaining different dimensional attribute information (such as demographic characteristics, interest preferences and behavior patterns, etc.) that constitute the user model.
- Talent portrait is the application of user portrait in the field of talent management. It can clearly and comprehensively display the characteristics of talents in the industry, and provide intelligent management and decision support for the talent work of the talent management department.
- Talent portraits consist of three types of data:
- Tag attributes: labels that classify talents according to one or more attributes in the talent portrait, such as Class A talent, senior returnee, and talent at risk of leaving.
- The classification model can be formulated manually by experts or obtained by machine learning. Compared with direct attributes and computed attributes, tag attributes have the benefits of data desensitization and dimensionality reduction.
- The data flow of talent profiling is as follows.
- The system collects talent-related data from multiple sources and forms raw talent data through ETL (extraction, transformation, and loading) and coordination. Part of the raw data directly becomes the basic attribute values of the talent, and another part is analyzed and processed to obtain the computed attribute values. Then, based on these attributes, talents are tagged automatically through machine learning or deep learning to obtain the tag attribute values. Finally, the talent portrait provides data to applications through the following two data interfaces (see Figure 3 for details):
- Only good data yields good talent portraits and good intelligence. Applications with good data should be selected, and measures such as data cleaning, data auditing, and data governance should be taken to ensure data quality.
- An incremental, iterative development method is adopted: high-priority requirements are implemented in the earlier iterations, versions are released quickly one after another, and improvements are made in the next iteration according to feedback.
- the goal of text mining is to mine useful information from texts to assist the development of downstream applications.
- The objects of text mining mainly include the various text materials submitted by candidates (such as resumes) and text data crawled from various data sources (such as microblog posts, published articles, and comments).
- Through text mining, we can obtain structured user information and the topic information of users' works, which provide feature information for talent classification.
- Text information extraction refers to extracting the required information from structured or unstructured text.
- Rule-based text information extraction means formulating a corresponding rule for each slot to be extracted; when the text is scanned, the rules are matched and the text that conforms to them is extracted.
- text feature extraction is to extract features from text and use them for feature input of machine learning such as classification.
- Short text keyword feature extraction technology is mainly divided into keyword extraction based on statistical machine learning and keyword extraction based on deep learning.
- Keyword extraction based on deep learning generally uses a sequence tagging model.
- the input of the network is a text vector, and the output is the probability of whether the word at each position should be output as a keyword.
- This kind of model relies on a large amount of pre-labeled training data and is not well suited to the actual application scenario of this project. Keyword extraction based on statistical machine learning is therefore recommended.
- Keyword extraction based on statistical learning is mainly based on the TF-IDF model.
- The TF-IDF model describes the importance of words in documents; its calculation formula is given in Equation 1.
- $n_{i,j}$ represents the number of times word $i$ appears in document $j$
- $k$ represents the size of the vocabulary
- $|D|$ represents the total number of documents in the corpus
- $|\{j : t_i \in d_j\}|$ represents the number of documents containing word $i$.
- the pLSA topic model is shown in Figure 5. Where d stands for text, z stands for topic, and w stands for word.
- each document is composed of multiple topics, and each topic is composed of multiple words.
- Although each piece of text carries only one or a few label categories, the text does not contain information about those categories alone; rather, information about those categories is the most abundant. We can therefore apply the idea of pLSA to analyze the topic composition of each word in the text, build a model, and label the data.
- the topic distribution proportion of each word or phrase is calculated as a parameter of the model.
- the topic distribution of the document is obtained by superimposing the topic distribution proportion of the words in the new document, and the topic with a higher probability is taken as the final extracted text topic feature.
- the policy text generally includes applicants, application conditions, application time limit, application materials, subsidy standards, etc.
- policy recommendations can be made based on rule matching.
- When the number of policies is large enough, they can be used as a labeled training data set to train an AI recommendation model.
- The first step is to analyze the original policy text and extract the entities and relationships in it. With these extracted entity relationships as input and the multi-dimensional policy features in the policy feature vector as output, a machine learning algorithm is trained to obtain a mapping model from policy text to policy features, which is then mapped to talents through the policy features.
- the key technology in this process is the entity relationship extraction based on the policy text.
- the policy feature vector in the second step can be used to construct a feature vector-based entity relationship extraction method. It can also be considered to construct a policy dependency tree, establish an entity relationship extraction method based on the dependency tree kernel function, and complement each other with a feature-based entity relationship extraction method (see Figure 6 for the entire process flow).
- Word vectors (word2vec) pre-trained on a large-scale corpus can be used to represent both the user question and each existing question as the mean of the word vectors of the words it contains. The cosine similarity of the two is calculated and weighted with the edit distance between the sentences to give the final similarity measure, which is used to rank and return similar questions and their answers.
- word2vec is both a neural-network-based learning algorithm and a tool for computing word vectors.
- The training result produced by this tool, the word embedding, measures the similarity between words well.
- The algorithm of this project mainly uses a word2vec model trained on a large-scale corpus to calculate the similarity of policy questions. Because policy question answering is a specialized domain, the similarity calculated by a word2vec model trained on a general corpus may be inaccurate, so the edit distance between sentences is introduced as a correction.
- NL2SQL technology: when semantic matching fails and the answer to a policy question cannot be found in the existing question-and-answer database, NL2SQL technology is used to convert the question directly into a query over structured policy data.
- The key technology of this step is building the NL2SQL model. Because Chinese NL2SQL annotation data is scarce, an NL2SQL model cannot be trained on a Chinese corpus, so a transfer learning approach is adopted: the Chinese question is translated into English, a model trained on English NL2SQL data converts the English question into an English SQL query, and the query keywords in that SQL statement are then translated into the corresponding Chinese keywords to achieve Chinese NL2SQL.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Educational Administration (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biodiversity & Conservation Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
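The invention provides a method for intelligently matching the supply and demand of innovation and entrepreneurship services. It studies intelligent supply-demand matching technology for such services and transforms the matching problem into automated attribute generation and element configuration. The main innovations are: (1) raw data is normalized using the "science and technology innovation attributes" (科创属性) framework of the China Securities Regulatory Commission (CSRC), including the extraction of attributes and the calculation of their element configuration; (2) after a digital twin is established for an enterprise, the model of this solution is used to make predictions. The solution comprises user portraits, digital twins, and supply-demand mapping: text describing the supply is obtained, a mapping from text to features is established, and the similarity between the features and the user is calculated to obtain the supply-demand mapping. Based on the digital twin model, the mapping and interaction from supply to demand are studied, and the similarity of the supply match is adaptively adjusted as demand changes, so as to obtain an accurate match between the two.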
Description
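The invention belongs to the field of innovation and entrepreneurship technology services, and specifically relates to a method for intelligently matching the supply and demand of innovation and entrepreneurship services.
There are currently many mass innovation and entrepreneurship service platforms, but they struggle to serve the new industries and regions that keep emerging, and they struggle to respond quickly to the "chain chief" (链长) of a local industrial chain. For example: many enterprises are unaware of policies, and applications are prepared by hand with much duplicated work; enterprises often need face-to-face help from service agencies to interpret policies; investment managers learn about portfolio companies mainly through competitions, introductions by acquaintances, and meetings; incubators patiently answer similar questions from batch after batch of incubated enterprises; third-party enterprises can be recommended to the chain chief of a local industrial chain only through the reputation of third-party platforms or the fieldwork of government staff; and innovation and entrepreneurship training materials are recommended to large numbers of trainees based on the experience of teaching and research groups.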
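Summary of the invention
This patent is based on the integrated research, development, and application demonstration of AI-based innovation and entrepreneurship service technology. It addresses the problem of quickly (for example, within one week) building a complete innovation and entrepreneurship service platform for a given region (such as Xiong'an or Hainan) or industry (such as high-speed rail training). Its innovation lies in solving a current gap: there are many mass innovation and entrepreneurship service platforms, but they struggle to serve the new industries and regions that keep emerging and to respond quickly to the chain chief of a local industrial chain. This patent connects many such platforms to a fourth-party service platform, which greatly shortens the construction period and improves efficiency for all parties. The platform has been demonstrated successfully in innovation and entrepreneurship ecosystems with distinctive Chinese characteristics, including high-speed rail training, innovation and entrepreneurship in Xiong'an New Area, Hainan, Party-building work, and national high-tech parks. In addition, the new fourth-party service platform uses key common artificial intelligence technologies developed for innovation and entrepreneurship needs, including service robots, domain transfer learning, and knowledge graphs for seven innovation and entrepreneurship domains. The patent describes both this combination and its routine deployment as firsts in the industry. The main steps are: (1) artificial intelligence technology automatically discovers regional and industry characteristics, vectorizes them, assigns weights, and completes a regional or industry portrait; (2) the module (microservice) factory of the existing platform matches modules to the region or industry, assembles processes automatically, and quickly assembles the platform; (3) regional or industry customization is achieved with a small amount of manual drag-and-drop process editing and low-code configuration.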
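This patent includes the following content: a method for intelligently matching the supply and demand of innovation and entrepreneurship services, comprising the following steps (see Figure 1 for details):
1. User portrait: obtain the user's direct attributes, computed attributes, and tag attributes;
2. Digital twin: add the three types of attributes from step 1 to the digital twin of the organization to form a model;
3. Supply-demand mapping: obtain the text describing the supply, establish a mapping from text to features, calculate the similarity between the features and the user, and obtain the supply-demand mapping;
4. Based on the digital twin model, obtain the mapping and interaction from supply to demand, and adaptively adjust the similarity of the supply match as demand changes, so as to obtain an accurate match between the two.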
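Preferably, step 1 of the method includes the following process:
1. Generate a portrait according to the portrait rules from the business registration, financial, and intellectual property information provided by the enterprise, together with dynamic information produced as the enterprise grows, such as team, floor area, and financing status;
2. Transform the user portrait problem and establish the tag attributes: taking the CSRC science and technology innovation attributes as the main categories, establish multiple tag attributes, including research and development, patents, awards, import substitution, and team;
3. Obtain the values of the tag attributes.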
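Preferably, in step 3 of that process, the attribute values are obtained in three ways:
(1) part of the raw data directly becomes the basic attribute values;
(2) another part of the raw data is analyzed and processed to obtain the computed attribute values;
(3) tags are assigned automatically through machine learning or deep learning to obtain the tag attribute values.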
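Preferably, step 2 of the method adopts the architecture of the Digital Twin of an Organization (DTO): the user's direct attributes, computed attributes, and tag attributes are added to the DTO to form a model. Scene elements are then added, so that the combination of user attribute × scene, that is, the (user attribute, scene) pair, becomes the most basic unit of the model.
Preferably, in step 3 of the method, the text describing the supply is obtained and a mapping from text to features is established; the similarity between the features and the user's attributes is calculated to obtain the supply-demand mapping.
Preferably, in step 4 of the method, the user's digital twin model triggers a change in demand and updates the supply-demand mapping; scene elements are added, and the (user attribute, scene) combination serves as the trigger condition.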
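The (user attribute, scene) unit and the demand-triggered update of the supply-demand mapping can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the class and field names (`OrganizationTwin`, `units`, `supplies`, `mapping`), the sparse-dictionary feature representation, and the choice of cosine similarity are all hypothetical, since the patent does not specify a similarity measure for this step.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class OrganizationTwin:
    """Toy digital twin: demand is stored per (user attribute, scene) pair."""
    supplies: dict[str, dict[str, float]]  # supply name -> feature weights
    units: dict[tuple[str, str], float] = field(default_factory=dict)
    mapping: list[tuple[str, float]] = field(default_factory=list)

    def demand_vector(self, scene: str) -> dict[str, float]:
        """Collect the attribute weights recorded for one scene."""
        return {attr: w for (attr, s), w in self.units.items() if s == scene}

    def update(self, attribute: str, scene: str, weight: float) -> None:
        """A change to one (attribute, scene) unit triggers re-matching."""
        self.units[(attribute, scene)] = weight
        demand = self.demand_vector(scene)
        self.mapping = sorted(
            ((name, cosine(demand, feats)) for name, feats in self.supplies.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
```

Each call to `update` plays the role of the trigger condition: the mapping is recomputed only for the scene whose attribute changed, and `mapping` always holds the supplies ranked by similarity to the current demand.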
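The systems that this technology can support are comparable to: (1) Kima Ventures of France, one of the world's most active early-stage investment funds, where three people invest in 2 to 3 early-stage projects per week; (2) EQT Ventures of Sweden, which uses neural networks to track several million projects; (3) SignalFire of the United States, which tracks 6 million projects as well as talent.
The scenarios that the technology can serve include but are not limited to: (1) providing relatively complete information before an investment fund's "invest" and "exit" decisions; (2) between "investing" and "exiting", tracking the enterprise's changes in scientific and technological innovation in real time to support prediction; (3) offline face-to-face consultation; (4) how demand is raised: enterprises consult online and provide keywords; (5) how supply is delivered: online real-time question answering, online search, and online reading. These delivery modes mainly suit fairly fixed service needs, such as the business registration process or the project policies an enterprise can apply for within one year of founding; (6) how supply is delivered by push: based on the enterprise's portrait and growth stage, mainly notices of events and training and notices of policy applications. The system supported by this patent may include an interface comparable to that of a product crowdfunding site, covering basic information, innovations, description, timeline, team, and outlook. One side of the match, the "service supply", can take several forms: (1) text provided by a service agency; (2) concise keywords; (3) a set of short sentences. The other side, the "enterprise demand", can also take several forms: (1) keywords; (2) short sentences.
This patent studies intelligent matching of innovation and entrepreneurship service supply and demand, and transforms the supply-demand matching problem into automated attribute generation and element configuration. The main innovations are: (1) raw data is normalized using the CSRC "science and technology innovation attributes" framework, including the extraction of attributes and the calculation of their element configuration; (2) after a digital twin is established for the enterprise, the model of this patent is used to make predictions.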
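To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed to describe them are introduced briefly below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic flow chart of the method for intelligently matching the supply and demand of innovation and entrepreneurship services.
Figure 2 shows the digital twin model of an enterprise's asset process, with two-dimensional input.
Figure 3 is a working schematic diagram of how the matching technology solves the talent policy recommendation problem in the embodiment.
Figure 4 is a schematic diagram of a talent portrait in the embodiment.
Figure 5 is a diagram of the pLSA model in the embodiment.
Figure 6 is a schematic diagram of the intelligent policy parsing process in the embodiment.
The invention is described in further detail below with reference to an embodiment. The embodiment explains the invention, and the invention is not limited to it.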
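Embodiment:
Following the technical solution of this patent, the applicant uses matching technology to solve the talent policy recommendation problem. Governments at all levels have many policies on talent introduction, innovation, and entrepreneurship, and talents cannot quickly make sense of them. To solve this problem, a policy-to-talent matching system is needed; the working framework of this embodiment is shown in Figure 2. It addresses how to describe a talent, how to describe a policy, how to establish a mapping from policy to talent, and how to help users find the policy they are looking for when talent information is insufficient.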
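1. Talent portraits solve the problem of describing talent. A user portrait is the process of mining, analyzing, and applying information by obtaining the attributes along different dimensions that make up a user model (such as demographic characteristics, interests and preferences, and behavior patterns). A talent portrait is the application of user portraits to talent management. It presents the characteristics of talent in an industry clearly and comprehensively, and gives the talent management department intelligent management and decision support. A talent portrait consists of three types of data:
(1) Direct attributes. Attributes obtained directly from data sources through ETL, such as name and ID number.
(2) Computed attributes. Attributes obtained by processing the ETL output, for example: analyzing a user's behavior logs to compute an activity score; analyzing a user's project data to compute an experience score; analyzing a user's publication data to compute academic influence. Computed attributes are often dynamic, need periodic incremental recomputation, and should take time entropy into account.
(3) Tag attributes. Labels that classify talents according to one or more attributes in the talent portrait, such as Class A talent, senior returnee, and talent at risk of leaving. The classification model can be formulated manually by experts or obtained by machine learning. Compared with direct and computed attributes, tag attributes have the benefits of data desensitization and dimensionality reduction.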
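The data flow of the talent portrait is as follows. The system collects talent-related data from multiple sources and forms raw talent data through ETL (extraction, transformation, and loading) and coordination. Part of the raw data directly becomes the basic attribute values of the talent, and another part is analyzed and processed to obtain the computed attribute values. Then, based on these attributes, talents are tagged automatically through machine learning or deep learning to obtain the tag attribute values. Finally, the talent portrait provides data to applications through the following two data interfaces (see Figure 3 for details):
Providing the attribute data of the talent portrait directly.
Extracting feature values or feature vectors from the attribute data of the talent portrait.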
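The three attribute types and the two data interfaces can be illustrated with a small data model. This is a minimal sketch under assumptions: the field names, the scoring formulas for `activity` and `experience`, and the hand-written tagging rule are hypothetical stand-ins for whatever analysis and classifier a real system would use.

```python
from dataclasses import dataclass


@dataclass
class TalentPortrait:
    direct: dict[str, str]      # taken as-is from the ETL output
    computed: dict[str, float]  # derived by analysis, recomputed periodically
    tags: list[str]             # coarse labels assigned by rules or a model


def build_portrait(raw: dict) -> TalentPortrait:
    """Turn one raw record into a portrait (first interface: attribute data)."""
    direct = {"name": raw["name"]}
    # Computed attributes: hypothetical scoring formulas, for illustration only.
    computed = {
        "activity": min(1.0, raw["logins_last_30d"] / 30),
        "experience": float(len(raw["projects"])),
    }
    # Tag attribute: a hand-written rule standing in for an expert- or
    # ML-built classification model.
    tags = ["senior"] if computed["experience"] >= 5 else ["junior"]
    return TalentPortrait(direct, computed, tags)


def feature_vector(p: TalentPortrait, tag_vocab: list[str]) -> list[float]:
    """Second interface: numeric features extracted from the portrait."""
    one_hot = [1.0 if t in p.tags else 0.0 for t in tag_vocab]
    return [p.computed["activity"], p.computed["experience"], *one_hot]
```

Note that the feature vector exposes only computed scores and tags, not the direct attributes, which reflects the desensitization benefit of tag attributes mentioned above.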
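We advance the construction of the talent portrait system with an application-oriented, data-driven, incremental approach (see Figure 4):
(1) Application-oriented
Select the most valuable business requirements and intelligent applications, determine suitable application scenarios, and let the application guide the design, construction, and use of the talent portrait, that is, a targeted approach.
(2) Data-driven
Only good data yields good talent portraits and good intelligence. Applications with good data should be selected, and measures such as data cleaning, data auditing, and data governance should be taken to ensure data quality.
(3) Incremental, iterative development
An incremental, iterative development method is adopted: high-priority requirements are implemented in the earlier iterations, versions are released quickly one after another, and improvements are made in the next iteration according to feedback.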
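Key technologies for talent portraits:
Text mining
The goal of text mining is to mine useful information from text to support the development of downstream applications. For the talent portrait task, the objects of text mining mainly include the text materials submitted by the "talents" under review (such as resumes) and text data crawled from various data sources (such as microblog posts, published articles, and comments). Through text mining, we can obtain structured user information and the topic information of users' works, which provide feature information for talent classification.
Text information extraction
Text information extraction refers to extracting the required information from structured or unstructured text. In this project, we use text information extraction to pull basic "talent" information from the text materials that people submit.
At present, text information extraction methods fall into the following categories; their respective advantages, disadvantages, applicable scenarios, and data requirements are shown in Table 1.
Table 1. Comparison of text information extraction methods
Considering the scenarios each type of model suits, and that the amount of data that can be collected and labeled is very likely too small to train a network, rule-based text information extraction is recommended in practice.
Rule-based text information extraction means formulating a corresponding rule for each slot to be extracted; when the text is scanned, the rules are matched and the text that conforms to them is extracted.
Take a submitted resume as an example (see figure). From the resume we want to extract fields such as name, date of birth, ethnicity, education history, and work history. We can set a specific matching rule for each field; for the "name" field, for example, the rule can be "Name: [content to extract]<tab>". In addition, because submitted resumes differ in format and in how they describe the same field, a sample of resumes should be drawn at random for case analysis so that the matching rules are as complete as possible.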
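A slot rule of the form "label, colon, content, tab" maps naturally onto a regular expression. The following is a minimal sketch under assumptions: the slot names, the label variants (English and Chinese, such as `Name` and `姓名`), and the choice to stop at a tab or newline are illustrative, and a real rule set would be built from sampled resumes as described above.

```python
import re

# One rule per slot. The label-colon-content-tab shape follows the example
# rule for the name field; the specific label variants are assumed.
SLOT_RULES = {
    "name": re.compile(r"(?:Name|姓名)[::]\s*([^\t\n]+)"),
    "birth_date": re.compile(r"(?:Date of birth|出生时间|出生日期)[::]\s*([^\t\n]+)"),
    "ethnicity": re.compile(r"(?:Ethnicity|民族)[::]\s*([^\t\n]+)"),
}


def extract_slots(text: str) -> dict[str, str]:
    """Scan the text once per rule and keep the first match for each slot."""
    slots = {}
    for slot, pattern in SLOT_RULES.items():
        match = pattern.search(text)
        if match:
            slots[slot] = match.group(1).strip()
    return slots
```

Listing several label variants per slot is one simple way to absorb the format differences between resumes; slots whose rule never matches are simply left out of the result.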
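Text feature extraction
The purpose of text feature extraction is to extract features from text for use as input to machine learning tasks such as classification.
Two types of text are involved in this section. One is the titles, abstracts, and similar information of a person's publications, which usually reflect academic research interests. The other is what a person posts on social media, which can reflect hobbies, personality, and personal interests.
For short texts such as paper titles, we plan to extract keyword features with keyword extraction; for long texts such as microblogs and blogs, we plan to extract topics with a topic model.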
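Short-text keyword feature extraction
Short-text keyword feature extraction techniques fall into two groups: keyword extraction based on statistical machine learning and keyword extraction based on deep learning.
Keyword extraction based on deep learning generally uses a sequence labeling model. The input of the network is a text vector, and the output is, for each position, the probability that the word should be output as a keyword. This kind of model relies on a large amount of pre-labeled training data and is not well suited to the actual application scenario of this project. Keyword extraction based on statistical machine learning is therefore recommended.
Keyword extraction based on statistical learning is mainly based on the TF-IDF model. The TF-IDF model describes the importance of a word in a document; its calculation formula is given in Equation 1.
$$\mathrm{TFIDF}_{i,j} = \frac{n_{i,j}}{\sum_{l=1}^{k} n_{l,j}} \times \log\frac{|D|}{|\{j : t_i \in d_j\}|} \tag{1}$$
where $n_{i,j}$ is the number of times word $i$ appears in document $j$, $k$ is the size of the vocabulary, $|D|$ is the total number of documents in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of documents containing word $i$.
The main idea of the TF-IDF model: if a word or phrase appears frequently in one article and rarely in others, it is well suited to describing what is characteristic of that document. This keyword extraction method is simple and fairly accurate, has a sound theoretical basis, and does not depend on a large amount of training data.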
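The TF-IDF idea can be made concrete in a few lines: term frequency within the document, multiplied by the log of the inverse document frequency across the corpus. This is a minimal sketch assuming documents are already tokenized into word lists; the function name `tfidf_keywords` and the alphabetical tie-breaking are choices made for this example.

```python
import math
from collections import Counter


def tfidf_keywords(docs: list[list[str]], doc_index: int, top_n: int = 3) -> list[str]:
    """Rank the words of one tokenized document by TF-IDF against the corpus."""
    counts = Counter(docs[doc_index])
    total = sum(counts.values())
    scores = {}
    for word, n in counts.items():
        tf = n / total                          # frequency within the document
        df = sum(1 for d in docs if word in d)  # documents containing the word
        idf = math.log(len(docs) / df)
        scores[word] = tf * idf
    # Sort by score, breaking ties alphabetically so the output is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

In the test corpus below, "model" occurs twice in the first document but also appears in a second document, so the rarer words "digital" and "twin" outrank it, which is exactly the behavior the main idea describes.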
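Long-text topic feature extraction
For long texts submitted by users, we want to extract topic-type text features, that is, which topics the text relates to. We plan to use the fairly mature pLSA topic model as the core technology for long-text topic feature extraction.
The pLSA topic model is shown in Figure 5, where d stands for the text, z for the topic, and w for the word.
The core idea of the pLSA model is that each document is composed of multiple topics, and each topic is composed of multiple words. Although each piece of text carries only one or a few label categories, the text does not contain information about those categories alone; rather, information about those categories is the most abundant. We can therefore apply the idea of pLSA to analyze the topic composition of each word in the text, build a model, and label the data.
In practice, we first need to define the dictionary of topic types and select a batch of text data to label with topic features. Then, following the idea of the pLSA model, the topic distribution weight of each word or phrase is calculated and used as a model parameter. In the feature extraction stage, the topic distribution of a new document is obtained by superimposing the topic distribution weights of its words, and the topics with the higher probabilities are taken as the extracted topic features.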
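The labeled-data procedure described above (estimate a topic share per word, then superimpose the shares of a new document's words) can be sketched as follows. This is a deliberately simplified, counting-based stand-in and not full pLSA, which fits its parameters with the EM algorithm on unlabeled text; the function names and the one-label-per-document input format are assumptions.

```python
from collections import defaultdict


def fit_word_topics(labeled_docs: list[tuple[list[str], str]]) -> dict[str, dict[str, float]]:
    """Estimate P(topic | word) from labeled documents by simple counting."""
    counts: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for words, topic in labeled_docs:
        for w in words:
            counts[w][topic] += 1.0
    return {
        w: {t: c / sum(topics.values()) for t, c in topics.items()}
        for w, topics in counts.items()
    }


def infer_topics(words: list[str], word_topics: dict[str, dict[str, float]]) -> dict[str, float]:
    """Superimpose the topic shares of known words and normalize."""
    totals: dict[str, float] = defaultdict(float)
    for w in words:
        for topic, share in word_topics.get(w, {}).items():
            totals[topic] += share
    norm = sum(totals.values())
    return {t: v / norm for t, v in totals.items()} if norm else {}
```

Words that never appeared in the labeled data contribute nothing, so a document made up entirely of unseen words yields an empty distribution rather than a guess.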
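Policy feature vectors solve the problem of describing policies.
A policy text generally includes the eligible applicants, application conditions, application deadline, application materials, subsidy standards, and so on. Extracting text features from policy texts and clustering them by predefined keywords gives a rough classification of the policies. Talent policy recommendation, however, has a low tolerance for error and requires policies to be described precisely. We therefore built a policy feature vector model and labeled the policy features manually.
When the number of policies is insufficient, policies can be recommended by rule matching. When there are enough policies, they can serve as a labeled training data set for training an AI recommendation model.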
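Rule-based recommendation over hand-labeled policy feature vectors can be sketched as a set of eligibility checks. This is a minimal sketch under assumptions: the feature fields (`min_degree`, `max_age`, `required_tags`), the degree encoding, and the example policy titles in the usage are hypothetical; a real policy vector would carry whatever dimensions the annotators defined.

```python
from dataclasses import dataclass


@dataclass
class PolicyFeatures:
    """Hand-labeled policy feature vector; the fields are illustrative."""
    title: str
    min_degree: int                # 1 = bachelor, 2 = master, 3 = doctorate
    max_age: int
    required_tags: frozenset[str]  # talent tags the applicant must carry


def eligible(policy: PolicyFeatures, degree: int, age: int, tags: set[str]) -> bool:
    """A talent matches when every condition in the policy vector holds."""
    return (
        degree >= policy.min_degree
        and age <= policy.max_age
        and policy.required_tags <= tags
    )


def recommend(policies: list[PolicyFeatures], degree: int, age: int, tags: set[str]) -> list[str]:
    """Return the titles of all policies whose rules the talent satisfies."""
    return [p.title for p in policies if eligible(p, degree, age, tags)]
```

Because every rule must hold for a match, this approach favors precision over recall, which suits the low error tolerance noted above. The same labeled vectors can later serve as training targets once enough policies have been annotated.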
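NLP technology and machine learning algorithms solve the problem of mapping policies to talents.
After the policy feature vector extraction of the second step, some of the original policy texts have already been broken down into structured data. Using this data as training data and training against a larger collection of original policy texts can be expected to solve the policy-to-talent mapping problem more broadly.
The first step is to analyze the original policy text and extract the entities and relationships in it. With these extracted entity relationships as input and the multi-dimensional policy features in the policy feature vector as output, a machine learning algorithm is trained to obtain a mapping model from policy text to policy features, which is then mapped to talents through the policy features.
The key technology in this process is entity relationship extraction from policy text. The policy feature vectors from the second step can be used to build a feature-vector-based entity relationship extraction method. A policy dependency tree can also be built to establish an entity relationship extraction method based on a dependency tree kernel function, which complements the feature-based method (see Figure 6 for the whole process).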
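NLP technology helps users find the policies they need.
When the user portrait is missing, policies cannot be matched precisely to people, so channels are needed to collect user features. One method is to recommend relevant policies based on the questions users ask.
To analyze user intent from a question and recommend suitable policies, a vocabulary of the question-and-answer data must be built and an inverted index over it constructed offline. When a user question arrives, the segmented question and the inverted index are used to quickly search for and recall several possibly related questions.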
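The offline inverted index and the online recall step can be sketched with plain dictionaries. This is a minimal sketch assuming questions are already segmented into tokens; ranking candidates by the number of shared tokens is an assumption made for the example, since the patent only states that several possibly related questions are recalled.

```python
from collections import Counter, defaultdict


def build_inverted_index(questions: list[list[str]]) -> dict[str, set[int]]:
    """Offline step: map each token to the ids of the questions containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for qid, tokens in enumerate(questions):
        for token in tokens:
            index[token].add(qid)
    return index


def recall(query_tokens: list[str], index: dict[str, set[int]], top_k: int = 3) -> list[int]:
    """Online step: rank stored questions by how many query tokens they share."""
    hits = Counter()
    for token in set(query_tokens):
        for qid in index.get(token, ()):
            hits[qid] += 1
    # Highest overlap first; ties are broken by the lower question id.
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [qid for qid, _ in ranked[:top_k]]
```

Recall only narrows the candidate set cheaply; a finer similarity measure is then applied to the few recalled questions.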
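Because policy question-and-answer data is scarce, word vectors (word2vec) pre-trained on a large-scale corpus can be used: both the user question and each existing question are represented as the mean of the word vectors of the words they contain, the cosine similarity of the two is calculated, and it is weighted with the edit distance between the sentences to give the final similarity measure, which is used to rank and return similar questions and their answers.
One of the key technologies here is using word2vec to calculate sentence similarity. word2vec is both a neural-network-based learning algorithm and a tool for computing word vectors. The training result it produces, the word vector (word embedding), measures the similarity between words well. Because the policy question-and-answer corpus is small, the algorithm of this project mainly uses a word2vec model trained on a large-scale corpus to calculate the similarity of policy questions. Because policy question answering is a specialized domain, the similarity calculated by a word2vec model trained on a general corpus may be inaccurate, so the edit distance between sentences is introduced as a correction.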
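The combined measure (cosine similarity of mean word vectors, weighted with sentence edit distance) can be sketched without any external library. This is a minimal sketch under assumptions: the word vectors are passed in as a plain dictionary standing in for a pre-trained word2vec model, the edit distance is normalized by the longer sentence length, and the weight `alpha` is a hypothetical parameter, since the patent does not give the weighting.

```python
from math import sqrt


def mean_vector(tokens: list[str], vectors: dict[str, list[float]]) -> list[float]:
    """Average the vectors of the tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return []
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: list[float], b: list[float]) -> float:
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    row = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        prev_diag, row[0] = row[0], i
        for j, ct in enumerate(t, start=1):
            prev_diag, row[j] = row[j], min(
                row[j] + 1,              # deletion
                row[j - 1] + 1,          # insertion
                prev_diag + (cs != ct),  # substitution or match
            )
    return row[-1]


def question_similarity(q1: str, q2: str, tokens1: list[str], tokens2: list[str],
                        vectors: dict[str, list[float]], alpha: float = 0.7) -> float:
    """Weighted mix of semantic similarity and surface (edit-distance) similarity."""
    semantic = cosine(mean_vector(tokens1, vectors), mean_vector(tokens2, vectors))
    longest = max(len(q1), len(q2)) or 1
    surface = 1.0 - edit_distance(q1, q2) / longest
    return alpha * semantic + (1.0 - alpha) * surface
```

The surface term rewards questions that are nearly identical as strings, which is how the edit distance corrects for a general-corpus embedding that misjudges domain-specific policy wording.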
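When semantic matching fails and the answer to a policy question cannot be found in the existing question-and-answer database, we use NL2SQL technology to convert the question directly into a query over structured policy data. The key technology of this step is building the NL2SQL model. Because Chinese NL2SQL annotation data is scarce, an NL2SQL model cannot be trained on a Chinese corpus, so we adopt a transfer learning approach: the Chinese question is translated into English, a model trained on English NL2SQL data converts the English question into an English SQL query, and the query keywords in that SQL statement are then translated into the corresponding Chinese keywords to achieve Chinese NL2SQL.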
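The last stage of this pipeline, translating the keywords of the English SQL query back into Chinese, can be sketched as a dictionary-driven substitution. This is a minimal sketch covering only that final step; the translation and English NL2SQL models are out of scope, and the mapping `KEYWORD_MAP` with its table and column names is hypothetical.

```python
import re

# Hypothetical English-to-Chinese mapping for table and column names.
KEYWORD_MAP = {"policy": "政策", "title": "标题", "max_age": "年龄上限"}


def localize_sql(english_sql: str, keyword_map: dict[str, str]) -> str:
    """Replace whole-word English identifiers with their Chinese counterparts."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keyword_map)) + r")\b")
    return pattern.sub(lambda m: keyword_map[m.group(1)], english_sql)
```

Matching on word boundaries leaves SQL keywords such as `SELECT` and `WHERE` untouched and avoids rewriting identifiers that merely contain a mapped word.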
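In addition, the names given to the steps and operations in the specific embodiments described in this specification may differ. All equivalent or simple changes made according to the features and principles described in the concept of this patent are included in its scope of protection. A person skilled in the art to which the invention belongs may modify or supplement the described embodiments in various ways, or substitute similar approaches, and such changes fall within the scope of protection of the invention as long as they do not depart from its structure or exceed the scope defined by the claims.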
Claims (6)
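- A method for intelligently matching the supply and demand of innovation and entrepreneurship services, characterized by comprising the following steps: 1. user portrait: obtaining the user's direct attributes, computed attributes, and tag attributes; 2. digital twin: adding the three types of attributes from step 1 to the digital twin of the organization to form a model; 3. supply-demand mapping: obtaining the text describing the supply, establishing a mapping from text to features, calculating the similarity between the features and the user, and obtaining the supply-demand mapping; 4. based on the digital twin model, obtaining the mapping and interaction from supply to demand, and adaptively adjusting the similarity of the supply match as demand changes, so as to obtain an accurate match between the two.
- The method according to claim 1, characterized in that step 1 comprises the following process: 1. generating a portrait according to the portrait rules from the business registration, financial, and intellectual property information provided by the enterprise, together with dynamic information produced as the enterprise grows, such as team, floor area, and financing status; 2. transforming the user portrait problem and establishing the tag attributes, taking the CSRC science and technology innovation attributes as the main categories and establishing multiple tag attributes, including research and development, patents, awards, import substitution, and team; 3. obtaining the values of the tag attributes.
- The method according to claim 2, characterized in that, in said step 3, the attribute values are obtained in three ways: (1) part of the raw data directly becomes the basic attribute values; (2) another part of the raw data is analyzed and processed to obtain the computed attribute values; (3) tags are assigned automatically through machine learning or deep learning to obtain the tag attribute values.
- The method according to claim 1, characterized in that step 2 adopts the digital twin of the organization, and the user's direct attributes, computed attributes, and tag attributes are added to the digital twin of the organization, so that the (user attribute, scene) combination becomes the most basic unit of the model.
- The method according to claim 1, characterized in that, in step 3, the text describing the supply is obtained and a mapping from text to features is established; the similarity between the features and the user's attributes is calculated to obtain the supply-demand mapping.
- The method according to claim 1, characterized in that, in step 4, the user's digital twin model triggers a change in demand and updates the supply-demand mapping; scene elements are added, and the (user attribute, scene) combination serves as the trigger condition.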
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110596161.1 | 2021-05-30 | ||
CN202110596161.1A CN115481827A (zh) | 2021-05-30 | 2021-05-30 | Method for intelligently matching the supply and demand of innovation and entrepreneurship services |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022252014A1 (zh) | 2022-12-08 |
Family
ID=84323799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/097254 WO2022252014A1 (zh) | Method for intelligently matching the supply and demand of innovation and entrepreneurship services | 2021-05-30 | 2021-05-31 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115481827A (zh) |
WO (1) | WO2022252014A1 (zh) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
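CN106383894A (zh) * | 2016-09-23 | 2017-02-08 | 深圳市由心网络科技有限公司 | Method and device for matching enterprise supply and demand information |
CN110414917A (zh) * | 2019-06-21 | 2019-11-05 | 东华大学 | Recruitment recommendation method based on talent profiles |
CN112288391A (zh) * | 2020-10-28 | 2021-01-29 | 甘肃和润智信企业管理咨询有限公司 | Person-post matching method and system based on interval matching |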
2021
- 2021-05-30: CN application CN202110596161.1A, published as CN115481827A (zh); status: active, pending
- 2021-05-31: WO application PCT/CN2021/097254, published as WO2022252014A1 (zh); status: active, application filing
Non-Patent Citations (2)
Title |
---|
HONG, GUOER: "Digital Twin of an Organization (DTO) Five Steps to Creating a Digital Twin of an Organization (DTO)", ZHIHU, pages 1 - 2, XP009542854, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/302111957> [retrieved on 20230228] * |
ZHAO, GUODONG: "Digital Twin Organization is the Final Form of Enterprise Digitalization", YUNXIAN TECHNOLOGY, pages 1 - 2, XP009542936, Retrieved from the Internet <URL:https://www.dstoutiao.com/html/dsws/2021/0413/98365.html> * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
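CN116301731A (zh) * | 2023-02-17 | 2023-06-23 | 武汉天恒信息技术有限公司 | Natural-language-based requirements analysis method, device, and storage device |
CN116821315A (zh) * | 2023-06-09 | 2023-09-29 | 佛山京益数字技术有限公司 | Method and system for two-way matching between people and enterprises in big data |
CN116821315B (zh) * | 2023-06-09 | 2024-02-23 | 佛山京益数字技术有限公司 | Method and system for two-way matching between people and enterprises in big data |
CN116561432A (zh) * | 2023-06-27 | 2023-08-08 | 广州钛动科技股份有限公司 | Intelligent employee content data recommendation system |
CN116561432B (zh) * | 2023-06-27 | 2024-05-03 | 广州钛动科技股份有限公司 | Intelligent employee content data recommendation system |
CN117764536A (zh) * | 2024-01-12 | 2024-03-26 | 四川大学 | Artificial-intelligence-based auxiliary management system for innovation and entrepreneurship projects |
CN118051607A (zh) * | 2024-02-21 | 2024-05-17 | 北京市大数据中心 | Deep-learning-based policy information service recommendation method, system, and storage medium |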
Also Published As
Publication number | Publication date |
---|---|
CN115481827A (zh) | 2022-12-16 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21943393; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 21943393; Country of ref document: EP; Kind code of ref document: A1 |