CN112487161A - Enterprise demand oriented expert recommendation method, device, medium and equipment


Info

Publication number
CN112487161A
Authority
CN
China
Legal status
Pending
Application number
CN202011345299.6A
Other languages
Chinese (zh)
Inventor
胡笛
唐杰
刘德兵
张鹏
仇瑜
王笑尘
Current Assignee
Beijing Zhipu Huazhang Technology Co ltd
Original Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Application filed by Beijing Zhiyuan Artificial Intelligence Research Institute
Priority to CN202011345299.6A
Publication of CN112487161A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking


Abstract

The present disclosure relates to the technical field of computer network information and provides an enterprise demand oriented expert recommendation method, apparatus, medium and device. The method comprises: collecting expert paper data and enterprise demand data; preprocessing the collected expert paper data and enterprise demand data to obtain expert information and demand information; extracting keywords from the preprocessed expert information and demand information; constructing a feature vector model from the expert information and demand information after keyword extraction; and performing similarity analysis on the feature vectors in the feature vector model to obtain an expert recommendation result. The method uses a topic model to extract features of the expert information and the enterprise demands, constructs expert-domain feature vectors and enterprise-demand feature vectors from the feature keywords, and recommends experts in related fields to an enterprise according to the similarity between these two kinds of vectors, thereby effectively avoiding the semantic drift problem of mechanical keyword retrieval.

Description

Enterprise demand oriented expert recommendation method, device, medium and equipment
Technical Field
The present disclosure relates to the field of computer network information technologies, and more particularly, to an expert recommendation method, apparatus, medium and device oriented to enterprise demands.
Background
With rapid breakthroughs in technologies such as big data, cloud computing and the Internet of Things, a new round of scientific and technological revolution and industrial transformation is unfolding worldwide. China has put forward an innovation-driven development strategy to accelerate the transformation of traditional industries and to establish an innovation system that integrates industry, academia and research, takes enterprises as the main body and is guided by the market, so that enterprises become the main body of technological innovation.
Top talents in a field are the guarantee of enterprise innovation. However, many enterprises that encounter technical problems in their field face a shortage of in-house talent and urgently need help from external experts. Colleges and universities, as China's main base for scientific research, have a large number of highly innovative scientific and technological talents and can provide ample expert resources to help enterprises solve technical problems. At present, some enterprises find talent in a field through authoritative recommendations from their relationship networks; others search for research results by domain keywords and identify relevant experts from the author information. The former relies excessively on social resources, suits only a few well-known enterprises, and is of very limited use to small and medium-sized enterprises. The latter matches words mechanically and lacks feature representation, so both its precision and its recall are low. As a result, enterprises cannot find experts in the relevant field accurately and in time, which makes it difficult to connect academic knowledge precisely with industry.
Disclosure of Invention
The present disclosure aims to solve the technical problem that the prior art cannot meet users' needs for expert recommendation.
To achieve the above technical objective, the present disclosure provides an enterprise demand oriented expert recommendation method, which includes:
collecting expert paper data and enterprise demand data;
preprocessing the collected expert paper data and enterprise demand data to obtain expert information and demand information;
extracting keywords from the preprocessed expert information and demand information to obtain expert feature information and demand feature information;
constructing a feature vector model from the expert feature information and demand feature information obtained by keyword extraction;
and performing similarity analysis on the feature vectors in the feature vector model to obtain an expert recommendation result.
Further, collecting the expert paper data and the enterprise demand data specifically includes:
collecting the titles, abstracts and/or keywords of the experts' papers from a paper database, and collecting the titles, keywords and/or requirement details of enterprise demands from a selected online website.
Further, preprocessing the collected expert paper data and enterprise demand data specifically includes:
performing word segmentation on the expert paper data and the enterprise demand data respectively with an LTP model to obtain expert word segmentation data and enterprise word segmentation data;
removing stop words from the expert word segmentation data and the enterprise word segmentation data;
and merging repeated information in the stop-word-free data to obtain the expert information and the demand information respectively.
Further, extracting keywords from the preprocessed expert information and demand information specifically includes:
extracting keywords from the expert information and the demand information respectively with an LDA (Latent Dirichlet Allocation) model, and obtaining a keyword list for each piece of expert information and each piece of demand information as the expert feature information and the demand feature information.
Further, constructing a feature vector model from the expert feature information and demand feature information obtained by keyword extraction specifically includes:
performing feature extraction on the expert feature information and the demand feature information with the TF-IDF algorithm to obtain information topic terms;
and performing feature selection on the information topic terms, and constructing feature vectors and the feature vector model from the selected feature topic terms.
Further, performing similarity analysis on the feature vectors in the feature vector model to obtain an expert recommendation result specifically includes:
computing, on the basis of the cosine similarity between feature vectors in the feature vector model, a similarity that additionally incorporates a factor for the proportion of feature words shared by two texts relative to the total length of their feature vectors, to obtain an expert recommendation result;
wherein the similarity is calculated by the following formula:
[Formula image not reproduced in this text]
where c is a proportional adjustment coefficient, N(D, E) is the number of feature words shared by the requirement information D and the expert information E, Min(D, E) is the smaller of the number of features of D and the number of features of E, and sim(D, E) is the cosine similarity between D and E.
To achieve the above technical objective, the present disclosure further provides an enterprise demand oriented expert recommendation apparatus, including:
a data collection module, configured to collect expert paper data and enterprise demand data;
a preprocessing module, configured to preprocess the collected expert paper data and enterprise demand data to obtain expert information and demand information;
a keyword extraction module, configured to extract keywords from the preprocessed expert information and demand information to obtain expert feature information and demand feature information;
a vector model construction module, configured to construct a feature vector model from the expert feature information and demand feature information obtained by keyword extraction;
and a similarity analysis module, configured to perform similarity analysis on the feature vectors in the feature vector model to obtain an expert recommendation result.
Further, the data collection module is specifically configured to:
collect the titles, abstracts and/or keywords of the experts' papers from a paper database, and collect the titles, keywords and/or requirement details of enterprise demands from a selected online website.
To achieve the above technical objective, the present disclosure further provides a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the above enterprise demand oriented expert recommendation method.
To achieve the above technical objective, the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above enterprise demand oriented expert recommendation method when executing the computer program.
The beneficial effects of the present disclosure are as follows:
the disclosure provides an LDA topic model based expert recommendation method for enterprises, which uses the topic model to extract features of expert information and enterprise demands, constructs expert-domain feature vectors and enterprise-demand feature vectors from the feature keywords, and recommends experts in related fields to enterprises according to the similarity between the two kinds of vectors, thereby effectively avoiding the semantic drift problem of mechanical keyword retrieval.
Drawings
Fig. 1 shows a schematic flow diagram of embodiment 1 of the present disclosure;
fig. 2 shows a schematic structural diagram of embodiment 2 of the present disclosure;
fig. 3 shows a schematic structural diagram of embodiment 4 of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
Various structural schematics according to embodiments of the present disclosure are shown in the figures. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
Explanation of terms used in the present disclosure:
TF-IDF: term frequency-inverse document frequency, a weighting technique commonly used in information retrieval and data mining. TF stands for term frequency and IDF for inverse document frequency.
LDA: Latent Dirichlet Allocation.
LTP: Language Technology Platform.
Embodiment 1:
as shown in fig. 1:
the disclosure provides an enterprise demand oriented expert recommendation method, which includes the following steps:
S1: collecting expert paper data and enterprise demand data;
specifically, collecting the expert paper data and the enterprise demand data includes:
collecting the titles, abstracts and/or keywords of the experts' papers from a paper database, and collecting the titles, keywords and/or requirement details of enterprise demands from a selected online website.
Preferably, in this technical solution, platforms such as CNKI, Wanfang and AMiner are selected as the sources of expert literature, and the titles, abstracts and keywords of papers published by experts are collected. The Scientist Online website is selected as the source of enterprise demand information, and the demand titles, demand keywords and demand details are collected.
S2: preprocessing the collected expert paper data and enterprise demand data to obtain expert information and demand information;
specifically, preprocessing the collected expert paper data and enterprise demand data includes:
performing word segmentation on the expert paper data and the enterprise demand data respectively with an LTP model to obtain expert word segmentation data and enterprise word segmentation data;
removing stop words from the expert word segmentation data and the enterprise word segmentation data;
and merging repeated information in the stop-word-free data to obtain the expert information and the demand information respectively.
Word segmentation and stop-word removal in the preprocessing of the present disclosure are methods commonly used in natural language processing. Besides the above implementation, the expert paper data and the enterprise demand data may also be segmented with a conditional random field (CRF) to obtain the expert word segmentation data and the enterprise word segmentation data.
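The stop-word removal and merging steps can be sketched as follows. This is a minimal Python illustration, not the disclosure's implementation: it assumes the text has already been segmented (for example by LTP or a CRF segmenter, neither of which is called here), uses a small hypothetical stop-word list, and interprets "merging repeated information" as dropping records that are identical after cleaning (such as one paper collected from two databases) before concatenating the remaining records into one profile. Word repetitions inside a record are kept, because the later LDA and TF-IDF steps depend on word counts.

```python
# Hypothetical stop-word list; the disclosure does not publish its own.
STOP_WORDS = {"的", "了", "和", "是", "在", "基于"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and whitespace-only tokens from one segmented record."""
    return [t for t in tokens if t.strip() and t not in stop_words]

def build_profile(records, stop_words=STOP_WORDS):
    """Build one expert (or demand) profile from several segmented records.

    Each record is cleaned of stop words; a record identical to one already
    seen (e.g. the same paper crawled from two databases) is skipped; the
    remaining records are concatenated, keeping within-record word counts.
    """
    seen = set()
    profile = []
    for tokens in records:
        cleaned = remove_stop_words(tokens, stop_words)
        key = tuple(cleaned)
        if key in seen:  # repeated information: keep only the first copy
            continue
        seen.add(key)
        profile.extend(cleaned)
    return profile

papers = [
    ["基于", "深度", "学习", "的", "图像", "识别"],
    ["基于", "深度", "学习", "的", "图像", "识别"],  # duplicate from a second source
    ["知识", "图谱", "的", "构建"],
]
print(build_profile(papers))
```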
S3: extracting keywords from the preprocessed expert information and demand information to obtain expert feature information and demand feature information;
specifically, extracting keywords from the preprocessed expert information and demand information includes:
extracting keywords from the expert information and the demand information respectively with an LDA (Latent Dirichlet Allocation) model, and obtaining a keyword list for each piece of expert information and each piece of demand information as the expert feature information and the demand feature information.
LDA (Latent Dirichlet Allocation) is a document topic generation model, also called a three-layer Bayesian probability model, comprising a three-layer structure of words, topics and documents. Being a generative model means that each word of an article is regarded as obtained through the process of "selecting a topic with a certain probability, then selecting a word from that topic with a certain probability". Documents-to-topics follow a multinomial distribution, and topics-to-words follow a multinomial distribution.
LDA is an unsupervised machine learning technique that can be used to identify latent topic information in large-scale document collections or corpora. It adopts the bag-of-words approach, which treats each document as a word-frequency vector and thereby converts text into numerical information that is easy to model. The bag-of-words approach ignores word order, which simplifies the problem and also leaves room for model improvement. Each document is represented as a probability distribution over topics, and each topic as a probability distribution over words.
For each document in the corpus, LDA defines the following generation process:
1. for each document, draw a topic from its topic distribution;
2. draw a word from the word distribution of the drawn topic;
3. repeat the above process until every word in the document has been generated.
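The generative story above can be simulated with Python's standard library. In this sketch a topic is drawn for every word position, the document's topic distribution is itself drawn from a symmetric Dirichlet prior (sampled via normalized gamma variates), and the topic-word distributions `phi`, the vocabulary and the prior parameter `alpha` are hypothetical toy values chosen for illustration.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) sample via gamma variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def generate_document(theta, phi, vocab, n_words, rng):
    """Generate n_words tokens following the LDA generative process.

    theta: this document's topic distribution (length k)
    phi:   k word distributions over vocab
    """
    words = []
    for _ in range(n_words):
        t = rng.choices(range(len(theta)), weights=theta)[0]  # draw a topic
        w = rng.choices(vocab, weights=phi[t])[0]             # draw a word from that topic
        words.append(w)                                       # repeat for every position
    return words

# Hypothetical toy model: two topics over a four-word vocabulary.
rng = random.Random(42)
vocab = ["neural", "network", "steel", "alloy"]
phi = [[0.5, 0.5, 0.0, 0.0],   # topic 0: AI-related words
       [0.0, 0.0, 0.5, 0.5]]   # topic 1: materials-related words
theta = sample_dirichlet(alpha=0.5, k=2, rng=rng)
print(theta, generate_document(theta, phi, vocab, n_words=8, rng=rng))
```

Fitting LDA runs this story in reverse: given only the words, it infers plausible `theta` and `phi`.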
Each document in the corpus corresponds to a multinomial distribution over T topics (the number of topics is given in advance, e.g. by trial and error), denoted θ. Each topic in turn corresponds to a multinomial distribution over the V words of the vocabulary, denoted φ.
First, some symbols are defined: D is the document set and T is the topic set.
Each document d in D is regarded as a word sequence $\langle w_1, w_2, \ldots, w_n \rangle$, where $w_i$ denotes the i-th word and d has n words. (Within LDA this is called a bag of words; in practice, the position at which each word appears has no influence on the LDA algorithm.)
All distinct words occurring in D form a large set called the vocabulary (VOC). LDA takes the document set D as input and produces two kinds of result vectors (assuming k topics in total and m words in VOC):
For each document d in D, the probabilities of d corresponding to the different topics, $\theta_d = \langle p_{t_1}, \ldots, p_{t_k} \rangle$, where $p_{t_i}$ is the probability that d corresponds to the i-th topic in T. The computation is intuitive: $p_{t_i} = n_{t_i}/n$, where $n_{t_i}$ is the number of words in d assigned to the i-th topic and n is the total number of words in d.
For each topic t in T, the probabilities of generating the different words, $\varphi_t = \langle p_{w_1}, \ldots, p_{w_m} \rangle$, where $p_{w_i}$ is the probability that t generates the i-th word in VOC. The computation is equally straightforward: $p_{w_i} = N_{w_i}/N$, where $N_{w_i}$ is the number of occurrences of the i-th word of VOC assigned to topic t and N is the total number of words assigned to topic t.
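The counting formulas $p_{t_i} = n_{t_i}/n$ and $p_{w_i} = N_{w_i}/N$ translate directly into code. The sketch below assumes each word already carries a topic index and, like the text, uses raw counts without the Dirichlet smoothing that practical LDA implementations usually add; the function name `estimate_theta_phi` is hypothetical.

```python
from collections import Counter

def estimate_theta_phi(docs, assignments, k, vocab):
    """Compute theta_d and phi_t from current word-topic assignments.

    docs[d][i] is the i-th word of document d and assignments[d][i] its topic
    index in 0..k-1. theta[d][t] = n_t / n (share of d's words assigned to t);
    phi[t][v] = N_v / N (share of topic t's words that are vocab[v]).
    """
    index = {w: i for i, w in enumerate(vocab)}
    theta = []
    topic_word = [[0] * len(vocab) for _ in range(k)]
    for words, topics in zip(docs, assignments):
        n = len(words)
        counts = Counter(topics)
        theta.append([counts[t] / n if n else 0.0 for t in range(k)])
        for w, t in zip(words, topics):
            topic_word[t][index[w]] += 1
    phi = []
    for row in topic_word:
        total = sum(row)  # N: number of words assigned to this topic
        phi.append([c / total if total else 0.0 for c in row])
    return theta, phi

theta, phi = estimate_theta_phi([["a", "b", "a"]], [[0, 1, 0]], k=2, vocab=["a", "b"])
print(theta, phi)
```

Here two of the three words are assigned to topic 0, so $\theta_d = \langle 2/3, 1/3 \rangle$, and each topic generates only the word assigned to it.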
The core formula of LDA is:
$p(w \mid d) = \sum_{t \in T} p(w \mid t)\, p(t \mid d)$
Intuitively, the topic acts as an intermediate layer: with the current $\theta_d$ and $\varphi_t$, the probability that word w appears in document d can be computed, where $p(t \mid d)$ is obtained from $\theta_d$ and $p(w \mid t)$ from $\varphi_t$.
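As a small worked check of the core formula, the sketch below evaluates $p(w \mid d)$ by marginalizing over topics, i.e. summing the per-topic products $p(w \mid t)\,p(t \mid d)$. The toy values of `theta_d` and `phi` are hypothetical and chosen as exact binary fractions so the arithmetic is exact.

```python
def word_prob(theta_d, phi, w):
    """p(w|d) = sum over topics t of p(w|t) * p(t|d).

    theta_d[t] = p(t|d); phi[t][w] = p(w|t) for word index w.
    """
    return sum(phi[t][w] * theta_d[t] for t in range(len(theta_d)))

theta_d = [0.5, 0.5]            # hypothetical topic mix of one document
phi = [[0.5, 0.5],              # topic 0 over a two-word vocabulary
       [0.25, 0.75]]            # topic 1
probs = [word_prob(theta_d, phi, w) for w in range(2)]
print(probs)
```

Here word 0 gets 0.5 × 0.5 + 0.25 × 0.5 = 0.375 and word 1 gets 0.625; the two sum to 1, as a distribution over the vocabulary should.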
In practice, using the current $\theta_d$ and $\varphi_t$, we can compute $p(w \mid d)$ for a word in a document under each possible topic assignment, and then update the topic to which the word should be assigned according to these results. If this update changes the word's topic, it in turn affects $\theta_d$ and $\varphi_t$.
When the LDA algorithm starts, $\theta_d$ and $\varphi_t$ are assigned random initial values (for all d and t). The above process is then repeated, and the converged result is the output of LDA. The iterative learning process is described in more detail as follows:
1. For the i-th word $w_i$ in a particular document $d_s$, if the word is assigned topic $t_j$, the formula above can be rewritten as:
$p_j(w_i \mid d_s) = p(w_i \mid t_j)\, p(t_j \mid d_s)$
2. Enumerating the topics in T yields all $p_j(w_i \mid d_s)$ for j = 1, ..., k. A topic can then be selected for $w_i$ based on these probabilities. The simplest choice is the $t_j$ that maximizes $p_j(w_i \mid d_s)$ (only j varies here), i.e. $\arg\max_j p_j(w_i \mid d_s)$.
3. If the topic selected for $w_i$ differs from its previous topic, $\theta_d$ and $\varphi_t$ change accordingly (as follows from the formulas for these two vectors given earlier), which in turn affects the computation of $p(w \mid d)$ above. Computing $p(w \mid d)$ for every word w in every document d in D and reselecting its topic constitutes one iteration. After a number of iterations, the LDA results converge.
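The iterative procedure in steps 1 to 3 can be sketched as below. It follows the simplified argmax rule described in the text, updates the counts behind $\theta_d$ and $\varphi_t$ immediately whenever a word changes topic, and stops when no word changes or after `n_iter` passes. This only illustrates the text: practical LDA implementations instead sample topics (for example by collapsed Gibbs sampling) or use variational inference with Dirichlet priors, and the greedy argmax rule is not guaranteed to reach a good solution. The function name and parameters are hypothetical.

```python
import random
from collections import Counter

def fit_simple_lda(docs, k, n_iter=50, seed=0):
    """Greedy topic reassignment as in steps 1-3 (not standard LDA inference).

    Returns z, where z[d][i] is the topic index assigned to docs[d][i].
    """
    rng = random.Random(seed)
    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics
    # Counts behind theta_d (doc_topic) and phi_t (topic_word / topic_total).
    doc_topic = [Counter(zd) for zd in z]
    topic_word = [Counter() for _ in range(k)]
    topic_total = [0] * k
    for doc, zd in zip(docs, z):
        for w, t in zip(doc, zd):
            topic_word[t][w] += 1
            topic_total[t] += 1

    for _ in range(n_iter):
        changed = 0
        for d, doc in enumerate(docs):
            n = len(doc)
            for i, w in enumerate(doc):
                # p_j(w|d) = p(w|t_j) * p(t_j|d) for every topic j
                scores = [
                    (topic_word[j][w] / topic_total[j] if topic_total[j] else 0.0)
                    * doc_topic[d][j] / n
                    for j in range(k)
                ]
                best = max(range(k), key=scores.__getitem__)  # argmax over j
                old = z[d][i]
                if best != old:
                    # Moving the word updates the counts, hence theta_d and phi_t.
                    doc_topic[d][old] -= 1
                    doc_topic[d][best] += 1
                    topic_word[old][w] -= 1
                    topic_word[best][w] += 1
                    topic_total[old] -= 1
                    topic_total[best] += 1
                    z[d][i] = best
                    changed += 1
        if changed == 0:  # no word changed topic: converged
            break
    return z

docs = [["neural", "network", "neural"], ["steel", "alloy", "steel"]]
print(fit_simple_lda(docs, k=2))
```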
S4: constructing a feature vector model from the expert feature information and demand feature information obtained by keyword extraction;
specifically, constructing the feature vector model includes:
performing feature extraction on the expert feature information and the demand feature information with the TF-IDF algorithm to obtain information topic terms;
and performing feature selection on the information topic terms, and constructing feature vectors and the feature vector model from the selected feature topic terms.
TF-IDF is a statistical method to evaluate the importance of a word to one of a set of documents or a corpus. The importance of a word increases in proportion to the number of times it appears in a document, but at the same time decreases in inverse proportion to the frequency with which it appears in the corpus. Various forms of TF-IDF weighting are often applied by search engines as a measure or rating of the degree of relevance between a document and a user query. In addition to TF-IDF, search engines on the internet use a ranking method based on link analysis to determine the order in which documents appear in search results.
The main idea of TF-IDF is: if a word or phrase appears frequently in one article (high TF) but rarely in other articles, it is considered to have good discriminating power and to be suitable for classification. TF-IDF is the product TF × IDF, where TF is term frequency and IDF is inverse document frequency. TF is the frequency with which a term appears in document d. The main idea of IDF is: the fewer documents contain term t, i.e. the smaller the number n of such documents, the larger the IDF and the better t distinguishes between categories.
However, suppose m documents of a class C contain term t and k documents of other classes contain it; then n = m + k. When m is large, n is also large, so the IDF formula gives a small value, suggesting that t discriminates poorly between categories. In practice, though, a term that appears frequently in documents of one class represents the characteristics of that class well; such terms should be given higher weight and selected as feature words of that class to distinguish it from other classes. In a given document, the term frequency (TF) is the frequency with which a given word appears in that document: the raw term count normalized by the total number of words in the document, to prevent bias toward long documents.
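A minimal TF-IDF and feature-selection sketch follows. It assumes the common variant tf = count / document length and idf = log(N / df), since the disclosure does not state which TF-IDF variant it uses, and it assumes that feature selection keeps the `top_k` highest-weighted terms of each document. Both function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per token list in docs.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights

def select_features(weights, top_k):
    """Keep the top_k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])

docs = [["deep", "learning", "learning"], ["deep", "alloy"]]
w = tf_idf(docs)
print(w[0], select_features(w[0], top_k=1))
```

With this idf variant a term that occurs in every document ("deep" above) gets weight 0, so it is never preferred as a feature over a term that distinguishes documents.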
S5: performing similarity analysis on the feature vectors in the feature vector model to obtain an expert recommendation result.
Specifically, this step includes:
computing, on the basis of the cosine similarity between feature vectors in the feature vector model, a similarity that additionally incorporates a factor for the proportion of feature words shared by two texts relative to the total length of their feature vectors, to obtain an expert recommendation result;
wherein the similarity is calculated by the following formula:
[Formula image not reproduced in this text]
where c is a proportional adjustment coefficient, N(D, E) is the number of feature words shared by the requirement information D and the expert information E, Min(D, E) is the smaller of the number of features of D and the number of features of E, and sim(D, E) is the cosine similarity between D and E.
The cosine similarity is calculated as:
$\mathrm{sim}(A, B) = \dfrac{\sum_{t} V_{t,A} V_{t,B}}{\sqrt{\sum_{t} V_{t,A}^{2}}\,\sqrt{\sum_{t} V_{t,B}^{2}}}$
where $V_{t,A}$ and $V_{t,B}$ are the weights of the t-th feature word in vectors A and B, respectively.
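The cosine formula and the shared-word factor $N(D, E)/\mathrm{Min}(D, E)$ can be computed on sparse feature vectors as follows. The disclosure gives the way the two are combined only as a formula image that is not reproduced here, so the additive form `sim + c * N / Min` in `combined_score`, the default `c = 0.5` and all function names are assumptions for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def overlap_ratio(a, b):
    """N(D, E) / Min(D, E): shared feature words over the smaller feature count."""
    smaller = min(len(a), len(b))
    return len(a.keys() & b.keys()) / smaller if smaller else 0.0

def combined_score(demand, expert, c=0.5):
    # ASSUMPTION: additive combination sim + c * N/Min. The disclosure's
    # actual formula is an image that is not reproduced in this text.
    return cosine(demand, expert) + c * overlap_ratio(demand, expert)

def recommend(demand, experts, top_n=3, c=0.5):
    """Rank experts {name: feature_vector} by combined_score, best first."""
    scored = [(name, combined_score(demand, vec, c)) for name, vec in experts.items()]
    scored.sort(key=lambda x: (-x[1], x[0]))
    return scored[:top_n]

demand = {"deep": 0.8, "learning": 0.6}
experts = {"e1": {"deep": 0.7, "learning": 0.7}, "e2": {"alloy": 1.0}}
print(recommend(demand, experts))
```

Sparse dictionaries keep each vector limited to its selected feature words, which matches the per-text feature lists produced by the previous step.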
Embodiment 2:
as shown in fig. 2:
the present disclosure further provides an enterprise demand oriented expert recommendation apparatus, including:
a data collection module 201, configured to collect expert paper data and enterprise demand data;
a preprocessing module 202, configured to preprocess the collected expert paper data and enterprise demand data to obtain expert information and demand information;
a keyword extraction module 203, configured to extract keywords from the preprocessed expert information and demand information to obtain expert feature information and demand feature information;
a vector model construction module 204, configured to construct a feature vector model from the expert feature information and demand feature information obtained by keyword extraction;
and a similarity analysis module 205, configured to perform similarity analysis on the feature vectors in the feature vector model to obtain an expert recommendation result.
The data collection module 201 of the present disclosure is sequentially connected to the preprocessing module 202, the keyword extraction module 203, the vector model construction module 204, and the similarity analysis module 205.
Further, the data collection module 201 is specifically configured to:
collect the titles, abstracts and/or keywords of the experts' papers from a paper database, and collect the titles, keywords and/or requirement details of enterprise demands from a selected online website.
Embodiment 3:
the present disclosure can also provide a computer storage medium having stored thereon a computer program for implementing the steps of the above-described enterprise demand oriented expert recommendation method when executed by a processor.
The computer storage medium of the present disclosure may be implemented with a semiconductor memory, a magnetic core memory, a magnetic drum memory, or a magnetic disk memory.
Semiconductor memory is mainly used as the semiconductor storage element of computers, and there are two types: MOS and bipolar. MOS devices have high integration and a simple process but are slow. Bipolar devices have a complex process, high power consumption and low integration, but are fast. The introduction of NMOS and CMOS made MOS memory dominant among semiconductor memories. NMOS is fast; for example, Intel's 1-Kbit SRAM has an access time of 45 ns. CMOS has low power consumption; a 4-Kbit CMOS static memory has an access time of 300 ns. The semiconductor memories described above are all random access memories (RAM), i.e. new contents can be read and written at random during operation. Semiconductor read-only memory (ROM) can be read at random but cannot be written during operation, and is used to store fixed programs and data. ROM is divided into non-rewritable fuse-type ROM and PROM, and rewritable EPROM.
Magnetic core memory is low in cost and highly reliable, with more than 20 years of practical use. Before the mid-1970s, magnetic core memory was widely used as main memory. Its storage capacity can reach more than 10 bits, and its fastest access time is 300 ns. Typical magnetic core memories internationally have a capacity of 4 MB to 8 MB and an access cycle of 1.0 to 1.5 μs. After semiconductor memory developed rapidly and replaced magnetic core memory as main memory, magnetic core memory could still be used as large-capacity expansion memory.
Drum memory is an external memory based on magnetic recording. Although it offers fast information access and stable, reliable operation, it is being replaced by disk memory; it is still used as external memory for real-time process control computers and for medium and large computers. To meet the needs of small computers and microcomputers, subminiature magnetic drums have emerged, which are small, lightweight, highly reliable and convenient to use.
Magnetic disk memory is an external memory based on magnetic recording. It combines the advantages of drum and tape storage: its capacity is larger than that of a drum, its access speed is faster than that of tape, and disks can be stored off-line. Magnetic disks are therefore widely used as large-capacity external storage in all kinds of computer systems. They fall into two main categories: hard disk memories and floppy disk memories.
Hard disk memories come in many varieties. Structurally they are divided into replaceable (removable) and fixed types: a replaceable disk can be exchanged, whereas a fixed disk is permanently mounted. Both types include multi-platter and single-platter structures, and both are further divided into fixed-head and movable-head types. Fixed-head disks have small capacity and low recording density, but high access speed and high cost. Movable-head disks have a high recording density (1000 to 6250 bits per inch) and therefore large capacity, but slower access than fixed-head disks. With a bit density of 6250 bits per inch and a track density of 475 tracks per inch, the capacity of a disk product can reach several hundred megabytes. Because the disk pack of a multi-platter replaceable disk memory can be exchanged, it offers large off-line capacity together with large capacity and high speed; it can store large volumes of information and is widely used in online information retrieval systems and database management systems.
Example four:
The present disclosure also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the enterprise-demand-oriented expert recommendation method.
Fig. 3 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in Fig. 3, the electronic device includes a processor, a storage medium, a memory and a network interface connected through a system bus. The storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store sequences of control information, and the computer-readable instructions, when executed by the processor, cause the processor to implement an enterprise-demand-oriented expert recommendation method. The processor of the electronic device provides the computing and control capabilities that support the operation of the entire computer device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform the enterprise-demand-oriented expert recommendation method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will appreciate that the structure shown in Fig. 3 is merely a block diagram of part of the structure relevant to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange its components differently.
The electronic device includes, but is not limited to, a smartphone, a computer, a tablet, a wearable smart device, an artificial intelligence device, a mobile power source, and the like.
In some embodiments the processor may consist of an integrated circuit, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors and combinations of various control chips. The processor is the control unit of the electronic device: it connects the components of the electronic device through various interfaces and lines, and performs the device's functions and processes its data by running or executing programs or modules stored in the memory (for example, remote data read/write programs) and calling data stored in the memory.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. The bus enables communication between the memory, the at least one processor and other components.
Fig. 3 shows only an electronic device with some of its components. Those skilled in the art will appreciate that the structure shown in Fig. 3 does not limit the electronic device, which may include fewer or more components than shown, combine certain components, or arrange its components differently.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for powering each component. Preferably, the power supply is logically connected to the at least one processor through a power management device, which implements functions such as charge management, discharge management and power consumption management. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators and other components. The electronic device may further include various sensors, a Bluetooth module, a Wi-Fi module and the like, which are not described here.
Further, the electronic device may include a network interface, which may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi or Bluetooth interface) and is generally used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further include a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the electronic device and to present a visual user interface.
Further, the computer-usable storage medium may mainly include a program storage area and a data storage area: the program storage area may store an operating system, the application programs required for at least one function, and the like; the data storage area may store data created through use of the electronic device, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into modules is only a logical functional division, and other divisions are possible in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional modules.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. An enterprise-demand-oriented expert recommendation method, characterized by comprising the following steps:
collecting expert paper data and enterprise demand data;
preprocessing the collected expert paper data and enterprise demand data to obtain expert information and demand information;
extracting keywords from the preprocessed expert information and demand information to obtain expert feature information and demand feature information;
constructing a feature vector model from the expert feature information and demand feature information obtained by keyword extraction;
and performing similarity calculation and analysis on the feature vectors in the feature vector model to obtain an expert recommendation result.
2. The method of claim 1, wherein collecting the expert paper data and the enterprise demand data specifically comprises:
collecting title, abstract and/or keyword data of expert papers from a paper database, and collecting title, keyword and/or requirement-detail data of enterprise demands from selected online websites.
3. The method of claim 1, wherein preprocessing the collected expert paper data and enterprise demand data specifically comprises:
performing word segmentation on the expert paper data and the enterprise demand data, respectively, using an LTP model to obtain expert segmented data and enterprise segmented data;
removing stop words from the expert segmented data and the enterprise segmented data;
and merging repeated information in the stop-word-filtered data to obtain the expert information and the demand information, respectively.
4. The method of claim 1, wherein extracting keywords from the preprocessed expert information and demand information specifically comprises:
extracting keywords from the expert information and the demand information, respectively, using an LDA (Latent Dirichlet Allocation) model, and obtaining a keyword list for each piece of expert information and each piece of demand information as the expert feature information and the demand feature information.
5. The method of claim 1, wherein constructing the feature vector model from the expert feature information and demand feature information obtained by keyword extraction specifically comprises:
performing feature extraction on the expert feature information and the demand feature information using a TF-IDF algorithm to obtain information subject terms;
and performing feature selection on the information subject terms, and constructing feature vectors and the feature vector model from the selected feature subject terms.
6. The method of claim 1, wherein performing similarity calculation and analysis on the feature vectors in the feature vector model to obtain the expert recommendation result specifically comprises:
calculating, on the basis of the cosine similarity between feature vectors in the feature vector model, an adjusted similarity that additionally incorporates a factor given by the number of feature words two texts share relative to the length of their feature vectors, to obtain the expert recommendation result;
wherein the similarity is calculated using the following formula:
[Similarity formula shown as an image in the original publication]
wherein c is a proportional adjustment coefficient, N(D, E) denotes the number of feature words shared by the demand information D and the expert information E, Min(D, E) denotes the smaller of the total number of features of D and the total number of features of E, and sim(D, E) denotes the cosine similarity between D and E.
7. An enterprise-demand-oriented expert recommendation device, comprising:
a data collection module, configured to collect expert paper data and enterprise demand data;
a preprocessing module, configured to preprocess the collected expert paper data and enterprise demand data to obtain expert information and demand information;
a keyword extraction module, configured to extract keywords from the preprocessed expert information and demand information to obtain expert feature information and demand feature information;
a vector model construction module, configured to construct a feature vector model from the expert feature information and demand feature information obtained by keyword extraction;
and a similarity analysis module, configured to perform similarity calculation and analysis on the feature vectors in the feature vector model to obtain an expert recommendation result.
8. The device of claim 7, wherein collecting the expert paper data and the enterprise demand data specifically comprises:
collecting title, abstract and/or keyword data of expert papers from a paper database, and collecting title, keyword and/or requirement-detail data of enterprise demands from selected online websites.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the enterprise-demand-oriented expert recommendation method of any one of claims 1 to 6.
10. A computer storage medium having computer program instructions stored thereon, wherein the program instructions, when executed by a processor, perform the steps of the enterprise-demand-oriented expert recommendation method of any one of claims 1 to 6.
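The preprocessing and vector-construction steps of claims 3 and 5 can be sketched in Python as follows. This is a minimal, hedged illustration rather than the applicant's implementation, and it rests on several assumptions: a lowercase regular-expression tokenizer stands in for the LTP segmenter (so it only suits space-delimited text), the stop-word list is hypothetical, "merging repeated information" is read as dropping exact duplicate records, the TF-IDF weighting uses a smoothed IDF, feature selection keeps the `top_k` highest-weighted terms, and the LDA keyword step of claim 4 is omitted. The names `merge_records`, `tokenize` and `tfidf_vectors` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the patent does not publish the one it uses.
STOP_WORDS = {"a", "an", "and", "based", "for", "in", "of", "on", "the", "to", "with"}


def merge_records(records: list[str]) -> str:
    """Join one expert's (or one demand's) records, dropping exact repeats.

    Assumed reading of "merging repeated information" in claim 3.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for record in records:
        key = record.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(record.strip())
    return " ".join(kept)


def tokenize(text: str) -> list[str]:
    """Stand-in for LTP segmentation: lowercase regex split plus stop-word removal.

    Only suitable for space-delimited text such as English; Chinese needs a real segmenter.
    """
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs: dict[str, list[str]], top_k: int = 20) -> dict[str, dict[str, float]]:
    """Weight each document's terms by TF-IDF and keep its top_k terms (feature selection)."""
    n_docs = len(docs)
    df: Counter[str] = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: each term counted once per document
    vectors: dict[str, dict[str, float]] = {}
    for name, tokens in docs.items():
        if not tokens:
            vectors[name] = {}
            continue
        tf = Counter(tokens)
        weights = {
            # Smoothed IDF so that terms shared by every document keep a non-zero weight.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Feature selection: highest weights first, ties broken alphabetically.
        best = sorted(weights.items(), key=lambda item: (-item[1], item[0]))[:top_k]
        vectors[name] = dict(best)
    return vectors
```

Each resulting vector is a sparse term-to-weight dictionary, so experts and demands can be compared without first building a shared dense vocabulary.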
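The scoring in claim 6 combines the cosine similarity sim(D, E) with the shared-term ratio N(D, E)/Min(D, E) and a coefficient c. Because the published formula is available only as an image, the Python sketch below assumes an additive combination, sim(D, E) + c·N(D, E)/Min(D, E); the actual formula may combine these terms differently. Vectors are sparse term-to-weight dictionaries, and `cosine`, `adjusted_similarity`, `recommend`, the default c = 0.5 and `top_n` are hypothetical choices made for illustration.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity sim(D, E) between two sparse term-weight vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjusted_similarity(d: dict[str, float], e: dict[str, float], c: float = 0.5) -> float:
    """ASSUMED form: sim(D, E) + c * N(D, E) / Min(D, E).

    The published formula is an image, so this additive combination is a guess.
    """
    shared = len(d.keys() & e.keys())  # N(D, E): feature words present in both
    smaller = min(len(d), len(e))      # Min(D, E): the smaller feature count
    overlap = shared / smaller if smaller else 0.0
    return cosine(d, e) + c * overlap


def recommend(
    demand: dict[str, float],
    experts: dict[str, dict[str, float]],
    c: float = 0.5,
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Rank experts for one demand by adjusted similarity, highest first."""
    scored = [(name, adjusted_similarity(demand, vec, c)) for name, vec in experts.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by expert name
    return scored[:top_n]
```

Because the overlap term counts shared feature words without regard to their weights, c controls how far plain vocabulary overlap can lift one expert above another with a similar cosine score; sorting ties by name only makes the output deterministic.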
CN202011345299.6A 2020-11-26 2020-11-26 Enterprise demand oriented expert recommendation method, device, medium and equipment Pending CN112487161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011345299.6A CN112487161A (en) 2020-11-26 2020-11-26 Enterprise demand oriented expert recommendation method, device, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011345299.6A CN112487161A (en) 2020-11-26 2020-11-26 Enterprise demand oriented expert recommendation method, device, medium and equipment

Publications (1)

Publication Number Publication Date
CN112487161A true CN112487161A (en) 2021-03-12

Family

ID=74934837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011345299.6A Pending CN112487161A (en) 2020-11-26 2020-11-26 Enterprise demand oriented expert recommendation method, device, medium and equipment

Country Status (1)

Country Link
CN (1) CN112487161A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240562A (en) * 2021-05-27 2021-08-10 南通大学 Method and system for recommending and matching industry-university-research projects based on NLP
CN114118760A (en) * 2021-11-19 2022-03-01 西南交通大学 RAMS demand analysis method for key parts of high-speed train
CN116701633A (en) * 2023-06-14 2023-09-05 上交所技术有限责任公司 Industry classification method based on patent big data
CN117131279A (en) * 2023-09-13 2023-11-28 合肥工业大学 Data processing method and device for expert recommendation
CN117495142A (en) * 2023-11-18 2024-02-02 北京连华永兴科技发展有限公司 Enterprise water treatment scheme recommendation method and system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109885761A (en) * 2019-01-24 2019-06-14 暨南大学 Method for matching and recommending positions of human-resources enterprises and talent
CN111782797A (en) * 2020-07-13 2020-10-16 贵州省科技信息中心 Automatic matching method for scientific and technological project review experts and storage medium
CN111813898A (en) * 2020-08-28 2020-10-23 北京智源人工智能研究院 Expert recommendation method, device and equipment based on semantic search and storage medium
US20200342332A1 (en) * 2019-04-29 2020-10-29 Kenneth Neumann Methods and systems for classification using expert data

Non-Patent Citations (1)

Title
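李莹: "Research on Expert Recommendation Algorithms for Enterprise Demands" (面向企业需求的专家推荐算法研究), 《中国优秀硕士学位论文全文数据库信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *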

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN113240562A (en) * 2021-05-27 2021-08-10 南通大学 Method and system for recommending and matching industry-university-research projects based on NLP
CN114118760A (en) * 2021-11-19 2022-03-01 西南交通大学 RAMS demand analysis method for key parts of high-speed train
CN114118760B (en) * 2021-11-19 2023-04-07 西南交通大学 RAMS demand analysis method for key parts of high-speed train
CN116701633A (en) * 2023-06-14 2023-09-05 上交所技术有限责任公司 Industry classification method based on patent big data
CN117131279A (en) * 2023-09-13 2023-11-28 合肥工业大学 Data processing method and device for expert recommendation
CN117495142A (en) * 2023-11-18 2024-02-02 北京连华永兴科技发展有限公司 Enterprise water treatment scheme recommendation method and system
CN117495142B (en) * 2023-11-18 2024-07-19 北京连华永兴科技发展有限公司 Enterprise water treatment scheme recommendation method and system

Similar Documents

Publication Publication Date Title
Qiang et al. Short text topic modeling techniques, applications, and performance: a survey
Yan et al. Network-based bag-of-words model for text classification
CN107133213B (en) Method and system for automatically extracting text abstract based on algorithm
CN103514183B (en) Information search method and system based on interactive document clustering
CN112487161A (en) Enterprise demand oriented expert recommendation method, device, medium and equipment
CN109960756B (en) News event information induction method
Yao et al. Bursty event detection from collaborative tags
US20080040342A1 (en) Data processing apparatus and methods
CN111401045A (en) Text generation method and device, storage medium and electronic equipment
CN110162522A (en) Distributed data search system and method
Remi et al. Domain ontology driven fuzzy semantic information retrieval
Martín et al. Using semi-structured data for assessing research paper similarity
CN106372122A (en) Wiki semantic matching-based document classification method and system
Maiya et al. Topic similarity networks: visual analytics for large document sets
Bouchakwa et al. Multi-level diversification approach of semantic-based image retrieval results
CN111522950A (en) Rapid identification system for unstructured massive text sensitive data
Yamunathangam et al. An overview of topic representation and topic modelling methods for short texts and long corpus
Xu et al. Measuring semantic relatedness between flickr images: from a social tag based view
Xia et al. Content-irrelevant tag cleansing via bi-layer clustering and peer cooperation
Zhao et al. Lsif: A system for large-scale information flow detection based on topic-related semantic similarity measurement
Zhu et al. Customized organization of social media contents using focused topic hierarchy
CN113761125A (en) Dynamic summary determination method and device, computing equipment and computer storage medium
Alfarra et al. Graph-based Growing self-organizing map for Single Document Summarization (GGSDS)
Liu et al. A query suggestion method based on random walk and topic concepts
Zhu et al. A sample extension method based on Wikipedia and its application in text classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210326

Address after: 100084 b201c-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing

Applicant after: Beijing innovation Zhiyuan Technology Co.,Ltd.

Address before: B201d-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant before: Beijing Zhiyuan Artificial Intelligence Research Institute

TA01 Transfer of patent application right

Effective date of registration: 20210624

Address after: 100084 603a, 6th floor, building 6, yard 1, Zhongguancun East Road, Haidian District, Beijing

Applicant after: Beijing Zhipu Huazhang Technology Co.,Ltd.

Address before: 100084 b201c-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing

Applicant before: Beijing innovation Zhiyuan Technology Co.,Ltd.

CB03 Change of inventor or designer information

Inventor after: Hu Di

Inventor after: Liu Debing

Inventor after: Zhang Peng

Inventor after: Qiu Yu

Inventor after: Wang Xiaochen

Inventor before: Hu Di

Inventor before: Tang Jie

Inventor before: Liu Debing

Inventor before: Zhang Peng

Inventor before: Qiu Yu

Inventor before: Wang Xiaochen

RJ01 Rejection of invention patent application after publication

Application publication date: 20210312
