CN110287236A - Data mining method, system and terminal device based on interview information - Google Patents
- Publication number
- CN110287236A (application CN201910553409.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- sentence
- attribute
- position information
- word vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
- G06Q10/1053—Employment or hiring
Abstract
The present invention belongs to the technical field of data processing and provides a data mining method, system and terminal device based on interview information. The method includes: obtaining a target corpus and arranging it into M sentences; establishing a convolutional neural network (CNN) model according to the sentences; obtaining the word vector matrix in the CNN model; editing the word vector matrix with position information to obtain a word vector matrix with position information, and training the CNN model with that matrix so that the CNN model outputs the attribute words in the target corpus; and, according to the attribute words, obtaining from the target corpus the interviewees who have a target attribute. The invention enables fast computation over, and accurate mining of, interview information, so that interviewees who match recruitment needs can be screened.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a data mining method, system and terminal device based on interview information.
Background art
In the big-data era, managing and using data well is essential. How effectively data is collected from multiple sources, and how well existing data is used, determine how much value can be obtained from a large body of data. Most of the data that can be directly obtained and used is text data, and this data covers every sector of society. Faced with text data on this scale, text classification is the core means of processing it and is very important for managing and using text data efficiently. For example, when interviews are conducted on a large scale, the spoken answers of the interviewees are collected to form a corpus. Text classification over that corpus can extract key information and bring order to otherwise cluttered interview information, which makes it easier for HR to retrieve the information they need and to screen interviewees. Text classification tasks that involve a large corpus are usually solved with neural network algorithms.
At present, the neural networks suitable for text classification fall into two categories: CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks). A CNN runs faster than an RNN and needs fewer computing resources. However, a CNN trained on a corpus cannot capture the position information of the words in the corpus, which tends to reduce the data mining accuracy for interview information. An RNN can capture position information, but it needs a large amount of computing resources and suffers from the vanishing gradient problem, which can introduce errors into the text classification result and likewise reduces the data mining accuracy for interview information.
Summary of the invention
The main object of the present invention is to provide a data mining method, system and terminal device based on interview information, so as to solve the problem in the prior art that, when a text classification neural network is applied to the data mining of interview information, the resulting text classification has a large error, which reduces the data mining accuracy for interview information.
To achieve the above object, a first aspect of the embodiments of the present invention provides a data mining method based on interview information, comprising:
obtaining a target corpus and arranging the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
establishing a convolutional neural network (CNN) model according to the sentences;
obtaining the word vector matrix in the CNN model;
editing the word vector matrix with position information to obtain a word vector matrix with position information, and training the CNN model with the word vector matrix with position information so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
according to the attribute words, obtaining from the target corpus the interviewees who have a target attribute.
With reference to the first aspect of the present invention, in a first embodiment of the present invention, obtaining the target corpus and arranging the target corpus into M sentences comprises:
setting a preset number of bytes;
truncating the target corpus according to the preset number of bytes to obtain M sentences with the same number of bytes.
With reference to the first aspect of the present invention, in a second embodiment of the present invention, establishing the convolutional neural network (CNN) model according to the sentences comprises:
dividing the i-th sentence into N original words and setting each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
establishing an N × K CNN model based on the i-th sentence.
With reference to the first and second embodiments of the first aspect of the present invention, in a third embodiment of the present invention:
when the number of bytes in a sentence is less than the preset number of bytes, the sentence is padded with 0;
when the dimension of an original word is less than K, the vector is padded with 0;
when the number of original words is less than N, the sentence is padded with 0.
With reference to the first aspect of the present invention, in a fourth embodiment of the present invention, editing the word vector matrix with position information to obtain the word vector matrix with position information, and training the CNN model with the word vector matrix with position information so that the CNN model outputs the attribute words in the target corpus, comprises:
encoding the original words into vectors according to their position information in the i-th sentence and concatenating the vectors to the word vector layer, to obtain the word vector matrix with position information;
extracting, with the CNN model, the features in the word vector matrix with position information, and outputting the attribute words in the i-th sentence;
obtaining the attribute words in the target corpus according to the attribute words of the M sentences.
With reference to the fourth embodiment of the first aspect of the present invention, in a fifth embodiment of the present invention, encoding the original words into vectors according to their position information in the i-th sentence and concatenating the vectors to the word vector layer to obtain the word vector matrix with position information comprises:
obtaining the type of the j-th original word in the i-th sentence, where j is a positive integer less than or equal to N;
when the type of the j-th original word is a verb, obtaining the position information, in the i-th sentence, of the j-th original word, the (j-1)-th original word and the (j+1)-th original word;
encoding the position information, in the i-th sentence, of the j-th, (j-1)-th and (j+1)-th original words into vectors and concatenating the vectors to the word vector layer, to obtain the word vector matrix with position information.
A second aspect of the embodiments of the present invention provides a data mining system based on interview information, comprising:
a corpus arrangement module, configured to obtain a target corpus and arrange the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
a model construction module, configured to establish a convolutional neural network (CNN) model according to the sentences;
a word vector obtaining module, configured to obtain the word vector matrix in the CNN model;
an attribute word obtaining module, configured to edit the word vector matrix with position information to obtain a word vector matrix with position information, and to train the CNN model with the word vector matrix with position information so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
a target selection module, configured to obtain from the target corpus, according to the attribute words, the interviewees who have a target attribute.
With reference to the second aspect of the present invention, in a first embodiment of the present invention, the corpus arrangement module includes:
a byte number setting unit, configured to set a preset number of bytes;
a sentence truncation unit, configured to truncate the target corpus according to the preset number of bytes to obtain M sentences with the same number of bytes.
A third aspect of the embodiments of the present invention provides a terminal device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method provided by the first aspect are implemented.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the method provided by the first aspect are implemented.
The embodiments of the present invention propose a data mining method based on interview information. The target corpus is divided into multiple sentences, and a convolutional neural network (CNN) model is established from the sentences in the target corpus. The word vector matrix of the CNN model is then obtained, and position information is added to it to give a word vector matrix with position information. That matrix is used to perform the text classification task in the CNN model, so that the CNN model outputs the attribute words of the target corpus, which carry both a position attribute and a category attribute. Data mining is then performed on the target corpus, i.e. the interview information, according to the attribute words, so that interviewees with the target attribute corresponding to the recruitment needs can be obtained. Because the word vector matrix used for text classification carries position information, the position information influences how the CNN model extracts features from the words, so position information is captured and the accuracy of text classification is improved at the same time.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the data mining method based on interview information provided by Embodiment One of the present invention;
Fig. 2 is a schematic flowchart of the data mining method based on interview information provided by Embodiment Two of the present invention;
Fig. 3 is a schematic flowchart showing the detailed implementation of step S1042 in Fig. 2;
Fig. 4 is a schematic diagram of the structure of the data mining system based on interview information provided by Embodiment Four of the present invention.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the accompanying drawings and the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
It should be noted that, in this document, the terms "include", "comprise" and any variant of them are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to that process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
In this document, suffixes such as "module", "component" or "unit" that denote elements are used only to facilitate the description of the invention and have no specific meaning in themselves. Therefore, "module" and "component" may be used interchangeably.
In the following description, the serial numbers of the embodiments are for description only and do not indicate that one embodiment is better or worse than another.
Embodiment one
As shown in Fig. 1, this embodiment of the present invention provides a data mining method based on interview information, which enables fast computation over and accurate mining of interview information and screens interviewees who match recruitment needs. In this embodiment, the data mining method based on interview information may include:
S101: obtain a target corpus and arrange the target corpus into M sentences.
Here, M is a positive integer and the target corpus is interview information.
In step S101, the target corpus is a basic unit of a corpus collection. Such collections are usually stored as text data, so the target corpus is also in the form of text data.
In a specific application, the interview information may be a record of what the interviewees said. It may be obtained as follows: the interview is audio-recorded, and the target corpus in text form is then obtained from the recording file. The target corpus includes the speech record of at least one interviewee.
In one embodiment, step S101 may be implemented as follows:
set a preset number of bytes;
truncate the target corpus according to the preset number of bytes to obtain M sentences with the same number of bytes.
In this implementation, the way the corpus is arranged is optimized so that every sentence has the same number of bytes, which can improve the efficiency and accuracy of text classification.
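The truncation described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not name an encoding, so UTF-8 is assumed, and the function name `split_into_sentences` and the use of a zero byte for padding the last piece are hypothetical choices.

```python
from typing import List


def split_into_sentences(corpus: str, preset_bytes: int) -> List[bytes]:
    """Cut the UTF-8 bytes of `corpus` into pieces of exactly `preset_bytes` bytes.

    Assumption: the corpus is handled as UTF-8 bytes. The last piece is
    padded with zero bytes when it is shorter than `preset_bytes`.
    """
    if preset_bytes <= 0:
        raise ValueError("preset_bytes must be positive")
    raw = corpus.encode("utf-8")
    pieces = [raw[i:i + preset_bytes] for i in range(0, len(raw), preset_bytes)]
    if pieces and len(pieces[-1]) < preset_bytes:
        # Pad the final, shorter piece with 0 so every sentence has equal length.
        pieces[-1] = pieces[-1].ljust(preset_bytes, b"\x00")
    return pieces


sentences = split_into_sentences("abcdefgh", 3)
print(len(sentences), sentences)
```

Note that cutting on a fixed byte boundary can split a multi-byte character in two; a practical system would probably cut on character or word boundaries instead, which the patent text does not discuss.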
S102: establish a convolutional neural network (CNN) model according to the sentences.
In one embodiment, step S102 may be implemented as follows:
divide the i-th sentence into N original words and set each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
establish an N × K CNN model based on the i-th sentence.
Here, the original words come from the sentences, i.e. they are words extracted from the target corpus.
In a specific application, each original word is set as a K-dimensional vector, but among the N original words some may have a dimension less than K and some a dimension greater than K. In a CNN model, a hidden layer can project the initially encoded words into a lower-dimensional space and thus reduce the dimension of the original words. The value of K can therefore be set to a maximum value, which keeps the matrix uniform without making the dimension so high that computation slows down. Here, the maximum value means a value such that the dimension of every original word is less than K.
Similarly, to keep the sentences uniform, every sentence is divided into N original words and N is set to a maximum value, so that the number of original words obtained from the i-th sentence is always less than N.
Combining the implementations of steps S101 and S102 above, this embodiment also proposes an implementation that unifies the dimension of the original words, the number of original words in a sentence, and the preset number of bytes:
when the number of bytes in a sentence is less than the preset number of bytes, pad with 0;
when the dimension of an original word is less than K, pad with 0;
when the number of original words is less than N, pad with 0.
In a specific application, padding the missing dimensions, the missing words in a sentence and the missing bytes with 0 aligns the matrices. This makes it easier to establish a unified CNN model, reduces the computing resources needed for text classification, and improves computational efficiency.
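The zero-padding rules above can be illustrated with a short sketch that turns the word vectors of one sentence into an N × K matrix. The function name `pad_sentence` and the list-of-lists representation are assumptions made for the example, not part of the patent.

```python
from typing import List


def pad_sentence(word_vectors: List[List[float]], n: int, k: int) -> List[List[float]]:
    """Return an n x k matrix: short vectors and missing words are filled with 0."""
    if len(word_vectors) > n:
        raise ValueError("sentence has more than n words")
    rows = []
    for vec in word_vectors:
        if len(vec) > k:
            raise ValueError("word vector has more than k dimensions")
        # Pad a word vector whose dimension is less than K.
        rows.append(list(vec) + [0.0] * (k - len(vec)))
    # Pad a sentence that has fewer than N words with all-zero rows.
    rows.extend([0.0] * k for _ in range(n - len(rows)))
    return rows


matrix = pad_sentence([[1.0, 2.0], [3.0]], n=3, k=2)
print(matrix)
```

Because every sentence then yields a matrix of the same shape, a single CNN input layer can serve all M sentences.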
S103: obtain the word vector matrix in the CNN model.
The CNN model in steps S102 and S103 can be applied to image feature extraction as well as to text classification. This embodiment performs a text classification task on the sentences of the target corpus, so a CNN model for text classification is established that can process each sentence. The word vector matrices that the CNN model outputs for different sentences are therefore different.
S104: edit the word vector matrix with position information to obtain a word vector matrix with position information, and train the CNN model with the word vector matrix with position information so that the CNN model outputs the attribute words in the target corpus.
In step S104, the attribute words include words with a category attribute and words with a position attribute.
The position information is the positional relationship between the words in a sentence, i.e. the relationship between the original words described below.
If the target corpus includes the interview records of multiple interviewees, the attribute words in the target corpus come from multiple interviewees.
S105: according to the attribute words, obtain from the target corpus the interviewees who have a target attribute.
In step S105, because the attribute words carry both a category attribute and a position attribute, they express the relevant information about an interviewee accurately and reduce errors in semantic analysis. Therefore, when the target corpus is screened by attribute word for interviewees who match the recruitment needs, the interviewees with the target attribute can be found accurately.
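The screening in step S105 might look like the sketch below. The patent does not specify how attribute words are stored, so the `AttributeWord` record, the mapping from interviewee to attribute words, and the category labels "education" and "skill" are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class AttributeWord(NamedTuple):
    word: str
    category: str  # category attribute (hypothetical label, e.g. "education")
    position: int  # position attribute: index of the word in its sentence


def select_interviewees(
    attribute_words: Dict[str, List[AttributeWord]], target_category: str
) -> List[str]:
    """Return the interviewees who have at least one attribute word of the target category."""
    return sorted(
        name
        for name, words in attribute_words.items()
        if any(w.category == target_category for w in words)
    )


mined = {
    "A": [AttributeWord("school of management", "education", 3)],
    "B": [AttributeWord("welding", "skill", 2)],
}
print(select_interviewees(mined, "education"))
```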
In the data mining method based on interview information provided by this embodiment, the target corpus is divided into multiple sentences, and a convolutional neural network (CNN) model is established from the sentences in the target corpus. The word vector matrix of the CNN model is then obtained, and position information is added to it to give a word vector matrix with position information. That matrix is used to perform the text classification task in the CNN model, so that the CNN model outputs the attribute words of the target corpus, which carry both a position attribute and a category attribute. Data mining is then performed on the target corpus, i.e. the interview information, according to the attribute words, so that interviewees with the target attribute corresponding to the recruitment needs can be obtained. Because the word vector matrix used for text classification carries position information, the position information influences how the CNN model extracts features from the words, so position information is captured and the accuracy of text classification is improved at the same time.
Embodiment two
As shown in Fig. 2, this embodiment describes the detailed implementation of step S104 in Embodiment One. One implementation of step S104 is:
S1041: according to the position information of the original words in the i-th sentence, encode the original words into vectors and concatenate the vectors to the word vector layer, to obtain the word vector matrix with position information.
In step S1041, the word vector matrix is part of the output of the CNN model: when the i-th sentence is fed directly into the CNN model for training, each word vector in the resulting word vector matrix corresponds to one original word.
This embodiment also provides a process for encoding the position information of an original word into a vector and concatenating it to the word vector layer.
When the position information of an original word is encoded into a vector and concatenated to the word vector layer, the position-encoding weights are all set to 1 and there is no bias term.
The encoding of an original word's position into a vector is illustrated below.
On the basis of a conventional TextCNN, the position information of each original word is encoded as a 100-dimensional vector and concatenated to the word vector layer, with all position-encoding weights set to 1:
$$PE(pos, 2i) = \sin\left(pos / 10000^{2i/d}\right)$$
$$PE(pos, 2i+1) = \cos\left(pos / 10000^{2i/d}\right)$$
where pos is the position of the word in the sentence, i is the dimension index of the position vector, and d is the preset dimension of the position vector.
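The two formulas can be checked with a small sketch. It assumes the usual reading in which even dimensions use the sine and odd dimensions use the cosine of the same angle; the function names `position_encoding` and `splice` are hypothetical, and the 100-dimensional default follows the text above.

```python
import math
from typing import List


def position_encoding(pos: int, d: int = 100) -> List[float]:
    """Sinusoidal encoding of position `pos` as a d-dimensional vector."""
    encoding = []
    for dim in range(d):
        i = dim // 2  # dimensions 2i and 2i+1 share the same angle
        angle = pos / (10000 ** (2 * i / d))
        encoding.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
    return encoding


def splice(word_vector: List[float], pos: int, d: int = 100) -> List[float]:
    """Concatenate the position vector to the word vector (weights 1, no bias)."""
    return list(word_vector) + position_encoding(pos, d)


row = splice([0.5, 0.5], pos=3)
print(len(row))
```

Since the encoding is concatenated rather than added, a K-dimensional word vector becomes a (K + d)-dimensional row, so the convolution kernels must span the wider rows.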
S1042: extract, with the CNN model, the features in the word vector matrix with position information, and output the attribute words in the i-th sentence.
In step S1042, the attribute words are the words that, once training of the CNN model is complete, carry both a category attribute and a position attribute. They reflect both the category and the position of a word, and the position attribute of a word influences its category attribute.
In a specific application, if the CNN model were trained directly on a word vector matrix built from the original words alone, training could only obtain the position of each original word within the training matrix. It could not directly obtain the position of the original word in the sentence, i.e. the positional relationship between the words in the sentence.
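The feature extraction in step S1042 is not detailed in the text. As a rough sketch of what a conventional TextCNN does, the code below slides one convolution kernel over the rows of a sentence matrix and keeps the largest activation (max-over-time pooling). The kernel values, the ReLU activation and the function name `conv_max_pool` are assumptions for illustration only.

```python
from typing import List


def conv_max_pool(matrix: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> float:
    """Apply one full-width kernel over consecutive rows, then max-pool over positions."""
    height = len(kernel)
    width = len(kernel[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("kernel width must equal the row width of the matrix")
    if len(matrix) < height:
        raise ValueError("matrix has fewer rows than the kernel")
    activations = []
    for start in range(len(matrix) - height + 1):
        total = bias
        for r in range(height):
            for c in range(width):
                total += matrix[start + r][c] * kernel[r][c]
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


sentence_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
print(conv_max_pool(sentence_matrix, kernel))
```

A real model would use many kernels of several heights and feed the pooled values to a classifier; when the rows include the concatenated position vector, each kernel sees word and position values together.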
S1043: obtain the attribute words in the target corpus according to the attribute words of the M sentences.
Step S1043 is equivalent to repeating step S1042 M times, so that at most M × N attribute words are obtained from the target corpus.
As shown in Fig. 3, this embodiment also shows an implementation of step S1042. Step S1042 may include:
S10421: obtain the type of the j-th original word in the i-th sentence, where j is a positive integer less than or equal to N.
In step S10421, in text classification techniques, the type of each original word in a sentence can be obtained directly. For example, nouns are marked /n, verbs /v, adjectives /adj and prepositions /vj. During CNN text classification, text that does not need to be classified, such as prepositions and adjectives, is usually filtered out automatically to reduce non-essential text data.
S10422: when the type of the j-th original word is a verb, obtain the position information, in the i-th sentence, of the j-th original word, the (j-1)-th original word and the (j+1)-th original word.
In step S10422, if the positions of the nouns before and after a verb are not made explicit and the sentence is trained directly with the CNN model, semantic errors will result. For example, in "at / school of management / study" and "at / study / school of management", "school of management" is a noun in both cases, but its meaning is not the same.
S10423: encode the position information, in the i-th sentence, of the j-th original word, the (j-1)-th original word and the (j+1)-th original word into vectors and concatenate the vectors to the word vector layer, to obtain the word vector matrix with position information.
Through steps S10421 to S10423, the text data is screened before the CNN model extracts the features in the word vector matrix with position information, which improves data mining efficiency.
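Steps S10421 to S10423 can be sketched as a selection of the positions of each verb and its immediate neighbours. The input format, a list of (word, tag) pairs with 1-based positions, and the function name `positions_around_verbs` are assumptions; the tag "v" follows the /v marking mentioned above, while "r" for a pronoun is a hypothetical label.

```python
from typing import List, Tuple


def positions_around_verbs(tagged: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Return (position, word) for every verb and its existing left and right neighbours.

    Positions are 1-based; a neighbour that does not exist (before the first
    or after the last word) is skipped.
    """
    n = len(tagged)
    selected = set()
    for j, (_, tag) in enumerate(tagged, start=1):
        if tag != "v":
            continue
        for pos in (j - 1, j, j + 1):
            if 1 <= pos <= n:
                selected.add(pos)
    return [(pos, tagged[pos - 1][0]) for pos in sorted(selected)]


tagged_sentence = [("I", "r"), ("at", "vj"), ("school of management", "n"), ("study", "v")]
print(positions_around_verbs(tagged_sentence))
```

Only the selected positions would then be passed to the position encoder, which is one way to read the statement that the text data is screened before feature extraction.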
Taking an interview scenario as an example, this embodiment also illustrates the effect, in practical use, of the data mining method based on interview information provided by Embodiment One and by this embodiment.
Assume that the sentence "I study at the school of management" has been arranged from the corpus of interviewee A.
First, in step S1041, the sentence "I study at the school of management" is divided into 4 original words: "I", "at", "school of management" and "study".
Then, step S1042 and the specific implementation provided by this embodiment are executed. The type of each word is obtained; "study" is a verb and there is no (j+1)-th original word, so the position information a of "school of management" and the position information b of "study" are obtained directly, giving a = 3 and b = 4. Position encodings are added to the original words in the word vector matrix according to this position information, after which the word vector matrix contains original words with position information. Assume these are written as "school of management_3" and "study_4".
Finally, step S1043 is performed and the CNN model extracts the features in the sentence "I study at the school of management". After the original word with position information, "school of management_3", is converted into an attribute word, its position attribute influences its category data. During feature extraction, "school of management_3" is not treated as a feature of "study" but as a feature of "school", so in the final classification result "school of management_3" is classified as educational background rather than as vocational skills.
Embodiment three
As shown in Figure 4, an embodiment of the present invention provides a data mining system 40 based on interview information. The system uses a term vector matrix with location information to perform the text classification task in a CNN model, so that the location information influences how the CNN model extracts word features. Capturing location information in this way improves the accuracy of text classification. The data mining system 40 includes:
a corpus sorting module 41, for obtaining a target corpus and arranging the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
a model construction module 42, for establishing a convolutional neural network (CNN) model according to the sentences;
a term vector obtaining module 43, for obtaining the term vector matrix in the CNN model;
an attribute word obtaining module 44, for editing the term vector matrix using location information to obtain a term vector matrix with location information, and for training the CNN model with the term vector matrix with location information, so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
a target selection module 45, for obtaining, according to the attribute words, an interviewee having a target attribute in the target corpus.
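As a rough illustration of how the attribute word obtaining module 44 might edit the term vector matrix, the sketch below marks each verb and its existing neighbours (positions j-1, j and j+1) and splices a position vector onto the corresponding rows, using the example sentence from Embodiment two. The function names, the part-of-speech tags, the English word glosses and the one-hot position encoding are all assumptions made for illustration; the patent only states that location information is encoded as a vector and spliced to the term vector layer.

```python
from typing import List, Sequence, Set


def verb_window_positions(tags: Sequence[str]) -> Set[int]:
    """Return the 1-based positions of every verb and of its existing neighbours."""
    n = len(tags)
    selected: Set[int] = set()
    for j, tag in enumerate(tags, start=1):
        if tag == "verb":
            # Keep j-1, j and j+1, skipping positions that fall outside the sentence.
            selected.update(p for p in (j - 1, j, j + 1) if 1 <= p <= n)
    return selected


def splice_position_vectors(
    matrix: Sequence[Sequence[float]], tags: Sequence[str]
) -> List[List[float]]:
    """Append a position vector to every row of an N x K term vector matrix.

    Rows selected by verb_window_positions get a one-hot vector of length N
    (an assumed encoding); all other rows get zeros, so the result is
    N x (K + N).
    """
    n = len(tags)
    selected = verb_window_positions(tags)
    spliced: List[List[float]] = []
    for pos, row in enumerate(matrix, start=1):
        position_vector = [
            1.0 if (pos in selected and p == pos) else 0.0 for p in range(1, n + 1)
        ]
        spliced.append(list(row) + position_vector)
    return spliced


# Example sentence from Embodiment two; glosses and tags are assumed.
words = ["I", "in", "school of management", "study"]
tags = ["pronoun", "preposition", "noun", "verb"]
matrix = [[0.1, 0.2] for _ in words]  # toy K=2 term vectors

print(sorted(verb_window_positions(tags)))  # positions of the verb window
for word, row in zip(words, splice_position_vectors(matrix, tags)):
    print(word, row)
```

With these assumed tags, the verb "study" is the last word, so only positions 3 and 4 are selected, which matches a=3 and b=4 in the worked example.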
In one embodiment, the corpus sorting module 41 may include:
a byte number setting unit, for setting a preset number of bytes;
a sentence interception unit, for intercepting the target corpus according to the preset number of bytes to obtain M sentences with the same number of bytes.
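The two units above can be sketched as a single helper that cuts the corpus into fixed-size byte chunks. This is only a minimal sketch: the function name, the UTF-8 encoding and the use of a zero byte for padding are assumptions (the method claims say only that a sentence shorter than the preset number of bytes is padded with 0).

```python
from typing import List


def intercept_sentences(
    corpus: str, preset_bytes: int, encoding: str = "utf-8"
) -> List[bytes]:
    """Cut a corpus into chunks of exactly preset_bytes bytes.

    The last chunk is padded with zero bytes when it is shorter than
    preset_bytes, so every returned chunk has the same length.
    """
    if preset_bytes <= 0:
        raise ValueError("preset_bytes must be a positive integer")
    data = corpus.encode(encoding)
    chunks = [data[i:i + preset_bytes] for i in range(0, len(data), preset_bytes)]
    # bytes.ljust pads on the right with the given fill byte.
    return [chunk.ljust(preset_bytes, b"\x00") for chunk in chunks]


sentences = intercept_sentences("abcdefgh ij", 4)
print(len(sentences), sentences)
```

Note that cutting purely by byte count can split a multi-byte character across two chunks; a practical implementation would likely cut on character or punctuation boundaries first, which the text does not specify.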
In one embodiment, the model construction module 42 may include:
an original word division unit, for dividing the i-th sentence into N original words and setting each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
a CNN model construction unit, for establishing an N × K CNN model based on the i-th sentence.
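The N × K input described above can be illustrated with a small helper that turns one segmented sentence into a zero-padded matrix. The helper name, the dictionary of pre-trained vectors, the truncation of over-long inputs and the all-zero row for unknown words are assumptions for illustration; padding short vectors and short sentences with 0 follows the padding rule stated in the method claims.

```python
from typing import Dict, List, Sequence


def build_sentence_matrix(
    words: Sequence[str],
    embeddings: Dict[str, List[float]],
    n: int,
    k: int,
) -> List[List[float]]:
    """Build an N x K matrix for one sentence, padding with 0 where needed."""
    rows: List[List[float]] = []
    for word in words[:n]:  # assumed: words beyond N are dropped
        vector = list(embeddings.get(word, []))[:k]  # unknown word -> all zeros
        vector += [0.0] * (k - len(vector))  # pad the vector up to K dimensions
        rows.append(vector)
    while len(rows) < n:  # pad the sentence up to N words
        rows.append([0.0] * k)
    return rows


# Toy vectors; real term vectors would come from a trained embedding.
toy_embeddings = {"I": [0.1, 0.2, 0.3], "study": [0.5]}
for row in build_sentence_matrix(["I", "study"], toy_embeddings, n=3, k=3):
    print(row)
```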
An embodiment of the present invention also provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the data mining method based on interview information described in Embodiment one are implemented.
An embodiment of the present invention also provides a storage medium. The storage medium is a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the data mining method based on interview information described in Embodiment one are implemented.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.
Claims (10)
1. A data mining method based on interview information, characterized by comprising:
obtaining a target corpus and arranging the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
establishing a convolutional neural network (CNN) model according to the sentences;
obtaining the term vector matrix in the CNN model;
editing the term vector matrix using location information to obtain a term vector matrix with location information, and training the CNN model with the term vector matrix with location information, so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
obtaining, according to the attribute words, an interviewee having a target attribute in the target corpus.
2. The data mining method based on interview information according to claim 1, characterized in that obtaining the target corpus and arranging the target corpus into M sentences comprises:
setting a preset number of bytes;
intercepting the target corpus according to the preset number of bytes to obtain M sentences with the same number of bytes.
3. The data mining method based on interview information according to claim 1, characterized in that establishing the convolutional neural network (CNN) model according to the sentences comprises:
dividing the i-th sentence into N original words and setting each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
establishing an N × K CNN model based on the i-th sentence.
4. The data mining method based on interview information according to any one of claims 2 or 3, characterized in that when the number of bytes in a sentence is less than the preset number of bytes, the sentence is padded with 0;
when the dimension of an original word is less than K, the original word is padded with 0;
when the number of original words is less than N, the original words are padded with 0.
5. The data mining method based on interview information according to claim 1, characterized in that editing the term vector matrix using location information to obtain the term vector matrix with location information, and training the CNN model with the term vector matrix with location information so that the CNN model outputs the attribute words in the target corpus, comprises:
encoding each original word into a vector according to the location information of the original word in the i-th sentence, and splicing the vector to the term vector layer to obtain the term vector matrix with location information;
extracting, by the CNN model, the features in the term vector matrix with location information, and outputting the attribute words in the i-th sentence;
obtaining the attribute words in the target corpus according to the attribute words of the M sentences.
6. The data mining method based on interview information according to claim 5, characterized in that encoding the original word into a vector according to the location information of the original word in the i-th sentence and splicing the vector to the term vector layer to obtain the term vector matrix with location information comprises:
obtaining the type of the j-th original word in the i-th sentence, where j is a positive integer less than or equal to N;
when the type of the j-th original word is a verb, obtaining the location information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence;
encoding the location information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence into vectors and splicing the vectors to the term vector layer to obtain the term vector matrix with location information.
7. A data mining system based on interview information, characterized by comprising:
a corpus sorting module, for obtaining a target corpus and arranging the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
a model construction module, for establishing a convolutional neural network (CNN) model according to the sentences;
a term vector obtaining module, for obtaining the term vector matrix in the CNN model;
an attribute word obtaining module, for editing the term vector matrix using location information to obtain a term vector matrix with location information, and for training the CNN model with the term vector matrix with location information, so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
a target selection module, for obtaining, according to the attribute words, an interviewee having a target attribute in the target corpus.
8. The data mining system based on interview information according to claim 6, characterized in that the corpus sorting module comprises:
a byte number setting unit, for setting a preset number of bytes;
a sentence interception unit, for intercepting the target corpus according to the preset number of bytes to obtain M sentences with the same number of bytes.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, the steps of the data mining method based on interview information according to any one of claims 1 to 6 are implemented.
10. A storage medium, the storage medium being a computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, the steps of the data mining method based on interview information according to any one of claims 1 to 6 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910553409.9A CN110287236B (en) | 2019-06-25 | 2019-06-25 | Data mining method, system and terminal equipment based on interview information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910553409.9A CN110287236B (en) | 2019-06-25 | 2019-06-25 | Data mining method, system and terminal equipment based on interview information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110287236A (en) | 2019-09-27 |
CN110287236B (en) | 2024-03-19 |
Family
ID=68005621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910553409.9A Active CN110287236B (en) | 2019-06-25 | 2019-06-25 | Data mining method, system and terminal equipment based on interview information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287236B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160125659A1 (en) * | 2014-10-31 | 2016-05-05 | IntegrityWare, Inc. | Methods and systems for multilevel editing of subdivided polygonal data |
CN107239444A (en) * | 2017-05-26 | 2017-10-10 | 华中科技大学 | A kind of term vector training method and system for merging part of speech and positional information |
CN108399230A (en) * | 2018-02-13 | 2018-08-14 | 上海大学 | A kind of Chinese financial and economic news file classification method based on convolutional neural networks |
CN109189925A (en) * | 2018-08-16 | 2019-01-11 | 华南师范大学 | Term vector model based on mutual information and based on the file classification method of CNN |
CN109543084A (en) * | 2018-11-09 | 2019-03-29 | 西安交通大学 | A method of establishing the detection model of the hidden sensitive text of network-oriented social media |
Also Published As
Publication number | Publication date |
---|---|
CN110287236B (en) | 2024-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112241481B (en) | Cross-modal news event classification method and system based on graph neural network | |
Rosé et al. | Analyzing collaborative learning processes automatically: Exploiting the advances of computational linguistics in computer-supported collaborative learning | |
CN109816032B (en) | Unbiased mapping zero sample classification method and device based on generative countermeasure network | |
Bekkerman et al. | High-precision phrase-based document classification on a modern scale | |
CN112819023B (en) | Sample set acquisition method, device, computer equipment and storage medium | |
CN113886567A (en) | Teaching method and system based on knowledge graph | |
CN114443899A (en) | Video classification method, device, equipment and medium | |
KR20200145299A (en) | Intelligent recruitment support platform based on online interview video analysis and social media information analysis | |
CN110674297A (en) | Public opinion text classification model construction method, public opinion text classification device and public opinion text classification equipment | |
CN113627194B (en) | Information extraction method and device, and communication message classification method and device | |
Omurca et al. | A document image classification system fusing deep and machine learning models | |
Si et al. | Federated non-negative matrix factorization for short texts topic modeling with mutual information | |
Engin et al. | Multimodal deep neural networks for banking document classification | |
CN114780723A (en) | Portrait generation method, system and medium based on guide network text classification | |
Mahmud et al. | Deep learning based sentiment analysis from Bangla text using glove word embedding along with convolutional neural network | |
CN113011126A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN111859955A (en) | Public opinion data analysis model based on deep learning | |
CN115358477B (en) | Fight design random generation system and application thereof | |
Chaudhuri et al. | Automating assessment of design exams: a case study of novelty evaluation | |
Shanmukhaa et al. | Construction of knowledge graphs for video lectures | |
US20230138491A1 (en) | Continuous learning for document processing and analysis | |
CN110287236A (en) | A kind of data digging method based on interview information, system and terminal device | |
CN115130453A (en) | Interactive information generation method and device | |
Maharaj | Generalizing in the Real World with Representation Learning | |
Wang et al. | A cnn-based feature extraction scheme for patent analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||