CN110287236A - Data mining method, system and terminal device based on interview information - Google Patents

Data mining method, system and terminal device based on interview information

Info

Publication number
CN110287236A
Authority
CN
China
Prior art keywords
word
sentence
attribute
location information
term vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910553409.9A
Other languages
Chinese (zh)
Other versions
CN110287236B (en
Inventor
邓悦
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910553409.9A priority Critical patent/CN110287236B/en
Publication of CN110287236A publication Critical patent/CN110287236A/en
Application granted granted Critical
Publication of CN110287236B publication Critical patent/CN110287236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • G06Q10/1053Employment or hiring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention belongs to the technical field of data processing and provides a data mining method, system and terminal device based on interview information. The method includes: obtaining a target corpus and organizing it into M sentences; establishing a convolutional neural network (CNN) model according to the sentences; obtaining the word vector matrix in the CNN model; editing the word vector matrix with position information to obtain a word vector matrix carrying position information, and training the CNN model with that matrix so that the CNN model outputs the attribute words in the target corpus; and, according to the attribute words, obtaining from the target corpus the interviewees who have a target attribute. The invention enables fast computation on and accurate mining of interview information, screening out the interviewees who match recruitment requirements.

Description

Data mining method, system and terminal device based on interview information
Technical field
The present invention relates to the technical field of data processing, and in particular to a data mining method, system and terminal device based on interview information.
Background
In the era of big data, the management and use of data are critically important. How data is collected from multiple sources and how effectively existing data is used determine how much value can be obtained from large volumes of data. Most of the data that can be directly obtained and used is text data, which touches every sector of society. Faced with text data at this scale, text classification is the core means of processing it and is of great significance for managing and using text data efficiently. For example, in large-scale interviewing, the speech data of interviewees is collected to obtain a corpus, and text classification performed on that corpus can effectively extract key information and resolve the problem of disorganized interview information, making it easier for HR to retrieve the information needed and screen interviewees. Text classification tasks involving large corpora are usually solved with neural network algorithms.
Currently, neural networks suitable for text classification fall into two classes: CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network). A CNN runs faster than an RNN and needs fewer computing resources; however, training on a corpus with a CNN cannot capture the position information of words in the corpus, which tends to reduce the accuracy of data mining on interview information. An RNN can capture position information, but it needs a large amount of computing resources and suffers from the vanishing gradient problem, which can introduce errors into the text classification results and likewise reduces the accuracy of data mining on interview information.
Summary of the invention
The main purpose of the present invention is to propose a data mining method, system and terminal device based on interview information, so as to solve the problem in the prior art that, when a text classification neural network is applied to data mining of interview information, the resulting text classification has large errors, which reduces the accuracy of data mining on interview information.
To achieve the above purpose, a first aspect of the embodiments of the present invention provides a data mining method based on interview information, comprising:
obtaining a target corpus and organizing the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
establishing a convolutional neural network (CNN) model according to the sentences;
obtaining the word vector matrix in the CNN model;
editing the word vector matrix with position information to obtain a word vector matrix carrying position information, and training the CNN model with the word vector matrix carrying position information so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words having a category attribute and words having a position attribute;
according to the attribute words, obtaining from the target corpus the interviewees who have a target attribute.
With reference to the first aspect of the present invention, in a first implementation of the present invention, obtaining the target corpus and organizing the target corpus into M sentences comprises:
setting a preset byte count;
truncating the target corpus according to the preset byte count to obtain M sentences with the same byte count.
With reference to the first aspect of the present invention, in a second implementation of the present invention, establishing the convolutional neural network (CNN) model according to the sentences comprises:
dividing the i-th sentence into N original words and setting each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
establishing an N × K CNN model based on the i-th sentence.
With reference to the first and second implementations of the first aspect of the present invention, in a third implementation of the present invention, when the byte count of a sentence is less than the preset byte count, it is padded with 0;
when the dimension of an original word is less than K, it is padded with 0;
when the number of original words is less than N, it is padded with 0.
With reference to the first aspect of the present invention, in a fourth implementation of the present invention, editing the word vector matrix with position information to obtain the word vector matrix carrying position information, and training the CNN model with that matrix so that the CNN model outputs the attribute words in the target corpus, comprises:
according to the position information of an original word in the i-th sentence, encoding the original word as a vector and splicing it onto the word vector layer to obtain the word vector matrix carrying position information;
extracting, through the CNN model, the features in the word vector matrix carrying position information, and outputting the attribute words in the i-th sentence;
obtaining the attribute words in the target corpus according to the attribute words of the M sentences.
With reference to the fourth implementation of the first aspect of the present invention, in a fifth implementation of the present invention, encoding the original word as a vector according to its position information in the i-th sentence and splicing it onto the word vector layer to obtain the word vector matrix carrying position information comprises:
obtaining the type of the j-th original word in the i-th sentence, where j is a positive integer less than or equal to N;
when the type of the j-th original word is a verb, obtaining the position information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence;
encoding the position information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence as vectors and splicing them onto the word vector layer to obtain the word vector matrix carrying position information.
A second aspect of the embodiments of the present invention provides a data mining system based on interview information, comprising:
a corpus organizing module, configured to obtain a target corpus and organize the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
a model construction module, configured to establish a convolutional neural network (CNN) model according to the sentences;
a word vector obtaining module, configured to obtain the word vector matrix in the CNN model;
an attribute word obtaining module, configured to edit the word vector matrix with position information to obtain a word vector matrix carrying position information, and to train the CNN model with the word vector matrix carrying position information so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words having a category attribute and words having a position attribute;
a target screening module, configured to obtain, according to the attribute words, the interviewees in the target corpus who have a target attribute.
With reference to the second aspect of the present invention, in a first implementation of the present invention, the corpus organizing module includes:
a byte count setting unit, configured to set a preset byte count;
a sentence truncation unit, configured to truncate the target corpus according to the preset byte count to obtain M sentences with the same byte count.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method provided by the first aspect above are implemented.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the method provided by the first aspect above are implemented.
The embodiments of the present invention propose a data mining method based on interview information. The target corpus is divided into multiple sentences, a convolutional neural network (CNN) model is established using the sentences in the target corpus, and the word vector matrix of the CNN model is obtained. Position information is added to the word vector matrix to obtain a word vector matrix carrying position information, which is used to perform the text classification task in the CNN model, so that the CNN model outputs the attribute words in the target corpus that have a position attribute and a category attribute. Data mining can then be performed on the target corpus, that is, the interview information, according to the attribute words, so that interviewees with the target attribute matching the recruitment requirements are obtained. Because the word vector matrix carrying position information is used for the text classification task, the position information influences how the CNN model extracts word features, so that the accuracy of text classification is improved while position information is captured.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the data mining method based on interview information provided by Embodiment one of the present invention;
Fig. 2 is a schematic flowchart of the data mining method based on interview information provided by Embodiment two of the present invention;
Fig. 3 is a detailed schematic flowchart of step S1042 in Fig. 2;
Fig. 4 is a schematic structural diagram of the data mining system based on interview information provided by Embodiment four of the present invention.
The realization of the purpose, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
Herein, suffixes such as "module", "component" or "unit" used to denote elements are only intended to facilitate the description of the present invention and have no specific meaning in themselves. Therefore, "module" and "component" may be used interchangeably.
In the following description, the serial numbers of the embodiments are for description only and do not represent the relative merits of the embodiments.
Embodiment one
As shown in Fig. 1, the embodiment of the present invention provides a data mining method based on interview information that enables fast computation on and accurate mining of interview information, screening out the interviewees who match recruitment requirements. In this embodiment, the data mining method based on interview information may include:
S101: Obtain a target corpus and organize the target corpus into M sentences.
Here, M is a positive integer and the target corpus is interview information.
In step S101 above, the target corpus is the basic unit that makes up a corpus collection; a corpus is usually represented as text data, so the target corpus is also in the form of text data.
In a specific application, the interview information may be a record of what the interviewee said, obtained as follows: the interviewee's interview is audio-recorded, and the target corpus in text form is then obtained from the recording file. The target corpus includes the speech record of at least one interviewee.
In one embodiment, one implementation of step S101 may be:
setting a preset byte count;
truncating the target corpus according to the preset byte count to obtain M sentences with the same byte count.
In the above implementation, the way the corpus is organized is optimized so that every sentence has the same byte count, which can improve the efficiency and accuracy of text classification.
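The truncation step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_fixed_byte_sentences`, the UTF-8 encoding and the use of a zero byte for padding the last sentence are assumptions made for the example.

```python
def split_into_fixed_byte_sentences(corpus, preset_bytes, encoding="utf-8"):
    """Cut a corpus into M chunks that all have exactly `preset_bytes` bytes.

    Assumption: the last chunk is right-padded with zero bytes, following the
    zero-padding rule the text describes for sentences that are too short.
    """
    if preset_bytes <= 0:
        raise ValueError("preset_bytes must be a positive integer")
    data = corpus.encode(encoding)
    sentences = []
    for start in range(0, len(data), preset_bytes):
        chunk = data[start:start + preset_bytes]
        # Pad a short final chunk with 0 so every sentence has the same length.
        sentences.append(chunk.ljust(preset_bytes, b"\x00"))
    return sentences


if __name__ == "__main__":
    for sentence in split_into_fixed_byte_sentences("abcdefgh", 3):
        print(sentence)
```

Note that cutting purely by byte count can split a multi-byte character across two sentences; a practical system would probably cut on character or word boundaries first.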
S102: Establish a convolutional neural network (CNN) model according to the sentences.
In one embodiment, one implementation of step S102 may be:
dividing the i-th sentence into N original words and setting each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
establishing an N × K CNN model based on the i-th sentence.
Here, the original words come from the sentence, that is, they are the words extracted from the target corpus.
In a specific application, each original word is set as a K-dimensional vector, but among the N original words, some may have a dimension less than K and some a dimension greater than K. In a CNN model, a hidden layer can project the initially encoded words into a lower-dimensional space, reducing the dimension of the original words. The value of K can therefore be set to a maximum value, which keeps the matrix uniform without making the dimension so high that it slows computation; here, the maximum value means that the dimension of every original word is less than K.
Similarly, to keep the sentences uniform, every sentence is divided into N original words, with N set to a maximum value, so that the number of original words obtained from the i-th sentence is always less than N.
Combining the above implementations of step S101 and step S102, the embodiment of the present invention also proposes an implementation that unifies the dimension of the original words, the number of original words in a sentence, and the preset byte count. The implementation is:
when the byte count of a sentence is less than the preset byte count, it is padded with 0;
when the dimension of an original word is less than K, it is padded with 0;
when the number of original words is less than N, it is padded with 0.
In a specific application, missing dimensions, missing words in a sentence and missing bytes are padded with 0 to align the matrix, which makes it easier to establish a unified CNN model, reduces the computing resources needed for text classification, and improves computational efficiency.
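The zero-padding rules above can be sketched as a small helper that aligns the word vectors of one sentence into an N × K matrix. The name `pad_sentence_matrix` and the choice to raise an error on oversized input are assumptions for this example; the text only states that N and K are chosen large enough that padding suffices.

```python
def pad_sentence_matrix(word_vectors, n, k):
    """Align a sentence's word vectors into an n x k matrix using zero-padding.

    word_vectors: list of lists of floats, one list per original word.
    Assumption: n and k are maxima, so inputs never exceed them; if they do,
    an error is raised instead of silently truncating.
    """
    if len(word_vectors) > n:
        raise ValueError("sentence has more than n original words")
    matrix = []
    for vec in word_vectors:
        if len(vec) > k:
            raise ValueError("word vector has more than k dimensions")
        # Pad a short word vector with 0 up to k dimensions.
        matrix.append(list(vec) + [0.0] * (k - len(vec)))
    # Pad a short sentence with all-zero rows up to n words.
    matrix.extend([0.0] * k for _ in range(n - len(matrix)))
    return matrix


if __name__ == "__main__":
    for row in pad_sentence_matrix([[1.0, 2.0], [3.0]], n=3, k=3):
        print(row)
```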
S103: Obtain the word vector matrix in the CNN model.
The CNN model in steps S102 and S103 can be applied to image feature extraction as well as to text classification. The embodiment of the present invention performs a text classification task based on the sentences in the target corpus, so a CNN model for text classification is established that can process each sentence; the word vector matrices that the CNN model outputs for different sentences are therefore different.
S104: Edit the word vector matrix with position information to obtain a word vector matrix carrying position information, and train the CNN model with the word vector matrix carrying position information so that the CNN model outputs the attribute words in the target corpus.
In step S104 above, the attribute words include words having a category attribute and words having a position attribute; the position information is the positional relationship between the words in a sentence, that is, the relationship between the original words described below.
If the target corpus includes the interview records of multiple interviewees, the attribute words in the target corpus come from multiple interviewees.
S105: According to the attribute words, obtain from the target corpus the interviewees who have a target attribute.
In step S105 above, because the attribute words are words having a category attribute and a position attribute, they can accurately express information about the interviewee and reduce errors in semantic analysis. Therefore, when interviewees matching the recruitment requirements are screened from the target corpus according to the attribute words, the interviewees who have the target attribute can be found accurately.
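The screening in step S105 might look like the sketch below. The text does not specify how attribute words are stored or matched, so everything here is hypothetical: the dictionary layout (interviewee mapped to a list of `(word, category)` pairs), the function name `select_interviewees`, and the substring match on a keyword.

```python
def select_interviewees(attribute_words, target_category, keyword):
    """Return interviewees that have an attribute word in `target_category`
    whose text contains `keyword`.

    attribute_words: dict mapping interviewee id -> list of (word, category).
    This layout is a hypothetical illustration, not taken from the patent.
    """
    selected = []
    for interviewee, words in attribute_words.items():
        if any(category == target_category and keyword in word
               for word, category in words):
            selected.append(interviewee)
    return selected


if __name__ == "__main__":
    records = {
        "A": [("school of management", "educational background")],
        "B": [("welding", "vocational skills")],
    }
    print(select_interviewees(records, "educational background", "management"))
```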
In the data mining method based on interview information provided by the embodiment of the present invention, the target corpus is divided into multiple sentences, a convolutional neural network (CNN) model is established using the sentences in the target corpus, and the word vector matrix of the CNN model is obtained. Position information is added to the word vector matrix to obtain a word vector matrix carrying position information, which is used to perform the text classification task in the CNN model, so that the CNN model outputs the attribute words in the target corpus that have a position attribute and a category attribute. Data mining can then be performed on the target corpus, that is, the interview information, according to the attribute words, so that interviewees with the target attribute matching the recruitment requirements are obtained. Because the word vector matrix carrying position information is used for the text classification task, the position information influences how the CNN model extracts word features, so that the accuracy of text classification is improved while position information is captured.
Embodiment two
As shown in Fig. 2, this embodiment of the present invention describes the detailed implementation flow of step S104 in Embodiment one. One implementation of step S104 is:
S1041: According to the position information of an original word in the i-th sentence, encode the original word as a vector and splice it onto the word vector layer to obtain the word vector matrix carrying position information.
In step S1041 above, the word vector matrix is part of the output of the CNN model: when the i-th sentence is input directly into the CNN model for training, each word vector in the resulting word vector matrix corresponds to one original word.
The embodiment of the present invention also provides the process of encoding the position information of an original word as a vector and splicing it onto the word vector layer:
When the position information of an original word is encoded as a vector and spliced onto the word vector layer, the position encoding weights are all set to 1 and there is no bias term.
The process of encoding an original word as a vector is illustrated below:
On the basis of a traditional textcnn, the position information of each original word is encoded as a 100-dimensional vector and spliced onto the word vector layer, with the position encoding weights all set to 1:
$$PE(pos, 2i) = \sin\left(pos / 10000^{2i/d}\right)$$
$$PE(pos, 2i+1) = \cos\left(pos / 10000^{2i/d}\right)$$
Here, pos is the position of the word in the sentence, i indexes the dimensions of the position vector, and d is the preset dimension.
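The two formulas above can be sketched in plain Python as follows. The function names `positional_encoding` and `splice_position` are illustrative assumptions; the default d = 100 follows the 100-dimensional position vector mentioned in the text, and "splicing" is interpreted here as simple concatenation with no learned weights or bias.

```python
import math


def positional_encoding(pos, d=100):
    """Sinusoidal encoding of a word position as a d-dimensional vector.

    Even indices hold sin(pos / 10000^(2i/d)) and odd indices hold the
    matching cos term, where 2i is the even index of the pair.
    """
    encoding = [0.0] * d
    for even_index in range(0, d, 2):
        angle = pos / (10000 ** (even_index / d))
        encoding[even_index] = math.sin(angle)
        if even_index + 1 < d:
            encoding[even_index + 1] = math.cos(angle)
    return encoding


def splice_position(word_vector, pos, d=100):
    """Concatenate the position encoding onto a word vector (weight 1, no bias)."""
    return list(word_vector) + positional_encoding(pos, d)


if __name__ == "__main__":
    spliced = splice_position([0.5, -0.2], pos=3)
    print(len(spliced))
```

Because the encoding is concatenated rather than added, the word vector's own values stay unchanged and the row simply grows by d entries.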
S1042: Extract, through the CNN model, the features in the word vector matrix carrying position information, and output the attribute words in the i-th sentence.
In step S1042 above, an attribute word is a word that, after training of the CNN model is complete, has a category attribute and a position attribute; it reflects both the category and the position of the word, and the position attribute of a word influences its category attribute.
In a specific application, if the CNN model were trained directly on the word vector matrix formed from the original words, then during training only the position of an original word within the training matrix could be obtained; the position of the original word in the sentence, that is, the positional relationship between the words in the sentence, could not be obtained directly.
S1043: Obtain the attribute words in the target corpus according to the attribute words of the M sentences.
Step S1043 is equivalent to repeating step S1042 M times, so as to obtain at most M × N attribute words in the target corpus.
As shown in Fig. 3, the embodiment of the present invention also shows an implementation of step S1042 above, which may include:
S10421: Obtain the type of the j-th original word in the i-th sentence, where j is a positive integer less than or equal to N.
In step S10421 above, in text classification technology, the type of each original word in a sentence can be obtained directly; for example, a noun is tagged /n, a verb /v, an adjective /adj, and a preposition /vj. During CNN text classification, text that does not need to be classified, such as prepositions and adjectives, is usually filtered out automatically to reduce unnecessary text data.
S10422: When the type of the j-th original word is a verb, obtain the position information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence.
In step S10422 above, if the positions of the nouns before and after a verb are unclear and the CNN model is trained directly, semantic errors will result. For example, in "in / school of management / study" and "in / study / school of management", "school of management" is a noun in both, but the meanings differ.
S10423: Encode the position information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence as vectors and splice them onto the word vector layer to obtain the word vector matrix carrying position information.
Through steps S10421 to S10423 above, the text data is screened before the CNN model extracts the features in the word vector matrix carrying position information, which improves data mining efficiency.
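Steps S10421 and S10422 can be sketched as a selection of which word positions receive a position encoding. The function name `positions_to_encode` and the input layout (a list of `(word, tag)` pairs) are assumptions; the tags "n", "v" and "vj" follow the examples in the text, while the pronoun tag "r" is hypothetical.

```python
def positions_to_encode(tagged_words):
    """Return the 1-based positions of each verb and of its existing neighbours.

    tagged_words: list of (word, tag) pairs for one sentence, in order.
    For a verb at position j, positions j-1, j and j+1 are selected, skipping
    any neighbour that falls outside the sentence.
    """
    n = len(tagged_words)
    selected = set()
    for j, (_, tag) in enumerate(tagged_words, start=1):
        if tag == "v":
            for position in (j - 1, j, j + 1):
                if 1 <= position <= n:
                    selected.add(position)
    return sorted(selected)


if __name__ == "__main__":
    sentence = [
        ("I", "r"),  # hypothetical pronoun tag
        ("in", "vj"),
        ("school of management", "n"),
        ("study", "v"),
    ]
    print(positions_to_encode(sentence))  # prints [3, 4]
```

Each selected position could then be turned into a vector with a sinusoidal position encoding and concatenated onto the corresponding word vector.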
Taking an interview scenario as an example, the embodiment of the present invention also illustrates the practical effect of the data mining method based on interview information provided by Embodiment one and this embodiment.
Suppose that the sentence "I study in the school of management" is extracted from the corpus of interviewee A.
First, in step S1041, the sentence "I study in the school of management" is divided into 4 original words: "I", "in", "school of management", "study".
Then step S1042 and the specific implementation provided by this embodiment are executed: the type of each word is obtained, where "study" is a verb and the (j+1)-th original word does not exist, so the position information a of "school of management" and the position information b of "study" are obtained directly, giving a = 3 and b = 4. Position codes are added to the original words in the word vector matrix according to the position information; the word vector matrix then contains original words carrying position information, which are assumed to be written as "school of management₃" and "study₄".
Finally, step S1043 is carried out: the CNN model extracts the features in the sentence "I study in the school of management". After the original word carrying position information, "school of management₃", is converted into an attribute word, its position attribute influences its category data: during feature extraction, "school of management₃" is treated not as a feature of "study" but as a feature of "school", so the final classification result classifies "school of management₃" as educational background rather than vocational skills.
Embodiment three
As shown in Fig. 4, an embodiment of the present invention provides a data mining system 40 based on interview information. The system performs the text classification task in a CNN model using a term vector matrix with location information, so that the location information influences how the CNN model extracts the features of each word; capturing location information in this way improves the accuracy of text classification. The data mining system 40 includes:
A corpus sorting module 41, configured to obtain a target corpus and arrange the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
A model construction module 42, configured to establish a convolutional neural network (CNN) model according to the sentences;
A term vector obtaining module 43, configured to obtain the term vector matrix in the CNN model;
An attribute word obtaining module 44, configured to edit the term vector matrix using location information to obtain a term vector matrix with location information, and to train the CNN model with the term vector matrix with location information so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
A target selection module 45, configured to obtain, from the target corpus and according to the attribute words, the interviewee having the target attribute.
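One way to picture how location information could be added to a term vector matrix is to append a position encoding to each word's vector. The sketch below assumes a one-hot encoding and plain Python lists; the text does not specify the encoding, so the helper names and the one-hot choice are illustrative assumptions only.

```python
from typing import List


def one_hot(position: int, length: int) -> List[float]:
    """Encode a 1-based position as a one-hot vector of the given length."""
    vec = [0.0] * length
    vec[position - 1] = 1.0
    return vec


def splice_positions(matrix: List[List[float]], positions: List[int]) -> List[List[float]]:
    """Append a position encoding to each row; position 0 means no location information."""
    n = len(matrix)
    out = []
    for row, pos in zip(matrix, positions):
        enc = one_hot(pos, n) if pos > 0 else [0.0] * n
        out.append(row + enc)
    return out


# Toy 4 x 2 term vectors with made-up values for a four-word sentence.
word_vectors = [[0.1, 0.2], [0.0, 0.3], [0.5, 0.4], [0.7, 0.9]]
# Only the third and fourth words carry location information in this toy case.
with_positions = splice_positions(word_vectors, [0, 0, 3, 4])
for row in with_positions:
    print(row)
```

Each row grows from K to K + N columns, so a CNN consuming the matrix would see the position of a word alongside its embedding.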
In one embodiment, the corpus sorting module 41 may include:
A byte number setting unit, configured to set a preset byte number;
A sentence interception unit, configured to intercept the target corpus according to the preset byte number to obtain M sentences with the same byte number.
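Interception by a preset byte number can be sketched as below. The function name, the UTF-8 encoding and the sample text are assumptions made for illustration; padding the last sentence with zero bytes follows the zero-padding rule stated in the claims.

```python
from typing import List


def split_by_bytes(corpus: str, byte_count: int, encoding: str = "utf-8") -> List[bytes]:
    """Cut the encoded corpus into sentences that all have byte_count bytes."""
    if byte_count <= 0:
        raise ValueError("byte_count must be positive")
    raw = corpus.encode(encoding)
    chunks = [raw[i:i + byte_count] for i in range(0, len(raw), byte_count)]
    # Pad a short final chunk with zero bytes so every sentence has the same length.
    if chunks and len(chunks[-1]) < byte_count:
        chunks[-1] = chunks[-1].ljust(byte_count, b"\x00")
    return chunks


sentences = split_by_bytes("I study in the school of management", 8)
for s in sentences:
    print(s)
```

Cutting on raw bytes can split a multi-byte character, so a practical system working on Chinese text would probably choose a byte number aligned with character boundaries; that detail is outside this sketch.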
In one embodiment, the model construction module 42 may include:
An original word division unit, configured to divide the i-th sentence into N original words and to set each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
A CNN model construction unit, configured to establish an N × K CNN model based on the i-th sentence.
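The N × K input for one sentence can be sketched as a zero-padded matrix, in line with the padding rules stated in the claims. The embedding table, the function name, and the handling of unknown words and of sentences longer than N (truncation) are assumptions for illustration.

```python
from typing import Dict, List


def build_matrix(words: List[str], embeddings: Dict[str, List[float]],
                 n: int, k: int) -> List[List[float]]:
    """Build an n x k matrix for one sentence, padding with 0 where needed."""
    rows: List[List[float]] = []
    for word in words[:n]:  # assumption: words beyond n are dropped
        vec = list(embeddings.get(word, []))[:k]  # assumption: unknown word -> zeros
        vec += [0.0] * (k - len(vec))  # pad a vector shorter than k with 0
        rows.append(vec)
    while len(rows) < n:  # pad a sentence shorter than n with zero rows
        rows.append([0.0] * k)
    return rows


# Hypothetical embedding table with made-up values.
embeddings = {"I": [0.1, 0.2, 0.3], "study": [0.4, 0.5]}
matrix = build_matrix(["I", "study"], embeddings, n=3, k=3)
for row in matrix:
    print(row)
```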
An embodiment of the present invention also provides a terminal device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, each step of the data mining method based on interview information described in Embodiment One is implemented.
An embodiment of the present invention also provides a storage medium, the storage medium being a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, each step of the data mining method based on interview information described in Embodiment One is implemented.
The embodiments described above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A data mining method based on interview information, characterized by comprising:
Obtaining a target corpus and arranging the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
Establishing a convolutional neural network (CNN) model according to the sentences;
Obtaining the term vector matrix in the CNN model;
Editing the term vector matrix using location information to obtain a term vector matrix with location information, and training the CNN model with the term vector matrix with location information so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
Obtaining, from the target corpus and according to the attribute words, the interviewee having the target attribute.
2. The data mining method based on interview information according to claim 1, characterized in that obtaining the target corpus and arranging the target corpus into M sentences comprises:
Setting a preset byte number;
Intercepting the target corpus according to the preset byte number to obtain M sentences with the same byte number.
3. The data mining method based on interview information according to claim 1, characterized in that establishing the convolutional neural network (CNN) model according to the sentences comprises:
Dividing the i-th sentence into N original words and setting each original word as a K-dimensional vector, where i is a positive integer less than or equal to M, and K and N are positive integers;
Establishing an N × K CNN model based on the i-th sentence.
4. The data mining method based on interview information according to claim 2 or 3, characterized in that when the byte number of a sentence is less than the preset byte number, the sentence is padded with 0;
When the dimension of an original word is less than K, the vector is padded with 0;
When the number of original words is less than N, the missing words are padded with 0.
5. The data mining method based on interview information according to claim 1, characterized in that editing the term vector matrix using location information to obtain the term vector matrix with location information, and training the CNN model with the term vector matrix with location information so that the CNN model outputs the attribute words in the target corpus, comprises:
Encoding the location information of each original word in the i-th sentence as a vector and splicing it to the term vector layer to obtain the term vector matrix with location information;
Extracting, through the CNN model, the features in the term vector matrix with location information, and outputting the attribute words in the i-th sentence;
Obtaining the attribute words in the target corpus according to the attribute words of the M sentences.
6. The data mining method based on interview information according to claim 5, characterized in that encoding the location information of each original word in the i-th sentence as a vector and splicing it to the term vector layer to obtain the term vector matrix with location information comprises:
Obtaining the type of the j-th original word in the i-th sentence, where j is a positive integer less than or equal to N;
When the type of the j-th original word is a verb, obtaining the location information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence;
Encoding the location information of the j-th original word, the (j-1)-th original word and the (j+1)-th original word in the i-th sentence as vectors and splicing them to the term vector layer to obtain the term vector matrix with location information.
7. A data mining system based on interview information, characterized by comprising:
A corpus sorting module, configured to obtain a target corpus and arrange the target corpus into M sentences, where M is a positive integer and the target corpus is interview information;
A model construction module, configured to establish a convolutional neural network (CNN) model according to the sentences;
A term vector obtaining module, configured to obtain the term vector matrix in the CNN model;
An attribute word obtaining module, configured to edit the term vector matrix using location information to obtain a term vector matrix with location information, and to train the CNN model with the term vector matrix with location information so that the CNN model outputs the attribute words in the target corpus, where the attribute words include words with a category attribute and words with a position attribute;
A target selection module, configured to obtain, from the target corpus and according to the attribute words, the interviewee having the target attribute.
8. The data mining system based on interview information according to claim 6, characterized in that the corpus sorting module comprises:
A byte number setting unit, configured to set a preset byte number;
A sentence interception unit, configured to intercept the target corpus according to the preset byte number to obtain M sentences with the same byte number.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, each step of the data mining method based on interview information according to any one of claims 1 to 6 is implemented.
10. A storage medium, the storage medium being a computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, each step of the data mining method based on interview information according to any one of claims 1 to 6 is implemented.
CN201910553409.9A 2019-06-25 2019-06-25 Data mining method, system and terminal equipment based on interview information Active CN110287236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910553409.9A CN110287236B (en) 2019-06-25 2019-06-25 Data mining method, system and terminal equipment based on interview information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910553409.9A CN110287236B (en) 2019-06-25 2019-06-25 Data mining method, system and terminal equipment based on interview information

Publications (2)

Publication Number Publication Date
CN110287236A true CN110287236A (en) 2019-09-27
CN110287236B CN110287236B (en) 2024-03-19

Family

ID=68005621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910553409.9A Active CN110287236B (en) 2019-06-25 2019-06-25 Data mining method, system and terminal equipment based on interview information

Country Status (1)

Country Link
CN (1) CN110287236B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160125659A1 (en) * 2014-10-31 2016-05-05 IntegrityWare, Inc. Methods and systems for multilevel editing of subdivided polygonal data
CN107239444A (en) * 2017-05-26 2017-10-10 华中科技大学 A kind of term vector training method and system for merging part of speech and positional information
CN108399230A (en) * 2018-02-13 2018-08-14 上海大学 A kind of Chinese financial and economic news file classification method based on convolutional neural networks
CN109189925A (en) * 2018-08-16 2019-01-11 华南师范大学 Term vector model based on mutual information and based on the file classification method of CNN
CN109543084A (en) * 2018-11-09 2019-03-29 西安交通大学 A method of establishing the detection model of the hidden sensitive text of network-oriented social media

Also Published As

Publication number Publication date
CN110287236B (en) 2024-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant