CN109272262B - Method for analyzing natural language features - Google Patents


Info

Publication number
CN109272262B
CN109272262B (application CN201811422169.0A)
Authority
CN
China
Prior art keywords
sentence
determining
word
information
vector
Prior art date
Legal status
Active
Application number
CN201811422169.0A
Other languages
Chinese (zh)
Other versions
CN109272262A (en)
Inventor
蒋万强
龙诗娥
侯健
成鸿丰
Current Assignee
Guangzhou Noobie Internet Technology Co ltd
Original Assignee
Guangzhou Noobie Internet Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Noobie Internet Technology Co ltd filed Critical Guangzhou Noobie Internet Technology Co ltd
Priority to CN201811422169.0A priority Critical patent/CN109272262B/en
Publication of CN109272262A publication Critical patent/CN109272262A/en
Application granted granted Critical
Publication of CN109272262B publication Critical patent/CN109272262B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • G06Q10/1053Employment or hiring


Abstract

The invention discloses a method for analyzing natural language features, and relates to the field of language analysis. The method comprises: obtaining natural language information of a tested person, and processing the natural language information to obtain processed language information; determining the number of sentences included in the language information; and determining the dimension information corresponding to each sentence by a natural language feature analysis method.

Description

Method for analyzing natural language features
Technical Field
The invention relates to the field of language analysis, in particular to a method for analyzing natural language features.
Background
Natural language is the means by which human beings communicate with each other, and natural language processing (NLP) is broadly defined as the automatic analysis, processing and manipulation of natural language, such as speech and text, by software. The most common natural language processing applications include text reading, speech synthesis, speech recognition, automatic Chinese word segmentation, part-of-speech tagging, syntactic analysis, natural language generation, text classification, information retrieval, information extraction, text proofreading, question-answering systems, machine translation, automatic summarization, textual entailment, and the like.
Talent evaluation analysis is in essence a structured theory and data model, while the basis of natural language processing is data and algorithms. By applying natural language processing analysis to the theoretical model of talent evaluation, talent evaluation analysis can be completed under the drive of big data.
Traditional talent evaluation analysis generally uses written examinations, expert interviews, performance appraisal, scales, situation simulation, system simulation and the like to complete the evaluation of a person. The written examination draws up questions according to the nature of the work, the condition requirements and the necessary theoretical knowledge of the post the candidate is to take up, and the tested person answers them in writing. The expert interview comprises conversation, question answering and the like: the chief examiner faces the examinee directly and forms the evaluation through spoken expression or actual operation. Performance appraisal reflects the ability of a talent through actual performance; the appraised person also writes a report, and a democratic review is performed by the superiors and peers whose work is most relevant to the appraised person. The scale decomposes a person's quality into a number of elements to form a standard evaluation system, asks superior leaders, peer employees and the person himself to score against the standard, and forms the evaluation through summary analysis. Situation simulation places the tested person in a simulated working situation, and observes and evaluates his psychology and ability there with various evaluation techniques. System simulation places the tested person in a computer-generated dynamic model close to an actual system, lets him play a certain role and work in human-computer dialogue mode, and the computer predicts his various potentials from his working behavior and actual results within a specified total time. Video off-line analysis records the working video of the tested person in the working scene, and experts carry out manual labeling analysis on the recorded video afterwards to form the evaluation of the person.
All of the examinations above are carried out on the premise that both the tester and the tested person know the purpose of the test, so both come to depend excessively on the test method.
Disclosure of Invention
The embodiment of the invention provides a natural language feature analysis method, which is used to solve the problem in the prior art that testers and tested persons depend excessively on the test method.
The embodiment of the invention provides a method for analyzing natural language features, which comprises the following steps: acquiring natural language information of a tested person, and processing the natural language information to obtain processed language information;
and determining the number of sentences included in the language information, and determining the dimension information corresponding to each sentence by a natural language feature analysis method.
Preferably, the dimension information includes an intelligibility degree;
the determining the intelligibility degree of each sentence through a natural language feature method comprises the following steps:
performing word segmentation or character segmentation on a first sentence included in the language information;
determining the Shannon information amount S_i of each valid participle or character included in the first sentence by the formula S_i = -log P_i;
determining the average Shannon information amount of the valid participles of the first sentence by the formula S = (S_1 + S_2 + ... + S_N)/N;
wherein log is a logarithmic function based on 2, 10, or the natural number e, P_i is the probability of occurrence of each valid participle or character, N is the total number of valid participles or characters of the first sentence, S_1 + S_2 + ... + S_N is the Shannon information amount of all valid participles or characters included in the first sentence, and S represents the intelligibility degree of the first sentence.
Preferably, the dimension information includes a concentration degree;
the determining the concentration degree corresponding to each sentence through a natural language feature method comprises the following steps:
determining a first sentence included in the language information as s_t, and determining the sentences preceding the first sentence as s_0, ..., s_{t-1};
using s_0, ..., s_{t-1} as the input of a neural network model such as a CNN, RNN or Transformer, and using the output of the neural network model as a vector representation of s_0, ..., s_{t-1} as a whole;
using s_t as the input of a neural network model such as a CNN, RNN or Transformer, and using the output of the neural network model as the vector representation of s_t;
determining C as the vector representation of s_0, ..., s_{t-1} as a whole and T as the vector representation of s_t, and either inputting C and T into a CNN, RNN or Transformer to calculate their concentration degree, or determining the cosine of the angle between C and T as the concentration degree of the first sentence.
Preferably, the dimension information includes a concentration degree;
the determining the concentration degree corresponding to each sentence through a natural language feature method comprises the following steps:
determining a first sentence included in the language information as s_t, and determining the sentences preceding the first sentence as s_0, ..., s_{t-1};
establishing a corpus in which the degree of correlation of two passages of text is marked, and training a neural network model (CNN, RNN or Transformer) with the corpus;
and inputting s_t and s_0, ..., s_{t-1} as two passages of text into the trained neural network model, and taking the obtained degree of correlation of the two passages as the concentration degree of the first sentence.
Preferably, the dimension information includes a concentration degree;
the determining the concentration degree corresponding to each sentence through a natural language feature method comprises the following steps:
determining a first sentence included in the language information as s_t, and determining the sentences preceding the first sentence as s_0, ..., s_{t-1};
determining the average value of the word vectors included in the first sentence as the vector representation of the first sentence;
determining the average value of the sentence vectors of s_0, ..., s_{t-1} as the vector representation of s_0, ..., s_{t-1} as a whole;
optionally using the vector representation of s_t as the input of a neural network model such as a CNN, RNN or Transformer, and using the output of the neural network model as a new vector representation of s_t;
determining C as the vector representation of s_0, ..., s_{t-1} as a whole and T as the vector representation of s_t, and determining the cosine of the angle between C and T as the concentration degree of the first sentence.
Preferably, the dimension information includes a semantic richness degree;
the determining the semantic richness degree corresponding to each sentence through a natural language feature method comprises the following steps:
performing word segmentation on a first sentence included in the language information, confirming that the first sentence includes n valid participles, and determining the i-th valid participle as w_i;
obtaining the word vector of w_i through a common word vector model, and determining the word vector of w_i as e_i;
determining the average of the word vectors of the n valid participles included in the first sentence as mu, where mu = (e_1 + ... + e_n)/n;
determining the Euclidean distance between the word vector of each word included in the first sentence and the average word vector mu, and determining the standard deviation of these Euclidean distances as the semantic richness degree of the first sentence.
Preferably, the dimension information includes a semantic richness degree;
the determining the semantic richness degree corresponding to each sentence through a natural language feature method comprises the following steps:
performing word segmentation on a first sentence included in the language information, confirming that the first sentence includes n valid participles, and determining the i-th valid participle as w_i;
obtaining the word vector of w_i through a common word vector model, and determining the word vector of w_i as e_i;
determining the average of the word vectors of the n valid participles included in the first sentence as mu, where mu = (e_1 + ... + e_n)/n;
determining the cosine similarity between the word vector of each word included in the first sentence and the average word vector mu, and determining the standard deviation of these cosine similarities as the semantic richness degree of the first sentence.
Preferably, the dimension information includes a difficulty variation degree;
the determining the difficulty variation degree corresponding to each sentence through a natural language feature method comprises the following steps:
performing word segmentation or character segmentation on a first sentence included in the language information, obtaining n participles or characters in total;
determining the Shannon information amount S_i of each valid participle or character by the formula S_i = -log P_i;
determining the difficulty variation degree of the first sentence by the formula sqrt(var);
where log is a logarithmic function based on 2, 10, or the natural number e, P_i is the probability of occurrence of each valid participle or character, var = mean((S_i - mu)^2, i), mu = mean(S_i, i), and mean(x_i, i) denotes averaging x_1, ..., x_n.
Preferably, the language information includes voice information and text information.
The embodiment of the invention provides a method for analyzing natural language features, comprising: acquiring natural language information of a tested person, and processing the natural language information to obtain processed language information; and determining the number of sentences included in the language information, and determining the dimension information corresponding to each sentence by a natural language feature analysis method. The method obtains the natural language information of the tested person in a natural working state, without changing the person's daily working habits, and evaluates the obtained information by the natural language feature method. It has the following advantages. Data acquisition is more natural, which reduces the distortion of test and analysis results caused by the subjective factors of the tested person or by the pressure of the test environment. Data analysis is more objective, which reduces the distortion caused by the subjective factors of the experts or evaluators on whom other traditional methods rely. Data analysis is more real-time: the voice data of the tested person is collected, processed and displayed instantly by the natural language real-time analysis device, so the evaluation result is available immediately, replacing the inefficient traditional test methods that consume large amounts of manpower, material resources and time. Data analysis is more scientific: the modeling is based on big-data analysis and is continuously optimized by artificial intelligence methods, so the results become more and more accurate as the number of tested persons grows.
The method can complete the evaluation analysis of the tested person efficiently and in real time, improve the efficiency of traditional talent evaluation, reduce the dependence on expert systems, and reduce the loss of reliability and validity of evaluation results caused by the subjectivity of experts and tested persons in traditional talent evaluation methods and by factors such as the psychological pressure of the evaluation environment. This solves the problem in the prior art that testers and tested persons depend excessively on the test system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for analyzing natural language features according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 exemplarily shows a flow chart of an analysis method for natural language features provided by an embodiment of the present invention, as shown in fig. 1, the method mainly includes the following steps:
step 101, acquiring natural language information of a tested person, and processing the natural language information to obtain processed language information;
step 102, determining the number of sentences included in the language information, and determining the dimension information corresponding to each sentence through a natural language feature analysis method.
In step 101, the voice information of the tested person may be acquired through a dedicated voice recording device, and the text information of the tested person may be acquired through a text recording device. It should be noted that the embodiment of the present invention does not limit the specific method of acquiring the voice information and the text information of the tested person.
Further, the acquired natural language information is processed, that is, the text information included in the natural language information is sorted into one category and the voice information into another. The embodiment of the present invention does not limit the specific method of classifying the text information and the voice information.
In step 102, the number of sentences in the text information or the voice information is determined according to each test rule, and the dimension information corresponding to each sentence is then analyzed in turn by the natural language feature analysis method.
In order to clearly describe the analysis method of natural language features provided by the embodiment of the present invention, the analysis of a first sentence is taken as an example below. It should be noted that the "first sentence" here does not mean the opening sentence of the information expressed by the tested person; it may be any sentence in that information.
In the embodiment of the invention, the dimension information corresponding to the first sentence mainly comprises the intelligibility degree, the concentration degree, the semantic richness degree and the difficulty variation degree.
Specifically, determining the intelligibility degree of the first sentence through the natural language feature method comprises the following steps:
step 201, performing word segmentation or character segmentation on a first sentence included in the language information;
step 202, determining the Shannon information amount S_i of each valid participle or character included in the first sentence by the formula S_i = -log P_i;
step 203, determining the average Shannon information amount of the valid participles of the first sentence by the formula S = (S_1 + S_2 + ... + S_N)/N;
in steps 202 and 203, log is a logarithmic function based on 2, 10, or the natural number e, P_i is the probability of occurrence of each valid participle or character, N is the total number of valid participles or characters of the first sentence, S_1 + S_2 + ... + S_N is the Shannon information amount of all valid participles or characters included in the first sentence, and S represents the intelligibility degree of the first sentence.
For example, the intelligibility degree is calculated for each sentence included in the language information as follows:
1.1 Perform word segmentation or character segmentation on the first sentence.
1.2 Calculate the Shannon information amount S_i of each valid participle or character W_i included in the first sentence by the formula S_i = -log P_i, where log is a logarithmic function that can be based on 2, 10, or the natural number e, and P_i is the probability of occurrence of the participle or character, i.e. its word frequency.
1.3 Calculate the average Shannon information amount S of the valid participles of the first sentence by the formula S = (S_1 + S_2 + ... + S_N)/N, where N is the number of valid participles or characters of the sentence and S_1 + S_2 + ... + S_N is the sum of the Shannon information amounts of all valid participles or characters in the sentence.
1.4 S indicates the intelligibility degree of the sentence.
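Steps 1.1-1.4 can be sketched in Python as follows; the base-2 logarithm, the toy corpus and the count floor for unseen tokens are illustrative assumptions, not details fixed by the patent:

```python
import math
from collections import Counter

def intelligibility(tokens, corpus_counts, corpus_total):
    """Average Shannon information S = (S_1 + ... + S_N) / N with S_i = -log2(P_i)."""
    infos = []
    for tok in tokens:
        # P_i is the token's corpus frequency; unseen tokens get a count floor of 1 (assumption)
        p = corpus_counts.get(tok, 1) / corpus_total
        infos.append(-math.log2(p))
    return sum(infos) / len(infos)

# Toy corpus standing in for the word-frequency statistics the method presupposes.
corpus = "the cat sat on the mat the cat ran".split()
counts, total = Counter(corpus), len(corpus)

common = intelligibility(["the"], counts, total)  # frequent word: low information
rare = intelligibility(["ran"], counts, total)    # rare word: high information
```

A sentence built from rare words thus carries more information per token and scores as harder to understand than one built from frequent words.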
Specifically, the concentration degree corresponding to the first sentence is determined by the natural language feature method in one of the following ways.
The first method comprises the following steps:
step 301-1, determining a first sentence included in the language information as s_t, and determining the sentences before the first sentence as s_0, ..., s_{t-1};
step 301-2, using s_0, ..., s_{t-1} as the input of a neural network model such as a CNN, RNN or Transformer, and using the output of the neural network model as the vector representation of s_0, ..., s_{t-1} as a whole;
step 301-3, using s_t as the input of a neural network model such as a CNN, RNN or Transformer, and using the output of the neural network model as the vector representation of s_t;
step 301-4, determining C as the vector representation of s_0, ..., s_{t-1} as a whole and T as the vector representation of s_t, and either inputting C and T into a CNN, RNN or Transformer to calculate their concentration degree, or determining the cosine of the angle between C and T as the concentration degree of the first sentence.
The second method comprises the following steps:
step 302-1, determining a first sentence included in the language information as s_t, and determining the sentences before the first sentence as s_0, ..., s_{t-1};
step 302-2, establishing a corpus in which the degree of correlation of two passages of text is marked, and training a neural network model (CNN, RNN or Transformer) with the corpus;
step 302-3, inputting s_t and s_0, ..., s_{t-1} as two passages of text into the trained neural network model, and taking the obtained degree of correlation of the two passages as the concentration degree of the first sentence.
The third method comprises the following steps:
step 303-1, determining a first sentence included in the language information as s_t, and determining the sentences before the first sentence as s_0, ..., s_{t-1};
step 303-2, determining the average value of the word vectors included in the first sentence as the vector representation of the first sentence;
step 303-3, determining the average value of the sentence vectors of s_0, ..., s_{t-1} as the vector representation of s_0, ..., s_{t-1} as a whole;
step 303-4, optionally using the vector representation of s_t as the input of a neural network model such as a CNN, RNN or Transformer, and using the output of the neural network model as a new vector representation of s_t;
step 303-5, determining C as the vector representation of s_0, ..., s_{t-1} as a whole and T as the vector representation of s_t, and determining the cosine of the angle between C and T as the concentration degree of the first sentence.
For example, with the first sentence denoted s_t and the sentence or sentences preceding it denoted s_0, ..., s_{t-1}, the concentration degree is calculated as follows:
2.1 For each i in {0, ..., t}, compute the vector representation of the sentence s_i:
2.1.1 perform word segmentation on s_i, and remove stop words as actually needed;
2.1.2 use the segmented sequence as the input of a neural network model, including but not limited to a CNN, RNN or Transformer, and use the output as the vector representation of s_i; the word vectors can also simply be averaged to give the vector representation of the sentence.
2.2 Obtain a vector representation of s_0, ..., s_{t-1} as a whole, in either of two ways:
2.2.1 use the individual vector representations of s_0, ..., s_{t-1} as the inputs of a neural network model, including but not limited to a CNN, RNN or Transformer, and use its output as the vector representation of s_0, ..., s_{t-1} as a whole;
2.2.2 alternatively, simply average the vectors of s_0, ..., s_{t-1} as the vector representation of the whole.
2.3 If 2.2.1 is adopted, the vector representation of s_t also undergoes the same network transformation as in 2.2.1, and the output is used as the new vector representation of s_t; if 2.2.2 is adopted, no further operation is needed.
2.4 Calculate the similarity: with C the vector representation of s_0, ..., s_{t-1} and T the vector representation of s_t, the cosine of the angle between C and T is defined as the similarity, with the formula cos(C, T) = <C, T> / (norm(C) * norm(T)), where <x, y> denotes the inner product and norm is the L2 norm.
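A minimal sketch of the 2.2.2 variant (averaging word vectors into sentence vectors, then the cosine of step 2.4); the random 8-dimensional embedding table is a stand-in assumption for a real word-vector model:

```python
import math
import random

def sentence_vec(tokens, word_vecs):
    """Sentence vector = componentwise average of its word vectors (the 2.2.2 shortcut)."""
    dim = len(next(iter(word_vecs.values())))
    return [sum(word_vecs[t][k] for t in tokens) / len(tokens) for k in range(dim)]

def cosine(a, b):
    """cos(a, b) = <a, b> / (norm(a) * norm(b)), with <,> the inner product, norm the L2 norm."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def concentration(context_sents, current_sent, word_vecs):
    """Concentration of s_t given s_0..s_{t-1}: cosine between context C and sentence T."""
    sent_vecs = [sentence_vec(s, word_vecs) for s in context_sents]
    dim = len(sent_vecs[0])
    C = [sum(v[k] for v in sent_vecs) / len(sent_vecs) for k in range(dim)]
    T = sentence_vec(current_sent, word_vecs)
    return cosine(C, T)

# Made-up 8-dimensional embedding table; real use would load word2vec or GloVe vectors.
random.seed(0)
word_vecs = {w: [random.gauss(0, 1) for _ in range(8)]
             for w in ["budget", "plan", "cost", "lunch"]}

same = concentration([["budget", "plan"]], ["budget", "plan"], word_vecs)  # repeats context
drift = concentration([["budget", "plan"]], ["lunch"], word_vecs)          # changes topic
```

Repeating the context verbatim yields a cosine of 1.0, while an off-topic sentence falls somewhere in [-1, 1] depending on the embeddings.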
Specifically, determining the semantic richness degree corresponding to the first sentence by the natural language feature method mainly comprises the following two methods.
The first method comprises the following steps:
step 401-1, performing word segmentation on a first sentence included in the language information, determining that the first sentence includes n valid participles, and determining the i-th valid participle as w_i;
step 401-2, obtaining the word vector of w_i through a common word vector model, and determining the word vector of w_i as e_i;
step 401-3, determining the average of the word vectors of the n valid participles included in the first sentence as mu, where mu = (e_1 + ... + e_n)/n;
step 401-4, determining the Euclidean distance between the word vector of each word included in the first sentence and the average word vector mu, and determining the standard deviation of these Euclidean distances as the semantic richness degree of the first sentence.
The second method comprises the following steps:
step 402-1, performing word segmentation on a first sentence included in the language information, confirming that the first sentence includes n valid participles, and determining the i-th valid participle as w_i;
step 402-2, obtaining the word vector of w_i through a common word vector model, and determining the word vector of w_i as e_i;
step 402-3, determining the average of the word vectors of the n valid participles included in the first sentence as mu, where mu = (e_1 + ... + e_n)/n;
step 402-4, determining the cosine similarity between the word vector of each word included in the first sentence and the average word vector mu, and determining the standard deviation of these cosine similarities as the semantic richness degree of the first sentence.
For example, the semantic richness degree of the first sentence is calculated as follows:
3.1 perform word segmentation on the first sentence and remove stop words, leaving the valid participles; denote the i-th participle as w_i, with n valid participles in total;
3.2 obtain the word vector of w_i with a common word vector model, denoted e_i;
3.3 calculate the average of all word vectors of the valid participles of the first sentence, denoted mu, with mu = (e_1 + ... + e_n)/n;
3.4 calculate the distance between the word vector of each word in the first sentence and mu, which can be the Euclidean distance or the cosine similarity, and denote the result corresponding to w_i as d_i;
3.5 the standard deviation of these distances is defined as the semantic richness degree of the first sentence, with the formula sqrt(var), where var = mean((d_i - md)^2, i), md = mean(d_i, i), and mean(x_i, i) denotes averaging x_1, ..., x_n.
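Steps 3.1-3.5 with the Euclidean-distance option can be sketched as below; the 2-dimensional word vectors are hypothetical values chosen only for illustration:

```python
import math

def semantic_richness(tokens, word_vecs):
    """Std. dev. of each word vector's Euclidean distance to the sentence mean vector mu."""
    vecs = [word_vecs[t] for t in tokens]
    n, dim = len(vecs), len(vecs[0])
    mu = [sum(v[k] for v in vecs) / n for k in range(dim)]        # mu = (e_1+...+e_n)/n
    d = [math.sqrt(sum((v[k] - mu[k]) ** 2 for k in range(dim)))  # d_i = ||e_i - mu||
         for v in vecs]
    md = sum(d) / n
    var = sum((di - md) ** 2 for di in d) / n                      # var = mean((d_i - md)^2)
    return math.sqrt(var)

# Hypothetical 2-d word vectors for illustration only.
word_vecs = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "philosophy": [-3.0, 4.0]}

flat = semantic_richness(["cat", "cat", "cat"], word_vecs)         # one repeated word
varied = semantic_richness(["cat", "dog", "philosophy"], word_vecs)
```

A sentence that repeats one word has richness 0, since every distance to mu is identical; semantically scattered words raise the spread and hence the score.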
Specifically, the method for determining the difficulty variation degree corresponding to the first sentence through the natural language feature method mainly comprises the following steps:
step 501, performing word segmentation or word segmentation on a first sentence included in the language information, wherein n word segments or words are provided in total,
step 502, determining the shannon information quantity Si of each effective participle or word by a formula Si ═ logPi;
step 503, determining the difficulty variation degree of the first sentence through a formula sqrt (var);
in steps 502 and 503, log is a logarithmic function based on 2, 10, or a natural number e, Pi is the probability of occurrence of each valid participle or word, var mean ((S _ i-mu) ^2, i), mu mean (S _ i, i), mean (x _ i, i) means averaging x _ 0.
For example, the difficulty variation degree of the first sentence is calculated by the following specific steps:
4.1, segment the first sentence into words or characters, obtaining n in total;
4.2, compute the Shannon information content Si of each valid participle or character included in the first sentence by the formula Si = -log Pi, where log is a logarithmic function that may be based on 2, 10, or the natural number e, and Pi is the occurrence probability of the participle or character, i.e. its word frequency;
4.3, define the standard deviation of S_0, ..., S_{n-1} as the difficulty variation degree of the first sentence, with the formula sqrt(var), where var = mean((S_i - mu)^2, i), mu = mean(S_i, i), and mean(x_i, i) denotes the average of x_0, ..., x_{n-1}.
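As a minimal illustration of steps 4.1 to 4.3, the following sketch assumes segmentation has already been performed and that a corpus frequency table mapping each token to its occurrence probability is available; the table used in the example is hypothetical.

```python
import math

def difficulty_variation(tokens, freq_table, log_base=2):
    """Standard deviation of the Shannon information S_i = -log(P_i) of a
    sentence's tokens, per steps 4.1-4.3 above. `freq_table` maps each
    token to its occurrence probability P_i (its word frequency)."""
    info = [-math.log(freq_table[t], log_base) for t in tokens]   # S_i (step 4.2)
    mu = sum(info) / len(info)                          # mean(S_i, i)
    var = sum((s - mu) ** 2 for s in info) / len(info)  # mean((S_i - mu)^2, i)
    return math.sqrt(var)                               # sqrt(var) (step 4.3)
```

A sentence built entirely from tokens of similar frequency scores near zero, while mixing very common and very rare tokens drives the score up.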
In summary, an embodiment of the present invention provides a method for analyzing natural language features, including: acquiring natural language information of a tested person and processing it to obtain processed language information; and determining the number of sentences included in the language information and determining the dimension information corresponding to each sentence through a natural language feature analysis method. The method obtains the natural language information of the tested person in a natural working state, without changing the tested person's daily working habits, and evaluates and analyzes the obtained information through natural language feature methods. It has the following advantages. Data acquisition is more natural: distortion of test and analysis results caused by the tested person's subjective factors or by the pressure of a test environment is reduced. Data analysis is more objective: distortion caused by the subjective factors of the experts or teachers who perform the evaluation in traditional methods is reduced. Data analysis is more real-time: the tested person's voice data is collected, processed, and displayed instantly by the natural language real-time analysis device, replacing the inefficient traditional test methods that consume large amounts of manpower, material resources, and time. Data analysis is more scientific: the analysis model is built on big data and continuously optimized with artificial intelligence methods, so the analysis results become more accurate as the number of tested persons grows.
The method can efficiently complete the evaluation and analysis of the tested person in real time, improve the efficiency of traditional talent evaluation, reduce the degree of dependence on an expert system, and reduce the loss of reliability and validity of evaluation results caused by the subjectivity of experts and tested persons in traditional talent evaluation methods and by factors such as the psychological pressure of the evaluation environment. This solves the prior-art problem that testers and tested persons rely excessively on the test system.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for analyzing natural language features, comprising:
acquiring natural language information of a tested person, and processing the natural language information to obtain processed language information;
determining the number of sentences included in the language information, and determining the corresponding dimension information of each sentence through a natural language feature analysis method;
wherein the dimension information includes a concentration level;
the determining the concentration degree corresponding to each sentence through a natural language feature method comprises the following steps:
determining a first sentence included in the language information as s_t, and determining the sentences preceding the first sentence as s_0, ..., s_{t-1};
using s_0, ..., s_{t-1} as the input of a neural network model such as a CNN, RNN, or Transformer, and using the output of the neural network model as a vector representation of s_0, ..., s_{t-1} as a whole;
using s_t as the input of a neural network model such as a CNN, RNN, or Transformer, and using the output of the neural network model as the vector representation of s_t;
determining C as the vector representation of s_0, ..., s_{t-1} as a whole and T as the vector representation of s_t, and computing the concentration degree of the first sentence either by inputting C and T into a neural network model (CNN, RNN, or Transformer) or by determining the cosine of the angle between C and T.
2. The method of claim 1, wherein the dimensional information includes a degree of intelligibility;
the determining the intelligibility degree of each sentence through a natural language feature method comprises the following steps:
performing word or character segmentation on a first sentence included in the language information;
determining the Shannon information content Si of each valid participle or character included in the first sentence by the formula Si = -log Pi;
by the formula

S = (S_1 + S_2 + ... + S_N) / N

determining the average Shannon information content of the valid participles of the first sentence;
wherein log is a logarithmic function with base 2, 10, or the natural number e; Pi is the occurrence probability of each valid participle or character; N is the total number of valid participles or characters of the first sentence; S_1 + S_2 + ... + S_N is the Shannon information content of all valid participles or characters included in the first sentence; and S represents the intelligibility degree of the first sentence.
3. The method of claim 1, in which the dimensional information comprises a concentration level;
the determining the concentration degree corresponding to each sentence through a natural language feature method comprises the following steps:
determining a first sentence included in the language information as s_t, and determining the sentences preceding the first sentence as s_0, ..., s_{t-1};
establishing a corpus in which the degree of correlation between two text passages is labeled, and training a neural network model (CNN, RNN, or Transformer) with the corpus;
and inputting s_t and s_0, ..., s_{t-1} as two text passages into the trained neural network model, and taking the obtained degree of correlation between the two passages as the concentration degree of the first sentence.
4. The method of claim 1, in which the dimensional information comprises a concentration level;
the determining the concentration degree corresponding to each sentence through a natural language feature method comprises the following steps:
determining a first sentence included in the language information as s_t, and determining the sentences preceding the first sentence as s_0, ..., s_{t-1};
determining the average of the word vectors included in the first sentence as the vector representation of the first sentence;
determining the average of the sentence vectors of s_0, ..., s_{t-1} as the vector representation of s_0, ..., s_{t-1} as a whole;
or using s_t as the input of a neural network model such as a CNN, RNN, or Transformer, and using the output of the neural network model as the vector representation of s_t;
determining C as the vector representation of s_0, ..., s_{t-1} as a whole and T as the vector representation of s_t, and determining the cosine of the angle between C and T as the concentration degree of the first sentence.
5. The method of claim 1, wherein the dimension information comprises a semantic richness:
the determining the semantic richness degree corresponding to each sentence through a natural language feature method comprises the following steps:
segmenting a first sentence included in the language information, confirming that n effective segments are included in the first sentence, and determining the ith effective segment as w _ i;
obtaining the word vector of the w _ i through a common word vector model, and determining the word vector of the w _ i as e _ i;
determining the average word vector of the word vectors of the n valid participles included in the first sentence, and determining the average word vector as mu, wherein mu = (e_0 + ... + e_{n-1}) / n;
determining the Euclidean distance between the word vector of each word included in the first sentence and the average word vector mu, and determining the standard deviation of these Euclidean distances as the semantic richness of the first sentence.
6. The method of claim 1, wherein the dimension information comprises a semantic richness:
the determining the semantic richness degree corresponding to each sentence through a natural language feature method comprises the following steps:
segmenting a first sentence included in the language information, confirming that n effective segments are included in the first sentence, and determining the ith effective segment as w _ i;
obtaining the word vector of the w _ i through a common word vector model, and determining the word vector of the w _ i as e _ i;
determining the average word vector of the word vectors of the n valid participles included in the first sentence, and determining the average word vector as mu, wherein mu = (e_0 + ... + e_{n-1}) / n;
determining the cosine similarity between the word vector of each word included in the first sentence and the average word vector mu, and determining the standard deviation of these cosine similarities as the semantic richness of the first sentence.
7. The method of claim 1, wherein the dimensional information includes a degree of difficulty;
the determining the difficulty variation degree corresponding to each sentence through a natural language feature method comprises the following steps:
performing word or character segmentation on a first sentence included in the language information, obtaining n participles or characters in total;
determining the Shannon information content Si of each valid participle or character through the formula Si = -log Pi;
determining the difficulty variation degree of the first sentence through the formula sqrt(var);
wherein log is a logarithmic function with base 2, 10, or the natural number e; Pi is the occurrence probability of each valid participle or character; var = mean((S_i - mu)^2, i), mu = mean(S_i, i), and mean(x_i, i) denotes the average of x_0, ..., x_{n-1}.
8. The method of any one of claims 1 to 7, wherein the language information comprises voice information and text information.
CN201811422169.0A 2018-11-26 2018-11-26 Method for analyzing natural language features Active CN109272262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811422169.0A CN109272262B (en) 2018-11-26 2018-11-26 Method for analyzing natural language features

Publications (2)

Publication Number Publication Date
CN109272262A CN109272262A (en) 2019-01-25
CN109272262B true CN109272262B (en) 2022-04-01

Family

ID=65190805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811422169.0A Active CN109272262B (en) 2018-11-26 2018-11-26 Method for analyzing natural language features

Country Status (1)

Country Link
CN (1) CN109272262B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083826A (en) * 2019-03-21 2019-08-02 昆明理工大学 A kind of old man's bilingual alignment method based on Transformer model
CN113672698B (en) * 2021-08-01 2024-05-24 北京网聘信息技术有限公司 Intelligent interview method, system, equipment and storage medium based on expression analysis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294660A (en) * 2012-02-29 2013-09-11 张跃 Automatic English composition scoring method and system
CN106446109A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 Acquiring method and device for audio file abstract
CN106997376A (en) * 2017-02-28 2017-08-01 浙江大学 A question-and-answer sentence similarity calculation method based on multi-stage features
CN107633472A (en) * 2017-10-31 2018-01-26 广州努比互联网科技有限公司 An internet learning method based on teaching event streams and real-time discussion
CN107679144A (en) * 2017-09-25 2018-02-09 平安科技(深圳)有限公司 News sentence clustering method, device and storage medium based on semantic similarity
CN108733653A (en) * 2018-05-18 2018-11-02 华中科技大学 A sentiment analysis method based on Skip-gram models fusing part-of-speech and semantic information
CN108874921A (en) * 2018-05-30 2018-11-23 广州杰赛科技股份有限公司 Method, apparatus, terminal device, and storage medium for extracting text feature words

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8306356B1 (en) * 2007-09-28 2012-11-06 Language Technologies, Inc. System, plug-in, and method for improving text composition by modifying character prominence according to assigned character information measures
JP6524008B2 (en) * 2016-03-23 2019-06-05 株式会社東芝 INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Shared features dominate semantic richness effects for concrete concepts; Ray Grondin et al.; Journal of Memory and Language; Jan. 31, 2009; vol. 60, no. 1, pp. 1-19 *
Semantic similarity measurement based on a low-dimensional semantic vector model; Cai Yuanyuan et al.; Journal of University of Science and Technology of China; Sep. 15, 2016; vol. 46, no. 9, pp. 719-726 *
Research on Chinese text classification based on semantic similarity; Li Xiaojun; China Masters' Theses Full-text Database, Information Science and Technology; Apr. 15, 2018; no. 4; pp. I138-3522 *
Research on text representation techniques for readability assessment; Jiang Zhiwei; China Doctoral Dissertations Full-text Database, Information Science and Technology; Sep. 15, 2018; no. 9; pp. I138-48 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant