CN110297907A - Method for generating an interview report, computer-readable storage medium, and terminal device - Google Patents
Method for generating an interview report, computer-readable storage medium, and terminal device
- Publication number
- CN110297907A (application number CN201910582433.5A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- model
- text data
- emotion
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/043—Distributed expert systems; Blackboards
Abstract
This application discloses a voice content analysis method. The method includes: obtaining voice data; obtaining corresponding text data based on the voice data; inputting the text data into a trained sentiment analysis model, the trained sentiment analysis model including a trained emotion extraction model and a trained sentiment classification model; dividing the text data into polar emotion text data and neutral emotion text data using the trained emotion extraction model; dividing the polar emotion text data into positive emotion text data and negative emotion text data using the trained sentiment classification model; and obtaining a sentiment analysis result from the positive emotion text data and the negative emotion text data.
Description
Technical field
This application relates to the field of artificial intelligence and, more particularly, to a method for generating an interview report, a computer-readable storage medium, and a terminal device.
Background

Interviews are a very important part of design research and play a significant role in all trades and professions. Interviews take many forms, for example user interviews, expert interviews, joint interviews, and field interviews. In design research, interviews give insight into users' true thoughts and into social and industry trends.

A traditional interview is usually carried out by a team of two or more people: one person is responsible for communicating with the user, while another records and supplements the conversation. To uncover what the user really thinks, an interview usually starts with careful greetings and warm-up questions, and the order of the questions is adjusted according to the user's actual answers. The note-taker therefore has to extract useful information quickly during the interview, recording and highlighting key points in real time. After the interview, the whole recording is replayed, the speech is converted to text, and the transcript is analyzed in detail together with the interview notes. Guided by the actual interview goals, the analysis then derives user demands, product pain points, industry opportunity points, and other related content. All of the above steps have to be completed manually by the interviewers, and analyzing one hour of interview typically takes around five to six hours. High cost, tedious and complicated analysis work, and long turnaround are therefore highly significant problems in interviewing.
Summary of the invention
The present application aims to solve at least one of the technical problems existing in the prior art. To this end, one objective of the application is to propose a method for generating an interview report, which can reduce the cost and time of demand analysis in interviews and is simpler to carry out.

A second objective of the application is to propose a computer-readable storage medium.

A third objective of the application is to propose a terminal device.

To achieve the above objectives, a first aspect of the application provides a voice content analysis method. The method includes: obtaining voice data; obtaining corresponding text data based on the voice data; inputting the text data into a trained sentiment analysis model, the trained sentiment analysis model including a trained emotion extraction model and a trained sentiment classification model; dividing the text data into polar emotion text data and neutral emotion text data using the trained emotion extraction model; dividing the polar emotion text data into positive emotion text data and negative emotion text data using the trained sentiment classification model; and obtaining a sentiment analysis result from the positive emotion text data and the negative emotion text data.
In some embodiments, the text data takes the form of sentence vectors, and the sentence vectors are obtained by the following steps: obtain the original text; for each complete sentence in the original text, split the complete sentence at punctuation marks to obtain at least one short sentence; determine the sentence vector of the at least one short sentence.

In some embodiments, determining the sentence vector of the at least one short sentence includes: for each short sentence among the at least one short sentence, determining the word vectors of the short sentence based on a word2vec model, and determining the sentence vector based on the word vectors of the short sentence.

In some embodiments, determining the sentence vector based on the word vectors of the short sentence includes: determining the mean of the word vectors of the short sentence as the sentence vector.
In some embodiments, the method further includes: deleting at least one of modal particles, stop words, and garbled characters from the original text.
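One possible shape of this cleanup step is sketched below. The word lists and the cleanup pattern are illustrative assumptions, not the application's actual lists; real modal-particle and stop-word lists would be language-specific.

```python
import re

# Illustrative stand-ins for real modal-particle and stop-word lists.
MODAL_PARTICLES = {"um", "uh", "ah"}
STOP_WORDS = {"the", "a", "of"}

def clean_text(text):
    """Drop punctuation/garbled symbols, then filter modal particles and stop words."""
    text = re.sub(r"[^\w\s]", " ", text)  # non-word, non-space chars -> spaces
    kept = [w for w in text.split()
            if w.lower() not in MODAL_PARTICLES | STOP_WORDS]
    return " ".join(kept)

print(clean_text("Um, the alarm clock works"))  # -> alarm clock works
```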
In some embodiments, the trained emotion extraction model is obtained by the following steps: obtain labeled training data, the labeled training data including labeled neutral emotion text data and labeled non-neutral emotion text data; input the labeled training data into an initial emotion extraction model; and when the initial emotion extraction model reaches a convergence condition after training, determine it to be the trained emotion extraction model.
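The train-until-convergence recipe can be illustrated with a deliberately simple stand-in classifier. The application does not specify the model architecture; here a perceptron over toy 2-dimensional sentence vectors plays the role of the "initial emotion extraction model", and "an epoch with no weight updates" plays the role of the convergence condition — both are assumptions for illustration only.

```python
# Sketch: train an "initial emotion extraction model" until a convergence
# condition holds. A perceptron is a hypothetical stand-in for the real model.
# Label 1 = non-neutral (polar) emotion text, label 0 = neutral emotion text.
LABELED_DATA = [
    ([0.9, 0.8], 1),  # strongly emotional sentence vector
    ([0.8, 0.9], 1),
    ([0.1, 0.2], 0),  # neutral sentence vector
    ([0.2, 0.1], 0),
]

def train_emotion_extractor(data, lr=0.1, max_epochs=100):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        updates = 0
        for x, y in data:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            if pred != y:                     # misclassified -> update weights
                w[0] += lr * (y - pred) * x[0]
                w[1] += lr * (y - pred) * x[1]
                b += lr * (y - pred)
                updates += 1
        if updates == 0:                      # convergence condition reached
            break
    return w, b

w, b = train_emotion_extractor(LABELED_DATA)
classify = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
print(classify([0.85, 0.9]), classify([0.15, 0.1]))  # -> 1 0
```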
In some embodiments, the trained sentiment classification model is obtained by the following steps: obtain labeled training data, the labeled training data including labeled positive emotion text data and labeled negative emotion text data; input the labeled training data into an initial sentiment classification model for training; and when the initial sentiment classification model reaches a convergence condition after training, determine it to be the trained sentiment classification model.
In some embodiments, the method further includes: determining, according to the text data, the field to which the content of the voice data belongs; and, according to the field to which the content of the voice data belongs, determining and loading the trained sentiment analysis model.

In some embodiments, the method further includes: receiving user input, the input including the field to which the content of the voice data belongs; and, according to the field to which the content of the voice data belongs, determining and loading the trained sentiment analysis model.
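Whether the field is inferred from the text or supplied by the user, selecting the trained model by field amounts to a lookup. A minimal sketch, in which the field names, keyword sets, model file names, and the keyword-count heuristic are all hypothetical (the application leaves the field-detection method unspecified):

```python
# Sketch: pick the sentiment analysis model by the field of the voice content.
# Field names, keywords, and model identifiers here are illustrative only.
FIELD_KEYWORDS = {
    "home_appliance": {"speaker", "alarm", "kitchen", "appliance"},
    "mobile_phone": {"phone", "camera", "bezel", "battery"},
}
MODELS = {
    "home_appliance": "sentiment_model_home_appliance.bin",
    "mobile_phone": "sentiment_model_mobile_phone.bin",
}

def detect_field(text_tokens, user_field=None):
    """A user-supplied field wins; otherwise count keyword hits per field."""
    if user_field is not None:
        return user_field
    scores = {f: sum(t in kw for t in text_tokens)
              for f, kw in FIELD_KEYWORDS.items()}
    return max(scores, key=scores.get)

def select_model(text_tokens, user_field=None):
    return MODELS[detect_field(text_tokens, user_field)]

print(select_model(["the", "phone", "camera", "is", "great"]))
# -> sentiment_model_mobile_phone.bin
```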
A further aspect of the application provides a voice content analysis apparatus, including: at least one storage device, the storage device containing a set of instructions; and at least one processor in communication with the at least one storage device, wherein, when executing the set of instructions, the at least one processor causes the apparatus to perform the foregoing method.

To achieve the above objectives, a third aspect of the application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being arranged to carry out the foregoing method.

To achieve the above objectives, a fourth aspect of the application provides a terminal device including: a processor; and a memory communicatively connected to the processor; wherein the memory stores instructions executable by the processor, and the instructions, when executed by the processor, cause the processor to perform the foregoing method.

Additional aspects and advantages of the application will be set forth in part in the following description, will partly become apparent from it, or will be learned through practice of the application.
Detailed description of the invention
The above and additional aspects and advantages of the application will become apparent and easy to understand from the following description of embodiments in conjunction with the drawings, in which:

Fig. 1 is a flow chart of the method for generating an interview report according to an embodiment of the application;
Fig. 2 is a schematic diagram of the neural network model training process according to an embodiment of the application;
Fig. 3 is a schematic diagram further explaining the training process in Fig. 2;
Fig. 4 is a block diagram of the terminal device according to an embodiment of the application;
Fig. 5 is a flow chart of the intelligent speech data demand extraction process according to an embodiment of the application;
Fig. 6 is a flow chart of the process of building the demand estimation model and the feature lexicon according to an embodiment of the application;
Fig. 7 is a block diagram of the modules of the intelligent speech data demand extraction system according to an embodiment of the application;
Fig. 8 is a flow chart of the usability determination process according to an embodiment of the application;
Fig. 9 is a flow chart of the process of building the usability judgment model according to an embodiment of the application;
Fig. 10 is a block diagram of the usability extraction system according to an embodiment of the application;
Fig. 11 is a flow chart of the voice content analysis method according to an embodiment of the application;
Fig. 12 is a flow chart of the semantic unit clustering method according to an embodiment of the application;
Fig. 13 is a flow chart of determining one or more cluster centers based on the pairwise similarities of multiple semantic units according to an embodiment of the application;
Fig. 14 is a flow chart of computing the similarity between each candidate semantic unit and each of the remaining semantic units according to an embodiment of the application;
Fig. 15 is a flow chart of computing the candidate semantic vector of each candidate semantic unit according to an embodiment of the application;
Fig. 16 is a flow chart of computing the candidate semantic vector of each candidate semantic unit according to another embodiment of the application;
Fig. 17 is a schematic diagram of the semantic unit clustering apparatus according to an embodiment of the application.
Detailed description of embodiments

Embodiments of the application are described in detail below. The embodiments described with reference to the drawings are exemplary and serve to explain the application.
The method for generating an interview report of the embodiments of the application combines artificial intelligence with design research. It builds an autonomous interview research workflow with fewer steps and less time, helping designers and others independently complete the collection, analysis, and reporting of interview data, reducing the labor cost of interview analysis, shortening the time the analysis takes, and being simple and easy to implement.
The method for generating an interview report according to embodiments of the first aspect of the application is described below with reference to Figs. 1 to 3.

Fig. 1 is a flow chart of the method for generating an interview report according to an embodiment of the application. As shown in Fig. 1, the method of the embodiment of the application includes steps S1 to S4.
Step S1: obtain the interview corpus.

Specifically, in an embodiment, the method for generating an interview report of the embodiment of the application can be loaded in the form of an application program on a terminal device such as a smartphone, tablet computer, or laptop. The human-computer interaction window of the terminal device can provide an application icon associated with the method; in response to a program start instruction, the human-computer interaction interface provides a recording-start trigger unit, and in response to a trigger instruction on the recording-start trigger unit, the recording module of the terminal device itself collects the interview corpus. In short, operating the recording-start trigger unit when the interview begins starts the recording, from which the interview corpus can be obtained.

In other embodiments, the interview can also be recorded by other recording devices, for example a microphone or microphone array, a voice recorder, or a recording pen. The recorded interview corpus is then transferred to the terminal device on which the application program embodying the method of the embodiment of the application is loaded, so that the interview corpus can be obtained and analyzed autonomously, likewise shortening the time of interview analysis and saving labor cost.
Step S2: in response to an interview report generation instruction, analyze the interview corpus according to the neural network model and obtain the expression information of the interviewee.

Specifically, after the interview is completed, the human-computer interaction interface of the terminal device provides an interview report generation trigger unit, or a corresponding key or knob is provided. When the interview report generation trigger unit receives a trigger instruction, the interview report generation instruction is received, and the processor of the terminal device analyzes the interview corpus autonomously to generate the interview report.

In some embodiments, the original interview corpus needs a series of processing steps to facilitate subsequent analysis, such as preprocessing and word segmentation using the neural network model, where preprocessing can include, for example, recognizing the interview corpus to convert it into text data, then splitting and cleaning the text data.
Specifically, neural network training has a mechanism of learning and continuously optimizing its results, and a neural network model can be built through such training. The training of the neural network model is briefly described first.

Fig. 2 is a schematic diagram of the neural network model training process according to an embodiment of the application. As shown in Fig. 2, an expert system can open an algorithm optimization interface to expert users. The main function of the expert system is customization for the product: a training set is built for the project's core-feature neural network, and when the feature parameters are trained to generate the neural network model, the content of the training set determines the final result. Expert prior knowledge is input to build the expert system, and the content of the training set is input to the neural network algorithm model. The expert can evaluate and correct the output of the neural network algorithm model based on the training set, thereby optimizing the expert knowledge; the optimized knowledge is fed back into the expert system, the training set is corrected accordingly and input to the neural network algorithm model again, and this loop iterates until the output of the neural network model approaches the optimal solution.
Further, Fig. 3 is a schematic diagram further explaining the neural network model training process in Fig. 2. Specifically, the training set can be constructed jointly by a team, combining methods from linguistics, psychology, and design science, mining the structural features of the raw database and, according to the constraints of the research results, standardizing how the neural network's training text is determined. As shown in Fig. 3, on the expert system side, expert prior knowledge is input into an expert knowledge recording module, which limits the conditions under which features are determined (for example yes/no judgments or judgments of degree), marks key information such as set keywords, and marks description information such as tone or grammar, thereby building the training set. On the algorithm model side, the training set content is input to the algorithm model for training, for example performing semantic analysis and generating analysis results; the results are corrected based on the expert's evaluation of them, the training set is augmented accordingly, and its content is continuously optimized, so as to build the required neural network model.
In embodiments of the application, a neural network model can be built through the above process based on the characteristics of the interview corpus and expert knowledge. The interview corpus is analyzed by the neural network model to obtain the various kinds of expression information of the interviewee contained in the corpus, such as emotion information, demand information, predictions about industry trends, opinions on controversial issues, and usability information, where different neural network models are built for obtaining different kinds of expression information of the interviewee.
Step S3: generate the interview report according to the expression information of the interviewee.

For example, suppose a user survey is conducted about some product in the hope of judging, through the survey and its analysis, the users' demands or their attitude toward the product, so that the product can be improved. Specifically, during the user interview, the recording trigger unit is operated to start recording and obtain the interview corpus. After the interview is completed, the interview report generation instruction is input, the interview corpus is analyzed by the neural network model, a set of text keywords is extracted to form an overall description of the interview, and the interviewee's emotion information, demand information, and/or usability information about the product are obtained. According to the emotion information, demand information, and/or usability information, it can be judged what the user is satisfied or dissatisfied with, which functions the user prefers or wishes the product had, and even what the user generally likes, so that an interview report including user preferences and product strengths and weaknesses is generated based on the analysis results. The product can then be improved with reference to the interview report, enhancing product performance.
In some embodiments, a demand refers to an expectation or wish that the user puts forward from their own point of view. Through demand-class corpus, information such as user motivation, the functions the user prefers or wishes the product had, and suggestions or opinions about the product can be obtained, so as to guide product design and gain insight into demands such as the industry market. An example of a demand-class sentence is: I wish home appliances could match the decoration style of my home.
In some embodiments, usability refers to the effectiveness, efficiency, and satisfaction a user experiences when using a certain product in a specific usage scenario to reach a specific goal. Specifically, effectiveness refers to the correctness and completeness with which the user achieves the specific goal; efficiency refers to how efficiently the user completes the specific goal and is inversely proportional to the resources (such as time) consumed; satisfaction refers to the subjective satisfaction the user experiences when using the product. In some embodiments, usability has five indicators: learnability, memorability, fault tolerance, interaction efficiency, and user satisfaction. Only a product that scores well on each indicator has high usability. By performing usability analysis and extraction on the interview corpus, product strengths and weaknesses can be obtained, so that the product can be optimized and its performance improved. One example of a usability sentence is: the speech recognition of this smart speaker is not good at all. Another example of a usability sentence is: I think this smart speaker is also very easy to operate. In some embodiments, usability information expresses the user's impressions of the product experience, for example the positive or negative attitude the user holds toward some aspect of the product's usability or ease of use, or the user's suggestions or opinions for improving the product.
In some embodiments, sentiment analysis refers to the process of analyzing, processing, summarizing, and reasoning about subjective text with emotional color. By analyzing the original interview text, the user's preferences and emotional attitudes toward the product and its environment can be obtained, giving insight into the user's inner thoughts.
In some embodiments, demand information may include product demand information and personal demand information. Specifically, product demand information mainly expresses the user's demands on the functions or traits of a specific product, in other words the functions the user prefers or wishes the product had; for example, the user wants a particular phone model to have a narrow bezel, dual cameras, dual SIM dual standby, and so on. Personal demand information mainly expresses demands of the user themselves; the object of this information is generally not limited to a specific product and can be any aspect, for example, the user wants a new phone, a new pair of sneakers, a movie ticket, and so on. In some embodiments, emotion polarity information may include product emotion polarity information and personal emotion polarity information. Specifically, product emotion polarity information mainly expresses the user's likes and dislikes of specific functions or traits of a specific product, or what the user is satisfied or dissatisfied with in the product; for example, the user likes a particular case color of a phone model but does not like the notch or the protruding camera. Personal emotion polarity information mainly expresses the user's likes and dislikes of other objects; the object of this information is generally not limited to a specific product and can be any aspect, for example, the user likes playing basketball and digital products and does not like watching movies.
According to the method for generating an interview report of the embodiments of the application, in response to an interview report generation instruction, the interview corpus can be analyzed automatically according to the neural network model, the expression information of the interviewee is obtained, and the interview report is then generated from that expression information. In other words, one-touch generation of the interview report is achieved, which saves manpower and material resources, reduces interview cost, and shortens the time an interview consumes.
In some embodiments, generating the interview report according to the expression information of the interviewee includes: computing the pairwise similarities of at least one class of sentences among the interviewee's demand sentences, non-demand sentences, positive emotion sentences, negative emotion sentences, and usability sentences; clustering according to the similarities and obtaining cluster centers; and generating the interview report according to the semantics of the sentences contained in the cluster centers.
Specifically, after the expression information of the interviewee contained in the interview corpus is obtained, the sentence vectors of each class of expression information are computed to form a sentence vector group, and similarities between the sentence vectors in the group are computed, for example using the HanLP toolkit. Clustering is then performed according to the computed similarities, for example using the AP (affinity propagation) clustering algorithm or another algorithm, and the cluster centers are determined (for the clustering method, refer to the detailed description of the following examples). The sentences corresponding to the cluster centers can largely reflect the interviewee's likes and dislikes or their attitude toward the product, and the interview report can be generated accordingly; the user can then refer to the interview report to improve the product's functions and performance.
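The similarity-then-cluster step can be sketched as follows. For brevity this uses a greedy threshold grouping with medoid centers as a stand-in for the AP clustering algorithm mentioned above; the toy 2-dimensional vectors and the threshold are illustrative assumptions.

```python
# Sketch: cluster sentence vectors by pairwise cosine similarity and pick a
# center sentence per cluster. A greedy threshold grouping stands in here for
# the AP (affinity propagation) algorithm mentioned in the text.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_centers(vectors, threshold=0.9):
    clusters = []                        # each cluster: a list of indices
    for i, v in enumerate(vectors):
        for c in clusters:
            if cosine(v, vectors[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    centers = []
    for c in clusters:                   # medoid: highest total similarity
        best = max(c, key=lambda i: sum(cosine(vectors[i], vectors[j])
                                        for j in c))
        centers.append(best)
    return clusters, centers

# Toy sentence vectors: two similar "positive" sentences, one "negative" one.
vecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
clusters, centers = cluster_centers(vecs)
print(clusters)  # -> [[0, 1], [2]]
print(centers)   # one medoid index per cluster
```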
Further, keywords can be extracted from the interview corpus and key points can be marked, so that key content is easier to find in transcripts, reports, and other material. Specifically, a marking trigger unit can be provided through the human-computer interaction interface of the terminal device, for example a highlight button, or a corresponding touch or mechanical key can be provided on the terminal device. When marking keywords, the sentences in the text data are weighted and ranked, candidate keywords are preselected according to the ranking results, the stop words contained among the candidate keywords are filtered out according to a stop-word list, the keywords in the text data are obtained, and the keywords are output to form a text keyword set giving an overall description of the interview. Stop words can be entered manually, and the generated stop words form a stop-word list. When marking key points, in response to a key-point marking instruction, the weight of the sentence in the text data corresponding to the instruction is raised, and the sentence is marked as a key sentence, making key points easier to find in later browsing.
Specifically, a word segmentation toolkit can be used to perform morphological analysis on the text data, obtaining short sentences and removing the modal particles in them. Some segmentation toolkits achieve efficient word-graph scanning based on a prefix dictionary, generating a directed acyclic graph of all possible word formations of the Chinese characters in a sentence, then use dynamic programming to search for the maximum-probability path and find the most probable segmentation combination based on word frequency; for out-of-vocabulary words, an HMM model based on the word-forming capability of Chinese characters is used together with the Viterbi algorithm. Keyword extraction means automatically extracting several significant words or phrases from a given piece of text data. In some embodiments, the TextRank algorithm can be used. TextRank is a graph-based ranking algorithm for text data: the text data is divided into several component units, a graph model is built over them, and a voting mechanism ranks the important components of the text, so that keyword extraction can be achieved using only a single document itself.
For example, the basic steps of keyword extraction may include: (1) segment the given text data T into complete sentences; (2) for each sentence, perform word segmentation and part-of-speech tagging, filter out the stop words, and retain only words of specified parts of speech such as nouns, verbs, and adjectives; the words retained after this step are the candidate keywords; (3) construct a candidate keyword graph G = (V, E), where V is the node set composed of the candidate keywords generated in (2), and edges between any two nodes are constructed using co-occurrence: an edge exists between two nodes only when their corresponding words co-occur within a window of length K, i.e. within at most K words of each other; (4) iteratively propagate the weight of each node until convergence; (5) sort the nodes by weight in descending order to obtain the T most important words as the candidate keywords; (6) mark the T most important words obtained in (5) in the original text, and if adjacent phrases are formed, combine them into multi-word keywords.
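Steps (3)-(5) can be sketched with a small pure-Python TextRank. The token list, window size, and damping factor are illustrative; a production implementation would also carry out steps (1), (2), and (6).

```python
# Sketch of steps (3)-(5) above: build a co-occurrence graph over candidate
# keywords and propagate node weights iteratively (a simplified TextRank).
def textrank_keywords(tokens, window=2, damping=0.85, iters=50, top=3):
    # Step (3): an edge joins two words that co-occur within `window` tokens.
    neighbors = {t: set() for t in tokens}
    for i, t in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if i != j and tokens[j] != t:
                neighbors[t].add(tokens[j])
    # Step (4): iteratively propagate node weights (PageRank-style updates).
    score = {t: 1.0 for t in neighbors}
    for _ in range(iters):
        score = {t: (1 - damping) + damping * sum(
                     score[u] / len(neighbors[u]) for u in neighbors[t])
                 for t in neighbors}
    # Step (5): sort nodes by weight, descending, and keep the top words.
    return [t for t, _ in sorted(score.items(), key=lambda kv: -kv[1])[:top]]

# Candidate keywords after segmentation and part-of-speech filtering (step 2):
tokens = ["morning", "get_up", "alarm_clock", "morning", "alarm_clock", "weather"]
print(textrank_keywords(tokens))  # higher-degree words rank first
```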
An example follows.

Original short sentence: when I get up in the morning I can use it as an alarm clock, and sometimes I can also ask it about the weather.
Candidate keywords: morning, get up, alarm clock, weather, can, when.
Stop words: can, when.
Final keywords: morning, get up, alarm clock, weather.
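The stop-word filtering in this example is simply set subtraction over the candidate keywords, which can be sketched as:

```python
# Sketch: final keywords = candidate keywords minus stop words, order kept.
candidates = ["morning", "get up", "alarm clock", "weather", "can", "when"]
stop_words = {"can", "when"}
final_keywords = [w for w in candidates if w not in stop_words]
print(final_keywords)  # -> ['morning', 'get up', 'alarm clock', 'weather']
```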
In short, the method for generating an interview report of the embodiments of the application can mark the key points of a sentence or a passage with one touch, making the key points easy to see when the text data is processed later, with few manual steps; it is simple, convenient, and saves time.
Further, in some embodiments, the interview report can additionally be rendered as visual information and provided to the user, for example presented as bar charts, pie charts, line charts, or combinations of these forms, so that the user can understand the content and key information of the interview report more intuitively.
In some embodiments, after the interview is completed, an editing trigger unit can also be provided in the human-computer interaction interface of the terminal device, and the interview report is edited in response to an edit instruction, for example to add interviewee information or to modify the converted text data, the results, and the report content, which is more flexible. Further, the user can annotate the interview report results according to their own needs. After the user finishes editing, the terminal device records the edits, obtains the labeled data in the user's edits, and compares the annotations with a set threshold. When the labeled data reaches the preset labeling threshold, the labeled data is fed back to the corpus database of the neural network model; alternatively, the labeled data is fed back to the corpus of the neural network model at set intervals, for example every 5 or 15 days, to optimize the neural network model. That is, the neural network model is adaptive, so that the results analyzed by the neural network model come ever closer to the results the user desires.
In some embodiments, after the interview report is generated, an output trigger unit may be provided in the human-computer interaction interface of the terminal device. In response to an output instruction, the interview report is output to a terminal such as a smartphone, PC, or notebook computer, so that the analysis results can be viewed and the interview report exported anytime and anywhere, which is convenient.
Based on the above embodiments of the method for generating an interview report, a second-aspect embodiment of the present application further proposes a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to execute the method for generating an interview report of the above embodiments.
Based on the above embodiments of the method for generating an interview report, a terminal device according to a third-aspect embodiment of the present application is described below.
Fig. 4 is a block diagram of a terminal device according to an embodiment of the present application. As shown in Fig. 4, the terminal device 100 of the embodiment includes a processor 10 and a memory 20. The memory 20 is communicatively connected to the processor 10 and stores instructions executable by the processor 10; when executed by the processor 10, the instructions cause the processor 10 to execute the method for generating an interview report of the above embodiments. For the method of generating an interview report itself, reference may be made to the description of the above embodiments.
Specifically, the terminal device 100 may include, but is not limited to, a mobile terminal such as a smartphone, a PC, or a tablet computer. A trigger unit may be provided in the terminal device 100, for example a human-computer interaction interface carrying an interview-report-generation trigger unit, or a touch key or mechanical key. When a trigger instruction is received, the method for generating an interview report described above is executed autonomously, realizing one-key operation that is simple and efficient. With fewer operating steps and less time, an autonomous interview research process can be constructed, reducing interview cost and shortening the time the interview consumes.
The overall process of generating an interview report according to the embodiments of the present application has been described above. Below, the method for generating an interview report is further described by taking as an example the analysis of the interview corpus by neural network models to obtain the interviewee's emotion information and demand information; that is, the sentiment analysis algorithm and the demand extraction algorithm are described in more detail, and the clustering process is further described.
In the embodiments of the present application, the interviewee's performance information may include one or both of demand information and emotion information. In some embodiments, for demand information, after the original corpus information is preprocessed, the interview corpus is input into a first neural network model, which separates the sentences in the interview corpus that reflect the interviewee's demands from the sentences that do not, and the interviewee's demand information is extracted. For example, an SVM (Support Vector Machine) classifier can be used to realize the two-class analysis of demand versus non-demand. Further, in some embodiments, the interviewee's non-demand sentences are treated as candidate neutral sentences: the non-demand sentences are input into a second neural network model, which extracts from them the sentences reflecting the interviewee's polar emotions; the polar-emotion sentences are then input into a third neural network model to obtain the positive-emotion sentences and negative-emotion sentences among them. For example, two cascaded TextCNN classifiers can be used, the first to realize the two-way classification of polar-emotion versus neutral-emotion sentences and the second to further classify the polar-emotion sentences; the two TextCNN classifiers are trained on different corpora.
The demand extraction process for intelligent voice data according to the embodiments of the present application is described in detail below with reference to Fig. 5 to Fig. 7.
Fig. 5 shows a flowchart of the demand extraction process for intelligent voice data according to an embodiment of the present application.
In step 101, voice data is obtained. Specifically, in step 101 a voice signal is acquired by a recording device such as a microphone or microphone array, a recorder, or a recording pen.
In step 102, the acquired voice data is preprocessed to obtain text data for analysis. Preprocessing means loading the voice data into memory and, as needed, adding, deleting, or modifying parts of the text. Preprocessing includes: recognition, which means recognizing the voice data as text to form text data; splitting, which means splitting the long sentences separated by full stops in the text data into short sentences according to the punctuation marks that indicate pauses within them; and purification, which means removing from the text data invalid content of the original voice data that is unrelated to the interview content. In some embodiments, preprocessing includes comment cleaning, word segmentation, part-of-speech tagging, and syntactic dependency parsing.
In some embodiments, speech recognition converts the vocabulary content of a person's speech into computer-readable text. For example, the speech waveform can be cut into segments of a fixed duration (for example, 0.05 seconds), each segment being called a frame, yielding a certain number of frames (for example, 20) per unit of time (for example, 1 second). From each frame, information reflecting the essential characteristics of the speech is extracted (removing the redundancy in the speech signal that is useless for recognition while also reducing dimensionality); the features extracted from each frame's waveform form that frame's feature vector. A phoneme is the smallest phonetic unit divided according to the natural properties of speech. A state is a phonetic unit finer than a phoneme; a phoneme is usually divided into three states. Speech recognition is realized by recognizing frames as states, combining states into phonemes, and combining phonemes into words.
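The framing step described above, cutting the waveform into fixed-length frames and extracting one feature vector per frame, can be illustrated with a minimal sketch. The 0.05-second frame length follows the example in the text; the toy sample rate and the use of mean energy as the "feature" are illustrative assumptions (a real recognizer would extract e.g. MFCCs).

```python
def frame_signal(samples, sample_rate=1600, frame_seconds=0.05):
    """Cut a waveform into fixed-length frames (0.05 s each -> 20 frames per second)."""
    frame_len = int(sample_rate * frame_seconds)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_features(frames):
    # Mean energy per frame stands in for a real acoustic feature vector.
    return [sum(x * x for x in f) / len(f) for f in frames]

signal = [((i % 7) - 3) / 3.0 for i in range(1600)]  # one second of toy audio
frames = frame_signal(signal)
print(len(frames), len(frame_features(frames)))
```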
In some embodiments, speech recognition may include real-time speech recognition and offline speech recognition. During the interview, speech can be converted to text in real time for the interviewer to check; after the interview, the entire interview process can be reviewed through playback.
Specifically, a third-party speech-to-text development platform can be used: by downloading the platform's SDK (Software Development Kit), the function of converting the interview corpus into text data is completed on the basis of that kit. In the embodiments of the present application, real-time and offline speech recognition can run in parallel: the speech-to-text results can be checked in real time during the interview, and the entire interview process can be reviewed through playback afterwards.
In some embodiments, before speech recognition, a proprietary domain dictionary can be established for terms commonly used by the interviewer or the interviewee. The user stores uncommon specialized vocabulary in this dictionary and uploads the dictionary to the speech recognition module, which can improve the accuracy of the speech recognition results. For example, the word "trend" originally means the water flow caused by tides; in sociology it refers to fashion trends, while in the power industry it refers to the distribution of voltage, current, and power across the grid. Likewise, uncommon specialized vocabulary used in the interview field can be uploaded to the dictionary of the speech-to-text platform, so that these terms are recognized more reliably during text conversion. In short, adopting the proprietary domain dictionary for the field of the interview topic can further improve recognition accuracy.
In some embodiments, text data corresponding to the voice data is obtained through the speech recognition process. In some embodiments, the format of the text data may include (but is not limited to) Microsoft (MICROSOFT) OFFICE documents such as TXT, WORD, or EXCEL, documents in WPS format, or files in other formats used for word processing.
In some embodiments, splitting means splitting the long sentences separated by full stops in the text data into short sentences according to the punctuation marks that indicate pauses, so that each short sentence can be analyzed independently. In some embodiments, the method splits long sentences on ",", "、", or ";". A long sentence might be, for example: "To draw an analogy, I feel a smart speaker should be simple and elegant, its color should not be gaudy, and it should be simple and convenient to use." After splitting, the short sentences are: 1. to draw an analogy; 2. I feel a smart speaker should be simple and elegant; 3. its color should not be gaudy; 4. it should be simple and convenient to use. Compared with long sentences, using short sentences as the objects of analysis simplifies system operation, reduces the amount of computation, and improves efficiency.
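The splitting rule above, breaking a full-stop-delimited long sentence into short clauses on pause punctuation, can be sketched with a regular expression. Handling both Chinese and ASCII punctuation in one character class is an assumption for illustration.

```python
import re

def split_long_sentence(long_sentence):
    """Split a long sentence into short clauses on comma / enumeration-comma / semicolon."""
    clauses = re.split(r"[,;，、；]", long_sentence)
    return [c.strip() for c in clauses if c.strip()]

long_sentence = ("For example, I feel a smart speaker should be simple and elegant, "
                 "its color should not be gaudy, and it should be easy to use")
for i, clause in enumerate(split_long_sentence(long_sentence), 1):
    print(i, clause)
```

Each resulting short clause then becomes an independent unit of analysis, which is what keeps the downstream computation small.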
In some embodiments, purification means removing from the text data invalid content of the original voice data that is unrelated to the interview content. In some embodiments, because interviews are colloquial, modal particles and interjections are inevitably present in the text data. In some embodiments, garbled characters may also be produced during the speech-to-text conversion. These modal particles and garbled characters are unrelated to the interview content and are useless for demand extraction.
Specifically, to handle the colloquial character of interviews, lexical analysis is performed on the interview text with a word segmentation tool to determine the word content and part of speech in each short sentence and to remove the modal particles, so as to identify the more meaningful vocabulary in each sentence and facilitate the analysis of the interview. Many segmentation tools exist, such as Pangu, Yaha, Jieba, and Tsinghua's THULAC. Taking a lexical-analysis interface that calls jieba as an example, the original sentence after clause splitting is: "Uh, welcome to participate in this user-experience interview for our smart speaker." After jieba performs lexical analysis and the modal particles are removed, the sentence becomes: "Welcome to participate in this user-experience interview for our smart speaker." In this way, short-sentence expressions with substantive meaning can be obtained.
In step 103, word segmentation is performed on each short sentence in the preprocessed text data. Word segmentation divides a continuous character string into individual words according to certain logic. In some embodiments, segmentation can be carried out using the maximum matching method, the reverse maximum matching method, the bidirectional matching method, the best matching method, the association-backtracking method, and so on. In some embodiments, the user can choose exact segmentation, or choose to list all words that may occur. After segmentation, each text becomes a text corpus composed of words separated by spaces.
In step 104, based on a feature lexicon, the sentence vector corresponding to each short sentence in the segmented text data is obtained. In some embodiments, the feature lexicon is a two-dimensional matrix composed of feature words and feature values. A feature word is a word in the corpus that is highly likely to cause the target corpus to be classified as a demand. A feature value is a mathematical expression of the possibility that the feature word is classified as a demand. The corpus can be a database containing a complete corpus, such as the 1998 People's Daily corpus, or an existing corpus of the specific domain. A sentence vector corresponds to one short sentence in the text data and is a matrix composed of the lookup results, against the corpus and the feature lexicon, of the words in the short sentence.
In some embodiments, each word in a segmented short sentence is looked up in the corpus and in the feature lexicon. If the word has never appeared in the corpus, the lookup result is 0; if the word has appeared in the corpus but not in the feature lexicon, the lookup result is 1; if the word appears in the feature lexicon, the lookup result is 2. These lookup results form the sentence vector corresponding to the short sentence. Since short sentences for demand analysis have specific characteristics and a large granularity (a short sentence is either a statement of demand or not, and few short sentences resist classification), practical testing shows that this way of generating sentence vectors yields good classification results.
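The lookup scheme above (0 = never in the corpus, 1 = in the corpus but not in the feature lexicon, 2 = in the feature lexicon) can be written out directly. The toy corpus and lexicon contents below are assumptions for illustration only.

```python
def sentence_vector(words, corpus, feature_lexicon):
    """Map each word of a short sentence to 0/1/2 by corpus and lexicon lookup."""
    vec = []
    for w in words:
        if w in feature_lexicon:
            vec.append(2)   # word is a feature word
        elif w in corpus:
            vec.append(1)   # known word, but not a feature word
        else:
            vec.append(0)   # out-of-corpus word
    return vec

corpus = {"increase", "the", "resolution", "speaker"}
feature_lexicon = {"increase", "resolution"}
print(sentence_vector(["increase", "the", "resolution", "xzqy"], corpus, feature_lexicon))
```

The resulting integer vector is what step 105 feeds into the demand judgment model.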
In step 105, the sentence vector is input into the demand judgment model. In some embodiments, the demand judgment model is configured to output a judgment result according to the input semantic-unit vector (for example, the sentence vector). In some embodiments, the judgment result may be a value indicating whether the sentence is a demand sentence.
In step 106, the class to which the short sentence is assigned is determined according to the output result of the judgment model. In some embodiments, the output result can be 1 or 0: when the output is 1, the short sentence is classified as a demand sentence; when the output is 0, the short sentence is judged to be a non-demand sentence. In some embodiments, the output result may be probabilities: for example, the possibility that the short sentence is a demand statement may be 0.7 and the possibility that it is a non-demand statement may be 0.3, in which case the short sentence is finally judged to be a demand statement.
In some embodiments, all sentences expressing demands are clustered. For the detailed steps of clustering, reference may be made to the description of the embodiments below.
In some embodiments, polarity sentiment analysis is carried out on the part judged to be non-demand statements. The granularity of polarity sentiment analysis is finer than that of demand analysis, and an SVM cannot reach an accuracy of 90% or more; polarity sentiment analysis therefore uses a convolutional neural network (CNN) classifier. In some embodiments, usability analysis can also be carried out on the part judged to be non-demand statements.
In some embodiments, a demand refers to an expectation about something expressed by the interviewee during the interview. For example, when the interview subject is a smart speaker, the extracted demands can be expressed as: faster response speed; suggest optimizing the appearance design; the appearance should be fashionable and elegant; the color should not be gaudy; the cabinet lines should be smooth; and so on. After clustering, the themes of this interview can be derived, for example: faster response speed; suggest optimizing the appearance design. In some embodiments, during the interview the user may unintentionally provide information partially related or completely unrelated to the current interview subject; for example, the user may evaluate a competitor of the interview subject, or reveal demand information about aspects unrelated to the field of the interview subject.
In some embodiments, polar emotion refers to the interviewee's sentiment orientation, which can be divided, for example, into positive, negative, and neutral. Specifically, positive emotion expresses the advantages of the product and content the interviewee likes or is satisfied with; negative emotion expresses the shortcomings and usability problems of the product and content the interviewee dislikes or is dissatisfied with; neutral emotion expresses content stated from a neutral standpoint. For example, when the interview subject is a smart speaker, polar emotions may include: "it can bring me some convenience to a certain extent" (positive emotion); "I actually would not do that with it" (negative emotion); "as for the first aspect" (neutral emotion). Analyzing polar emotion on the non-demand statements rather than on the entire interview content can improve the efficiency and accuracy of polarity sentiment analysis. In some embodiments, the sentiment orientation may be directed at the product itself, or at other aspects beyond the product. For example, during the interview the user may unintentionally provide information partially related or completely unrelated to the current interview subject; for example, the user may evaluate a competitor of the interview subject, or reveal sentiment-orientation information about aspects unrelated to the field of the interview subject.
The construction and training process of the demand judgment model and the feature lexicon is described in detail below with reference to Fig. 6.
Fig. 6 shows a flowchart of the construction and/or training process of the demand judgment model and the feature lexicon according to an embodiment of this specification. In some embodiments, the construction and/or training process is carried out manually. In some embodiments, the construction and/or training process is completed by a computer program.
In step 201, voice data is obtained. In step 202, the voice data is preprocessed. Steps 201 and 202 are similar to steps 101 and 102 above. In step 203, the text data is feature-annotated using an expert annotation database. For example, the annotated text data may take the form "X: sentence", where X can be 0 (indicating non-demand) or 1 (indicating demand). In step 204, the annotated text data is segmented; step 204 is similar to step 103 above.
In step 205, the segmented text data is input into a classifier for training the demand judgment model. In some embodiments, the classifier is a support vector machine (Support Vector Machine, SVM) classifier. The SVM is a classic two-class classification model; its classification effect is significant for features that divide clearly, and it works well for demand analysis, where the granularity is relatively large. The basic model of the SVM classifier is a linear classifier defined on the feature space that maximizes the margin between the two classes. An SVM classifier can also include a kernel function, which has the effect of converting low-dimensional data into high-dimensional data. By introducing a kernel function, a linearly inseparable problem can be converted into a separable one, which makes the SVM, essentially a linear classifier, suitable for linearly inseparable data.
In some embodiments, the method selects a linear kernel function, for example k(x1, x2) = x1ᵀx2. In other embodiments, other kernel functions can be selected according to the size of the text data and other factors, such as nonlinear kernels like the polynomial kernel or the radial basis function (RBF) kernel.
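The linear kernel k(x1, x2) = x1ᵀx2 named above, and the RBF kernel named as a nonlinear alternative, can be written out as follows; the bandwidth value gamma is an illustrative assumption.

```python
import math

def linear_kernel(x1, x2):
    """k(x1, x2) = x1^T x2 (plain dot product)."""
    return sum(a * b for a, b in zip(x1, x2))

def rbf_kernel(x1, x2, gamma=0.5):
    """A nonlinear alternative: exp(-gamma * ||x1 - x2||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

u, v = [2, 1, 2, 0], [2, 1, 0, 0]  # e.g. two 0/1/2 sentence vectors
print(linear_kernel(u, v), round(rbf_kernel(u, v), 4))
```

The RBF kernel compares vectors by distance rather than by inner product, which is how the kernel trick makes an otherwise linear classifier handle linearly inseparable data.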
In some embodiments, through a series of computations such as Lagrangian duality operations, the method finally obtains the demand judgment model and a series of feature words. The demand judgment model is a set of computer-executable algorithms whose input is a sentence vector and whose output is the class to which the sentence belongs.
In some embodiments, the demand judgment model can judge whether a sentence is a demand on the basis of dependency parsing. In some embodiments, the dependency analysis may include one or more rules. For example, a sentence that satisfies one or more of the rules can be judged to be a statement of demand, with an output result of 1; otherwise the output result is 0. In some embodiments, each rule can be assigned a certain weight, and all rules are combined to compute the possibility, or a reference value, that the sentence is a demand sentence. In some embodiments, by applying these rules, the objects of opinion keywords can be computed to obtain a list of demand-object values, and the number of sentiment-orientation degree adverbs can be counted to obtain a list of demand-degree values; finally, an improvement-demand list is generated by combining the demand-object value list and the demand-degree value list.
In some embodiments, feature words are extracted and the demand judgment model is constructed by identifying the dependency relations of demand statements. Here, a feature word is the object to which an expressed opinion refers, generally a noun, gerund, or verb; an opinion word is the expressed opinion itself, generally an adjective, adverb, or verb. In some embodiments, the dependency relations between words may include the subject-verb relation (Subject-Verb, SBV), the verb-object relation (Verb-Object, VOB), the verb-complement relation (Complement, CMP), the head relation (Head, HED), or the coordinate relation (Coordinate, COO). In addition, a word may have modifiers; the relation between a head word and its modifier may include the attribute relation (Attribute, ATT) or the adverbial relation (Adverbial, ADV).
For example, when a demand short sentence satisfies the SBV, CMP, or ATT relation, the noun (or gerund or verb) in the short sentence is the feature word and the corresponding adjective is the opinion word. For example, in the demand statement "the workflow can be managed", the dependency relation of the short sentence is the subject-verb relation, so "workflow" is the feature word and "can be managed" is the opinion word. In the demand statement "the document management section should be easier to use", the dependency relation is the verb-complement relation, so "document management section" is the feature word and "easier to use" is the opinion word. For example, when two adjacent words in a short sentence satisfy the ADV relation, the two words are respectively a modifier and an opinion word; for instance, the phrase "easier to use" satisfies the ADV relation, so "easy to use" is identified as the opinion word and "more" as the modifier. For example, when two adjacent nouns (or a verb plus a noun) in a short sentence satisfy the ATT relation, the two words constitute a nominal phrase, respectively the modifier and the feature word; for instance, in the nominal phrase "document management section", "management section" is the feature word and "document" is the modifier. In some embodiments, the more often a feature word or keyword is repeated, the higher its degree of attention; when the emotional expression is derogatory, the corresponding product demand is higher.
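The relation-based extraction rules above (SBV/CMP relate a feature word to an opinion word, ADV relates a modifier to an opinion word, ATT relates a modifier to a feature word) can be sketched as a lookup over already-parsed triples. In practice the (head, dependent, relation) triples would come from a dependency parser; the hand-written English triples below are assumptions standing in for parser output.

```python
def extract_pairs(triples):
    """Map dependency triples to feature-word / opinion-word / modifier roles."""
    results = []
    for head, dep, rel in triples:
        if rel in ("SBV", "CMP"):
            # the noun is the feature word, the related predicate the opinion word
            results.append({"feature": dep, "opinion": head, "rule": rel})
        elif rel == "ADV":
            results.append({"modifier": dep, "opinion": head, "rule": rel})
        elif rel == "ATT":
            results.append({"modifier": dep, "feature": head, "rule": rel})
    return results

triples = [
    ("manageable", "workflow", "SBV"),          # "the workflow can be managed"
    ("easy-to-use", "more", "ADV"),             # "easier to use"
    ("management-section", "document", "ATT"),  # "document management section"
]
for r in extract_pairs(triples):
    print(r)
```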
In some embodiments, feature words are extracted and the demand model is constructed by identifying the words in a demand short sentence that directly express a user demand. Specifically, the words (verbs) indicating "increase" or "decrease" and the feature words (nouns) indicating the object are identified in the demand statement. Words indicating "increase" include: grow, supplement, expand, fill in, promote, add, extend, increase, append, reinforce, enlarge, attach, replenish, augment, and so on. Words indicating "decrease" include: reduce, omit, weaken, delete, diminish, shrink, mitigate, lower, eliminate, subtract, cut, contract, cut down, remove, decrease, and so on. For example, for the demand short sentence "increase the resolution a bit", "increase" and "resolution" are identified, where the verb is "increase" and the feature word is "resolution". For the demand short sentence "delete some unnecessary processes", "delete" and "process" are identified, where the verb is "delete" and the feature word is "process".
In some embodiments, feature words are extracted and the demand model is constructed by identifying the words in a demand short sentence that indirectly express a user demand. This embodiment judges demand from the user's repeated emphasis, for example by identifying degree adverbs, frequency adverbs, or punctuation marks. In some embodiments, feature words can be extracted by identifying the structure of a word (noun) plus an emphasizing word (adverb) repeated n times (n a positive integer) plus an opinion word. Words expressing emphasis include: certainly, again and again, very, necessarily, too, soon, repeatedly, extremely, all, often, exceptionally, more, perhaps, just, obviously, slightly, simply, only, temporarily, much, deliberately, unexpectedly, never, strongly, and so on. For example, in the short sentence "this interface is too too too too too ugly!", the emphasizing word "too" is identified with a repetition count of n = 5, so the feature word is identified as "interface" and the opinion word as "ugly". In some embodiments, feature words can also be extracted by identifying the structure of a word (noun) plus a punctuation mark repeated n times (n a positive integer) plus an opinion word (verb or adjective). The punctuation mark may include "!", ",", ".", "...", "?", "*", and so on. For example, in the demand short sentence "dark green..... is really plain.", the repeated punctuation mark "." is identified with a repetition count of n = 5, and the feature word is identified as "dark green" and the opinion word as "plain".
Through the above processing, a series of feature words, opinion words, and the demand judgment model are finally obtained. In some embodiments, the frequency of the feature words can be counted: the more often a feature word is repeated, the higher the attention it receives, and when the sentiment polarity is derogatory, the demand is higher.
In some embodiments, the frequency of the opinion words can also be counted, and an opinion value given. The opinion value indicates the sentiment polarity of the opinion, with values in the interval [-1, 1]: negative numbers indicate negative emotion, positive numbers indicate positive emotion, and the larger the absolute value, the more pronounced the sentiment polarity.
In step 207, the method constructs the feature lexicon from the series of feature words. In some embodiments, the chi-squared test (Chi-Squared Test) is used to construct the feature lexicon. The chi-squared test is a common hypothesis-testing method based on the X² distribution; its null hypothesis H0 is that there is no difference between the observed frequency and the expected frequency. The procedure is: first assume that H0 holds, and on this premise compute the X² value, which indicates the degree of deviation between the observed and theoretical values. From the X² distribution and the degrees of freedom, the probability P of obtaining the current statistic or a more extreme one under H0 can be determined. If the P value is very small, the deviation between the observed and theoretical values is too large and the null hypothesis should be rejected: there is a significant difference between the two quantities being compared. Otherwise the null hypothesis cannot be rejected; that is, it cannot be concluded that the two quantities differ. In natural language processing, the chi-squared test is often used for feature extraction.
In some embodiments, the degree to which each feature word influences the demand judgment is computed, that is, the possibility that the feature word causes a sentence to be decided to be a demand; this possibility is called the feature value. If the feature value of a feature word is below a threshold, the word is discarded; otherwise the word is retained. The feature values are sorted, and the feature words corresponding to the highest-ranked feature values are added to the feature lexicon, thereby constituting the feature lexicon.
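The feature-value computation above — score each candidate feature word with a chi-squared statistic against the demand/non-demand labels, discard words below a threshold, and keep the top-ranked ones — can be sketched as follows. The 2x2 contingency formulation and the toy counts are illustrative assumptions, not data from this application.

```python
def chi_squared(n11, n10, n01, n00):
    """X^2 for a 2x2 table: word present/absent vs. demand/non-demand."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

def build_feature_lexicon(word_counts, threshold=1.0, top_k=2):
    """Keep words whose chi-squared feature value passes the threshold, best first."""
    scored = {w: chi_squared(*c) for w, c in word_counts.items()}
    kept = sorted(((w, s) for w, s in scored.items() if s >= threshold),
                  key=lambda x: -x[1])
    return dict(kept[:top_k])

# counts: (in demand sentences, in non-demand sentences,
#          demand sentences without the word, non-demand sentences without it)
word_counts = {
    "increase": (30, 2, 20, 48),
    "weather":  (5, 6, 45, 44),
    "the":      (40, 40, 10, 10),
}
print(build_feature_lexicon(word_counts))
```

Here "increase" scores high because it is strongly associated with the demand class, while "weather" and "the" fall below the threshold, matching the discard-then-rank construction described in the text.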
Fig. 7 shows a schematic diagram of the modules of the intelligent voice data demand extraction system according to an embodiment of this specification. Referring to Fig. 7, the system includes a recording module 301, a speech recognition module 302, a corpus preprocessing module 303, a word segmentation module 304, and a demand judgment module 305. The recording module 301 obtains voice data. The speech recognition module 302 preprocesses the voice data to obtain text data. The corpus preprocessing module 303 performs word segmentation on each sentence in the text data. The word segmentation module 304 compares the segmented text data with the feature lexicon to obtain the sentence vectors corresponding to the text data. The demand judgment module 305 judges, from an input sentence vector, whether the corresponding short sentence is a demand statement or a non-demand statement. For the functions and implementation of each module of the system, reference may be made to the implementation of the corresponding steps in the foregoing method embodiments; for brevity, details are omitted here.
The usability judgment process for intelligent voice data according to the embodiments of the present application is described in detail below with reference to Fig. 8 to Fig. 10.
Fig. 8 shows a flowchart of the usability judgment process for intelligent voice data according to an embodiment of the present application.
In step 501, voice data is obtained. Specifically, in step 501 a voice signal is acquired by a recording device such as a microphone or microphone array, a recorder, or a recording pen.
In step 502, the acquired voice data is preprocessed to obtain text data for analysis. Preprocessing means loading the voice data into memory and, as needed, adding, deleting, or modifying parts of the text. Preprocessing includes: recognition, which means recognizing the voice data as text to form text data; splitting, which means splitting the long sentences separated by full stops in the text data into short sentences according to the punctuation marks that indicate pauses within them; and purification, which means removing from the text data invalid content of the original voice data that is unrelated to the interview content. In some embodiments, preprocessing may include the filtering of irrelevant symbols and the filtering of non-core components.
In step 503, word segmentation is performed on each short sentence in the preprocessed text data. In some embodiments, the part of speech of each word may also be labeled.
In step 504, based on a feature lexicon, the sentence vector corresponding to each short sentence in the segmented text data is obtained. In some embodiments, the feature lexicon is a two-dimensional matrix composed of feature words and feature values. A feature word is a word in the corpus that is highly likely to mark the target corpus as expressing availability. A feature value is a mathematical expression of the possibility that the feature word is judged to indicate availability.
In some embodiments, each word in a segmented short sentence is looked up in the corpus and in the feature lexicon respectively. If the word has never appeared in the corpus, the lookup result is 0; if the word has appeared in the corpus but not in the feature lexicon, the lookup result is 1; if the word appears in the feature lexicon, the lookup result is 2. The sentence vector corresponding to the short sentence is formed from these lookup results. In some embodiments, the availability judgment model may include multiple rules; a sentence satisfying one or more of the rules may be judged as an availability sentence, with an output result of 1, and otherwise may be judged as a non-availability sentence, with an output result of 0.
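As an illustrative sketch (not the patented implementation itself), the three-valued lookup described above can be written as follows; here `corpus_vocab` and `feature_lexicon` are assumed to be plain sets of words, a simplification of the two-dimensional feature lexicon introduced in step 504.

```python
def sentence_vector(tokens, corpus_vocab, feature_lexicon):
    """Map each segmented word of a short sentence to a lookup result:
    2 if it appears in the feature lexicon, 1 if it appears in the corpus
    but not in the feature lexicon, 0 if it never appeared in the corpus."""
    vec = []
    for token in tokens:
        if token in feature_lexicon:
            vec.append(2)
        elif token in corpus_vocab:
            vec.append(1)
        else:
            vec.append(0)
    return vec
```

For example, with corpus vocabulary {"stability", "improve"} and feature lexicon {"improve"}, the segmented short sentence ["stability", "improve", "foo"] yields the sentence vector [1, 2, 0].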
In step 505, the sentence vector is input into the availability judgment model. In some embodiments, the availability judgment model is configured to output a judgment result according to an input semantic-unit vector (for example, a sentence vector). In some embodiments, the judgment result may be a value indicating whether the sentence is an availability sentence.
In step 506, the category into which the short sentence is classified is determined according to the output result of the availability judgment model. In some embodiments, the output result may be 1 or 0. When the output result is 1, the short sentence is classified as an availability sentence; when the output result is 0, the short sentence is judged as a non-availability sentence. In some embodiments, the output result may take the following form: the possibility that the short sentence is classified as an availability sentence is, for example, 0.7, and the possibility that it is classified as a non-availability sentence is, for example, 0.3; the short sentence is therefore finally determined to be an availability sentence.
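When the model outputs class possibilities rather than a hard 0/1 value, the final decision described above amounts to taking the most probable category. A minimal sketch (the category names are illustrative):

```python
def judge_short_sentence(possibilities):
    """Pick the category with the highest model-assigned possibility,
    e.g. {"availability": 0.7, "non-availability": 0.3} -> "availability"."""
    return max(possibilities, key=possibilities.get)
```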
In some embodiments, all the sentences expressing availability are clustered. For the detailed steps of the clustering process, reference may be made to the description of the embodiments below.
The construction and/or update process of the availability judgment model and the feature lexicon is described in detail below in connection with Fig. 9.
Fig. 9 shows a flowchart of the construction and/or update process of the availability judgment model and the feature lexicon according to an embodiment of this specification. In some embodiments, the construction and/or update process is performed manually. In some embodiments, it is completed by a computer program.
In step 601, voice data is obtained. In step 602, the voice data is preprocessed. Steps 601 and 602 are similar to steps 501 and 502 described above. In step 603, feature labeling is performed on the text data using an expert labeling database. For example, the labeled text data may take the form "X: sentence", where X may be 0 (indicating a non-availability sentence) or 1 (indicating an availability sentence). In step 604, the labeled text data is segmented into words. Step 604 is similar to step 503 described above.
In step 605, the segmented text data is input into a classifier to train the availability judgment model. In some embodiments, the classifier is a support vector machine (SVM) classifier. The SVM classifier is a classic binary classification model; its classification effect is significant for data with distinct features and coarse granularity, so it works well for availability analysis. The basic model of the SVM classifier is a linear classifier defined in a feature space that maximizes the margin between the two classes. The SVM classifier may also include a kernel function, which converts low-dimensional data into high-dimensional data. By introducing a kernel function, a linearly inseparable problem can be converted into a separable one, which allows the SVM classifier, essentially a linear classifier, to be applied to linearly inseparable data.
In some embodiments, the method selects a linear kernel function, for example, k(x1, x2) = x1^T x2. In some embodiments, other kernel functions may be selected according to the size of the text data and other factors, such as non-linear kernel functions like the polynomial kernel function or the radial basis function (RBF) kernel.
In some embodiments, through a series of computations such as Lagrangian duality operations, the method finally obtains the availability judgment model and a series of feature words. The availability judgment model is a set of computer-executable algorithms whose input is a sentence vector and whose output is the category to which that sentence belongs.
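The linear and non-linear kernels mentioned above can be sketched as plain functions. These are the standard textbook definitions rather than the exact functions used by the embodiment, and the `gamma` parameter of the RBF kernel is an assumed hyperparameter.

```python
import math

def linear_kernel(x1, x2):
    # k(x1, x2) = x1^T x2
    return sum(a * b for a, b in zip(x1, x2))

def rbf_kernel(x1, x2, gamma=1.0):
    # Radial basis function kernel: exp(-gamma * ||x1 - x2||^2),
    # one possible non-linear choice for linearly inseparable data.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)
```

For instance, linear_kernel([1, 2], [3, 4]) evaluates to 11, while the RBF kernel of any vector with itself is 1.0.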
In some embodiments, the availability judgment model is constructed by identifying the dependency relations of sentences, that is, by performing dependency parsing on them. Specifically, the components of a sentence can be divided into subject, predicate, object, attribute, adverbial, complement, and so on. The main relations between components include the subject-verb relation (SBV), the verb-object relation (VOB), the attribute relation (ATT), the adverbial relation (ADV), the verb-complement relation (CMP) and the coordinate relation (COO). Dependency parsing (DP) reveals the syntactic structure of a sentence through the dependency relations between its linguistic units; that is, it identifies the grammatical components in the sentence and analyzes the relations between them. Specifically, dependency parsing identifies grammatical components such as the subject, predicate, object, attribute, adverbial and complement in a sentence, and analyzes the relations between these components.
The most critical step of availability discourse analysis is expressing the opinion of the evaluation holder in a structured way. In general, an <evaluation object, evaluation phrase> pair can be regarded as an evaluation unit. The evaluation object may be a nominal phrase, a verbal phrase or a clause-type phrase, and mainly occupies the subject position, the object position, or the verb position of a complement structure. The evaluation phrase mainly occupies the predicate position, the verb position of a verb-object structure, or the complement position. The evaluation phrase appears as a group of consecutive phrases; it may be composed of degree adverbs, negative adverbs and evaluative words, or it may be a nominal phrase, an adjectival phrase, a verbal phrase, or a clause-type phrase composed of the foregoing three. As long as the subject-verb, verb-object and verb-complement structures in a sentence are recalled using the corresponding rules, the usability evaluation units can be extracted.
In some embodiments, consider the case where an SBV relation exists in the sentence, the part of speech of the modifier in the dependency pair is a noun, an abbreviation or a foreign word, and the part of speech of the core word is a verb. If only an SBV relation exists in the sentence, the evaluation object and the evaluation phrase occupy the subject and predicate positions respectively, so <modifier of the SBV, core word of the SBV> is extracted as a usability evaluation unit, for example, <stability, improve>. If SBV and VOB relations exist in the sentence, where the core word of the SBV is the core word of the VOB, then <modifier of the SBV, core word of the VOB plus modifier of the VOB> is extracted as a usability evaluation unit, for example, <evaluation frame, has no feature>. If SBV and CMP relations exist in the sentence, where the core word of the SBV is the core word of the CMP, then <modifier of the SBV, core word of the CMP plus modifier of the CMP> is extracted, for example, <page, loads slowly>.
In some embodiments, consider the case where an SBV relation exists in the sentence, the part of speech of the modifier in the dependency pair is a noun, an abbreviation or a foreign word, and the core word is an adjective, a noun-modifying word or an idiom. If only an SBV relation exists in the sentence, the evaluation object and the evaluation phrase occupy the subject and predicate positions respectively, so <modifier of the SBV, core word of the SBV> is extracted, for example, <interface, good-looking>. If SBV and COO relations exist in the sentence, the core word of the SBV is the core word of the COO, and the part of speech of the modifier in the COO pair is an adjective, a noun-modifying word or an idiom, then <modifier of the SBV, core word of the SBV plus modifier of the COO> is extracted, for example, <page turning, slow and laggy>. If only a VOB relation exists in the sentence, the part of speech of the modifier in the pair is a noun, an abbreviation or a foreign word, and the part of speech of the core word is a verb, then <modifier of the VOB, core word of the VOB> is extracted, for example, <evaluation frame, does not exist>.
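A highly simplified sketch of the rule-based extraction above, assuming the dependency parse is given as (relation, modifier, core word) triples. It covers only the SBV, SBV+VOB and SBV+CMP cases and ignores the part-of-speech conditions, so it illustrates the pattern rather than reproducing the full rule set:

```python
def extract_evaluation_units(deps):
    """deps: list of (relation, modifier, core_word) triples.
    Returns (evaluation object, evaluation phrase) pairs."""
    sbv = [(m, h) for rel, m, h in deps if rel == "SBV"]
    vob = [(m, h) for rel, m, h in deps if rel == "VOB"]
    cmp_rels = [(m, h) for rel, m, h in deps if rel == "CMP"]
    units = []
    for subject, verb in sbv:
        matched = False
        for obj, head in vob:
            if head == verb:  # SBV and VOB share the same core word
                units.append((subject, verb + " " + obj))
                matched = True
        for complement, head in cmp_rels:
            if head == verb:  # SBV and CMP share the same core word
                units.append((subject, verb + " " + complement))
                matched = True
        if not matched:       # only an SBV relation is present
            units.append((subject, verb))
    return units
```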
Through the above processing, a series of evaluation objects and evaluation phrases, usability evaluation units, and the availability judgment model can finally be obtained.
In step 607, the method constructs the feature lexicon from the series of feature words. In some embodiments, the feature lexicon is constructed using the chi-squared test (Chi-Squared Test). The chi-squared test is a common hypothesis-testing method based on the χ² distribution. Its null hypothesis H0 is that there is no difference between the observed frequency and the expected frequency. The specific process is: first assume that H0 holds, and calculate the χ² value on that premise; this value indicates the degree of deviation between the observed values and the theoretical values. According to the χ² distribution and the degrees of freedom, the probability P of obtaining the current statistic, or a more extreme one, under H0 can be determined. If the P value is very small, the deviation between the observed and theoretical values is too large, the null hypothesis should be rejected, and there is a significant difference between the two quantities being compared, i.e. they are not independent; otherwise the null hypothesis cannot be rejected, that is, it cannot be concluded that the two quantities are associated. The chi-squared test is often used for feature extraction in natural language processing.
In some embodiments, the degree to which each feature word influences the availability judgment is calculated, that is, the possibility that the feature word can be judged to indicate availability; this possibility is referred to as the feature value. If the feature value of a feature word is below a threshold, the word is discarded; otherwise it is retained. By sorting the feature values, the feature words with the highest feature values are selected and added to the feature lexicon, thereby constituting the feature lexicon.
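The chi-squared scoring and the threshold/top-k selection described above can be sketched as follows; the 2x2-contingency-table form of the statistic and the selection interface are assumptions made for illustration:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 contingency table:
    a = word present & availability class, b = present & other class,
    c = absent & availability class,      d = absent & other class."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator if denominator else 0.0

def select_features(scores, threshold, top_k):
    """Discard words whose feature value is below the threshold, then keep
    the top_k words with the highest feature values."""
    kept = [(word, score) for word, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in kept[:top_k]]
```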
Fig. 10 shows a schematic diagram of the comprising modules of the intelligent voice data availability extraction system according to an embodiment of this specification. With reference to Fig. 10, the system includes a recording module 701, a speech recognition module 702, a corpus preprocessing module 703, a word segmentation module 704 and an availability judgment module 705. The recording module 701 is configured to obtain voice data. The speech recognition module 702 is configured to preprocess the voice data to obtain text data. The corpus preprocessing module 703 is configured to perform word segmentation on each sentence in the text data. The word segmentation module 704 is configured to compare the segmented text data with the feature lexicon to obtain the sentence vectors corresponding to the text data. The availability judgment module 705 is configured to judge, from an input sentence vector, whether the short sentence corresponding to the vector is an availability sentence or a non-availability sentence. For the functions and implementation of each module in the system, reference may be made to the implementation of the corresponding steps in the foregoing method embodiments. For brevity, details are not repeated here.
The analysis of the interview corpus by neural network models to obtain demand information and availability analysis, as well as further sentiment analysis, has been described above. The analysis of the interview corpus by neural network models to obtain emotion information is further described below.
In some embodiments, for the extraction of emotion information, the interview corpus may be input into a second neural network model to extract the interviewee's sentences with polar emotion and sentences with neutral emotion; the sentences with polar emotion may then be input into a third neural network model to obtain the positive emotion sentences and negative emotion sentences among them. Similarly, two cascaded TextCNN classifiers may be used: one may serve as the emotion extraction model, realizing the binary classification of polar emotion sentences and neutral emotion sentences, and the other may serve as the sentiment classification model, realizing the further classification of the polar emotion sentences.
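The cascade of the two classifiers can be sketched as below; the two models are assumed to be simple predicates on a clause, standing in for the two trained TextCNN classifiers, whose internals are not reproduced here.

```python
def cascade_sentiment(clauses, extract_model, polarity_model):
    """First classifier: polar vs. neutral; second classifier: positive vs.
    negative among the polar clauses, as described above."""
    neutral, positive, negative = [], [], []
    for clause in clauses:
        if not extract_model(clause):   # not polar -> neutral
            neutral.append(clause)
        elif polarity_model(clause):    # polar and positive
            positive.append(clause)
        else:                           # polar and negative
            negative.append(clause)
    return neutral, positive, negative
```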
The analysis process of extracting emotion information from the interview corpus is further described with reference to Fig. 11.
Fig. 11 shows a flowchart of a voice content analysis method according to some embodiments of the present application. Process 800 may be implemented as a set of instructions stored in a non-transitory storage medium of a voice content analysis device. The voice content analysis device can execute the set of instructions and thereby perform the steps in process 800.
The operations of process 800 presented below are intended to be illustrative rather than restrictive. In some embodiments, process 800 may be implemented with one or more additional operations that are not described, and/or with one or more of the described operations omitted. In addition, the order of the operations shown in Fig. 11 and described below is not limiting.
In 810, the voice content analysis device may obtain voice data.
The voice data may be a recording or a video. In some embodiments, the voice data may be an interview recording or an interview video. For example, the voice data may be a recording of B interviewing C. The interview recording includes the interviewer's recording and the interviewee's recording.
In 820, the voice content analysis device may obtain, based on the voice data, corresponding text data.
Specifically, the voice content analysis device may perform speech recognition on the voice data to convert it into original text, and then convert the original text into text data that meets the data format requirements of the sentiment analysis model (in step 830).
In some embodiments, the voice content analysis device may obtain the text data corresponding to only part of the voice data. For example, for an interview recording, the voice content analysis device may obtain only the text data corresponding to the interviewee's recording. The voice content analysis device can thereby analyze the emotion of the interviewee (for example, a product user) more accurately.
In some embodiments, the text data is a sentence vector. The sentence vector may be a vector of one or more dimensions, and the voice content analysis device may obtain the sentence vector through the following steps.
Step 1: the voice content analysis device may obtain the original text, that is, the result of performing speech recognition on the voice data.
Step 2: for each complete sentence in the original text, the voice content analysis device may split the complete sentence into at least one short sentence.
In some cases, the voice content analysis device may split a complete sentence into clauses according to the punctuation marks within it, such as commas, enumeration commas, colons and semicolons. As an example, the voice content analysis device may divide the complete sentence "I really like the size and color of this mobile phone, but the placement of its volume control key is very unreasonable, I think placing the volume control key on the right side of the phone would be more convenient for the user" into three short sentences. The three short sentences are obtained by splitting the complete sentence at its commas, namely "I really like the size and color of this mobile phone", "but the placement of the volume control key of this mobile phone is very unreasonable" and "I think placing the volume control key on the right side of the phone would be more convenient for the user". It will be understood that splitting a complete long sentence into multiple short sentences reduces the complexity of the sentence, which is more conducive to sentence analysis and can increase the accuracy of the analysis.
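The clause splitting described above can be sketched with a simple regular expression over intra-sentence punctuation; the exact punctuation set is an assumption (ASCII and full-width commas, the enumeration comma, colons and semicolons):

```python
import re

# Punctuation marks that end a clause inside a complete sentence.
CLAUSE_BOUNDARY = r"[,\uFF0C\u3001:\uFF1A;\uFF1B]"

def split_clauses(sentence):
    """Split a complete sentence into short sentences (clauses)."""
    parts = re.split(CLAUSE_BOUNDARY, sentence)
    return [part.strip() for part in parts if part.strip()]
```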
Step 3: the voice content analysis device may determine the sentence vector of the at least one short sentence.
Specifically, for each of the at least one short sentence, the voice content analysis device may determine the word vectors of the short sentence based on a word2vec model, and then determine the sentence vector based on those word vectors. The word2vec model may be trained by the user, or may be the word2vec model included with the hanlp toolkit.
In some cases, the process by which the voice content analysis device determines word vectors based on the word2vec model may include: (1) word segmentation, or stemming and lemmatization — for example, a Chinese corpus is segmented into words, while an English corpus does not need segmentation but, since English involves various tenses, requires stemming and lemmatization; (2) constructing a dictionary and counting word frequencies — in this step all texts are traversed, the words that occur are collected, and the occurrence frequency of each word is counted; (3) constructing a tree structure — for example, a Huffman tree is constructed according to the occurrence probability of each word, so that all the categories are at leaf nodes; (4) generating the binary code of each node, where the binary code reflects the position of the node in the tree, so that the corresponding leaf node can be found step by step from the root node according to the code; (5) initializing the intermediate vectors of the non-leaf nodes and the word vectors of the leaf nodes — for example, each node in the Huffman tree stores a vector of length m, but the meaning of the vector differs between leaf and non-leaf nodes: a leaf node stores the word vector of a word, which serves as the input of the neural network, while a non-leaf node stores an intermediate vector, which corresponds to the parameters of the hidden layer of the neural network and determines the classification result together with the input; (6) training the intermediate vectors and word vectors — for example, during training, the model assigns a suitable vector to each of these abstract intermediate nodes, and this vector represents all of its child nodes; for the CBOW model, the word vectors of multiple words around a center word are first summed as the input of the system, classification is carried out step by step according to the binary code of the center word generated in the preceding step, and the intermediate vectors and word vectors are trained according to the classification results.
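Steps (3) and (4) above — building a Huffman tree from word frequencies and reading off each word's binary code — can be sketched with the standard Huffman-coding construction. This is the generic algorithm, not word2vec's actual internal code:

```python
import heapq

def huffman_codes(frequencies):
    """Build a Huffman tree from word frequencies and return each word's
    binary code; frequent words end up near the root with short codes,
    and every word sits at a leaf."""
    heap = [(freq, i, [word])
            for i, (word, freq) in enumerate(sorted(frequencies.items()))]
    heapq.heapify(heap)
    codes = {word: "" for word in frequencies}
    counter = len(heap)  # tie-breaker so tuples never compare the lists
    while len(heap) > 1:
        freq1, _, words1 = heapq.heappop(heap)
        freq2, _, words2 = heapq.heappop(heap)
        for word in words1:
            codes[word] = "0" + codes[word]  # left branch
        for word in words2:
            codes[word] = "1" + codes[word]  # right branch
        heapq.heappush(heap, (freq1 + freq2, counter, words1 + words2))
        counter += 1
    return codes
```

For example, with frequencies {"a": 5, "b": 2, "c": 1}, the most frequent word "a" gets the one-bit code "1" while "b" and "c" get two-bit codes.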
In some cases, the voice content analysis device may determine the mean of the word vectors of the short sentence as the sentence vector of the short sentence. Alternatively, the voice content analysis device may concatenate all the word vectors of the short sentence to form the sentence vector.
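Both options — averaging the word vectors or concatenating them — can be sketched in a few lines, with word vectors assumed to be plain lists of floats of equal length:

```python
def mean_sentence_vector(word_vectors):
    """Average a short sentence's word vectors into one sentence vector."""
    dim = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]

def concat_sentence_vector(word_vectors):
    """Alternative: string all word vectors together as the sentence vector."""
    return [value for vec in word_vectors for value in vec]
```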
In some embodiments, obtaining the sentence vector may further include preprocessing the original text. The preprocessing includes analyzing the vocabulary of the original text and deleting unnecessary words. As an example, the voice content analysis device may delete at least one of the modal particles, stop words and garbled characters in the original text. A modal particle is a function word that expresses tone or mood. Stop words are words or phrases that are automatically ignored during information processing; they can be screened according to the purpose of the processing. For example, for a product interview, the stop words may be phrases in the keyword extraction results that do not correspond to actual demands. Garbled characters refer to the parts that cannot be recognized during speech recognition. The voice content analysis device may delete the modal particles and stop words in the original text based on a pre-constructed modal-particle lexicon and stop-word lexicon.
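The deletion step can be sketched as a simple filter over the token list, assuming the modal-particle lexicon and the stop-word lexicon have already been constructed as sets:

```python
def clean_tokens(tokens, modal_particles, stop_words):
    """Drop modal particles and stop words from a segmented short sentence."""
    drop = set(modal_particles) | set(stop_words)
    return [token for token in tokens if token not in drop]
```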
In 830, the voice content analysis device may input the text data into a trained sentiment analysis model, where the trained sentiment analysis model includes a trained emotion extraction model and a trained sentiment classification model.
The trained emotion extraction model can extract polar emotion text data, and the trained sentiment classification model can classify the polar emotion text data. The trained emotion extraction model and the trained sentiment classification model are obtained by training initial models; the specific training processes are as follows.
The trained emotion extraction model may be obtained by the voice content analysis device through the following steps.
Step 1: obtain labeled training data. The labeled training data includes labeled neutral emotion text data and labeled non-neutral emotion text data.
Here, neutral emotion text data means that the emotion expressed by the text data is neutral, for example "I feel so-so about the first aspect" or "it's average". Non-neutral emotion text data, also called polar emotion data, includes positive emotion text data and negative emotion text data, and means that the emotion expressed by the text data is relatively strong compared with neutral emotion. For example, positive emotion text data may include "I like it", "it can bring me some convenience to a certain extent" and "this design has saved a lot of time". As another example, negative emotion text data may include "actually I would not be swayed by it", "this color makes people uncomfortable" and "nobody would choose this mode". Of course, other classification standards may also be used to classify emotions; such classifications, and the sentiment analysis methods adapted to them, still fall within the scope claimed by the present application.
In some cases, the labeled training data may be labeled by experts, or may be labeled by users. Training data labeled by experts yields highly accurate labels; a training set labeled by users yields more personalized labels, suitable for individual demands.
In some cases, the labeled training set is text data of a specific field. It will be understood that a sentiment classification model obtained by training on the text data of a specific field can be dedicated to the sentiment analysis of voice data in that field.
Step 2: input the labeled training data into the initial emotion extraction model. The initial emotion extraction model is an initial neural network model, such as a TextCNN. The initial emotion extraction model contains multiple features and multiple initial parameters.
According to the multiple features of the emotion extraction model, a feature vocabulary of the emotion extraction model can be produced. The feature vocabulary contains multiple words expressing polar (positive or negative) emotion, such as "like", "love", "dislike" and "hate".
Step 3: when the initial emotion extraction model reaches a convergence condition after training, determine it as the trained emotion extraction model.
During training, the emotion extraction model judges the quality of its output results according to the labeled training data, continuously adjusts the initial parameters and optimizes the results, until the trained emotion extraction model reaches the convergence condition. The convergence condition may be that the loss function is less than a first threshold, or that the number of training epochs is greater than a second threshold; the first threshold and the second threshold may be set manually according to experience.
The trained sentiment classification model may be obtained by the voice content analysis device through the following steps.
Step 1: obtain labeled training data. The labeled training data includes labeled positive emotion text data and labeled negative emotion text data.
In some cases, the labeled training data may be labeled by experts, or may be labeled by users. Training data labeled by experts yields highly accurate labels; a training set labeled by users yields more personalized labels, suitable for individual demands.
In some cases, the labeled training set is text data of a specific field. It will be understood that a sentiment classification model obtained by training on the text data of a specific field can be dedicated to the sentiment analysis of voice data in that field.
Step 2: input the labeled training data into the initial sentiment classification model for training. The initial sentiment classification model is an initial neural network model, such as a TextCNN. The initial sentiment classification model contains multiple features and multiple initial parameters.
According to the multiple features of the sentiment classification model, a feature vocabulary of the sentiment classification model can be produced. The feature vocabulary contains multiple words expressing polar (positive or negative) emotion, such as "like", "love", "dislike" and "hate".
Step 3: when the initial sentiment classification model reaches a convergence condition after training, determine it as the trained sentiment classification model.
During training, the sentiment classification model judges the quality of its output results according to the labeled training data, continuously adjusts the initial parameters and optimizes the results, until the trained sentiment classification model reaches the convergence condition. The convergence condition may be that the loss function is less than a first threshold, or that the number of training epochs is greater than a second threshold; the first threshold and the second threshold may be set manually according to experience.
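The convergence condition used for both models above — stop once the loss falls below a first threshold or the number of training epochs exceeds a second threshold — can be sketched as a generic loop; `run_epoch` stands in for one training pass of the TextCNN and is assumed to return that epoch's loss.

```python
def train_until_converged(run_epoch, loss_threshold, max_epochs):
    """Run training epochs until the loss drops below loss_threshold
    (first threshold) or max_epochs (second threshold) is reached."""
    loss = float("inf")
    epoch = 0
    while loss >= loss_threshold and epoch < max_epochs:
        loss = run_epoch(epoch)  # one training pass; returns the current loss
        epoch += 1
    return epoch, loss
```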
In 840, the voice content analysis device may divide the text data into polar emotion text data and neutral emotion text data through the trained emotion extraction model.
Specifically, the voice content analysis device may apply, through the trained emotion extraction model, different labels to the polar emotion text data and the neutral emotion text data, thereby classifying the two. As an example, the emotion extraction model may label neutral emotion text data as "2" and apply a label other than "2" to polar emotion text data. Exemplary output results of the emotion extraction model are listed below (two polar results followed by a neutral result):
"it can bring me some convenience to a certain extent";
"actually I would not be swayed by it";
"2 I feel so-so about the first aspect".
In some embodiments, the neutral emotion text data determined by the voice content analysis model can be used to analyze the user demands involved in the text data (step 820). For example, when the above voice data is a product interview recording, the interviewee is a user, and the content of the interview is the user's views on the product, the demands of that user are the user demands involved in the text data. For the use of neutral emotion text data to analyze user demands, reference may be made to the other relevant descriptions in this application.
In 850, the voice content analysis device may divide the polar emotion text data into positive emotion text data and negative emotion text data through the trained sentiment classification model.
Specifically, the voice content analysis device may apply, through the trained sentiment classification model, different labels to the positive emotion text data and the negative emotion text data, thereby classifying the two. As an example, the sentiment classification model may label positive emotion text data as "1" and negative emotion text data as "0". Exemplary output results of the sentiment classification model are listed below:
"1 it can bring me some convenience to a certain extent";
"0 actually I would not be swayed by it".
In 860, the voice content analysis device may obtain a sentiment analysis result according to the positive emotion text data and the negative emotion text data.
In some embodiments, after each short sentence of each sentence in the above voice data has been classified, the voice content analysis device may determine the sentiment analysis result according to the ratio of positive emotion text data to negative emotion text data. As an example, suppose positive emotion text data accounts for 65% of all the voice data (that is, of its corresponding text data), negative emotion text data accounts for 10%, and neutral emotion text data accounts for 25%. The voice content analysis device can then conclude that the sentiment tendency of the voice data is positive.
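The ratio-based aggregation in the example above can be sketched as:

```python
def sentiment_tendency(n_positive, n_negative, n_neutral):
    """Decide the overall sentiment tendency of the voice data from the
    counts of positive, negative and neutral short sentences."""
    total = n_positive + n_negative + n_neutral
    positive_ratio = n_positive / total
    negative_ratio = n_negative / total
    if positive_ratio > negative_ratio:
        return "positive"
    if negative_ratio > positive_ratio:
        return "negative"
    return "neutral"
```

With the 65% / 10% / 25% split of the example, the tendency is "positive".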
In some embodiments, the voice content analysis device may separately analyze the specific content of the positive emotion text data and of the negative emotion text data, so as to determine the sentiment analysis result. As an example, the voice content analysis device analyzes the voice data of a product interview and obtains positive emotion text data and negative emotion text data. The voice content analysis device may further analyze the positive emotion text data to obtain the advantages of the product, and analyze the negative emotion text data to obtain the disadvantages of the product. The advantages and disadvantages of the product may serve as the sentiment analysis result.
In some embodiments, the voice content analysis method may further include: determining the field to which the content of the voice data belongs, and, according to that field, determining and invoking the above trained sentiment analysis model (for example, the trained emotion extraction model and the trained sentiment classification model).
For example, the voice content analysis device may determine, according to the text data, the field to which the content of the corresponding voice data belongs. As an example, the voice content analysis device may extract keywords from the text data and determine, according to the keywords, the field to which the content of the corresponding voice data belongs, such as household appliances or sports.
As another example, the voice content analysis device may receive a user input that determines the field to which the content of the voice data belongs; the user input includes the field to which the content of the voice data belongs.
In some embodiments, the preprocessed corpus may first be input into the demand judgment model to obtain a demand-class corpus and a non-demand-class corpus; the non-demand-class corpus is then input into the sentiment classification model to obtain a polarity corpus and a neutral corpus; finally, the polarity corpus is further classified into a positive corpus and a negative corpus. In a variant of this embodiment, after the non-demand-class corpus is obtained, the non-demand-class corpus and a copy of it may be input into the sentiment classification model and the usability classification model respectively, so as to obtain the polarity corpus and the neutral corpus from the sentiment classification model, and a usability corpus and a non-usability corpus from the usability classification model.
In some embodiments, the preprocessed corpus may first be input into the sentiment classification model to obtain a polarity corpus and a neutral corpus; the neutral corpus is then input into the demand judgment model to obtain a demand-class corpus and a non-demand-class corpus. In a variant of this embodiment, after the non-demand-class corpus is obtained, it may be input into the usability classification model to obtain a usability corpus and a non-usability corpus.
In some embodiments, the preprocessed corpus and a copy of it may be input into the demand judgment model and the sentiment classification model respectively, so as to obtain a demand-class corpus and a non-demand-class corpus from the demand judgment model, and a polarity corpus and a neutral corpus from the sentiment classification model; finally, the polarity corpus is further classified into a positive corpus and a negative corpus.
In some embodiments, the preprocessed corpus and a copy of it may be input into the demand judgment model and the usability classification model respectively, so as to obtain a demand-class corpus and a non-demand-class corpus from the demand judgment model, and a usability corpus and a non-usability corpus from the usability classification model; the non-demand-class corpus is then input into the sentiment classification model to obtain a polarity corpus and a neutral corpus, and the polarity corpus is finally classified into a positive corpus and a negative corpus.
In some embodiments, the preprocessed corpus and a copy of it may be input into the sentiment classification model and the usability classification model respectively, so as to obtain a polarity corpus and a neutral corpus from the sentiment classification model, and a usability corpus and a non-usability corpus from the usability classification model; the neutral corpus is then input into the demand judgment model to obtain a demand-class corpus and a non-demand-class corpus. In a variant of this embodiment, the non-usability corpus may also be input into the demand judgment model. In another variant, the non-usability corpus and the neutral corpus may be merged before being input into the demand judgment model.
In some embodiments, the preprocessed corpus together with a first copy and a second copy of it may be input into the demand judgment model, the sentiment classification model and the usability classification model respectively, so as to obtain a demand-class corpus and a non-demand-class corpus from the demand judgment model, a polarity corpus and a neutral corpus from the sentiment classification model, and a usability corpus and a non-usability corpus from the usability classification model.
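One of the orderings above (demand judgment first, then sentiment classification, then splitting the polarity corpus) can be sketched with stub classifiers. The keyword rules inside the stubs are invented placeholders standing in for the trained models, not the models themselves:

```python
def demand_model(sentence):     # stub for the trained demand judgment model
    return "demand" if "want" in sentence or "need" in sentence else "non-demand"

def sentiment_model(sentence):  # stub for the trained sentiment classification model
    return "polarity" if "like" in sentence or "hate" in sentence else "neutral"

def polarity_split(sentence):   # final positive/negative split of the polarity corpus
    return "positive" if "like" in sentence else "negative"

def pipeline(corpus):
    """Demand judgment -> sentiment classification (on the non-demand
    corpus) -> positive/negative split of the polarity corpus."""
    out = {"demand": [], "positive": [], "negative": [], "neutral": []}
    for s in corpus:
        if demand_model(s) == "demand":
            out["demand"].append(s)
        elif sentiment_model(s) == "polarity":
            out[polarity_split(s)].append(s)
        else:
            out["neutral"].append(s)
    return out
```

The other orderings differ only in which model is applied first and which output corpus is fed onward.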
In some embodiments, the polarity corpus may contain both the user's polar emotion information about the product and the user's polar emotion information about aspects other than the product, while the user's polar emotion information about the product can also be derived from the usability corpus. In some embodiments, the polar emotion information in the polarity corpus that is unrelated to the product can be extracted by taking the union of the usability corpus and the polarity corpus. In some embodiments, the positive corpus and the negative corpus may each be input into the usability classification model, so as to filter out a positive usability corpus and a negative usability corpus respectively. In some embodiments, non-demand-class information is also denoted as a neutral corpus.
In some embodiments, the sentiment analysis model and/or the usability judgment model can be constructed and/or trained by the method used to construct and/or train the demand analysis model. In some embodiments, the sentiment analysis model and/or the demand judgment model can be constructed and/or trained by the method used to construct and/or train the usability judgment model. In some embodiments, the usability judgment model and/or the demand judgment model can be constructed and/or trained by the method used to construct and/or train the sentiment analysis model.
The sentence clustering method according to embodiments of the present application is described in detail below with reference to Figures 12-17.
Figure 12 is a flowchart of a semantic unit clustering method according to an embodiment of the present application. In this embodiment, the method for generating an interview report of the present application can be implemented by an app loaded on a terminal device, and the semantic unit clustering method can be implemented by a processor of the terminal device. The method can be stored in a memory, and the terminal device executes the method upon receiving a trigger instruction for generating an interview report.
As shown in Figure 12, the semantic unit clustering method includes step 2000: obtaining multiple semantic units. Semantics can be the meaning expressed by the units of a language at various levels and by their combinations; in other words, semantics is what is expressed by the morphemes, words, phrases, sentences and sentence groups of a language. In the present application, a semantic unit can be not only a morpheme, word, phrase, sentence or sentence group, but also a letter, number, symbol, action or any other object that can be configured, as needed, to carry a certain meaning or to evoke an association with a certain meaning, as well as any combination of one or more of the above. In some embodiments, semantic units are selected from any type of corpus, for example an audio corpus, a text corpus, a video corpus, or a corpus expressed in a computer language. In some embodiments, semantic units may come from the audio and/or text transcript of the interview report described above. In some embodiments, a semantic unit may include one or more keywords of interest to the user. In some embodiments, a semantic unit can be a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like that contains a user demand; in this case, for example, a semantic unit can be the sentence "I want a mobile phone" or the word "mobile phone". In some embodiments, a semantic unit can be a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like that carries an emotion polarity, where the polarity of the emotion (for example, positive or negative) indicates the user's degree of preference for a certain object; in this case, for example, a semantic unit can be the sentence "I like touch-screen mobile phones". In some embodiments, a semantic unit can be one or more words or sentences whose emotion polarity has been classified by the sentiment analysis model.
In some embodiments, a semantic unit can be one or more words or sentences whose demand class has been determined by the demand analysis model. In some embodiments, the corpus to be clustered can be a corpus set classified as positive emotion by the sentiment classification model. In some embodiments, the corpus to be clustered can be a corpus set classified as negative emotion by the emotion polarity analysis model. In some embodiments, the corpus to be clustered can be a corpus set classified as neutral emotion by the emotion polarity analysis model. In some embodiments, the corpus to be clustered can be a corpus set classified as usability evaluations by the usability classification model. In some embodiments, the corpus to be clustered can be a corpus set judged as non-usability evaluations by the demand judgment module. In some embodiments, the corpus to be clustered can be a corpus set judged as demands by the demand judgment module. In some embodiments, the corpus to be clustered can be a corpus set judged as non-demands by the demand judgment module. In some embodiments, the corpus to be clustered can be a combination of one or more of the above corpus sets. In some embodiments, the corpus to be clustered may have been preprocessed, for example by keyword recognition, keyword extraction, non-keyword removal, punctuation recognition, and so on.
As shown in Figure 12, the semantic unit clustering method further includes step 4000: determining one or more cluster centers based on the multiple semantic units. The process of dividing a set of physical or abstract objects into multiple classes composed of similar objects is called clustering. A cluster generated by a clustering operation is a set of data objects that are similar to one another within the same cluster and dissimilar to objects in other clusters. The cluster center is the most important object in a cluster: it best represents the cluster and best explains the other objects in it. For example, a cluster-center sentence expresses, to some extent, the theme or core idea of the interview. In some embodiments, each cluster has exactly one cluster center. In some embodiments, a cluster center can be one or more semantic units selected from the multiple semantic units; each cluster center serves as the reference object when the similarity between it and the other semantic units among the multiple semantic units is computed. In other words, in this similarity computation, a similarity needs to be computed between the reference object and each of the other semantic units.
In some embodiments, step 4000 of determining one or more cluster centers based on the multiple semantic units includes: determining the one or more cluster centers from the multiple semantic units by the AP algorithm. The AP (affinity propagation) method is also known as the affinity propagation algorithm, in which, at any point in time, the magnitude of each message reflects the affinity with which the current data point selects another data point as its cluster center. In the AP algorithm, all data points are treated as potential cluster centers (also called exemplars), lines between every pair of data points form a network, and each data point is regarded as a network node. The AP algorithm computes the cluster center of each sample by passing messages (namely, responsibility and availability) along the edges of the network, where the responsibility reflects how well suited a first data point is to serve as the cluster center of a second data point, and the availability reflects how appropriate it is for the second data point to choose the first data point as its cluster center. In other words, the AP algorithm passes messages back and forth along the edges of the network until a good set of exemplars emerges and the corresponding clusters are generated.
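The responsibility/availability message passing described above can be sketched in a few lines of NumPy. This is a bare-bones affinity propagation under stated assumptions (negative squared Euclidean distance as the similarity, the median similarity as the shared preference, fixed damping and iteration count); the point coordinates below are illustrative:

```python
import numpy as np

def affinity_propagation(X, damping=0.5, iters=100):
    """Bare-bones affinity propagation: every point is a potential
    exemplar; responsibility and availability messages are passed
    along the network edges until a set of exemplars emerges."""
    n = len(X)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)            # similarity
    S[np.diag_indices(n)] = np.median(S[~np.eye(n, dtype=bool)])   # preference
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(iters):
        # responsibility r(i,k): how well suited k is as i's exemplar
        AS = A + S
        idx = AS.argmax(1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # availability a(i,k): how appropriate it is for i to choose k
        Rp = np.maximum(R, 0)
        Rp[np.diag_indices(n)] = R.diagonal()
        Anew = Rp.sum(0)[None, :] - Rp
        dA = Anew.diagonal().copy()
        Anew = np.minimum(Anew, 0)
        Anew[np.diag_indices(n)] = dA
        A = damping * A + (1 - damping) * Anew
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    labels = S[:, exemplars].argmax(1)                 # nearest exemplar
    labels[exemplars] = np.arange(len(exemplars))      # exemplars label themselves
    return exemplars, labels
```

Run on two well-separated groups of points, the message passing settles on one exemplar per group.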
In some embodiments, step 4000 of determining one or more cluster centers based on the multiple semantic units includes: determining every one of the multiple semantic units as a cluster center.
In some embodiments, step 4000 of determining one or more cluster centers based on the multiple semantic units includes: determining the one or more cluster centers from the multiple semantic units based on the pairwise similarity of the multiple semantic units. As described above, clustering refers to dividing similar objects (for example, semantic units with similar semantics) into different groups or subsets by a method of static classification, so that the member objects in the same group or subset all share a certain similarity. In some embodiments, the similarity refers to the degree of resemblance of two different semantic units, which can be expressed as the distance between the respective mathematical representations of the two semantic units, for example the Euclidean distance, Manhattan distance, infinity-norm distance, Mahalanobis distance, cosine distance, Hamming distance, and so on. For example, the similarity between two semantic units can be computed with the HanLP (Han Language Processing) toolkit, where HanLP is a Java toolkit composed of a series of models and algorithms, intended to popularize the application of natural language processing in production environments.
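Most of the distance measures listed above are one-liners over the vector representations. A sketch with NumPy; the two example vectors are arbitrary:

```python
import numpy as np

u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 1.0, 0.0])

euclidean = np.linalg.norm(u - v)          # straight-line distance
manhattan = np.abs(u - v).sum()            # sum of coordinate differences
infinity_norm = np.abs(u - v).max()        # largest coordinate difference
cosine_sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
cosine_dist = 1.0 - cosine_sim             # cosine distance

print(euclidean, manhattan, infinity_norm, cosine_dist)
```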
Figure 13 is a flowchart of determining one or more cluster centers based on the pairwise similarity of multiple semantic units according to an embodiment of the present application.
As shown in Figure 13, determining one or more cluster centers based on the pairwise similarity of the multiple semantic units includes step 4200: selecting each of the multiple semantic units in turn as a candidate semantic unit.
As shown in Figure 13, determining one or more cluster centers based on the pairwise similarity of the multiple semantic units further includes step 4400: for each candidate semantic unit, separately computing the similarity between the candidate semantic unit and each of the remaining semantic units among the multiple semantic units, and, if among the remaining semantic units there is at least one whose similarity is higher than a predetermined threshold, determining the candidate semantic unit as a cluster center.
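Steps 4200 and 4400 amount to a double loop with a similarity threshold. A minimal sketch using cosine similarity over toy vectors; the vectors and the 0.9 threshold are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def find_cluster_centres(units, threshold):
    """A unit becomes a cluster centre if at least one other unit
    is more similar to it than the threshold (steps 4200 + 4400)."""
    centres = []
    for i, cand in enumerate(units):                           # step 4200
        rest = [u for j, u in enumerate(units) if j != i]
        if any(cosine(cand, u) > threshold for u in rest):     # step 4400
            centres.append(i)
    return centres

units = [np.array(v, dtype=float) for v in ([1, 0], [0.99, 0.1], [0, 1])]
print(find_cluster_centres(units, threshold=0.9))  # [0, 1]
```

Unit 2 has no sufficiently similar neighbor, so it is not determined to be a cluster center.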
Figure 14 is a flowchart of separately computing the similarity between each candidate semantic unit and each of the remaining semantic units among the multiple semantic units according to an embodiment of the present application.
As shown in Figure 14, separately computing the similarity between each candidate semantic unit and each of the remaining semantic units among the multiple semantic units includes step 4420: computing the candidate semantic vector of each candidate semantic unit. A semantic vector can be the vector representation of a semantic unit. In some embodiments, a semantic vector can be a number vector, symbol vector, letter vector, character vector, word vector, sentence vector, paragraph vector, and so on. In some embodiments, a word vector can be computed from one or more character vectors. In some embodiments, a sentence vector can be computed from one or more word vectors. In some embodiments, a paragraph vector can be computed from one or more sentence vectors. In some embodiments, for the same semantic unit, the same semantic vector can be used at the sentiment analysis model and the demand analysis model. In some embodiments, for the same semantic unit, different semantic vectors can be used at the sentiment analysis model and the demand analysis model. In some embodiments, for the same semantic unit, the same semantic vector can be used at the sentiment analysis model and the clustering model. In some embodiments, for the same semantic unit, the same semantic vector can be used at the demand analysis model and the clustering model. In some embodiments, for the same semantic unit, the same semantic vector can be used at the sentiment analysis model, the demand analysis model and the clustering model. In some embodiments, semantic vectors are assigned randomly. In some embodiments, each element of a semantic vector represents the degree of association, or weight, of the semantic unit with respect to some aspect of interest.
Figure 15 is a flowchart of computing the candidate semantic vector of each candidate semantic unit according to an embodiment of the present application.
As shown in Figure 15, computing the candidate semantic vector of each candidate semantic unit includes step 4441: obtaining a feature semantic unit list, where the feature semantic unit list includes one or more feature semantic units. In some embodiments, a feature semantic unit can be a letter, number, symbol, word, sentence, paragraph, article or the like that expresses an emotion polarity. In some embodiments, a feature semantic unit can also be a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like that represents some objective attribute of an object. In some embodiments, a feature semantic unit represents a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like of an object demanded by the user. In some embodiments, feature semantic units can be selected from an expert annotation dictionary, or can be customized as needed.
As shown in Figure 15, computing the candidate semantic vector of each candidate semantic unit includes step 4442: separately determining the degree of association between each candidate semantic unit and each feature semantic unit. In some embodiments, the degree of association can be the strength of the emotion polarity that the semantic unit expresses toward a certain feature semantic unit. In some embodiments, the degree of association can be the degree of demand that the semantic unit expresses toward a certain feature semantic unit. In some embodiments, the degree of association is proportional to the frequency with which each feature semantic unit occurs in each candidate semantic unit.
As shown in Figure 15, computing the candidate semantic vector of each candidate semantic unit includes step 4443: generating the candidate semantic vector from the degrees of association between the candidate semantic unit and the feature semantic units. In some embodiments, the degree of association can be proportional to the frequency with which each feature semantic unit occurs in the candidate semantic unit. In some embodiments, the degree of association can be proportional to the strength of the tone with which the candidate semantic unit modifies the attribute expressed by the feature semantic unit. Taking a scenario of learning the user's color preferences as an illustration, assume the feature semantic unit list (or feature semantic unit dictionary) of interest to the user includes the keywords "red", "orange", "yellow", "green", "blue", "white" and "black". For the semantic unit "I don't like blue very much, I prefer white, but my favorite is black", it can be deduced that the user holds a positive attitude toward "black" and "white", holds a negative attitude toward "blue", takes no position on the other colors, and likes "black" more than "white". When computing the semantic vector, positive attitudes can be assigned positive weights, negative attitudes negative weights, and unknown attitudes 0, while different weight magnitudes express different degrees of preference. Based on the above principles, if the vector is defined in the order {"red", "orange", "yellow", "green", "blue", "white", "black"}, the semantic vector of this semantic unit is obtained as [0, 0, 0, 0, -0.5, 0.5, 1]. In some embodiments, the choice of the keywords in the feature semantic unit list, the numerical range of the weights, and the rule mapping keywords to weights can all be varied according to actual needs. In some embodiments, the sentence vectors needed by the usability analysis model can be generated by the above method. In some embodiments, the sentence vectors needed by the demand judgment model can be generated by the above method. In some embodiments, the sentence vectors needed by the sentiment analysis model can be generated by the above method.
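The color-preference example above can be reproduced directly. The attitude weights per keyword are exactly those stated in the text; the helper name is illustrative:

```python
FEATURES = ["red", "orange", "yellow", "green", "blue", "white", "black"]

def semantic_vector(attitudes):
    """Build the semantic vector from a keyword -> weight mapping:
    positive weight = positive attitude, negative weight = negative
    attitude, 0 (keyword absent) = no position; the magnitude encodes
    the strength of the preference."""
    return [attitudes.get(f, 0) for f in FEATURES]

# "I don't like blue very much, I prefer white, but my favorite is black"
vec = semantic_vector({"blue": -0.5, "white": 0.5, "black": 1})
print(vec)  # [0, 0, 0, 0, -0.5, 0.5, 1]
```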
Figure 16 is a flowchart of computing the candidate semantic vector of each candidate semantic unit according to another embodiment of the present application.
As shown in Figure 16, computing the candidate semantic vector of each candidate semantic unit includes step 4445: assigning an identity vector to each candidate semantic unit. In some embodiments, each semantic unit (for example, a sentence) can be assigned a unique paragraph ID. Like an ordinary word, the paragraph ID is first mapped to a paragraph vector; the paragraph vector has the same dimensionality as the word vectors, but the two come from different vector spaces. During the training of a sentence or document, the paragraph ID remains unchanged, which is equivalent to using the semantics of the entire sentence each time the probability of a word is predicted. In the prediction stage, a new paragraph ID is assigned to the sentence to be predicted, while the word vectors and the parameters obtained in the training stage are kept fixed; after convergence, the paragraph vector of the sentence to be predicted is obtained.
As shown in Figure 16, computing the candidate semantic vector of each candidate semantic unit includes step 4446: assigning a sub-semantic-unit vector to each of one or more sub-semantic units in each candidate semantic unit. In some embodiments, each candidate semantic unit includes multiple sub-semantic units, and some or all of the multiple sub-semantic units are assigned corresponding vectors (referred to as sub-semantic-unit vectors). In some embodiments, the candidate semantic unit is a sentence, the sub-semantic units are the words contained in the sentence, and the sub-semantic-unit vectors are word vectors. In some embodiments, the word vectors are generated during the training of the model and are parameters of the model; at the beginning of training, the word vectors take random values and are continuously updated as training proceeds. In some embodiments, each sub-semantic unit can be assigned a vector by one-hot encoding.
As shown in Figure 16, computing the candidate semantic vector of each candidate semantic unit includes step 4447: inputting the identity vector together with all the sub-semantic-unit vectors into a predetermined prediction model to output a target vector. In some embodiments, the mean of the vectors of all the words in a sentence can be taken as the vector representation of the sentence.
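The averaging fallback mentioned above is a one-liner over the word vectors. A sketch with toy three-dimensional word vectors; the vector values themselves are arbitrary:

```python
import numpy as np

word_vectors = {  # toy 3-dimensional word vectors (illustrative values)
    "i":    np.array([1.0, 0.0, 0.0]),
    "like": np.array([0.0, 1.0, 0.0]),
    "blue": np.array([0.0, 0.0, 1.0]),
}

def sentence_vector(words):
    """Sentence vector = element-wise mean of the vectors of its words."""
    return np.mean([word_vectors[w] for w in words], axis=0)

print(sentence_vector(["i", "like", "blue"]))
```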
In some embodiments, the word vectors needed by the sentiment analysis model and/or by clustering are generated using the Word2vec language model. In some embodiments, the sentence vectors needed by the usability analysis model are generated using Word2vec. In some embodiments, the vectors needed by the demand analysis model are generated using the Word2vec language model. A language model makes assumptions about, and models, natural language so that natural language can be expressed in a form that a computer can understand; its core remains the representation of context and the modeling of the relationship between context and target word. Word2vec adopts the n-gram model, that is, it assumes that a word is related only to the n words around it and unrelated to the other words in the text. Drawing on ideas from deep learning, Word2vec uses training to reduce the processing of text content to vector operations in a K-dimensional vector space, where similarity in the vector space can be used to represent similarity in text semantics. The dimensionality of the word vectors obtained by Word2vec can be chosen freely. Word2vec performs semantic analysis at the word level; after word vectors are obtained, sentence vectors still need to be derived on that basis to gain context-aware semantic analysis capability. The general procedure of the Word2vec model includes: (1) word segmentation / stemming and lemmatization; for example, a Chinese corpus is segmented into words, whereas an English corpus needs no segmentation but, since English involves various tenses, requires stemming and lemmatization; (2) building the dictionary and counting word frequencies; in this step, all texts are traversed, the words that occur are collected, and the occurrence frequency of each word is counted; (3) building a tree structure; for example, a Huffman tree is constructed according to the occurrence probability of each word, so that all categories lie at leaf nodes; (4) generating the binary code of each node, where the binary code reflects the position of the node in the tree, so that the corresponding leaf node can be reached step by step from the root according to its code; (5) initializing the intermediate vector of each non-leaf node and the word vectors at the leaf nodes; for example, every node of the Huffman tree stores a vector of length m, but the vectors at leaf nodes and non-leaf nodes have different meanings: specifically, a leaf node stores the word vector of a word, which serves as the input of the neural network, whereas a non-leaf node stores an intermediate vector, corresponding to the parameters of the hidden layer of the neural network, which determines the classification result together with the input; (6) training the intermediate vectors and word vectors; for example, during training the model assigns these intermediate nodes suitable abstract vectors, each representing all of its child nodes; for the CBOW model, the word vectors of several words around the center word are first summed as the input of the system, then classification is performed step by step according to the binary code of the center word generated in the preceding steps, and the intermediate vectors and word vectors are trained according to the classification results. In some embodiments, the word2vec model used in sentiment analysis is trained by the user, while clustering uses the word2vec model shipped with the HanLP toolkit. In some embodiments, both the word2vec model used in sentiment analysis and the word2vec model used in clustering come from the HanLP toolkit. In some embodiments, both the word2vec model used in sentiment analysis and the word2vec model used in clustering are trained by the user. In some embodiments, the target vector can be output by inputting the identity vector together with all the sub-semantic-unit vectors into a continuous bag-of-words (CBOW) model. For example, the input of the CBOW model is the sum of the word vectors of the n words around the center word of a sentence, and its output is the word vector of the center word itself, where n is an integer greater than 1. As another example, in some embodiments, the target vector can be output by inputting the identity vector together with all the sub-semantic-unit vectors into a Skip-gram model. For example, the input of the Skip-gram model is the center word of a sentence itself, and its output is the word vectors of the n words around the center word. In some embodiments, the target vector is a word vector. In some embodiments, word vectors can be computed and trained with the Word2vec tool.
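Steps (2)-(4) of the Word2vec procedure above (count word frequencies, build a Huffman tree over them, read off each word's binary code as its root-to-leaf path) can be sketched with the standard library. The toy word counts are illustrative:

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman tree over word frequencies and return the binary
    code of each word (its root-to-leaf path), as in steps (2)-(4)."""
    heap = [(f, i, {w: ""}) for i, (w, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}   # prepend branch bit
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes({"the": 10, "cat": 3, "sat": 3, "mat": 1})
print(codes["the"], codes["mat"])  # 1 010
```

As the text notes, more frequent words end up closer to the root, so they get shorter codes and cheaper classification paths.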
In some embodiments, the sentence vectors needed by the sentiment analysis model and/or by clustering are generated using Doc2vec. In some embodiments, the sentence vectors needed by the usability analysis model are generated using Doc2vec. In some embodiments, the vectors needed by the demand analysis model are generated using the Doc2vec language model. Doc2vec has two models, namely the distributed memory (DM) model and the distributed bag-of-words (DBOW) model, where the DM model predicts the probability of a word given the context and the document vector, while the DBOW model predicts the probability of a group of random words in the document given the document vector. In some embodiments, the target vector can be output by inputting the identity vector together with all the sub-semantic-unit vectors into the DBOW model. In some embodiments, the target vector can be output by inputting the identity vector together with all the sub-semantic-unit vectors into the DM model. In some embodiments, the target vector is a sentence vector. In some embodiments, sentence vectors can be computed and trained with the Doc2vec tool.
As shown in Figure 16, computing the candidate semantic vector of each candidate semantic unit includes step 4448: designating the target vector as the candidate semantic vector.
As shown in Figure 14, separately computing the similarity between each candidate semantic unit and each of the remaining semantic units among the multiple semantic units further includes step 4440: separately computing the similarity between the candidate semantic vector of each candidate semantic unit and the semantic vector of each of the remaining semantic units. In some embodiments, the similarity of semantic vectors can be characterized by the cosine distance, or cosine similarity, between the semantic vectors. In some embodiments, the predetermined threshold on the cosine similarity can be 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 or 0.9. In some embodiments, the similarity of semantic vectors can be computed with the HanLP toolkit.
As shown in Figure 12, the semantic unit clustering method further includes step 6000: sorting the one or more cluster centers. In some embodiments, sorting the one or more cluster centers includes separately computing the similarity between the semantic unit corresponding to each of the one or more cluster centers and each of the remaining semantic units among the multiple semantic units, and sorting all the cluster centers by the number of semantic units whose similarity is higher than a predetermined threshold. In some embodiments, the similarity between the semantic vector of the semantic unit corresponding to each of the one or more cluster centers and the semantic vector of each of the remaining semantic units among the multiple semantic units is computed separately. In some embodiments, the similarity between the sentence vector of the sentence corresponding to each of the one or more cluster centers and the sentence vector of each of the other sentences is computed separately, and the cluster centers are sorted by the number of sentences in each cluster whose similarity is higher than a predetermined threshold. In some embodiments, different predetermined thresholds can be used in the computations for different cluster centers. In some embodiments, the predetermined threshold used in the sorting step for the cluster centers can differ from the predetermined threshold used in the step of determining the cluster centers. In some embodiments, the order of the cluster centers can be updated each time the similarities between one semantic unit and each of the remaining semantic units have been computed. In some embodiments, each time a cluster is determined or found, the number of its semantic units whose similarity is higher than the predetermined threshold can be compared with the corresponding numbers of the previously found clusters, and the order of the cluster centers is updated based on the comparison result. For example, if a newly generated cluster contains more semantic units, it is placed before the earlier clusters in importance or priority. In some embodiments, the text of the semantic units corresponding to a cluster center can be output after the cluster center is computed. In some embodiments, only the text of the semantic units corresponding to the top-ranked cluster center is output.
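The sorting criterion of step 6000 (count, for each center, how many other semantic units exceed the similarity threshold, then rank by that count) can be sketched as follows; the similarity values and the 0.5 threshold are illustrative:

```python
def rank_cluster_centres(similarity, centres, threshold):
    """Sort cluster centres by how many other semantic units have a
    similarity to the centre above the threshold (best supported first)."""
    def support(c):
        return sum(1 for j, s in enumerate(similarity[c]) if j != c and s > threshold)
    return sorted(centres, key=support, reverse=True)

# similarity[i][j]: toy pairwise similarities among 4 semantic units
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.2, 0.1],
    [0.8, 0.2, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
print(rank_cluster_centres(sim, centres=[0, 1, 3], threshold=0.5))  # [0, 1, 3]
```

Center 0 has two supporting units above the threshold, center 1 has one, and center 3 has none, so center 0 ranks first; with the per-center thresholds mentioned above, `threshold` would become a mapping rather than a single value.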
Figure 17 is a schematic diagram of a semantic unit clustering apparatus according to an embodiment of the present application. As shown in Figure 17, the semantic unit clustering apparatus includes a semantic unit obtaining component 7000, a cluster center determining component 8000, and a sorting component 9000.
In some embodiments, the semantic unit obtaining component 7000 is configured to obtain multiple semantic units. In some embodiments, the cluster center determining component 8000 is configured to determine one or more cluster centers based on the multiple semantic units. In some embodiments, the sorting component 9000 is configured to sort the one or more cluster centers. In some embodiments, the sorting component 9000 is optional.
In some embodiments, the cluster centre determination component includes a cluster centre determination module. In some embodiments, the cluster centre determination module is configured to determine the one or more cluster centres from the plurality of semantic units by an affinity propagation (AP) clustering algorithm. In some embodiments, the cluster centre determination module is configured to determine each of the plurality of semantic units as a cluster centre. In some embodiments, the cluster centre determination module is configured to determine the one or more cluster centres from the plurality of semantic units based on the pairwise similarities between the plurality of semantic units.
In some embodiments, the cluster centre determination module further includes a candidate semantic unit selection module, a similarity calculation module, and a cluster centre determination sub-module. The candidate semantic unit selection module is configured to select each of the plurality of semantic units in turn as a candidate semantic unit. The similarity calculation module is configured to calculate, for each candidate semantic unit, the similarity between that candidate semantic unit and each of the remaining semantic units among the plurality of semantic units. The cluster centre determination sub-module is configured to determine a candidate semantic unit as a cluster centre when at least one of the remaining semantic units has a similarity to it that exceeds a predetermined threshold.
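The candidate-selection rule above can be illustrated as follows. The word-overlap similarity and the example sentences are hypothetical stand-ins for whatever similarity measure an embodiment actually uses:

```python
def select_centres(units, sim, threshold=0.5):
    """Treat each semantic unit in turn as a candidate; keep it as a
    cluster centre if at least one *other* unit is similar enough."""
    centres = []
    for i, cand in enumerate(units):
        others = units[:i] + units[i + 1:]
        if any(sim(cand, o) > threshold for o in others):
            centres.append(cand)
    return centres

def word_overlap(a, b):
    # Toy similarity: Jaccard overlap of the two word sets.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

units = ["price too high", "price very high", "service was slow"]
print(select_centres(units, word_overlap, threshold=0.4))
```

An isolated unit with no sufficiently similar neighbour ("service was slow" here) is not selected as a centre.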
In some embodiments, the similarity calculation module further includes a candidate semantic vector calculation module and a semantic vector similarity calculation module. The candidate semantic vector calculation module is configured to calculate a candidate semantic vector for each candidate semantic unit. The semantic vector similarity calculation module is configured to calculate the similarity between the candidate semantic vector of each candidate semantic unit and the semantic vector of each of the remaining semantic units.
In some embodiments, the candidate semantic vector calculation module includes a feature semantic unit acquisition module, an association degree determination module, and a candidate semantic vector generation module. The feature semantic unit acquisition module is configured to acquire a feature semantic unit table, wherein the feature semantic unit table includes one or more feature semantic units. The association degree determination module is configured to determine the association degree between each candidate semantic unit and each feature semantic unit. The candidate semantic vector generation module is configured to generate the candidate semantic vector from the association degrees between each candidate semantic unit and the respective feature semantic units. In some embodiments, the association degree is proportional to the frequency with which the corresponding feature semantic unit occurs in the candidate semantic unit.
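A minimal sketch of a frequency-proportional association degree, assuming whitespace-tokenized text and a hypothetical feature table:

```python
def candidate_vector(candidate, feature_units):
    """One vector component per feature semantic unit; the association
    degree here is simply the occurrence count of the feature in the
    candidate, i.e. proportional to its frequency."""
    words = candidate.split()
    return [words.count(f) for f in feature_units]

features = ["price", "service", "quality"]  # hypothetical feature table
print(candidate_vector("price price and service", features))
```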
In some embodiments, the candidate semantic vector calculation module includes an identity vector assignment module, a sub-semantic-unit vector assignment module, a target vector calculation module, and a candidate semantic vector designation module. The identity vector assignment module is configured to assign an identity vector to each candidate semantic unit. The sub-semantic-unit vector assignment module is configured to assign a sub-semantic-unit vector to each of the one or more sub-semantic units within each candidate semantic unit. The target vector calculation module is configured to input the identity vector together with all the sub-semantic-unit vectors into a predetermined prediction model to output a target vector. The candidate semantic vector designation module is configured to designate the target vector as the candidate semantic vector. In some embodiments, the identity vector assignment module is optional. In embodiments without the identity vector assignment module (for example, when the target vector is a word vector), the target vector calculation module is configured to input only the sub-semantic-unit vectors into the predetermined prediction model to output the target vector. In embodiments with the identity vector assignment module (for example, when the target vector is a sentence vector), the target vector calculation module may be configured to input the identity vector together with all the sub-semantic-unit vectors into the predetermined prediction model to output the target vector.
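The target-vector computation can be sketched as follows. The element-wise mean used here is only a stand-in for the predetermined prediction model; in a doc2vec-style design the combining function would be a trained network rather than a mean:

```python
def predict_target_vector(identity_vec, sub_vecs):
    """Combine the identity vector (if any) with all sub-semantic-unit
    vectors into one target vector. Stand-in model: element-wise mean."""
    rows = ([identity_vec] if identity_vec is not None else []) + sub_vecs
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

subs = [[1.0, 2.0], [3.0, 4.0]]
print(predict_target_vector([0.0, 0.0], subs))  # sentence-vector case
print(predict_target_vector(None, subs))        # word-vector case, no identity vector
```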
In some embodiments, the ranking component 9000 further includes a similarity calculation module and a cluster centre ranking module. The similarity calculation module is configured to calculate the similarity between the semantic unit corresponding to each of the one or more cluster centres and each of the remaining semantic units among the plurality of semantic units. The cluster centre ranking module is configured to rank all the cluster centres based on the number of semantic units whose similarity exceeds a predetermined threshold. In some embodiments, the similarity calculation module is further configured to calculate the similarity between the semantic vector of the semantic unit corresponding to each of the one or more cluster centres and the semantic vector of each of the remaining semantic units among the plurality of semantic units.
The present application further provides a computer-readable storage medium containing a program which, when executed by a processor, performs the semantic unit clustering method described above.
In summary, the method for generating an interview report according to the embodiments of the present application enables one-click, autonomous generation of an interview report from an interview corpus based on a neural network. Converting the interview corpus into text data, preprocessing the interview corpus, extracting keywords, and the like can each be triggered by a corresponding command without manual processing, which saves interview analysis time, reduces the number of interviewers required, and lowers the cost of interviews.
It should be noted that, in the description of this specification, references to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", or "some examples" mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example.
While embodiments of the present application have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present application pertain.
The logic and/or steps represented in the flowcharts, or otherwise described herein, may be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fibre device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium could even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any of the following techniques known in the art, or a combination thereof, may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and so on.
Those skilled in the art will understand that all or part of the steps carried out by the methods of the above embodiments may be completed by instructing relevant hardware through a program, which may be stored in a computer-readable storage medium and which, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it is to be understood that these embodiments are exemplary and should not be construed as limiting the application; those skilled in the art may change, modify, substitute, and vary the above embodiments within the scope of the application.
Claims (10)
1. A voice content analysis method, characterized in that the method includes:
obtaining voice data;
obtaining corresponding text data based on the voice data;
inputting the text data into a trained sentiment analysis model, the trained sentiment analysis model including a trained sentiment extraction model and a trained sentiment classification model;
dividing the text data into polar sentiment text data and neutral sentiment text data by the trained sentiment extraction model;
dividing the polar sentiment text data into positive sentiment text data and negative sentiment text data by the trained sentiment classification model; and
obtaining a sentiment analysis result according to the positive sentiment text data and the negative sentiment text data.
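The two-stage pipeline recited above first separates polar from neutral text, then splits the polar portion into positive and negative. A minimal sketch follows, in which small hypothetical keyword lexicons stand in for the trained sentiment extraction and classification models:

```python
POLAR_WORDS = {"great", "love", "terrible", "hate"}  # hypothetical lexicon
POSITIVE_WORDS = {"great", "love"}                   # hypothetical lexicon

def analyse(sentences):
    """Stage 1: polar vs neutral; stage 2: positive vs negative.
    Lexicon lookups stand in for the two trained models."""
    polar = [s for s in sentences if POLAR_WORDS & set(s.split())]
    neutral = [s for s in sentences if s not in polar]
    positive = [s for s in polar if POSITIVE_WORDS & set(s.split())]
    negative = [s for s in polar if s not in positive]
    return {"positive": positive, "negative": negative, "neutral": neutral}

result = analyse(["i love it", "it arrived monday", "i hate the noise"])
print(result)
```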
2. The analysis method according to claim 1, characterized in that the text data is a sentence vector, and the sentence vector is obtained by the following steps:
obtaining an original text; and
for each complete sentence in the original text:
segmenting the complete sentence into clauses according to punctuation marks to obtain at least one short sentence; and
determining a sentence vector of the at least one short sentence.
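The clause-segmentation step above can be sketched with a punctuation-based split; the particular punctuation set is an assumption:

```python
import re

def split_short_sentences(complete_sentence):
    """Split a complete sentence at Chinese/English punctuation marks,
    dropping empty fragments, to obtain the short sentences."""
    parts = re.split(r"[，,。.！!？?；;]", complete_sentence)
    return [p.strip() for p in parts if p.strip()]

print(split_short_sentences("价格很高，服务不错。"))
```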
3. The analysis method according to claim 2, characterized in that determining the sentence vector of the at least one short sentence includes:
for each short sentence in the at least one short sentence,
determining word vectors of the short sentence based on a word2vec model; and
determining the sentence vector based on the word vectors of the short sentence.
4. The analysis method according to claim 2, characterized in that determining the sentence vector based on the word vectors of the short sentence includes:
determining the mean of the word vectors of the short sentence as the sentence vector.
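The mean-of-word-vectors computation recited above can be sketched directly (toy two-dimensional vectors for illustration):

```python
def sentence_vector(word_vectors):
    """Sentence vector as the element-wise mean of a short sentence's
    word vectors (e.g. word2vec embeddings)."""
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]

print(sentence_vector([[1.0, 3.0], [3.0, 5.0]]))
```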
5. The analysis method according to claim 2, characterized in that the method further includes:
deleting at least one of modal particles, stop words, and garbled characters in the original text.
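The deletion step above can be sketched as a token filter; the particle and stop-word sets are illustrative, not from the disclosure:

```python
import re

MODAL_PARTICLES = {"啊", "呢", "吧", "嘛"}  # illustrative set
STOP_WORDS = {"的", "了", "和"}             # illustrative set

def clean_tokens(tokens):
    """Drop modal particles and stop words; strip tokens containing
    garbled characters (replacement char or control bytes)."""
    kept = []
    for t in tokens:
        if t in MODAL_PARTICLES or t in STOP_WORDS:
            continue
        if re.search(r"[\ufffd\x00-\x08]", t):
            continue
        kept.append(t)
    return kept

print(clean_tokens(["价格", "的", "很", "高", "啊", "\ufffd"]))
```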
6. The analysis method according to claim 1, characterized in that the trained sentiment extraction model is obtained by the following steps:
obtaining annotated training data, the annotated training data including annotated neutral sentiment text data and annotated non-neutral sentiment text data;
inputting the annotated training data into an initial sentiment extraction model; and
determining the trained sentiment extraction model when the initial sentiment extraction model reaches a convergence condition after training.
7. The analysis method according to claim 1, characterized in that the trained sentiment classification model is obtained by the following steps:
obtaining annotated training data, the annotated training data including annotated positive sentiment text data and annotated negative sentiment text data;
inputting the annotated training data into an initial sentiment classification model for training; and
determining the trained sentiment classification model when the initial sentiment classification model reaches a convergence condition after training.
8. The analysis method according to claim 1, characterized in that the method further includes:
determining, according to the text data, the domain to which the content of the voice data belongs; and
determining and retrieving the trained sentiment analysis model according to the domain to which the content of the voice data belongs.
9. The analysis method according to claim 1, characterized in that the method further includes:
receiving a user input, the input including the domain to which the content of the voice data belongs; and
determining and retrieving the trained sentiment analysis model according to the domain to which the content of the voice data belongs.
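The domain-based retrieval of a trained model in claims 8 and 9 can be sketched as a registry lookup; the domain names and model identifiers are hypothetical:

```python
SENTIMENT_MODELS = {
    "finance": "finance_sentiment_model",  # hypothetical registry entries
    "retail": "retail_sentiment_model",
}

def pick_model(domain, registry=SENTIMENT_MODELS,
               default="general_sentiment_model"):
    """Retrieve the trained sentiment analysis model for the domain the
    voice content belongs to, falling back to a general model."""
    return registry.get(domain, default)

print(pick_model("finance"))
print(pick_model("medicine"))  # unknown domain falls back to the default
```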
10. A voice content analysis device, comprising:
at least one storage device including a set of instructions; and
at least one processor in communication with the at least one storage device, wherein, when executing the set of instructions, the at least one processor causes the analysis device to perform the method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910582433.5A CN110297907B (en) | 2019-06-28 | 2019-06-28 | Method for generating interview report, computer-readable storage medium and terminal device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110297907A true CN110297907A (en) | 2019-10-01 |
CN110297907B CN110297907B (en) | 2022-03-08 |
Family
ID=68029588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910582433.5A Active CN110297907B (en) | 2019-06-28 | 2019-06-28 | Method for generating interview report, computer-readable storage medium and terminal device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110297907B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761239A (en) * | 2013-12-09 | 2014-04-30 | 国家计算机网络与信息安全管理中心 | Method for performing emotional tendency classification to microblog by using emoticons |
KR101540683B1 (en) * | 2014-10-20 | 2015-07-31 | 숭실대학교산학협력단 | Method and server for classifying emotion polarity of words |
CN105183717A (en) * | 2015-09-23 | 2015-12-23 | 东南大学 | OSN user emotion analysis method based on random forest and user relationship |
CN106030642A (en) * | 2014-02-23 | 2016-10-12 | 交互数字专利控股公司 | Cognitive and affective human machine interface |
CN106782545A (en) * | 2016-12-16 | 2017-05-31 | 广州视源电子科技股份有限公司 | A kind of system and method that audio, video data is changed into writing record |
CN107704558A (en) * | 2017-09-28 | 2018-02-16 | 北京车慧互动广告有限公司 | A kind of consumers' opinions abstracting method and system |
CN108134876A (en) * | 2017-12-21 | 2018-06-08 | 广东欧珀移动通信有限公司 | Dialog analysis method, apparatus, storage medium and mobile terminal |
CN108845986A (en) * | 2018-05-30 | 2018-11-20 | 中兴通讯股份有限公司 | A kind of sentiment analysis method, equipment and system, computer readable storage medium |
CN109543180A (en) * | 2018-11-08 | 2019-03-29 | 中山大学 | A kind of text emotion analysis method based on attention mechanism |
CN109685560A (en) * | 2018-12-17 | 2019-04-26 | 泰康保险集团股份有限公司 | Big data processing method, device, medium and electronic equipment |
CN109918499A (en) * | 2019-01-14 | 2019-06-21 | 平安科技(深圳)有限公司 | A kind of file classification method, device, computer equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
宋佳颖 et al.: "Sentiment polarity classification based on word sentiment membership degree features", Journal of Peking University (Natural Science Edition) * |
杨艳 et al.: "A sentiment classification method based on a joint deep learning model", Journal of Shandong University (Natural Science Edition) * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021139108A1 (en) * | 2020-01-10 | 2021-07-15 | 平安科技(深圳)有限公司 | Intelligent emotion recognition method and apparatus, electronic device, and storage medium |
WO2021143014A1 (en) * | 2020-01-14 | 2021-07-22 | 北京明略软件系统有限公司 | Method and device for generating knowledge graph, and computer readable storage medium |
CN111259163A (en) * | 2020-01-14 | 2020-06-09 | 北京明略软件系统有限公司 | Knowledge graph generation method and device and computer readable storage medium |
CN111274807A (en) * | 2020-02-03 | 2020-06-12 | 华为技术有限公司 | Text information processing method and device, computer equipment and readable storage medium |
CN111639065B (en) * | 2020-04-17 | 2022-10-11 | 太原理工大学 | Polycrystalline silicon ingot casting quality prediction method and system based on batching data |
CN111639065A (en) * | 2020-04-17 | 2020-09-08 | 太原理工大学 | Polycrystalline silicon ingot casting quality prediction method and system based on batching data |
EP3962073A4 (en) * | 2020-06-29 | 2023-08-02 | Guangzhou Quickdecision Technology Ltd Co | Online interview method and system |
CN112446217A (en) * | 2020-11-27 | 2021-03-05 | 广州三七互娱科技有限公司 | Emotion analysis method and device and electronic equipment |
CN113688606A (en) * | 2021-07-30 | 2021-11-23 | 达观数据(苏州)有限公司 | Method for automatically writing document report |
CN114298025A (en) * | 2021-12-01 | 2022-04-08 | 国家电网有限公司华东分部 | Emotion analysis method based on artificial intelligence |
CN114004605A (en) * | 2021-12-31 | 2022-02-01 | 北京中科闻歌科技股份有限公司 | Invoice over-limit application approval method, device, equipment and medium |
CN115130581A (en) * | 2022-04-02 | 2022-09-30 | 北京百度网讯科技有限公司 | Sample generation method, training method, data processing method and electronic device |
CN116912845A (en) * | 2023-06-16 | 2023-10-20 | 广东电网有限责任公司佛山供电局 | Intelligent content identification and analysis method and device based on NLP and AI |
CN116912845B (en) * | 2023-06-16 | 2024-03-19 | 广东电网有限责任公司佛山供电局 | Intelligent content identification and analysis method and device based on NLP and AI |
Also Published As
Publication number | Publication date |
---|---|
CN110297907B (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110297907A (en) | Generate method, computer readable storage medium and the terminal device of interview report | |
CN110457466A (en) | Generate method, computer readable storage medium and the terminal device of interview report | |
US11645547B2 (en) | Human-machine interactive method and device based on artificial intelligence | |
CN107229610B (en) | A kind of analysis method and device of affection data | |
Bedi et al. | Multi-modal sarcasm detection and humor classification in code-mixed conversations | |
Wu et al. | Emotion recognition from text using semantic labels and separable mixture models | |
CN110457424A (en) | Generate method, computer readable storage medium and the terminal device of interview report | |
CN110297906A (en) | Generate method, computer readable storage medium and the terminal device of interview report | |
CN113704451B (en) | Power user appeal screening method and system, electronic device and storage medium | |
CN107301168A (en) | Intelligent robot and its mood exchange method, system | |
Millstein | Natural language processing with python: natural language processing using NLTK | |
CN108363725B (en) | Method for extracting user comment opinions and generating opinion labels | |
Bilquise et al. | Emotionally intelligent chatbots: A systematic literature review | |
CN110750648A (en) | Text emotion classification method based on deep learning and feature fusion | |
CN112562669A (en) | Intelligent digital newspaper automatic summarization and voice interaction news chat method and system | |
CN110852047A (en) | Text score method, device and computer storage medium | |
CN111339772B (en) | Russian text emotion analysis method, electronic device and storage medium | |
Yordanova et al. | Automatic detection of everyday social behaviours and environments from verbatim transcripts of daily conversations | |
Alías et al. | Towards high-quality next-generation text-to-speech synthesis: A multidomain approach by automatic domain classification | |
Fernandes et al. | Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs | |
Shawar et al. | A chatbot system as a tool to animate a corpus | |
Heaton et al. | Language models as emotional classifiers for textual conversation | |
CN117493548A (en) | Text classification method, training method and training device for model | |
CN110543559A (en) | Method for generating interview report, computer-readable storage medium and terminal device | |
Shang | Spoken Language Understanding for Abstractive Meeting Summarization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||