CN110297907A - Method for generating an interview report, computer-readable storage medium, and terminal device - Google Patents

Method for generating an interview report, computer-readable storage medium, and terminal device

Info

Publication number
CN110297907A
CN110297907A (application number CN201910582433.5A)
Authority
CN
China
Prior art keywords
sentence
model
text data
emotion
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910582433.5A
Other languages
Chinese (zh)
Other versions
CN110297907B (en)
Inventor
谭浩
李文良
彭盛兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910582433.5A
Publication of CN110297907A
Application granted
Publication of CN110297907B
Legal status: Active (granted)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/043 Distributed expert systems; Blackboards

Abstract

Method for generating an interview report, computer-readable storage medium, and terminal device. This application discloses a speech content analysis method. The method comprises: obtaining speech data; obtaining corresponding text data based on the speech data; inputting the text data into a trained sentiment analysis model, the trained sentiment analysis model comprising a trained emotion extraction model and a trained sentiment classification model; dividing, by the trained emotion extraction model, the text data into polar-emotion text data and neutral-emotion text data; dividing, by the trained sentiment classification model, the polar-emotion text data into positive-emotion text data and negative-emotion text data; and obtaining a sentiment analysis result according to the positive-emotion text data and the negative-emotion text data.

Description

Method for generating an interview report, computer-readable storage medium, and terminal device
Technical field
This application relates to the field of artificial intelligence, and more particularly to a method for generating an interview report, a computer-readable storage medium, and a terminal device.
Background technique
Interviews are a very important part of design research and play a significant role in every industry. Interviews take many forms, for example user interviews, expert interviews, joint interviews, and field interviews. In design research, interviews provide insight into users' true thoughts and into social and industrial trends.
A traditional interview is usually conducted by a team of two or more people: one person communicates with the user while another records audio and takes supplementary notes. To uncover what the user really thinks, the interview typically starts with careful greetings and warm-up questions, and the order of questions is adjusted according to the user's actual answers. The note-taker therefore has to extract the effective information quickly during the interview, recording and flagging key points in real time. After the interview, the full recording is replayed, the speech is transcribed into text, and the transcript is analyzed in detail together with the interview notes. According to the actual interview goals, the analysis then yields insights into user requirements, product pain points, industry opportunities, and other related content. All of these steps must be completed manually by the interview staff, and finishing the analysis of a one-hour interview takes roughly 5-6 hours. The high cost of interviews, the tedious and complex analysis, and the long turnaround time are highly significant problems in interviewing.
Summary of the invention
This application aims to solve at least one of the technical problems existing in the prior art. To this end, a first object of the application is to propose a method for generating an interview report; this method can reduce the cost and time of requirement analysis in interviews and is simpler to carry out.
A second object of the application is to propose a computer-readable storage medium.
A third object of the application is to propose a terminal device.
To achieve the above objects, a first aspect of the application provides a speech content analysis method. The method comprises: obtaining speech data; obtaining corresponding text data based on the speech data; inputting the text data into a trained sentiment analysis model, the trained sentiment analysis model comprising a trained emotion extraction model and a trained sentiment classification model; dividing, by the trained emotion extraction model, the text data into polar-emotion text data and neutral-emotion text data; dividing, by the trained sentiment classification model, the polar-emotion text data into positive-emotion text data and negative-emotion text data; and obtaining a sentiment analysis result according to the positive-emotion text data and the negative-emotion text data.
In some embodiments, the text data is a sentence vector, and the sentence vector is obtained by the following steps: obtaining the original text; for each complete sentence in the original text, splitting the complete sentence at punctuation marks to obtain at least one short sentence; and determining the sentence vector of the at least one short sentence.
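As an illustrative sketch (not code from the patent), the punctuation-based splitting of a complete sentence into short sentences might look like the following; the punctuation set and function name are assumptions:

```python
import re

# Hypothetical splitting rule: break a complete sentence into short
# sentences at Chinese and Western punctuation marks.
PUNCT = r"[，。！？；,.!?;]"

def split_short_sentences(sentence):
    """Split one complete sentence into short sentences at punctuation."""
    parts = [p.strip() for p in re.split(PUNCT, sentence)]
    return [p for p in parts if p]

print(split_short_sentences("早上起床可以用它做闹钟，有时候也可以问它天气。"))
# → ['早上起床可以用它做闹钟', '有时候也可以问它天气']
```

Each resulting short sentence would then be vectorized individually, as the following embodiments describe.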
In some embodiments, determining the sentence vector of the at least one short sentence comprises: for each short sentence of the at least one short sentence, determining the word vectors of the short sentence based on a word2vec model; and determining the sentence vector based on the word vectors of the short sentence.
In some embodiments, determining the sentence vector based on the word vectors of the short sentence comprises: determining the mean of the word vectors of the short sentence as the sentence vector.
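The mean-of-word-vectors step can be sketched as follows; the toy 4-dimensional vectors stand in for a trained word2vec model and are purely illustrative:

```python
import numpy as np

# Toy word-vector table standing in for a trained word2vec model;
# the words and 4-dimensional vectors are illustrative assumptions.
WORD_VECTORS = {
    "alarm":   np.array([0.9, 0.1, 0.0, 0.2]),
    "clock":   np.array([0.8, 0.2, 0.1, 0.1]),
    "weather": np.array([0.1, 0.9, 0.3, 0.0]),
}

def sentence_vector(words):
    """Sentence vector = element-wise mean of the word vectors of the
    short sentence, as in the embodiment; unknown words are skipped."""
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return np.zeros(4)
    return np.mean(vecs, axis=0)

print(sentence_vector(["alarm", "clock"]))  # → [0.85 0.15 0.05 0.15]
```

In practice the table would come from a word2vec model trained on the interview corpus or a general corpus; averaging is one common, simple pooling choice.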
In some embodiments, the method further comprises: deleting at least one of modal particles, stop words, and garbled characters from the original text.
In some embodiments, the trained emotion extraction model is obtained by the following steps: obtaining labeled training data, the labeled training data comprising labeled neutral-emotion text data and labeled non-neutral-emotion text data; inputting the labeled training data into an initial emotion extraction model for training; and when the initial emotion extraction model reaches a convergence condition after training, determining it to be the trained emotion extraction model.
In some embodiments, the trained sentiment classification model is obtained by the following steps: obtaining labeled training data, the labeled training data comprising labeled positive-emotion text data and labeled negative-emotion text data; inputting the labeled training data into an initial sentiment classification model for training; and when the initial sentiment classification model reaches a convergence condition after training, determining it to be the trained sentiment classification model.
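A minimal sketch of the two-stage cascade (emotion extraction first, then polarity classification) is shown below; the keyword lexicons are hypothetical stand-ins for the two trained models, used only to make the data flow concrete, not a substitute for training:

```python
# Hypothetical lexicons standing in for the two trained models:
# stage 1 separates polar from neutral text, stage 2 separates
# positive from negative within the polar text.
POLAR_WORDS = {"like", "hate", "love", "terrible", "great"}
POSITIVE_WORDS = {"like", "love", "great"}

def extract_polar(sentences):
    """Stage 1 (emotion extraction): polar vs. neutral sentences."""
    polar = [s for s in sentences if POLAR_WORDS & set(s.split())]
    neutral = [s for s in sentences if not POLAR_WORDS & set(s.split())]
    return polar, neutral

def classify_polarity(polar_sentences):
    """Stage 2 (sentiment classification): positive vs. negative."""
    positive = [s for s in polar_sentences if POSITIVE_WORDS & set(s.split())]
    negative = [s for s in polar_sentences if not POSITIVE_WORDS & set(s.split())]
    return positive, negative

sentences = ["i love this speaker", "it has two microphones",
             "the battery is terrible"]
polar, neutral = extract_polar(sentences)
pos, neg = classify_polarity(polar)
print(pos)  # → ['i love this speaker']
print(neg)  # → ['the battery is terrible']
```

The cascade matters because neutral text is filtered out before polarity is decided, so the second model only ever sees emotionally loaded sentences.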
In some embodiments, the method further comprises: determining, according to the text data, the domain to which the content of the speech data belongs; and determining and invoking the trained sentiment analysis model according to the domain to which the content of the speech data belongs.
In some embodiments, the method further comprises: receiving a user input, the input comprising the domain to which the content of the speech data belongs; and determining and invoking the trained sentiment analysis model according to the domain to which the content of the speech data belongs.
A further aspect of the application provides a speech content analysis apparatus, comprising: at least one storage device, the storage device comprising a set of instructions; and at least one processor in communication with the at least one storage device, wherein, when executing the set of instructions, the at least one processor causes the apparatus to perform the aforementioned method.
To achieve the above objects, a third aspect of the application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the aforementioned method.
To achieve the above objects, a fourth aspect of the application provides a terminal device comprising: a processor; and a memory communicatively connected to the processor; wherein the memory stores instructions executable by the processor, and the instructions, when executed by the processor, cause the processor to perform the aforementioned method.
Additional aspects and advantages of the application will be set forth in part in the following description; in part they will become obvious from the description, or may be learned through practice of the application.
Detailed description of the invention
The above and additional aspects and advantages of the application will become obvious and readily understood from the following description of the embodiments in conjunction with the drawings, in which:
Fig. 1 is a flowchart of the method for generating an interview report according to an embodiment of the application;
Fig. 2 is a schematic diagram of the neural network model training process according to an embodiment of the application;
Fig. 3 is a schematic diagram further explaining the training process of Fig. 2;
Fig. 4 is a block diagram of the terminal device according to an embodiment of the application;
Fig. 5 is a flowchart of the intelligent speech data requirement extraction process according to an embodiment of the application;
Fig. 6 is a flowchart of the construction process of the requirement estimation model and the feature lexicon according to an embodiment of the application;
Fig. 7 is a block diagram of the modules of the intelligent speech data requirement extraction system according to an embodiment of the application;
Fig. 8 is a flowchart of the usability determination process according to an embodiment of the application;
Fig. 9 is a flowchart of the construction process of the usability judgment model according to an embodiment of the application;
Fig. 10 is a block diagram of the usability extraction system according to an embodiment of the application;
Fig. 11 is a flowchart of the speech content analysis method according to an embodiment of the application;
Fig. 12 is a flowchart of the semantic unit clustering method according to an embodiment of the application;
Fig. 13 is a flowchart of determining one or more cluster centers based on the pairwise similarities of multiple semantic units according to an embodiment of the application;
Fig. 14 is a flowchart of separately calculating the similarity between each candidate semantic unit and each of the remaining semantic units according to an embodiment of the application;
Fig. 15 is a flowchart of calculating the candidate semantic vector of each candidate semantic unit according to an embodiment of the application;
Fig. 16 is a flowchart of calculating the candidate semantic vector of each candidate semantic unit according to another embodiment of the application;
Fig. 17 is a schematic diagram of the semantic unit clustering apparatus according to an embodiment of the application.
Specific embodiment
Embodiments of the application are described in detail below. The embodiments described with reference to the drawings are exemplary.
The method for generating an interview report of the embodiments of the application combines artificial intelligence with design research techniques to build an autonomous interview research process with fewer steps and less time. It can help designers and others complete the collection, analysis, and reporting of interview data autonomously, reducing the labor cost of interview analysis, shortening the time consumed by it, and being simple and easy to implement.
The method for generating an interview report according to embodiments of the first aspect of the application is described below with reference to Figs. 1-3.
Fig. 1 is a flowchart of the method for generating an interview report according to an embodiment of the application. As shown in Fig. 1, the method of the embodiment comprises steps S1 to S4.
Step S1: obtain the interview corpus.
Specifically, in an embodiment, the method of the embodiments of the application may be loaded as an application program on a terminal device such as a smartphone, tablet computer, or laptop. The human-computer interaction window of the terminal device may present the application icon associated with the method. In response to a program launch instruction, the human-computer interaction interface provides a recording-start trigger unit, and in response to a trigger instruction on that unit the recording module of the terminal device itself collects the interview corpus. In short, operating the recording-start trigger unit when the interview begins starts the recording, and the interview corpus can thus be obtained.
In other embodiments, the interview may also be recorded by other recording devices, for example a microphone or microphone array, a voice recorder, or a recording pen, and the interview corpus is then transferred to the terminal device loaded with the application program of the method of the embodiments of the application. The interview corpus is thereby obtained and then analyzed autonomously, which likewise shortens the time of interview analysis and saves labor cost.
Step S2: in response to an interview report generation instruction, analyze the interview corpus with the neural network model to obtain the interviewee's expression information.
Specifically, after the interview is completed, the human-computer interaction interface of the terminal device provides an interview-report-generation trigger unit, or a corresponding key or knob is provided. When the trigger unit receives a trigger instruction, the interview report generation instruction is received, and the processor of the terminal device can analyze the interview corpus and autonomously generate the interview report.
In some embodiments, the original interview corpus requires a series of processing steps to facilitate subsequent analysis, for example preprocessing and word segmentation of the original corpus using the neural network model, where preprocessing may include, for example, recognizing the interview corpus to convert it into text data, and splitting and cleaning the text data.
Specifically, neural networks have a mechanism of training and learning that continuously optimizes their results, so a neural network model can be constructed through neural network training. The training of the neural network model is first briefly described.
Fig. 2 is a schematic diagram of the neural network model training process according to an embodiment of the application. As shown in Fig. 2, an expert system may provide an open algorithm-optimization interface for expert users. The main function of the expert system is to construct, for the customized functions of a product, the training set for the project's core-feature neural network; when the feature parameters are trained to generate the neural network model, the content of the training set determines the final result. Experts' prior knowledge is input into the expert system to construct the training set, and the content of the training set is input to the neural network algorithm model. The experts can evaluate and correct the output of the neural network algorithm model trained on that training set, thereby optimizing the expert knowledge; the optimized knowledge is input back into the expert system, the training set is revised accordingly and input to the neural network algorithm model again, and the iteration loops in this way until the output of the neural network model approaches the optimal solution.
Further, Fig. 3 is a schematic diagram further explaining the neural network model training process of Fig. 2. Specifically, the training set can be constructed jointly by a team, combining methods from linguistics, psychology, and design science to mine the compositional features of the raw database, with the standard format of the training text determined according to the constraints of the research findings that regulate the neural network. As shown in Fig. 3, on the expert-system side, the experts' prior knowledge is input into the expert-knowledge recording module to constrain how features are determined, for example yes/no judgments or judgments of degree; key information such as keywords is annotated, and descriptive information such as tone or grammar is annotated, thereby building the training set. On the algorithm-model side, the training-set content is input into the algorithm model for training, for example semantic analysis is performed and analysis results are generated; the analysis results are corrected based on the experts' assessment of them, the training set is thereby augmented, and its content is continuously optimized to construct the required neural network model.
In the embodiments of the application, the neural network model can be constructed through the above process based on the characteristics of the interview corpus and on expert knowledge. The interview corpus is analyzed by the neural network model to obtain the various kinds of the interviewee's expression information contained in it, such as emotion information, requirement information, anticipation of industry trends, opinions on controversial issues, and usability information, where different neural network models are constructed to obtain different kinds of the interviewee's expression information.
Step S3: generate the interview report according to the interviewee's expression information.
For example, a user survey is conducted about some product, with the aim of judging user requirements or attitudes toward the product through the survey and analysis, so as to improve the product. Specifically, during the user interview the recording trigger unit is operated to start recording and obtain the interview corpus. After the interview is completed, the interview report generation instruction is input and the interview corpus is analyzed by the neural network model: a set of text keywords is obtained to form an overall description of the interview, and the interviewee's emotion information, requirement information, and/or usability information about the product is obtained. From the emotion, requirement, and/or usability information it can be judged what the user is satisfied or dissatisfied with about the product, what functions the user prefers or wishes the product had, and even what the user usually likes, so that an interview report covering the user's preferences and the product's advantages and disadvantages is generated based on the analysis results. The interview report can then be consulted to perfect the product and enhance its performance.
In some embodiments, a requirement refers to an expectation or wish that the user expresses from their own point of view. From requirement-class corpus, information such as the user's motivation, the functions the user prefers or wishes the product had, and suggestions or opinions about the product can be obtained, in order to guide product design and gain insight into the industry and market. An example of a requirement-class sentence is: I wish home appliances could match the decoration style of my home.
In some embodiments, usability refers to the effectiveness, efficiency, and satisfaction a user experiences when using a product in a specific usage scenario to reach a specific goal. Specifically, effectiveness refers to the correctness and completeness with which the user accomplishes the specific goal; efficiency refers to how efficiently the user accomplishes the specific goal, which is inversely proportional to the resources (such as time) consumed; satisfaction refers to the subjective satisfaction the user experiences when using the product. In some embodiments, usability has five indicators: learnability, memorability, error tolerance, interaction efficiency, and user satisfaction. Only a product that reaches a good level on each indicator has high usability. By performing usability analysis and extraction on the interview corpus, the product's advantages and disadvantages can be obtained, so as to optimize the product and improve its performance. An example of a usability sentence is: the speech recognition of this smart speaker is not good at all. Another example of a usability sentence is: I think this smart speaker is also very easy to operate. In some embodiments, usability information expresses the user's impressions of the product experience, for example the positive or negative attitude the user holds toward the product's usability or ease of use in some respect, or the user's suggestions or opinions for improving the product.
In some embodiments, sentiment analysis refers to the process of analyzing, processing, summarizing, and reasoning over subjective, emotionally colored text. By analyzing the original interview text, the user's preferences and emotional attitudes toward the product and the environment can be obtained, giving insight into the user's inner activity.
In some embodiments, requirement information may include product requirement information and personal requirement information. Specifically, product requirement information mainly expresses the user's requirements for the functions or qualities of a specific product, in other words the functions the user prefers or wishes the product had; for example, the user wants a specific phone model to have a narrow bezel, dual cameras, dual SIM dual standby, and so on. Personal requirement information mainly expresses the user's own needs; the object of this information is generally not limited to a specific product and can concern any aspect, for example, the user wants a new phone, a new pair of sneakers, a movie ticket, and so on. In some embodiments, emotion polarity information may include product emotion polarity information and personal emotion polarity information. Specifically, product emotion polarity information mainly expresses the user's likes and dislikes regarding a specific function or quality of a specific product, or what the user is satisfied or dissatisfied with about the product; for example, the user likes a particular case color of a specific phone model, dislikes the notch, dislikes the protruding camera, and so on. Personal emotion polarity information mainly expresses the user's likes and dislikes regarding other objects; the object of this information is generally not limited to a specific product and can concern any aspect, for example, the user likes playing basketball, likes digital products, or does not like watching movies.
According to the method for generating an interview report of the embodiments of the application, in response to an interview report generation instruction, the interview corpus can be analyzed by the neural network model on its own to obtain the interviewee's expression information, and the interview report is then generated from that information. In other words, one-key generation of the interview report is achieved, which saves manpower and material resources, reduces interview cost, and shortens the time spent on interviews.
In some embodiments, generating the interview report according to the interviewee's expression information comprises: calculating the pairwise similarities of at least one class of sentences among the interviewee's requirement sentences, non-requirement sentences, positive-emotion sentences, negative-emotion sentences, and usability sentences; clustering according to the similarities and obtaining cluster centers; and generating the interview report according to the semantics of the sentences contained in the cluster centers.
Specifically, after the interviewee's expression information contained in the interview corpus is obtained, the sentence vectors of each class of expression information are calculated to form sentence-vector groups, and similarity is computed among the sentence vectors of a group, for example using the HANLP toolkit. Clustering is then performed according to the similarity results, for example using the AP (affinity propagation) clustering algorithm or other algorithms, and the cluster centers are determined (for the clustering method, reference may be made to the detailed description of the following embodiments). The sentences corresponding to the cluster centers largely reflect the interviewee's likes and dislikes or attitude toward the product, so the interview report can be generated accordingly, and the user can consult the report to perfect the product's functions and performance.
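A simplified stand-in for the similarity-based clustering might look like the sketch below. Real affinity propagation exchanges responsibility/availability messages between points; this greedy variant only illustrates how cosine similarity over sentence vectors yields cluster centres, and the threshold value is an assumption:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_cluster(vectors, threshold=0.8):
    """Greedy stand-in for affinity propagation: each unassigned vector
    starts a cluster; a later vector joins the first centre it is
    similar enough to, otherwise it becomes a new centre."""
    centres, labels = [], []
    for v in vectors:
        for idx, c in enumerate(centres):
            if cosine_sim(v, c) >= threshold:
                labels.append(idx)
                break
        else:
            centres.append(v)
            labels.append(len(centres) - 1)
    return centres, labels

vecs = [np.array([1.0, 0.0]), np.array([0.95, 0.05]),
        np.array([0.0, 1.0]), np.array([0.1, 0.9])]
centres, labels = greedy_cluster(vecs)
print(len(centres), labels)  # → 2 [0, 0, 1, 1]
```

For production use, `sklearn.cluster.AffinityPropagation` implements the actual AP algorithm, which selects exemplars (centre sentences) without fixing the number of clusters in advance, matching the use case here.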
Further, keywords can be extracted from the interview corpus and key points can be flagged, so that key points are easier to find in transcripts, reports, and other materials. Specifically, a flagging trigger unit can be provided on the human-computer interaction interface of the terminal device, for example a highlight button, or a corresponding touch or mechanical key can be provided on the terminal device. When marking keywords, the sentences in the text data are weighted and ranked; candidate keywords are preselected according to the ranking results, the stop words contained in the candidate keywords are filtered out according to a stop-word list, the keywords in the text data are obtained, and the keywords are output to form a text keyword set that gives an overall description of the interview. Stop words can be entered manually, and the generated stop words form a stop-word list. For key-point flagging, in response to a key-point flagging instruction, the weight of the corresponding sentence in the text data is increased and the sentence is flagged as a key sentence, making key points easier to find during subsequent browsing.
Specifically, a word-segmentation toolkit can be used to perform lexical analysis on the text data, obtaining short sentences and removing the modal particles in them. Some segmentation toolkits use a prefix dictionary to achieve efficient word-graph scanning, generating a directed acyclic graph of all possible word formations of the Chinese characters in a sentence, then use dynamic programming to find the maximum-probability path, i.e., the maximum cutting combination based on word frequency; for out-of-vocabulary words, an HMM model based on the word-forming ability of Chinese characters is used together with the Viterbi algorithm. Keyword extraction means automatically extracting several meaningful words or phrases from a given piece of text data. In some embodiments, the TextRank algorithm can be used. TextRank is a graph-based ranking algorithm for text data: the text data is divided into several constituent units and a graph model is built, the important components of the text are ranked by a voting mechanism, and keyword extraction is achieved using the single document alone.
For example, the basic steps of keyword extraction may include: (1) segmenting the given text data T by complete sentences; (2) for each sentence, performing word segmentation and part-of-speech tagging, filtering out stop words, and keeping only words of specified parts of speech such as nouns, verbs, and adjectives; the words kept are the candidate keywords; (3) constructing a candidate keyword graph G = (V, E), where V is the node set consisting of the candidate keywords generated in (2), and edges between nodes are constructed from co-occurrence relations: an edge exists between two nodes only if the corresponding words co-occur within a window of length K, where K is the window size, i.e., at most K words co-occur; (4) iteratively propagating the weight of each node until convergence; (5) sorting the node weights in descending order to obtain the T most important words as candidate keywords; (6) marking the T most important words obtained in (5) in the original text; if adjacent words form a phrase, they are combined into a multi-word keyword.
An example is given below.
Original short sentence: when I get up in the morning I can use it as an alarm clock, and sometimes I can also ask it about the weather
Candidate keywords: morning, get up, alarm clock, weather, can, when
Stop words: can, when
Final keywords: get up, alarm clock, weather, morning
In short, the method for generating an interview report of the embodiments of the present application can mark the key points of a sentence or passage with a single key press, making the key points easy to grasp during subsequent processing of the text data. It involves fewer manual steps, is simple and convenient, and saves labor time.
Further, in some embodiments, visual information can be generated from the interview report and provided to the user. For example, the interview report may be presented to the user as a bar chart, pie chart, line chart, or a combination of such forms, so that the user can understand the content and key information of the interview report more intuitively.
In some embodiments, after the interview is completed, an edit trigger unit can also be provided on the human-computer interaction interface of the terminal device, so that the interview report can be edited in response to an edit instruction, for example adding interviewee information, or modifying the converted text data, the analysis results and the report content, which is more flexible. Further, the user can annotate the interview report results according to his or her own needs. After the user finishes editing, the terminal device records the edited content, obtains the annotation data from it, and compares the annotation data with a set threshold. When the annotation data reaches the preset annotation threshold, the annotation data is fed back to the corpus database of the neural network model; alternatively, the annotation data is fed back to the corpus of the neural network model at a set time period, e.g., every 5 days or every 15 days, in order to optimize the neural network model. That is, the neural network model is adaptive, so that the results produced by the neural network model come closer to the results the user expects.
In some embodiments, after the interview report is generated, an output trigger unit can be provided on the human-computer interaction interface of the terminal device, so that in response to an output instruction the interview report is output to a terminal such as a smartphone, a PC, or a notebook computer, making it convenient to check the analysis results and export the interview report anytime and anywhere.
Based on the method for generating an interview report of the above embodiments, an embodiment of the second aspect of the present application also proposes a computer storage medium. The computer-readable storage medium stores computer-executable instructions, which are configured to execute the method for generating an interview report of the above embodiments.
Based on the method for generating an interview report of the above embodiments, a terminal device according to an embodiment of the third aspect of the present application is described below.
Fig. 4 is a block diagram of the terminal device according to an embodiment of the present application. As shown in Fig. 4, the terminal device 100 of the embodiment of the present application includes a processor 10 and a memory 20.
The memory 20 is communicatively connected to the processor 10, and the memory 20 stores instructions executable by the processor 10. When the instructions are executed by the processor 10, the processor 10 is caused to execute the method for generating an interview report of the above embodiments. For the method for generating an interview report, reference may be made to the description of the above embodiments.
Specifically, the terminal device 100 may include, but is not limited to, a terminal such as a smartphone, a PC or a tablet computer. A trigger unit can be provided in the terminal device 100: for example, a human-computer interaction interface is provided on which an interview-report-generation trigger unit is set, or a touch key or mechanical key is set. When a trigger instruction is received, the above method for generating an interview report is executed autonomously, realizing one-key operation that is simple and efficient. With fewer operation steps and less time, an autonomous interview research process can be constructed, reducing interview costs and shortening the time consumed by an interview.
The overall process of generating an interview report of the embodiments of the present application has been described above. Below, the method for generating an interview report of the embodiments of the present application is described by taking as an example the analysis of the interview corpus by neural network models to obtain the interviewee's emotion information and demand information; that is, the sentiment analysis algorithm and the demand extraction algorithm are described in further detail, and the clustering process is also described in further detail.
In the embodiments of the present application, the performance information of the interviewee may include one or both of demand information and emotion information. In some embodiments, for demand information, after the original corpus information is preprocessed, the interview corpus is input into a first neural network model, obtaining the sentences in the interview corpus that reflect the interviewee's demands and the sentences that do not, so that the interviewee's demand information is extracted. For example, an SVM (Support Vector Machine) classifier can be used to realize the binary classification of demand versus non-demand. Further, in some embodiments, the interviewee's non-demand sentences are regarded as candidate neutral sentences: the non-demand sentences can be input into a second neural network model to extract, from among them, the sentences reflecting the interviewee's polar emotion; the polar-emotion sentences are then input into a third neural network model to obtain the positive-emotion sentences and negative-emotion sentences among them. For example, two cascaded TextCNN classifiers are used: one realizes the binary classification of polar-emotion sentences versus neutral-emotion sentences, and the other further classifies the polar-emotion sentences; the corpora used when training the algorithm models of the two TextCNN classifiers are different.
The demand extraction process for intelligent voice data of the embodiments of the present application is described in detail with reference to Fig. 5 to Fig. 7.
Fig. 5 shows a flowchart of the demand extraction process for intelligent voice data according to an embodiment of the present application.
In step 101, voice data is obtained. Specifically, in step 101 a voice signal is obtained by a recording device such as a microphone or microphone array, a recorder, or a recording pen.
In step 102, the obtained voice data is preprocessed to obtain text data for analysis. Preprocessing refers to the process of loading the voice data into memory and adding, deleting or changing parts of the words as needed. Preprocessing includes: recognition, which refers to recognizing the voice data as character data to form text data; splitting, which refers to splitting a long sentence, delimited by full stops in the text data, into short sentences according to the punctuation marks indicating pauses within the long sentence; and purification, which refers to removing from the text data invalid content that is unrelated to the interview content in the original voice data. In some embodiments, preprocessing includes comment cleaning, word segmentation, part-of-speech tagging and dependency syntax parsing.
In some embodiments, speech recognition converts the lexical content of human speech into computer-readable text. For example, the speech waveform can be cut into small segments of a certain duration (for example, 0.05 seconds), each segment being called a frame, so that a certain number of frames (for example, 20) are obtained within a certain time (for example, 1 second). From each frame, information reflecting the essential features of the speech is extracted (removing from the speech signal the redundancy that is useless for speech recognition, while also achieving dimensionality reduction); the feature of each frame of the waveform is thus extracted, yielding the feature vector of that frame. A phoneme is the smallest phonetic unit divided according to the natural properties of speech. A state is a phonetic unit finer than a phoneme; a phoneme is usually divided into three states. Speech recognition is realized by recognizing frames as states, combining states into phonemes, and combining phonemes into words.
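The framing step described above can be sketched in a few lines. This is an illustrative sketch only: the sample rate and frame length are the example values from the paragraph, not values fixed by the patent, and real front ends additionally apply overlapping windows and feature extraction (e.g. MFCCs), which are omitted here.

```python
def split_into_frames(samples, sample_rate=16000, frame_seconds=0.05):
    """Cut a waveform (a list of samples) into consecutive frames of
    frame_seconds each; a trailing partial frame is dropped."""
    frame_len = int(sample_rate * frame_seconds)   # samples per frame
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]
```

With 0.05-second frames, one second of audio yields the 20 frames mentioned in the text.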
In some embodiments, speech recognition may include real-time speech recognition and offline speech recognition. During the interview, speech can be converted to text in real time for the interviewer to check. After the interview, the entire interview process can be reviewed through playback.
Specifically, a third-party speech-to-text development platform can be used: by downloading the platform's SDK (Software Development Kit), the function of converting the interview corpus into text data is completed based on the SDK. In the embodiments of the present application, real-time speech recognition and offline speech recognition can run in parallel: the speech-to-text results can be checked in real time during the interview, and the entire interview process can also be reviewed through playback after the interview.
In some embodiments, before speech recognition, the interviewer or interviewee can establish a proprietary-domain dictionary in addition to general terms. By storing uncommon specialized vocabulary in the dictionary and uploading the dictionary to the speech recognition module, the accuracy of the speech recognition results can be improved. For example, the word for "trend" originally means the movement of water caused by tides, refers to a fashion trend in the field of sociology, and refers to the distribution of voltage, current and power across a power grid in the power industry. As another example, uncommon specialized vocabulary in the interview field can likewise be uploaded to the dictionary of the speech-to-text platform, so that when text conversion is performed these terms can be recognized more efficiently. In short, using the proprietary-domain dictionary corresponding to the field of the interview topic can further improve the recognition accuracy.
In some embodiments, through the speech recognition process, text data corresponding to the voice data can be obtained. In some embodiments, the format of the text data may include (but is not limited to) Microsoft OFFICE documents such as TXT, WORD or EXCEL, documents in WPS format, or files in other formats used for word processing.
In some embodiments, splitting refers to splitting a long sentence, delimited by full stops in the text data, into short sentences according to the punctuation marks indicating pauses within it, so that each short sentence can be analyzed independently. In some embodiments, the method splits long sentences at "," (comma), "、" (enumeration comma) or ";" (semicolon). In some embodiments, a long sentence may be, for example, "To give an analogy, I feel a smart speaker should be simple and elegant, its color should not be gaudy, and it should be simple and convenient to use." After splitting, the following short sentences are formed: 1. To give an analogy; 2. I feel a smart speaker should be simple and elegant; 3. Its color should not be gaudy; 4. It should be simple and convenient to use. Compared with long sentences, using short sentences as the object of analysis simplifies the system's computation, reduces the amount of computation, and improves efficiency.
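The splitting rule above can be sketched as a one-line regular-expression split. This is a minimal illustrative sketch, assuming the pause punctuation set listed in the paragraph (ASCII and Chinese commas, enumeration commas, semicolons); the patent itself does not prescribe an implementation.

```python
import re

def split_long_sentence(long_sentence):
    """Split a long sentence into short sentences at pause punctuation
    (commas, enumeration commas and semicolons, ASCII or Chinese)."""
    parts = re.split(r"[,;，、；]", long_sentence)
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]
```

Each returned short sentence can then be analyzed independently, as described above.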
In some embodiments, purification refers to removing from the text data invalid content unrelated to the content of the original voice data. In some embodiments, due to the colloquial nature of interviews, modal particles or interjections are inevitably present in the text data. In some embodiments, garbled characters may be produced during the conversion from speech to text. These modal particles or garbled characters are unrelated to the interview content and are useless for demand extraction.
Specifically, to address the colloquial nature of interviews, lexical analysis is performed on the interview text with a word segmentation tool to determine the word content and part of speech in each short sentence, and the modal particles in the sentences are removed, so as to determine the more meaningful vocabulary in each utterance and facilitate the analysis of the interview. There are a variety of word segmentation tools, such as Pangu, Yaha, Jieba and Tsinghua THULAC. Taking a lexical analysis interface calling jieba segmentation as an example, an original sentence after sentence splitting is: "Um, welcome to participate in our interview on the user experience of this smart speaker." After jieba performs lexical analysis and the modal particle is removed, the sentence is: "Welcome to participate in our interview on the user experience of this smart speaker." In this way, short-sentence expressions with substantive meaning can be obtained.
In step 103, word segmentation is performed on each short sentence in the preprocessed text data. Word segmentation divides a continuous character string into individual words according to certain logic. In some embodiments, segmentation can be performed using the maximum matching method, the reverse maximum matching method, the bidirectional matching method, the best matching method, the association-backtracking method, and so on. In some embodiments, the user can select exact segmentation, or can choose to list all words that may occur. After word segmentation, each text becomes a text corpus composed of words separated by spaces.
In step 104, based on the feature lexicon, the sentence vector corresponding to each short sentence in the segmented text data is obtained. In some embodiments, the feature lexicon is a two-dimensional matrix composed of feature words and feature values. A feature word is a word in the corpus that has a high likelihood of marking the target corpus as a demand. A feature value is a mathematical expression of the likelihood that a feature word is classified as a demand. The corpus can be a database containing complete corpora, such as the 1998 People's Daily corpus, or an existing corpus of a specific domain can be used. The sentence vector corresponds to each short sentence in the text data and is a matrix composed of the lookup results, in the corpus and the feature lexicon, of the words in the short sentence.
In some embodiments, each word in a segmented short sentence is looked up in the corpus and in the feature lexicon respectively. If the word did not occur in the corpus, the lookup result is 0; if the word occurred in the corpus but did not occur in the feature lexicon, the lookup result is 1; if the word appears in the feature lexicon, the lookup result is 2. The sentence vector corresponding to the short sentence is thus formed. Since the short sentences in demand analysis have distinct characteristics and coarse granularity (that is, a short sentence is either a demand statement or a non-demand statement, and short sentences whose class cannot be determined are few), practical testing shows that this sentence-vector generation method achieves good classification results.
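The 0/1/2 lookup scheme above can be sketched directly. This is an illustrative sketch under the assumption that the corpus vocabulary and feature lexicon are available as sets; the example words are hypothetical, not from the patent.

```python
def sentence_vector(words, corpus_vocab, feature_lexicon):
    """Map each word of a segmented short sentence to a lookup code:
    0 = absent from the corpus, 1 = in the corpus but not a feature word,
    2 = present in the feature lexicon."""
    vec = []
    for w in words:
        if w in feature_lexicon:
            vec.append(2)
        elif w in corpus_vocab:
            vec.append(1)
        else:
            vec.append(0)
    return vec
```

The resulting vector is what step 105 feeds into the demand judgment model.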
In step 105, the sentence vector is input into the demand judgment model. In some embodiments, the demand judgment model is configured to output a judgment result according to the input semantic-unit vector (for example, the sentence vector). In some embodiments, the judgment result can be a value indicating whether the sentence belongs to the demand class.
In step 106, according to the output result of the judgment model, it is determined which class the short sentence is assigned to. In some embodiments, the output result can be 1 or 0: when the output result is 1, the short sentence is classified as a demand sentence; when the output result is 0, the short sentence is judged to be a non-demand sentence. In some embodiments, the output result may be the probability that the short sentence is a demand statement, e.g. 0.7, together with the probability that it is a non-demand statement, e.g. 0.3; in that case, the short sentence is finally judged to be a demand statement.
In some embodiments, all sentences expressing demands are clustered. For the detailed steps of the clustering process, reference may be made to the description of the embodiments below.
In some embodiments, polar sentiment analysis is performed on the part judged to be non-demand statements. The granularity of polar sentiment analysis is finer than that of demand analysis, and an accuracy of 90% or more cannot be reached using an SVM; polar sentiment analysis therefore uses a convolutional neural network (CNN) classifier. In some embodiments, usability analysis can also be performed on the part judged to be non-demand statements.
In some embodiments, a demand refers to an expectation about something shown by the interviewee during the interview. For example, when the interview subject is a smart speaker, the extracted demands can be expressed as: faster response speed; suggest optimizing the appearance design; the appearance should be fashionable and elegant; the color should not be gaudy; the cabinet lines should be smooth; and so on. After clustering, the themes of this interview can be derived, for example: faster response speed; suggest optimizing the appearance design. In some embodiments, during the interview the user may unintentionally provide information partially related or completely unrelated to the current interview subject; for example, the user may evaluate a competitor of the interview subject, or reveal demand information about other aspects unrelated to the field of the interview subject.
In some embodiments, polar emotion refers to the sentiment orientation of the interviewee; for example, it can be divided into positive, negative and neutral. Specifically, positive emotion expresses the advantages of the product and the content the interviewee likes or is satisfied with; negative emotion expresses the shortcomings and usability problems of the product and the content the interviewee dislikes or is dissatisfied with; neutral emotion expresses content with a neutral stance. For example, when the interview subject is a smart speaker, polar emotion may include: "It can bring me some convenience to a certain extent" (positive emotion); "Actually I would not be influenced by it" (negative emotion); "I feel that, on the one hand..." (neutral emotion). Analyzing polar emotion on non-demand statements rather than on the entire interview content can improve the efficiency and accuracy of polar sentiment analysis. In some embodiments, the sentiment orientation can be toward the product itself, or toward other aspects besides the product. For example, during the interview the user may unintentionally provide information partially related or completely unrelated to the current interview subject; for instance, the user may evaluate a competitor of the interview subject, or reveal sentiment-orientation information about other aspects unrelated to the field of the interview subject.
The construction and training process of the demand judgment model and the feature lexicon is described in detail below in conjunction with Fig. 6.
Fig. 6 shows a flowchart of the construction and/or training process of the demand judgment model and the feature lexicon according to an embodiment of this specification. In some embodiments, the construction and/or training process is carried out manually. In some embodiments, the construction and/or training process is completed by a computer program.
In step 201, voice data is obtained. In step 202, the voice data is preprocessed. Steps 201 and 202 are similar to steps 101 and 102 above. In step 203, feature annotation is performed on the text data using an expert annotation database. For example, the annotated text data may take the form "X: sentence", where X can be 0 (indicating non-demand) or 1 (indicating demand). In step 204, the annotated text data is segmented into words. Step 204 is similar to step 103 above.
In step 205, the segmented text data is input into a classifier to train the demand judgment model. In some embodiments, the classifier is a support vector machine (SVM) classifier. The SVM classifier is a classical binary classification model; its classification effect is remarkable when the features are distinct and clear, and it performs well for coarse-grained demand analysis. The basic model of the SVM classifier is a linear classifier defined in feature space that maximizes the margin between the two classes. The SVM classifier can also include a kernel function, which has the function of mapping low-dimensional data into high-dimensional data. By introducing a kernel function, a linearly inseparable problem can be converted into a separable one, which allows this essentially linear classifier to be applied to linearly inseparable data.
In some embodiments, the method selects a linear kernel function, for example k(x1, x2) = x1ᵀx2. In some embodiments, other kernel functions, such as nonlinear kernel functions like the polynomial kernel or the radial basis function (RBF) kernel, can be selected according to the size of the text data and other factors.
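As an illustration, the two kinds of kernel mentioned here can be written as plain functions. This sketches the kernel formulas only, not the patent's SVM training procedure; the gamma value is an arbitrary illustrative assumption.

```python
import math

def linear_kernel(x1, x2):
    """k(x1, x2) = x1^T x2 -- the linear kernel selected in this embodiment."""
    return sum(a * b for a, b in zip(x1, x2))

def rbf_kernel(x1, x2, gamma=0.5):
    """Radial basis function kernel, one of the nonlinear alternatives;
    implicitly maps the inputs into a higher-dimensional feature space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)
```

Swapping the linear kernel for a nonlinear one is what lets the margin-maximizing linear classifier handle linearly inseparable data, as described above.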
In some embodiments, through a series of computation processes such as the Lagrangian dual operation, the method finally obtains the demand judgment model and a series of feature words. The demand judgment model is a set of computer-executable algorithms; its input is a sentence vector, and its output is the class to which the sentence belongs.
In some embodiments, the demand judgment model can judge whether a sentence belongs to the demand class based on dependency syntax analysis. In some embodiments, the dependency syntax analysis may include one or more rules. For example, a sentence satisfying one or more of the rules can be judged to be a demand statement, with output result 1; otherwise, the output result is 0. In some embodiments, each rule can be assigned a certain weight, and all the rules are finally integrated to compute the likelihood or reference value of the sentence being a demand sentence. In some embodiments, by applying these rules, the objects of opinion keywords can be computed to obtain a demand-object value list, and the number of sentiment-orientation degree adverbs can be counted to obtain a demand-degree value list; finally, an improvement-demand list is generated by combining the demand-object value list and the demand-degree value list.
In some embodiments, feature words are extracted and the demand judgment model is constructed by recognizing the dependency relations of demand statements. A feature word is the object to which the expressed opinion refers, generally a noun, gerund or verb; an opinion word is the expressed opinion, generally an adjective, adverb or verb. In some embodiments, the dependency relations between words may include the subject-verb relation (SBV), the verb-object relation (VOB), the verb-complement relation (CMP), the head relation (HED) or the coordinate relation (COO). In addition, words may have modifiers; the relation between a head word and its modifier may include the attribute relation (ATT) or the adverbial relation (ADV).
For example, when a demand short sentence satisfies the SBV, CMP or ATT relation, the noun (or gerund or verb) in the short sentence is the feature word and the corresponding adjective is the opinion word. For example, in the demand statement "the workflow can be managed in an orderly way", the dependency relation of the short sentence is the subject-verb relation, so "workflow" is the feature word and "can be managed in an orderly way" is the opinion word. In the demand statement "the document management section should be easier to use", the dependency relation is the verb-complement relation, so "document management section" is the feature word and "easier to use" is the opinion word. For example, when two adjacent words in a short sentence satisfy the ADV relation, the two words are the modifier and the opinion word respectively: the phrase "easier to use" satisfies the ADV relation, so "easy to use" is recognized as the opinion word and "more" as the modifier. For example, when two adjacent nouns (or a verb plus a noun) in a short sentence satisfy the ATT relation, the two words constitute a noun phrase, being respectively the modifier and the feature word: in the noun phrase "document management section", "management section" is the feature word and "document" is the modifier. In some embodiments, the more often a feature word or keyword is repeated, the higher the attention it receives; when the emotional expression is derogatory, the product demand is higher.
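The SBV/CMP/ATT rule above can be sketched as a pass over parser output. This is a simplified illustrative sketch that assumes the dependency parse is already available as (word, part-of-speech, relation) triples with tags like "n" (noun), "a" (adjective), "SBV"; the patent does not specify this data layout.

```python
def extract_feature_opinion(tokens):
    """Apply the rule: in a demand short sentence, the noun (or gerund/verb)
    in an SBV, CMP or ATT relation is the feature word, and the accompanying
    adjective is the opinion word. `tokens` is assumed parser output:
    (word, pos, relation) triples."""
    feature, opinion = None, None
    for word, pos, rel in tokens:
        if feature is None and rel in ("SBV", "CMP", "ATT") and pos in ("n", "vn", "v"):
            feature = word            # object the opinion refers to
        elif opinion is None and pos == "a":
            opinion = word            # the expressed opinion itself
    return feature, opinion
```

Given a parse of "the workflow can be managed in an orderly way", the rule would pick "workflow" as the feature word, matching the worked example above.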
In some embodiments, feature words are extracted and the demand model is constructed by recognizing words in demand short sentences that directly express user demands. Specifically, words (verbs) expressing "increase" or "decrease" and feature words (nouns) expressing their objects are recognized in demand statements. Words expressing "increase" include: grow, supplement, expand, fill in, promote, add, extend, increase, append, strengthen, enlarge, attach, replenish, augment, fill, and so on. Words expressing "decrease" include: reduce, omit, weaken, delete, dilute, abridge, shrink, mitigate, lower, eliminate, subtract, cut, contract, curtail, remove, diminish, decrease, and so on. For example, for the demand short sentence "add some resolution options", "add" and "resolution" can be recognized, where the verb is "add" and the feature word is "resolution". For the demand short sentence "delete some unnecessary processes", "delete" and "processes" can be recognized, where the verb is "delete" and the feature word is "process".
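The verb-plus-object rule above can be sketched as a lookup against abridged word lists. This is a hypothetical sketch: the word sets are shortened for illustration, and the noun is approximated here by skipping a few filler words rather than by real part-of-speech tagging, which the patent would use in practice.

```python
INCREASE_WORDS = {"add", "increase", "expand", "extend", "supplement", "strengthen"}
DECREASE_WORDS = {"delete", "reduce", "remove", "omit", "weaken", "cut"}
FILLER_WORDS = {"some", "a", "an", "the", "unnecessary"}  # crude stand-in for POS tagging

def match_direct_demand(words):
    """Find an 'increase'/'decrease' verb and take the next non-filler word
    after it as the feature word (object of the demand)."""
    for i, w in enumerate(words):
        if w in INCREASE_WORDS | DECREASE_WORDS:
            for candidate in words[i + 1:]:
                if candidate not in FILLER_WORDS:
                    return w, candidate
    return None
```

Applied to the two worked examples above, the matcher recovers ("add", "resolution") and ("delete", "processes").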
In some embodiments, feature words are extracted and the demand model is constructed by recognizing words in demand short sentences that express user demands indirectly. This embodiment judges demands from the user's repeated emphasis, for example by recognizing degree adverbs, frequency adverbs or punctuation marks to extract feature words. In some embodiments, feature words can be extracted by recognizing the structure "word (noun) + emphatic word (adverb) repeated n times (n is a positive integer) + opinion word". Words expressing emphasis include, for example: certainly, again and again, very, necessarily, extremely, too, soon, repeatedly, quite, all, often, exceptionally, more, perhaps, just, obviously, slightly, simply, only, temporarily, much, deliberately, unexpectedly, never, strongly, and so on. For example, in the short sentence "This interface is too too too too too ugly!", since the emphatic word "too" is recognized with repetition count n = 5, the feature word is recognized as "interface" and the opinion word as "ugly". In some embodiments, feature words can also be extracted by recognizing the structure "word (noun) + punctuation mark repeated n times (n is a positive integer) + opinion word (verb or adjective)". The punctuation marks may include "!", ",", ".", "...", "?", "*", etc. For example, in the demand short sentence "Dark green..... really plain.", since the punctuation mark "." is recognized with repetition count n = 5, the feature word is recognized as "dark green" and the opinion word as "plain".
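The repetition signal described above can be detected with a back-referencing regular expression. This is an illustrative sketch, assuming English whitespace-separated text and a small punctuation set; the minimum repetition count is an arbitrary parameter, not a value fixed by the patent.

```python
import re

def detect_emphasis(sentence, min_repeats=3):
    """Detect a word or punctuation mark repeated at least `min_repeats`
    times in a row -- the indirect-demand emphasis signal described above.
    Returns (token, repetition_count) or None."""
    # A token (word or one of ! ? .) followed by at least min_repeats - 1
    # immediate repetitions of itself, optionally separated by whitespace.
    pattern = re.compile(r"(\b\w+\b|[!?.])(\s*\1){%d,}" % (min_repeats - 1))
    m = pattern.search(sentence)
    if not m:
        return None
    token = m.group(1)
    repeats = m.group(0).count(token)
    return token, repeats
```

On the worked example above, the detector reports the emphatic word "too" repeated five times, which is the n = 5 signal used to flag the sentence.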
Through the above processing, a series of feature words and opinion words and the demand judgment model can finally be obtained. In some embodiments, the frequency of feature words can be counted: the more often a feature word is repeated, the higher the attention it receives, and when the sentiment polarity is derogatory, the demand degree is higher.
In some embodiments, the frequency of opinion words can be counted and an opinion value provided. The opinion value indicates the sentiment polarity of the opinion, with values in the interval [-1, 1]: a negative number indicates negative emotion, a positive number indicates positive emotion, and the larger the absolute value, the more pronounced the sentiment polarity.
In step 207, the method constructs the feature lexicon from the series of feature words. In some embodiments, the chi-squared test (Chi-Squared Test) is used to construct the feature lexicon. The chi-squared test is a common hypothesis-testing method based on the χ² distribution. Its null hypothesis H0 is: there is no difference between the observed frequency and the expected frequency. The specific process is: first assume that H0 holds, and on this premise compute the χ² value, which indicates the degree of deviation between the observed values and the theoretical values. According to the χ² distribution and the degrees of freedom, the probability P of obtaining the current statistic or a more extreme one, under the assumption that H0 holds, can be determined. If the P value is very small, the deviation between the observed and theoretical values is too large, the null hypothesis should be rejected, and there is a significant difference between the two compared quantities, i.e., they are not independent; otherwise, the null hypothesis cannot be rejected, that is, it cannot be concluded that the two compared quantities are dependent. In natural language processing, the chi-squared test is often used for feature extraction.
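For feature selection, the chi-squared statistic is typically computed per candidate word from a 2×2 contingency table of word occurrence versus class. This is an illustrative sketch under that standard formulation; the count names and test numbers are hypothetical, and the patent does not spell out the table layout.

```python
def chi_square_feature_score(n_fc, n_f, n_c, n_total):
    """2x2 chi-squared statistic for a feature word vs. the demand class.
    n_fc: demand sentences containing the word; n_f: all sentences containing
    the word; n_c: all demand sentences; n_total: all sentences."""
    a = n_fc                  # word present, demand
    b = n_f - n_fc            # word present, non-demand
    c = n_c - n_fc            # word absent, demand
    d = n_total - n_f - c     # word absent, non-demand
    num = n_total * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0
```

A word whose occurrence is independent of the demand class scores 0; the larger the score, the stronger the word's association with the class, which is what the threshold and ranking in the next step act on.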
In some embodiments, the influence of each feature word on demand judgment is computed, that is, the likelihood that the feature word is classified as a demand; this likelihood is called the feature value. If the feature value of a certain feature word is below a threshold, the word is discarded; otherwise, the feature word is retained. By sorting the feature values and selecting the feature words corresponding to the top-ranked feature values for addition to the feature lexicon, the feature lexicon is constructed.
Fig. 7 shows a schematic diagram of the component modules of the intelligent voice data demand extraction system according to an embodiment of this specification. Referring to Fig. 7, the system includes a recording module 301, a speech recognition module 302, a corpus preprocessing module 303, a word segmentation module 304 and a demand judgment module 305. The recording module 301 is used to obtain voice data. The speech recognition module 302 is used to preprocess the voice data to obtain text data. The corpus preprocessing module 303 is used to perform word segmentation on each sentence in the text data. The word segmentation module 304 is used to compare the segmented text data with the feature lexicon to obtain the sentence vectors corresponding to the text data. The demand judgment module 305 is used to judge, from an input sentence vector, whether the short sentence corresponding to the vector is a demand statement or a non-demand statement. For the functions and implementation processes of the modules in the system, reference may be made to the implementation processes of the corresponding steps in the foregoing method embodiments. For brevity, details are omitted here.
The availability determination process on intelligent voice data according to the embodiments of the present application is described in detail below with reference to Fig. 8 to Fig. 10.
Fig. 8 shows a flowchart of the intelligent voice data availability determination process according to an embodiment of the present application.
In step 501, voice data is obtained. Specifically, in step 501 a voice signal is captured by a recording device such as a microphone or microphone array, a recorder, or a recording pen.
In step 502, the obtained voice data is preprocessed to obtain text data for analysis. Preprocessing refers to loading the voice data into memory and, as needed, adding, deleting, or modifying parts of the text. Preprocessing includes: recognition, which refers to converting the voice data into written text to form the text data; splitting, which refers to splitting each long sentence delimited by periods into short sentences at the punctuation marks that indicate pauses within it; and purification, which refers to removing from the text data invalid content of the original voice data that is unrelated to the interview. In some embodiments, preprocessing may include filtering of irrelevant symbols and of non-core components.
In step 503, word segmentation is performed on each short sentence in the preprocessed text data. In some embodiments, the part of speech of each word may also be tagged.
In step 504, based on the feature dictionary, the sentence vector corresponding to each short sentence in the segmented text data is obtained. In some embodiments, the feature dictionary is a two-dimensional matrix composed of feature words and feature values. A feature word is a word in the corpus that is highly likely to mark the target corpus as expressing availability. The feature value is a mathematical expression of the likelihood that the feature word is classified as availability-related.
In some embodiments, each word of a segmented short sentence is looked up in the corpus and in the feature dictionary. If the word never appeared in the corpus, the lookup result is 0; if the word appeared in the corpus but not in the feature dictionary, the lookup result is 1; if the word appears in the feature dictionary, the lookup result is 2. These results form the sentence vector corresponding to the short sentence. In some embodiments, the availability judgment model may include multiple rules; a sentence satisfying one or more of the rules is judged to be an availability sentence and the output result is 1; otherwise it is judged to be a non-availability sentence and the output result is 0.
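The 0/1/2 lookup that forms a sentence vector can be sketched as follows; the corpus vocabulary and feature dictionary shown are toy examples, not data from this specification:

```python
def sentence_vector(tokens, corpus_vocab, feature_dict):
    """Map each token of a segmented short sentence to a lookup result:
    2 if it is in the feature dictionary, 1 if it appeared in the corpus
    but not the dictionary, 0 if it never appeared in the corpus."""
    vec = []
    for word in tokens:
        if word in feature_dict:
            vec.append(2)
        elif word in corpus_vocab:
            vec.append(1)
        else:
            vec.append(0)
    return vec
```

The resulting integer vector is what is fed to the availability judgment model in step 505.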
In step 505, the sentence vector is input into the availability judgment model. In some embodiments, the availability judgment model is configured to output a judgment result according to an input semantic-unit vector (for example, a sentence vector). In some embodiments, the judgment result may be a value indicating whether the sentence is an availability sentence.
In step 506, the short sentence is classified according to the output of the availability judgment model. In some embodiments, the output result may be 1 or 0: when the output is 1, the short sentence is classified as an availability sentence; when the output is 0, it is judged to be a non-availability sentence. In some embodiments, the output may instead take the following form: the probability that the short sentence is an availability sentence is, e.g., 0.7, and the probability that it is a non-availability sentence is, e.g., 0.3; the short sentence is therefore finally determined to be an availability sentence.
In some embodiments, all the sentences expressing availability are clustered. For the detailed steps of the clustering process, reference may be made to the description of the embodiments below.
The process of constructing the availability judgment model and the feature dictionary is described in detail below with reference to Fig. 9.
Fig. 9 shows a flowchart of the process of constructing the availability judgment model and the feature dictionary according to an embodiment of this specification. In some embodiments, the construction process is performed manually. In some embodiments, the construction process is completed by a computer program.
In step 601, voice data is obtained. In step 602, the voice data is preprocessed. Steps 601 and 602 are similar to steps 501 and 502 above. In step 603, the text data is feature-annotated using an expert annotation database. For example, the annotated text data may take the form "X: sentence", where X may be 0 (indicating a non-availability sentence) or 1 (indicating an availability sentence). In step 604, the annotated text data is segmented into words. Step 604 is similar to step 503 above.
In step 605, the segmented text data is input into a classifier to train the availability judgment model. In some embodiments, the classifier is a support vector machine (Support Vector Machine, SVM) classifier. The SVM classifier is a classic binary classification model: its classification effect on salient features is clear-cut, and it performs well for coarse-grained availability analysis. The basic model of the SVM classifier is a linear classifier defined on the feature space that maximizes the margin between the two classes. The SVM classifier may also include a kernel function, which maps low-dimensional data into a high-dimensional space. By introducing a kernel function, a linearly inseparable problem can be converted into a separable one, which makes this essentially non-linear classifier applicable to linearly inseparable data.
In some embodiments, the method selects a linear kernel function, for example, k(x1, x2) = x1ᵀx2. In some embodiments, a non-linear kernel function, such as a polynomial kernel function or a radial basis kernel function, may be selected according to the size of the text data and other factors.
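The kernel functions mentioned above can be written out as follows; the polynomial and RBF parameters are illustrative defaults, not values prescribed by this method:

```python
import math

def linear_kernel(x1, x2):
    """k(x1, x2) = x1^T x2, the inner product of the two vectors."""
    return sum(a * b for a, b in zip(x1, x2))

def polynomial_kernel(x1, x2, degree=2, coef0=1.0):
    """Polynomial kernel: (x1^T x2 + coef0)^degree."""
    return (linear_kernel(x1, x2) + coef0) ** degree

def rbf_kernel(x1, x2, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x1 - x2||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)
```

Replacing the linear kernel with a polynomial or RBF kernel is what turns the linearly inseparable problem into a separable one, as the paragraph above describes.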
In some embodiments, the method finally obtains the availability judgment model and a series of feature words through a series of calculations such as Lagrange duality operations. The availability judgment model is a set of computer-executable algorithms whose input is a sentence vector and whose output is the class to which that sentence belongs.
In some embodiments, the availability judgment model is constructed by identifying the dependency relations of a sentence, that is, by performing dependency parsing (Dependency Parsing) on the sentence. Specifically, the components of a sentence can be divided into subject, predicate, object, attribute, adverbial, complement, and so on. The relations between components mainly include subject-verb (SBV), verb-object (VOB), attribute (ATT), adverbial (ADV), verb-complement (CMP), and coordination (COO). Dependency parsing (Dependency Parsing, DP) refers to revealing the syntactic structure of a sentence through the dependency relations between its language units, that is, identifying the grammatical components in the sentence and analysing the relations between them. Specifically, dependency parsing identifies the grammatical components of subject, predicate, object, attribute, adverbial, and complement in a sentence, and analyses the relations between them.
The most critical step of availability discourse analysis is expressing the opinion of the evaluation holder in a structured way; in general, the pair <evaluation object, evaluation phrase> can be taken as an evaluation unit. The evaluation object may be a nominal phrase, a verbal phrase, or a simple-clause phrase, and mainly occupies the subject position, the object position, or the verb position of a complement structure. The evaluation phrase mainly occupies the predicate position, the verb position of a verb-object construction, or the complement position. The evaluation phrase appears as a group of consecutive words: it may be composed of degree adverbs, negative adverbs, and evaluative words, or it may be a nominal phrase, an adjectival phrase, a verbal phrase, or a simple-clause phrase composed of the former three. As long as the subject-verb, verb-object, and verb-complement structures in a sentence are recalled using the corresponding rules, the usability evaluation units can be extracted.
In some embodiments, when the sentence contains an SBV relation, the part of speech of the modifier of the dependency pair is a noun, abbreviation, or foreign word, and the part of speech of the head word is a verb: if only SBV exists in the sentence, the evaluation object and the evaluation phrase occupy the subject and predicate positions respectively, i.e., <modifier of SBV, head word of SBV> is extracted as the usability evaluation unit, for example, <stability, improved>. If the sentence contains both SBV and VOB, where the head word of SBV is the head word of VOB, then <modifier of SBV, head word of VOB plus modifier of VOB> is extracted as the usability evaluation unit, for example, <evaluation frame, without feature>. If the sentence contains both SBV and CMP, where the head word of SBV is the head word of CMP, then <modifier of SBV, head word of CMP plus modifier of CMP> is extracted, for example, <page, loads slowly>.
In some embodiments, when the sentence contains an SBV relation, the part of speech of the modifier of the dependency pair is a noun, abbreviation, or foreign word, and the head word is an adjective or a noun-modifying word or idiom: if only SBV exists in the sentence, the evaluation object and the evaluation phrase occupy the subject and predicate positions respectively, i.e., <modifier of SBV, head word of SBV> is extracted, for example, <interface, good-looking>. If the sentence contains SBV and COO, the head word of SBV is the head word of COO, and the part of speech of the modifier of the COO pair is an adjective or a noun-modifying word or idiom, then <modifier of SBV, head word of SBV plus modifier of COO> is extracted, for example, <page turning, slow and laggy>. If only VOB exists in the sentence, the part of speech of the modifier of the pair is a noun, abbreviation, or foreign word, and the head word is a verb, then <modifier of VOB, head word of VOB> is extracted, for example, <evaluation frame, does not have>.
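Two of the dependency-based extraction rules above (bare SBV, and SBV sharing its head word with VOB) can be sketched over generic (relation, head, dependent) triples; the triples shown are hypothetical parser output, not a real parse from this specification:

```python
def extract_evaluation_units(triples):
    """triples: dependency arcs as (relation, head_word, dependent_word).
    Applies two rules: <SBV modifier, SBV head> when the verb has no object,
    and <SBV modifier, VOB head plus VOB modifier> when SBV and VOB share a head."""
    sbv = [(head, dep) for rel, head, dep in triples if rel == 'SBV']
    vob = {}
    for rel, head, dep in triples:
        if rel == 'VOB':
            vob.setdefault(head, []).append(dep)
    units = []
    for head, subject in sbv:
        if head in vob:
            for obj in vob[head]:
                # <SBV modifier, VOB head word + VOB modifier>
                units.append((subject, head + ' ' + obj))
        else:
            # <SBV modifier, SBV head word>
            units.append((subject, head))
    return units
```

A real implementation would also cover the CMP and COO rules and filter by part of speech as the text specifies; this sketch only shows the structural pattern.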
Through the above processing, a series of evaluation objects and evaluation phrases, the usability evaluation units, and the availability judgment model are finally obtained.
In step 607, the method constructs a feature dictionary from a series of feature words. In some embodiments, the feature dictionary is constructed using the chi-square test (Chi-Squared Test). The chi-square test is a common hypothesis-testing method based on the χ² distribution; its null hypothesis H0 is that the observed frequency and the expected frequency do not differ. The detailed process is: first assume H0 holds, and on that premise compute the χ² value, which indicates the degree of deviation between the observed values and the theoretical values. From the χ² distribution and the degrees of freedom, the probability P of obtaining the current statistic or a more extreme one under H0 can be determined. If the P value is very small, the deviation between the observed and theoretical values is too large and the null hypothesis should be rejected, meaning there is a significant difference between the two quantities being compared, i.e., they are correlated rather than independent; otherwise the null hypothesis cannot be rejected, i.e., independence between the two quantities cannot be ruled out. In natural language processing, the chi-square test is commonly used for feature extraction.
In some embodiments, the degree of influence of each feature word on the availability judgment is calculated, that is, the likelihood that the feature word marks a sentence as availability-related; this likelihood is referred to as the feature value. If the feature value of a feature word falls below a threshold, the word is discarded; otherwise it is retained. The feature values are sorted, and the feature words with the highest-ranking feature values are added to the feature dictionary, thereby constituting the feature dictionary.
Figure 10 shows a schematic diagram of the modules of the intelligent voice data availability extraction system according to an embodiment of this specification. With reference to Figure 10, the system includes a recording module 701, a speech recognition module 702, a corpus preprocessing module 703, a word segmentation module 704, and an availability judgment module 705. The recording module 701 obtains voice data. The speech recognition module 702 preprocesses the voice data to obtain text data. The corpus preprocessing module 703 performs word segmentation on each sentence in the text data. The word segmentation module 704 compares the segmented text data against the feature dictionary to obtain the sentence vectors corresponding to the text data. The availability judgment module 705 judges, from an input sentence vector, whether the corresponding short sentence is an availability sentence or a non-availability sentence. For the function and implementation of each module in the system, reference may be made to the implementation of the corresponding steps in the foregoing method embodiments. For brevity, the details are omitted here.
The foregoing has described analysing the interview corpus with neural network models to obtain demand information and availability analysis, as well as further sentiment analysis. The following further describes analysing the interview corpus with neural network models to obtain emotion information.
In some embodiments, for emotion information extraction, the interview corpus may be input into a second neural network model to extract the sentences in the corpus reflecting the interviewee's polar emotion and those reflecting neutral emotion; the polar-emotion sentences are then input into a third neural network model to obtain the positive-emotion sentences and negative-emotion sentences among them. Similarly, two cascaded TextCNN classifiers may be used: one serves as the emotion extraction model, realizing the binary classification of polar-emotion sentences versus neutral-emotion sentences, and the other serves as the sentiment classification model, further classifying the polar-emotion sentences.
The analysis process of extracting emotion information from the interview corpus is further described below with reference to Figure 11.
Figure 11 shows a flowchart of a voice content analysis method according to some embodiments of the present application. The process 800 may be embodied as a set of instructions in a non-transitory storage medium of a voice content analysis apparatus. The voice content analysis apparatus may execute the set of instructions and accordingly perform the steps of the process 800.
The operations of the process 800 presented below are intended to be illustrative and not restrictive. In some embodiments, the process 800 may be implemented with one or more additional operations not described, and/or with one or more of the operations described herein deleted. In addition, the order of the operations shown in Figure 11 and described below is not limiting.
In 810, the voice content analysis apparatus may obtain voice data.
The voice data may be an audio recording or a video. In some embodiments, the voice data may be a recording or video of an interview. For example, the voice data may be a recording of B interviewing C. The interview recording includes the interviewer's recording and the interviewee's recording.
In 820, the voice content analysis apparatus may obtain corresponding text data based on the voice data.
Specifically, the voice content analysis apparatus may perform speech recognition on the voice data to convert it into original text, and then convert the original text into text data that meets the data-format requirements of the sentiment analysis model (in step 830).
In some embodiments, the voice content analysis apparatus may obtain the text data corresponding to only part of the voice data. For example, for an interview recording, the apparatus may obtain only the text data corresponding to the interviewee's recording. The apparatus can then analyse the emotion of the interviewee (for example, a product user) more accurately.
In some embodiments, the text data is a sentence vector. The sentence vector may be a vector of one or more dimensions, and the voice content analysis apparatus may obtain it through the following steps.
Step 1: the voice content analysis apparatus may obtain the original text, i.e., the speech recognition result of the voice data.
Step 2: for each complete sentence in the original text, the voice content analysis apparatus may split the complete sentence into at least one short sentence.
In some cases, the voice content analysis apparatus may split a complete sentence at the punctuation marks within it, such as commas, enumeration commas, colons, and semicolons. As an example, the apparatus may divide the complete sentence "I really like the size and color of this mobile phone, but the placement of its volume control key is very unreasonable, and I think placing the volume control key on the right side would be more convenient for the user" into three short sentences. The three short sentences, split at the commas of the complete sentence, are "I really like the size and color of this mobile phone", "but the placement of its volume control key is very unreasonable", and "I think placing the volume control key on the right side would be more convenient for the user". It will be appreciated that splitting a complete long sentence into several short sentences reduces sentence complexity, which benefits the analysis of the sentence and can increase its accuracy.
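The clause-splitting step can be sketched as follows; the punctuation set is an assumption covering both ASCII and full-width Chinese marks:

```python
import re

def split_clauses(sentence):
    """Split a complete sentence into short clauses at commas, enumeration
    commas, colons and semicolons (ASCII and full-width forms), dropping
    empty fragments and surrounding whitespace."""
    parts = re.split(r'[,;:，、；：]', sentence)
    return [p.strip() for p in parts if p.strip()]
```

Applied to the mobile-phone example above, the sentence would be divided at its two commas into three short clauses.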
Step 3: the voice content analysis apparatus may determine the sentence vector of the at least one short sentence.
Specifically, for each of the at least one short sentence, the voice content analysis apparatus may determine the word vectors of the short sentence based on a Word2vec model, and then determine the sentence vector based on those word vectors. The Word2vec model may be trained by the user, or may be the word2vec model bundled with the HanLP toolkit.
In some cases, the process by which the voice content analysis apparatus determines word vectors based on the Word2vec model may include: (1) word segmentation, stemming, and lemmatization: for example, a Chinese corpus is segmented into words, whereas an English corpus does not need segmentation but, since English involves various tenses, requires stemming and lemmatization; (2) building the dictionary and counting word frequencies: in this step all texts are traversed, the words that occur are collected, and the occurrence frequency of each word is counted; (3) building a tree structure: for example, a Huffman tree is constructed according to the occurrence probability of each word, so that all classes lie at the leaf nodes; (4) generating the binary code of each node: the binary code reflects the node's position in the tree, so that the corresponding leaf node can be found step by step from the root according to the code; (5) initializing the intermediate vector of each non-leaf node and the word vector of each leaf node: every node of the Huffman tree stores a vector of length m, but the vectors of leaf nodes and non-leaf nodes have different meanings; specifically, a leaf node stores the word vector of a word, which serves as the input of the neural network, whereas a non-leaf node stores an intermediate vector, corresponding to a parameter of the hidden layer of the neural network that determines the classification result together with the input; (6) training the intermediate vectors and word vectors: during training, the model assigns a suitable vector to each of these abstract intermediate nodes, representing all of its child nodes; for the CBOW model, the word vectors of several words around the center word are first summed as the input of the system, classification is performed step by step according to the binary code of the center word generated in the preceding step, and the intermediate vectors and word vectors are trained according to the classification results.
In some cases, the voice content analysis apparatus may take the mean of the word vectors of the short sentence as its sentence vector. Alternatively, the apparatus may concatenate all the word vectors of the short sentence to form the sentence vector.
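Taking the mean of the word vectors as the sentence vector can be sketched as follows; the toy two-dimensional vectors stand in for real Word2vec output:

```python
def mean_sentence_vector(tokens, word_vectors, dim):
    """Sentence vector as the component-wise mean of the word vectors of the
    clause's tokens; out-of-vocabulary tokens are skipped, and an all-zero
    vector is returned when no token is in vocabulary."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

The concatenation alternative mentioned above would instead join the word vectors end to end, yielding a vector whose length grows with the clause.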
In some embodiments, obtaining the sentence vector may further include preprocessing the original text. The preprocessing includes analysing the vocabulary of the original text and deleting unnecessary words. As an example, the voice content analysis apparatus may delete at least one of modal particles, stop words, and garbled characters in the original text.
A modal particle is a function word that expresses tone, such as the sentence-final particles of Chinese. Stop words are words or phrases that are automatically ignored during information processing and can be screened according to the purpose of the processing; for example, for a product interview, the stop words may refer to phrases in the keyword-extraction results that do not correspond to actual demands. Garbled characters refer to the parts that could not be recognized during speech recognition. The voice content analysis apparatus may delete the modal particles and stop words in the original text based on a modal-particle vocabulary and a stop-word vocabulary constructed in advance.
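The deletion of modal particles, stop words, and unrecognized characters can be sketched as follows; the particle and stop-word sets are placeholder assumptions (the specification targets Chinese vocabularies built in advance):

```python
MODAL_PARTICLES = {"ah", "uh", "um"}   # placeholder fillers; the original targets Chinese modal particles
STOP_WORDS = {"the", "a", "just"}      # illustrative stop-word list, screened per the processing purpose

def clean_tokens(tokens):
    """Drop modal particles, stop words, and tokens containing the Unicode
    replacement character (a stand-in for unrecognized speech segments)."""
    drop = MODAL_PARTICLES | STOP_WORDS
    return [t for t in tokens if t not in drop and "\ufffd" not in t]
```

In practice the two vocabularies would be loaded from the pre-built word lists the text mentions rather than hard-coded.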
In 830, the voice content analysis apparatus may input the text data into a trained sentiment analysis model, where the trained sentiment analysis model includes a trained emotion extraction model and a trained sentiment classification model.
The trained emotion extraction model can extract polar-emotion text data, and the trained sentiment classification model can classify the polar-emotion text data. The trained emotion extraction model and the trained sentiment classification model are each obtained by training an initial model; the specific training processes are as follows.
For the trained emotion extraction model, the voice content analysis apparatus may obtain it through the following steps:
Step 1: obtain annotated training data. The annotated training data includes annotated neutral-emotion text data and annotated non-neutral-emotion text data.
Here, neutral-emotion text data refers to text data whose expressed emotion is neutral, such as "I feel so-so about the first aspect" or "it's average". Non-neutral-emotion text data, also called polar emotion data, includes positive-emotion text data and negative-emotion text data, and refers to text data whose expressed emotion is stronger relative to a neutral emotion. For example, positive-emotion text data may include "I like it", "it can bring me some convenience to a certain extent", and "this design has saved a lot of time". As another example, negative-emotion text data may include "actually I would not go along with it", "this color makes people uncomfortable", and "nobody would choose this mode". Of course, other classification standards may also be used to classify emotions; such classifications and the sentiment analysis methods adapted to them still fall within the scope claimed by the present application.
In some cases, the annotated training data may be annotated by experts, or may be annotated by users. Training data annotated by experts yields highly accurate annotations; a training set annotated by users yields more personalized annotations suited to individual needs.
In some cases, the annotated training set is text data of a specific domain. It will be appreciated that a sentiment classification model trained on text data of a specific domain can be dedicated to sentiment analysis of voice data of that domain.
Step 2: input the annotated training data into the initial emotion extraction model. The initial emotion extraction model is an initial neural network model, such as a TextCNN. The initial emotion extraction model contains multiple features and multiple initial parameters.
From the features of the emotion extraction model, a feature vocabulary of the emotion extraction model can be produced. The feature vocabulary contains multiple words expressing polar (positive or negative) emotion, such as "like", "love", "dislike", and "detest".
Step 3: when the initial emotion extraction model reaches a convergence condition after training, determine it to be the trained emotion extraction model.
During training, the emotion extraction model judges the quality of its output against the annotated training data, continuously adjusts the initial parameters, and continuously optimizes the results; the trained emotion extraction model reaches the convergence condition. The convergence condition may be that the loss function falls below a first threshold, or that the number of training epochs exceeds a second threshold; the first and second thresholds may be set manually according to experience.
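The convergence condition described above (loss below a first threshold, or epoch count above a second threshold) can be sketched as follows; the default threshold values are assumptions, since the text leaves them to manual, experience-based tuning:

```python
def reached_convergence(epoch_losses, loss_threshold=0.05, max_epochs=100):
    """Return True when the latest loss has dropped below the first threshold,
    or when the number of completed epochs exceeds the second threshold."""
    if epoch_losses and epoch_losses[-1] < loss_threshold:
        return True
    return len(epoch_losses) > max_epochs
```

A training loop would call this check after each epoch and stop adjusting parameters once it returns True; the same check applies to the sentiment classification model trained below.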
For the trained sentiment classification model, the voice content analysis apparatus may obtain it through the following steps:
Step 1: obtain annotated training data. The annotated training data includes annotated positive-emotion text data and annotated negative-emotion text data.
In some cases, the annotated training data may be annotated by experts, or may be annotated by users. Training data annotated by experts yields highly accurate annotations; a training set annotated by users yields more personalized annotations suited to individual needs.
In some cases, the annotated training set is text data of a specific domain. It will be appreciated that a sentiment classification model trained on text data of a specific domain can be dedicated to sentiment analysis of voice data of that domain.
Step 2: input the annotated training data into the initial sentiment classification model for training. The initial sentiment classification model is an initial neural network model, such as a TextCNN. The initial sentiment classification model contains multiple features and multiple initial parameters.
From the features of the sentiment classification model, a feature vocabulary of the sentiment classification model can be produced. The feature vocabulary contains multiple words expressing polar (positive or negative) emotion, such as "like", "love", "dislike", and "detest".
Step 3: when the initial sentiment classification model reaches a convergence condition after training, determine it to be the trained sentiment classification model.
During training, the sentiment classification model judges the quality of its output against the annotated training data, continuously adjusts the initial parameters, and continuously optimizes the results; the trained sentiment classification model reaches the convergence condition. The convergence condition may be that the loss function falls below a first threshold, or that the number of training epochs exceeds a second threshold; the first and second thresholds may be set manually according to experience.
In 840, the voice content analysis apparatus may divide the text data into polar-emotion text data and neutral-emotion text data using the trained emotion extraction model.
Specifically, the voice content analysis apparatus may use the trained emotion extraction model to mark the polar-emotion text data and the neutral-emotion text data differently, thereby classifying the two. As an example, the emotion extraction model may mark neutral-emotion text data as "2" and polar-emotion text data with a marker other than "2". Exemplary outputs of the emotion extraction model are listed below:

"it can bring me some convenience to a certain extent" (polar);

"actually I would not go along with it" (polar);

"2 I feel so-so about the first aspect" (neutral).
In some embodiments, the neutral-emotion text data determined by the voice content analysis model may be used to analyse the relevant user demands in the text data (step 820). For example, when the above voice data is a product interview recording, the interview subject is a user and the interview content is the user's views on a product; in this case, the user's demands are the relevant user demands in the text data. For the use of neutral-emotion text data to analyse user demands, reference may be made to other relevant descriptions in this application.
In 850, the voice content analysis apparatus may divide the polar-emotion text data into positive-emotion text data and negative-emotion text data using the trained sentiment classification model.
Specifically, the voice content analysis apparatus may use the trained sentiment classification model to mark the positive-emotion text data and the negative-emotion text data differently, thereby classifying the two. As an example, the sentiment classification model may mark positive-emotion text data as "1" and negative-emotion text data as "0". Exemplary outputs of the sentiment classification model are listed below:

"1 it can bring me some convenience to a certain extent";

"0 actually I would not go along with it".
In 860, the voice content analysis apparatus may obtain a sentiment analysis result according to the positive-emotion text data and the negative-emotion text data.
In some embodiments, after each short sentence of each sentence in the voice data has been classified as above, the voice content analysis apparatus may determine the sentiment analysis result according to the proportions of positive-emotion text data and negative-emotion text data. As an example, suppose positive-emotion text data accounts for 65% of all the voice data (that is, of its corresponding text data), negative-emotion text data accounts for 10%, and neutral-emotion text data accounts for 25%. The voice content analysis apparatus can then conclude that the sentiment tendency of the voice data is positive.
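The proportion-based tendency determination can be sketched as follows, using the "1"/"0"/"2" markers introduced above; the marker counts in the usage example are chosen to reproduce the 65%/10%/25% split:

```python
def sentiment_summary(markers):
    """markers: one marker per short sentence, '1' positive, '0' negative,
    '2' neutral. Returns the class proportions and the overall tendency."""
    n = len(markers)
    pos = markers.count('1') / n
    neg = markers.count('0') / n
    neu = markers.count('2') / n
    if pos > neg:
        tendency = 'positive'
    elif neg > pos:
        tendency = 'negative'
    else:
        tendency = 'neutral'
    return {'positive': pos, 'negative': neg, 'neutral': neu, 'tendency': tendency}
```

For instance, 13 positive, 2 negative, and 5 neutral short sentences out of 20 give proportions of 0.65, 0.10, and 0.25, and a positive overall tendency.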
In some embodiments, the voice content analysis apparatus may analyse the specific content of the positive-emotion text data and of the negative-emotion text data separately, thereby determining the sentiment analysis result. As an example, the apparatus analyses the voice data of a product interview and obtains positive-emotion text data and negative-emotion text data. The apparatus may further analyse the positive-emotion text data to obtain the advantages of the product, and analyse the negative-emotion text data to obtain the disadvantages of the product. The advantages and disadvantages of the product can serve as the sentiment analysis result.
In some embodiments, the voice content analysis method may further include: determining the domain to which the content of the voice data belongs, and, according to that domain, determining and invoking the above-mentioned trained sentiment analysis models (for example, the trained emotion extraction model and the trained sentiment classification model).
For example, the voice content analysis device can determine the domain to which the content of the corresponding voice data belongs according to the text data. As an example, the voice content analysis device can extract keywords from the text data and, according to the keywords, determine the domain to which the content of the corresponding voice data belongs, such as household appliances or sports.
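One simple realization of keyword-based domain detection is to score each domain by how many of its characteristic keywords appear among the extracted keywords. The keyword sets below are invented placeholders for illustration, not vocabulary from the patent.

```python
# Toy keyword-to-domain lookup; the keyword sets are invented placeholders.
DOMAIN_KEYWORDS = {
    "household appliances": {"refrigerator", "washing machine", "microwave", "vacuum"},
    "sports": {"football", "running", "fitness", "basketball"},
}

def detect_domain(keywords):
    """Return the domain whose keyword set overlaps the extracted
    keywords the most, or None when nothing matches."""
    scores = {d: len(kws & set(keywords)) for d, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(detect_domain(["refrigerator", "noisy"]))  # → household appliances
```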
As another example, the voice content analysis device can receive a user input that determines the domain to which the content of the voice data belongs; that is, the user input includes the domain to which the content of the voice data belongs.
In some embodiments, the preprocessed corpus may first be input into the demand judgment model to obtain demand-class corpus and non-demand-class corpus; the non-demand-class corpus is then input into the sentiment classification model to obtain polarity corpus and neutral corpus; finally, the polarity corpus is further classified into positive corpus and negative corpus. In the above embodiment, after the non-demand-class corpus is obtained, the non-demand corpus and a copy of it may be input into the sentiment classification model and the usability classification model respectively, so as to obtain polarity corpus and neutral corpus from the sentiment classification model and usability corpus and non-usability corpus from the usability classification model.
In some embodiments, the preprocessed corpus may first be input into the sentiment classification model to obtain polarity corpus and neutral corpus; the neutral corpus is then input into the demand judgment model to obtain demand-class corpus and non-demand-class corpus. In the above embodiment, after the non-demand-class corpus is obtained, it can be input into the usability classification model to obtain usability corpus and non-usability corpus.
In some embodiments, the preprocessed corpus and a copy of it may be input into the demand judgment model and the sentiment classification model respectively, so as to obtain demand-class corpus and non-demand-class corpus from the demand judgment model and polarity corpus and neutral corpus from the sentiment classification model; finally, the polarity corpus is further classified into positive corpus and negative corpus.
In some embodiments, the preprocessed corpus and a copy of it may be input into the demand judgment model and the usability classification model respectively, so as to obtain demand-class corpus and non-demand-class corpus from the demand judgment model and usability corpus and non-usability corpus from the usability classification model; the non-demand-class corpus is then input into the sentiment classification model to obtain polarity corpus and neutral corpus, and finally the polarity corpus is further classified into positive corpus and negative corpus.
In some embodiments, the preprocessed corpus and a copy of it may be input into the sentiment classification model and the usability classification model respectively, so as to obtain polarity corpus and neutral corpus from the sentiment classification model and usability corpus and non-usability corpus from the usability classification model; the neutral corpus is then input into the demand judgment model to obtain demand-class corpus and non-demand-class corpus. In the above embodiment, the non-usability corpus can also be input into the demand judgment model. In the above embodiment, the non-usability corpus and the neutral corpus can also be merged before being input into the demand judgment model.
In some embodiments, the preprocessed corpus together with a first copy and a second copy of it may be input into the demand judgment model, the sentiment classification model and the usability classification model respectively, so as to obtain demand-class corpus and non-demand-class corpus from the demand judgment model, polarity corpus and neutral corpus from the sentiment classification model, and usability corpus and non-usability corpus from the usability classification model.
In some embodiments, the polarity corpus may contain both the user's polarity emotion information about the product and polarity emotion information about aspects other than the product, and the user's polarity emotion information about the product can also be derived from the usability corpus. In some embodiments, the polarity emotion information unrelated to the product can be extracted from the polarity corpus by taking the union of the usability corpus and the polarity corpus. In some embodiments, the positive corpus and the negative corpus may each be input into the usability classification model, so as to filter out positive usability corpus and negative usability corpus respectively. In some embodiments, non-demand-class information may also be represented as neutral corpus.
In some embodiments, the sentiment analysis model and/or the usability judgment model may be constructed and/or trained by the method for constructing and/or training the demand analysis model. In some embodiments, the sentiment analysis model and/or the demand judgment model may be constructed and/or trained by the method for constructing and/or training the usability judgment model. In some embodiments, the usability judgment model and/or the demand judgment model may be constructed and/or trained by the method for constructing and/or training the sentiment analysis model.
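The model orderings above are all ways of routing the same corpus through the three classifiers. The sketch below wires up one of those orderings (demand judgment first, then sentiment classification on the non-demand corpus, with the polarity split folded into the sentiment labels). The two keyword "models" are crude stand-ins for the trained models; only the routing logic between models is illustrated.

```python
def demand_model(sentence):
    # Stub: treat a sentence stating a need as demand-class.
    return "want" in sentence or "need" in sentence

def sentiment_model(sentence):
    # Stub: return "positive" / "negative" polarity, otherwise neutral.
    # ("dislike" is checked first because it contains "like".)
    if "dislike" in sentence:
        return "negative"
    if "like" in sentence:
        return "positive"
    return "neutral"

def run_pipeline(corpus):
    """Route each sentence: demand judgment first, then sentiment
    classification of the non-demand corpus."""
    buckets = {"demand": [], "positive": [], "negative": [], "neutral": []}
    for sentence in corpus:
        if demand_model(sentence):
            buckets["demand"].append(sentence)
        else:
            buckets[sentiment_model(sentence)].append(sentence)
    return buckets

corpus = ["I want a phone", "I like the screen",
          "I dislike the battery", "it works"]
print(run_pipeline(corpus))
```

Swapping the order of the stubs in `run_pipeline` yields the other embodiments (sentiment first, or the models run in parallel on copies of the corpus).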
The sentence clustering method of the embodiments of the present application will now be described in detail with reference to Figures 12-17.
Figure 12 is a flowchart of a semantic unit clustering method according to one embodiment of the present application. In this embodiment, the method for generating an interview report of the present application may be implemented by an app loaded on a terminal device, and the semantic unit clustering method may be implemented by the processor of the terminal device. The method may be stored in a memory, and the terminal device executes the method when it receives a trigger instruction for generating an interview report.
As shown in Figure 12, the semantic unit clustering method includes step 2000: obtaining multiple semantic units. Semantics can be the meaning expressed by the units of a language at different levels and by combinations of these units; in other words, semantics is what is expressed by the morphemes, words, phrases, sentences and sentence groups of a language. In this application, a semantic unit can be not only a morpheme, word, phrase, sentence or sentence group, but also a letter, number, symbol, action or any other object that can be configured, as needed, to carry a certain meaning or to make people associate it with a certain meaning; it can also be any combination of one or more of the above. In some embodiments, semantic units are selected from corpora of any form, for example audio corpora, text corpora, video corpora, or corpora expressed in a computer language. In some embodiments, the semantic units may come from the audio and/or original text of the interview report described above. In some embodiments, a semantic unit may include one or more keywords of interest to the user. In some embodiments, a semantic unit may be a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like that contains a user demand; in this case, for example, a semantic unit can be a sentence such as "I want a mobile phone", or a word such as "mobile phone". In some embodiments, a semantic unit may be a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like that carries emotional polarity, where the polarity of the emotion (for example, positive or negative) indicates the user's degree of preference for an object; in this case, for example, a semantic unit can be a sentence such as "I like touch-screen mobile phones". In some embodiments, a semantic unit can be one or more words or sentences whose emotional polarity has already been classified by the sentiment analysis model. In some embodiments, a semantic unit can be one or more words or sentences whose demand class has already been determined by the demand analysis model. In some embodiments, the corpus to be clustered can be the set of corpus classified as positive emotion by the sentiment classification model. In some embodiments, the corpus to be clustered can be the set of corpus classified as negative emotion by the emotional polarity analysis model. In some embodiments, the corpus to be clustered can be the set of corpus classified as neutral emotion by the emotional polarity analysis model. In some embodiments, the corpus to be clustered can be the set of corpus classified as usability evaluations by the usability classification model. In some embodiments, the corpus to be clustered can be the set of corpus judged as non-usability evaluations by the demand judgment module. In some embodiments, the corpus to be clustered can be the set of corpus judged as demands by the demand judgment module. In some embodiments, the corpus to be clustered can be the set of corpus judged as non-demands by the demand judgment module. In some embodiments, the corpus to be clustered can be a combination of one or more of the above corpus sets. In some embodiments, the corpus to be clustered may have been preprocessed, for example by keyword recognition, keyword extraction, non-keyword removal, punctuation recognition, and so on.
As shown in Figure 12, the semantic unit clustering method further includes step 4000: determining one or more cluster centers based on the multiple semantic units. The process of dividing a set of physical or abstract objects into multiple classes composed of similar objects is called clustering. A cluster generated by a clustering operation is a set of data objects; the objects within the same cluster are similar to each other and differ from the objects in other clusters. The cluster center is the most important object in a cluster; it best represents the cluster and best explains the other objects in it. For example, a cluster-center sentence expresses, to a certain extent, the theme or core idea of the interview. In some embodiments, a cluster has exactly one cluster center. In some embodiments, the cluster centers can be one or more semantic units selected from the multiple semantic units, and each cluster center serves as the reference object when the similarity between it and the other semantic units among the multiple semantic units is calculated; in other words, during the similarity calculation, the similarity between the reference object and every other semantic unit needs to be calculated.
In some embodiments, step 4000 of determining one or more cluster centers based on the multiple semantic units includes: determining the one or more cluster centers from the multiple semantic units by an AP algorithm. The AP (affinity propagation) method is also known as the affinity propagation algorithm; at any point in time, the magnitude of each message reflects the affinity with which the current data point selects another data point as its cluster center. In the AP algorithm, all data points are treated as potential cluster centers (also called exemplars), and the lines between every pair of data points form a network in which each data point is regarded as a network node. The AP algorithm computes the cluster center of each sample through the messages (namely responsibility and availability) passed along each edge of the network, where responsibility refers to the degree to which a first data point is suited to serve as the cluster center of a second data point, and availability refers to the appropriateness of the second data point choosing the first data point as its cluster center. In other words, the AP algorithm passes messages back and forth along the edges of the network until a good set of exemplars emerges and the corresponding clusters are generated.
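Affinity propagation is available off the shelf, for example in scikit-learn. The sketch below runs it on toy 2-D "semantic vectors" with two well-separated groups, so two exemplars are expected; this is a generic library usage sketch, not the patent's own implementation, and the data is invented.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Two well-separated groups of toy semantic vectors.
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
              [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])

# Every point starts as a potential exemplar; message passing selects
# the actual cluster centers.
ap = AffinityPropagation(random_state=0).fit(X)
print(ap.cluster_centers_indices_)  # indices of the exemplar points
print(ap.labels_)                   # cluster assignment per point
```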
In some embodiments, step 4000 of determining one or more cluster centers based on the multiple semantic units includes: determining each of the multiple semantic units as a cluster center.
In some embodiments, step 4000 of determining one or more cluster centers based on the multiple semantic units includes: determining the one or more cluster centers from the multiple semantic units based on the pairwise similarity of the multiple semantic units. As described above, clustering refers to dividing similar objects (for example, semantic units with similar semantics) into different groups or subsets through a method of static classification, so that the member objects in the same group or subset all share a certain similarity. In some embodiments, similarity refers to the degree to which two different semantic units resemble each other, and can be expressed as the distance between the respective mathematical characterizations of the two different semantic units, such as the Euclidean distance, Manhattan distance, infinity-norm distance, Mahalanobis distance, cosine distance, or Hamming distance. For example, the similarity between two semantic units can be calculated by the HanLP (Han Language Processing) suite, where HanLP is a Java toolkit composed of a series of models and algorithms, intended to promote the application of natural language processing in production environments.
Figure 13 is a flowchart of determining one or more cluster centers based on the pairwise similarity of multiple semantic units according to one embodiment of the present application.
As shown in Figure 13, determining one or more cluster centers based on the pairwise similarity of the multiple semantic units includes step 4200: successively selecting each of the multiple semantic units as a candidate semantic unit.
As shown in Figure 13, determining one or more cluster centers based on the pairwise similarity of the multiple semantic units further includes step 4400: for each candidate semantic unit, separately calculating the similarity between the candidate semantic unit and each of the remaining semantic units among the multiple semantic units, and, if at least one of the remaining semantic units has a similarity higher than a predetermined threshold, determining the candidate semantic unit as a cluster center.
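The rule of step 4400 can be written directly: a candidate becomes a cluster center when at least one other semantic unit exceeds the similarity threshold with it. The cosine similarity function and the toy vectors below are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def find_cluster_centers(vectors, sim, threshold):
    """Mark a candidate as a cluster center when at least one OTHER
    semantic unit is more similar to it than `threshold` (step 4400)."""
    centers = []
    for i, v in enumerate(vectors):
        if any(sim(v, u) > threshold
               for j, u in enumerate(vectors) if j != i):
            centers.append(i)
    return centers

# Vectors 0 and 1 are near-duplicates; vector 2 is unrelated.
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(find_cluster_centers(vecs, cosine, 0.9))  # → [0, 1]
```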
Figure 14 is a flowchart of separately calculating the similarity between each candidate semantic unit and each of the remaining semantic units among the multiple semantic units according to one embodiment of the present application.
As shown in Figure 14, separately calculating the similarity between each candidate semantic unit and each of the remaining semantic units among the multiple semantic units includes step 4420: calculating the candidate semantic vector of each candidate semantic unit. A semantic vector can be the vector representation of a semantic unit. In some embodiments, a semantic vector can be a number vector, symbol vector, letter vector, character vector, word vector, sentence vector, paragraph vector, and so on. In some embodiments, a word vector can be computed from one or more character vectors. In some embodiments, a sentence vector can be computed from one or more word vectors. In some embodiments, a paragraph vector can be computed from one or more sentence vectors. In some embodiments, for the same semantic unit, the same semantic vector can be used at the sentiment analysis model and the demand analysis model. In some embodiments, for the same semantic unit, different semantic vectors can be used at the sentiment analysis model and the demand analysis model. In some embodiments, for the same semantic unit, the same semantic vector can be used at the sentiment analysis model and the clustering model. In some embodiments, for the same semantic unit, the same semantic vector can be used at the demand analysis model and the clustering model. In some embodiments, for the same semantic unit, the same semantic vector can be used at the sentiment analysis model, the demand analysis model and the clustering model. In some embodiments, the semantic vector is randomly assigned. In some embodiments, each element of the semantic vector represents the degree of association or the weight of the semantic unit in some aspect of interest.
Figure 15 is a flowchart of calculating the candidate semantic vector of each candidate semantic unit according to one embodiment of the present application.
As shown in Figure 15, calculating the candidate semantic vector of each candidate semantic unit includes step 4441: obtaining a feature semantic unit list, where the feature semantic unit list includes one or more feature semantic units. In some embodiments, a feature semantic unit can be a letter, number, symbol, word, sentence, paragraph, article or the like that expresses emotional polarity. In some embodiments, a feature semantic unit can also be a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like that represents some objective attribute of an object. In some embodiments, a feature semantic unit represents a morpheme, word, phrase, sentence, sentence group, letter, number, symbol, action or the like of an object required by the user. In some embodiments, the feature semantic units can be selected from an expert-annotated dictionary, or can be customized as needed.
As shown in Figure 15, calculating the candidate semantic vector of each candidate semantic unit includes step 4442: separately determining the degree of association between each candidate semantic unit and each feature semantic unit. In some embodiments, the degree of association can be the degree of the emotional polarity that the semantic unit expresses toward some feature semantic unit. In some embodiments, the degree of association can be the degree of demand that the semantic unit expresses toward some feature semantic unit. In some embodiments, the degree of association is proportional to the frequency with which each feature semantic unit appears in each candidate semantic unit.
As shown in Figure 15, calculating the candidate semantic vector of each candidate semantic unit includes step 4443: generating the candidate semantic vector from the degree of association between each candidate semantic unit and each feature semantic unit. In some embodiments, the degree of association can be proportional to the frequency with which each feature semantic unit appears in the candidate semantic unit. In some embodiments, the degree of association can be proportional to the strength of the tone with which the candidate semantic unit modifies the attribute of the feature semantic unit. Taking a scenario of obtaining the user's color preference as an illustration, suppose the feature semantic unit list (or feature semantic unit dictionary) of interest to the user includes the keywords "red", "orange", "yellow", "green", "blue", "white" and "black". For the semantic unit "I don't like blue very much, I prefer white, but my favorite is black", it can be concluded that the user holds a positive attitude toward "black" and "white", holds a negative attitude toward "blue", takes no position on the other colors, and likes "black" more than "white". When calculating the semantic vector, a positive weight can be assigned to a positive attitude, a negative weight to a negative attitude, and 0 to an unknown attitude, while different weight magnitudes express different degrees of preference. Based on the above principles, if the vector is defined in the order {"red", "orange", "yellow", "green", "blue", "white", "black"}, the semantic vector of this semantic unit is [0, 0, 0, 0, -0.5, 0.5, 1]. In some embodiments, the selection of the keywords in the feature semantic unit list, the numerical range of the weights and the correspondence rule between weights and keywords can be changed according to actual needs. In some embodiments, the sentence vectors required by the usability analysis model can be generated by the above method. In some embodiments, the sentence vectors required by the demand judgment model can be generated by the above method. In some embodiments, the sentence vectors required by the sentiment analysis model can be generated by the above method.
Figure 16 shows a flowchart of calculating the candidate semantic vector of each candidate semantic unit according to another embodiment of the present application.
As shown in Figure 16, calculating the candidate semantic vector of each candidate semantic unit includes step 4445: assigning an identity vector to each candidate semantic unit. In some embodiments, a unique paragraph ID can be assigned to each semantic unit (for example, a sentence). Like an ordinary word, the paragraph ID is first mapped to a paragraph vector; this paragraph vector has the same dimension as the word vectors but comes from a different vector space. During the training of a sentence or document, the paragraph ID remains unchanged, which is equivalent to making use of the semantics of the entire sentence every time the probability of a word is predicted. In the prediction stage, a new paragraph ID is assigned to the sentence to be predicted, the word vectors and the parameters obtained in the training stage are kept unchanged, and after convergence the paragraph vector of the sentence to be predicted is obtained.
As shown in Figure 16, calculating the candidate semantic vector of each candidate semantic unit includes step 4446: assigning a sub-semantic-unit vector to each of one or more sub semantic units in each candidate semantic unit. In some embodiments, each candidate semantic unit includes multiple sub semantic units, and some or all of the multiple sub semantic units are assigned corresponding vectors (called sub-semantic-unit vectors). In some embodiments, the candidate semantic unit is a sentence, the sub semantic units are the words contained in the sentence, and the sub-semantic-unit vectors are word vectors. In some embodiments, the word vectors are generated during the training of the model and are parameters of the model; at the beginning of training, the word vectors are random values and are continuously updated as training proceeds. In some embodiments, a vector can be assigned to each sub semantic unit by one-hot encoding.
As shown in Figure 16, calculating the candidate semantic vector of each candidate semantic unit includes step 4447: inputting the identity together with all the sub-semantic-unit vectors into a predetermined prediction model to output a target vector. In some embodiments, the mean of the vectors of all the words in a sentence can be used as the vector of the sentence.
In some embodiments, the word vectors required by the sentiment analysis model and/or the clustering are generated using the Word2vec language model. In some embodiments, the sentence vectors required by the usability analysis model are generated using Word2vec. In some embodiments, the demand analysis model is generated using the Word2vec language model. A language model makes assumptions about and models natural language so that natural language can be expressed in a way that a computer can understand; its core is still the representation of context and the modeling of the relationship between the context and the target word. Word2vec adopts an n-gram model, that is, it assumes that a word is related only to the n words around it and is unrelated to the other words in the text. Using ideas from deep learning, Word2vec can, through training, reduce the processing of text content to vector operations in a K-dimensional vector space, and similarity in the vector space can be used to represent similarity in text semantics. The dimension of the word vectors obtained by Word2vec can be chosen freely. Word2vec performs semantic analysis at the word level; after the word vectors are obtained, sentence vectors with contextual semantic analysis capability still need to be derived on that basis. The general procedure of the Word2vec model includes: (1) word segmentation / stemming and lemmatization; for example, a Chinese corpus is segmented into words, while an English corpus does not need segmentation but, because English involves various tenses, requires stemming and lemmatization; (2) constructing the dictionary and counting word frequencies; for example, in this step all texts are traversed, the words that appear are collected, and the frequency of each word is counted; (3) constructing a tree structure; for example, a Huffman tree is constructed according to the occurrence probability of each word, so that all classes are at leaf nodes; (4) generating the binary code of each node, where the binary code reflects the position of the node in the tree, so that the corresponding leaf node can be found step by step from the root node according to the code; (5) initializing the intermediate vector of each non-leaf node and the word vector of each leaf node; for example, each node of the Huffman tree stores a vector of length m, but the meaning of the vector differs between leaf nodes and non-leaf nodes: a leaf node stores the word vector of a word, which serves as the input of the neural network, while a non-leaf node stores an intermediate vector, corresponding to the parameters of the hidden layer of the neural network, which together with the input determines the classification result; (6) training the intermediate vectors and word vectors; for example, during training the model assigns each intermediate node an appropriately abstract vector that represents all of its child nodes; for the CBOW model, the word vectors of the several words around the center word are first summed as the input of the system, then classification is carried out step by step according to the binary code of the center word generated in the preceding step, and the intermediate vectors and word vectors are trained according to the classification results. In some embodiments, the Word2vec model used in sentiment analysis is trained by the user, while the clustering uses the word2vec model shipped with the HanLP toolkit. In some embodiments, both the word2vec model used in sentiment analysis and the word2vec model used in clustering come from the HanLP toolkit. In some embodiments, both the word2vec model used in sentiment analysis and the word2vec model used in clustering are trained by the user. In some embodiments, the identity and all the sub-semantic-unit vectors can be input together into a continuous bag-of-words (CBOW) model to output the target vector. For example, the input of the CBOW model is the sum of the word vectors of the n words around the center word of a sentence, and the output is the word vector of the center word itself, where n is an integer greater than 1. As another example, in some embodiments the identity and all the sub-semantic-unit vectors can be input together into a Skip-gram model to output the target vector. For example, the input of the Skip-gram model is the center word of a sentence itself, and the output is the word vectors of the n words around the center word. In some embodiments, the target vector is a word vector. In some embodiments, the word vectors can be calculated and trained by the Word2vec tool.
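A bare-bones CBOW training step can be written in a few lines of NumPy: the input is the average of the context word vectors, and a softmax over the vocabulary predicts the center word. This is a didactic sketch only; real Word2vec replaces the full softmax with the hierarchical softmax (the Huffman tree described above) or negative sampling, and the toy corpus, sizes and learning rate here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["i", "like", "touch", "screen", "phones"]
idx = {w: k for k, w in enumerate(vocab)}
V, D = len(vocab), 8
W_in = rng.normal(0, 0.1, (V, D))   # word vectors, random at the start
W_out = rng.normal(0, 0.1, (D, V))  # output-layer parameters

def cbow_step(context, center, lr=0.1):
    """One CBOW update: predict `center` from the average of the
    context word vectors, then backpropagate the cross-entropy loss."""
    global W_in, W_out
    h = W_in[[idx[w] for w in context]].mean(axis=0)  # context average
    scores = h @ W_out
    p = np.exp(scores - scores.max())
    p /= p.sum()                                      # softmax
    loss = -np.log(p[idx[center]])
    grad = p.copy()
    grad[idx[center]] -= 1.0                          # d(loss)/d(scores)
    dh = W_out @ grad                                 # gradient w.r.t. h
    W_out = W_out - lr * np.outer(h, grad)
    for w in context:                                 # update word vectors
        W_in[idx[w]] -= lr * dh / len(context)
    return float(loss)

losses = [cbow_step(["i", "touch", "screen", "phones"], "like")
          for _ in range(50)]
print(losses[0] > losses[-1])  # the loss falls as the vectors are learned
```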
In some embodiments, the sentence vectors required by the sentiment analysis model and/or the clustering are generated using Doc2vec. In some embodiments, the sentence vectors required by the usability analysis model are generated using Doc2vec. In some embodiments, the demand analysis model is generated using the Doc2vec language model. Doc2Vec has two models: the distributed memory (DM) model and the distributed bag-of-words (DBOW) model. The DM model predicts the probability of a word given the context and the document vector, while the DBOW model predicts the probability of a group of random words in the document given the document vector. In some embodiments, the identity and all the sub-semantic-unit vectors can be input together into the DBOW model to output the target vector. In some embodiments, the identity and all the sub-semantic-unit vectors can be input together into the DM model to output the target vector. In some embodiments, the target vector is a sentence vector. In some embodiments, the sentence vectors can be calculated and trained by the Doc2vec tool.
As shown in Figure 16, calculating the candidate semantic vector of each candidate semantic unit includes step 4448: designating the target vector as the candidate semantic vector.
As shown in Figure 14, separately calculating the similarity between each candidate semantic unit and each of the remaining semantic units among the multiple semantic units further includes step 4440: separately calculating the similarity between the candidate semantic vector of each candidate semantic unit and the semantic vector of each of the remaining semantic units. In some embodiments, the similarity of semantic vectors can be characterized by the cosine distance or cosine similarity between the semantic vectors. In some embodiments, the predetermined threshold of the cosine similarity can be 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 or 0.9. In some embodiments, the similarity of semantic vectors can be calculated by the HanLP suite.
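Putting two pieces together, a sentence vector can be taken as the mean of its word vectors (one option mentioned at step 4447), and cosine similarity then compares two sentences. The toy word vectors below are invented; real ones would come from Word2vec/Doc2vec training.

```python
import numpy as np

def sentence_vector(words, word_vectors):
    """Mean of the word vectors, one of the sentence-vector options
    mentioned above."""
    return np.mean([word_vectors[w] for w in words], axis=0)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy word vectors for illustration only.
wv = {"i": np.array([1.0, 0.0]), "like": np.array([0.8, 0.6]),
      "phones": np.array([0.0, 1.0]), "hate": np.array([-0.8, 0.6])}

s1 = sentence_vector(["i", "like", "phones"], wv)
s2 = sentence_vector(["i", "hate", "phones"], wv)
print(round(cosine_similarity(s1, s1), 3))  # → 1.0
print(cosine_similarity(s1, s2) < cosine_similarity(s1, s1))
```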
As shown in Figure 12, the semantic unit clustering method further includes step 6000: sorting the one or more cluster centers. In some embodiments, sorting the one or more cluster centers includes separately calculating the similarity between the semantic unit corresponding to each of the one or more cluster centers and each of the remaining semantic units among the multiple semantic units, and sorting all the cluster centers based on the number of semantic units whose similarity is higher than a predetermined threshold. In some embodiments, the similarity between the semantic vector of the semantic unit corresponding to each of the one or more cluster centers and the semantic vector of each of the remaining semantic units among the multiple semantic units is calculated separately. In some embodiments, the similarity between the sentence vector of the sentence corresponding to each of the one or more cluster centers and the sentence vector of each of the other sentences is calculated separately, and the cluster centers are sorted based on the number of sentences in each cluster whose similarity is higher than the predetermined threshold. In some embodiments, different predetermined thresholds can be used in the calculations for different cluster centers. In some embodiments, the predetermined threshold used in the sorting step of the cluster centers can differ from the predetermined threshold used in the step of determining the cluster centers. In some embodiments, the ordering of the cluster centers can be updated every time the similarities between one semantic unit and each of the remaining semantic units have been calculated. In some embodiments, every time a cluster is determined or found, the number of semantic units in that cluster whose similarity is higher than the predetermined threshold can be compared with the corresponding numbers of the previously found clusters, and the ordering of the cluster centers is updated based on the comparison result. For example, if a newly generated cluster contains a larger number of semantic units, it is ranked before the earlier clusters in importance or priority. In some embodiments, the text of the semantic units corresponding to a cluster center can be output after the cluster center is calculated. In some embodiments, only the text of the semantic units corresponding to the top-ranked cluster center is output.
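The sorting rule of step 6000 reduces to ordering the cluster centers by their above-threshold member counts. A minimal sketch, where the example center names and counts are invented:

```python
def rank_cluster_centers(counts):
    """Sort cluster centers by the number of semantic units whose
    similarity with the center exceeds the predetermined threshold
    (descending), as step 6000 describes. `counts` maps each center
    to that number."""
    return sorted(counts, key=counts.get, reverse=True)

counts = {"battery life": 12, "screen quality": 7, "price": 19}
print(rank_cluster_centers(counts))
# → ['price', 'battery life', 'screen quality']
```

Re-running this sort each time a new cluster is found gives the incremental-update behavior described above.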
Figure 17 is a schematic diagram of a semantic primitive clustering apparatus according to an embodiment of the present application. As shown in figure 17, the semantic primitive clustering apparatus includes a semantic primitive acquisition component 7000, a cluster center determination component 8000, and a ranking component 9000.
In some embodiments, the semantic primitive acquisition component 7000 is configured to acquire multiple semantic primitives. In some embodiments, the cluster center determination component 8000 is configured to determine one or more cluster centers based on the multiple semantic primitives. In some embodiments, the ranking component 9000 is configured to rank the one or more cluster centers. In some embodiments, the ranking component 9000 is optional.
In some embodiments, the cluster center determination component includes a cluster center determination module. In some embodiments, the cluster center determination module is configured to determine the one or more cluster centers from the multiple semantic primitives by an affinity propagation (AP) clustering algorithm. In some embodiments, the cluster center determination module is configured to determine each of the multiple semantic primitives as a cluster center. In some embodiments, the cluster center determination module is configured to determine the one or more cluster centers from the multiple semantic primitives based on the pairwise similarities between the multiple semantic primitives.
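As a rough sketch of the AP clustering option mentioned above, the message-passing core of affinity propagation can be written in a few dozen lines. This is a generic, simplified AP implementation for illustration only, not the one in the application; the example similarity (negative squared Euclidean distance) and the median preference are common conventions, assumed here.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Minimal affinity propagation: S is an n x n similarity matrix
    whose diagonal holds each point's 'preference' to be a center;
    returns the indices chosen as cluster centers (exemplars)."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx].copy()
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        max_other = np.tile(first[:, None], (1, n))
        max_other[np.arange(n), idx] = second
        R = damping * R + (1 - damping) * (S - max_other)
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        cols = Rp.sum(axis=0)
        Anew = np.minimum(0, cols[None, :] - Rp)
        np.fill_diagonal(Anew, cols - np.diag(Rp))
        A = damping * A + (1 - damping) * Anew
    return np.where(np.diag(A) + np.diag(R) > 0)[0]

# Two well-separated groups of points; similarity is negative squared
# Euclidean distance, with the preference set to the median similarity.
pts = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
S = -((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
np.fill_diagonal(S, np.median(S[~np.eye(len(pts), dtype=bool)]))
exemplars = affinity_propagation(S)
```

In practice one would typically use a library implementation such as scikit-learn's `AffinityPropagation` rather than hand-rolling the updates; the key property for this application is that AP selects exemplars from the data points themselves, so every cluster center is an actual semantic primitive.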
In some embodiments, the cluster center determination module further includes a candidate semantic primitive selection module, a similarity calculation module, and a cluster center determination submodule. The candidate semantic primitive selection module is configured to select each of the multiple semantic primitives in turn as a candidate semantic primitive. The similarity calculation module is configured to, for each candidate semantic primitive, separately calculate the similarity between that candidate semantic primitive and each of the remaining semantic primitives among the multiple semantic primitives. The cluster center determination submodule is configured to determine a candidate semantic primitive as a cluster center when at least one of the remaining semantic primitives has a similarity to it that is higher than a predetermined threshold.
In some embodiments, the similarity calculation module further includes a candidate semantic vector calculation module and a semantic vector similarity calculation module. The candidate semantic vector calculation module is configured to calculate a candidate semantic vector for each candidate semantic primitive. The semantic vector similarity calculation module is configured to separately calculate the similarity between the candidate semantic vector of each candidate semantic primitive and the semantic vector of each of the remaining semantic primitives.
In some embodiments, the candidate semantic vector calculation module includes a feature semantic primitive acquisition module, an association degree determination module, and a candidate semantic vector generation module. The feature semantic primitive acquisition module is configured to acquire a feature semantic primitive table, wherein the feature semantic primitive table includes one or more feature semantic primitives. The association degree determination module is configured to respectively determine the association degree between each candidate semantic primitive and each feature semantic primitive. The candidate semantic vector generation module is configured to generate the candidate semantic vector from the association degrees between each candidate semantic primitive and each feature semantic primitive. In some embodiments, the association degree is proportional to the frequency with which each feature semantic primitive occurs in each candidate semantic primitive.
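One way to realize the frequency-proportional association degree above is a simple count vector over the feature semantic primitive table. This is a minimal sketch under stated assumptions: the whitespace tokenizer and the example feature table are placeholders, not part of the application.

```python
def candidate_semantic_vector(candidate, feature_table):
    """Build a vector whose k-th entry counts how often the k-th
    feature semantic primitive occurs in the candidate, so each
    association degree is proportional to the occurrence frequency."""
    tokens = candidate.split()  # naive whitespace tokenizer (assumption)
    return [tokens.count(feat) for feat in feature_table]

# Hypothetical feature semantic primitive table for illustration.
features = ["battery", "screen", "price"]
sentence = "the battery is great but the battery drains and the price hurts"
vec = candidate_semantic_vector(sentence, features)
# vec -> [2, 0, 1]
```

A real feature table would be built from the corpus (for example, frequent keywords), and the raw counts could be reweighted (for example, by TF-IDF) before computing similarities.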
In some embodiments, the candidate semantic vector calculation module includes an identity vector assignment module, a sub-semantic-primitive vector assignment module, a target vector calculation module, and a candidate semantic vector designation module. The identity vector assignment module is configured to assign an identity vector to each candidate semantic primitive. The sub-semantic-primitive vector assignment module is configured to assign a sub-semantic-primitive vector to each of the one or more sub-semantic primitives in each candidate semantic primitive. The target vector calculation module is configured to input the identity vector together with all of the sub-semantic-primitive vectors into a predetermined prediction model to output a target vector. The candidate semantic vector designation module is configured to designate the target vector as the candidate semantic vector. In some embodiments, the identity vector assignment module is optional. In embodiments without an identity vector assignment module (for example, when the target vector is a word vector), the target vector calculation module is configured to input only the sub-semantic-primitive vectors into the predetermined prediction model to output the target vector. In embodiments with an identity vector assignment module (for example, when the target vector is a sentence vector), the target vector calculation module may be configured to input the identity vector together with all of the sub-semantic-primitive vectors into the predetermined prediction model to output the target vector.
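The identity-vector-plus-word-vectors scheme above resembles paragraph-vector (PV-DM) style models. The specification does not define the prediction model, so the sketch below stands in a simple average for it; the vocabulary, dimensions, and random initialization are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["good", "bad", "battery", "screen"]
DIM = 8

# Sub-semantic-primitive (word) vectors; here randomly initialized,
# in practice they would come from a trained embedding model.
word_vecs = {w: rng.normal(size=DIM) for w in VOCAB}

def prediction_model(identity_vec, sub_vecs):
    """Stand-in for the predetermined prediction model: the identity
    (sentence) vector is combined with the sub-semantic-primitive
    vectors. A trained PV-DM model would learn this combination;
    here we simply average, purely as an assumption for the sketch."""
    stacked = np.vstack([identity_vec] + sub_vecs)
    return stacked.mean(axis=0)

sentence = ["good", "battery"]
identity = rng.normal(size=DIM)  # identity vector for this sentence
target = prediction_model(identity, [word_vecs[w] for w in sentence])
# target is then designated as the candidate semantic (sentence) vector
```

In the word-vector variant (no identity vector), the same stand-in would be called with the sub-semantic-primitive vectors alone, matching the optional-module description above.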
In some embodiments, the ranking component 9000 further includes a similarity calculation module and a cluster center ranking module. The similarity calculation module is configured to separately calculate the similarity between the semantic primitive corresponding to each of the one or more cluster centers and each of the remaining semantic primitives among the multiple semantic primitives. The cluster center ranking module is configured to rank all of the cluster centers based on the number of semantic primitives whose similarity is higher than a predetermined threshold. In some embodiments, the similarity calculation module is further configured to separately calculate the similarity between the semantic vector of the semantic primitive corresponding to each of the one or more cluster centers and the semantic vector of each of the remaining semantic primitives among the multiple semantic primitives.
The present application also provides a computer-readable storage medium, wherein the computer-readable storage medium includes a program which, when executed by a processor, performs the semantic primitive clustering method described above.
In short, the method for generating an interview report according to the embodiments of the present application can automatically generate an interview report from an interview corpus with one click based on a neural network. Converting the interview corpus into text data, preprocessing the corpus, extracting keywords, and so on can all be triggered by inputting the corresponding commands, without manual processing. This saves interview analysis time, reduces the number of interviewers that need to be assigned, and lowers interview costs.
It should be noted that, in the description of this specification, references to the terms "one embodiment", "some embodiments", "a schematic embodiment", "an example", "a specific example", or "some examples" mean that the particular features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example.
While embodiments of the present application have been shown and described, those skilled in the art will understand that various changes, modifications, replacements, and variations can be made to these embodiments without departing from the principle and spirit of the present application, and that the scope of the present application is defined by the claims and their equivalents.
Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in the flowcharts, or otherwise described herein, may be considered, for example, an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wirings (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art will understand that all or part of the steps carried by the method of the above embodiments can be completed by instructing the relevant hardware through a program, and the program may be stored in a computer-readable storage medium; when executed, the program performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present application; those skilled in the art can change, modify, replace, and vary the above embodiments within the scope of the present application.

Claims (10)

1. A voice content analysis method, characterized in that the method includes:
acquiring voice data;
obtaining corresponding text data based on the voice data;
inputting the text data into a trained sentiment analysis model, the trained sentiment analysis model including a trained emotion extraction model and a trained sentiment classification model;
dividing, by the trained emotion extraction model, the text data into polarity emotion text data and neutral emotion text data;
dividing, by the trained sentiment classification model, the polarity emotion text data into positive emotion text data and negative emotion text data;
obtaining a sentiment analysis result according to the positive emotion text data and the negative emotion text data.
2. The analysis method according to claim 1, characterized in that the text data is a sentence vector, and the sentence vector is obtained by the following steps:
acquiring original text;
for each complete sentence in the original text:
segmenting the complete sentence into clauses according to punctuation marks to obtain at least one short sentence;
determining a sentence vector of the at least one short sentence.
3. The analysis method according to claim 2, characterized in that the determining a sentence vector of the at least one short sentence includes:
for each short sentence in the at least one short sentence,
determining word vectors of the short sentence based on a word2vec model;
determining the sentence vector based on the word vectors of the short sentence.
4. The analysis method according to claim 2, characterized in that the determining the sentence vector based on the word vectors of the short sentence includes:
determining the mean of the word vectors of the short sentence as the sentence vector.
5. The analysis method according to claim 2, characterized in that the method further includes:
deleting at least one of modal particles, stop words, and garbled characters in the original text.
6. The analysis method according to claim 1, characterized in that the trained emotion extraction model is obtained by the following steps:
acquiring labeled training data, the labeled training data including labeled neutral emotion text data and non-neutral emotion text data;
inputting the labeled training data into an initial emotion extraction model;
when the initial emotion extraction model reaches a convergence condition after training, determining the trained emotion extraction model.
7. The analysis method according to claim 1, characterized in that the trained sentiment classification model is obtained by the following steps:
acquiring labeled training data, the labeled training data including labeled positive emotion text data and negative emotion text data;
inputting the labeled training data into an initial sentiment classification model for training;
when the initial sentiment classification model reaches a convergence condition after training, determining the trained sentiment classification model.
8. The analysis method according to claim 1, characterized in that the method further includes:
determining, according to the text data, the field to which the content of the voice data belongs;
determining and invoking the trained sentiment analysis model according to the field to which the content of the voice data belongs.
9. The analysis method according to claim 1, characterized in that the method further includes:
receiving user input, the input including the field to which the content of the voice data belongs;
determining and invoking the trained sentiment analysis model according to the field to which the content of the voice data belongs.
10. A voice content analysis device, including:
at least one storage device, the storage device including a set of instructions; and
at least one processor in communication with the at least one storage device, wherein, when executing the set of instructions, the at least one processor causes the analysis device to perform the method of any one of claims 1-9.
CN201910582433.5A 2019-06-28 2019-06-28 Method for generating interview report, computer-readable storage medium and terminal device Active CN110297907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910582433.5A CN110297907B (en) 2019-06-28 2019-06-28 Method for generating interview report, computer-readable storage medium and terminal device


Publications (2)

Publication Number Publication Date
CN110297907A true CN110297907A (en) 2019-10-01
CN110297907B CN110297907B (en) 2022-03-08

Family

ID=68029588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910582433.5A Active CN110297907B (en) 2019-06-28 2019-06-28 Method for generating interview report, computer-readable storage medium and terminal device

Country Status (1)

Country Link
CN (1) CN110297907B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761239A (en) * 2013-12-09 2014-04-30 国家计算机网络与信息安全管理中心 Method for performing emotional tendency classification to microblog by using emoticons
KR101540683B1 (en) * 2014-10-20 2015-07-31 숭실대학교산학협력단 Method and server for classifying emotion polarity of words
CN105183717A (en) * 2015-09-23 2015-12-23 东南大学 OSN user emotion analysis method based on random forest and user relationship
CN106030642A (en) * 2014-02-23 2016-10-12 交互数字专利控股公司 Cognitive and affective human machine interface
CN106782545A (en) * 2016-12-16 2017-05-31 广州视源电子科技股份有限公司 A kind of system and method that audio, video data is changed into writing record
CN107704558A (en) * 2017-09-28 2018-02-16 北京车慧互动广告有限公司 A kind of consumers' opinions abstracting method and system
CN108134876A (en) * 2017-12-21 2018-06-08 广东欧珀移动通信有限公司 Dialog analysis method, apparatus, storage medium and mobile terminal
CN108845986A (en) * 2018-05-30 2018-11-20 中兴通讯股份有限公司 A kind of sentiment analysis method, equipment and system, computer readable storage medium
CN109543180A (en) * 2018-11-08 2019-03-29 中山大学 A kind of text emotion analysis method based on attention mechanism
CN109685560A (en) * 2018-12-17 2019-04-26 泰康保险集团股份有限公司 Big data processing method, device, medium and electronic equipment
CN109918499A (en) * 2019-01-14 2019-06-21 平安科技(深圳)有限公司 A kind of file classification method, device, computer equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Song Jiaying et al., "Sentiment polarity classification based on word sentiment membership degree features", Journal of Peking University (Natural Science Edition) *
Yang Yan et al., "A sentiment classification method based on a joint deep learning model", Journal of Shandong University (Natural Science) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139108A1 (en) * 2020-01-10 2021-07-15 平安科技(深圳)有限公司 Intelligent emotion recognition method and apparatus, electronic device, and storage medium
WO2021143014A1 (en) * 2020-01-14 2021-07-22 北京明略软件系统有限公司 Method and device for generating knowledge graph, and computer readable storage medium
CN111259163A (en) * 2020-01-14 2020-06-09 北京明略软件系统有限公司 Knowledge graph generation method and device and computer readable storage medium
CN111274807A (en) * 2020-02-03 2020-06-12 华为技术有限公司 Text information processing method and device, computer equipment and readable storage medium
CN111639065B (en) * 2020-04-17 2022-10-11 太原理工大学 Polycrystalline silicon ingot casting quality prediction method and system based on batching data
CN111639065A (en) * 2020-04-17 2020-09-08 太原理工大学 Polycrystalline silicon ingot casting quality prediction method and system based on batching data
EP3962073A4 (en) * 2020-06-29 2023-08-02 Guangzhou Quickdecision Technology Ltd Co Online interview method and system
CN112446217A (en) * 2020-11-27 2021-03-05 广州三七互娱科技有限公司 Emotion analysis method and device and electronic equipment
CN113688606A (en) * 2021-07-30 2021-11-23 达观数据(苏州)有限公司 Method for automatically writing document report
CN114298025A (en) * 2021-12-01 2022-04-08 国家电网有限公司华东分部 Emotion analysis method based on artificial intelligence
CN114004605A (en) * 2021-12-31 2022-02-01 北京中科闻歌科技股份有限公司 Invoice over-limit application approval method, device, equipment and medium
CN115130581A (en) * 2022-04-02 2022-09-30 北京百度网讯科技有限公司 Sample generation method, training method, data processing method and electronic device
CN116912845A (en) * 2023-06-16 2023-10-20 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI
CN116912845B (en) * 2023-06-16 2024-03-19 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI

Also Published As

Publication number Publication date
CN110297907B (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN110297907A (en) Generate method, computer readable storage medium and the terminal device of interview report
CN110457466A (en) Generate method, computer readable storage medium and the terminal device of interview report
US11645547B2 (en) Human-machine interactive method and device based on artificial intelligence
CN107229610B (en) A kind of analysis method and device of affection data
Bedi et al. Multi-modal sarcasm detection and humor classification in code-mixed conversations
Wu et al. Emotion recognition from text using semantic labels and separable mixture models
CN110457424A (en) Generate method, computer readable storage medium and the terminal device of interview report
CN110297906A (en) Generate method, computer readable storage medium and the terminal device of interview report
CN113704451B (en) Power user appeal screening method and system, electronic device and storage medium
CN107301168A (en) Intelligent robot and its mood exchange method, system
Millstein Natural language processing with python: natural language processing using NLTK
CN108363725B (en) Method for extracting user comment opinions and generating opinion labels
Bilquise et al. Emotionally intelligent chatbots: A systematic literature review
CN110750648A (en) Text emotion classification method based on deep learning and feature fusion
CN112562669A (en) Intelligent digital newspaper automatic summarization and voice interaction news chat method and system
CN110852047A (en) Text score method, device and computer storage medium
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
Yordanova et al. Automatic detection of everyday social behaviours and environments from verbatim transcripts of daily conversations
Alías et al. Towards high-quality next-generation text-to-speech synthesis: A multidomain approach by automatic domain classification
Fernandes et al. Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs
Shawar et al. A chatbot system as a tool to animate a corpus
Heaton et al. Language models as emotional classifiers for textual conversation
CN117493548A (en) Text classification method, training method and training device for model
CN110543559A (en) Method for generating interview report, computer-readable storage medium and terminal device
Shang Spoken Language Understanding for Abstractive Meeting Summarization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant