CN107704453A - Text semantic analysis method, text semantic analysis terminal and storage medium - Google Patents
- Publication number
- CN107704453A CN201710995052.0A
- Authority
- CN
- China
- Prior art keywords
- word
- semantic
- metadata
- text
- sequence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention provides a text semantic analysis method, a text semantic analysis terminal and a storage medium. Text information input by a user is received, and the character string it contains is separated into independent words to obtain a word sequence. Syntactic analysis is performed on the separated word sequence to judge whether it contains syntax errors. The words in the word sequence are converted into corresponding metadata; the semantic similarity and feature item weights between the metadata items are calculated; the keyword feature items of the word sequence are extracted; the semantic marker text corresponding to each word is obtained; and a text database is built. Following the order of the words in the word sequence, the semantic marker texts are matched from the text database in turn, and the text information synthesized after sorting is output and displayed. The invention feeds information back to the user in the form of metadata, making it easier for the user to obtain the information returned by the semantic analysis terminal and to understand and use it correctly.
Description
Technical field
The present invention relates to the technical field of semantic analysis, and in particular to a text semantic analysis method, a text semantic analysis terminal and a storage medium.
Background
At present, human-machine interaction still relies on text dialogue. Information collection and filtering fall short of what is expected, and the machine cannot accurately recognize what the user means. For example, a user types "Is Houhai OK?"; the machine may interpret this as something like "Is it OK to stay in Houhai?", whereas the user actually means "Would it be OK for us to go to Houhai for a meal?". Although both sides use text, what a person means can vary endlessly, and semantic analysis of such text sessions has the following drawbacks:
First, what a user expresses usually carries uniquely human emotion, so with this kind of simple text-session semantic analysis the machine cannot recognize what the user really wants to express. Second, even if the machine has recognized most of the user's meaning, once the machine restates it, the meaning conveyed may differ again. Third, if the human-machine session is only plain text, with no data encryption, sampling analysis or output encryption, the security of the information cannot be guaranteed; it is easily obtained by people with ulterior motives or by hackers, which is detrimental to the transmission of data.
Therefore, the prior art needs further improvement.
Summary of the invention
In view of the above technical problems, embodiments of the present invention provide a text semantic analysis method, a text semantic analysis terminal and a storage medium, which aim to address the inability of existing human-machine dialogue to recognize the true meaning of the information a user states, and thereby to solve the problem of information being conveyed incorrectly.
A first aspect of the embodiments of the present invention provides a text semantic analysis method, which comprises the following steps:
Receiving text information input by a user, performing lexical analysis on the input text information, and separating the character string contained in the text information into independent words to obtain a word sequence;
Performing syntactic analysis on the separated word sequence, judging whether the word sequence contains syntax errors, and filtering out the words, or the phrases formed by adjacent words, that contain syntax errors;
Converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata items, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
Following the order of the words in the word sequence, matching the corresponding semantic marker texts from the text database in turn, and outputting and displaying the text information synthesized after sorting.
Optionally, the text information input by the user includes the user's identity information and the question information input by the user;
The identity information of the user includes a user ID information byte, a user name byte and a mobile phone number byte.
Optionally, the step of separating the character string contained in the text information into independent words includes:
Using the space as the separator, separating the character string contained in the text information into independent words, and setting for each word a unique corresponding number identifier and a pointer identifier to the next metadata item.
Optionally, before the text information input by the user is received, the method further includes:
Creating a metadata database for storing metadata, and establishing an association between the word catalogue and the metadata contained in the metadata database;
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through this association.
Optionally, the step of calculating the semantic similarity and feature item weights between the metadata items and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights includes:
Using a corpus-based word similarity analysis method and a word vector space model to calculate the semantic similarity and feature item weights between the metadata items.
A second aspect of the embodiments of the present invention provides a text semantic analysis terminal, which includes a processor, a memory, and a text semantic analysis program stored in the memory and executable on the processor, wherein the following steps are implemented when the text semantic analysis program is executed by the processor:
Receiving text information input by a user, performing lexical analysis on the input text information, and separating the character string contained in the text information into independent words to obtain a word sequence;
Performing syntactic analysis on the separated word sequence, judging whether the word sequence contains syntax errors, and filtering out the words, or the phrases formed by adjacent words, that contain syntax errors;
Converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata items, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
Following the order of the words in the word sequence, matching the corresponding semantic marker texts from the text database in turn, and outputting and displaying the text information synthesized after sorting.
Optionally, when the text semantic analysis program is executed by the processor, the following steps are also implemented:
Using the space as the separator, separating the character string contained in the text information into independent words, and setting for each word a unique corresponding number identifier and a pointer identifier to the next metadata item.
Optionally, when the text semantic analysis program is executed by the processor, the following steps are also implemented:
Creating a metadata database for storing metadata, and establishing an association between the word catalogue and the metadata contained in the metadata database;
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through this association.
Optionally, when the text semantic analysis program is executed by the processor, the following steps are also implemented:
Using a corpus-based word similarity analysis method and a word vector space model to calculate the semantic similarity and feature item weights between the metadata items.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a text semantic analysis program is stored; when the text semantic analysis program is executed by a processor, the text semantic analysis method described above is implemented.
In the technical solution provided by the embodiments of the present invention, the information input by the user is stored in the form of metadata, which can then be analyzed and identified as appropriate and fed back to the user through the metadata architecture. When feeding back, information irrelevant to the user is discarded and only the information the user cares about is pushed, making it easier for the user to obtain the information returned by the machine and to understand and use it correctly.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the text semantic analysis method of the present invention;
Fig. 2 is a schematic block diagram of the text semantic analysis method of the present invention;
Fig. 3 is a flow chart of the steps of a concrete application embodiment of the text semantic analysis method of the present invention;
Fig. 4 is a schematic structural block diagram of the text semantic analysis terminal of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In computer languages, semantic analysis is a logical phase of the compilation process. Its task is to check a structurally correct source program for context-sensitive properties and to perform type checking. A structurally incorrect source program cannot enter this checking phase: even though such a program might be correct in terms of context and types, the compiler simply reports an error. Semantic analysis checks the source program for semantic errors and collects type information for the code generation phase. For example, one job of semantic analysis is type checking: verifying that each operator has operands permitted by the language specification, with the compiler reporting an error when it does not. Some compilers, for instance, report an error when a real number is used as an array index. Some language specifications also allow operands to be coerced; when such an operation is applied to an integer and a real-type object, the compiler should convert the integer to a real rather than treat it as an error in the source program.
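The two type-checking cases mentioned above (rejecting a real-valued array index, and coercing an integer to a real in a mixed operation) can be illustrated with a minimal sketch. The function names and the type labels "int" and "real" are hypothetical and chosen only for illustration; they do not come from any particular compiler.

```python
# Hypothetical mini type checker; names and type labels are illustrative assumptions.
NUMERIC_TYPES = {"int", "real"}


def check_binary(op, left, right):
    """Return the result type of `left op right`, coercing int to real."""
    if left not in NUMERIC_TYPES or right not in NUMERIC_TYPES:
        raise TypeError(f"operator {op!r} is not defined for {left} and {right}")
    # Coercion: mixing int and real yields real instead of an error.
    return "real" if "real" in (left, right) else "int"


def check_index(index_type):
    """Array indices must be integers; a real index is reported as an error."""
    if index_type != "int":
        raise TypeError(f"array index must be int, got {index_type}")
    return index_type


print(check_binary("+", "int", "real"))
```

Here the mixed addition is accepted and typed as real, while a call such as `check_index("real")` raises an error, mirroring the behaviour described in the paragraph.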
At present, communication between people relies mainly on spoken and written language as its tools; only then can communication proceed smoothly and the intended meaning be correctly understood. Most human-machine sessions use text, yet a computer can only recognize the two numeric symbols "0" and "1", so human-machine dialogue must be carried by computer instructions. During transmission, data such as these instructions are first entered into the computer through an input device, the results are stored in the computer, and finally the computer's output device displays the processing results for people to read and hear. During this storage and transmission, a series of processing steps must be applied to the data before people and machines can communicate smoothly and as accurately as people do with one another. The metadata management approach adopted by the present invention provides the guarantee and the implementation mechanism for this process.
Metadata is in essence an encoding scheme: it is data that describes other data. It is commonly used to describe digital information resources, especially network information resources, and it is also a kind of structured data. Metadata refers to structured data extracted from an information resource to describe the features and content of that resource, such as course name, speaker and duration, and it is used to organize, retrieve, describe, preserve and manage information and knowledge resources. For example, from the teaching information (an information resource) of a lecturer at an online club, we can retrieve in the club's application items such as course name: Quality Management; speaker: Shi Wei; lecture date: June 21, 2017. A basic metadata record consists of a metadata item and its content metadata. Once resources are described with metadata, they can be effectively filtered and classified; together with metadata standards and specifications, this makes it possible to distinguish the useful content of a resource from the unusable content, and thus to express the correct meaning of the information. After many years of development, metadata can be expressed in formats such as XML and HTML. These formats make it easy for people to define their own tags, which are the so-called metadata. With tags, a user can look at the tags (metadata) first and thereby find the information needed, and metadata supports extension through attributes.
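As an illustration of tag-based metadata with attribute extension, the course example above can be written as a small XML record. This is only a sketch: the tag names `course`, `name`, `speaker` and `date`, the `source` attribute and the helper `build_course_metadata` are hypothetical and are not prescribed by the invention.

```python
import xml.etree.ElementTree as ET


def build_course_metadata(name, speaker, date, **attributes):
    """Build a metadata record; extra keyword arguments become XML attributes."""
    course = ET.Element("course", attrib=attributes)
    for tag, value in (("name", name), ("speaker", speaker), ("date", date)):
        ET.SubElement(course, tag).text = value
    return course


# Values taken from the course example in the text; "source" is an assumed extension.
record = build_course_metadata(
    "Quality Management", "Shi Wei", "2017-06-21", source="club-app"
)
print(ET.tostring(record, encoding="unicode"))
print(record.findtext("speaker"))  # look at the tag first, then read the content
```

A reader of the record can select a tag such as `speaker` without parsing the rest of the content, which is the filtering benefit the paragraph attributes to metadata.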
The invention provides a semantic analysis method. As shown in Fig. 1, the method comprises the following steps:
Step 101: receive text information input by a user, perform lexical analysis on the input text information, and separate the character string contained in the text information into independent words to obtain a word sequence.
In this step, the text information sent by the user through a client is received first. In a specific implementation, the user sends the text information through a client, for example an app on a mobile terminal, and the client then sends the received text information to the server.
Specifically, the text information input by the user includes the user's identity information and the question information input by the user;
The identity information of the user includes a user ID information byte, a user name byte and a mobile phone number byte.
It can be envisaged that the identity information may be entered by the user each time information is sent; alternatively, it may be saved in advance, so that when the user needs to send information, the question information input by the user is transmitted together with the pre-saved identity information.
In this step, separating the character string contained in the text information into independent words includes:
Using the space as the separator, separating the character string contained in the text information into independent words, and setting for each word a unique corresponding number identifier and a pointer identifier to the next metadata item.
Because the information input by the user consists of characters, this step first performs lexical analysis on it: the character string is split word by word, the words it contains are identified, and character combinations that cannot be recognized are discarded.
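The splitting described in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Token` class, the `tokenize` function and the optional `vocabulary` set used to decide which character combinations count as recognizable are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Token:
    number: int                 # unique number identifier of this word
    next_number: Optional[int]  # pointer to the next item; None for the last word
    text: str


def tokenize(message: str, vocabulary: Optional[Set[str]] = None) -> List[Token]:
    """Split on whitespace, drop unrecognized words, then number and link the rest."""
    words = message.split()  # treats runs of spaces as a single separator
    if vocabulary is not None:
        # Assumption: a word is "recognized" if its lowercase form is in the vocabulary.
        words = [w for w in words if w.lower() in vocabulary]
    tokens = []
    for i, word in enumerate(words):
        next_number = i + 1 if i + 1 < len(words) else None
        tokens.append(Token(number=i, next_number=next_number, text=word))
    return tokens


for token in tokenize("I am xq Chinese", vocabulary={"i", "am", "chinese"}):
    print(token)
```

Numbering is applied after filtering so that the pointer of each kept word always refers to another kept word.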
Step 102: perform syntactic analysis on the separated word sequence, judge whether the word sequence contains syntax errors, and filter out the words, or the phrases formed by adjacent words, that contain syntax errors.
Grammatical analysis is performed on the separated word sequence to judge whether it contains word combinations that do not conform to the grammar. Attributes of the language constructs are attached to the non-terminal symbols that represent those constructs, and the attribute values are computed by the semantic rules attached to the grammar productions; code is thereby produced, syntax-directed translation is carried out, and the semantic translation of the context-free grammar is performed.
This step also includes: analyzing and judging the assignment statements, arithmetic expressions and logical expressions in the word sequence, and filtering out phrases whose variable types are inconsistent.
Step 103: convert the words contained in the word sequence into corresponding metadata, calculate the semantic similarity and feature item weights between the metadata items, extract the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtain the semantic marker text corresponding to each word according to the keyword feature items, and store the semantic marker text in a text database.
Each word is converted into its corresponding metadata, and semantic analysis is performed on the information input by the user by building a metadata model, so as to obtain the original intent of the information.
Before the step of receiving the text information input by the user, the method further includes:
Creating a metadata database for storing metadata, and establishing an association between the word catalogue and the metadata contained in the metadata database;
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through this association.
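A minimal sketch of such an association is given below, assuming an in-memory dictionary stands in for the metadata database and a second dictionary stands in for the word catalogue. The record layout (an `item` plus its `content`), the identifiers `m1` to `m3` and the function name are hypothetical.

```python
# Hypothetical metadata database: metadata id -> record (metadata item + content).
METADATA_DB = {
    "m1": {"item": "subject", "content": "I"},
    "m2": {"item": "verb", "content": "am"},
    "m3": {"item": "nationality", "content": "Chinese"},
}

# Hypothetical word catalogue: normalized word -> metadata id (the association).
WORD_CATALOGUE = {"i": "m1", "am": "m2", "chinese": "m3"}


def words_to_metadata(words):
    """Map each word to its metadata record through the catalogue association."""
    records = []
    for word in words:
        meta_id = WORD_CATALOGUE.get(word.lower())
        if meta_id is None:
            continue  # assumption: words without an association are skipped
        records.append(METADATA_DB[meta_id])
    return records


print(words_to_metadata(["I", "am", "Chinese"]))
```

Because the association is built before any text is received, the conversion at analysis time reduces to two lookups per word.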
Specifically, the text session and the semantic analysis of the user's information are performed on the basis of metadata management. The semantic analysis calculates the semantic similarity and feature item weights between metadata items to obtain the key information of the question input by the user, and creates the semantic marker text of that question according to the key information. In other words, the semantic marking of the text session is carried out through semantic analysis, and the semantically marked words or files are stored in the tagged text database (the metadata database).
Preferably, the step of calculating the semantic similarity and feature item weights between the metadata items and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights includes:
Using a corpus-based word similarity analysis method and a word vector space model to calculate the semantic similarity and feature item weights between the metadata items.
Step 104: following the order of the words in the word sequence, match the corresponding semantic marker texts from the text database in turn, and output and display the text information synthesized after sorting.
The semantically marked words or files obtained for the word sequence are separate pieces of information that have not yet been combined into text information. In this step, therefore, they are sorted according to the unique number identifier of each word and the pointer identifier to the metadata corresponding to the next word, and are synthesized into text information for output. This text information is the correct expression of the question input by the user.
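The sorting and synthesis in this step amount to following the pointer identifiers from the first word to the last. The sketch below assumes each semantically marked fragment is stored under its number identifier together with the number of the next fragment; the `assemble` function and the storage layout are hypothetical.

```python
def assemble(fragments, start):
    """Follow next-pointers from `start` and join the fragments in order.

    `fragments` maps a number identifier to a (tagged_text, next_number) pair;
    `next_number` is None for the last fragment.
    """
    parts = []
    seen = set()
    current = start
    # The `seen` set guards against a corrupted chain that loops back on itself.
    while current is not None and current not in seen:
        seen.add(current)
        text, next_number = fragments[current]
        parts.append(text)
        current = next_number
    return " ".join(parts)


# Fragments are deliberately stored out of order; the pointers restore the order.
stored = {2: ("Chinese", None), 0: ("I", 1), 1: ("am", 2)}
print(assemble(stored, start=0))
```

The storage order of the fragments does not matter, since only the number identifiers and pointers determine the order of the output.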
Fig. 2 is a schematic block diagram of the interaction flow of the metadata-management-based text session semantic analysis provided by an embodiment of the present invention. For ease of description, the method of the present invention is further explained with reference to Fig. 3. The steps of the concrete application embodiment of the method include:
Step H1: after opening the client or application on the mobile phone, the user inputs the relevant text information and sends a request to the terminal.
The request includes the user's identity information and the question information input by the user.
After the user inputs information through the application on the mobile phone, the application saves the user's information together with the information the user input, so that it can be stored in the database. The application then sends a request to the machine, and the request contains the user information and the input information. In one specific implementation, the input information includes a user ID information field, a user name byte, a mobile phone number byte, a title byte and a submission time byte.
Step H2: the server terminal receives the request sent by the client and performs preliminary lexical analysis on the information input at the client.
When the server terminal receives the user's input passed on by the client, it forwards the data to the background server. During this transfer, the server performs preliminary preprocessing on the user's information, namely lexical analysis.
Specifically, the lexical analysis scans the user's input from left to right, identifies the various kinds of words according to the lexical rules of the language, and produces the attribute word for each identified word. That is, the character string input by the user is converted into a word (token) sequence. The identified words then undergo type classification and fixed-length processing.
Once the user's input has been preprocessed, the words can be classified. Take the input "I am Chinese": the computer does not know that this consists of words separated by spaces; it only knows that it is a character string made up of ordinary characters. The morphemes can be split out of the input string by some method (here, using the space as the separator). The result after splitting can be represented in XML as follows:
<sentence>
<word>I</word>
<word>am</word>
<word>Chinese</word>
</sentence>
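The XML representation above can be produced with a few lines of code. The following sketch uses Python's standard `xml.etree.ElementTree` module; the function name `words_to_xml` is hypothetical, and the output is emitted on a single line rather than one element per line.

```python
import xml.etree.ElementTree as ET


def words_to_xml(words):
    """Wrap each word in a <word> element under a single <sentence> root."""
    sentence = ET.Element("sentence")
    for word in words:
        ET.SubElement(sentence, "word").text = word
    return ET.tostring(sentence, encoding="unicode")


# Split on the space separator, as in the example, then serialize.
print(words_to_xml("I am Chinese".split(" ")))
# prints <sentence><word>I</word><word>am</word><word>Chinese</word></sentence>
```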
Step H3: perform syntactic analysis on the word sequence obtained in step H2, identify the grammatical errors in the information, and filter them out.
Syntactic analysis is likewise a logical phase of the compilation process. Its task is to combine the word sequence into grammatical phrases on the basis of the lexical analysis, and then to judge whether the word sequence is structurally well formed; the structure can be described by a context-free grammar.
Step H4: convert the words in the word sequence into metadata, perform semantic analysis on the metadata to obtain the semantic marker text corresponding to the information input by the user, and store the semantic marker text in the text database.
After the lexical and syntactic analysis phases, the information is basically usable, but ambiguity and mismatched understanding still cannot be eliminated. At this point the data is classified and restructured into a metadata structure for storage and is then managed systematically, which realizes the data-to-metadata processing mode. Semantic analysis is then performed to obtain the user's real purpose and intent. In other words, the word sequence undergoes semantic expression, semantic organization, semantic storage and disambiguation in turn, after which it is converted into its corresponding metadata sequence.
The source program has by now passed through lexical analysis and syntactic analysis; the third phase is semantic analysis, which is the most substantive work of a compiler. The first two phases identify and process the form of the source program, whereas semantic analysis interprets its meaning and brings about a qualitative change in the source program. Semantic analysis mainly involves the following: syntax-directed translation, the symbol table, type checking, the intermediate language, and intermediate code generation. When the background server obtains the data passed on by the front end, the machine performs semantic analysis on it. In the present invention, semantic analysis operates on this data after it has been packaged into a metadata model. The semantic analysis module performs semantic similarity analysis and feature item weight calculation and extracts the keyword feature items of the user's input, laying the foundation for text classification and text vectorization. The semantic analysis module contains an ontology and an entity dictionary. The ontology is used for semantic analysis of the text; its basic unit is the concept, concepts form concept trees, and concept trees form the ontology. Text concepts resolve the problems of one word having several meanings and of several words sharing one meaning. The entity dictionary is used to extract entities from the text so that content with no practical meaning can be discarded, which reduces the computation of subsequent text processing. Inference is carried out by frame logic or description logic, the data in the information sources is collected, and the schema information of each local database is stored in the metadata database in a prescribed format. By analyzing the semantic relations between metadata items, a global ontology of the corresponding domain is established, the semantic marking of text documents is performed through semantic analysis, and the semantically marked text documents are stored in the tagged text document database.
Specifically, semantic similarity measures how similar two words are. It is mainly used in fields such as word sense disambiguation, information retrieval, information extraction and machine translation, and it is fairly subjective, so it cannot be analyzed apart from a specific application environment. There are currently two kinds of calculation methods in the field of semantic similarity analysis: one uses a semantic dictionary, organizing the concepts of the words in a tree structure and computing over it; the other uses the context information of the words and solves the problem with statistical methods. Given the application scenario of the present invention, the algorithms used for semantic similarity and feature item weight calculation are existing, mature algorithms. The corpus-based word similarity analysis method is used, with the formula:
Sim(W1, W2) = a / (Dis(W1, W2) + a);
where Sim(W1, W2) is the similarity, Dis(W1, W2) is the distance between the words W1 and W2, and a is an adjustable parameter equal to the word distance at which the similarity is 0.5. The feature item weight is calculated as w = tf × idf, where w is the weight of feature item t in document d, tf is the frequency with which t occurs in d, and idf is the inverse document frequency of t. The method uses the widely adopted word vector space model, which includes the following steps: preprocessing -> text feature item selection -> weighting -> generating the vector space model and then calculating the cosine. The model selects a group of feature words in advance, calculates the correlation between this group of feature words and each word to obtain a feature word vector for each word, and uses the similarity between these vectors as the similarity between the two words.
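The three calculations named in this paragraph can be sketched as follows. Two assumptions are made and should be read as such: the distance formula is taken in the form Sim = a / (Dis + a), which is the form consistent with a being the distance at which the similarity equals 0.5, and idf is computed as log(N / df), one common variant that the text does not itself specify. All function names are hypothetical.

```python
import math


def distance_similarity(distance, a):
    """Sim = a / (Dis + a); equals 0.5 when the distance equals a (assumed form)."""
    return a / (distance + a)


def tf_idf(term, doc, corpus):
    """w = tf * idf for `term` in `doc`; documents are lists of words."""
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    # Assumption: idf = log(N / df); other smoothing variants exist.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


def cosine(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


corpus = [["quality", "management", "course"], ["course", "schedule"]]
print(distance_similarity(2.0, a=2.0))
print(tf_idf("quality", corpus[0], corpus))
print(cosine([1.0, 0.0], [1.0, 1.0]))
```

A term that occurs in every document receives a weight of zero under this idf variant, which is why "course" would not be selected as a keyword feature item in the toy corpus.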
After the user data has been converted into metadata and semantically analyzed, the machine generates the corresponding correct answer for the data and stores it in the database as the information source for the output end.
Step H5: after the user data has been semantically analyzed, the machine generates a knowledge base system (KBS) for the application according to the corresponding standard. The features of each data item are clearly identified in the KBS. After the user inputs information, the knowledge database is searched and the matching data is returned as the response. In other words, the semantic analysis results are stored in a semantic knowledge base; after the user inputs information, the knowledge base is searched to obtain the matching knowledge, and semantic association is then used to obtain the required analysis result.
Although the data has been converted into metadata and answers have been generated by semantic analysis of the metadata structure, the result still cannot be output and displayed on the user terminal immediately, because at this point the information is not yet coherent and remains in an isolated, scattered state. The data therefore needs further processing to establish relationships between data items. Each metadata item has a unique identifier, which contains the number identifier of the user's input and the node identifier of the next metadata item. Once the user has input data, the machine automatically searches the question knowledge base, finds the answer text corresponding to each question, and combines the texts to form the final result for the question the user entered. Only then can the machine feed the fully synthesized text back to the user as its response, thereby fulfilling the user's intent.
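The linking described above resembles following a chain of records, each of which names its successor. The sketch below is one possible reading under that assumption; the class `MetadataNode`, its field names, and the function `assemble_response` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MetadataNode:
    number_id: int           # number identifier of this metadata item
    next_id: Optional[int]   # node identifier of the next item, None at the end
    answer_text: str         # answer text found for this item


def assemble_response(nodes: Dict[int, MetadataNode], start_id: int) -> str:
    """Follow next_id links from start_id and join the answer texts in order."""
    parts = []
    seen = set()
    current: Optional[int] = start_id
    while current is not None:
        if current in seen:
            # Guard against a cyclic chain, which would otherwise loop forever.
            raise ValueError(f"cycle detected at node {current}")
        seen.add(current)
        node = nodes[current]
        parts.append(node.answer_text)
        current = node.next_id
    return " ".join(parts)


# Hypothetical scattered answer fragments, stored out of order.
nodes = {
    3: MetadataNode(3, None, "9 am."),
    1: MetadataNode(1, 2, "The store"),
    2: MetadataNode(2, 3, "opens at"),
}
response = assemble_response(nodes, start_id=1)
```

Because each record carries the identifier of its successor, the fragments can be stored in any order and still be combined into one coherent text.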
The second aspect of the embodiments of the present invention provides a word semantic analysis terminal. As shown in Figure 3, the word semantic analysis terminal 10 includes a processor 110, a memory 120, and a word semantic analysis program stored in the memory and executable on the processor, wherein the following steps are implemented when the word semantic analysis program is executed by the processor:
Receiving the text information input by the user, performing lexical analysis on the input text information, and separating the character strings contained in the text information into independent words to obtain a word sequence;
Performing syntactic analysis on the separated word sequence, determining whether a syntax error exists in the word sequence, and filtering out the words, or the phrases formed by adjacent words, that contain syntax errors;
Converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata items, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
Matching, in the order of the words in the word sequence, the corresponding semantic marker texts from the text database in turn, and outputting and displaying the text information synthesized after sorting.
Further, when the word semantic analysis program is executed by the processor 110, the following step is also implemented:
Using spaces as separators, separating the character strings contained in the text information into independent words, and setting for each word a unique corresponding number identifier and the node identifier of the next metadata item.
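A minimal sketch of this separation step follows. It assumes that number identifiers are simply consecutive integers starting at 1 and that the last word has no successor; the function name `split_into_words` and the record keys are hypothetical.

```python
def split_into_words(text):
    """Split text on spaces and tag each word with its own number identifier
    and the identifier of the word that follows it (None for the last word)."""
    # Splitting on a single space can yield empty strings for repeated spaces,
    # so those are dropped.
    words = [w for w in text.split(" ") if w]
    records = []
    for index, word in enumerate(words, start=1):
        next_id = index + 1 if index < len(words) else None
        records.append({"word": word, "number_id": index, "next_id": next_id})
    return records


records = split_into_words("how to  reset password")
```

Each record can later be matched to a metadata item, and the `next_id` values preserve the original word order.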
Preferably, when the word semantic analysis program is executed by the processor 110, the following steps are also implemented:
Creating a metadatabase for storing metadata, and establishing association relationships between the word catalog and the metadata contained in the metadatabase; the catalogs contained in the metadatabase are organized into different layers according to metadata type, so that the corresponding metadata can be located more quickly through the catalog.
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association relationships.
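One way to picture the layered metadatabase is as a nested mapping in which the word catalog points each word to the layer that holds its metadata. The layer names, metadata fields, and the function `find_metadata` below are all hypothetical illustrations; the patent does not specify a storage layout.

```python
# Hypothetical metadatabase: one layer per metadata type.
metadatabase = {
    "noun": {"password": {"id": "M001", "label": "credential"}},
    "verb": {"reset": {"id": "M002", "label": "restore-default"}},
}

# Hypothetical word catalog: the association between a word and its layer.
word_catalog = {"password": "noun", "reset": "verb"}


def find_metadata(word):
    """Look up a word's layer in the catalog, then fetch its metadata from that layer.

    Returns None when the word has no catalog entry or no metadata in its layer.
    """
    layer = word_catalog.get(word)
    if layer is None:
        return None
    return metadatabase[layer].get(word)


reset_metadata = find_metadata("reset")
```

Going through the catalog first means only one layer is searched per word, which is the speed-up the layering is meant to provide.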
Preferably, when the word semantic analysis program is executed by the processor 110, the following step is also implemented:
Using a corpus-based word similarity analysis method built on the word vector space model, calculating the semantic similarity and feature item weights between the metadata items.
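The patent does not name the weighting scheme used for feature items. As an illustration only, the sketch below uses TF-IDF, a common weighting in vector space models; this choice is an assumption, and the function name `tf_idf_weights` and the tiny corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Compute TF-IDF weights (an assumed scheme, not stated in the patent).

    documents: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus.
corpus = [["reset", "password"], ["reset", "account"]]
weights = tf_idf_weights(corpus)
```

Terms that occur in every document receive a weight of zero, so the terms with the highest weights are natural candidates for keyword feature items.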
The memory 120, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, instructions, and modules stored in the memory 120, the processor 110 executes the various functional applications and data processing of the server, that is, implements the word semantic analysis method of the above method embodiments.
The memory 120 may include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required for at least one function, and the data storage area can store data created through use of the automatic report generation system, and the like. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 120 optionally includes memory located remotely from the processor 110, and such remote memory can be connected to the word semantic analysis terminal via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 120 and, when executed by the one or more processors 110, perform the word semantic analysis method of any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a word semantic analysis program is stored; when the word semantic analysis program is executed by a processor, the word semantic analysis method described above is implemented.
From the above description of the embodiments, those of ordinary skill in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or of course by hardware. Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
In the present invention, when a user needs to obtain information resources, the user sends a corresponding instruction to the machine; once the machine has received the user's command, it saves the user's command information. In the present invention, data is stored in the form of metadata. When the user's information resources are saved as metadata, the metadata is appropriately analyzed and identified and then fed back to the user through the metadata architecture. During feedback, information irrelevant to the user is discarded and only the information the user cares about is pushed, which makes it easier for the user to obtain the information fed back by the semantic analysis terminal and to understand and use it correctly.
It should be understood that those of ordinary skill in the art can make equivalent substitutions or changes according to the technical solution of the present invention and its inventive concept, and all such changes or substitutions shall fall within the protection scope of the appended claims of the present invention.
Claims (10)
1. A word semantic analysis method, characterized by comprising the following steps:
Receiving the text information input by the user, performing lexical analysis on the input text information, and separating the character strings contained in the text information into independent words to obtain a word sequence;
Performing syntactic analysis on the separated word sequence, determining whether a syntax error exists in the word sequence, and filtering out the words, or the phrases formed by adjacent words, that contain syntax errors;
Converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata items, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
Matching, in the order of the words in the word sequence, the corresponding semantic marker texts from the text database in turn, and outputting and displaying the text information synthesized after sorting.
2. The word semantic analysis method according to claim 1, characterized in that the text information input by the user includes: the identity information of the user and the question information input by the user;
The identity information of the user includes: an ID information byte, a user name byte, and a mobile phone number byte.
3. The word semantic analysis method according to claim 2, characterized in that the step of separating the character strings contained in the text information into independent words comprises:
Using spaces as separators, separating the character strings contained in the text information into independent words, and setting for each word a unique corresponding number identifier and the node identifier of the next metadata item.
4. The word semantic analysis method according to claim 3, characterized in that, before receiving the text information input by the user, the method further comprises the step of:
Creating a metadatabase for storing metadata, and establishing association relationships between the word catalog and the metadata contained in the metadatabase;
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association relationships.
5. The word semantic analysis method according to claim 4, characterized in that the step of calculating the semantic similarity and feature item weights between the metadata items, and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, comprises:
Using a corpus-based word similarity analysis method built on the word vector space model, calculating the semantic similarity and feature item weights between the metadata items.
6. A word semantic analysis terminal, characterized in that the word semantic analysis terminal includes: a processor, a memory, and a word semantic analysis program stored in the memory and executable on the processor, wherein the following steps are implemented when the word semantic analysis program is executed by the processor:
Receiving the text information input by the user, performing lexical analysis on the input text information, and separating the character strings contained in the text information into independent words to obtain a word sequence;
Performing syntactic analysis on the separated word sequence, determining whether a syntax error exists in the word sequence, and filtering out the words, or the phrases formed by adjacent words, that contain syntax errors;
Converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata items, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
Matching, in the order of the words in the word sequence, the corresponding semantic marker texts from the text database in turn, and outputting and displaying the text information synthesized after sorting.
7. The word semantic analysis terminal according to claim 6, characterized in that, when the word semantic analysis program is executed by the processor, the following step is also implemented:
Using spaces as separators, separating the character strings contained in the text information into independent words, and setting for each word a unique corresponding number identifier and the node identifier of the next metadata item.
8. The word semantic analysis terminal according to claim 7, characterized in that, when the word semantic analysis program is executed by the processor, the following steps are also implemented:
Creating a metadatabase for storing metadata, and establishing association relationships between the word catalog and the metadata contained in the metadatabase;
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association relationships.
9. The word semantic analysis terminal according to claim 7, characterized in that, when the word semantic analysis program is executed by the processor, the following step is also implemented:
Using a corpus-based word similarity analysis method built on the word vector space model, calculating the semantic similarity and feature item weights between the metadata items.
10. A computer-readable storage medium, characterized in that a word semantic analysis program is stored on the computer-readable storage medium, and when the word semantic analysis program is executed by a processor, the word semantic analysis method according to any one of claims 1 to 5 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710995052.0A CN107704453B (en) | 2017-10-23 | 2017-10-23 | Character semantic analysis method, character semantic analysis terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107704453A true CN107704453A (en) | 2018-02-16 |
CN107704453B CN107704453B (en) | 2021-10-08 |
Family
ID=61181999
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710995052.0A Active CN107704453B (en) | 2017-10-23 | 2017-10-23 | Character semantic analysis method, character semantic analysis terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107704453B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101110812A (en) * | 2007-08-29 | 2008-01-23 | 中兴通讯股份有限公司 | Text command analyzing and processing method |
US20140019385A1 (en) * | 2009-03-06 | 2014-01-16 | Tagged, Inc. | Generating a document representation using semantic networks |
CN102375826A (en) * | 2010-08-13 | 2012-03-14 | 中国移动通信集团公司 | Structured query language script analysis method, device and system |
CN103927358A (en) * | 2014-04-15 | 2014-07-16 | 清华大学 | Text search method and system |
US20160350764A1 (en) * | 2014-08-01 | 2016-12-01 | Almawave S.R.L. | System and method for meaning driven process and information management to improve efficiency, quality of work, and overall customer satisfaction |
CN104239513A (en) * | 2014-09-16 | 2014-12-24 | 西安电子科技大学 | Semantic retrieval method oriented to field data |
CN104199965A (en) * | 2014-09-22 | 2014-12-10 | 吴晨 | Semantic information retrieval method |
CN105160046A (en) * | 2015-10-30 | 2015-12-16 | 成都博睿德科技有限公司 | Text-based data retrieval method |
CN105335510A (en) * | 2015-10-30 | 2016-02-17 | 成都博睿德科技有限公司 | Text data efficient searching method |
CN105389297A (en) * | 2015-12-21 | 2016-03-09 | 浙江万里学院 | Text similarity processing method |
CN106682147A (en) * | 2016-12-22 | 2017-05-17 | 北京锐安科技有限公司 | Mass data based query method and device |
Non-Patent Citations (3)
Title |
---|
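Zhang Naijing et al.: "Ontology-based feature weighting model for forestry-domain documents", Computer Engineering and Applications * |
Zhao Yanfeng et al.: "Research on semantic similarity algorithms for domain ontologies", Software Guide * |
Zhao Zhijun et al.: "Research on VSM-based similarity calculation for OAI-PMH metadata", Computer Technology and Development * |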
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108845985A (en) * | 2018-05-28 | 2018-11-20 | 济南浪潮高新科技投资发展有限公司 | A kind of information matching method and information matches device |
CN108845985B (en) * | 2018-05-28 | 2022-02-18 | 山东浪潮科学研究院有限公司 | Information matching method and information matching device |
US10832679B2 (en) | 2018-11-20 | 2020-11-10 | International Business Machines Corporation | Method and system for correcting speech-to-text auto-transcription using local context of talk |
CN111382173A (en) * | 2018-12-25 | 2020-07-07 | 横河电机株式会社 | Engineering support system and engineering support method |
CN110276082A (en) * | 2019-06-06 | 2019-09-24 | 百度在线网络技术(北京)有限公司 | Translation processing method and device based on dynamic window |
CN110489127A (en) * | 2019-08-12 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Error code determines method, apparatus, computer readable storage medium and equipment |
CN110489127B (en) * | 2019-08-12 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Error code determination method, apparatus, computer-readable storage medium and device |
CN111192682A (en) * | 2019-12-25 | 2020-05-22 | 上海联影智能医疗科技有限公司 | Image exercise data processing method, system and storage medium |
CN111192682B (en) * | 2019-12-25 | 2024-04-09 | 上海联影智能医疗科技有限公司 | Image exercise data processing method, system and storage medium |
CN111309306B (en) * | 2020-02-24 | 2023-07-28 | 福建天晴数码有限公司 | Man-machine interaction dialogue management system |
CN111310477A (en) * | 2020-02-24 | 2020-06-19 | 成都网安科技发展有限公司 | Document query method and device |
CN111309306A (en) * | 2020-02-24 | 2020-06-19 | 福建天晴数码有限公司 | Man-machine interactive dialogue management system |
CN111680130A (en) * | 2020-06-16 | 2020-09-18 | 深圳前海微众银行股份有限公司 | Text retrieval method, device, equipment and storage medium |
CN111782896A (en) * | 2020-07-03 | 2020-10-16 | 深圳市壹鸽科技有限公司 | Text processing method and device after voice recognition and terminal |
CN111782896B (en) * | 2020-07-03 | 2023-12-12 | 深圳市壹鸽科技有限公司 | Text processing method, device and terminal after voice recognition |
CN111881179A (en) * | 2020-07-20 | 2020-11-03 | 易通星云(北京)科技发展有限公司 | Data matching method, device and equipment thereof, and computer storage medium |
CN111881179B (en) * | 2020-07-20 | 2024-03-01 | 易通星云(北京)科技发展有限公司 | Data matching method, device and equipment thereof, and computer storage medium |
CN112347767A (en) * | 2021-01-07 | 2021-02-09 | 腾讯科技(深圳)有限公司 | Text processing method, device and equipment |
CN112347767B (en) * | 2021-01-07 | 2021-04-06 | 腾讯科技(深圳)有限公司 | Text processing method, device and equipment |
CN113792608A (en) * | 2021-08-19 | 2021-12-14 | 广州云硕科技发展有限公司 | Intelligent semantic analysis method and system |
CN113792608B (en) * | 2021-08-19 | 2022-05-10 | 广州云硕科技发展有限公司 | Intelligent semantic analysis method and system |
CN113705230A (en) * | 2021-08-31 | 2021-11-26 | 中国平安财产保险股份有限公司 | Artificial intelligence-based policy agreement assessment method, device, equipment and medium |
CN113705230B (en) * | 2021-08-31 | 2023-08-25 | 中国平安财产保险股份有限公司 | Method, device, equipment and medium for evaluating policy specifications based on artificial intelligence |
CN114707045B (en) * | 2022-03-23 | 2023-09-26 | 江苏悉宁科技有限公司 | Public opinion monitoring method and system based on big data |
CN114707045A (en) * | 2022-03-23 | 2022-07-05 | 江苏悉宁科技有限公司 | Big data-based public opinion monitoring method and system |
Also Published As
Publication number | Publication date |
---|---|
CN107704453B (en) | 2021-10-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.) Applicant after: Shenzhen Qianhai Zhongxing scientific research Co.,Ltd. Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.) Applicant before: SHENZHEN QIANHAI ZHONGXING E-COMMERCE Co.,Ltd. |
GR01 | Patent grant | ||