CN107704453A - Text semantic analysis method, text semantic analysis terminal, and storage medium - Google Patents

Publication number
CN107704453A
Authority
CN
China
Prior art keywords
word
semantic
metadata
text
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710995052.0A
Other languages
Chinese (zh)
Other versions
CN107704453B (en
Inventor
胡明灯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen City Qianhai Zhongxing Agel Ecommerce Ltd
Original Assignee
Shenzhen City Qianhai Zhongxing Agel Ecommerce Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen City Qianhai Zhongxing Agel Ecommerce Ltd
Priority to CN201710995052.0A
Publication of CN107704453A
Application granted
Publication of CN107704453B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention provides a text semantic analysis method, a text semantic analysis terminal, and a storage medium. Text information input by a user is received, and the character strings it contains are separated into independent words to obtain a word sequence. Syntactic analysis is performed on the word sequence to determine whether it contains syntax errors. The words in the word sequence are converted into corresponding metadata; the semantic similarity and feature item weights between the metadata are calculated; the keyword feature items of the word sequence are extracted; the semantic marker text corresponding to each word is obtained; and a text database is established. Following the order of the words in the word sequence, the semantic marker texts are matched from the text database in turn, and the text information synthesized after ordering is output and displayed. Because the invention returns results to the user in the form of metadata, the user can more easily obtain the information fed back by the semantic analysis terminal and correctly understand and use it.

Description

A text semantic analysis method, text semantic analysis terminal, and storage medium
Technical field
The present invention relates to the technical field of semantic analysis, and in particular to a text semantic analysis method, a text semantic analysis terminal, and a storage medium.
Background art
At present, human-machine interaction still relies on text dialogue, and information collection and filtering fall short of what is expected: the machine cannot accurately recognize the meaning of what the current user says. For example, the user may type "Houhai, OK?" (literally "rear sea can be"), and the machine may take this to mean something like "staying out in Houhai", whereas the user actually means "Would it be OK for us to go to Houhai for a meal?". Although both sides use text, the meaning a human expresses can vary endlessly. Semantic analysis of such text sessions has the following drawbacks:
First, what a user expresses usually carries distinctly human emotion; with simple text-session semantic analysis alone, the machine cannot identify what the user really intends to express. Second, even if the machine has recognized most of the user's meaning, once the machine restates it, the meaning conveyed may differ again. Third, if the human-machine session is plain text and the data is not encrypted on input, sampled and analyzed, and encrypted on output, the security of the information cannot be guaranteed; it can easily be obtained by malicious parties or hackers, which is detrimental to the transmission of data.
Therefore, the prior art needs further improvement.
Summary of the invention
In view of the above technical problems, embodiments of the present invention provide a text semantic analysis method, a text semantic analysis terminal, and a storage medium, intended to address the inability of existing human-machine dialogue to recognize the true meaning of the information a user expresses, and thereby to solve the problem of information being conveyed incorrectly.
A first aspect of the embodiments of the present invention provides a text semantic analysis method, comprising the following steps:
receiving text information input by a user, performing lexical analysis on the input text information, and separating the character strings contained in the text information into independent words to obtain a word sequence;
performing syntactic analysis on the separated word sequence, determining whether the word sequence contains syntax errors, and filtering out words, or phrases formed by adjacent words, that contain syntax errors;
converting the words in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
according to the order of the words in the word sequence, matching the corresponding semantic marker texts from the text database in turn, and outputting and displaying the text information synthesized after ordering.
Optionally, the text information input by the user includes the identity information of the user and the question information input by the user;
the identity information of the user includes a user ID field, a user name field, and a mobile phone number field.
Optionally, the step of separating the character strings contained in the text information into independent words comprises:
using spaces as separators, separating the character strings contained in the text information into independent words, and assigning each word a unique corresponding number identifier and a pointer identifier of the next metadata.
Optionally, before the text information input by the user is received, the method further comprises:
creating a metadatabase for storing metadata, and establishing an association between a word catalog and the metadata contained in the metadatabase;
in the step of converting the words in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association.
Optionally, the step of calculating the semantic similarity and feature item weights between the metadata and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights comprises:
calculating the semantic similarity and feature item weights between the metadata using a corpus-based word similarity analysis method and a word vector space model.
A second aspect of the embodiments of the present invention provides a text semantic analysis terminal, comprising a processor, a memory, and a text semantic analysis program that is stored in the memory and can run on the processor, wherein the following steps are implemented when the text semantic analysis program is executed by the processor:
receiving text information input by a user, performing lexical analysis on the input text information, and separating the character strings contained in the text information into independent words to obtain a word sequence;
performing syntactic analysis on the separated word sequence, determining whether the word sequence contains syntax errors, and filtering out words, or phrases formed by adjacent words, that contain syntax errors;
converting the words in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
according to the order of the words in the word sequence, matching the corresponding semantic marker texts from the text database in turn, and outputting and displaying the text information synthesized after ordering.
Optionally, when the text semantic analysis program is executed by the processor, the following step is also implemented:
using spaces as separators, separating the character strings contained in the text information into independent words, and assigning each word a unique corresponding number identifier and a pointer identifier of the next metadata.
Optionally, when the text semantic analysis program is executed by the processor, the following steps are also implemented:
creating a metadatabase for storing metadata, and establishing an association between a word catalog and the metadata contained in the metadatabase;
in the step of converting the words in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association.
Optionally, when the text semantic analysis program is executed by the processor, the following step is also implemented:
calculating the semantic similarity and feature item weights between the metadata using a corpus-based word similarity analysis method and a word vector space model.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a text semantic analysis program is stored; when the text semantic analysis program is executed by a processor, the text semantic analysis method described above is implemented.
In the technical solution provided by the embodiments of the present invention, the information input by the user is stored in the form of metadata; the metadata can be analyzed and identified as appropriate and then fed back to the user through the metadata architecture. When feeding back, information irrelevant to the user is discarded and only the information the user cares about is pushed, so that the user can easily obtain the information fed back by the machine and correctly understand and use it.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the text semantic analysis method of the present invention;
Fig. 2 is a schematic block diagram of the text semantic analysis method of the present invention;
Fig. 3 is a flow chart of the steps of a specific application embodiment of the text semantic analysis method of the present invention;
Fig. 4 is a schematic structural block diagram of the text semantic analysis terminal of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
In computer languages, semantic analysis is a logical phase of compilation. Its task is to check a structurally correct source program for context-dependent properties and to perform type checking. A structurally incorrect source program cannot enter this checking phase, even though it might be correct with respect to context and types; the compiler simply reports an error. Semantic analysis checks the source program for semantic errors and collects type information for the code generation phase. For example, one job of semantic analysis is type checking: verifying that each operator has operands permitted by the language specification, and reporting an error when it does not. Some compilers, for instance, report an error when a real number is used as an array index. Conversely, some language specifications allow operands to be coerced; when such an operation is applied to an integer and a real operand, the compiler should convert the integer to a real rather than treat it as an error in the source program.
At present, communication between people relies mainly on spoken and written language as the tools that allow exchange to proceed smoothly and meaning to be correctly understood. Sessions between humans and machines are mostly text-based, yet a computer can only recognize the two symbols "0" and "1". Human-machine dialogue is conveyed through computer instructions: the instructions and other data are first entered through an input device, stored in the computer, and processed; the results are finally shown on an output device for people to read or hear. During storage and transmission, a series of processing steps must be applied to the data before people and machines can communicate as smoothly and correctly as people do with one another. The metadata management approach used in the present invention provides the guarantee and the implementation mechanism for this process.
Metadata is in effect an encoding scheme: it is data that describes other data. It is commonly used to describe digitized information resources, especially network information resources, and it is also a kind of structured data. Metadata is structured data extracted from an information resource to describe the features and content of that resource, such as course name, speaker, and duration, and it is used to organize, retrieve, describe, preserve, and manage information and knowledge resources. For example, for the teaching information (the information resource) of a lecturer in our online club, the club's application can retrieve information such as course name: Quality Management; speaker: Shi Wei; lecture time: June 21, 2017. A basic metadata record consists of metadata items and metadata content. Once resources are described with metadata, they can be filtered and classified effectively; together with a metadata standard, this makes it possible to distinguish the useful content of a resource from the rest and therefore to express the correct meaning of the information. After many years of development, metadata can be expressed in formats such as XML and HTML. These formats let people define their own tags, which are the metadata; with this tagging pattern, a user can look at the tags (metadata) first to find the information needed. Metadata also supports extension through the use of attributes.
The invention provides a semantic analysis method. As shown in Fig. 1, the method comprises the following steps:
Step 101: receive the text information input by the user, perform lexical analysis on the input text information, and separate the character strings contained in the text information into independent words to obtain a word sequence.
In this step, the text information sent by the user through a client is first received. In a specific implementation, the user sends text information through a client, such as an app on a mobile terminal, and the client then sends the received text information to the server.
Specifically, the text information input by the user includes the identity information of the user and the question information input by the user;
the identity information of the user includes a user ID field, a user name field, and a mobile phone number field.
It is conceivable that the identity information may be entered each time the user sends information, or it may be saved in advance, so that when the user sends information the question information is transmitted together with the previously saved identity information.
In this step, separating the character strings contained in the text information into independent words comprises:
using spaces as separators, separating the character strings contained in the text information into independent words, and assigning each word a unique corresponding number identifier and a pointer identifier of the next metadata.
Because the information input by the user consists of characters, this step first performs lexical analysis on the input: the character string is separated into words in order, the words contained in the character string are identified, and character combinations that cannot be recognized are discarded.
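The separation step can be sketched as follows. This is only an illustration under stated assumptions: the `Token` class, the use of the word's position as its number identifier, and the optional `vocabulary` used to decide which strings are recognizable are all hypothetical and are not specified in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Token:
    word: str                   # the separated word
    number: int                 # unique number identifier (assumed: position in the sequence)
    next_number: Optional[int]  # pointer identifier of the next item; None for the last word


def tokenize(text: str, vocabulary: Optional[Set[str]] = None) -> List[Token]:
    """Split on whitespace, drop unrecognized strings, then number and link the words."""
    words = text.split()
    if vocabulary is not None:
        # Character combinations that cannot be recognized are discarded.
        words = [w for w in words if w.lower() in vocabulary]
    tokens = []
    for i, w in enumerate(words):
        nxt = i + 1 if i + 1 < len(words) else None
        tokens.append(Token(word=w, number=i, next_number=nxt))
    return tokens


tokens = tokenize("I am Chinese")
print([(t.word, t.number, t.next_number) for t in tokens])
```

Storing the pointer alongside each word means the original order can be recovered later even if the words are processed or stored independently.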
Step 102: perform syntactic analysis on the separated word sequence, determine whether the word sequence contains syntax errors, and filter out words, or phrases formed by adjacent words, that contain syntax errors.
Syntactic analysis is performed on the separated word sequence to determine whether it contains word combinations that violate the grammar. Attributes of language constructs are attached to the nonterminal symbols that represent those constructs, and the attribute values are computed by semantic rules attached to the grammar productions; code is thereby generated, syntax-directed translation is carried out, and semantic translation of the context-free grammar is performed.
This step also includes: analyzing and evaluating the assignment statements, arithmetic expressions, and logical expressions in the word sequence, and filtering out phrases whose variable types are inconsistent.
Step 103: convert the words in the word sequence into corresponding metadata, calculate the semantic similarity and feature item weights between the metadata, extract the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtain the semantic marker text corresponding to each word according to the keyword feature items, and store the semantic marker text in a text database.
Each word is converted into its corresponding metadata, and semantic analysis is performed on the information input by the user by establishing a metadata schema, in order to obtain the original meaning of the information.
Before the step of receiving the text information input by the user, the method further comprises:
creating a metadatabase for storing metadata, and establishing an association between a word catalog and the metadata contained in the metadatabase;
in the step of converting the words in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association.
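A minimal sketch of the word-to-metadata lookup, assuming the association is a simple mapping from catalog words to metadata identifiers. The dictionary contents, the identifiers `m001` and `m002`, and the `item`/`content` record layout are made up for illustration; the patent does not define the storage format.

```python
from typing import Dict, List, Optional

# Hypothetical metadatabase: metadata id -> metadata record (item plus content).
METADATABASE: Dict[str, dict] = {
    "m001": {"item": "pronoun", "content": "first person singular"},
    "m002": {"item": "nationality", "content": "China"},
}

# Hypothetical word catalog: word -> metadata id. This mapping plays the role
# of the association between the word catalog and the metadatabase.
WORD_CATALOG: Dict[str, str] = {"i": "m001", "chinese": "m002"}


def to_metadata(words: List[str]) -> List[Optional[dict]]:
    """Look up each word's metadata through the association; None if no entry exists."""
    result = []
    for w in words:
        meta_id = WORD_CATALOG.get(w.lower())
        result.append(METADATABASE.get(meta_id) if meta_id else None)
    return result


print(to_metadata(["I", "am", "Chinese"]))
```

Because the catalog and the metadatabase are built before any user input arrives, the conversion at request time reduces to two dictionary lookups per word.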
Specifically, the text session and the semantic analysis of the user's information are performed on top of metadata management. The semantic analysis calculates the semantic similarity and feature item weights between metadata to obtain the key information of the question input by the user, and creates the semantic marker text of that question from the key information. In other words, semantic marking of the text session is carried out through semantic analysis, and the semantically marked text or files are stored in the labeled text database (the metadatabase).
Preferably, the step of calculating the semantic similarity and feature item weights between the metadata and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights comprises:
calculating the semantic similarity and feature item weights between the metadata using a corpus-based word similarity analysis method and a word vector space model.
Step 104: according to the order of the words in the word sequence, match the corresponding semantic marker texts from the text database in turn, and output and display the text information synthesized after ordering.
The semantic marker texts or files retrieved for the word sequence are independent pieces of information that have not yet been combined into text. In this step, therefore, the independent semantic marker texts or files are ordered according to each word's unique number identifier and the pointer identifier of the metadata corresponding to the next word, and the synthesized text information is output. This text information is the correct expression of the question input by the user.
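The ordering and synthesis described here can be sketched as a walk along the pointer identifiers. The data layout (a mapping from number identifier to a text fragment plus the number of the next fragment), the known starting number, and the example fragments are assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: number identifier -> (semantic marker text, number of next item or None)
Fragments = Dict[int, Tuple[str, Optional[int]]]


def synthesize(fragments: Fragments, start: int) -> str:
    """Follow the next-item pointers from `start` and join the fragments in that order."""
    parts = []
    seen = set()
    current: Optional[int] = start
    while current is not None and current not in seen:  # `seen` guards against pointer cycles
        seen.add(current)
        text, nxt = fragments[current]
        parts.append(text)
        current = nxt
    return " ".join(parts)


# The fragments are stored in arbitrary order; the pointers restore the word order.
fragments = {2: ("for a meal", None), 0: ("Shall we go", 1), 1: ("to Houhai", 2)}
print(synthesize(fragments, start=0))
```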
Fig. 2 is a schematic block diagram of the interaction flow of the metadata-management-based text session semantic analysis provided by an embodiment of the present invention. For ease of description, the method of the present invention is further explained below with reference to Fig. 3. The specific application embodiment of the method comprises the following steps:
Step H1: after opening the client or application on a mobile phone, the user inputs the relevant text information and sends a request to the terminal.
The request includes the identity information of the user and the question information input by the user.
After the user inputs information through the application on the mobile phone, the application also saves the user's information and the information the user input, which are to be stored in a database. The application then sends a request to the machine; the request contains the user's information and the input information. In one specific implementation, the input information includes a user ID field, a user name field, a mobile phone number field, a title field, and a submission time field.
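As a rough illustration of the request described above, the fields could be grouped into one record. The class name, field names, types, and the sample values are hypothetical; the patent names the fields but does not define their format or any wire protocol.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class UserRequest:
    user_id: str            # user ID field
    user_name: str          # user name field
    phone_number: str       # mobile phone number field
    title: str              # title field
    submitted_at: datetime  # submission time field
    question: str           # the question text typed by the user


# Placeholder values, for illustration only.
request = UserRequest(
    user_id="u-001",
    user_name="Example User",
    phone_number="000-0000-0000",
    title="Dinner plans",
    submitted_at=datetime(2017, 10, 23, 12, 0, 0),
    question="Houhai, OK?",
)
payload = asdict(request)  # plain dict, ready to be serialized and sent to the server
print(payload["user_id"], payload["question"])
```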
Step H2: the server terminal receives the request sent by the client and performs preliminary lexical analysis on the information input at the client.
When the server terminal receives the user's input passed on by the client, it also forwards the data to the background server. During this transfer, the server needs to pre-process the user's information by performing lexical analysis on it.
Specifically, lexical analysis scans the user's input from left to right, identifies the various kinds of words according to the lexical rules of the language, and produces the attribute word for each identified word. In other words, the character string input by the user is converted into a word (token) sequence. The identified words are then given type and length processing.
Pre-processing the user's input makes it possible to classify the words. Take the input "I am Chinese": the computer does not know that this consists of words separated by spaces; it only knows that it is a string of ordinary characters. Some method (here, using spaces as separators) can be used to split the morphemes out of the input string. The result of the split can be represented in XML as follows:
<sentence>
  <word>I</word>
  <word>am</word>
  <word>Chinese</word>
</sentence>
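The XML representation above can be produced with Python's standard `xml.etree.ElementTree` module. This is a small sketch; the function name `words_to_xml` is an assumption, and the output is emitted on one line rather than indented.

```python
import xml.etree.ElementTree as ET
from typing import List


def words_to_xml(words: List[str]) -> str:
    """Wrap each word in a <word> element under a single <sentence> root."""
    sentence = ET.Element("sentence")
    for w in words:
        ET.SubElement(sentence, "word").text = w
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(sentence, encoding="unicode")


xml_text = words_to_xml("I am Chinese".split(" "))
print(xml_text)
```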
Step H3: perform syntactic analysis on the word sequence obtained in step H2, identify grammatical errors in the information, and filter them out.
Syntactic analysis is also a logical phase of compilation. Its task is to combine the word sequence into the various kinds of grammatical phrases on the basis of lexical analysis and then to judge whether the word sequence is structurally well formed; the structure can be described by a context-free grammar.
Step H4: convert the words in the word sequence into metadata, perform semantic analysis on the metadata, obtain the semantic marker text corresponding to the user's input, and store the semantic marker text in a text database.
After lexical and syntactic analysis, the data is basically usable, but ambiguity and mismatched understanding still cannot be eliminated. At this point the data format is classified and reorganized, converted into a metadata structure for storage, and then managed systematically, so that the data is processed as metadata. Semantic analysis is then performed to obtain the user's real purpose and intention. That is, the word sequence undergoes, in turn, semantic expression, semantic organization, semantic storage, and ambiguity elimination, after which it is converted into its corresponding metadata sequence.
By this point the source program has passed through lexical analysis and syntactic analysis; the third stage is semantic analysis, which is the most substantial work of a compiler. The first two stages only recognize and process the form of the source program, whereas semantic analysis interprets its meaning and brings about a qualitative change in the source program. Semantic analysis mainly involves syntax-directed translation, symbol tables, type checking, intermediate languages, and intermediate code generation. When the background server obtains the data passed from the front end, the machine performs semantic analysis on it. In the present invention, semantic analysis packages the data into a metadata schema. The semantic analysis module performs semantic similarity analysis and feature item weight calculation and extracts the keyword feature items of the user's input, laying the foundation for text classification and text vectorization. The semantic analysis module contains an ontology and an entity dictionary. The ontology is used for semantic analysis of text; its basic unit is the concept, concepts form concept trees, and concept trees form the ontology. Conceptualizing the text addresses the problems of one word with several meanings and several words with one meaning. The entity dictionary is used for entity extraction, discarding content with no practical meaning in order to reduce the computation in subsequent text processing. Reasoning is performed with frame logic or description logic; data from the information sources is collected, and the schema information of each local database is stored in the metadatabase in a prescribed format. By analyzing the semantic relations between metadata, a global ontology for the corresponding domain is established; semantic marking of text documents is performed through semantic analysis, and the text documents with semantic markers are stored in the labeled text document database.
Specifically, semantic similarity measures how similar two words are. It is mainly used in fields such as word sense disambiguation, information retrieval, information extraction, and machine translation. It is fairly subjective, so semantic similarity cannot be analyzed apart from a specific application environment. There are currently two computational approaches: one uses a semantic dictionary, organizing the concepts of the related words in a tree structure and computing over it; the other uses the contextual information of words and solves the problem statistically. Given the application scenario of the present invention, the algorithms used for semantic similarity and feature item weight calculation are existing, mature algorithms. The corpus-based word similarity analysis method uses the formula:
Sim(W1, W2) = a / (Dis(W1, W2) + a);
where Sim(W1, W2) is the similarity between words W1 and W2, Dis(W1, W2) is the distance between them, and a is an adjustable parameter equal to the word distance at which the similarity is 0.5. The feature item weight is calculated as w = tf × idf, where w is the weight of feature item t in document d, tf is the frequency with which t occurs in d, and idf is the inverse document frequency of t. The method uses the widely used word vector space model, which comprises the following steps: preprocessing -> text feature item selection -> weighting -> generation of the vector space model, after which the cosine is calculated. The model selects a set of feature words in advance, computes the correlation between this set of feature words and each word to obtain a feature word vector for each word, and uses the similarity between these vectors as the similarity between the two words.
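The three calculations named in this passage can be sketched as follows. Several details are assumptions rather than statements from the patent: the similarity is taken in the form Sim = a / (Dis + a), which is consistent with a being the distance at which the similarity equals 0.5; tf is taken as the relative frequency within a document; and idf is taken as log(N / df), one common variant. The function names and the toy documents are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def distance_similarity(distance: float, a: float) -> float:
    """Sim = a / (Dis + a); equals 0.5 when the distance equals a (assumed form)."""
    return a / (distance + a)


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight w = tf * idf for every feature item of every document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse weight vectors; 0.0 if either is all zeros."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [["sea", "meal"], ["sea", "price"]]
w = tf_idf(docs)
print(distance_similarity(1.6, 1.6), w[0], cosine(w[0], w[1]))
```

In this toy corpus "sea" appears in every document, so its idf and therefore its weight are zero; the two documents share no other term, so their cosine similarity is zero.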
By carrying out the conversion of metadata to user data, and after semantic analysis, machine generates data message corresponding Correct option be stored in database, the information source as output end.
Step H5, after user data has carried out semantic analysis, machine can be generated as applying according to corresponding standard KBS, the feature of each data is clearly identified inside KBS, after user inputs information, just known Know in database and carry out searching choosing, the data for finding matching are responded, and be that is to say and are stored semantic analysis result to semantic knowledge Storehouse, after user inputs information, detected from knowledge base, obtain matched knowledge, then found by semantic association, obtain institute Stating needs analysis result.
Although the data has been converted into metadata and answers have been generated by semantic analysis on the metadata structure, it still cannot be output and displayed to the user terminal immediately, because at this point the information is not yet coherent and exists as isolated, scattered pieces. The data therefore needs further processing to establish relationships between the pieces of data. Each piece of metadata has a unique identifier, which contains the number identifier of the user's input and the pointer identifier of the next piece of metadata. Once the user has input data, the problem knowledge base is searched automatically for the answer text corresponding to the question, and the texts are combined to form the final result for the question input by the user. Only then can the machine feed the synthesized text back to the user as its response, so as to satisfy the user's intent.
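The linking of isolated answer fragments can be sketched as a walk along a chain of identifiers. In this minimal Python example, the dictionary layout (identifier mapped to a text fragment and the identifier of the next fragment) and the function name are hypothetical; the description only says that each piece of metadata carries its own number and a reference to the next piece.

```python
def assemble_answer(fragments, start_id):
    """Join text fragments by following each fragment's reference to the next one.

    fragments maps an identifier to (text, next_id); next_id is None at the end.
    """
    parts = []
    seen = set()
    current = start_id
    # Stop at the end of the chain, at a missing identifier, or if the chain loops.
    while current is not None and current in fragments and current not in seen:
        seen.add(current)
        text, next_id = fragments[current]
        parts.append(text)
        current = next_id
    return " ".join(parts)
```

The loop guard is a defensive choice for the sketch: a malformed chain then yields a partial answer instead of running forever.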
A second aspect of the embodiments of the present invention provides a word semantic analysis terminal. As shown in Fig. 3, the word semantic analysis terminal 10 includes a processor 110, a memory 120, and a word semantic analysis program stored on the memory and executable on the processor, wherein the word semantic analysis program, when executed by the processor, implements the following steps:
The text information input by the user is received, and lexical analysis is performed on the input text information to separate the character strings contained in it into independent words, obtaining a word sequence;
Syntactic analysis is performed on the separated word sequence to determine whether the word sequence contains syntax errors, and any word with a syntax error, or any phrase formed by adjacent words with a syntax error, is filtered out;
The words contained in the word sequence are converted into corresponding metadata; the semantic similarity between the metadata items and the feature item weights are calculated; the keyword feature items of the word sequence are extracted according to the calculated semantic similarity and feature item weights; the semantically marked text corresponding to each word is obtained according to the keyword feature items; and the semantically marked text is stored in a text database;
According to the order of the words in the word sequence, the corresponding semantically marked text is matched from the text database in turn, and the text information synthesized after ordering is output and displayed.
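The four steps above can be strung together in a short Python sketch. Everything named here is an assumption for illustration: the function and parameter names, the use of `str.isalnum` as a stand-in for syntactic analysis, a precomputed weight table in place of the similarity and weight calculation, and a fixed threshold for keyword selection. It shows only how the stages feed one another, not how the terminal actually implements them.

```python
def analyze_text(text, metadata_index, weights, text_db,
                 is_valid=str.isalnum, threshold=0.1):
    """Run a simplified version of the four analysis steps on one input string."""
    # Step 1: lexical analysis, splitting the input into independent words.
    words = text.split()
    # Step 2: placeholder syntax check that filters out words failing is_valid.
    words = [w for w in words if is_valid(w)]
    # Step 3a: convert words to metadata; words without metadata are skipped.
    metadata = [metadata_index[w] for w in words if w in metadata_index]
    # Step 3b: keep metadata whose (precomputed) weight reaches the threshold.
    keywords = [m for m in metadata if weights.get(m, 0.0) >= threshold]
    # Step 4: match marked text in word order and join it into one response.
    return " ".join(text_db[k] for k in keywords if k in text_db)
```

Because the list comprehensions preserve order, the output follows the order of the words in the input, as the final step requires.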
Further, when the word semantic analysis program is executed by the processor 110, the following step is also implemented:
Using spaces as separators, the character strings contained in the text information are separated into independent words, and each word is assigned a unique corresponding number identifier and the pointer identifier of the next metadata item.
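A minimal Python sketch of this separation step follows. The `Token` type and its field names are hypothetical, and sequential integers are assumed as the numbering scheme; the description requires only that each word receive a unique number and a reference to the next item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    word: str
    number: int                 # unique number assigned to this word
    next_number: Optional[int]  # number of the following word, None for the last word


def tokenize(text):
    """Split text on whitespace and number the words in input order."""
    words = text.split()  # runs of whitespace count as a single separator
    last = len(words) - 1
    return [
        Token(word, i, i + 1 if i < last else None)
        for i, word in enumerate(words)
    ]
```

For example, `tokenize("check my order")` yields three tokens numbered 0 to 2, each pointing to its successor except the last.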
Preferably, when the word semantic analysis program is executed by the processor 110, the following steps are also implemented:
A metadata library for storing metadata is created, and an association is established between the word catalog and the metadata contained in the metadata library; the catalog contained in the metadata library is divided into different layers according to metadata type, so that the corresponding metadata can be located more quickly through the catalog.
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through this association.
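One way to picture the layered metadata library and its association with the word catalog is the following Python sketch. The class, its method names, and the two-level dictionary layout (metadata type, then metadata identifier) are assumptions made for the example rather than the structure used by the terminal.

```python
class MetadataLibrary:
    """Metadata grouped into layers by type, with a word catalog pointing into the layers."""

    def __init__(self):
        self._layers = {}   # metadata type -> {metadata id -> metadata record}
        self._catalog = {}  # word -> (metadata type, metadata id)

    def add(self, word, meta_type, meta_id, record):
        """Store a record in the layer for its type and associate the word with it."""
        self._layers.setdefault(meta_type, {})[meta_id] = record
        self._catalog[word] = (meta_type, meta_id)

    def find(self, word):
        """Return the metadata record associated with a word, or None if unknown."""
        location = self._catalog.get(word)
        if location is None:
            return None
        meta_type, meta_id = location
        return self._layers[meta_type][meta_id]
```

Keeping the catalog separate from the layers means a lookup touches only one layer, which is the kind of faster directory access the description attributes to layering.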
Preferably, when the word semantic analysis program is executed by the processor 110, the following step is also implemented:
Using the corpus-based word similarity analysis method and the word vector space model, the semantic similarity between the metadata items and the feature item weights are calculated.
The memory 120, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, instructions, and modules stored in the memory 120, the processor 110 executes the various functional applications and data processing of the server, that is, implements the word semantic analysis method of the above method embodiments.
The memory 120 may include a program storage area and a data storage area. The program storage area can store the operating system and the application program required by at least one function; the data storage area can store data created through use of the automatic report generation system, and so on. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 120 optionally includes memory located remotely from the processor 110, and such remote memory can be connected to the word semantic analysis terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 120 and, when executed by the one or more processors 110, perform the word semantic analysis method of any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a word semantic analysis program is stored; when the word semantic analysis program is executed by a processor, the word semantic analysis method described above is implemented.
From the above description of the embodiments, a person of ordinary skill in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In the present invention, when a user needs to obtain information resources, the user sends a corresponding instruction to the machine; the machine receives the user's command and then saves the user's command information. In the present invention, data is stored in the form of metadata. Once the user's information resources are saved as metadata, the metadata can be suitably analyzed and identified and then fed back to the user through the metadata architecture. During feedback, information unrelated to the user is discarded and only the information the user cares about is pushed, which makes it easier for the user to obtain the information fed back by the semantic analysis terminal and to understand and use it correctly.
It should be understood that a person of ordinary skill in the art can make equivalent substitutions or changes according to the technical solution and inventive concept of the present invention, and all such changes or substitutions shall fall within the protection scope of the appended claims of the present invention.

Claims (10)

1. A word semantic analysis method, characterized by comprising the following steps:
receiving text information input by a user, and performing lexical analysis on the input text information to separate the character strings contained in the text information into independent words, obtaining a word sequence;
performing syntactic analysis on the separated word sequence to determine whether the word sequence contains syntax errors, and filtering out any word with a syntax error or any phrase formed by adjacent words with a syntax error;
converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity between the metadata items and the feature item weights, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantically marked text corresponding to each word according to the keyword feature items, and storing the semantically marked text in a text database;
matching, according to the order of the words in the word sequence, the corresponding semantically marked text from the text database in turn, and outputting and displaying the text information synthesized after ordering.
2. The word semantic analysis method according to claim 1, characterized in that the text information input by the user includes: identity information of the user and question information input by the user;
the identity information of the user includes: a user ID information byte, a user name byte, and a mobile phone number byte.
3. The word semantic analysis method according to claim 2, characterized in that the step of separating the character strings contained in the text information into independent words includes:
using spaces as separators, separating the character strings contained in the text information into independent words, and assigning each word a unique corresponding number identifier and the pointer identifier of the next metadata item.
4. The word semantic analysis method according to claim 3, characterized in that, before the text information input by the user is received, the method further includes the step of:
creating a metadata library for storing metadata, and establishing an association between the word catalog and the metadata contained in the metadata library;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association.
5. The word semantic analysis method according to claim 4, characterized in that the step of calculating the semantic similarity between the metadata items and the feature item weights, and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, includes:
using the corpus-based word similarity analysis method and the word vector space model to calculate the semantic similarity between the metadata items and the feature item weights.
6. A word semantic analysis terminal, characterized in that the word semantic analysis terminal includes: a processor, a memory, and a word semantic analysis program stored on the memory and executable on the processor, wherein the word semantic analysis program, when executed by the processor, implements the following steps:
receiving text information input by a user, and performing lexical analysis on the input text information to separate the character strings contained in the text information into independent words, obtaining a word sequence;
performing syntactic analysis on the separated word sequence to determine whether the word sequence contains syntax errors, and filtering out any word with a syntax error or any phrase formed by adjacent words with a syntax error;
converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity between the metadata items and the feature item weights, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantically marked text corresponding to each word according to the keyword feature items, and storing the semantically marked text in a text database;
matching, according to the order of the words in the word sequence, the corresponding semantically marked text from the text database in turn, and outputting and displaying the text information synthesized after ordering.
7. The word semantic analysis terminal according to claim 6, characterized in that, when the word semantic analysis program is executed by the processor, the following step is also implemented:
using spaces as separators, separating the character strings contained in the text information into independent words, and assigning each word a unique corresponding number identifier and the pointer identifier of the next metadata item.
8. The word semantic analysis terminal according to claim 7, characterized in that, when the word semantic analysis program is executed by the processor, the following steps are also implemented:
creating a metadata library for storing metadata, and establishing an association between the word catalog and the metadata contained in the metadata library;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association.
9. The word semantic analysis terminal according to claim 7, characterized in that, when the word semantic analysis program is executed by the processor, the following step is also implemented:
using the corpus-based word similarity analysis method and the word vector space model to calculate the semantic similarity between the metadata items and the feature item weights.
10. A computer-readable storage medium, characterized in that a word semantic analysis program is stored on the computer-readable storage medium, and the word semantic analysis method according to any one of claims 1 to 5 is implemented when the word semantic analysis program is executed by a processor.
CN201710995052.0A 2017-10-23 2017-10-23 Character semantic analysis method, character semantic analysis terminal and storage medium Active CN107704453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710995052.0A CN107704453B (en) 2017-10-23 2017-10-23 Character semantic analysis method, character semantic analysis terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710995052.0A CN107704453B (en) 2017-10-23 2017-10-23 Character semantic analysis method, character semantic analysis terminal and storage medium

Publications (2)

Publication Number Publication Date
CN107704453A true CN107704453A (en) 2018-02-16
CN107704453B CN107704453B (en) 2021-10-08

Family

ID=61181999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710995052.0A Active CN107704453B (en) 2017-10-23 2017-10-23 Character semantic analysis method, character semantic analysis terminal and storage medium

Country Status (1)

Country Link
CN (1) CN107704453B (en)

Also Published As

Publication number Publication date
CN107704453B (en) 2021-10-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Applicant after: Shenzhen Qianhai Zhongxing scientific research Co.,Ltd.
Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Applicant before: SHENZHEN QIANHAI ZHONGXING E-COMMERCE Co.,Ltd.
GR01 Patent grant