CN107704453B - Character semantic analysis method, character semantic analysis terminal and storage medium - Google Patents

Info

Publication number
CN107704453B
Authority
CN
China
Prior art keywords
word
metadata
words
semantic
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710995052.0A
Other languages
Chinese (zh)
Other versions
CN107704453A (en)
Inventor
胡明灯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuanxing Internet Technology Co ltd
Original Assignee
Shenzhen Qianhai Zhongxing Scientific Research Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Zhongxing Scientific Research Co ltd filed Critical Shenzhen Qianhai Zhongxing Scientific Research Co ltd
Priority to CN201710995052.0A priority Critical patent/CN107704453B/en
Publication of CN107704453A publication Critical patent/CN107704453A/en
Application granted granted Critical
Publication of CN107704453B publication Critical patent/CN107704453B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a character semantic analysis method, a character semantic analysis terminal and a storage medium. Character information input by a user is received, and the character strings it contains are separated into independent words to obtain a word sequence; syntactic analysis is performed on the separated word sequence to determine whether it contains grammatical errors; the words in the word sequence are converted into corresponding metadata, semantic similarity and feature item weights among the metadata are calculated, and keyword feature items of the word sequence are extracted to obtain the semantic tagged texts corresponding to the words; a text database is established, the semantic tagged texts are matched from the text database in sequence according to the order of the words in the word sequence, and the text information synthesized after sorting is output and displayed. The invention feeds information back to the user in the format of metadata, making it easier for the user to obtain the information fed back by the semantic analysis terminal and to understand and use it correctly.

Description

Character semantic analysis method, character semantic analysis terminal and storage medium
Technical Field
The present invention relates to the technical field of semantic analysis, and in particular, to a text semantic analysis method, a text semantic analysis terminal, and a storage medium.
Background
At present, human-machine interaction takes the form of text conversation, in which information acquisition and filtering fall short of the intended purpose and the meaning of what the user says cannot be accurately identified. For example, a user may enter "can you go in the sea?", which the machine may understand as "the sea is not at home", whereas the user actually means "can we go to the sea to eat?". Even in a text conversation, the meaning a person expresses can vary widely, and existing semantic analysis methods for text conversation have the following drawbacks:
First, the meaning expressed by a user is generally rich in distinctly human emotion, and with a simple text conversation semantic analysis method the machine cannot identify what the user really means. Second, even if the machine recognizes most of the user's meaning, the meaning the machine expresses in return may differ from it. Third, if the human-computer conversation is a simple text conversation in which the data are not encrypted, sampled and analyzed, and the output is not encrypted, the security of the information cannot be guaranteed; the information can easily be cracked and obtained by people with ulterior motives or by hackers, which is detrimental to the transmission of data.
Accordingly, further improvements are needed in the art.
Disclosure of Invention
In view of the above technical problems, embodiments of the present invention provide a text semantic analysis method, a text semantic analysis terminal, and a storage medium, so as to solve the problem that existing human-computer conversation cannot identify the true meaning of the information expressed by a user, which leads to errors in information transfer.
A first aspect of an embodiment of the present invention provides a method for semantic analysis of characters, where the method for semantic analysis of characters includes the following steps:
receiving character information input by a user, performing lexical analysis on the input character information, and separating character strings contained in the character information into independent words to obtain a word sequence;
performing syntactic analysis on the separated word sequence, determining whether grammatical errors exist in the word sequence, and filtering out words, or phrases formed by adjacent words, that contain grammatical errors;
converting the words contained in the word sequence into corresponding metadata, calculating semantic similarity and feature item weights among the metadata, extracting keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining semantic tagged texts corresponding to the words according to the keyword feature items, and storing the semantic tagged texts in a text database;
and matching the corresponding semantic tagged texts from the text database in sequence according to the order of the words in the word sequence, and outputting and displaying the text information synthesized after sorting.
Optionally, the text information input by the user includes: identity information of the user and question information input by the user;
the identity information of the user comprises: user ID information byte, user name byte and mobile phone number byte.
Optionally, the step of separating the character string contained in the text information into independent words includes:
using a space as a separator to separate the character string contained in the character information into independent words, and assigning each word a unique number identifier and a pointer identifier to the next metadata item.
Optionally, before receiving the text information input by the user, the method further includes:
creating a metadata base for storing metadata, and establishing an association relation between a word catalogue and the metadata contained in the metadata base;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to the words is found out through the association relationship.
Optionally, the step of calculating semantic similarity and feature item weights among the metadata, and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, includes:
and calculating semantic similarity and feature item weight among metadata by adopting a word similarity analysis method based on a corpus and a word vector space model.
A second aspect of the embodiments of the present invention provides a text semantic analysis terminal, where the text semantic analysis terminal includes: a processor, a memory, and a word semantic analysis program stored on the memory and executable on the processor, wherein the word semantic analysis program when executed by the processor performs the steps of:
receiving character information input by a user, performing lexical analysis on the input character information, and separating character strings contained in the character information into independent words to obtain a word sequence;
performing syntactic analysis on the separated word sequence, determining whether grammatical errors exist in the word sequence, and filtering out words, or phrases formed by adjacent words, that contain grammatical errors;
converting the words contained in the word sequence into corresponding metadata, calculating semantic similarity and feature item weights among the metadata, extracting keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining semantic tagged texts corresponding to the words according to the keyword feature items, and storing the semantic tagged texts in a text database;
and matching the corresponding semantic tagged texts from the text database in sequence according to the order of the words in the word sequence, and outputting and displaying the text information synthesized after sorting.
Optionally, when executed by the processor, the text semantic analysis program further implements the following steps:
using a space as a separator to separate the character string contained in the character information into independent words, and assigning each word a unique number identifier and a pointer identifier to the next metadata item.
Optionally, when executed by the processor, the text semantic analysis program further implements the following steps:
creating a metadata base for storing metadata, and establishing an association relation between a word catalogue and the metadata contained in the metadata base;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to the words is found out through the association relationship.
Optionally, when executed by the processor, the text semantic analysis program further implements the following steps:
and calculating semantic similarity and feature item weight among metadata by adopting a word similarity analysis method based on a corpus and a word vector space model.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium, where a text semantic analysis program is stored on the computer-readable storage medium, and when executed by a processor, the text semantic analysis program implements the text semantic analysis method.
In the technical scheme provided by the embodiment of the invention, the information input by the user is stored in the form of metadata, the metadata can be properly analyzed and identified, and then the information is fed back to the user through the structural format of the metadata, so that the information irrelevant to the user is removed and only the information concerned by the user is pushed to the user when the information is fed back to the user, and the user can conveniently obtain the information fed back by the machine and correctly understand and use the information.
Drawings
FIG. 1 is a flow chart illustrating the steps of a text semantic analysis method according to the present invention;
FIG. 2 is a schematic block diagram illustrating the principle of the text semantic analysis method according to the present invention;
FIG. 3 is a flowchart of steps of a specific application embodiment of the text semantic analysis method according to the present invention;
FIG. 4 is a schematic structural block diagram of the text semantic analysis terminal according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In computer terminology, semantic analysis is a logical phase of the compilation process. Its task is to review the context-dependent properties of a structurally correct source program, chiefly by checking types. A structurally incorrect source program cannot enter this review stage; even if such a program might be correct in context and type, an error is reported when it is compiled. Semantic analysis examines whether a source program contains semantic errors and collects type information for the code generation stage. One task of semantic analysis is type checking: verifying that each operator has operands permitted by the language specification, and having the compiler report an error when it does not. For example, some compilers report an error when a real number is used as an array index. As another example, some languages specify that operands may be coerced: when such an operation is applied to an integer and a real operand, the compiler should convert the integer to a real rather than treat the expression as an error in the source program.
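The type-checking task described above can be illustrated with a minimal sketch. It is not taken from the patent: the type names `int` and `real`, the `SemanticError` class, and the helpers `check_binary` and `check_index` are hypothetical and only show the two cases mentioned in the text (coercing an integer to a real, and rejecting a real array index).

```python
# Minimal sketch of a semantic-analysis type check (illustrative only).
# All names below are assumptions for this example, not part of the patent.

NUMERIC = {"int", "real"}


class SemanticError(Exception):
    """Raised when an expression violates the (hypothetical) language rules."""


def check_binary(op: str, left: str, right: str) -> str:
    """Return the result type of `left op right`, coercing int to real."""
    if left not in NUMERIC or right not in NUMERIC:
        raise SemanticError(
            f"operator {op!r} needs numeric operands, got {left} and {right}"
        )
    # Mixed int/real operands: the int is implicitly converted to real.
    return "real" if "real" in (left, right) else "int"


def check_index(index_type: str) -> None:
    """Reject array subscripts that are not integers."""
    if index_type != "int":
        raise SemanticError(f"array index must be int, got {index_type}")


if __name__ == "__main__":
    print(check_binary("+", "int", "real"))  # mixed operands are coerced
    try:
        check_index("real")
    except SemanticError as err:
        print("error:", err)
```

A real compiler would run such checks while walking the syntax tree and consulting a symbol table; the sketch only isolates the decision rules.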
At present, communication between people relies mainly on language and text as tools for conveying and correctly understanding meaning. Conversation between a person and a machine likewise takes the form of text, but a computer can recognize only the two numerical symbols '0' and '1', so a human-computer conversation has to be conveyed through computer instructions. During transmission, data such as instructions are first entered into the computer through an input device, the processing result is stored in the computer, and the result is finally presented through an output device so that people can read or listen to it. In the course of data storage and transmission, however, the data must undergo a series of processing steps to achieve smooth communication between people and machines, and thereby correct communication between people. The metadata management approach adopted by the invention provides a guarantee and an implementation mechanism for this process.
Metadata is data that describes other data; it is in effect a coding scheme, commonly used to describe digital information resources, particularly network information resources. Metadata is also structured data: it refers to structured data extracted from an information resource (for example a course name, speaker, or duration) that is used to organize, retrieve, describe, store and manage information and knowledge resources and to explain the characteristics and content of the resource. For example, for the lecture information (the information resource) of a teacher lecturing in our online club, the following information can be retrieved in the club application: course name: quality management; speaker: Teacher and Wei; lecture time: June 21, 2017. Because a basic metadata record consists of metadata items and metadata contents, resources described with metadata can be effectively filtered and classified; combined with metadata standards and specifications, the valid content of resource information can be distinguished from the unusable content, so that the correct meaning of the information is well expressed. After years of development, metadata formats now support XML, HTML and similar formats, which makes it convenient for people to define their own tags, that is, the so-called metadata. With such tags, a user can first look at the tags (metadata) when using data in order to find the information required, and metadata can be extended through the use of attributes.
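As an illustration of user-defined tags and attribute-based extension, the lecture example above might be recorded as follows. This is a sketch under assumptions: the element names `course`, `name`, `speaker` and `date`, the `type` attribute, and the function `build_course_metadata` are hypothetical; the field values are copied from the example in the text.

```python
import xml.etree.ElementTree as ET


def build_course_metadata(name: str, speaker: str, date: str) -> ET.Element:
    """Build one metadata record: each child tag is a metadata item,
    and its text is the metadata content."""
    # The attribute shows how a record can be extended without new tags.
    course = ET.Element("course", attrib={"type": "lecture"})
    ET.SubElement(course, "name").text = name
    ET.SubElement(course, "speaker").text = speaker
    ET.SubElement(course, "date").text = date
    return course


record = build_course_metadata("quality management", "Teacher and Wei", "2017-06-21")

if __name__ == "__main__":
    print(ET.tostring(record, encoding="unicode"))
```

Reading the tags first (`name`, `speaker`, `date`) lets a consumer pick out only the fields it needs, which is the filtering benefit the paragraph describes.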
The invention provides a semantic analysis method, as shown in fig. 1, the analysis method comprises the following steps:
Step 101, receiving character information input by a user, performing lexical analysis on the input character information, and separating character strings contained in the character information into independent words to obtain a word sequence.
In this step, the text information sent by the user through the client is first received. In implementation, the user sends the text information with a client, such as an app installed on a mobile terminal, and the client forwards the received text information to the server.
Specifically, the text information input by the user includes: identity information of the user and question information input by the user;
the identity information of the user comprises: user ID information byte, user name byte and mobile phone number byte.
It is conceivable that the user identity information may be entered by the user each time information is sent, or that it may be stored in advance, in which case the question information input by the user is packaged together with the pre-stored identity information and sent whenever the user needs to send information.
In this step, the step of separating the character string included in the character information into independent words includes:
using a space as a separator to separate the character string contained in the character information into independent words, and assigning each word a unique number identifier and a pointer identifier to the next metadata item.
Because the information input by the user consists entirely of characters, this step first performs lexical analysis on the input: the character string is divided in sequence according to the format of words, the words in the string are identified, and characters that cannot be recognized or combined into words are discarded.
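The separation step can be sketched as below. The `Word` record and the function `split_into_words` are hypothetical names; the patent only states that each word receives a unique number identifier and a pointer to the next metadata item. The sketch uses `str.split()` with no argument, which splits on runs of whitespace, a slight generalization of the single space named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    number: int                 # unique number identifier of this word
    text: str                   # the word itself
    next_number: Optional[int]  # number of the next word, None for the last


def split_into_words(message: str) -> List[Word]:
    """Split a message on whitespace and link each word to its successor."""
    tokens = message.split()
    last = len(tokens) - 1
    return [
        Word(number=i, text=tok, next_number=i + 1 if i < last else None)
        for i, tok in enumerate(tokens)
    ]


if __name__ == "__main__":
    for word in split_into_words("I am Chinese"):
        print(word)
```

Storing the successor's number with each word is what later allows independently processed fragments to be put back in their original order.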
Step 102, performing syntactic analysis on the separated word sequence, determining whether grammatical errors exist in the word sequence, and filtering out words, or phrases formed by adjacent words, that contain grammatical errors.
The separated word sequence is parsed to determine whether it contains word combinations that do not conform to the grammar. The attributes of each language structure are assigned to the non-terminal symbols representing that structure, attribute values are computed by the semantic rules attached to the grammar productions, and code is generated, thereby performing syntax-directed translation, that is, semantic translation for a context-free grammar.
This step further comprises: analyzing and checking the assignment statements, arithmetic expressions and logical expressions in the word sequence, and filtering out phrases whose variable types are inconsistent.
Step 103, converting the words contained in the word sequence into corresponding metadata, calculating semantic similarity and feature item weights among the metadata, extracting keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining semantic tagged texts corresponding to the words according to the keyword feature items, and storing the semantic tagged texts in a text database.
Each word is converted into corresponding metadata, and semantic analysis is performed on the information input by the user by building a metadata model, so as to obtain the intent of the information.
Before the step of receiving the text information input by the user, the method further comprises the following steps:
creating a metadata base for storing metadata, and establishing an association relation between a word catalogue and the metadata contained in the metadata base;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to the words is found out through the association relationship.
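A minimal sketch of such a lookup is shown below, assuming the association between the word catalogue and the metadata base can be modelled as a dictionary. The contents of `METADATA_BASE`, the `item`/`content` keys, and the function `to_metadata` are hypothetical and stand in for whatever storage the terminal actually uses.

```python
from typing import Dict, List, Optional

# Hypothetical metadata base: word-catalogue entry -> metadata record.
# Each record has a metadata item and a metadata content.
METADATA_BASE: Dict[str, Dict[str, str]] = {
    "course": {"item": "course_name", "content": "quality management"},
    "date": {"item": "lecture_date", "content": "2017-06-21"},
}


def to_metadata(words: List[str]) -> List[Optional[Dict[str, str]]]:
    """Look up each word case-insensitively; unknown words map to None."""
    return [METADATA_BASE.get(word.lower()) for word in words]


if __name__ == "__main__":
    print(to_metadata(["Course", "date", "unknown"]))
```

Returning `None` for unknown words keeps the output aligned with the input word sequence, so the position of each word is preserved for later steps.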
Specifically, semantic analysis of the text conversation and the user information is performed on the basis of metadata management. The semantic analysis obtains the key information of the question input by the user by calculating semantic similarity and feature item weights among the metadata, and builds the semantic tagged text of the question from this key information. In other words, semantic tagging of the text conversation is carried out through semantic analysis, and the text documents carrying semantic tags are stored in a tagged text database (metadata database).
Preferably, the step of calculating semantic similarity and feature item weights among the metadata, and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, includes:
and calculating semantic similarity and feature item weight among metadata by adopting a word similarity analysis method based on a corpus and a word vector space model.
Step 104, matching the corresponding semantic tagged texts from the text database in sequence according to the order of the words in the word sequence, and outputting and displaying the text information synthesized after sorting.
The semantic tagged text documents obtained for the word sequence are each independent pieces of information and have not yet been combined into text information. In this step, these independent documents are therefore sorted according to the unique number identifier of each word and the pointer identifier to the metadata of the next word, and the text information is synthesized and output. This text information is the correct expression of the question input by the user.
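The sorting and synthesis described in this step can be sketched as a walk along the next-pointers. The `Fragment` layout, the function `synthesize`, and the use of a space to join fragments are assumptions made for illustration; the patent does not fix a data layout.

```python
from typing import Dict, List, Optional, Set, Tuple

# number identifier -> (tagged text fragment, number of the next fragment)
Fragment = Tuple[str, Optional[int]]


def synthesize(fragments: Dict[int, Fragment], start: int) -> str:
    """Follow the next-pointers from `start` and join the fragments in order."""
    parts: List[str] = []
    seen: Set[int] = set()
    current: Optional[int] = start
    # The `seen` set guards against a cyclic pointer chain.
    while current is not None and current not in seen:
        seen.add(current)
        text, next_number = fragments[current]
        parts.append(text)
        current = next_number
    return " ".join(parts)


if __name__ == "__main__":
    # Fragments may be stored in any order; the pointers restore the sequence.
    stored = {2: ("Chinese", None), 0: ("I", 1), 1: ("am", 2)}
    print(synthesize(stored, start=0))
```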
FIG. 2 is a schematic block diagram illustrating the interaction flow of a text conversation semantic analysis method based on metadata management according to an embodiment of the present invention. For convenience of description, the method of the present invention is further explained with reference to FIG. 3. The method of this embodiment comprises the following steps:
Step H1, after opening the client or the application on the mobile phone, the user inputs the relevant text information and sends a request to the terminal.
The request includes the identity information of the user and the question information entered by the user.
After the user inputs information through the application on the mobile phone, the application stores the user information and the information input by the user in a database. The application then issues a request to the machine, the content of which contains the user information and the input information. In a specific implementation, the input information includes a user ID information byte, a user name byte, a mobile phone number byte, a title byte, and a submission time byte.
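The request content listed above might be modelled as follows. This is only a sketch: the class `UserRequest`, its field names, the JSON encoding, and the reading of the "title" field as the question text are all assumptions, since the patent names the fields but not their format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserRequest:
    user_id: str        # user ID information
    user_name: str      # user name
    mobile_number: str  # mobile phone number
    title: str          # assumed here to carry the question text
    submitted_at: str   # submission time, here an ISO 8601 string


def encode_request(request: UserRequest) -> str:
    """Serialize the request for transmission (JSON is an assumption)."""
    return json.dumps(asdict(request), ensure_ascii=False)


if __name__ == "__main__":
    example = UserRequest("u-001", "example user", "00000000000",
                          "I am Chinese", "2017-06-21T10:00:00")
    print(encode_request(example))
```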
Step H2, the server terminal receives the request sent by the client and performs a preliminary lexical analysis on the information input through the client.
When the server terminal receives the user's input transmitted by the client, it forwards the data to the background server. During data transmission, the server needs to perform a preliminary preprocessing operation on the user's information and carry out lexical analysis of the information.
Specifically, lexical analysis works as follows: the user's input is scanned from left to right, the various words are identified according to the lexical rules of the language, and the attributes of the corresponding words are generated. That is, the character sequence input by the user is converted into a word (token) sequence. The recognized words are then subjected to qualitative and fixed-length processing.
By preprocessing the user's input, its words can be classified. Take the input "I am Chinese" as an example: the computer does not know that the words are separated by spaces; it only sees a character string composed of ordinary characters. The morphemes can be segmented from the input string by some method, here using spaces as separators. The segmented result can be expressed in XML as follows:
<sentence>
<word>I</word>
<word>am</word>
<word>Chinese</word>
</sentence>
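The XML form above can be produced with a few lines of code. The function name `sentence_to_xml` is hypothetical; the element names `sentence` and `word` follow the example in the text.

```python
import xml.etree.ElementTree as ET


def sentence_to_xml(text: str) -> str:
    """Split `text` on spaces and wrap each word in a <word> element."""
    sentence = ET.Element("sentence")
    for token in text.split(" "):
        if token:  # skip empty strings produced by repeated spaces
            ET.SubElement(sentence, "word").text = token
    return ET.tostring(sentence, encoding="unicode")


if __name__ == "__main__":
    print(sentence_to_xml("I am Chinese"))
```

The serializer writes the elements on a single line; the line breaks in the example above are only for readability.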
Step H3, performing grammar analysis on the word sequence obtained in step H2, identifying grammatical errors in the information, and filtering them out.
Grammar analysis is also a logical phase of the compilation process. Its task is to combine the word sequence into grammatical phrases on the basis of the lexical analysis, then examine the structure of the word sequence and determine whether it is well formed, with the structure described by a context-free grammar.
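To make the idea of checking a word sequence against a context-free grammar concrete, here is a small recognizer using the CYK algorithm. The patent does not name a parsing algorithm, so CYK is a swapped-in technique chosen for brevity, and the toy grammar (`BINARY_RULES`, `LEXICON`) and the function `is_grammatical` are assumptions for illustration only.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Toy grammar in Chomsky normal form (hypothetical): S -> NP VP, VP -> V NP.
BINARY_RULES: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}
# Toy lexicon mapping lower-cased words to their categories.
LEXICON: Dict[str, Set[str]] = {
    "i": {"NP"},
    "am": {"V"},
    "chinese": {"NP"},
}


def is_grammatical(words: List[str]) -> bool:
    """CYK recognition: True if the toy grammar derives the whole sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals deriving words[i : i + j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(LEXICON.get(word.lower(), set()))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            for split in range(1, span):
                left = table[start][split - 1]
                right = table[start + split][span - split - 1]
                for pair in product(left, right):
                    table[start][span - 1] |= BINARY_RULES.get(pair, set())
    return "S" in table[0][n - 1]


if __name__ == "__main__":
    print(is_grammatical(["I", "am", "Chinese"]))
    print(is_grammatical(["am", "I", "Chinese"]))
```

A sequence rejected by the recognizer would be a candidate for the filtering described in step H3; a production system would need a far larger grammar and lexicon.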
Step H4, converting the words in the word sequence into metadata, performing semantic analysis on the metadata to obtain the semantic tagged text corresponding to the user's input, and storing the semantic tagged text in a text database.
After lexical and syntactic analysis the data is basically usable, but ambiguity and differences in understanding remain unresolved. At this point the data is classified and recombined and converted into a metadata structure for storage, and the metadata is then managed systematically. Once the data has been converted into metadata, semantic analysis is carried out to obtain the user's true purpose and intent. That is, the word sequence undergoes, in order, semantic expression, semantic organization, semantic storage and ambiguity elimination, and is thereby converted into the corresponding metadata sequence.
In a compiler, the source program first undergoes lexical analysis and syntactic analysis; semantic analysis is the third stage and is the most substantive work of the compiler. Lexical and syntactic analysis recognize and process the form of the source program, whereas semantic analysis interprets its meaning, which marks a qualitative change in the processing of the source program. Semantic analysis mainly comprises syntax-directed translation, symbol tables, type checking, intermediate languages, and intermediate code generation. When the background server obtains the data transmitted by the front end, the machine performs semantic analysis on it; the invention encapsulates the data in a metadata model for the semantic analysis operation. The semantic analysis module contains an ontology and an entity dictionary. The ontology is used for semantic analysis of the text; its basic units are concepts, the concepts form concept trees, and the concept trees form the ontology. Text conceptualization addresses the problem of word ambiguity. The entity dictionary is used for entity extraction from the text, so that content without actual meaning is discarded and the computation required for subsequent text processing is reduced. Reasoning is performed through frame logic or description logic; data in the information sources are collected; the schema information of each local database is stored in the metadata database in a specified format; a global ontology of the corresponding domain is established by analyzing the semantic relations among the metadata; semantic tagging of the text document is carried out through semantic analysis; and the semantically tagged text document is stored in the tagged text document database.
Specifically, semantic similarity measures the degree of similarity between two words. It is mainly used in fields such as word sense disambiguation, information retrieval, information extraction and machine translation, and it is strongly subjective, so it cannot be analyzed in isolation from a specific application environment. There are currently two approaches to calculating semantic similarity: one organizes the concepts of related words in a tree structure through a semantic dictionary and calculates on that structure; the other uses statistical methods on the contextual information of words. Given the application scenario of the invention, the algorithms used for semantic similarity and feature item weight calculation are existing, mature algorithms. A corpus-based word similarity analysis method is adopted, with the formula:
Sim(W1, W2) = a / (Dis(W1, W2) + a);
wherein Sim(W1, W2) is the similarity, Dis(W1, W2) is the distance between the words W1 and W2, and a is an adjustable parameter equal to the word distance at which the similarity is 0.5. The weight of a feature item is calculated as: w = tf × idf, where w is the weight of feature item t in document d, tf is the frequency with which t occurs in d, and idf is the inverse document frequency of t. The widely used word vector space model is also adopted; it comprises the following steps: preprocessing -> text feature item selection -> weighting -> vector space model generation -> cosine calculation. The model selects a group of feature words in advance, computes the relevance between this group of feature words and each word to obtain a relevance feature vector for each word, and takes the similarity between vectors as the similarity between two words.
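The three calculations mentioned above can be sketched as follows. The sketch reads the similarity formula as Sim = a / (Dis + a), the form consistent with the statement that a is the distance at which the similarity equals 0.5. The function names, the relative-frequency definition of tf, and the idf variant log(N / df) are assumptions; the patent does not specify them.

```python
import math
from collections import Counter
from typing import Dict, List


def similarity_from_distance(distance: float, a: float) -> float:
    """Sim = a / (Dis + a); equals 0.5 when distance == a."""
    return a / (distance + a)


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight w = tf * idf for every term of every tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [["sea", "food"], ["sea", "home"]]
    vectors = tf_idf(docs)
    print(vectors)
    print(cosine(vectors[0], vectors[1]))
    print(similarity_from_distance(2.0, a=2.0))
```

With this idf variant a term that occurs in every document gets weight zero, so it does not contribute to the cosine; other idf variants add smoothing to avoid that.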
After metadata conversion and semantic analysis are carried out on user data, a machine generates corresponding correct answers according to data information and stores the correct answers in a database to serve as an information source of an output end.
Step H5: after semantic analysis is carried out on the user data, the machine builds the user data into an application knowledge base system according to the corresponding standards, and the characteristics of each data item are clearly identified in the knowledge base system. After the user inputs information, the machine searches the knowledge base for matching data with which to respond. That is, the semantic analysis results are stored in the semantic knowledge base; after the user inputs information, the matching knowledge is retrieved from the knowledge base, and the required analysis result is then obtained through semantic association discovery.
Although the data information has been converted into metadata, and answers have been generated by semantic analysis on the metadata structure, it cannot be output to the user side for display immediately, because at this point the information is not coherent and remains isolated and scattered. The data therefore needs further processing to establish relationships between data items. Each metadata item has a unique identifier, consisting of a number identification for the user's input and a pointing identification of the next metadata item. Once the user's input is received, the question knowledge base is searched automatically for the corresponding answer texts, and the texts are combined to form the final result for the question input by the user. The machine then feeds the information synthesized from the whole text back to the user as its response, so as to meet the user's intent.
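One way to picture how the number identification and the pointing identification of the next metadata item turn scattered answer fragments into a coherent response is a linked-chain traversal. The following Python sketch is illustrative only; `MetadataRecord`, `compose_answer`, the field names, and the sample texts are hypothetical, and joining fragments with a space is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class MetadataRecord:
    number_id: int           # unique number identification
    text: str                # answer text fragment found in the knowledge base
    next_id: Optional[int]   # pointing identification of the next metadata item


def compose_answer(records: Dict[int, MetadataRecord], start_id: int) -> str:
    """Follow next_id pointers from start_id and join the fragments in order.

    A set of visited ids guards against cycles in the pointers. A pointer to
    an id that is not in `records` raises KeyError in this sketch.
    """
    parts: List[str] = []
    seen: Set[int] = set()
    current: Optional[int] = start_id
    while current is not None and current not in seen:
        seen.add(current)
        record = records[current]
        parts.append(record.text)
        current = record.next_id
    return " ".join(parts)


# Hypothetical fragments stored out of order; the pointers restore the order.
records = {
    2: MetadataRecord(2, "within three days.", None),
    1: MetadataRecord(1, "Orders ship", 2),
}
print(compose_answer(records, start_id=1))
```

Because the order is carried by the pointers rather than by storage position, the fragments can be stored and retrieved independently and still be reassembled deterministically.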
A second aspect of the embodiment of the present invention provides a text semantic analysis terminal, as shown in fig. 3, where the text semantic analysis terminal 10 includes: a processor 110, a memory 120, and a text semantic analysis program stored on the memory and executable on the processor, wherein the text semantic analysis program when executed by the processor performs the steps of:
receiving character information input by a user, performing lexical analysis on the input character information, and separating character strings contained in the character information into independent words to obtain a word sequence;
carrying out syntactic analysis on the separated word sequences, judging whether grammatical errors exist in the word sequences, and filtering out words with grammatical errors or phrases formed by adjacent words;
converting words contained in a word sequence into corresponding metadata, calculating semantic similarity and feature item weight among the metadata, extracting keyword feature items of the word sequence according to the calculated semantic similarity and feature item weight, obtaining semantic mark texts corresponding to the words according to the keyword feature items, and storing the semantic mark texts in a text database;
and matching corresponding semantic mark texts from the text database in sequence according to the arrangement sequence of each word in the word sequence, and outputting and displaying the text information synthesized after sequencing.
Further, when executed by the processor 110, the text semantic analysis program further implements the following steps:
using a space as the separator to split the character string contained in the character information into independent words, and assigning each word a unique number identification and a pointing identification of the next metadata.
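A minimal Python sketch of this step, under stated assumptions, is given below. The names `WordRecord` and `split_into_words` are hypothetical, sequential integers are assumed for the number identification, and `str.split()` is used, which splits on any run of whitespace rather than on a single space character only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordRecord:
    number_id: int           # unique number identification of this word
    word: str                # the separated word itself
    next_id: Optional[int]   # pointing identification of the next item, None at the end


def split_into_words(text: str) -> List[WordRecord]:
    """Split text on whitespace and chain the words by id and next pointer."""
    words = text.split()  # drops empty strings produced by repeated spaces
    records: List[WordRecord] = []
    for index, word in enumerate(words):
        next_id = index + 1 if index + 1 < len(words) else None
        records.append(WordRecord(number_id=index, word=word, next_id=next_id))
    return records


for record in split_into_words("how do I reset my password"):
    print(record)
```

Space-based splitting suits text that is already delimited; text in a language written without spaces would need a dedicated word segmenter before this step, which the sketch does not attempt.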
Preferably, when executed by the processor 110, the text semantic analysis program further implements the following steps:
creating a metadata base for storing metadata, and establishing an association relation between a word catalogue and the metadata contained in the metadata base; the directories contained in the metadata base are organized into different hierarchy levels according to metadata type, so that the corresponding metadata can be queried more quickly through the directories.
In the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to the words is found out through the association relationship.
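The metadata base, its type-based directory hierarchy, and the word-to-metadata association can be sketched as nested dictionaries. This is an illustrative Python sketch only: the class `MetadataBase`, its methods, and the example metadata types are hypothetical, and it assumes that each word belongs to exactly one metadata type.

```python
from typing import Dict, Optional


class MetadataBase:
    """Two-level directory: metadata type -> word -> metadata."""

    def __init__(self) -> None:
        # Directory hierarchy keyed first by metadata type, then by word.
        self._by_type: Dict[str, Dict[str, dict]] = {}
        # Word catalogue: the association from a word to its metadata type.
        self._catalogue: Dict[str, str] = {}

    def add(self, meta_type: str, word: str, metadata: dict) -> None:
        """Store metadata under its type directory and register the word."""
        self._by_type.setdefault(meta_type, {})[word] = metadata
        self._catalogue[word] = meta_type

    def lookup(self, word: str) -> Optional[dict]:
        """Find the metadata for a word through the association relation."""
        meta_type = self._catalogue.get(word)
        if meta_type is None:
            return None  # the word has no associated metadata
        return self._by_type[meta_type][word]


base = MetadataBase()
base.add("product", "phone", {"label": "device"})
base.add("action", "reset", {"label": "restore defaults"})
print(base.lookup("reset"))
```

Both levels are hash lookups, so converting a word into its metadata takes constant expected time regardless of how many metadata types the base contains.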
Preferably, when executed by the processor 110, the text semantic analysis program further implements the following steps:
calculating the semantic similarity and feature item weight among the metadata by using a corpus-based word similarity analysis method and a word vector space model.
Memory 120, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 110 executes various functional applications of the server and data processing by running the nonvolatile software programs, instructions and modules stored in the memory 120, that is, implements the text semantic analysis method of the above method embodiment.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the report automatic generation system, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 120 optionally includes memory located remotely from processor 110, which may be connected to a text semantic analysis terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 120 and, when executed by the one or more processors 110, perform the text semantic analysis method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium, where a text semantic analysis program is stored on the computer-readable storage medium, and when executed by a processor, the text semantic analysis program implements the text semantic analysis method.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
In the invention, when a user needs to acquire information resources, the user sends a corresponding instruction to the machine, and the machine receives the instruction and stores the instruction information. The data information is stored in metadata format; once the user's information resources are stored as metadata, the metadata can be properly analyzed and identified, and the information is then fed back to the user through the structured format of the metadata. When the information is fed back, information irrelevant to the user is removed and only the information of concern to the user is pushed, so that the user can conveniently obtain the information fed back by the semantic analysis terminal and correctly understand and use it.
It should be understood that the technical solutions and concepts of the present invention may be equally replaced or changed by those skilled in the art, and all such changes or substitutions should fall within the protection scope of the appended claims.

Claims (10)

1. A method for analyzing word semantics is characterized by comprising the following steps:
receiving character information input by a user, performing lexical analysis on the input character information, and separating character strings contained in the character information into independent words to obtain a word sequence;
carrying out syntactic analysis on the separated word sequences, judging whether grammatical errors exist in the word sequences, and filtering out words with grammatical errors or phrases formed by adjacent words;
converting words contained in a word sequence into corresponding metadata, calculating semantic similarity and feature item weight among the metadata, extracting keyword feature items of the word sequence according to the calculated semantic similarity and feature item weight, obtaining semantic mark texts corresponding to the words according to the keyword feature items, and storing the semantic mark texts in a text database;
matching corresponding semantic mark texts from the text database in sequence according to the arrangement sequence of each word in the word sequence, and outputting and displaying the text information synthesized after sequencing;
the algorithms adopting semantic similarity and feature item weight calculation are the existing mature algorithms: the method adopts a word similarity analysis method based on a corpus and adopts an algorithm formula:
Sim(W1, W2) = a / (Dis(W1, W2) + a);
wherein Sim(W1, W2) is the similarity between the words W1 and W2, Dis(W1, W2) is the distance between them, and a is an adjustable parameter whose meaning is the word distance at which the similarity is 0.5; the weight of a feature item is calculated as w = tf × idf, wherein w is the weight of the feature item t in the document d, tf represents the frequency with which t occurs in d, and idf represents the inverse document frequency of t; the widely applied word vector space model is adopted, comprising the following steps: preprocessing -> text feature item selection -> weighting -> generation of the vector space model -> cosine calculation; the model selects a group of feature words in advance, calculates the relevance between each word and the group of feature words to obtain a relevance feature vector for each word, and uses the similarity between the vectors as the similarity between the two words.
2. The text semantic analysis method according to claim 1, wherein the text information input by the user comprises: identity information of the user and question information input by the user;
the identity information of the user comprises: user ID information byte, user name byte and mobile phone number byte.
3. The method for semantic analysis of words according to claim 2, wherein the step of separating the character string included in the word information into independent words comprises:
using a space as the separator to split the character string contained in the character information into independent words, and assigning each word a unique number identification and a pointing identification of the next metadata.
4. The text semantic analysis method according to claim 3, wherein before receiving text information input by a user, the text semantic analysis method further comprises the steps of:
creating a metadata base for storing metadata, and establishing an association relation between a word catalogue and the metadata contained in the metadata base;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to the words is found out through the association relationship.
5. The method for analyzing word semantics according to claim 4, wherein the step of calculating the semantic similarity and feature item weight between the metadata and extracting the keyword feature item of the word sequence according to the calculated semantic similarity and feature item weight comprises:
calculating the semantic similarity and feature item weight among the metadata by using a corpus-based word similarity analysis method and a word vector space model.
6. A character semantic analysis terminal, characterized by comprising: a processor, a memory, and a word semantic analysis program stored on the memory and executable on the processor, wherein the word semantic analysis program when executed by the processor performs the steps of:
receiving character information input by a user, performing lexical analysis on the input character information, and separating character strings contained in the character information into independent words to obtain a word sequence;
carrying out syntactic analysis on the separated word sequences, judging whether grammatical errors exist in the word sequences, and filtering out words with grammatical errors or phrases formed by adjacent words;
converting words contained in a word sequence into corresponding metadata, calculating semantic similarity and feature item weight among the metadata, extracting keyword feature items of the word sequence according to the calculated semantic similarity and feature item weight, obtaining semantic mark texts corresponding to the words according to the keyword feature items, and storing the semantic mark texts in a text database;
matching corresponding semantic mark texts from the text database in sequence according to the arrangement sequence of each word in the word sequence, and outputting and displaying the text information synthesized after sequencing;
the algorithms adopting semantic similarity and feature item weight calculation are the existing mature algorithms: the method adopts a word similarity analysis method based on a corpus and adopts an algorithm formula:
Sim(W1, W2) = a / (Dis(W1, W2) + a);
wherein Sim(W1, W2) is the similarity between the words W1 and W2, Dis(W1, W2) is the distance between them, and a is an adjustable parameter whose meaning is the word distance at which the similarity is 0.5; the weight of a feature item is calculated as w = tf × idf, wherein w is the weight of the feature item t in the document d, tf represents the frequency with which t occurs in d, and idf represents the inverse document frequency of t; the widely applied word vector space model is adopted, comprising the following steps: preprocessing -> text feature item selection -> weighting -> generation of the vector space model -> cosine calculation; the model selects a group of feature words in advance, calculates the relevance between each word and the group of feature words to obtain a relevance feature vector for each word, and uses the similarity between the vectors as the similarity between the two words.
7. The text semantic analysis terminal according to claim 6, wherein the text semantic analysis program further implements the following steps when executed by the processor:
using a space as the separator to split the character string contained in the character information into independent words, and assigning each word a unique number identification and a pointing identification of the next metadata.
8. The text semantic analysis terminal according to claim 7, wherein the text semantic analysis program further implements the following steps when executed by the processor:
creating a metadata base for storing metadata, and establishing an association relation between a word catalogue and the metadata contained in the metadata base;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to the words is found out through the association relationship.
9. The text semantic analysis terminal according to claim 7, wherein the text semantic analysis program further implements the following steps when executed by the processor:
calculating the semantic similarity and feature item weight among the metadata by using a corpus-based word similarity analysis method and a word vector space model.
10. A computer-readable storage medium, wherein a text semantic analysis program is stored on the computer-readable storage medium, and when executed by a processor, implements the text semantic analysis method according to any one of claims 1 to 5.
CN201710995052.0A 2017-10-23 2017-10-23 Character semantic analysis method, character semantic analysis terminal and storage medium Active CN107704453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710995052.0A CN107704453B (en) 2017-10-23 2017-10-23 Character semantic analysis method, character semantic analysis terminal and storage medium

Publications (2)

Publication Number Publication Date
CN107704453A CN107704453A (en) 2018-02-16
CN107704453B true CN107704453B (en) 2021-10-08

Family

ID=61181999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710995052.0A Active CN107704453B (en) 2017-10-23 2017-10-23 Character semantic analysis method, character semantic analysis terminal and storage medium

Country Status (1)

Country Link
CN (1) CN107704453B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110812A (en) * 2007-08-29 2008-01-23 中兴通讯股份有限公司 Text command analyzing and processing method
CN102375826A (en) * 2010-08-13 2012-03-14 中国移动通信集团公司 Structured query language script analysis method, device and system
CN105335510A (en) * 2015-10-30 2016-02-17 成都博睿德科技有限公司 Text data efficient searching method
CN105389297A (en) * 2015-12-21 2016-03-09 浙江万里学院 Text similarity processing method
CN106682147A (en) * 2016-12-22 2017-05-17 北京锐安科技有限公司 Mass data based query method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8335754B2 (en) * 2009-03-06 2012-12-18 Tagged, Inc. Representing a document using a semantic structure
CN103927358B (en) * 2014-04-15 2017-02-15 清华大学 text search method and system
US9348814B2 (en) * 2014-08-01 2016-05-24 Almawave S.R.L. System and method for meaning driven process and information management to improve efficiency, quality of work and overall customer satisfaction
CN104239513B (en) * 2014-09-16 2019-03-08 西安电子科技大学 A kind of semantic retrieving method of domain-oriented data
CN104199965B (en) * 2014-09-22 2020-08-07 吴晨 Semantic information retrieval method
CN105160046A (en) * 2015-10-30 2015-12-16 成都博睿德科技有限公司 Text-based data retrieval method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen Qianhai Zhongxing scientific research Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: SHENZHEN QIANHAI ZHONGXING E-COMMERCE Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240718

Address after: Building B1, 6A, Digital Technology Park, No. 002, Gaoxin South 7th Road, Nanshan District, Shenzhen, Guangdong Province 518000

Patentee after: Shenzhen Yuanxing Internet Technology Co.,Ltd.

Country or region after: China

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Patentee before: Shenzhen Qianhai Zhongxing scientific research Co.,Ltd.

Country or region before: China
